20 commits

Author SHA1 Message Date
448561604c Future-proof our nginx config for IPv6
Today I learned that nginx requires a special invocation to listen on
IPv6 addresses as well as IPv4. On some of my other projects, this was
causing Let's Encrypt certificate renewal to fail, because Let's
Encrypt prefers to connect over IPv6 when an AAAA record is present, so
its challenges were always returning 404, because nginx wasn't
listening on IPv6.

This shouldn't be affecting impress-2020 in production, because we
don't have an AAAA record right now. But I'm just making this change in
all my projects, to make sure this doesn't bite me in the future!
2024-02-13 08:52:52 -08:00
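
A minimal sketch of the kind of change 448561604c describes, assuming the nginx site config is written out by an Ansible task. The file path, task name, and `server_name` are hypothetical and not taken from the repository. The part that matters is the paired `listen` directives: a bare `listen 80;` binds IPv4 only, so the bracketed `[::]` form is needed for IPv6.

```yaml
- name: Write an nginx site config that listens on both IPv4 and IPv6
  ansible.builtin.copy:
    dest: /etc/nginx/sites-available/example.conf  # hypothetical path
    content: |
      server {
        server_name example.com;  # hypothetical domain
        listen 80;       # IPv4
        listen [::]:80;  # IPv6
      }
  become: true
```

Port 80 is shown because that is where Let's Encrypt's HTTP challenge connects; an HTTPS server block would need the same pair of directives on port 443.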
e70e67211e Restart the app every 8 minutes
Idk why it's memory leaking so hard lately? But let's uhhh just reboot it a ton for now, while we continue to work on migrating workloads offa there.
2023-10-17 01:00:23 -07:00
d591eabd0a Add modeling cron job to deploy-setup
This should run it every 10 minutes! Wowie, cron config on the new box is easy! :3
2022-10-11 12:54:02 -07:00
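
For reference, a ten-minute cron schedule like the one d591eabd0a mentions could look roughly like this as an Ansible task. This is a sketch under assumptions: the job name and command are hypothetical placeholders, and the `matchu` user is borrowed from the pm2 commit further down this log rather than from the actual cron config.

```yaml
- name: Run a job every 10 minutes
  ansible.builtin.cron:
    name: example-modeling-job  # hypothetical label for the crontab entry
    minute: "*/10"              # fires at :00, :10, :20, :30, :40, :50
    user: matchu                # assumption: the non-root deploy user
    job: /usr/local/bin/run-modeling  # hypothetical command
```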
0c2939dfe4 Use Puppeteer instead of Playwright
We used Playwright in the first place to try to work around a Vercel deploy issue, and I'm not sure it really ended up mattering lol :p

But yeah, I'm putting the new Puppeteer code through the same prod stress test, and it just doesn't seem to be getting into the same broken state that Playwright was. I'm guessing it's just that Puppeteer has more investment in edge-case handling? (There's also the fact that we're no longer running things as root, which could have been a fucky problem, too?)
2021-11-13 02:16:58 -08:00
587aa09efc Oops, fix bug in pm2 setup
Oh, I made a typo that caused pm2 to be running our processes as `root` instead of `matchu`! Let's very fix that!! 😳

I noticed this because I'm trying Puppeteer, and it got upset about running in sandboxed mode as root, and I'm like "as root??"

So yeah, good, fixed, lol 😬
2021-11-13 02:12:05 -08:00
9753cbe173 /api/assetImage fixes in production
Now that we're not on Vercel's AWS Lambda deployment, we can switch to something a bit more standard!

I also tweaked up our version of Playwright, because, hey, why not?

Getting the package list was a bit tricky, but we got there! Left a comment to explain where it's from.
2021-11-12 21:39:35 -08:00
07d54b9a9e Add SSH keys to deploy-setup
This helps me set up new devices, while still keeping passworded SSH access locked down!
2021-11-12 20:14:44 -08:00
d37d958a36 Enable automatic updates & reboots on deploy box
Oh right, like the SSH stuff, I did this the first time I set up, but didn't add it to the script! I like having things in the script :3 (I also had forgotten to check on the time zone last time, nice to have it with some rigor!)
2021-11-04 19:17:35 -07:00
8f28f87bee Close most ports on the deploy box by default
I noticed that incoming port 3000 connections were being allowed, oops! Not a huge deal, but I don't want to allow connections without HTTPS, and I don't want surprise surface area even if I'm not currently aware of attacks on it. Close it out!
2021-11-04 18:57:00 -07:00
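
One way to get the "closed by default" behavior that 8f28f87bee describes, sketched with Ansible's `community.general.ufw` module. The log does not say which firewall the playbook uses, so `ufw` and the list of allowed ports are assumptions.

```yaml
- name: Allow SSH, HTTP, and HTTPS
  community.general.ufw:
    rule: allow
    port: "{{ item }}"
    proto: tcp
  loop: ["22", "80", "443"]  # assumption: the only ports meant to stay open
  become: true

- name: Deny all other incoming connections and enable the firewall
  community.general.ufw:
    state: enabled
    policy: deny
    direction: incoming
  become: true
```

With a default-deny policy, port 3000 stays reachable only from the box itself, so nginx can still proxy to the app locally.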
9310a250d6 Fix some bugs running deploy-setup from scratch
As an exercise, I've wiped the box clean, and I'm reinstalling from the scripts! :3

I added the SSH hardening rules to the playbook instead of doing them by hand this time.

I made a mistake with creating `/srv/impress-2020`, right, you need to *say* what it should be created *as* for the creation step to work!

I also guess my recent pm2 changes made it not actually be willing to start the app anymore, because `/srv/impress-2020/current` doesn't exist or have `node_modules` yet. I'm doing a cute thing where I create a placeholder app during setup, so there's always something to run, without introducing the complexities of a real deploy to the setup process.

And right, of course, we need to install nginx before running certbot! But we need to add certbot config *after* running certbot!

And then just some misc cleanups for consistency and correctness!
2021-11-03 23:11:50 -07:00
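
The `/srv/impress-2020` fix in 9310a250d6 matches how Ansible's `file` module behaves: if `state` is omitted and the path does not exist yet, the module does nothing. A minimal sketch follows; the owner is an assumption based on the `matchu` user named in the pm2 commit above.

```yaml
- name: Create the app folder
  ansible.builtin.file:
    path: /srv/impress-2020
    state: directory  # without this, a missing path is left uncreated
    owner: matchu     # assumption: the deploy user
  become: true
```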
e8ed459afd Remove the web group permission stuff from deploy
I'm not doing this thoroughly enough for it to matter (e.g. the deployed rsynced versions aren't having the group permissions set).

I think doing this *right* (to be extensible to additional users) is too much complexity to be worth it, and doing it halfway is more confusing than helpful.

I did this because I was anticipating multi-user permissions to be a bit of an issue for like, granting the web server permission to access the source code. But it turns out, since we're running with pm2, it's all working just fine!
2021-11-03 16:59:23 -07:00
bd8ccf19d7 Remove monit from our deployment
Okay well, we added monit to solve a problem that I coincidentally solved within an hour of getting monit working lol!

This also enables us to remove the pm2 pid file, which we were only using to allow monit to track the pm2 app.
2021-11-03 16:48:38 -07:00
d17263139e Fix pm2 monitoring
Okay huh, while digging a bit into another issue, I found what was wrong with our config and pm2's built-in monitoring! You can't use `yarn start`, because the wrapper script breaks its ability to look inside and see what's happening.

I also removed the compiler flag thing from the `start` script in `package.json`, because I think it's redundant? There's no compilation to be done in a live server.

I think I might remove monit after this? It's nice extra resilience in a sense, but it feels like extra complexity when it's doing the job `pm2` is supposed to do. (And tbh I've almost never heard of nginx crashing, and if it does it's probably a scenario worth investigating by hand.)
2021-11-03 16:46:35 -07:00
792da067e3 Add monit watching for nginx and pm2
When I woke up this morning, the app had crashed because the mysql connection was closed!

I'm not sure why that caused a _crash_? Or why pm2 didn't pick up on it, and said the process was still online? (Maybe the process was running, but the server had stopped?) Those could be good to investigate?…

…but better than diving too far into the details is to just address the high-level problem: if the app goes down for unexpected reasons, I want it back up!! lol

In this change, we add `monit`, a solid system for monitoring processes (including checking for behavior, like responding to net requests), and configure it to watch the app process and the nginx process.

To test, you can run `pm2 stop impress-2020`, or `systemctl stop nginx`, to see that Monit brings them back up within seconds!

This does add some potential surprise if you're _trying_ to take the processes down. The easiest way is to send the stop command through monit, like `monit stop nginx`. This will disable monitoring until you start it again through monit, I think? (You can also disable/enable monitoring as a direct command, regardless of app state.)
2021-11-03 16:32:14 -07:00
2f874653bf Update pm2 tasks to update the config correctly
Previously, if you changed the pm2 ecosystem file content, it wouldn't actually be reflected in the running services. Now it will be!
2021-11-03 15:43:37 -07:00
7131bc0ea9 Set up certbot during setup playbook
You can see how, instead of the default experience where certbot edits your config for you, I've referenced the certificates in the config in the first place, and set up certbot to just generate them!

Also, I learned about certbot non-interactive mode! At first I wrote this with the Ansible `expect` module lol :p
2021-11-03 01:00:28 -07:00
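
A sketch of what certbot's non-interactive mode from 7131bc0ea9 can look like in an Ansible task, assuming the nginx plugin is installed. The email address, domain, and certificate path are hypothetical placeholders. `certonly` asks certbot to obtain the certificate without editing the nginx config, which fits the approach the message describes.

```yaml
- name: Generate a certificate without prompts
  ansible.builtin.command:
    argv:
      - certbot
      - certonly
      - --nginx
      - --non-interactive
      - --agree-tos
      - --email
      - admin@example.com  # hypothetical contact address
      - --domains
      - example.com        # hypothetical domain
    # Skip the task on later runs once the certificate exists.
    creates: /etc/letsencrypt/live/example.com/fullchain.pem
  become: true
```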
9a4b905639 Set up basic nginx in front of impress-2020
It loads kinda! auth0 is crashing us because it refuses to run over http:// but hey! That's pretty cool!
2021-11-03 00:07:30 -07:00
9d41e80942 Use pm2 to run the deployed app
2021-11-02 22:49:45 -07:00
edd983c97a Refactor deploy to build locally, not remotely
2021-11-02 18:47:13 -07:00
dde8cee1e3 Add deploy playbook: pulls git and installs deps
2021-11-02 16:36:39 -07:00
Renamed from deploy/setup.yml