Commit graph

46 commits

Author SHA1 Message Date
a57b3629db Refactor Neopets import tasks all into a neopets:import namespace
and with a `rails neopets:import` task you can call to do them all at
once!

I'm gonna do some other stuff here too to make `neopets:import` easier
to call all in one go, like prompting for the Neologin cookie just
once at the start.

Note that this changes the cron setup, so you gotta run
`bin/deploy:setup` after this deploys!
2024-11-16 11:58:43 -08:00
9ebc498888 Upgrade to Ruby 3.3.5, and improve the mechanisms for it a bit
I move `ruby_version` into an Ansible variable, to make it easier to
update in the future!
2024-09-20 12:47:35 -07:00
65eaa031dd Speed up deploys with Ansible's pipelining option
The main bottleneck for us is still just uploading the full source code,
there might be some clever option I'm not using for that yet of like,
compression or something? But this change did take the process down
from like 5 minutes to 3 minutes, so, works for me!
2024-09-06 12:22:28 -07:00
2e7bdc47d7 Move some Ansible config out of scripts and into ansible.cfg
My immediate motivation is that I'm going to try turning on the
pipelining setting, to improve performance, and I'd like to have a
consistent place to put it! But also, I like standardizing our setup a
bit more, too
2024-09-06 12:16:26 -07:00
e67830642c Upgrade to Ruby 3.3.4 in production
Oh right, I did a dev-side version of this, and forgot prod needs it
too! (Maybe a bit silly to bother for a patch-level but whatever!)
2024-08-31 12:49:50 -07:00
fdfc6c9756 Add matchu@deathstar SSH key
I got a new laptop!
2024-08-19 10:48:16 -07:00
cedceeaf3c Set production hostname to impress.openneo.net
This doesn't matter-matter, it's mostly just so that when we SSH into
the production machine, the prompt presents itself as `impress` rather
than as `localhost`!
2024-06-10 13:09:37 -07:00
d7fc624b72 Create rails nc_mall:sync cron job
We run every ten minutes! It's pretty darn fast, and not very many
requests, so I'm not too worried about that applying server pressure.
2024-05-19 19:14:06 -07:00
73c2d4327a Oops, don't have old Rubies in the PATH when deploying!
Ahh right, this `lineinfile` trick has a gotcha: if we ever change the
Ruby version, it injects the line into the file as a *new* line,
instead of updating or removing the existing one.

When poking at the content of `/etc/profile` to remove old versions of
the line, I noticed that `/etc/profile.d` is a thing! We can drop a
file into there and manage it more directly, instead. Let's do that!
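
A rough Ruby simulation of the gotcha (not code from this repo): an "ensure this exact line is present" rule, which is roughly what `lineinfile` does when no `regexp` is given, never removes the previous version's line. The helper name, paths, and version numbers are made up for illustration.

```ruby
# Hypothetical stand-in for "ensure this exact line exists in the file".
# It only appends when the exact line is missing; it never edits old lines.
def ensure_line(lines, line)
  lines.include?(line) ? lines : lines + [line]
end

profile = []
profile = ensure_line(profile, 'PATH="/opt/ruby-3.3.0/bin:$PATH"') # illustrative path
profile = ensure_line(profile, 'PATH="/opt/ruby-3.3.1/bin:$PATH"') # after a version bump

# Both lines are now present, so the old Ruby is still on the PATH.
puts profile
```

Writing one whole file into `/etc/profile.d` sidesteps this, since each deploy replaces the file's full contents.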
2024-05-02 12:47:02 -07:00
12764c44fc Attempt to fix scheduled public data export cron
This hasn't actually been running, and I'm finally looking into why!
I tested this by running `sudo -u impress COMMAND_GOES_HERE`, and found
that there were two errors: both the lack of `production.env` that I
had noticed and expected, but also that Ruby 3.3.0 wasn't in the `PATH`
value.

To fix this, I now pull in both `/etc/profile` and `~/.bash_profile`,
much like what happens automatically when we log into a shell as
`impress`, to get the environment set up! I haven't actually validated
that this Works, but I guess we'll see! I *could* change the cron
timing to some immediate time to try to watch it happen, but I'm not
invested enough right now, there's other things to do!
2024-05-02 12:21:14 -07:00
e1a5eaeb68 Install cron job to run rails public_data:commit weekly in production
The Sunday 1:15am time was chosen pretty arbitrarily; I think having it
happen at a "start of week" kind of weekday is clarifying for weekly
tasks, but I chose ":15" mostly to mitigate that thing where cron jobs
all run on the hour at the same time, while still feeling normal :p
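
For reference, "Sunday 1:15am" corresponds to the cron fields `15 1 * * 0`. A minimal Ruby sketch of the same check, using a hypothetical helper name that isn't in the repo:

```ruby
# Hypothetical helper: true when `time` falls in the weekly slot,
# i.e. the same moment the cron fields `15 1 * * 0` would match.
def weekly_commit_time?(time)
  time.sunday? && time.hour == 1 && time.min == 15
end

# 2024-03-03 was a Sunday.
puts weekly_commit_time?(Time.new(2024, 3, 3, 1, 15))
```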
2024-03-01 13:20:59 -08:00
8dc11f9940 Create rails public_data:commit task, to share public data dumps
I'm starting to port over the functionality that was previously just,
me running `yarn db:export:public-data` in `impress-2020` and
committing it to Git LFS every time.

My immediate motivation is that the `impress-2020` git repository is
getting weirdly large?? Idk how these 40MB files have blown up to a
solid 16GB of Git LFS data (we don't have THAT many!!!), but I guess
there's something about Git LFS's architecture and disk usage that I'm
not understanding.

So, let's move to a simpler system in which we don't bind the public
data to the codebase, but instead just regularly dump it in production
and make it available for download.

This change adds the `rails public_data:commit` task, which when run in
production will make the latest dump available at
`https://impress.openneo.net/public-data/latest.sql.gz`, and will also
store a running log of previous dumps, viewable at
`https://impress.openneo.net/public-data/`.

Things left to do:
1. Create a `rails public_data:pull` task, to download `latest.sql.gz`
   and import it into the local development database.
2. Set up a cron job to dump this out regularly, idk maybe weekly? That
   will grow, but not very fast (about 2GB per year), and we can add
   logic to rotate out old ones if it starts to grow too far. (If we
   wanted to get really intricate, we could do like, daily for the past
   week, then weekly for the past 3 months, then monthly for the past
   year, idk. There must be tools that do this!)
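
The tiered rotation idea from item 2 could look something like this in Ruby. This is only a sketch under assumptions: the method name is hypothetical, "3 months" is approximated as 90 days, and nothing like this exists in the repo yet.

```ruby
require "date"

# Hypothetical retention policy: keep every dump from the past week,
# one per ISO week for the past ~3 months (90 days), and one per
# calendar month for the past year. Anything older is dropped.
def dumps_to_keep(dates, today:)
  kept = []
  seen_weeks = {}
  seen_months = {}

  dates.sort.reverse.each do |date| # newest first, so the newest of each bucket wins
    age = (today - date).to_i
    if age <= 7
      kept << date
    elsif age <= 90
      key = [date.cwyear, date.cweek]
      kept << date unless seen_weeks[key]
      seen_weeks[key] = true
    elsif age <= 365
      key = [date.year, date.month]
      kept << date unless seen_months[key]
      seen_months[key] = true
    end
  end

  kept.sort
end
```

Given a list of dump dates, the dumps to delete would be `dates - dumps_to_keep(dates, today: Date.today)`.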
2024-02-29 14:30:33 -08:00
ec6dca1c16 Improve Unicode support, emojis don't crash us anymore lol!
A few pieces here:

1. Convert all tables to `utf8mb4`+`utf8mb4_unicode_520_ci` strings.
2. Configure that as the server's default.
3. Configure the Rails database connection to use this encoding too.
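
To illustrate why this matters, here's a small Ruby sketch. It uses Ruby's `ISO-8859-1` encoding as a stand-in for MySQL's `latin1` (an approximation, they aren't exactly the same charset), and the helper name is made up.

```ruby
# An emoji takes four bytes in UTF-8, which is the case `utf8mb4` exists for.
emoji = "\u{1F600}"
puts emoji.bytesize

# Hypothetical helper: can this text be represented in a latin1-style charset?
def latin1_safe?(text)
  text.encode("ISO-8859-1")
  true
rescue Encoding::UndefinedConversionError
  false
end

puts latin1_safe?("caf\u00E9") # accented Latin letters fit
puts latin1_safe?(emoji)       # emoji do not
```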

Came together pretty well, whew! This has been a LONG time coming,
`latin1` is NOT a good charset for the year 2024!
2024-02-28 18:54:27 -08:00
345a45ee0c Add slow query logging to MariaDB config
The database did a weird thing today, where it wouldn't even respond to
the usual stop signal, I had to fully `kill -9` it??

I didn't see anything in the logs indicating what it was busy doing,
and people online seem to describe having this problem sometimes but
with no obvious solution.

For now, I'll try turning on the slow query logger, to see if that
might give us hints about whether there was like a denial-of-service
query attack hitting us or something?
2024-02-26 11:06:51 -08:00
2cc46703b9 Create NeopetsMediaArchive, read the actual manifests for Alt Styles
The Neopets Media Archive is a service that mirrors `images.neopets.com`
over time! Right now we're starting by just loading manifests, and
using them to replace the hacks we used for determining the Alt Style
PNG and SVG URLs; but with time, I want to load *all* customization
media files, to have our own secondary file source that isn't dependent
on Neopets to always be up.

Impress 2020 already caches manifest files, but this strategy is
different in two ways:

1. We're using the filesystem rather than a database column. (That is,
   manifest data is kinda duplicated in the system right now!) This is
   because I intend to go in a more file-y way long-term anyway, to
   load more than just the manifests.
2. Impress 2020 guesses at the manifest URLs by pattern, and reloads
   them on a regular basis. Instead, we use the modeling system: when
   TNT changes the URL of a manifest by appending a new `?v=` query
   string to it, this system will consider it a new URL, and will load
   the new copy accordingly.
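
A minimal sketch of the idea in point 2, assuming a filesystem layout keyed by the full URL. The method name, the root directory, and the example URL are all hypothetical and may not match the actual `NeopetsMediaArchive` code.

```ruby
require "uri"

# Hypothetical: map a manifest URL to a path in the local archive. The
# query string is kept as part of the key, so a new `?v=` value counts
# as a brand-new file instead of overwriting the old copy.
def archive_path_for(url, root: "/tmp/neopets-media-archive")
  uri = URI.parse(url)
  path = uri.path
  path += "?#{uri.query}" if uri.query
  File.join(root, uri.host, path)
end

puts archive_path_for("https://images.neopets.com/example/manifest.json?v=1")
puts archive_path_for("https://images.neopets.com/example/manifest.json?v=2")
```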

Fun fact, I actually have been prototyping some of this stuff in a side
project I'd named `impress-media-server`! It's a little Sinatra app
that indeed *does* save all the files needed for customization, and can
generate lightweight lil preview iframes and images pretty easily. I
had initially been planning this as a separate service, but after
thinking over the arch a bit, I think it'll go smoother to just give
the main app all the same access and awareness—and I wrote it all in
Ruby and plain HTML/JS/CSS, so it should be pretty easy to port over
bit-by-bit!

Anyway, only Alt Styles use this for now, but my motivation is to be
able to use more-correct asset URL logic to be able to finally swap
over wardrobe-2020's item search to impress.openneo.net's item search
API endpoint—which will get "Items You Own" searches working again, and
whittle down one of the last big things Impress 2020 can do that the
main app can't. Let's see how it goes!
2024-02-23 12:02:39 -08:00
42bf4b8edb Use local gems instead of installing from web when deploying, oops!
I hadn't realized for a while that we weren't already doing this lol, I
had noticed that `bundle install` in production was slower than I
expected when adding new stuff, but it was when we did this big recent
`bundle update` that I really noticed the difference.

Fixed now, I think! Though the real test will come when we actually
have a new gem to install, since this was a no-op case.
2024-02-22 12:16:59 -08:00
472ae645a0 Finish migrating to Ruby 3.3.0
As the comment in `deploy.yml` explains, this was a multi-step process,
but it went very smoothly as planned, hooray!!

I noticed again while making this change that Bundler doesn't seem to
be availing itself of the checked-in dependencies in `vendor/cache`. I
think I know the fix for this, I'll toss it into an upcoming change and
see if it works!
2024-02-22 12:05:02 -08:00
b18dd115a1 Build Ruby 3.3.0, but don't switch over to it yet
Still need to test the app with it, and getting this to deploy right
will be a bit tricky! Here's my thinking for sequencing once the code
is ready:

1. Temporarily modify `deploy.yml` to push the version, but not set it
   as `current` or restart the app.
2. Update the service file to use Ruby 3.3.0 and reference that version
   directly (instead of `current`), and restart the app.
3. Once it's already running, link that version as `current`.
4. Update the service file to reference `current` as usual, and restart
   the app.
2024-02-22 11:48:48 -08:00
e178505d2d Add redirect from openneo.net to impress.openneo.net
The homepage used to point to old projects that don't work anymore
anyway! This is the only project that stuck, so just redirect here!

We also remove the openneo.net link from the footer, because there's
nothing useful to say there anymore!
2024-02-20 10:35:59 -08:00
abbde80f60 Install MySQL server during deployment setup
It's finally colocated onto this box, instead of being on the old
server! I think I'm noticing substantial perf improvements, probably
both from increased colocation (tho they were in the same house
before), and also from like ten years of performance optimizations LOL!

As part of this, I created a new `setup_secrets.yml` file that's
similar to `production.env`, but is for values that the setup script
itself needs access to, whereas `production.env` is for values that the
app needs at runtime. (Though they have some things in common, like the
MySQL user password!) It's gitignored for security, as per usual!
2024-02-19 13:21:24 -08:00
ead0003397 Add custom 502 error page, for when the app goes down but nginx is up 2024-02-19 13:19:31 -08:00
aa108190b6 Oops, only redirect to maintenance.html internally
Oh I see, if I start with a slash, then it's interpreted as a reference
to a file; whereas if I don't, it's interpreted as a URL redirect. Ok!
2024-02-19 11:18:28 -08:00
7c36ba81e5 Minor change to explanation text in authorized-ssh-keys.txt 2024-02-19 11:12:40 -08:00
974aaa48ff Add maintenance.html page 2024-02-19 09:45:45 -08:00
e9b0fa0779 Future-proof our nginx config for IPv6
Today I learned that nginx requires a special invocation to listen to
IPv6 addresses as well as IPv4. On some of my other projects, this was
causing Let's Encrypt certificate renewal to fail, because Let's
Encrypt prefers to connect over IPv6 when an AAAA record is present, so
its challenges were always returning 404, because nginx wasn't
listening on IPv6.

This shouldn't be affecting impress in production, because we don't
have an AAAA record right now. But I'm just making this change in all
my projects, to make sure this doesn't bite me in the future!
2024-02-13 08:52:45 -08:00
4fff8d88f2 Add support_staff flag to user record; they can use Support tools
A little architecture trick here! DTI 2020 authorizes support staff
requests by means of a secret token, instead of user account stuff. And
our support tools still all call DTI 2020 APIs.

So here, we bridge the gap: we copy DTI 2020's support secret to this
app's environment variables (I needed to update
`deploy/files/production.env` and run `bin/deploy:setup` for this!),
then users with the new `support_staff` flag have it added to their
HTML documents in the meta tags. Then, the JS reads the meta tag.
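
Roughly, the server-side half of that bridge could be sketched like this. The helper name, the meta tag name, and the hash-shaped user are assumptions for illustration, not the app's actual view code.

```ruby
require "erb"

# Hypothetical sketch: only support staff get the secret in their HTML.
# `user` is a plain hash here; the real app would use its User model.
def support_secret_meta_tag(user, secret)
  return "" unless user[:support_staff]

  escaped = ERB::Util.html_escape(secret)
  %(<meta name="support-secret" content="#{escaped}">)
end

puts support_secret_meta_tag({ support_staff: true }, "example-secret")
puts support_secret_meta_tag({ support_staff: false }, "example-secret").empty?
```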

I also fixed an issue in the `deploy/setup.yml` playbook, where I had
temporarily commented some stuff out to skip steps one time, and forgot
to uncomment them after oops lol!
2024-01-29 04:21:19 -08:00
76af587e7c Replace puma server with falcon
Been wanting this for a while in theory, gonna actually do it now!

The motivation is that I want to turn up the timeout for loading pets,
because the Neopets endpoints are slower today with the NC UC release -
but I can already predict that under our current architecture that will
be a problem, because it'll block up our request queue!

Falcon uses Ruby's relatively-new async system to *not* have requests
block on upstream requests, and my understanding is that this behavior
is plug-and-play. Let's see how it goes!
2024-01-23 21:55:26 -08:00
2b382d95fb Update my desktop SSH key
I did a pretty thorough reset of my desktop machine, and rather than go
spelunking for the same private key, I just rolled it over to a new
one. Let's set it up!
2024-01-14 03:07:12 -08:00
91eb2f7752 Kill the app at high RAM, instead of trying to throttle it first
Well, sitting at the `MemoryHigh` limit still grinds the app to a halt
anyway, lmao. I guess it's a feature designed for well-behaved processes
and not for outright leaking ones?

Let's try just having systemd basically reset the app regularly when the
RAM hits a certain threshold. I think that's what this config will do,
we'll find out!
2023-10-27 17:03:08 -07:00
af705f1be0 Tighten the RAM limit bounds on the production impress service
Lol ok, as I had kinda predicted, the memory bounds I set last time
were not tight enough, and it stalled out again! (It was at 75% and
fully just not working.)

Let's try this tighter bound instead!
2023-10-27 10:32:33 -07:00
06258b1dd5 Upgrade puma in the initial-placeholder app, to satisfy Dependabot
So, Dependabot correctly reported that this version of puma is
vulnerable, which I fixed in the main app already—but I didn't notice we
also use that version in this cute tiny placeholder app we use early in
the deployment process.

There's not a real security need to upgrade this, as this placeholder
app has no access to useful data when it is run, but I think it's better
to resolve this by fixing it than by silencing Dependabot! May as well!
2023-10-26 14:48:21 -07:00
271d477110 Add RAM constraints to impress service in production
I just restarted the impress app in production! First I logged in to see
why it wasn't responding, and I saw that there was almost no free RAM
left, and that the Rails app had grown to eat it all up!

So in this change, we set a memory limit: if the impress app is taking
up more than 75% of the machine's RAM, systemd will try to shrink it
down; if it can't, then it will kill the app at 80%.
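
As a toy model of those two thresholds (in systemd terms, a soft limit like `MemoryHigh` and a hard limit like `MemoryMax`), here's a Ruby sketch. The method name is hypothetical, and real systemd behavior is more gradual than this three-way split.

```ruby
# Hypothetical model: above the soft limit the service gets squeezed,
# above the hard limit it gets killed.
def memory_action(used_bytes, total_bytes, high: 0.75, max: 0.80)
  ratio = used_bytes.to_f / total_bytes
  if ratio > max
    :kill
  elsif ratio > high
    :throttle
  else
    :ok
  end
end

gib = 1024**3
puts memory_action(2 * gib, 4 * gib)   # half of RAM
puts memory_action(3.1 * gib, 4 * gib) # between the two limits
puts memory_action(3.5 * gib, 4 * gib) # past the hard limit
```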

I'm not totally sure whether these bounds are tight enough? I didn't
look closely enough at the numbers to see what the app's actual usage
was according to systemctl at the time (`sudo systemctl status
impress`), so my hope is this is enough. But if we run into a memory
leak crash like that again, because it turns out even existing at 75%
RAM freezes the machine when running alongside its other processes, we
can decrease these numbers!

I also don't know the nature of the memory leak, and that could be worth
investigating—the app pretty cleanly fits into ~500–600MB when it starts
up, but then does seem to slowly but steadily grow. If it could be kept
at that size, it's possible we could downgrade the server and save some
costs—but that's a question for another day, since making sure we handle
memory leaks when they *do* happen is a more important robustness fix!
2023-10-26 13:52:44 -07:00
024041e591 Configure nginx to send pre-gzipped files to the client
Rails already creates little pre-gzipped `.gz` copies of all our assets
in the `public/assets` directory when we build. This configures nginx to
send those when available!
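
The selection logic amounts to something like the Ruby sketch below (nginx's `gzip_static` module does this natively). The method name is made up, and this simplified version ignores details like `q=0` weights in `Accept-Encoding`.

```ruby
# Hypothetical sketch: serve the `.gz` sibling when the client accepts
# gzip and the pre-compressed file exists, else serve the original.
# `exists` is injectable so the logic can be tried without real files.
def file_to_send(path, accept_encoding, exists: File.method(:exist?))
  gz_path = "#{path}.gz"
  accepts_gzip = accept_encoding.to_s.split(",").any? do |encoding|
    encoding.strip.start_with?("gzip")
  end
  accepts_gzip && exists.call(gz_path) ? gz_path : path
end

on_disk = ["public/assets/app.js", "public/assets/app.js.gz"]
exists = ->(path) { on_disk.include?(path) }

puts file_to_send("public/assets/app.js", "gzip, deflate, br", exists: exists)
puts file_to_send("public/assets/app.js", "identity", exists: exists)
```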

We weren't doing *any* gzip stuff before, so this helps a lot with those
bigger JS files, like the `wardrobe-2020` stuff. It's now at ~.5MB with
compression, which is still a bit big, but nowhere near as offensive as
the 4.5MB pre-anything, or 1.5MB post-minification, lol.
2023-10-25 15:44:01 -07:00
44141ce165 Extract some files out of the deploy script
Okay, there's enough going on in here now that I don't like it inline
anymore! Welcome to `files`!
2023-10-25 15:41:16 -07:00
22e3f4240a Update most URLs to use HTTPS
I noticed we didn't have the little lock icon in the browser, and yeah
huh there's a lot of `http://` still floating around! Let's fix that!
2023-10-25 15:22:57 -07:00
29dd353895 Remove beta.impress.openneo.net from deploy setup
We're now all-in on impress.openneo.net for this box!

One little wrinkle is that certbot was initially upset that I had
already uploaded the copy-pasted certs from the other box to here, at
the file path it expected to get to manage. So, I moved those to
`/srv/impress/shared/temp-certs`, and changed the nginx config
accordingly; and then deleted the original and let certbot control it!
2023-10-25 15:22:50 -07:00
56ce32b6cb Upgrade to Rails 7.1.1
The usual stuff! Installed the new gem and its new deps, ran
`bin/rails app:update` and did my best to manually merge the dev/prod
config files with the new canonical defaults, deleted some migrations I
don't think are relevant to us, and yeah!

Also, Rails 7.1 seems to need `libyaml-dev` installed, so I added that
to the `deploy/setup.yml` playbook!

One thing to note is that, while I was here, I turned on some settings
relating to our use of SSL that technically weren't on before. This
should be fine and helpful? But if stuff breaks, well, check those!
2023-10-25 15:05:31 -07:00
d5abc65dc9 Convenient shell things when logging in as impress user in production
Now, if I run `sudo -i -u impress` on the production server, it opens a
login bash shell, with all of the app's environment variables exported,
straight to `/srv/impress`.

This will let me quickly `cd current; bin/rails console` to start poking
at whatever needs poked!
2023-10-24 16:03:22 -07:00
021620e8b8 Move comment in setup.yml
I'm not sure why this was causing problems? especially why *now*? But I was seeing errors in systemctl of it trying to parse this comment as an environment variable soooo ok!

Could just be an intermittent thing where like, a byte got dropped last time we transferred this file or something? but whatever, this has fixed it and also is reasonable comment placement!
2023-10-23 19:05:09 -07:00
f21a7da362 Temporarily support both beta.impress and impress 2023-10-23 19:05:09 -07:00
bdd381df44 Clarify a note in the deploy playbook
Looking back at this now I'm just like. Oh right, of course, we don't have passwordless access to *become root*, so of course Ansible's strategy of becoming root and then running the playbook step was failing!
2023-10-23 19:05:09 -07:00
307f559226 Oops, add EXECJS_RUNTIME=Disabled to service file
Uhhh I think I must have made a mistake here where like… I must have left this in the service file for a while then accidentally deleted it from the Ansible playbook but not the live server? I had tested with this, then tested again without it and thought it wasn't necessary, but it turns out to have been necessary I guess? Ok!

This instructs Rails's ExecJS library to not bother looking for Node or something similar, because the app doesn't actually need to run any JS, even though the `react-rails` library (?) seems to be pretty eager about the possibility that we'll need to server-side-render stuff. (We should consider whether we want to though tbh? But… idk that would be a pretty different arch than what we've done with `jsbundling-rails` so like. idk whatever)
2023-10-23 19:05:09 -07:00
65387952ac Add more headers to nginx proxy_pass
Mm, something in Rails was getting upset when working with session cookies because the `Host` header was `127.0.0.1:3000` instead of `beta.impress.openneo.net`. I only saw this log entry on important actions like login, so my hope is that this is why login is failing??

I was intentionally omitting these to start, because I didn't understand them well and didn't want to add things I didn't understand. But now I've checked in on them more and they seem standard and reasonable. Ok!

```
HTTP Origin header (https://beta.impress.openneo.net) didn't match request.base_url (http://127.0.0.1:3000)
```

Source: https://stackoverflow.com/a/73198861/107415
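
A simplified Ruby sketch of the comparison behind that log line, to show why forwarding the headers helps. The method names are hypothetical and this is not Rails's actual implementation; the assumption is that the app builds its base URL from the scheme and `Host` it receives from the proxy.

```ruby
# Hypothetical: the URL the app believes it is being served from.
def base_url(scheme:, host:)
  "#{scheme}://#{host}"
end

# Simplified origin check: a present Origin header must equal the base URL.
def origin_matches?(origin, app_base_url)
  origin.nil? || origin == app_base_url
end

origin = "https://beta.impress.openneo.net"

# Without forwarded headers, the app only sees the proxy's local address.
puts origin_matches?(origin, base_url(scheme: "http", host: "127.0.0.1:3000"))

# With the original scheme and Host passed through, the two line up.
puts origin_matches?(origin, base_url(scheme: "https", host: "beta.impress.openneo.net"))
```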
2023-10-23 19:05:09 -07:00
9b68e982e7 Precompile assets when deploying new version
I did some refactoring while here too, of pulling the deploy scripts out of `package.json` and into `bin`, to be a bit more canonically Rails-y. (idk how canonical the colon thing is but, probably fine??)
2023-10-23 19:05:09 -07:00
c2abc8d876 Add playbook to deploy new app version
Okay, this is much simpler than the impress-2020 version where we symlinked node_modules and stuff - Bundler is just a lot better at this lol

Right now, the app is failing to start because we don't install Node—I wasn't sure whether we'd need to and whether I was gonna precompile the assets etc

Though now that I say that out loud, I guess part of the issue might be that I'm not sure the app is running in RAILS_ENV=production, I wonder if it still wants Node in that case?? I'll flip that switch in the service file now, then commit to save my place for the day, then try again with starting the app sometime and see what it says!
2023-10-23 19:05:09 -07:00
3dd5d26332 Create setup.yml deploy script
Yay it's working! We set up the box, install Ruby, upload a placeholder app, set it up as a service, and get it hooked up to nginx!

Next, we'll add the script to upload the latest version of the site. We just need to slot it into `/srv/impress/current`, run `bundle install`, and that should basically be that! (Oh, and we need to compile production assets—I wonder if it's useful to do that on the dev machine instead of on the target? That might save us from needing to install Node. Or maybe we'll have to anyway!)
2023-10-23 19:05:09 -07:00