Okay, I've simplified the migration to *just* add the column, and
instead added a task to find assets without manifest URLs and backfill
them.
Performance is a lot better now, using the `async-http` library, which
as I understand it supports persistent connections when invoked like
this, and maybe even HTTP/2 multiplexing?? (Though I'm not actually
sure images.neopets.com supports that, lol)
I'm not sure about the number of concurrent tasks I picked here: 100
seems okay for an internet thing and for such small requests, but I
worry that the CDN is gonna get annoyed or something. Well, we'll see!
This task is very resumable if it turns out we get frozen out or
something.
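For reference, here's roughly the shape I mean, as a sketch: the
`SwfAsset` model, the `manifest_url` column, and the URL-guessing line
are stand-ins here, not necessarily what the real task does.

```ruby
require "async"
require "async/barrier"
require "async/semaphore"
require "async/http/internet"

namespace :swf_assets do
  desc "Backfill manifest_url for assets that don't have one yet"
  task backfill_manifest_urls: :environment do
    Sync do
      internet = Async::HTTP::Internet.new
      barrier = Async::Barrier.new
      # The concurrency knob: at most 100 requests in flight at once.
      semaphore = Async::Semaphore.new(100, parent: barrier)

      SwfAsset.where(manifest_url: nil).find_each do |asset|
        semaphore.async do
          # Stand-in guess: derive a candidate manifest URL from the asset's own URL.
          candidate = asset.url.sub(/\.swf\z/, ".json")
          response = internet.head(candidate)
          # Only save it if the server says it actually exists.
          asset.update!(manifest_url: candidate) if response.success?
        ensure
          response&.finish
        end
      end

      barrier.wait
    ensure
      internet&.close
    end
  end
end
```

Since it only looks at rows where `manifest_url` is still null, re-running
it just picks up wherever it left off, which is where the resumability
comes from.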
Ok so, impress-2020 guesses the manifest URL every time based on common
URL patterns. But the right way to do this is to read it from the
modeling data! But also, we don't have a great way to get the modeling
data directly. (Though as I write this, I guess we do have that
auto-modeling trick we use in the DTI 2020 codebase, I wonder if that
could work for this too?)
So anyway, in this change, we update the modeling code to save the
manifest URL, and also the migration includes a big block that attempts
to run impress-2020's manifest-guessing logic for every asset and save
the result!
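(The guess-and-check is this kind of thing in spirit, as a sketch with
made-up URL patterns, not impress-2020's actual list:)

```ruby
require "net/http"

# Made-up candidate patterns, for illustration only.
MANIFEST_URL_GUESSES = [
  ->(asset) { asset.url.sub(/\.swf\z/, ".json") },
  ->(asset) { "#{File.dirname(asset.url)}/manifest.json" },
]

def guess_manifest_url(asset)
  MANIFEST_URL_GUESSES.each do |guess|
    candidate = guess.call(asset)
    response = Net::HTTP.get_response(URI(candidate))
    return candidate if response.is_a?(Net::HTTPSuccess)
  end
  nil # some assets just don't have a manifest at all
end
```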
It's uhh. Not fast. It runs at about 1 asset per second (a lot of these
aren't cache hits), and sometimes stalls out. And we have >600k assets,
so the estimated wall time is uhh. Seven days?
I think there's something we could do here around like, concurrent
execution? Though tbqh, with the nature of the slowness seemingly being
about hitting the slow underlying images.neopets.com server, I don't
have a lot of faith that concurrency would actually be faster?
I also think it could be sensible to like… extract this from the
migration, and run it as a script to infer missing manifest URLs. That
would be easier to run in chunks and resume if something goes wrong.
Cuz like, I think my reasoning here was that backfilling this data was
part of the migration process… but the thing is, this migration can't
reliably get a manifest for everything (both cuz it depends on an
external service and cuz not everything has one), so it's a perfectly
valid migration to just leave the column as null for all the rows to
start, and fill this in later. I wish I'd written it like that!
But anyway, I'm just running this for now, and taking a break for the
night. Maybe later I'll come around and extract this into a separate
task to just try this on all assets missing manifests instead!
Ahh, I guess I missed these. I think it's because they're maybe not
actually used in the app? Cuz they're all default values that are
overridden at the actual call sites. But I ran into it when running
`Pet.load` in the console, so yeah, let's just fix 'em up!
This hasn't worked for a while, and I don't know an API off the top of
my head to drop in for it. Let's just delete it for now, and revisit it
later if we want to!
Dang, I'm really wishing I'd opened this sooner cuz I didn't realize it
would be THIS easy!!
The bug was that the `t` method started taking Ruby keyword params
instead of a hash object for `options`, so the syntax changed.
Womp womp!
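Concretely, it's the Ruby 3 keyword-arguments change. Something like
this (made-up key and options, in a view or helper):

```ruby
options = { default: "???", scope: "pets" }

t(".name", options)    # fine when `t` accepted an options hash; ArgumentError now
t(".name", **options)  # what the keyword-arguments signature wants
```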
Really don't know why this wasn't a problem with Apollo (or was it??),
but yeah, don't save when there's a save error!! Then we reset the
mutation state when the outfit state changes.
It's weird to be reading this code and be like, was this not always an
issue? Maybe something in Apollo prevented this? Did we use optimistic
UI or something? Idk?
There's still an issue with it infinitely retrying in an error state
though.
Just sharing this out to gather info, since this might be coming kinda
soon!
I also moved the announcement higher up in the template, because it
gets broken on the user lists page which uses floats quite a bit for
the site header—and tbh I feel like this is better anyway lol.
In the impress-2020 app, we use this to prepopulate certain GraphQL
data into the Apollo cache when SSR'ing a page. We don't do that here,
so, goodbye!
The wardrobe-2020 app had a cute drawer that embeds the item page, but
honestly I don't think it was that valuable, and especially not when it
means we have to basically maintain two item pages lol. Let's decrease
the surface area!
A really really simple change! It works on the item page, the item
index page, item search, the homepage, and the item lists page.
The main reason I avoided this for so long (even before modernizing the
Rails app) was that the ElasticSearch stuff felt like it made it messy?
But now it's pretty simple, and it works in search already cuz I did
that when I implemented item search, so, nice!
Ok cool, I have just not been running any of this since moving out of
impress-2020, but now that we're doing serious JS work in here it's time
to turn it back on!!
1. Install eslint and the plugins we use
2. Set up a `yarn lint` command
3. Set up a git hook via husky to lint on pre-commit
4. Fix/disable all the lint errors!
Rather than letting the fact that the server API models outfits a bit
differently (underscore keys, integer IDs for things) leak into the rest
of the app, I'd rather convert it to the familiar field names and
expected types!
Oh yeah, I didn't catch that `bin/dev` calls the `Procfile` which in
turn calls `yarn build --watch`, which leaves out the important
`--public-path=/dev-assets` argument!
Calling `yarn dev` instead keeps us consistent!
This came in a few parts!
1. Add meta tags to let us know we're logged in.
2. Install React Query, which has the data-loading sensibilities I like
about Apollo without the GraphQL that has honestly been a drag.
3. Replace the outfit-loading and outfit-saving calls with API calls to
the main app.
4. Update the main app's API calls to use our more flexible data
constructs like "pose".
Would've loved to do this more incrementally, but it's hard to! You
can't split out outfit-loading and outfit-saving, or auth from any of
that, or the state gets all out-of-sorts.
Still, this is a good nugget we've pulled out all-in-all, and one that
people have been asking for! Can maybe look to logged-in item search
soon too, for own/want data?
Poked at the app and saw no breakages, I'm reasonably satisfied! I think
the 17 -> 18 migration is designed to be pretty backwards-compatible,
especially since I didn't bother to do the `createRoot` stuff yet, which
is where more of the breaking things happen.
Moreover, this code is in a bit of a flimsy state anyway: it'll kinda
work if you're logged in at impress-2020.openneo.net, but that wasn't
intentionally engineered, and I'm not sure exactly what circumstances
cause those cookies to send vs not send?
My intent is to clean this up by replacing it all with login code that
references the main app instead… this'll require swapping out some
endpoints to refer to what the main app has, but I'm hoping that's not
so tricky to do, since the main app does offer basically all of this
functionality already anyway. (That's not to say it's simple, but I
think it's a migration in the right direction!)
I used the new profiler tools on this page, and noticed a lot of
allocations in the Globalize library, which we use for translating
database records. I realized that we were loading all of the fields of
not just all of the items on the page, but all of their translation
records in all locales! We used to scrape data for lots of languages, so
that can be quite a lot!
Unfortunately, Rails's `includes` method to efficiently preload related
records always loads all fields, and simply can't be overridden.
So, in this change we write manual preloading code, to identify the
records we need, load them in big bulk queries, and assign them back to
the appropriate associations. Basically just what `includes` does, but
written out a bit more, to give us the chance to specify SELECT and
WHERE clauses!
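Roughly this kind of shape, as a sketch; the model, association, and
column names here are stand-ins rather than the real ones:

```ruby
def preload_item_translations(items, locale: I18n.locale)
  # One bulk query, limited to the current locale and to the columns we
  # actually read on this page.
  translations = Item::Translation.
    where(item_id: items.map(&:id), locale: locale).
    select(:id, :item_id, :locale, :name, :description)

  translations_by_item_id = translations.group_by(&:item_id)

  items.each do |item|
    # Hand each item its translations, and mark the association as loaded
    # so Rails doesn't fire more per-record queries later.
    item.association(:translations).target =
      translations_by_item_id.fetch(item.id, [])
  end
end
```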
It always shows up in development, and in production only if you're
logged in as Me Specifically!
I'm using this to poke at memory usage for pages that seem suspicious.
I don't know why our app reliably grows so large in RAM, but my hunch is
that maybe there are some pages that just use a truly large amount to
begin with - and I've learned Ruby doesn't release memory back to the OS
after it's GC'd; it just grows the process and keeps the freed space for
itself in its own heap!
So I'm just eyeing pages that I know *can* have a lot going on, and
seeing what I find!
Well, sitting at the `MemoryHigh` limit still grinds the app to a halt
anyway, lmao. I guess it's a feature designed for well-behaved processes
and not for outright leaking ones?
Let's try just having systemd basically reset the app regularly when the
RAM hits a certain threshold. I think that's what this config will do,
we'll find out!
Lol ok, as I had kinda predicted, the memory bounds I set last time
were not tight enough, and it stalled out again! (It was at 75% and
fully just not working.)
Let's try this tighter bound instead!
So, Dependabot correctly reported that this version of puma is
vulnerable, which I fixed in the main app already—but I didn't notice we
also use that version in this cute tiny placeholder app we use early in
the deployment process.
There's not a real security need to upgrade this, as this placeholder
app has no access to useful data when it is run, but I think it's better
to resolve this by fixing it than by silencing Dependabot! May as well!
Oh yeah right, let's `yarn install` and build as part of the `post-create.sh` script!
I didn't have a command to do a one-time dev build of the assets yet (just one-time prod, or dev with reloading), so I added `yarn build:dev`!
Okay, so I've kept `database.yml` using the new username and password
`impress_dev`, cuz I like that it helps clarify that it's dev stuff and
is impossible to confuse with prod. (I updated my local setup to match!)
But hardcoding `host: db` into here breaks my local setup where the
database _isn't_ at the hostname `db`. So I add a way for new optional
database URL environment variables to get merged in with these settings,
and then configured the dev container to use that—and just in the most
limited override possible, to avoid duplicating stuff we don't need to.
(I could've just used the same names `DATABASE_URL_{PRIMARY,OPENNEO_ID}`
for this, but idk, I think it's confusing to have the same one for both
dev and prod, even though Rails _does_ basically do this; see below.)
Normally, the environment variable `DATABASE_URL` just _does_ this, and
you don't need to include a `url` key at all to get this behavior. But
since we've got the legacy two-database thing going on, we do this
instead! If we were to merge the `openneo_id.users` table into the
primary database, we could simplify this!
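So the shape of it ends up something like this. (A sketch: the env var
name is made up, and the defaults anchor is whatever the real file uses.)

```yaml
development:
  primary:
    <<: *development_defaults
    username: impress_dev
    password: impress_dev
    <% if ENV["IMPRESS_DEV_DATABASE_URL_PRIMARY"] %>
    url: <%= ENV["IMPRESS_DEV_DATABASE_URL_PRIMARY"] %>
    <% end %>
```

Then the dev container sets that env var to point at the `db` host, and
setups that don't set it just use the file's settings as-is.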
I just restarted the impress app in production! First I logged in to see
why it wasn't responding, and I saw that there was almost no free RAM
left, and that the Rails app had grown to eat it all up!
So in this change, we set a memory limit: if the impress app is taking
up more than 75% of the machine's RAM, systemd will try to shrink it
down; if it can't, then it will kill the app at 80%.
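(In systemd terms, I believe that's essentially this pair of directives
in the service unit:)

```ini
[Service]
# Above this, systemd starts aggressively reclaiming memory from the service.
MemoryHigh=75%
# Above this, the service's processes get killed outright.
MemoryMax=80%
```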
I'm not totally sure whether these bounds are tight enough? I didn't
look closely enough at the numbers to see what the app's actual usage
was according to systemctl at the time (`sudo systemctl status
impress`), so my hope is this is enough. But if we run into a memory
leak crash like that again, because it turns out even existing at 75%
RAM freezes the machine when running alongside its other processes, we
can decrease these numbers!
I also don't know the nature of the memory leak, and that could be worth
investigating—the app pretty cleanly fits into ~500–600MB when it starts
up, but then does seem to slowly but steadily grow. If it could be kept
at that size, it's possible we could downgrade the server and save some
costs—but that's a question for another day, since making sure we handle
memory leaks when they *do* happen is a more important robustness fix!
I missed two things:
1) `rake` wasn't available in the path (surprising but ok!), so I replaced it with the more modern and more portable `bin/rails` invocation.
2) Without specifying a `host`, Rails was trying to connect with the database over a socket instead of over a port. Here, we tell it where to actually connect!
I think I'll need to make some tweaks here, since this isn't compatible with my _own_ local setup—maybe I'll revert `database.yml`, and have the dev container use the `DATABASE_URL` environment variable to override it?
But whatever, this is working for now, and that's exciting to me!
Note: On my machine, the install step hangs for a loooong time, to the point where I usually give up. On Codespaces, it also took a while at the same step (`Installing devise-encryptable`), but eventually got through it. Maybe on my machine it would work if I'm more patient? Idk! But it's good to see it working on something!!
I cleared the references to the Neopia server (old thing responsible
for pet loading without blocking the Rails app) out of here a while ago,
but didn't clear out this config value!
This also clears the `tmp` directory out of the initial repository's
state! Rails should auto-generate it when it wants to make temporary
stuff.
`touch tmp/restart.txt` is the way to restart a Rails app deployed with
Phusion Passenger. I'm, uhh, not sure how a copy of it ever made its way
into the codebase, maybe I was confused about how to restart my local
app?
Look, I'll be real, I have literally not run these automated tests in
probably like a whole decade. Most of these files are empty, the ones
that aren't seem basically trivial, and I bet half of it would fail
anyway.
If I wanted to do real automated testing, I would basically want to
start from scratch anyway, and apply coverage I can trust to the areas
I actually care about.
Until then, I feel like these gems and files are mostly just clutter,
and I don't like them being One More Barrier To Entry. Goodbye, unused
complexity!
Oh right, we intentionally fail if there's no SECRET_TOKEN provided, but
that's not really useful for development!
Here, we add a SECRET_TOKEN only used in development - which doesn't
need to be secret, because it doesn't guard actual user sessions!
In production, the behavior is unchanged.
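Something in the spirit of this, as a sketch; the exact initializer and
config setting in the real app may well differ:

```ruby
# config/initializers/secret_token.rb (sketch)
Rails.application.config.secret_key_base =
  if Rails.env.development?
    # Doesn't need to be secret: it only signs local development sessions.
    "dev-only-not-actually-secret-" + ("0" * 64)
  else
    # Production still fails fast if the real secret isn't provided.
    ENV.fetch("SECRET_TOKEN")
  end
```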
Rails already creates little pre-gzipped `.gz` copies of all our assets
in the `public/assets` directory when we build. This configures nginx to
send those when available!
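(The relevant bit is nginx's `gzip_static` module; something like this,
with the location path assumed:)

```nginx
location /assets/ {
  # If foo.js.gz exists next to foo.js and the client accepts gzip,
  # send the precompressed file instead of compressing on the fly.
  gzip_static on;
}
```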
We weren't doing *any* gzip stuff before, so this helps a lot with those
bigger JS files, like the `wardrobe-2020` stuff. It's now at ~.5MB with
compression, which is still a bit big, but nowhere near as offensive as
the 4.5MB pre-anything, or 1.5MB post-minification, lol.
We're now all-in on impress.openneo.net for this box!
One little wrinkle is that certbot was initially upset that I had
already uploaded the copy-pasted certs from the other box to here, at
the file path it expected to get to manage. So, I moved those to
`/srv/impress/shared/temp-certs`, and changed the nginx config
accordingly; and then deleted the original and let certbot control it!
The usual stuff! Installed the new gem and its new deps, ran
`bin/rails app:update` and did my best to manually merge the dev/prod
config files with the new canonical defaults, deleted some migrations I
don't think are relevant to us, and yeah!
Also, Rails 7.1 seems to need `libyaml-dev` installed, so I added that
to the `deploy/setup.yml` playbook!
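(In the playbook that's just another package in the apt task, roughly
like this; the task name and surrounding structure are assumed:)

```yaml
- name: Install system packages the app needs to build its gems
  apt:
    name:
      - libyaml-dev
    state: present
```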
One thing to note is that, while I was here, I turned on some settings
relating to our use of SSL that technically weren't on before. This
should be fine and helpful? But if stuff breaks, well, check those!