Been wanting this for a while in theory, gonna actually do it now!
The motivation is that I want to turn up the timeout for loading pets,
because the Neopets endpoints are slower today with the NC UC release -
but I can already predict that under our current architecture that will
be a problem, because it'll block up our request queue!
Falcon uses Ruby's relatively-new async system to *not* have requests
block on upstream requests, and my understanding is that this behavior
is plug-and-play. Let's see how it goes!
Oh yeah, a long-standing limitation. Good thing we're better at stuff
now!
This is also probably the real cause of the weird number of slight
discrepancies between main DTI and DTI 2020 when I eyeballed stuff lol
oh, well, that and the missing default-lists. A bit messy!
In impress-2020, we do a big slow query to figure out which users have
been active in trades recently. Now, we cache that timestamp on the
User model.
This won't have any immediate effect; it's to clear the way for Classic
DTI to receive the better trade ratios feature people like from 2020.
I also added some unit testing infra because I finally wanted it! for
all the ways you can trigger this timestamp lol
Note too that this is a bit of an unusually complex migration, but my
hope is that the batching and query structure and such helps it run
surprisingly fast! 🤞
So this was a slightly wrong error message, what was happening was:
1. Trying to load the image hash for this pet, by looking them up at
https://pets.neopets.com/cpn/PET_NAME/1/1.png and seeing what URL it
redirects to.
2. But pets.neopets.com was rejecting our User-Agent string, which
would've been just "Ruby", since we hadn't set it otherwise. I guess
that's an explicitly banned string?
I also found that the kind of more-helpful User-Agent string I like to
write was being rejected, and I could only get it to accept something
very simple? So that's what we're using now, I guess!!
Building toward replacing more of the 2020 data sources! I think this is
an endpoint that benefits from bulk loading, esp with the way the item
page previews work. I also like taking the concept of "canonical" out of
the GQL interface, and instead just loading for each of the 50 species
and letting the client decide. (And then it can fast-swap between them!)
There was a static page explaining it, which we no longer link to; and
there was an unused field in the User model for who was a beta tester
for it. Goodbye!
I noticed when running `rails routes` that there's a lot of routes for
major unused Rails features, like storage. I didn't look deeply enough
into ActiveStorage to know if I was risking accepting arbitrary file
uploads, I just figured, if I disable it (which simplifies the app
footprint anyway), then I can be certain! So, goodbye!
Preparing a better endpoint for wardrobe-2020 to use! I deleted the
now-unused swf_assets#index endpoint, and replaced it with an
"appearances" concept that isn't exactly reflected in the database
models but is a _lot_ easier for clients to work with imo.
Note that this was a big part of the motivation for the recent
`manifest_url` work—in this draft, I'm probably gonna have the client
request the manifest, rather than use impress-2020's trick of caching
it in the database! There's a bit of a perf penalty, but I think that's
a simpler starting point, and I have a hunch I'll be able to make up
the perf difference once we have the impress-media-server managing more
of these responsibilities.
This hasn't worked for a while, and I don't know an API off the top of
my head to drop in for it. Let's just delete it for now, and revisit it
later if we want to!
It shows up in development always, and if you're logged in as Me
Specifically in production!
I'm using this to poke at memory usage for pages that seem suspicious.
I don't know why our app reliably grows so large in RAM, but my hunch is
that maybe there are some pages that just use a truly large amount to
begin with - and I've learned Ruby doesn't release memory back after
it's GC'd, it just grows the process and keeps the free space to itself
in its own heap!
So I'm just eyeing pages that I know *can* have a lot going on, and
seeing what I find!
Okay, so I've kept `database.yml` using the new username and password
`impress_dev`, cuz I like that it helps clarify that it's dev stuff and
is impossible to confuse with prod. (I updated my local setup to match!)
But hardcoding `host: db` into here breaks my local setup where the
database _isn't_ at the hostname `db`. So I add a way for new optional
database URL environment variables to get merged in with these settings,
and then configured the dev container to use that—and just in the most
limited override possible, to avoid duplicating stuff we don't need to.
(I could've just used the same names `DATABASE_URL_{PRIMARY,OPENNEO_ID}`
for this, but idk, I think it's confusing to have the same one for both
dev and prod, even though Rails _does_ basically do this; see below.)
Normally, the environment variable `DATABASE_URL` just _does_ this, and
you don't need to include a `url` key at all to get this behavior. But
since we've got the legacy two-database thing going on, we do this
instead! If we were to merge the `openneo_id.users` table into the
primary database, we could simplify this!
I missed two things:
1) `rake` wasn't available in the path (surprising but ok!), so I replaced it with the more modern and more portable `bin/rails` invocation.
2) Without specifying a `host`, Rails was trying to connect with the database over a socket instead of over a port. Here, we tell it where to actually connect!
I think I'll need to make some tweaks here, since this isn't compatible with my _own_ local setup—maybe I'll revert `database.yml`, and have the dev container use the `DATABASE_URL` environment variable to override it?
But whatever, this is working for now, and that's exciting to me!
Note: On my machine, the install step hangs for a loooong time, to the point where I usually give up. On Codespaces, it also took a while at the same step (`Installing devise-encrpytable`), but eventually got through it. Maybe on my machine it would work if I'm more patient? Idk! But it's good to see it working on something!!
I cleared the references to the Neopia server (old thing responsible
for pet loading without blocking the Rails app) out of here a while ago,
but didn't clear out this config value!
Oh right, we intentionally fail if there's no SECRET_TOKEN provided, but
that's not really useful for development!
Here, we add a SECRET_TOKEN only used in development - which doesn't
need to be secret, because it doesn't guard actual user sessions!
In production, the behavior is unchanged.
The usual stuff! Installed the new gem and its new deps, ran
`bin/rails app:update` and did my best to manually merge the dev/prod
config files with the new canonical defaults, deleted some migrations I
don't think are relevant to us, and yeah!
Also, Rails 7.1 seems to need `libyaml-dev` installed, so I added that
to the `deploy/setup.yml` playbook!
One thing to note is that, while I was here, I turned on some settings
relating to our use of SSL that technically weren't on before. This
should be fine and helpful? But if stuff breaks, well, check those!
Idk why, but unlike my previous experience with Rails devcontainers, this time the setup process is running so wildly slowly?
Might just be a transient issue on my machine, maybe something that would be improved with a restart and trying again another time? Or could be something about the MySQL image that doesn't run great in this context?
In any case, I'm just gonna set this down for now!
Now, like in DTI 2020, opening an outfit will go straight to the editor.
I'm not 100% on whether this is actually like. the superior behavior?
But I think it's good enough, and it's what the wardrobe-2020 code
expects, so let's just roll with it for now!
This was used by the Neopia server to send us the modeling data it requested out-of-band. But now we do all our modeling requests back in-app again, so we don't need this!
I hope this doesn't cause problems! But yeah, with Puma doing threading, and maybe switching to Falcon someday to get even better concurrency properties, I feel like this will probably be fine?
And it makes the UX a loootttt better, to be back in the world where all these forms just work, whew.
Looking at the docs, I think what changed is that `throttled_responder` gets the request as an argument instead of the `env`? And has the same return type for the lambda as before?
So uhhh I don't remember how to test this, but uhh it's not crashing when the server starts anymore, and I feel like the most likely problem here would be that you get a 500 instead of a useful response in the rate limit case, so like. ehh I'll just leave it be!
I did some refactoring while here too, of pulling the deploy scripts out of `package.json` and into `bin`, to be a bit more canonically Rails-y. (idk how canonical the colon thing is but, probably fine??)
I don't know enough about our caching situation to know where memcache performs meaningfully better than Rails's in-memory cache. Let's delete it for now and see if there's a problem, to simplify the deploy environment!
Yay it's working! We set up the box, install Ruby, upload a placeholder app, set it up as a service, and get it hooked up to nginx!
Next, we'll add the script to upload the latest version of the site. We just need to slot it into `/srv/impress/current`, run `bundle install`, and that should basically be that! (Oh, and we need to compile production assets—I wonder if it's useful to do that on the dev machine instead of on the target? That might save us from needing to install Node. Or maybe we'll have to anyway!)
Hey nice!!
Note that I removed an account delete button from the settings page. You can still send a DELETE request to the right endpoint to do it, but it's not gonna delete all the associated records, and I wanna think a bit about how to handle that better before exposing that button.
A lot of rough edges here (e.g. no styles on the flash messages), but it's working and that's good!!
I tested this by temporarily switching to the production database and logging in as matchu!
Still missing a lot of big features too, like registration, password resets, settings page, etc.
This removes login/logout/session logic for integrating with OpenNeo ID, replacing them with stubs that just redirect to `/?TODO` when you click login, and helpers that act as if you're not logged in.
This gives us a clean slate to plug in new Devise logic to integrate with the `openneo_id` database directly!
This will enable us to access the auth records, which we store in a separate database for weird legacy reasons!
We don't do anything else yet, just set up the connection to be available.
(NOTE: This commit was a bit of a history rewrite: we started working on this with `database.yml` still gitignored, but then in 8fb6e82 we added it back in to be able to fix a bug in 44c42f9. So previously this branch added back `database.yml` to git *and* added `openneo_id` to it, but since then I've rebased against the other changes, and rewrote history to make this a change to *just* add the database! I also moved it in the timeline, to be before some of the other things that depend on it.)
Without this, searches for negative of `fits` or `species` would crash, bc somewhere Rails set the default SQL mode to be stricter than before. This just sets it back!
We gitignored it a long time ago as the way to hide our db secrets, but that's not how we manage them anymore! (Or, well, we haven't done production deployment with this new setup yet, but you get the point.)
This helps clarify what the database config oughta look like!
Whew! Seems like a pretty clean one? Ran `rails app:upgrade` and stuff, and made some corrections to keyword arguments for `translate` calls. There might be more such problems elsewhere? But that's hard to search for, and we'll have to see.
This one was pretty straightforward yaay! Main thing was the change from `render file` to `render template` in a couple places, oh and a thing with complex `order()` clauses.
The session format changed, so we change the session cookie name rather than have things crash about it! (I hope the actual prod behavior is to ignore bad cookies rather than crash? But I figure this is more reliable anyway.)
Some important little upgrades but mostly straightforward!
Note that there's still a known issue where item searches crash, I was hoping that this was a bug in Rails 4.2 that would be fixed on upgading to 5, but nope, oh well!
Also uhh I just got a bit silly and didn't actually mean to go all the way to 5.2 in one go, I had meant to start at 5.0… but tbh the 5.1 and 5.2 changes seem small, and this seems to be working, so. Yeah ok let's roll!
Idk I guess these are the default place to put certain settings, but idk if they're still canonical, and I'd rather just not have files that don't mean anything rn!
Idk exactly what's going on with dotenv-deployment, if it turns out it was critical to our deploy process then we'll change the deploy process! It's deprecated and conflicts with gem deps for `dotenv-rails`.
At one point we piloted a "Camo" service to proxy HTTPS image urls for us, but it doesn't exist anymore.
We already have proxies and stuff for this, so I left `Image` as a placeholder for this, but it's not working yet!
This also deletes our final reference to the Addressable gem, so we can remove it!
I don't think these work anymore, and our volunteers get new items into the db fast anyway, Impress 2020 is doing better spidering these days. And then we get to remove the cron job `whenever` gem!
Yay, we've deleted all our background tasks!
We'll probably want to replace some of the basic functionality like certain caching? But we can deal with that as we run into it.
The direct motivation here was a seeming version conflict between Rails 4.2's rack dependency and latest Resque's rack dependency... but this is just nice complexity elimination regardless, we want this anyway :3
We've already swapped out the backend for this stuff to Impress 2020, so the resque task and the broken image report UI aren't actually relevant anymore. Delete them!
This helps us delete Resque soon too.
Idk this one might actually be a bit of a pain to load? But I'd want to optimize it differently anyway, and there's overhauls we're already planning to do here.
During this upgrade process, `rails server` hasn't been updating its logic when files changed, so every change had to be accompanied by a restart.
This turned out to be because Vagrant's networked filesystem to share between the host and guest systems doesn't support the filesystem update events Rails is listening for. So, we switch to a simpler file watcher that does more work but doesn't depend on the filesystem events!
It's unused, and I'm just double-checking that it's not somehow causing the issues with the rails dev server not reloading classes. (The `threadsafe!` option would do that, but I don't thiiiink this is the env we're running? But I'm wondering if the loader is getting confused by the prefixiness of the name or something. Unlikely!)
This is recommended by the Rails 4.0 upgrade guide:
> The caching method changed between Rails 3.x and 4.0. You should change the cache namespace and roll out with a cold cache.
I noticed too that old cache entries with old character encodings were a real problem, so yeah making sure we're working with a cold cache is smart!!
We'll need to replace the item search query stuff with direct MySQL queries, but that's not ready yet bc the app still isn't booting, so we're committing this in a known broken state for now!
I haven't logged into newrelic in a billion years, let's just stop sending them stuff
(This is a precursor to an attempt to delete flex stuff too and replace our elasticsearch stuff with direct mysql queries like Impress 2020 does, but that'll be more work!)
I guess the APIs changed here, but these were placeholder settings we weren't actually using anyway (cuz we use the OpenNeo ID integration), so I just commented them out and it seems fine for now!
NOTE: This doesn't boot yet! There's something changed in the `devise` API that we'll need to fix!
```
/vagrant/config/initializers/devise.rb:46:in `block in <top (required)>': undefined method `encryptor=' for Devise:Module (NoMethodError)
```
But yeah, we navigated the gem upgrades, and also I ran `rake rails:update` and hand-processed the suggestions it had for our config files.
Rather than figure out how to upgrade the Stripe gem to be compatible with future Rails, I'd rather just delete the references, since it's currently unused.
I'm not so bold as to go in and fully trash all our donation code; I just want to ensure we're not sending people down broken codepaths, and that if they reach them, the error messages are clear enough.
We've been serving images directly from `impress-asset-images.s3.amazonaws.com` for a long time. While they serve with long-lasting HTTP cache headers, and the app requests them with the `updated_at` timestamp in the query string; each GET request still executes a full S3 ReadObject operation to get the latest version.
In the past, this was only relevant to users on Image Mode, not Flash Mode. But now that everyone's on Image Mode, this matters a lot more!
Now, we've configured a Fastly host at `impress-asset-images.openneo.net`, to sit in front of our S3 bucket. This should dramatically reduce the GET requests to S3 itself, as our cache warms up and gains copies of the most common asset PNGs.
That said, I'm not sure how much actual cost impact this change will have. Our AWS console isn't configured to differentiate cost by bucket yet—I've started this process, but it might take a few days to propagate. All I know is that our current costs are $35/mo data transfer + $20/mo storage, and that outfit images are responsible for most of the storage cost. I hypothesize that `impress-asset-images` is responsible for most of the reads and data transfers, but I'm not sure!
In the future, I think we'll be able to bring our AWS costs to near-zero, by:
- Obsolete `impress-asset-images`, by using the official Neopets PNGs instead, after the HTML5 conversion completes.
- Obsolete `impress-outfit-images`, by using a Node endpoint to generate the images, fronted by a CDN cache. (Transfer the actual data to a long-term storage backup, and replace the S3 objects with redirects, so that old S3 URLs will still work.)
I hope this will be a big slice of the costs though! 🤞
(Note: I'll be deploying this on a bit of a delay, because I want to see the DNS propagate across the globe before flipping to a new domain!)