In impress-2020, we do a big slow query to figure out which users have
been active in trades recently. Now, we cache that timestamp on the
User model.
This won't have any immediate effect; it's to clear the way for Classic
DTI to receive the better trade ratios feature people like from 2020.
I also added some unit testing infra because I finally wanted it! for
all the ways you can trigger this timestamp lol
Note too that this is a bit of an unusually complex migration, but my
hope is that the batching and query structure and such helps it run
surprisingly fast! 🤞
Poking at the site, I didn't love that clicking any of the NC items on
the homepage took a few seconds to load cuz of the OWLS request taking
a couple seconds.
If the request were faster to load I might decide otherwise, but for
now, let's just keep that cache long.
Another approach we could take would be to ask for an endpoint that
returns all the values at once, so we could cache and update them all
together, instead of having it all split into separate cache keys!
So this was a slightly wrong error message, what was happening was:
1. Trying to load the image hash for this pet, by looking them up at
https://pets.neopets.com/cpn/PET_NAME/1/1.png and seeing what URL it
redirects to.
2. But pets.neopets.com was rejecting our User-Agent string, which
would've been just "Ruby", since we hadn't set it otherwise. I guess
that's an explicitly banned string?
I also found that the kind of more-helpful User-Agent string I like to
write was being rejected, and I could only get it to accept something
very simple? So that's what we're using now, I guess!!
Unlike previous attempts to do future-Yarn, this seemed to just mostly
go swimmingly, idk why! (I moved to a new computer and installed new
Yarn without really thinking, and hey nice!)
There was an issue where it was trying to do some glob expansion in
one of the arguments for the `build` script, fixed! (Idk why Yarn has
its own glob expansion and different from previous versions, but ok!)
I did a pretty thorough reset of my desktop machine, and rather than go
spelunking for the same private key, I just rolled it over to a new
one. Let's set it up!
Building toward replacing more of the 2020 data sources! I think this is
an endpoint that benefits from bulk loading, esp with the way the item
page previews work. I also like taking the concept of "canonical" out of
the GQL interface, and instead just loading for each of the 50 species
and letting the client decide. (And then it can fast-swap between them!)
The models folder is a bit confusingly large, these are more mixins and
kinda clutter it. Push them off into `lib`, I think!
I think they used to be in models mainly because Rails used to handle
`lib` differently with autoloading, and it made for a worse dev
experience. Now it's all the same, though!
Self-hosted Plausible instance! I have need of usage numbers again,
after a good few years of just not using it; but I don't want to send
the data to Google, and I enjoy self-hosting things, so here we have it!
We haven't used the mall spider in this app in forever (I guess we even
deleted the code at some point?), but there was some vestigial stuff
left. Goodbye!
There was a static page explaining it, which we no longer link to; and
there was an unused field in the User model for who was a beta tester
for it. Goodbye!
I noticed when running `rails routes` that there's a lot of routes for
major unused Rails features, like storage. I didn't look deeply enough
into ActiveStorage to know if I was risking accepting arbitrary file
uploads, I just figured, if I disable it (which simplifies the app
footprint anyway), then I can be certain! So, goodbye!
At this point, I've gone through all the assets, and the only ones
without manifests are:
1. The ones that truly have no manifest yet (that we know of)
2. The ones where execution happened to time out
I think the 5-second timeout is a very reasonable default for starting
the backfill, in a way that prioritizes moving forward; but now that we
have most things, I'd rather be able to re-run it with a more generous
timeout. So here we are!
Just moving more stuff over! I modernized Item's `as_json` method while
I was here. (Note that I removed the NC/own/want fields, because I
think the only other place this method is still called from is the
quick-add feature on the closet lists page, and I think it doesn't use
these fields to do anything: updating the page is basically a full-page
reload, done sneakily.)
There was a time when I used an old proxy server to try to fix mixed
content issues, and I eventually removed it but never took the tendrils
out from the code.
We probably _should_ figure out how to secure these URLs! But until
then, we may as well simplify the code.
Ok progress! We've moved the info about what bodies this fits, and
what zones it occupies, into a Rails API endpoint that we now load at
the same time as the other data!
We'll keep moving more over, too!
I changed my mind again! At first I wanted to make the special case
clearer, and to be able to more strongly assert that the species is
not null. But now I'm like… eh, there's code that references `body.id`
that has no reason _not_ to work in the all-bodies case… let's just
keep the types more consistent, I think.
This is more similar to what impress-2020 does, I was working on the
wardrobe-2020 code and took some inspiration!
The body has an ID and a species, or is the string "all".
Preparing a better endpoint for wardrobe-2020 to use! I deleted the
now-unused swf_assets#index endpoint, and replaced it with an
"appearances" concept that isn't exactly reflected in the database
models but is a _lot_ easier for clients to work with imo.
Note that this was a big part of the motivation for the recent
`manifest_url` work—in this draft, I'm probably gonna have the client
request the manifest, rather than use impress-2020's trick of caching
it in the database! There's a bit of a perf penalty, but I think that's
a simpler starting point, and I have a hunch I'll be able to make up
the perf difference once we have the impress-media-server managing more
of these responsibilities.
I tried to port the Rainbow Pool ones forward, but ran into issues with the
service that uses browser-specific stuff to check that traffic is valid :/
Incidentally, those were the only places we were using `rest-client`.
Goodbye!
I noticed a thing with like, an asset that I think referenced an item that
doesn't exist, which caused an error in the `body_specific?` validation
step?
Tbh that validation step needs fixed up in a number of ways, but I'm
scared to, since it's hard to know what will break modeling lol.
But in any case, more graceful handling is nice! If something happens,
I'd rather leave it null and try again later than have the job crash!
It's not just that none of them were 200 OK, it's that they were all 404.
In the event that something returns not-200 and not-404, we immediately
abort, so we shouldn't get to this case unless they were all 404!
Okay, I've simplified the migration to *just* add the column, and
instead added a task to find assets without manifest URLs and backfill
them.
Performance is a lot better now, using the `async-http` library, which
as I understand it supports both persistent connections when invoked
like this, and maybe also HTTP/2 multiplexing?? (Though I'm not
actually sure images.neopets.com does lol)
I'm not sure about the number of concurrent tasks I picked here, 100
seems okay for an internet thing and for such small requests, but I
worry that the CDN is gonna get annoyed or something. Well, we'll see!
This task is very resumable if it turns out we get frozen out or
something.
Ok so, impress-2020 guesses the manifest URL every time based on common
URL patterns. But the right way to do this is to read it from the
modeling data! But also, we don't have a great way to get the modeling
data directly. (Though as I write this, I guess we do have that
auto-modeling trick we use in the DTI 2020 codebase, I wonder if that
could work for this too?)
So anyway, in this change, we update the modeling code to save the
manifest URL, and also the migration includes a big block that attempts
to run impress-2020's manifest-guessing logic for every asset and save
the result!
It's uhh. Not fast. It runs at about 1 asset per second (a lot of these
aren't cache hits), and sometimes stalls out. And we have >600k assets,
so the estimated wall time is uhh. Seven days?
I think there's something we could do here around like, concurrent
execution? Though tbqh with the nature of the slowness being seemingly
about hitting the slow underlying images.neopets.com server, I don't
actually have a lot of faith that concurrency would actually be faster?
I also think it could be sensible to like… extract this from the
migration, and run it as a script to infer missing manifest URLs. That
would be easier to run in chunks and resume if something goes wrong.
Cuz like, I think my reasoning here was that backfilling this data was
part of the migration process… but the thing is, this migration can't
reliably get a manifest for everything (both cuz it depends on an
external service and cuz not everything has one), so it's a perfectly
valid migration to just leave the column as null for all the rows to
start, and fill this in later. I wish I'd written it like that!
But anyway, I'm just running this for now, and taking a break for the
night. Maybe later I'll come around and extract this into a separate
task to just try this on all assets missing manifests instead!
Ahh, I guess I missed these, I think they're maybe not actually used in
the app is why? cuz they're all default values that are overridden at
the actual call sites. But I ran into it when running `Pet.load` in the
console, and yeah let's just fix 'em up!
This hasn't worked for a while, and I don't know an API off the top of
my head to drop in for it. Let's just delete it for now, and revisit it
later if we want to!