Now we're *really* duplicating with Impress 2020's system lol, but I
need a way to not keep trying to load manifests that are actually 404,
which are surprisingly plentiful!
This doesn't actually stop us from loading anything yet, it just tracks
the timestamps and the HTTP status! But next I'll add logic to skip
when it was 4xx recently.
Doing that sweet, sweet backfill!! It's not exactly *fast*, since
there's about 570k records to work through, but it's pretty good all
things considered! Thanks, surprisingly-reusable async code!
I'm not sure where these duplicate records have been coming from over
the years (I checked the timestamps and it's been happening
occasionally since 2013 up to late last year, there were ~1,600
instances), but for now let's just get rid of them!
This is related to the issues we've been addressing lately where some
biology assets have manifests but no PNG specified in them: the older
copies of the assets would have our generated PNG as a fallback, but
the newer copies would get served as part of the pet appearance *in
addition to* the older copies, and the newer copies would be marked as
having no DTI-generated image, which our system wasn't always able to
handle.
We've primarily been addressing this by leaning into more graceful
failure modes of skipping certain layers, but… these layers *shouldn't
be here*, and are cluttering up support tools and such; let's be rid of
them!
I ran this today seemingly without issue, but I kept a backup of the
`yarn db:export:public-data` task in `impress-2020` to be able to check
and rollback if we discover a mistake.
One last note: the `ORDER BY` clause in the `GROUP_CONCAT` call was a
late addition, *after* I ran this in production. Scanning the console
output, it seems like ordering by ID was MySQL's default behavior here
anyway (makes sense!), so I'm not gonna bother to rollback and re-run,
but I think specifying this is helpful to ensure we're not depending on
unspecified behavior and to be really clear about our intentions of
which record to keep (the one with the smallest DTI ID number).
Something in the Rails loader doesn't like that I have both a gem and
a lib folder named `RocketAMF`, I think? It'll often work for the first
pet load request, then on subsequent ones say `Envelope` is not
defined, I'm guessing because it scrapped the gem's module in favor of
mine?
Idk, let's just simplify all this by making our own module. I feel like
this old lib could use an overhaul and simplification anyway, but this
will do for now!
The models folder is a bit confusingly large, these are more mixins and
kinda clutter it. Push them off into `lib`, I think!
I think they used to be in models mainly because Rails used to handle
`lib` differently with autoloading, and it made for a worse dev
experience. Now it's all the same, though!
At this point, I've gone through all the assets, and the only ones
without manifests are:
1. The ones that truly have no manifest yet (that we know of)
2. The ones where execution happened to time out
I think the 5-second timeout is a very reasonable default for starting
the backfill, in a way that prioritizes moving forward; but now that we
have most things, I'd rather be able to re-run it with a more generous
timeout. So here we are!
I tried to port the Rainbow Pool ones forward, but ran into issues with the
service that uses browser-specific stuff to check that traffic is valid :/
Incidentally, those were the only places we were using `rest-client`.
Goodbye!
I noticed a thing with like, an asset that I think referenced an item that
doesn't exist, which caused an error in the `body_specific?` validation
step?
Tbh that validation step needs fixed up in a number of ways, but I'm
scared to, since it's hard to know what will break modeling lol.
But in any case, more graceful handling is nice! If something happens,
I'd rather leave it null and try again later than have the job crash!
It's not just that none of them were 200 OK, it's that they were all 404.
In the event that something returns not-200 and not-404, we immediately
abort, so we shouldn't get to this case unless they were all 404!
Okay, I've simplified the migration to *just* add the column, and
instead added a task to find assets without manifest URLs and backfill
them.
Performance is a lot better now, using the `async-http` library, which
as I understand it supports both persistent connections when invoked
like this, and maybe also HTTP/2 multiplexing?? (Though I'm not
actually sure images.neopets.com does lol)
I'm not sure about the number of concurrent tasks I picked here, 100
seems okay for an internet thing and for such small requests, but I
worry that the CDN is gonna get annoyed or something. Well, we'll see!
This task is very resumable if it turns out we get frozen out or
something.
Okay, right, if we're just using www.neopets.com (like we are for now), it fails on http://www.neopets.com because it triggers a redirect that we don't follow.
So here I 1) change the default to HTTPS, and 2) add HTTPS support to our little RocketAMF lib
This removes login/logout/session logic for integrating with OpenNeo ID, replacing them with stubs that just redirect to `/?TODO` when you click login, and helpers that act as if you're not logged in.
This gives us a clean slate to plug in new Devise logic to integrate with the `openneo_id` database directly!
Some tricks required here to get the dependencies to work out, but we got it!!
Oh also, we move away from the rbenv in Ubuntu's package manager, because it doesn't support more recent Rubies like 2.4.10.
I don't think these work anymore, and our volunteers get new items into the db fast anyway, Impress 2020 is doing better spidering these days. And then we get to remove the cron job `whenever` gem!
Yay, we've deleted all our background tasks!
We'll probably want to replace some of the basic functionality like certain caching? But we can deal with that as we run into it.
The direct motivation here was a seeming version conflict between Rails 4.2's rack dependency and latest Resque's rack dependency... but this is just nice complexity elimination regardless, we want this anyway :3
NOTE: This doesn't boot yet! There's something changed in the `devise` API that we'll need to fix!
```
/vagrant/config/initializers/devise.rb:46:in `block in <top (required)>': undefined method `encryptor=' for Devise:Module (NoMethodError)
```
But yeah, we navigated the gem upgrades, and also I ran `rake rails:update` and hand-processed the suggestions it had for our config files.
Looks like I forgot to update the RemoteGateway code to consider
that RocketAMF now returns strings. Like in the Pet code, I opted
to dump it into a HashWithIndifferentAccess rather than assume
that we'll forever use strings and it'll never get switched back
to symbols.