Because we ended up with such a big error, and it doesn't have an easy
fix, I'm wrapping up today by reverting the entire set of refactors
we've done lately, so modeling in production can continue while we
improve this code further over time.
I generated this commit by hand-picking the refactor-y commits
recently, running `git revert --no-commit <hash>` in reverse order,
then manually updating `pet_spec.rb` to reflect the state of the code:
passing the most important behavioral tests, but no longer passing one
of the kinds of annoyances I *did* fix in the new code.
```shell
git revert --no-commit 48c1a58df9
git revert --no-commit 42e7eabdd8
git revert --no-commit d82c7f817a
git revert --no-commit 5264947608
git revert --no-commit 90407403ba
git revert --no-commit 242b85470d
git revert --no-commit 9eaee4a2d4
git revert --no-commit 52ca41dbff
git revert --no-commit c03e7446e3
git revert --no-commit f81415d327
git revert --no-commit 13ceec8fcc
```
This bug never made it into production I think, it was a consequence of
some of how I refactored stuff in the recent changes? I think??
But yeah, I refactor how we manage `SwfAsset#body_id`, to be a bit more
explicit about when and how it can change, instead of the weird
callbacks that tbqh have bit us too often…
Not using this on the item page preview yet, but we will!
I like this approach over e.g. a web component specifically for the
sandboxing: while I don't exactly *distrust* JS that we're loading from
Neopets.com, I don't like the idea of *any* part of the site that
executes arbitrary JS unsafely at runtime, even if we theoretically
trust where it theoretically came from. I don't want any failure
upstream to have effects on us!
I copied basically all of the JS from a related project
`impress-media-server` that I had spun up at one point, to investigate
similar embed techniques. Easy peasy drop-in-squeezy!
I think the Rails query cache handled these anyway? But `SwfAsset` has
a `before_save` hook that checks its zone's info, and
`SwfAsset.preload_manifests` saves all the assets, on the assumption
that saving is a no-op when the record didn't change anyway. And it
basically is!
But I figure that, now that I'm realizing hooks exist, simply not
attempting to save unchanged records is probably a better
representation of what we intend to do. So I'm fixing it like that!
Another potential fix would be to preload the zones for these assets,
but I think that confuses the intent too much; the method itself isn't
using the zones, it's just a weird incidental thing that a save hook
happens to use. (Would probably be better to refactor this old save
hook into a different situation altogether, but that's for another
time!)
This hasn't been causing issues as far as I know, I just noticed
*months ago* that I forgot to do this, and have had a sticky note about
it on my desk since then lol.
I tested this by temporarily setting the timeout to `0.5`, and watching
it fail!
When we moved more logic into the main app, we made some assumptions
about manifest art that were different than Impress 2020's, in hopes
that they would be More Correct for potential future edge cases.
Turns out, they were actually *less* correct for *current* edge cases!
Chips linked us to a few examples, including this Reddit post:
https://www.reddit.com/r/neopets/comments/1b8fd72/i_dont_think_thats_the_correct_image/
Fixed now!
First off, I think our code has converged on a convention of gracefully
returning `nil` for manifest-less situations, so we can do that instead
of raise! And then that lets us just simplify this check to whether
`manifest` is present, instead of `manifest_url`, so we stop crashing
in cases where we get to this point in the code and there's a manifest
URL but not a manifest.
This was a bit tricky! When I initially turned it on, running
`rails swf_assets:manifests:load` would trigger database errors of "oh
no we can't get a connection from the pool!", because too many records
were trying to concurrently save at once.
So now, we give ourselves the ability to say `save_changes: false`, and
then save them all in one batch after! That way, we're still saving by
default in the edge cases where we're downloading and saving a manifest
on the fly, but batching them in cases where we're likely to be dealing
with a lot of them!
Now we're *really* duplicating with Impress 2020's system lol, but I
need a way to not keep trying to load manifests that are actually 404,
which are surprisingly plentiful!
This doesn't actually stop us from loading anything yet, it just tracks
the timestamps and the HTTP status! But next I'll add logic to skip
when it was 4xx recently.
The alt styles controller is the one place we use this right now, but
I'm planning to generalize this to loading appearances during item
search, too!
I also add more `only` fields to the alt styles `as_json` call, because
idk it feels like good practice to both 1) say what we need in this
endpoint, rather than rely on default behavior upstream, and 2) to
avoid leaking fields we didn't realize were on there. (And also to
preserve bandwidth, too!)
I think there's no call sites for these anymore, so now I can start
repurposing these methods for the new API endpoints I'm planning! :3
Now, `SwfAsset#image_url` approximately matches Impress 2020 logic: use
the thumbnail PNG from the manifest if one exists, or the Impress 2020
converter for canvas movies, or the old AWS copy generated by gnash if
necessary, or return nil.
I think this used to be used in an API endpoint we've now deleted? I'm
just cleaning up call sites because I intend to refactor the `urls`
method and stuff, so I'm removing cruft that would complicate it!
I'm not certain-certain this is unused, but I did a global search for
`\bimages\b` in the codebase, and didn't find anything that looked like
a match to me!
Doing that sweet, sweet backfill!! It's not exactly *fast*, since
there's about 570k records to work through, but it's pretty good all
things considered! Thanks, surprisingly-reusable async code!
I'm gonna also use this for a task to try to warm up *all* the
manifests in the database! But to start, just a simple one, to prepare
the alt styles page quickly on first run. (This doesn't really matter
in production now that I've already visited the page once, but it helps
when resetting things in dev, and I think more it's about establishing
the pattern!)
The Neopets Media Archive is a service that mirrors `images.neopets.com`
over time! Right now we're starting by just loading manifests, and
using them to replace the hacks we used for determining the Alt Style
PNG and SVG URLs; but with time, I want to load *all* customization
media files, to have our own secondary file source that isn't dependent
on Neopets to always be up.
Impress 2020 already caches manifest files, but this strategy is
different in two ways:
1. We're using the filesystem rather than a database column. (That is,
manifest data is kinda duplicated in the system right now!) This is
because I intend to go in a more file-y way long-term anyway, to
load more than just the manifests.
2. Impress 2020 guesses at the manifest URLs by pattern, and reloads
them on a regular basis. Instead, we use the modeling system: when
TNT changes the URL of a manifest by appending a new `?v=` query
string to it, this system will consider it a new URL, and will load
the new copy accordingly.
Fun fact, I actually have been prototyping some of this stuff in a side
project I'd named `impress-media-server`! It's a little Sinatra app
that indeed *does* save all the files needed for customization, and can
generate lightweight lil preview iframes and images pretty easily. I
had initially been planning this as a separate service, but after
thinking over the arch a bit, I think it'll go smoother to just give
the main app all the same access and awareness—and I wrote it all in
Ruby and plain HTML/JS/CSS, so it should be pretty easy to port over
bit-by-bit!
Anyway, only Alt Styles use this for now, but my motivation is to be
able to use more-correct asset URL logic to be able to finally swap
over wardrobe-2020's item search to impress.openneo.net's item search
API endpoint—which will get "Items You Own" searches working again, and
whittle down one of the last big things Impress 2020 can do that the
main app can't. Let's see how it goes!
Preparing to finally move wardrobe-2020's item search to use the main
app's API endpoints instead!
One blocker I forgot about here: Impress 2020 has actual support for
knowing an item's true appearance, like by reading the manifest and
stuff, that we haven't really ported over. I feel like maybe I should
pause and work on the changes to manifest-archiving that I'd been
planning anyway? I'll think about it.
Yay it works(*)! But two major missing pieces:
- Outfit saving doesn't persist it at all
- Item compatibility is unaffected: items will still appear in search
and in the preview, even when they don't fit anymore.
We haven't used the mall spider in this app in forever (I guess we even
deleted the code at some point?), but there was some vestigial stuff
left. Goodbye!
Preparing a better endpoint for wardrobe-2020 to use! I deleted the
now-unused swf_assets#index endpoint, and replaced it with an
"appearances" concept that isn't exactly reflected in the database
models but is a _lot_ easier for clients to work with imo.
Note that this was a big part of the motivation for the recent
`manifest_url` work—in this draft, I'm probably gonna have the client
request the manifest, rather than use impress-2020's trick of caching
it in the database! There's a bit of a perf penalty, but I think that's
a simpler starting point, and I have a hunch I'll be able to make up
the perf difference once we have the impress-media-server managing more
of these responsibilities.
Ok so, impress-2020 guesses the manifest URL every time based on common
URL patterns. But the right way to do this is to read it from the
modeling data! But also, we don't have a great way to get the modeling
data directly. (Though as I write this, I guess we do have that
auto-modeling trick we use in the DTI 2020 codebase, I wonder if that
could work for this too?)
So anyway, in this change, we update the modeling code to save the
manifest URL, and also the migration includes a big block that attempts
to run impress-2020's manifest-guessing logic for every asset and save
the result!
It's uhh. Not fast. It runs at about 1 asset per second (a lot of these
aren't cache hits), and sometimes stalls out. And we have >600k assets,
so the estimated wall time is uhh. Seven days?
I think there's something we could do here around like, concurrent
execution? Though tbqh with the nature of the slowness being seemingly
about hitting the slow underlying images.neopets.com server, I don't
actually have a lot of faith that concurrency would actually be faster?
I also think it could be sensible to like… extract this from the
migration, and run it as a script to infer missing manifest URLs. That
would be easier to run in chunks and resume if something goes wrong.
Cuz like, I think my reasoning here was that backfilling this data was
part of the migration process… but the thing is, this migration can't
reliably get a manifest for everything (both cuz it depends on an
external service and cuz not everything has one), so it's a perfectly
valid migration to just leave the column as null for all the rows to
start, and fill this in later. I wish I'd written it like that!
But anyway, I'm just running this for now, and taking a break for the
night. Maybe later I'll come around and extract this into a separate
task to just try this on all assets missing manifests instead!
Okay, this is a process that idk if it's even been working for a while anyway, I don't think Neopets translates item names anymore?
And it's crashing when I try to model stuff now, so like. yeah ok I'm fine with just skipping this, it's a shame to lose out on potential data going forward but *I think there just isn't data to get anyway*
I think we used this for both conversion to image, and also for CORS stuff when rendering Flash-based previews… let's trash it, I don't want to be growing our hard drive with files I don't think we use anymore!
If I'm wrong and it turns out we do use them for something, then like. hey I'm sure we'll find out soon enough, and it's very recoverable operation.
Some important little upgrades but mostly straightforward!
Note that there's still a known issue where item searches crash, I was hoping that this was a bug in Rails 4.2 that would be fixed on upgading to 5, but nope, oh well!
Also uhh I just got a bit silly and didn't actually mean to go all the way to 5.2 in one go, I had meant to start at 5.0… but tbh the 5.1 and 5.2 changes seem small, and this seems to be working, so. Yeah ok let's roll!
Some tricks required here to get the dependencies to work out, but we got it!!
Oh also, we move away from the rbenv in Ubuntu's package manager, because it doesn't support more recent Rubies like 2.4.10.
Using `s3_path` and stuff made it sound like we were still referencing the original Amazon S3 images - but actually our new asset proxy just uses the same path structure, and we didn't change anything about it.
Oh also I deleted an after_conversion method that isn't used anymore, forgot about that!
We've already swapped out the backend for this stuff to Impress 2020, so the resque task and the broken image report UI aren't actually relevant anymore. Delete them!
This helps us delete Resque soon too.
Right, previously we were querying "has *at least one asset* that is not in zone X" instead of "has NO assets that are in zone X".
I don't know a fast way to query for that, this will have to do for now!
Ohh ok, without this change all of our `scope`s were just immediately evaluating the argument and fetching _all_ such matching records immediately, instead of waiting to actually be called. This led to bugs like `pet_type.as_json` returning ALL pet states in the whole db, because the `PetState.emotion_order` scope was being treated as a single predefined query, rather than a query fragment to merge into the current context.
This also explains what happened in 724ed83: that's why things before the scope in the query were being ignored.