Now, someone with production DB access can run `yarn db:export:public-data` to create `public-data-constants.sql` and `public-data-from-modeling.sql`.
Then, someone setting up their dev database can run `yarn db:setup-dev:full` and get all the wearables data imported right into their dev database!
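For reference, an export step like this might look something like the sketch below. The database name and table list are placeholders, not the real script, just to show the general shape:

```ts
// Hypothetical sketch of a public-data export step: dump a fixed list of
// public tables with mysqldump and write them where the setup script expects.
// The database name and table list here are placeholders, not the real script.
import { execFileSync } from "node:child_process";
import fs from "node:fs";

const CONSTANT_TABLES = ["species", "colors", "zones"]; // placeholder examples

const sql = execFileSync("mysqldump", ["openneo_impress", ...CONSTANT_TABLES], {
  encoding: "utf8",
  maxBuffer: 512 * 1024 * 1024, // the dump is tens of MB, so raise the limit
});
fs.writeFileSync("public-data-constants.sql", sql);
```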
I'm noticing just how poorly I'm keeping up with my own goals for finishing up DTI, and wondering if now is a good time to circle back to some old offers for code contributions I got last year… I also just figure that making this app Possible To Run with a backup of the basic public database is like. a pretty handy thing to have for archival's sake imo
Note that, for this change, we also set up Git LFS (Large File Storage). GitHub should be automatically compatible with this! It's a way to not write the whole 30MB database dump into the repository history, and instead keep it in a secondary filestore, because Git's core algorithms aren't really built to handle large blobs of data very well. Users setting up their dev environment will therefore also need to have Git LFS installed for this script to work! (Otherwise, they'll see a "pointer" file in `public-data-from-modeling.sql.gz` that contains some metadata about the file state, but not the data itself.)
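(A setup script could even detect that pointer-file case and explain it, since LFS pointer files always start with the same version line. Here's a hypothetical check, just as a sketch:)

```ts
// Hypothetical guard a setup script could use: Git LFS pointer files are tiny
// text files that start with this exact version line, whereas the real .gz
// dump starts with binary gzip magic bytes.
import fs from "node:fs";

function looksLikeLfsPointer(path: string): boolean {
  const fd = fs.openSync(path, "r");
  const buffer = Buffer.alloc(60);
  fs.readSync(fd, buffer, 0, buffer.length, 0);
  fs.closeSync(fd);
  return buffer
    .toString("utf8")
    .startsWith("version https://git-lfs.github.com/spec/v1");
}

if (looksLikeLfsPointer("public-data-from-modeling.sql.gz")) {
  throw new Error(
    "This looks like a Git LFS pointer, not the real dump. " +
      "Please install Git LFS and run `git lfs pull` (or re-clone).",
  );
}
```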
Ok great, we can now run the delta archive process!
It'd be nice to get this running on a cron job on the impress-2020 server, outputting to a temporary folder? I *do* want to be remembering to run something regularly on my personal machine too though, to keep my own copy up-to-date…
That's what I get for not fully testing lmao! But right, paths in shell scripts are relative to the working directory, and if I want to be relative to the script I gotta use dirname!
Finally playing with this, now that we've been doing paginated search results in the main element! Let's see how it goes 😳
I made a thing to make the pagination toolbar smaller (might want to do that on the mobile view too?), and also to put the search suggestions in a popover floating at the top of the search box.
I tried to do this earlier, but the caching problem from the previous commit (where we weren't including `id` for the search result in the GQL query) was causing it to do a like, infinite loop thing, where the preload results would cache-invalidate the current results, and so the 3 queries would just fight for which one's in the cache?
But now that caching is working, this is working too! Makes it all feel a lot snappier :3
Apollo Client is pretty darn reliant on an `id` field for effective caching, more often than you'd think!
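For posterity, the fix is basically just making sure every object in the query selects `id`, so the `InMemoryCache` can normalize it. Something like this, with stand-in field names:

```ts
// A minimal sketch of the fix, with stand-in field names. The important part
// is selecting `id` on every object, so Apollo's InMemoryCache can normalize
// the results instead of treating each page (and each preload) as unrelated
// data that stomps on the cache.
import { gql } from "@apollo/client";

const SEARCH_RESULTS_QUERY = gql`
  query SearchResults($query: String!, $offset: Int!) {
    itemSearch(query: $query, offset: $offset) {
      items {
        id # without this, the pages kept cache-invalidating each other
        name
        thumbnailUrl
      }
    }
  }
`;
```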
Before this change, navigating back to a page you'd already loaded would cause it to reload. After this change, it no longer does, and serves the page from cache instead!
We also didn't need the query one, because we now `key` the `SearchResults` by the query, so the container becomes empty-then-full-again, which resets scroll back to top.
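The `key` trick, in sketch form (the import path and prop names are approximate, not the real code):

```tsx
// Sketch of the `key` trick: keying SearchResults by the query makes React
// unmount and remount the whole results container when the query changes,
// which resets its scroll position without any manual scroll management.
// The import path and prop names here are approximations.
import SearchResults from "./SearchResults";

function SearchPanel({ query, offset }: { query: string; offset: number }) {
  return <SearchResults key={query} query={query} offset={offset} />;
}
```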
idk this has been a long-time popular request, so I'm just gonna like. throw it all the way out there. and see what people think of it
I'm a bit worried it might change up the mobile experience too much? But like. let's find out!
My intention is to move this out of PaginationToolbar entirely, so that it becomes a component we can reuse in a non-URL-state setting. (I'm looking at using pagination for the wardrobe item search is why!)
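Roughly the shape I'm imagining, with hypothetical prop names: the toolbar just reports page changes to its parent, and the parent decides whether that state lives in the URL or in plain React state.

```tsx
// Hypothetical shape for a reusable toolbar: no URL reading/writing in here,
// the parent owns the state. Prop names are my own placeholders.
type PaginationToolbarProps = {
  currentPage: number;
  totalPages: number;
  onChangePage: (newPage: number) => void;
};

function PaginationToolbar({
  currentPage,
  totalPages,
  onChangePage,
}: PaginationToolbarProps) {
  return (
    <div>
      <button
        disabled={currentPage <= 1}
        onClick={() => onChangePage(currentPage - 1)}
      >
        Previous
      </button>
      <span>
        Page {currentPage} of {totalPages}
      </span>
      <button
        disabled={currentPage >= totalPages}
        onClick={() => onChangePage(currentPage + 1)}
      >
        Next
      </button>
    </div>
  );
}
```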
We do a thing where we sometimes proactively update an appearance layer's manifest from images.neopets.com when it's been a while since the last time, _during_ user requests.
But when images.neopets.com is being slow, this makes our API requests about appearances super slow, too!
In this change, we add a 2-second timeout to those requests. That should be plenty for when images.neopets.com is in a good mood, but also give up fast enough for the site to not feel miserable lol :p (especially when the "Use DTI's image archive" option is on!)
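Here's roughly what the timeout looks like, as a sketch using `AbortController` (the real code might use a different HTTP client; the point is just "give up fast"):

```ts
// A minimal sketch of the 2-second timeout, using AbortController. The
// function name is a placeholder, not the real one.
async function loadManifestWithTimeout(url: string): Promise<string | null> {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 2000);
  try {
    const res = await fetch(url, { signal: controller.signal });
    if (!res.ok) return null;
    return await res.text();
  } catch (error) {
    // images.neopets.com being slow shouldn't make *our* requests slow too,
    // so just skip the proactive manifest update this time.
    return null;
  } finally {
    clearTimeout(timeout);
  }
}
```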
Sat down and thought about the structure here and how to make the full/delta stuff make more sense together! Here's what I came up with!
In both full and delta archiving, we prepare the manifest, we create the local archive, then we upload it to remote.
I like running the full `archive:create` to help us be _confident_ we've got the whole darn thing, but it takes multiple days to run on my machine and its slow HDD, which… I'm willing to do _sometimes_, but not frequently.
But if we had a version of the script that ran faster, and only on URLs we still _need_, we could run that more regularly and keep our live archive relatively up-to-date. This would enable us to build reliable fallback infra for when images.neopets.com isn't responding (like today lol)!
Anyway, I stopped early in this process because images.neopets.com is bad today, which means I can't really run updates today, lol :p but the delta-ing stuff seems to work, and takes closer to 30min to get the full state from the live archive, which is, y'know, still slow, but will make for a MUCH faster process than multiple days, lol
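The delta idea, in its most boiled-down form (the names here are hypothetical, not the real scripts'):

```ts
// Instead of re-crawling everything, figure out which manifest URLs aren't in
// the live archive yet, and only download/upload those. Names are placeholders.
function computeDeltaUrls(
  manifestUrls: string[],
  alreadyArchivedUrls: Set<string>,
): string[] {
  return manifestUrls.filter((url) => !alreadyArchivedUrls.has(url));
}
```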
Should be a smooth drop-in replacement, we give the field an alias `imageUrl` in the query, so the rest of the app is none the wiser!
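The alias looks something like this; the underlying field name here is a stand-in for the real one:

```ts
// Sketch of the alias trick: the query exposes the new field under the old
// `imageUrl` name, so every component reading `item.imageUrl` keeps working
// unchanged. The fragment and underlying field names are stand-ins.
import { gql } from "@apollo/client";

const ITEM_CARD_FRAGMENT = gql`
  fragment ItemCardFields on Item {
    id
    name
    imageUrl: thumbnailUrl # aliased, so callers don't know the field changed
  }
`;
```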
I didn't test the layer upload cache invalidation, but it seems pretty obvious to me, so ehh I'm just shipping it lmao
Without this, 150x150 and 300x300 outfit thumbnails would fail to render new item layers where we didn't have an AWS image layer. Now, they correctly render the new stuff!
I tested this with the new "Spooky Stitches Markings" on the Grarrl, which has a blank image in AWS, but works correctly in the new code by loading the image from neopets.com!
This'll both hide sections that are empty (a situation that just wasn't plausible for a long time), and print a happy lil message if there are no sections to show at all!
We seem to have everything modeled now, and we have automatic modeling, so like… this is not a useful link for the general public anymore!
Instead, we'll just keep it a secret for us to check on the state of things!
Print out the image hash for easier debugging (can look up the custom data ourselves to check it), and also fix a bug with retries not carrying `contextString` through, oops!
Okay, this is gonna be a drop-in new backend for impress-asset-images.openneo.net, to enable Classic DTI to use the same images as DTI 2020!
This will enable us to stop generating images and uploading them to S3 just for Classic's sake, so we can turn those background processes off! And the new modeling script skips that anyway, so this is an important compatibility step for the new data that went out today!
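To sketch the idea (the route shape and the DTI 2020 URL pattern here are made up for illustration, not the real endpoints):

```ts
// Hypothetical sketch of the redirect idea: the old impress-asset-images host
// can just 302 to wherever DTI 2020 serves the equivalent image, instead of
// keeping its own S3 copies. The query params and target URL are placeholders.
import type { NextApiRequest, NextApiResponse } from "next";

export default function handler(req: NextApiRequest, res: NextApiResponse) {
  const { assetId, size } = req.query;
  res.redirect(
    302,
    `https://impress-2020.openneo.net/api/assetImage?assetId=${assetId}&size=${size}`,
  );
}
```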
We're gonna update impress-asset-images.openneo.net to perform redirects and stuff, so Classic DTI can start using the same images that DTI 2020 does.
That should enable us to stop relying on AWS for images, which is important because the new modeling script breaks that anyway :p but this will also let us turn off the image converters that run in the background all the time, and I'm excited for that too!
It seems to be working!! How exciting!! I'm just letting it run on stuff now :3
One important issue is that Classic DTI doesn't show images for items modeled this way, because we don't download the SWFs for it. But I wanna update it to stop using AWS anyway and do the same stuff 2020 does, I think we can do that pretty sneakily!
Ok so, I kinda assumed that the query engine would only compute `all_species_ids_for_this_color` on the rows we actually returned, and it's a fast subquery so it's fine. But that was wrong! I think the query engine was computing that for _every_ item, and _then_ filtering out stuff with `HAVING`. Which makes sense, because the `HAVING` clause references it, so of course it gets computed!
In this change, we inline the subquery, so it only gets called if the other conditions in the `HAVING` clause don't fail first. That way, it only gets run when needed, and the query runs like 2x faster (~30sec instead of ~60sec), which gets us back inside some timeouts that were triggering around 1 minute and making the page fail.
However, this meant we no longer return `all_species_ids_for_this_color`, which we actually use to determine which species are _left_ to model for! So now, we have a loader that also basically runs the same query as that condition subquery.
A reasonable question at this point would be: is the `HAVING` clause a good idea? Would it be simpler to do the filtering in JS?
And I think it might be simpler, but I would guess noticeably worse performance, because I think we really do filter out a _lot_ of results with that `HAVING` clause, like basically all items, right? So to filter on the JS side, we'd be transferring data for all items over the wire, which… like, that's not even the worst dealbreaker, but it would certainly be noticed. This hypothesis could be wrong, but it's enough of a reason for me to not bother pursuing the refactor!
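For future reference, here's a very simplified before/after of the inlining, with stand-in table and column names (the real query is much bigger; the point is just *where* the subquery lives):

```ts
// Before: the subquery is a selected column, so MySQL computes it for every
// row before HAVING can filter anything. Schema names are stand-ins.
const BEFORE = `
  SELECT items.id,
    (SELECT GROUP_CONCAT(species_id) FROM modeled_pets WHERE ...) AS all_species_ids_for_this_color
  FROM items
  GROUP BY items.id
  HAVING modeled_count < needed_count AND all_species_ids_for_this_color IS NOT NULL
`;

// After: the subquery is inlined at the end of the HAVING conditions, so the
// cheaper checks can short-circuit it for most rows.
const AFTER = `
  SELECT items.id
  FROM items
  GROUP BY items.id
  HAVING modeled_count < needed_count AND
    (SELECT GROUP_CONCAT(species_id) FROM modeled_pets WHERE ...) IS NOT NULL
`;
```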
I'm looking into what it would take to update the archive on a regular basis. The commands right now *are* pretty good at avoiding duplicate work… but the S3 upload still seems like it's taking very long even to just validate what's in the archive already. We might have to build our own little cache rather than using `aws s3 sync`, if we want faster incremental updates?
Here, I make a few quality-of-life changes to add an `archive:create` command that runs everything in a straight line. That way, I can let it run and see how much wall-time it takes, to be able to decide whether speeding it up feels necessary. (vs whether it's a few-hours task I can just set a reminder to manually run every week or something)
Oops, Next.js has built-in request body parsing that happens automatically. So it was giving us a `req.body` string, and our code to read in the body and put it in a buffer was waiting forever!
Thankfully, it's much easier than I expected to turn that behavior off for just one route. Now it works like before, so our existing code works again, ta da!
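The escape hatch is a per-route `config` export; here's a sketch of the shape (the handler body is illustrative, not our exact code):

```ts
// Next.js lets a single API route opt out of built-in body parsing, which
// makes `req` a plain readable stream again, so buffer-the-raw-body code
// works like before. The handler body here is just an illustration.
import type { NextApiRequest, NextApiResponse } from "next";

export const config = {
  api: {
    bodyParser: false,
  },
};

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const chunks: Buffer[] = [];
  for await (const chunk of req) {
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  const body = Buffer.concat(chunks);
  // ... hand `body` off to the existing upload-handling code ...
  res.status(200).end();
}
```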
Before, we were using the ContentType from the S3 object, which was unreliable. This helps us behave better for query-string files!
We also add a filename via Content-Disposition for the files that auto-download and for the Save As case. Idk if this is super important exactly, but I feel like it'll be a lifesaver for anyone using this to get at a specific file for their own reference at any point! and it just seems polite lol
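A sketch of the header behavior (the type map and helper name are illustrative, and I'm assuming the query-string case means stored keys like `manifest.json?v=123`):

```ts
// Illustrative helper: decide Content-Type from the file path ourselves
// instead of trusting the stored S3 ContentType, and attach a friendly
// filename via Content-Disposition. The map and names are placeholders.
import type { NextApiResponse } from "next";

const CONTENT_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".svg": "image/svg+xml",
  ".json": "application/json",
  ".js": "text/javascript",
};

function setDownloadHeaders(res: NextApiResponse, key: string) {
  // Strip any query-string suffix from the stored key first,
  // e.g. "manifest.json?v=123" -> ".json".
  const cleanKey = key.split("?")[0];
  const extension = cleanKey.slice(cleanKey.lastIndexOf("."));
  res.setHeader(
    "Content-Type",
    CONTENT_TYPES[extension] ?? "application/octet-stream",
  );
  const filename = cleanKey.split("/").pop();
  res.setHeader("Content-Disposition", `inline; filename="${filename}"`);
}
```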
Okay there we go, I was following Linode's guidance and ended up using a non-Amazon S3 client, but it turns out you can get the official Amazon AWS client to play with private services too, and it doesn't do the thing s3cmd does of trying to list every single file before doing anything 😅
This command _is_ doing weird stall-outs here and there, but is mostly just chugging along. It's not exactly fast, and I imagine it'll take some time, but the fact that it's like. working. is huge lmao
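For the record, the "play with private services" trick is just pointing the official client at a custom endpoint. The CLI version is an `--endpoint-url` flag; here's what the same idea looks like with the JS SDK, as a sketch. The bucket name and env var names are placeholders:

```ts
// Pointing the official AWS SDK at an S3-compatible service (Linode Object
// Storage) by overriding the endpoint. Bucket and env var names are mine,
// not the real config.
import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

const client = new S3Client({
  region: "us-east-1",
  endpoint: "https://us-east-1.linodeobjects.com",
  credentials: {
    accessKeyId: process.env.LINODE_OBJ_ACCESS_KEY!,
    secretAccessKey: process.env.LINODE_OBJ_SECRET_KEY!,
  },
});

const firstPage = await client.send(
  new ListObjectsV2Command({ Bucket: "dti-archive", MaxKeys: 10 }), // placeholder bucket
);
console.log(firstPage.Contents?.map((obj) => obj.Key));
```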
Okay so the funny thing is that my upload script is clearly like *super* not working lol, it's been running more than an hour now and still hasn't finished listing the files. So there's only actually a handful of files to test with here, from the `archive:create:upload-test` script!
But anyway, uhh once the archive is actually uploaded, this is a way to read it back! Mainly as a way to assure me that it's all saved correctly, but also as a potential backup for images.neopets.com if it goes down again sometime.
This will upload to our remote storage! We're using `s3cmd`, but our storage isn't actually Amazon S3, it's Linode Object Storage, which has an S3-compatible API. (And it's where our VPSes already are, and its pricing model is very generous for our relatively small scale of data.)
I haven't _really_ tested this exactly yet, because while `archive:create:upload-test` works great and uploads the 3 targeted files successfully… running the big one takes a very long time to even _enumerate_ all the files on my machine. (This makes sense, because I'm keeping the ~100GB archive on my HDD, which is not a fast disk!)
So I'm pushing ahead even though the script is untested, because I wanna work on other stuff too!
We now support returning `null` from `petAppearance` when a pet genuinely has no customization data.
We also deprecate some old fields, and update our own call site to match.
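The resolver-side shape, roughly (the argument names and the loader helper are approximations for illustration, not the real code):

```ts
// Rough sketch of the nullable behavior: if a pet genuinely has no
// customization data, return `null` instead of throwing or fabricating an
// empty appearance. Argument shape and loader are placeholders.
type PetAppearanceArgs = { speciesId: string; colorId: string; pose: string };

declare function loadPetAppearance(
  db: unknown,
  args: PetAppearanceArgs,
): Promise<object | null>;

const resolvers = {
  Query: {
    petAppearance: async (
      _: unknown,
      args: PetAppearanceArgs,
      { db }: { db: unknown },
    ) => {
      return (await loadPetAppearance(db, args)) ?? null;
    },
  },
};
```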
The item page got into a weird situation where this setting seemed to make loading things _about_ `currentUser` cause the `useCurrentUser` GQL query to flake out and say it was loading. This would make the layout bounce around a bunch while it tried to decide whether to show you the buttons or not.
I still don't love the UX of not having any loading state after login, but like… eh it's certainly better lmao
Ok! I think I got it! It's very very nasty tho lmao! But this will merge in the new SSR-provided data before the new page can render, instead of having it sometimes make redundant network requests & show loading spinners in the meantime for data that Next.js already fulfilled for it.
Nasty nasty lil trick. But it seems to be working! Let's see how it does lmao
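For future me, roughly the shape of the trick, following the common Next.js + Apollo pattern (the names come from that pattern, not necessarily our exact code):

```ts
// When a page arrives with SSR-fetched Apollo state, merge it into the
// existing client-side cache *before* rendering, so the new page finds its
// data already cached instead of refetching and flashing loading spinners.
// The GraphQL URI and function names follow the common pattern, as a sketch.
import {
  ApolloClient,
  InMemoryCache,
  NormalizedCacheObject,
} from "@apollo/client";

let apolloClient: ApolloClient<NormalizedCacheObject> | undefined;

function createApolloClient() {
  return new ApolloClient({ uri: "/api/graphql", cache: new InMemoryCache() });
}

export function initializeApollo(
  initialState: NormalizedCacheObject | null = null,
) {
  const client = apolloClient ?? createApolloClient();
  if (initialState) {
    // Merge the SSR result into whatever the client already has cached.
    client.cache.restore({ ...client.extract(), ...initialState });
  }
  if (typeof window !== "undefined") apolloClient = client;
  return client;
}
```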