The motivation is that I want VERCEL_URL and local net requests outta here :p and we were doing some cutesiness with leveraging the CDN cache to back the GQL fields. No more of that, folks! lol
My main inspiration for doing this is actually our potentially-huge upcoming Vercel bill lol
From inspecting my Honeycomb dashboard, it looks like the main offender for backend CPU time usage is outfit images. And it looks like they come in big spikes, of lots of low usage and then suddenly 1,000 requests in one minute.
My suspicion is that this is from users with many saved outfits loading their outfit page, which previously would show all of them at once.
We do have `loading="lazy"` set, but not all browsers support that yet, and I've had trouble pinning down the exact behavior anyway!
Anyway, paginating makes for a better experience for those huge-list users anyway. We've been meaning to do it, so here we go!
My hope is that this drastically decreases backend CPU hours immediately 🤞 If not, we'll need to investigate in more detail where these outfit image requests are actually coming from!
Note that I added the pagination to the existing `outfits` GraphQL endpoint, rather than creating a new one. I felt comfortable doing this because it requires login anyway, so I'm confident that other clients aren't using it; and because, while this kind of thing often creates a risk of problems with frontend and backend code getting out of sync, I think someone running old frontend code will just see only their first 30 outfits (but no pagination toolbar), and get confused and refresh the page, at which point they'll see all of them. (And I actually _prefer_ that slightly confusing UX, to avoid getting more giant spikes of outfit image requests, lol :p)
Hmm, right, okay, we *generally* should have all users imported to Auth0, but this can fail if the cron job is behind or Auth0 rejected the data (e.g. user data in a format it doesn't support).
Previously, this would apply the name change in the database, but return Auth0's "The user does not exist." error to the GraphQL client, making it look like the update fully failed.
In this change, we handle that case differently: when the Auth0 update fails with a 404, we proceed but log a warning; and when Auth0 fails with an unexpected error, we roll back the database change in addition to raising the error to the client, to keep the behavior obvious and consistent.
Did some stuff in here for parsing the default list ID too. We skipped that when making the new list index page, but now maybe you could reasonably link to the default list? 🤔 not sure it's a huge deal though
We update /api/assetImage to accept size as a parameter (I make it mandatory to push people into HTTP caching happy paths), and we update the GraphQL thing to use it in those cases too!
This also means that, if these images seem to go well, we could swap Classic DTI over to them… I want to turn off those RAM-heavy image converters on the VPS lol
In this change, we start using our new API endpoint for movie image URLs, instead of the Classic DTI image.
This should make the little fade-in phase for certain movies a little bit less jarring (the part where we preload the image before the movie loads), though I suppose that won't necessarily load as fast until it gets into the cache the first time lol. (A good reason to maybe put a more long-lived cache like Fastly in front of this stuff long-term?)
Not doing it for the smaller image sizes yet, I'm a bit worried that I don't 100% know how to teach /api/assetImage to resize without tipping over the function limit…
…oh! I should have the webpage render at different sizes! Yeah that's a great idea lol
I wasn't sure how to fill the space for items that are fully modeled, then realized some basic at-a-glance "who does this fit" would help!
The load time isn't great, I think I need to break out that dependent subquery, but maybe the stale-while-revalidate will cover it well enough at first.
To make this fast, I had to tweak the GraphQL resolver a bit to run a filtered version of the query for `newestItems` instead of scanning the full database! But yeah, looking good!
I think I'm gonna want to swap out "Fully modeled" for some insight about who it fits
Huh, weird. So I reversed the manifest, because you want to get the *last* movie. And I figured that semantic probably extended to PNGs and SVGs too?
But actually, PNGs sometimes have *other* PNGs in the manifest that aren't the relevant asset at all, and are just reference art.
Again, I'm really not sure what the underlying semantic is here? Does the Neopets customizer just display them all, and for the items with this problem, they happen to layer in a way that's not broken?? I would really like to not do that, and I would really like to know the real semantic, but I can't find it >.>
So um, I'm going ahead and using the best semantic that solves the problems I know about? Which is, use the last movie, and use the first PNG. Fingers crossed lol!
I also didn't test this change extensively, because I'm on a train lol
I'm just trusting that this push will be better than what we had before. I tested it on the Dandan MME, which has two JS files, and it took the latter; and the Pathway of Petals Background, which has two PNG files, and it took the former. Success? 😬🤞
Huh, so it turns out sometimes the manifest will include old broken conversion attempts!
This fixed the "MiniMME18-S2c: Holomorphic Foliage and Dandan Set", the "Electric Dress" on various species (incl. Aisha), and yeah!
What an interesting discovery 😂
Ah right okay, when the `ItemSearchResultV2` doesn't have an `id`, Apollo Cache isn't quite so strong about caching conflicting-y fields, like the different parameterizations of `items`.
With this change, we give the search result object an ID, which helps Apollo cache more confidently!
It's just a serialization of the relevant search fields 😅
That's the last itemSearch call site! I'll probably keep it up for other clients for a while though, esp since it doesn't depend on any additional loaders or anything, it's pretty small overall
Updated the comments to reflect this, and also remembered to make them real docstrings lol!
The main change is that we restructure the query, so that only the parts that are actually affected by pagination depend on those variables!
This will enable the Apollo Cache to trivially cache and show `numTotalItems` while waiting for other pages to load.
Oops, I pulled `currentUserId` from the wrong place, so it was always acting as if you're logged in! Now, you can see the list page for your own private list!
There are a couple spots where we parse SWF URLs to get the ID out! Most visibly, our Support tools were crashing on it. And internally, manifest loading wasn't working. (I'm not sure if this got caught or if it caused crashes in user space? I didn't see them when wearing a failing item)
Anyway, fixed now!
This was a known oversight, that I've finally fixed because I realized this subquery probably would be just fine lol!
Now, instead of removing rows with _all_ species modeled, we remove rows with all species _for that color_ modeled.
This leaves the rest of the modeling list unchanged, but removed 10 Maraquan items that were done modeling but still on the list:
- Dyeworks Coral: Maraquan White Beaded Gown
- Dyeworks Green: Maraquan White Beaded Gown
- Dyeworks Lavender: Maraquan White Beaded Gown
- Dyeworks Purple: Maraquan Wig with Negg Accessory
- Dyeworks Lavender: Maraquan Sea Blue Gown
- Dyeworks Pink: Maraquan Sea Blue Gown
- Dyeworks Silver: Maraquan Sea Blue Gown
- Maraquan White Lace Gown
- Underwater Maraquan Markings
(I also went in the database and marked the "Maraquan Ocean Blue Contacts" with the `modeling_status_hint = "done"`, because it's not compatible with Lutari.)
Oops, right, I forgot for a while that GraphQL fields have a special syntax for docstrings, and it's not just comments! This will help stuff show up in our GraphQL Playground API docs correctly 🥰
I'm not sure which image url is better to return from stuff like this, and I don't actually have a use case for it anymore, so let's just clear it out until we need something like it!
Oops, we get a _lot_ of outfit image requests, and it's pushing the limits of our free Honeycomb plan! But I don't really need all that much detail, because there's so many.
So, we here apply sampling! `api/outfitImage` is getting a 1/10 rate, and for GraphQL, `ApiOutfitImage` is getting 1/10, and `SearchPanel` is getting 1/5.
I had to add a `addTraceContext` call, to give all the child events awareness of what operation they're being called in, too!
I haven't actually tested that this is working-working, just that the endpoints still return good data. We'll see how it shakes out in prod!
But I did add `console.log(sampleRate, shouldSample, data);` to the `samplerHook` briefly, to see the data flow through, and I reloaded a `SearchPanel` request a few times and observed a plausibly 20% success rate.
We've been serving images directly from `impress-asset-images.s3.amazonaws.com` for a long time. While they serve with long-lasting HTTP cache headers, and the app requests them with the `updated_at` timestamp in the query string; each GET request still executes a full S3 ReadObject operation to get the latest version.
In the past, this was only relevant to users on Image Mode, not Flash Mode. But now that everyone's on Image Mode, this matters a lot more!
Now, we've configured a Fastly host at `impress-asset-images.openneo.net`, to sit in front of our S3 bucket. This should dramatically reduce the GET requests to S3 itself, as our cache warms up and gains copies of the most common asset PNGs.
That said, I'm not sure how much actual cost impact this change will have. Our AWS console isn't configured to differentiate cost by bucket yet—I've started this process, but it might take a few days to propagate. All I know is that our current costs are $35/mo data transfer + $20/mo storage, and that outfit images are responsible for most of the storage cost. I hypothesize that `impress-asset-images` is responsible for most of the reads and data transfers, but I'm not sure!
In the future, I think we'll be able to bring our AWS costs to near-zero, by:
- Obsolete `impress-asset-images`, by using the official Neopets PNGs instead, after the HTML5 conversion completes.
- Obsolete `impress-outfit-images`, by using a Node endpoint to generate the images, fronted by a CDN cache. (Transfer the actual data to a long-term storage backup, and replace the S3 objects with redirects, so that old S3 URLs will still work.)
I hope this will be a big slice of the costs though! 🤞
(Note: I'll be deploying this on a bit of a delay, because I want to see the DNS propagate across the globe before flipping to a new domain!)
Before this, if you made a change while the outfit was auto-saving, it would reset your changes back and forth in an infinite loop, oops!
This was because the response from the save would reset the outfit state to match, but the _debounced_ outfit state would still show the user's changes, so we'd trigger another save. And then the same thing would happen in reverse, and back and forth again!
Hope it actually work-works lol
Did some refactors in useOutfitState to support the new reset action we do after auto-saving, in case the server tweaked things like the name.
We move to an actual GQL query, instead of approximating with /api/validPetPoses.
Notable changes are omitting glitched states from UNKNOWN, so we don't prompt Support users to fill in missing states with bad states; and omitting glitched states from standard, so that we _do_ prompt Support users to check UNKNOWN states for new _non-glitched_ versions we can start to use.
Just a basic e2e starting point! Simple logic, with simple gates to prevent saving outfits we're not ready for. Safe to ship, despite being very incomplete!
This is a glitchy state that pets can get into! `spankaroonie` is an example, at time of writing.
Before, we would crash on loading downstream fields for the pet's color. Now, we don't! We also fix an oversight in the pet's `petAppearance` field, to trigger the "not yet modeled" error when the pet type doesn't exist.
Some of the "MiniMME11-S1: Approaching Eventide Skirt" visuals are pretty clearly glitched on TNT's end, like the Jubjub, which just has a single flat version of the dress floating in the corner of the screen.
This is a message to make that case even clearer!
I'm applying this to the "MiniMME11-S1: Approaching Eventide Skirt" on the Acara, which seems to load all 1000 images from the manifest, but then show no animation and no errors. Not sure what's up, and not inclined to deep-debug until we have a check on whether it works on-site!
Huh, so apparently the "MiniMME11-S1: Approaching Eventide Skirt" on the Acara has 1000 layer images lol.
This caused the manifest string to overflow the MySQL `TEXT` field, and fail to parse as valid JSON when loading it back for the client.
I've updated the database to use `MEDIUMTEXT` instead, and added a warning message & skip behavior when the manifest size would exceed the database limit, and added graceful error handling for the invalid JSON scenario. Now, we don't crash, and the data self-repairs, and keeps in a better state in the first place!
But I'm also worried about this asset, it doesn't play correctly anyway, and I'm not sure if that's an overload on our end, or just a flat problem in the JS. (There's no error message on the client, it just… loads all the layers, then shows no play button, seemingly self-satisfied.)
This was causing a bug where unlabeled poses would cause pet lookups to fail!
Now, we return the actual pet appearance for the pet, if we have it, by matching against asset IDs first.
I'm setting up the app in a fresh box, and I noticed that the Auth0 credentials are an immediate crasher if not present. That doesn't seem ideal to me for something only used in support actions! I'd rather just have that support mutation crash, if we happen to call it.
Oh right, our preview deploy was loading the prod allWakaValues data! Now it uses the VERCEL_URL env variable to request from the current deployment instead.
One day is too long! I'd prefer 1min for just the value itself, but I don't want to bog down all the other metadata with it, it's not _essential_ for it to be faster.
I wanted the ability to clear out closet list text for Support users, and figured I should just build the UI for end users too, and grant Support users the same access!
I narrowed down the problem to the fact that we were joining in pet types against assets, and *then* running GROUP and DISTINCT and everything. Assets x compatible species/color pairs is a LOT of rows!
Here, we instead get all the relevant body IDs first, and *then* match them against pet types—which we fetch in one batch to match body to canonical species/color.
I'm also trashing the weird caching mechanism we did here, because in practice it doesn't seem reliable anyway. If anything, I'd want to look at stronger CDN caching. (I made a small improvement to the caching annotation, but ultimately it still doesn't matter, because this query uses logged-in stuff and always comes out max-age=0 anyway.)
This helps with items like "Living in Watermelon Foreground and Background", which has a species-specific foreground and bodyId=0 background.
With this flag set on the background, it won't appear for pets that don't _also_ have something else that fits. In this case, it hides it from Standard Vandas, and all non-standard colors.
There's some hacky limitations here: the item page still highlights the Vanda, even though clicking gives nothing; and the zone info for it is messy too, with the Background claiming to fit all species, and the LFI claiming to fit 54 specific species. But those don't seem important enough to code for!
The other day, I deleted what was apparently a load-bearing glitch row, lol 😂
We had a row in pet_types that somehow had `body_id = 0`. And I guess that was causing this query to return some species, even though that body has no species.
Here, I'm adding support for the special `representsAllBodies` body's species to be null. The client seems chill with it, we weren't using that property in that situation anyway!
I think this will be generally useful to minimize switching around for common operations, but also I'm thinking of building a bulk assign tool for things with broken body IDs, and this will be the place for it to live, I think
Oops, we weren't doing a good job encapsulating the different conditions in item search. The `OR` in the NC condition was causing a precedence problem!
Now, we wrap all the conditions in parens at the interpolation site, to make it really clear that they all need to be made safe like that!
Now, there's not a bunch of "??????" entries in NC search, oops 😅
This was one more bit that needed fixing for "Flying in an Airplane": it wasn't just the official SVG that was incorrect, but also the official SWF. So our converted PNG was also incorrect!
Here, we now try to use the official Neopets PNG when the manifest provides it, instead of our own.
Inspired by the "Flying in an Airplane" bug (item 82287), where the official SVG (and I think SWF) were visually glitched and included both zones in the image, but the official PNG was correct.
This flag lets us use the PNG, like the official player does—but only for this item, while still keeping SVGs for everyone else!
I'm gonna extend `itemSearch` to also look up the total number of results, and the fragmentation between `itemSearch` and `itemSearchToFit` finally caught up with me :p
I've deprecated `itemSearchToFit`, and moved the fit parameters into a new optional `fitsPet` parameter for `itemSearch`.
I'm going to keep `itemSearchToFit` around for now, because old JS builds still use it, and I'd like to avoid disrupting folks. But I'm not going to add the new total results field to the results object it returns, and that's gonna be okay!
I haven't been keeping these up to date, so at this point they're more overhead than they're worth.
Helpful in the early days when we were iterating fast and making more mistakes, but now we're more solid (and I learned how to just resend queries from devtools :p)
Oops, the new `canonicalAppearance` arguments couldn't handle being omitted rather than being null, due to a low-level SQL call site that cares about the difference.
This meant that loading an item page in an old tab, with an old copy of our JS, could cause a crash.
Now, the backend will be okay with queries from old pages, and respond the same as before!
This isn't a huge deal, because once everyone is on the new JS we won't send queries without this parameter anyway… but I like my optional arguments to actually BE optional, without surprises lurking >.>
Ok cool, so apparently another win we get from using `ts-node` is that I can finally easily use some non-native-Node features like ES module import syntax, for consistency with what I'm doing in the main app source! That was getting on my nerves tbh. Ooh I bet I can finally use `?.` too, I've had to rewrite that a bunch…
Oh yay, I'm pleased with this! I hope it works out well!
stale-while-revalidate is an HTTP caching feature that gives us the ability to still serve relatively static content like item pages ASAP, while also making sure users generally see updates quickly.
The trick is that we declare a period of time where, you can still serve the data from the cache, but you should _then_ go re-fetch the latest data in the background for next time. This works on end users and on the CDN!
I've scanned the basic wardrobe and homepage stuff and brought them up-to-date, and gave particular attention to the item page, which I hope can be very very snappy now! :3
Note to self: Vercel says we can manually clear out a stale-while-revalidate resource by requesting it with `Pragma: no-cache`. I'm not sure it will listen to us for _fresh_ resources, though, so I'm not sure we can actually use that to flush things out in the way I had been hoping until writing this sentence lol :p
Trying to get that Item page fast!
I don't really want to ship this as-is, because I'd really like to get stale-while-revalidate working before shipping a 1-week cache timer… will be tricky though!
So, we've been behind the latest conversion data for a while anyway, because I don't run the sync very often. But I ran the sync today and noticed that the newly converted SVGs weren't showing up in DTI!
This is because TNT changed the asset manifest structure they use for SVG-only assets. Now, we support both!
To test, I checked the Blue Acara (old-style SVG manifest), the Blue Chia (new-style SVG manifest), and the Floating Negg Faerie Doll (animated clip).