Weird! Well! This caused two issues.
One is that we used to filter down to only assets whose urls end in
`.swf`, because I never added support for the sound ones :p
The other is that we were using the SWF URL to infer the potential
manifest URLs, which we can't do when the SWF URL is a weird
placeholder!
Thankfully, just yesterday (wow!) we happened to have Classic DTI keep
track of `manifest_url` during the modeling process, and we backfilled
everything. So the most recent `temp` assets have their manifest! But
slightly older ones (not that old tho!!) didn't, so I manually inferred
the manifest URL from the asset's `remote_id` field instead (which
worked cuz there was no hash component to the manifest URL, unlike some
manifests).
The item "Flowing Black Cloak" (86668) on the Zafara is the one we were
looking at and testing with!
Lol I guess we've probably just been having intermittent CORS issues
forever, oops. I hope the cache has been mostly warmed on the right
thing! But today I checked and the entire species/color picker on the
Rails app wasn't working for CORS reasons, so like. Yeah oof.
If you delete a closet list without deleting the items in it, it'll dump them into your default own/want lists. This isn't necessarily the behavior we really want, but oh well, that's the truth right now!
The GraphQL endpoint was assuming that all hanger `list_id` values were valid, oops! Now, we ignore list IDs that don't match a list.
Oh beans, I made an env variable change that I thought would switch us over to db.impress.openneo.net, and I was just plain wrong, darn. impress-2020 has been fully failing since then, oops!!!
Here, we change it to be an environment variable, so that in the future it will work how I want it to lol
Oops, we added behavior that varies the CORS response headers according to the incoming `Origin` header, but we forgot to add `Vary: Origin`!
This doesn't cause an issue for the app when you make requests to the server directly, but since it's behind a Fastly cache layer, we ended up caching responses that didn't include CORS headers but should have.
Now, this will instruct the Fastly cache to treat requests with different `Origin` headers as being entirely different. (This means we won't be sharing caches between requests from impress-2020 and the Rails app anymore, but that should be okay in practice!)
This was Dice's idea ty!! Now, instead of crashing when looking up any pet whose name starts with a number, we do a clever lil workaround instead! We don't get *all* the data back (we're missing metadata), but that's fine for the main use case of typing your pet name in at the homepage.
Sigh, I guess xmlrpc was deleted? Back to the method that doesn't work for pets with leading digits in their names, sobbe
Dice has a neat idea for how to work around that, but I'm not sure how to fit it in our architecture, let's take a look!
We do a thing where we sometimes proactively update an appearance layer's manifest from images.neopets.com when it's been a while since the last time, _during_ user requests.
But when images.neopets.com is being slow, this makes our API requests about appearances super slow, too!
In this change, we add a 2-second timeout to those requests. That should be plenty for when images.neopets.com is in a good mood, but also give up fast enough for the site to not feel miserable lol :p (especially when the use "Use DTI's image archive" option is on!)
Okay, this is gonna be a drop-in new backend for impress-asset-images.openneo.net, to enable Classic DTI to use the same images as DTI 2020!
This will enable us to stop generating images and uploading them to S3 just for Classic's sake, so we can turn those background processes off! And the new modeling script skips that anyway, so this is an important compatibility step for the new data that went out today!
We're gonna update impress-asset-images.openneo.net to perform redirects and stuff, so Classic DTI can start using the same images that DTI 2020 does.
That should enable us to stop relying on AWS for images, which is important because the new modeling script breaks that anyway :p but this will also let us turn off the image converters that run in the background all the time, and I'm excited for that too!
It seems to be working!! How exciting!! I'm just letting it run on stuff now :3
One important issue is that Classic DTI doesn't show images for items modeled this way, because we don't download the SWFs for it. But I wanna update it to stop using AWS anyway and do the same stuff 2020 does, I think we can do that pretty sneakily!
Ok so, I kinda assumed that the query engine would only compute `all_species_ids_for_this_color` on the rows we actually returned, and it's a fast subquery so it's fine. But that was wrong! I think the query engine computing that for _every_ item, and _then_ filter out stuff with `HAVING`. Which makes sense, because the `HAVING` clause references it, so computing it makes sense!
In this change, we inline the subquery, so it only gets called if the other conditions in the `HAVING` clause don't fail first. That way, it only gets run when needed, and the query runs like 2x faster (~30sec instead of ~60sec), which gets us back inside some timeouts that were triggering around 1 minute and making the page fail.
However, this meant we no longer return `all_species_ids_for_this_color`, which we actually use to determine which species are _left_ to model for! So now, we have a loader that also basically runs the same query as that condition subquery.
A reasonable question would be, at this point, is the `HAVING` clause a good idea? would it be simpler to do the filtering in JS?
and I think it might be simpler, but I would guess noticeably worse performance, because I think we really do filter out a _lot_ of results with that `HAVING` clause—like basically all items, right? So to filter on the JS side, we'd be transferring data for all items over the wire, which… like, that's not even the worst dealbreaker, but it would certainly be noticed. This hypothesis could be wrong, but it's enough of a reason for me to not bother pursuring the refactor!
We now support returning `null` from `petAppearance` when a pet genuinely has no customization data.
We also deprecate some old fields, and update our own call site to match.
Always been a bit annoyed to have even the item name load in so weird and slow 😅 this fixes it to come in much faster!
This also allows us to SSR the item name in the page title, since we've put it in the GraphQL cache at SSR time!
Whew, setting up a cute GraphQL SSR system! I feel like it strikes a good balance of not having actually too many moving parts, though it's still a bit extensive for the problem we're solving 😅
Anyway, by doing SSR at _all_, we solve the problem where Next's "Automatic Static Optimization" was causing problems by setting the outfit state to the default at the start of the page load.
So I figured, why not try to SSR things _good_?
Now, when you navigate to the /outfits/new page, Next.js will go get the necessary GraphQL data to show the image before even putting the page into view. This makes the image show up all snappy-like! (when images.neopets.com is behaving :p)
We could do this with the stuff in the items panel too, but it's a tiny bit more annoying in the code right now, so I'm just gonna not worry about it and see how this performs in practice!
This change _doesn't_ include making the images actually show up before JS loads in, I assume because our JS code tries to validate that the images have loaded before fading them in on the page. Idk if we want to do something smarter there for the SSR case, to try to get them loading in faster!
Specifically, we rename the dev database to `openneo_impress` for consistency with the main app, and create a second schema file for `openneo_id`, so we can do local account creation.
I decided that `setAuthToken` was the more appropriate server-level API, and that letting the login/logout mutations engage with the auth library made more sense to me!
I forgot that we sometimes use the Apollo server in a context where `req` and `res` aren't present! (namely in our `build-cached-data` script.)
In this change, we update the DTI-Auth-Mode HTTP header check to be cognizant that the request might be absent!
Hey hey, logging out works! The server side of this was easy, but I made a few refactors to support it well on the client, like `useLoginActions` becoming just `useLogout` lol, and updating how the nav menu chooses between buttons vs menu because I wanted `<LogoutButton />` to contain some state.
We also did good Apollo cache stuff to update the page after you log in or out! I think some fields that don't derive from `User`, like `Item.currentUserOwnsThis`, are gonna fail to update until you reload the page but like that's fine idk :p
There's a known bug where logging out on the Your Outfits page turns into an infinite loop situation, because it's trying to do Auth0 stuff but the login keeps failing to have any effect because we're in db mode! I'll fix that next.
Yeah cool the login button seems to. work now? And subsequent requests serve user data correctly based on that, and let you edit stuff.
I also tested the following attacks:
- Using the wrong password indeed fails! lol basic one
- Changing the userId or createdAt fields in the cookie causes the auth token to be rejected for an invalid signature.
Tbh that's all that comes to mind… like, you either attack us by tricking the login itself into giving you a token when it shouldn't, or you attack us by tricking the subsequent requests into accepting a token when it shouldn't. Seems like we're covered? 😳🤞
Still need to add logout, but yeah, this is… looking surprisingly feature-parity with our Auth0 integration already lmao. Maybe it'll be ready to launch sooner than expected?
Right, I had that idea while writing the comment, then forgot to actually do it lmao
This is important for session expiration: we don't want you to be able to hold onto an old cookie for an account that you should be locked out of. Updating the `createdAt` value requires a new signature, so the client can't forge when this token was created, so we can be confident in our ability to expire them.
Okay so one of the trickiest parts of login is done! 🤞 and now we need to make it actually show up in the UI. (and also pressure-test the security a bit, I've only really checked the happy path!)
Hey nice it looks like it's working! :3 "Bright Speckled Parasol" is a nice test case, it has a long text string! And when the NC value is not included in the ~owls list, we indeed don't show the badge!
Hey finally! I got in the mood and did it, after… a year? idk lol
The button should only appear for outfits that are already saved, that are owned by you. And the server enforces it!
I also added a new util function to give actually useful error messages when the GraphQL server throws an error. Might be wise to use this in more places where we're currently just using `error.message`!
Uhhh I guess I never added the check that the outfit you're editing is your own? Embarrassing.
I don't have any reason to believe anyone abused this, but 😬! Good to have fixed now!
Tbh I didn't even really validate these changes, or that the codepaths right now aren't working, they just seem like clear drop-in upgrades now that HTTPS works and HTTP requests are redirected. Simplify!
Right now it returns 50 rows; each item that needs modeling returns 1–4 rows, usually 1. So a limit of 200 should be pretty dangerous, while also creating a release valve if there's another future bug: it'll just have the problem of returning too few items, instead of the problem of crashing everything! 😅
Neopets released a new Maraquan Koi, and it revealed a mistake in our modeling query! We already knew that the Maraquan Mynci was actually the same body type as the standard Mynci colors, but now the Koi is the same way, and because there's _two_ such species, the query started reacting by assuming that a _bunch_ of items that fit both the standard Mynci and standard Koi (which is a LOT of items!!) should also fit all _Maraquan_ pets, because it fits both the Maraquan Mynci and Maraquan Koi too. (Whereas previously, that part of the query would say "oh, it just fits the Maraquan Mynci, we don't need to assume it fits ALL maraquan pets, that's probably just species-specific.")
so yeah! This change should help the query ignore Maraquan species that have the same body type as standard species. That's fine to essentially treat them like they don't exist, because we won't lose out on any modeling that way: the standard models will cover the Maraquan versions for those two species!
Ah okay, pools support `query` and `execute` the same way connection objects do (as a shorthand for acquiring, querying, and releasing), but it doesn't have the same helpers for transactions. Makes sense: you need those queries to go to the same connection, and an API where you just call it against the pool object can't tell that it's part of the same thing!
Now, we have our transaction code explicitly acquire a connection to use for the duration of the transaction.
An alternative considered would have been to have `connectToDb` acquire a connection, and then release it at the end of the GraphQL request. That would have made app code simpler, but added a lot of additional potential surprise failure points to the infra imo (e.g. what if we're misunderstanding the GraphQL codepath and the connection never gets released? whereas here it's relatively easy to audit that there's a `finally` in the right spot.)
This should both fix cases where the connection closes for various reasons, by having the pool reconnect; and also should be a second way of solving some of the blocking issues we were having with large queries, by letting faster queries use parallel connections.
Idk what a reasonable number is, 10 seems to be what various guides are saying? Might tune it down if it ends up pushing various connection limits? (We could also constrain it on dev specifically, if that matters.)
I hypothesize that loading people's full trade lists more often than necessary is part of the cause of the recent mega slowdown!
My hypothesis is that we're clogging up the MySQL connection socket with a ton of data, which blocks all other queries until the big ones come through and parse out. (I haven't actually validated my assumption that MySQL connections send query results in serial like that, but it makes sense to me, and fits what I've been seeing.)
There's more places we could potentially optimize, like the trade list page itself… (we currently aggressively load everything when we could limit it and load the rest on the followup pages, or even paginate the followup pages…)
…but my hope is that this helps enough, by relieving the load on the homepage (latest items) and on item searches!
Oooh this feature is feeling very nice :) :) We hid "not in a list" pretty smoothly I think!
A known bug: If you have the item in a list, then click the big colorful button, it will remove the item from *all* lists; and then if you click it again, it will add it to Not in a List. But! The UI will still show the lists it was in before, because we haven't updated the client cache. (It's not that bad in the middle state though, because the list dropdown stuff gets hidden.)
I don't think this is actually relevant in-app right now, but I figured sending it is More Correct, and is likely to prevent future bugs if anything (and prevent future question about why we're _not_ sending it).
I also removed the `maxAge: 0` on `currentUser`, now that I've updated Fastly to no longer default to 5-minute caching when no cache time is specified. I can see why that's a reasonable default for Fastly, but we've been pretty careful about specifying Cache-Control headers when relevant, so the extra caching is mostly incorrect.
If the user is searching for things they own or want, make sure we don't CDN cache it!
For many queries, this is taken care of in practice, because the search result includes `currentUserOwnsThis` and `currentUserWantsThis`. But I noticed in testing that, if the search result has no items, so those fields aren't actually part of the _response_, then the private header doesn't get set. So this mainly makes sure we don't accidentally cache an empty result from a user who didn't have anything they owned/wanted yet!
Some queries, like on `/your-outfits`, had the cache hint `max-age=0, private` set. In this case, our cache code sent no cache header, on the assumption that no header would result in no caching.
This was true on Vercel, but isn't true on our new Fastly setup! (Which makes sense, Vercel was a bit more aggressive here I think.)
This was causing an arbitrary user's data to be cached by Fastly as the result for `/your-outfits`. (We found this bug before launching the Fastly cache though, don't worry! No actual user data leaked!)
Now, as of this change, the `/your-outfits` query correctly sends a header of `Cache-Control: max-age=0, private`. This directs Fastly not to cache the result.
To fix this, we made a change to our HTTP header code, which is forked from Apollo's stuff.
Comments explain most of this! Vercel changed around the Cache-Control headers a bit to always essentially apply max-age:0 when scope:PRIVATE was true.
I'm noticing this isn't *fully* working yet though, because we're not getting a `Cache-Control: private` header, we're just getting no header at all. Fastly might aggressively choose to cache it anyway with etag stuff! I bet that's the fault of our caching middleware plugin thing, so I'll check on that!
Hmm, I see, Vercel chews on Cache-Control headers a bit more than I'm used to, so anything marked `scope: PRIVATE` would not be cached at all.
But on a more standard server, this was coming out as privately cacheable, but for an actual amount of time (1 hour in the homepage case), because of the `maxAge` on other fields. That meant the device browser cache would hold onto the result, and not always reflect Own/Want changes upon page reload.
In this change, we set `maxAge: 0`, because we want this field to be very responsive. I also left `scope: PRIVATE`, even though I think it doesn't really matter if we're saying the field isn't cacheable anyway, because I want to set the precendent that `currentUser` fields need it, to avoid a potential gotcha if someone creates a cacheable `currentUser` field in the future. (That's important to be careful with though, because is it even okay for logouts to not clear it? TODO: Can we clear the private HTTP cache somehow? I guess we would need to include the current user ID in the URL?)