Ok so, I kinda assumed that the query engine would only compute `all_species_ids_for_this_color` for the rows we actually returned, and it's a fast subquery, so it'd be fine. But that was wrong! I think the query engine computes it for _every_ item, and _then_ filters stuff out with `HAVING`. Which makes sense: the `HAVING` clause references it, so of course it has to get computed first.
In this change, we inline the subquery into the `HAVING` clause, so it only gets evaluated if the other conditions don't fail first. That way, it only runs when needed, and the query is like 2x faster (~30sec instead of ~60sec), which gets us back inside some timeouts that were triggering around 1 minute and making the page fail.
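To illustrate the shape of the change, here's a sketch with made-up table and column names — the real query is much bigger, this is just the before/after structure:

```js
// Hypothetical sketch, not the real query. Before: the subquery was a
// SELECT-list column, so it ran for every grouped row, even rows the cheap
// HAVING condition would have thrown out anyway.
const beforeQuery = `
  SELECT t.item_id, COUNT(*) AS num_models,
    (SELECT GROUP_CONCAT(p.species_id) FROM pet_types p
      WHERE p.color_id = t.color_id) AS all_species_ids_for_this_color
  FROM modeling_candidates t
  GROUP BY t.item_id, t.color_id
  HAVING num_models < 4 AND all_species_ids_for_this_color IS NOT NULL
`;

// After: the subquery lives directly in HAVING, after the cheap condition,
// so it only gets evaluated for rows that survive that condition.
const afterQuery = `
  SELECT t.item_id, COUNT(*) AS num_models
  FROM modeling_candidates t
  GROUP BY t.item_id, t.color_id
  HAVING num_models < 4
    AND (SELECT GROUP_CONCAT(p.species_id) FROM pet_types p
          WHERE p.color_id = t.color_id) IS NOT NULL
`;
```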
However, this meant we no longer return `all_species_ids_for_this_color`, which we actually use to determine which species are _left_ to model for! So now, we have a loader that also basically runs the same query as that condition subquery.
A reasonable question at this point: is the `HAVING` clause even a good idea? Would it be simpler to do the filtering in JS?
And I think it might be simpler, but I'd guess noticeably worse performance, because I think we really do filter out a _lot_ of results with that `HAVING` clause—like, basically all items, right? So to filter on the JS side, we'd be transferring data for all items over the wire, which… isn't necessarily a dealbreaker, but it would certainly be noticed. This hypothesis could be wrong, but it's enough of a reason for me to not bother pursuing the refactor!
We now support returning `null` from `petAppearance` when a pet genuinely has no customization data.
We also deprecate some old fields, and update our own call site to match.
The item page got into a weird situation where this setting seemed to make the `useCurrentUser` GQL query flake out and report that it was loading whenever we loaded things _about_ `currentUser`. This would make the layout bounce around a bunch while it tried to decide whether to show you the buttons or not.
I still don't love the UX of not having any loading state after login, but like… eh it's certainly better lmao
I'm not gonna do SSR here, the pages aren't really designed for partial loading state yet. It could be done, but I'm too sleepy! And it's too much refactor at once.
Always been a bit annoyed to have even the item name load in so weird and slow 😅 this fixes it to come in much faster!
This also allows us to SSR the item name in the page title, since we've put it in the GraphQL cache at SSR time!
Whew, setting up a cute GraphQL SSR system! I feel like it strikes a good balance of not having actually too many moving parts, though it's still a bit extensive for the problem we're solving 😅
Anyway, by doing SSR at _all_, we solve the problem where Next's "Automatic Static Optimization" was resetting the outfit state to the default at the start of the page load.
So I figured, why not try to SSR things _good_?
Now, when you navigate to the /outfits/new page, Next.js will go get the necessary GraphQL data to show the image before even putting the page into view. This makes the image show up all snappy-like! (when images.neopets.com is behaving :p)
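A minimal sketch of the general pattern, assuming a hypothetical page file, query shape, and endpoint URL — our real helpers and field shapes differ a bit:

```js
// pages/outfits/new.js — simplified sketch, not the real implementation.
import { ApolloClient, InMemoryCache, gql } from "@apollo/client";

// Approximate query shape, just for illustration.
const OUTFIT_PREVIEW_QUERY = gql`
  query OutfitPreviewSSR($speciesId: ID!, $colorId: ID!) {
    petAppearance(speciesId: $speciesId, colorId: $colorId) {
      id
      layers {
        id
        imageUrl
      }
    }
  }
`;

export async function getServerSideProps({ query }) {
  const client = new ApolloClient({
    uri: "https://impress-2020.openneo.net/api/graphql", // assumption!
    cache: new InMemoryCache(),
  });

  // Run the query on the server, so the data is in the cache before the page
  // ever renders — including during client-side navigations, where Next.js
  // fetches these props before putting the page into view.
  await client.query({
    query: OUTFIT_PREVIEW_QUERY,
    variables: { speciesId: query.species, colorId: query.color },
  });

  // Hand the filled cache to the client, which restores it into its own
  // Apollo cache (e.g. `new InMemoryCache().restore(graphqlState)`).
  return { props: { graphqlState: client.cache.extract() } };
}
```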
We could do this with the stuff in the items panel too, but it's a tiny bit more annoying in the code right now, so I'm just gonna not worry about it and see how this performs in practice!
This change _doesn't_ include making the images actually show up before JS loads in, I assume because our JS code tries to validate that the images have loaded before fading them in on the page. Idk if we want to do something smarter there for the SSR case, to try to get them loading in faster!
Okay, so there's a bug here where navigating directly to /outfits/new?species=X&color=Y will reset to a Blue Acara, because Next.js statically renders the Blue Acara at build time, then rehydrates a Blue Acara on load, and only then does the real page query come in—and our state management for outfits doesn't *listen* to URL changes, it only *emits* them.
It'd be good to consider like… changing that? It's tricky because our state model is… not simple, when you consider that we have both local state and URL state and saved-outfit state in play. But it could be done! But there might be another option too. I'll take a look at this after moving the home page, which will give me the chance to see what the experience navigating in from there is like!
Hey we're getting real close! :3
I accepted a small bug here where clicking the breadcrumb to "Items X wants" won't scroll down if the page isn't loaded yet (e.g. if you landed on this list page first). If you came *from* the lists index page, though, then when you go back your stuff will already be there, so you should be fine. (The scroll might also work if the page loads fast enough, which in prod it might?)
Just gonna leave it for now, because the workaround would be a lot! (have the page re-check the anchor once it's done loading)
We're getting so close! :3
There were some shared components in `UserItemListPage` that needed updating too, even though the rest of that page isn't migrated yet.
Okay I actually screwed up the layouts thing a bit! Because right, they need to *share* a LayoutComponent in order to share the UI across the pages. This gets a bit tricky with wanting to change the margin, too. I'll address this with an upcoming refactor!
Thankfully this wasn't a crasher since it was already in a try/catch, but it was logging a failure that didn't need to be logged! If `localStorage` isn't available (e.g. SSR), just use the initial value.
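Roughly the shape of the fix, assuming a hypothetical hook name (ours differs):

```js
import { useState } from "react";

// Sketch: a localStorage-backed state hook that quietly falls back to the
// initial value when localStorage isn't available (e.g. during SSR).
function useLocalStorageState(key, initialValue) {
  const [value, setValue] = useState(() => {
    if (typeof window === "undefined" || !window.localStorage) {
      return initialValue; // no localStorage here, nothing to log about!
    }
    try {
      const saved = window.localStorage.getItem(key);
      return saved != null ? JSON.parse(saved) : initialValue;
    } catch (error) {
      return initialValue; // storage blocked or value corrupted; fall back
    }
  });

  const setAndSaveValue = (newValue) => {
    setValue(newValue);
    try {
      window.localStorage.setItem(key, JSON.stringify(newValue));
    } catch (error) {
      // If saving fails, the in-memory value still works for this session.
    }
  };

  return [value, setAndSaveValue];
}
```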
The tricky part here was that `returnPartialData` seems to behave differently during SSR. On the page itself, this seems to cause us to always get back at least an empty object, but in SSR we can sometimes get null—which means that a LOT of code that expects the item object to exist while in loading state gets thrown off.
To keep this situation maximally clear, I added a bunch of null handling with `?.` to `ItemPageLayout`. An alternative would have been to check for null and substitute an empty object, but this feels more resilient and more true to the situation.
The search bar here is a bit tricky, but is pretty straightforwardly adapted from how we did the layouts in App.js. Fingers crossed that it works as smoothly as expected when the search page is migrated too! (Right now typing in there is all messy because it hops over to the fallback route and does its whole separate thing.)
This one is a bit trickier, because it doesn't use a page layout, and we had to make some fixes in OutfitMovieLayer! Nice to get a head-start on that though :3
The first page moved over! Note that this broke navigation on the rest of the app, so don't deploy this until we're done!
The reason it broke is that we had to migrate GlobalHeader and GlobalFooter to the Next.js link & router stuff too, or else this page would crash because it isn't inside a react-router-dom context.
Just sorta idly poking at what it would take to make our Next setup a bit more normal. To start, I'm putting things in more of the normal place, and eyeing what it would take to switch to Next's built-in routing! So now `App.js` is pretty much entirely a routing file, potentially to be deleted once we move 🤔
I didn't notice that there was another place where "auth0" was set as the default, oops!
This caused logins to fail, because the cookie would be set, but the request wouldn't be sent with the correct `DTIAuthMode` header. So you'd stay logged in with your auth0 credentials, even though the UI was using your db cookies. Or something? Idk weird state mismatch.
Point being, fixed now!
Still leaving the toggle so users can hop back out to Auth0 if it turns out we broke stuff on 'em, but yeah I haven't heard anything bad from the experiment at all, and I think we don't need to bother with the gradual rollout! Let's just go!
We previously had a lil trick to help Cypress perform an Auth0 login without using the whole Auth0 UI. (I forget exactly why!)
With Cypress deleted, we don't need this code anymore!
Specifically, we rename the dev database to `openneo_impress` for consistency with the main app, and create a second schema file for `openneo_id`, so we can do local account creation.
I decided that `setAuthToken` was the more appropriate server-level API, and that letting the login/logout mutations engage with the auth library made more sense to me!
Oh yeah, ok, we don't actually want to evict `currentUser` from the Apollo cache on *any* login mutation. That's both inefficient, and puts our navbar in a loading state that hides the login button and thereby unmounts the login modal, oops!
hey it's been a while! lol
I replaced the friendly long-time Feedback Xwee with a cute chef Kiko, to mix things up a bit and help people notice that the box changed!
I forgot that we sometimes use the Apollo server in a context where `req` and `res` aren't present! (namely in our `build-cached-data` script.)
In this change, we update the DTI-Auth-Mode HTTP header check to be cognizant that the request might be absent!
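Conceptually the check becomes something like this — a sketch only, since the real code reads the header in our Apollo context setup:

```js
// Sketch: `req` can be undefined in non-HTTP contexts, like the
// build-cached-data script, so guard before reading headers.
function getAuthModeFromRequest(req) {
  if (req == null) {
    return "auth0"; // assumption: fall back to the default auth mode
  }
  // Node lowercases incoming header names.
  return req.headers["dti-auth-mode"] || "auth0";
}
```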
There was a bug in the new db auth method where `useRequireLogin` was expecting Auth0 logins to work, so it would get caught in an infinite redirect loop.
Rather than trying to figure out how to make `useRequireLogin` work with the new modal UI, I figured we can just delete it (since we only ended up using it once anyway), and add a little message if you happen to end up on the page while logged out. Easy peasy!
Hey hey, logging out works! The server side of this was easy, but I made a few refactors to support it well on the client, like `useLoginActions` becoming just `useLogout` lol, and updating how the nav menu chooses between buttons vs menu because I wanted `<LogoutButton />` to contain some state.
We also did good Apollo cache stuff to update the page after you log in or out! I think some fields that don't derive from `User`, like `Item.currentUserOwnsThis`, are gonna fail to update until you reload the page but like that's fine idk :p
There's a known bug where logging out on the Your Outfits page turns into an infinite loop situation, because it's trying to do Auth0 stuff but the login keeps failing to have any effect because we're in db mode! I'll fix that next.
Yeah, cool, the login button seems to work now? And subsequent requests serve user data correctly based on that, and let you edit stuff.
I also tested the following attacks:
- Using the wrong password indeed fails! lol basic one
- Changing the userId or createdAt fields in the cookie causes the auth token to be rejected for an invalid signature.
Tbh that's all that comes to mind… like, you either attack us by tricking the login itself into giving you a token when it shouldn't, or you attack us by tricking subsequent requests into accepting a token when they shouldn't. Seems like we're covered? 😳🤞
Still need to add logout, but yeah, this is… looking surprisingly feature-parity with our Auth0 integration already lmao. Maybe it'll be ready to launch sooner than expected?
Right, I had that idea while writing the comment, then forgot to actually do it lmao
This is important for session expiration: we don't want you to be able to hold onto an old cookie for an account that you should be locked out of. Updating the `createdAt` value requires a new signature, so the client can't forge when this token was created, so we can be confident in our ability to expire them.
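A minimal sketch of that property, assuming an HMAC-style token — not our exact format or field handling:

```js
import crypto from "crypto";

// Sketch: sign the whole payload, including createdAt, so a client can't
// tamper with either field without invalidating the signature.
function signAuthToken({ userId, createdAt }, secret) {
  const payload = JSON.stringify({ userId, createdAt });
  const signature = crypto
    .createHmac("sha256", secret)
    .update(payload)
    .digest("hex");
  return { payload, signature };
}

function readAuthToken({ payload, signature }, secret) {
  const expected = crypto
    .createHmac("sha256", secret)
    .update(payload)
    .digest("hex");
  // Any edit to userId *or* createdAt changes the expected signature, so we
  // can trust createdAt when deciding whether the session has expired.
  const valid =
    signature.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expected));
  return valid ? JSON.parse(payload) : null;
}
```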
Okay so one of the trickiest parts of login is done! 🤞 and now we need to make it actually show up in the UI. (and also pressure-test the security a bit, I've only really checked the happy path!)
Thinking about longevity, I think I wanna cut Auth0 loose, and just go back to using our own auth.
I had figured at the time that I didn't want to integrate with OpenNeo ID's whole mess, and I didn't want to write a whole new auth system, so Auth0 seemed to make things easier.
But now, it's just kinda a lot to be carrying along an external service as a dependency for login, especially when we've got all the stuff in the database right here. I wanna remove architecture pieces! Get it outta here!
And I'll finally build account creation from the 2020 site while I'm at it, which seemed like it was gonna be a bit of a pain with Auth0 and syncing anyway. (I think at the time I was a bit more optimistic about a full transfer from one system to another, but that's much further off than I realized, and this path will be much better for keeping things in sync.)
Just a cute little way to let us preview it without having to spin up a separate instance of the app or use a feature flag system!
This means we can safely merge and push this to production, without worrying about leaking the feature before the ~owls team signs off.
Hey nice it looks like it's working! :3 "Bright Speckled Parasol" is a nice test case, it has a long text string! And when the NC value is not included in the ~owls list, we indeed don't show the badge!
Hey finally! I got in the mood and did it, after… a year? idk lol
The button should only appear for outfits that are already saved and owned by you. And the server enforces it!
I also added a new util function to give actually useful error messages when the GraphQL server throws an error. Might be wise to use this in more places where we're currently just using `error.message`!
Uhhh I guess I never added the check that the outfit you're editing is your own? Embarrassing.
I don't have any reason to believe anyone abused this, but 😬! Good to have fixed now!
Been getting a lot of errors I *think* from folks trying to add OWLS Pricer to Impress 2020 even though it doesn't work here! Reasonable to have happen though! I thought Sentry knew to ignore those, but I guess it doesn't?
In this change, we add some filtering to ignore errors triggered by extensions. This should keep them out of our inbox!
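The filtering is along these lines, using Sentry's standard `denyUrls` option — the exact SDK package and patterns here are from memory, not the literal config:

```js
import * as Sentry from "@sentry/react";

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN, // assumption: however the DSN is configured
  // Drop events whose stack frames come from browser extensions — we can't
  // do anything about those, and they're not our bugs.
  denyUrls: [
    /^chrome-extension:\/\//i,
    /^moz-extension:\/\//i,
    /^safari-(web-)?extension:\/\//i,
  ],
});
```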
I wasn't able to test this very robustly locally. I'm mostly just crossing fingers!
Tbh I didn't even really validate these changes, or that the codepaths right now aren't working, they just seem like clear drop-in upgrades now that HTTPS works and HTTP requests are redirected. Simplify!
Right now it returns 50 rows; each item that needs modeling returns 1–4 rows, usually 1. So a limit of 200 should be pretty safe, while also creating a release valve if there's another future bug: it'll just have the problem of returning too few items, instead of the problem of crashing everything! 😅
Neopets released a new Maraquan Koi, and it revealed a mistake in our modeling query! We already knew that the Maraquan Mynci has the same body type as the standard Mynci colors, but now the Koi is the same way. Because there are _two_ such species, the query started assuming that a _bunch_ of items that fit both the standard Mynci and standard Koi (which is a LOT of items!!) must also fit all _Maraquan_ pets, since they fit both the Maraquan Mynci and the Maraquan Koi too. (Whereas previously, that part of the query would say "oh, it just fits the Maraquan Mynci, we don't need to assume it fits ALL Maraquan pets, that's probably just species-specific.")
So yeah! This change should help the query ignore Maraquan species that have the same body type as standard species. It's fine to essentially treat them like they don't exist, because we won't lose out on any modeling that way: the standard models will cover the Maraquan versions for those two species!
Ah okay, pools support `query` and `execute` the same way connection objects do (as a shorthand for acquiring, querying, and releasing), but they don't have the same helpers for transactions. Makes sense: those queries all need to go to the same connection, and an API where you just call methods on the pool object can't tell that they belong to the same transaction!
Now, we have our transaction code explicitly acquire a connection to use for the duration of the transaction.
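With `mysql2`'s promise pool, that looks roughly like this — the helper name is made up, and the config is abbreviated:

```js
import mysql from "mysql2/promise";

const pool = mysql.createPool({
  host: process.env.DB_HOST, // assumption: however we actually configure it
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: "openneo_impress",
  connectionLimit: 10,
});

// Run `callback` inside a transaction on one dedicated connection, and
// always release that connection back to the pool when we're done.
async function runTransaction(callback) {
  const connection = await pool.getConnection();
  try {
    await connection.beginTransaction();
    const result = await callback(connection);
    await connection.commit();
    return result;
  } catch (error) {
    await connection.rollback();
    throw error;
  } finally {
    // The easy-to-audit part: one `finally`, one `release`.
    connection.release();
  }
}
```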
An alternative considered would have been to have `connectToDb` acquire a connection, and then release it at the end of the GraphQL request. That would have made app code simpler, but added a lot of additional potential surprise failure points to the infra imo (e.g. what if we're misunderstanding the GraphQL codepath and the connection never gets released? whereas here it's relatively easy to audit that there's a `finally` in the right spot.)
This should both fix cases where the connection closes for various reasons, by having the pool reconnect; and also give us a second way of relieving the blocking issues we were having with large queries, by letting faster queries use parallel connections.
Idk what a reasonable number is, 10 seems to be what various guides are saying? Might tune it down if it ends up pushing various connection limits? (We could also constrain it on dev specifically, if that matters.)
I hypothesize that loading people's full trade lists more often than necessary is part of the cause of the recent mega slowdown!
My hypothesis is that we're clogging up the MySQL connection socket with a ton of data, which blocks all other queries until the big ones come through and parse out. (I haven't actually validated my assumption that MySQL connections send query results in serial like that, but it makes sense to me, and fits what I've been seeing.)
There's more places we could potentially optimize, like the trade list page itself… (we currently aggressively load everything when we could limit it and load the rest on the followup pages, or even paginate the followup pages…)
…but my hope is that this helps enough, by relieving the load on the homepage (latest items) and on item searches!
This is a bit hacky, but I want to ship and I'm not in a mood for a refactor :P
Before this change, you could see a bug by doing the following:
1. Click "I own this" to own an item.
2. Click "Add a list" and add it to a list.
3. Click "I own this" to un-own the item. (This deletes it from all lists.)
4. Observe that the "Add a list" dropdown disappears.
5. Click "I own this" to own it again.
6. Observe that, before this change, the dropdown would reappear, but incorrectly say it was still in the old list. After this change, it appears with the blank "Add to list", as intended.
Oops, I used the wrong property to control the checkbox state! This made it an uncontrolled component: it would always start unchecked when the page loaded, regardless of the actual own/want state, and then toggle based on physical clicks.
This meant that things generally worked correctly if you didn't own/want the item when you first loaded the page; but if you already did, then you would click once and send an *add* mutation instead of a remove; and then click again and be able to remove.
Now, removes only take one click!
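For reference, the kind of mistake this is — illustrative only, the exact prop we misused may have differed, and I'm assuming Chakra's `Checkbox` here:

```jsx
import { Checkbox } from "@chakra-ui/react";

function OwnsItemCheckbox({ currentUserOwnsThis, onToggle }) {
  return (
    // Wrong (uncontrolled by accident): if the state goes to some prop other
    // than the checked state — e.g. `value` — React treats the checkbox as
    // uncontrolled, so it starts unchecked and just toggles on clicks:
    //   <Checkbox value={currentUserOwnsThis} onChange={onToggle} />
    //
    // Right (controlled): `isChecked` makes it reflect the real own/want
    // state on every render, so one click always does the intended thing.
    <Checkbox isChecked={currentUserOwnsThis} onChange={onToggle}>
      I own this
    </Checkbox>
  );
}
```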
Oooh this feature is feeling very nice :) :) We hid "not in a list" pretty smoothly I think!
A known bug: If you have the item in a list, then click the big colorful button, it will remove the item from *all* lists; and then if you click it again, it will add it to Not in a List. But! The UI will still show the lists it was in before, because we haven't updated the client cache. (It's not that bad in the middle state though, because the list dropdown stuff gets hidden.)
My cute keybind to quickly wrap stuff in <Box> wasn't working in this file, and I figured out from deleting stuff and narrowing down that it was this comment. I guess Emmet's JSX parser doesn't like a comment being there! I moved it up a bit instead.
Ah hm, not sure why this only showed up once I tried a prod deploy, but I had declared `hiResMode` twice in there, because we had already fixed this bug for item layers but not for pet layers!
In this change, I fix the duplicate `hiResMode` declaration, and update the new pet layers message to match the item layers message.
Give the grid a fixed size, have the list names get ellipsized when they're too long, and try to show all the list names (which will almost certainly be too long for the space) to give a better hint of what's in there.
I don't think this is actually relevant in-app right now, but I figured sending it is More Correct, and is likely to prevent future bugs if anything (and prevent future questions about why we're _not_ sending it).
I also removed the `maxAge: 0` on `currentUser`, now that I've updated Fastly to no longer default to 5-minute caching when no cache time is specified. I can see why that's a reasonable default for Fastly, but we've been pretty careful about specifying Cache-Control headers when relevant, so the extra caching is mostly incorrect.
We had previously configured the client to not bother to try a GET request for GraphQL queries, and just jump straight to POST instead, because the `vercel dev` server for create-react-app reloaded the backend code for every request anyway, which doubled the dev response time.
The Next.js server is more efficient than this, and keeps the backend warm between requests, so GET requests now work similarly in dev as on prod! (i.e. the hashed GET fails the first time, but then succeeds on the second.)
In this change, we remove the code to skip `createPersistedQueryLink` in development, and instead always call it. We simplify the code accordingly, too.
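So the link setup simplifies to roughly the standard Apollo recipe — our real link chain has more in it, and the endpoint path here is an assumption:

```js
import { HttpLink } from "@apollo/client";
import { createPersistedQueryLink } from "@apollo/client/link/persisted-queries";
import { sha256 } from "crypto-hash";

// Always use persisted queries now, in dev and prod alike.
const httpLink = new HttpLink({ uri: "/api/graphql" }); // assumption
const persistedQueryLink = createPersistedQueryLink({
  sha256,
  // Hashed queries go out as GETs, which is what lets the CDN cache them.
  useGETForHashedQueries: true,
});
const link = persistedQueryLink.concat(httpLink);
```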
If the user is searching for things they own or want, make sure we don't CDN cache it!
For many queries, this is taken care of in practice, because the search result includes `currentUserOwnsThis` and `currentUserWantsThis`. But I noticed in testing that, if the search result has no items, those fields aren't actually part of the _response_, so the private header doesn't get set. So this mainly makes sure we don't accidentally cache an empty result from a user who doesn't own or want anything yet!
Some queries, like on `/your-outfits`, had the cache hint `max-age=0, private` set. In this case, our cache code sent no cache header, on the assumption that no header would result in no caching.
This was true on Vercel, but isn't true on our new Fastly setup! (Which makes sense, Vercel was a bit more aggressive here I think.)
This was causing an arbitrary user's data to be cached by Fastly as the result for `/your-outfits`. (We found this bug before launching the Fastly cache though, don't worry! No actual user data leaked!)
Now, as of this change, the `/your-outfits` query correctly sends a header of `Cache-Control: max-age=0, private`. This directs Fastly not to cache the result.
To fix this, we made a change to our HTTP header code, which is forked from Apollo's stuff.
Comments explain most of this! Vercel changed the Cache-Control headers around a bit, essentially always applying max-age: 0 when scope: PRIVATE was true.
I'm noticing this isn't *fully* working yet though, because we're not getting a `Cache-Control: private` header, we're just getting no header at all. Fastly might aggressively choose to cache it anyway with etag stuff! I bet that's the fault of our caching middleware plugin thing, so I'll check on that!
Hmm, I see, Vercel chews on Cache-Control headers a bit more than I'm used to, so anything marked `scope: PRIVATE` would not be cached at all.
But on a more standard server, this was coming out as privately cacheable, but for an actual amount of time (1 hour in the homepage case), because of the `maxAge` on other fields. That meant the browser cache would hold onto the result, and not always reflect Own/Want changes upon page reload.
In this change, we set `maxAge: 0`, because we want this field to be very responsive. I also left `scope: PRIVATE`, even though I think it doesn't really matter when we're saying the field isn't cacheable anyway, because I want to set the precedent that `currentUser` fields need it, to avoid a potential gotcha if someone creates a cacheable `currentUser` field in the future. (That's important to be careful with though, because is it even okay for logouts to not clear it? TODO: Can we clear the private HTTP cache somehow? I guess we would need to include the current user ID in the URL?)
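In Apollo Server terms, that's the standard cache hint, which could be set at the resolver level like this — a sketch, with a made-up helper name; a schema-level `@cacheControl` directive works too:

```js
const resolvers = {
  Query: {
    currentUser: (_, __, context, info) => {
      // Never cacheable, and explicitly private — so if a cacheable field
      // ever gets added under currentUser, it still won't be shared across
      // users by the CDN.
      info.cacheControl.setCacheHint({ maxAge: 0, scope: "PRIVATE" });
      return getCurrentUserFromContext(context); // hypothetical helper
    },
  },
};
```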
The motivation is that I want VERCEL_URL and local net requests outta here :p and we were doing some cutesiness with leveraging the CDN cache to back the GQL fields. No more of that, folks! lol
There are some places where we use <img> tags where I think it's actually just the right thing to do.
`next/image` is good for image optimization, but I don't think it's worth the proxying for Neopets art images that don't actually always have a higher-res version to begin with.
Idk, maybe the species faces could be a decent choice, but right now we're solving it with `srcSet`, and that's fine. Doesn't seem worth migrating, let's just move on with our lives! 😅
I'm still leaving the lint rule on though, because I think it's a helpful reminder. (I don't think it catches `<Box as="img" />` though, which is a shame, because that would be our natural default in this app!)
Yeah, cool, now we use the `next/image` tag, and our images are showing up again!
There's still lint errors for using bare img tags in some places, but I'm not sure I really care…
This was a fun journey! Turns out Next 12 is using a new faster JS compiler called SWC, which had a compiler bug that triggered here!
The incorrect looping behavior caused `libraryUrl` to sometimes be `null` by the time the movie promise completes, because `layer` was set to whatever the `last` layer in the list had been. https://github.com/swc-project/swc/issues/2624
Anyway, turns out this code has been through a few refactors, and the `async` function wrapper is extraneous now! So I've just deleted it and inlined its code. Ta da! lol
Tweaked some of the default Next.js rules, fixed lint-staged for `next lint`, made a few small easy lint fixes. Feels good!
Note that using the `dirs` option in `next.config.js` was causing `lint-staged` to lint _everything_. That's why I edited `yarn lint` to specify the dirs instead: that way, that command will lint all those dirs, but they won't get included in invocations with `--file`.
There are still a few lint errors left after this commit, because our <img> tags aren't working (@next/next/no-img-element). I'll fix those when we figure out what's wrong with images!
I'm interested in ejecting from Vercel, so I'm trying to get off their proprietary-ish create-react-app + Vercel API thing, and onto Nextjs, which is very similar in shape, but more portable.
I had to disable `craCompat` in `next.config.js` to stop us from crashing on their webpack config, see https://github.com/vercel/next.js/discussions/25858#discussioncomment-1573822
The frontend seems to work at a basic level, but network requests fail, and images don't seem to be working. I'll work on those next!
Note that this commit was forced through despite failing lint checks. We'll need to fix that up too!
Also, after the codemod, I moved `src/pages` to the more canonical location `pages`. Lint tooling seemed surprised to not find a `pages` directory, and I didn't see any config that was making it work correctly in the other location, so I figure Next is just willing to check either `pages` or `src/pages`? But `pages` is more canonical, so yeah!
Even in production, scrolling is a bit slow! This will preload the pagination one click ahead.
There is a bit of a perf downside, in that if you click through the pages too fast, you'll trigger _extra_ requests. I think that's a net win though, and I'm not gonna try to get cleverer than this right now.
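The preload itself is simple: once page N is showing, quietly run the same query for page N+1, so it's already in the Apollo cache when you click. A sketch with made-up names:

```js
import { useEffect } from "react";
import { useApolloClient } from "@apollo/client";

// Warm the cache for the next page of results (best-effort).
function usePreloadNextPage(query, { offset, limit }) {
  const client = useApolloClient();
  useEffect(() => {
    // client.query writes its result into the cache, so the real useQuery
    // for the next page resolves instantly from cache when you click.
    client
      .query({ query, variables: { offset: offset + limit, limit } })
      .catch(() => {
        // Preloading is best-effort; if it fails, clicking just loads normally.
      });
  }, [client, query, offset, limit]);
}
```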
My main inspiration for doing this is actually our potentially-huge upcoming Vercel bill lol
From inspecting my Honeycomb dashboard, it looks like the main offender for backend CPU time usage is outfit images. And it looks like they come in big spikes, of lots of low usage and then suddenly 1,000 requests in one minute.
My suspicion is that this is from users with many saved outfits loading their outfit page, which previously would show all of them at once.
We do have `loading="lazy"` set, but not all browsers support that yet, and I've had trouble pinning down the exact behavior anyway!
Anyway, paginating makes for a better experience for those huge-list users anyway. We've been meaning to do it, so here we go!
My hope is that this drastically decreases backend CPU hours immediately 🤞 If not, we'll need to investigate in more detail where these outfit image requests are actually coming from!
Note that I added the pagination to the existing `outfits` GraphQL endpoint, rather than creating a new one. I felt comfortable doing this because it requires login anyway, so I'm confident that other clients aren't using it; and because, while this kind of thing often creates a risk of problems with frontend and backend code getting out of sync, I think someone running old frontend code will just see only their first 30 outfits (but no pagination toolbar), and get confused and refresh the page, at which point they'll see all of them. (And I actually _prefer_ that slightly confusing UX, to avoid getting more giant spikes of outfit image requests, lol :p)
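Schema-wise, it's roughly just optional pagination args on the existing field — the field's exact location and argument names here are my guess, and the default of 30 just matches the behavior described above:

```js
import gql from "graphql-tag";

// Sketch: optional args with defaults, so old clients that don't pass them
// still get a valid (just truncated) response instead of an error.
const typeDefs = gql`
  extend type User {
    outfits(limit: Int = 30, offset: Int = 0): [Outfit!]!
  }
`;
```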
This is a minor change to clear a console warning and make the intended behavior clearer! You're not supposed to pass `null` as a select value, because it's ambiguous whether you mean the first option or want this to be an "uncontrolled component".
Here, I now provide a fallback value, which is an explicit string for the placeholder option. I made the string very explicit, to aid in debugging if it somehow leaks out from where it's supposed to be! (But I also added gating in the `onChange` event, just to be extra sure.)
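Something like this, assuming Chakra's `Select` and made-up names:

```jsx
import { Select } from "@chakra-ui/react";

// An explicit, debuggable placeholder string instead of `null`, plus gating
// in onChange so the placeholder value never leaks into app state.
const PLACEHOLDER_VALUE = "AddToListSelect-placeholder";

function AddToListSelect({ selectedListId, lists, onSelectListId }) {
  return (
    <Select
      value={selectedListId ?? PLACEHOLDER_VALUE}
      onChange={(e) => {
        const newValue = e.target.value;
        if (newValue === PLACEHOLDER_VALUE) {
          return; // extra-sure: ignore the placeholder if it's ever selected
        }
        onSelectListId(newValue);
      }}
    >
      <option value={PLACEHOLDER_VALUE} disabled>
        Add to list…
      </option>
      {lists.map((list) => (
        <option key={list.id} value={list.id}>
          {list.name}
        </option>
      ))}
    </Select>
  );
}
```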
Huh. `flexAlign` isn't a real Chakra style prop, because it's not a real CSS style. I wonder if I meant `alignItems`? Anyway, this was getting passed down to the DOM element and triggering a console warning. Removed!
Oops, our cutesy feature to show an outfit thumbnail as a placeholder while the rest of the data loads was making spurious requests!
I put the `skip` in the wrong place 😅
This caused a request to https://outfits.openneo-assets.net/outfits/null/v/NaN/300.png, which would return a 500.
The user wouldn't see anything, because the failed image just wouldn't show. But it's a mistake, it sends extra requests from the client and to the server, and it's a good one to fix!