Right, ok, `db.close()` needs to be `db.end()` now.
This probably didn't break the user-syncing cron job though, because that doesn't automatically update I think? so it should still be comfortably running older version of the code that should still work just fine
Ah okay, pools support `query` and `execute` the same way connection objects do (as a shorthand for acquiring, querying, and releasing), but it doesn't have the same helpers for transactions. Makes sense: you need those queries to go to the same connection, and an API where you just call it against the pool object can't tell that it's part of the same thing!
Now, we have our transaction code explicitly acquire a connection to use for the duration of the transaction.
An alternative considered would have been to have `connectToDb` acquire a connection, and then release it at the end of the GraphQL request. That would have made app code simpler, but added a lot of additional potential surprise failure points to the infra imo (e.g. what if we're misunderstanding the GraphQL codepath and the connection never gets released? whereas here it's relatively easy to audit that there's a `finally` in the right spot.)
This should both fix cases where the connection closes for various reasons, by having the pool reconnect; and also should be a second way of solving some of the blocking issues we were having with large queries, by letting faster queries use parallel connections.
Idk what a reasonable number is, 10 seems to be what various guides are saying? Might tune it down if it ends up pushing various connection limits? (We could also constrain it on dev specifically, if that matters.)
Uhh hmm, I don't remember when we removed it from package.json, I guess
maybe I thought it was unused and didn't look carefully enough?
Anyway, this fixes the export-users-to-auth0 script, which was crashing
because yargs wasn't installed, oops!
I hypothesize that loading people's full trade lists more often than necessary is part of the cause of the recent mega slowdown!
My hypothesis is that we're clogging up the MySQL connection socket with a ton of data, which blocks all other queries until the big ones come through and parse out. (I haven't actually validated my assumption that MySQL connections send query results in serial like that, but it makes sense to me, and fits what I've been seeing.)
There's more places we could potentially optimize, like the trade list page itself… (we currently aggressively load everything when we could limit it and load the rest on the followup pages, or even paginate the followup pages…)
…but my hope is that this helps enough, by relieving the load on the homepage (latest items) and on item searches!
This is a bit hacky, but I want to ship and I'm not in a mood for a refactor :P
Before this change, you could see a bug by doing the following:
1. Click "I own this" to own an item.
2. Click "Add a list" and add it to a list.
3. Click "I own this" to un-own the item. (This deletes it from all lists.)
4. Observe that the "Add a list" dropdown disappears.
5. Click "I own this" to own it again.
6. Observe that, before this change, the dropdown would reappear, but incorrectly say it was still in the old list. After this change, it appears with the blank "Add to list", as intended.
Oops, I used the wrong property to control the checkbox state! This made it an uncontrolled component. It would always start unchecked when the page loads, regardless of actual own/want state, and then toggle based on physical clicks.
This meant that things generally worked correctly if you didn't own/want the item when you first loaded the page; but if you already did, then you would click once and send an *add* mutation instead of a remove; and then click again and be able to remove.
Now, removes only take one click!
Oooh this feature is feeling very nice :) :) We hid "not in a list" pretty smoothly I think!
A known bug: If you have the item in a list, then click the big colorful button, it will remove the item from *all* lists; and then if you click it again, it will add it to Not in a List. But! The UI will still show the lists it was in before, because we haven't updated the client cache. (It's not that bad in the middle state though, because the list dropdown stuff gets hidden.)
My cute keybind to quickly wrap stuff in <Box> wasn't working in this file, and I figured out from deleting stuff and narrowing down that it was this comment. I guess Emmet's JSX parser doesn't like a comment being there! I moved it up a bit instead.
Ah hm, not sure why this only showed up once I tried a prod deploy, but I declared `hiResMode` twice in there, because we already had fixed this bug for item layers but not pet layers!
In this change, I fix the duplicate `hiResMode` declaration, and update the new pet layers message to match the item layers message.
Well, instrumentation seems to be working fine again! The bug we ran into during commit e5081dab7e is gone. Cool!
I want to be able to see what's making the new box slow. My hypothesis was (and it seems to be right) that communication with the database on the Classic DTI server is slow.
But now that they're on the same Linode account and region, I think I can set up a private VLAN to make them muuuch faster. We'll try it out!
Give the grid a fixed size, have the list name stuff get ellipsis when it's too long, and try to show all list names (which will almost certainly too long for the space) to give a better hint of what's in there.
I don't think this is actually relevant in-app right now, but I figured sending it is More Correct, and is likely to prevent future bugs if anything (and prevent future question about why we're _not_ sending it).
I also removed the `maxAge: 0` on `currentUser`, now that I've updated Fastly to no longer default to 5-minute caching when no cache time is specified. I can see why that's a reasonable default for Fastly, but we've been pretty careful about specifying Cache-Control headers when relevant, so the extra caching is mostly incorrect.
We had previously configured the client to not bother to try a GET request for GraphQL queries, and just jump straight to POST instead, because the `vercel dev` server for create-react-app reloaded the backend code for every request anyway, which doubled the dev response time.
The Next.js server is more efficient than this, and keeps some memory, so GET requests work similarly in dev as on prod now! (i.e. it fails the first time, but then succeeds on the second)
In this change, we remove the code to skip `createPersistedQueryLink` in development, and instead always call it. We simplify the code accordingly, too.
If the user is searching for things they own or want, make sure we don't CDN cache it!
For many queries, this is taken care of in practice, because the search result includes `currentUserOwnsThis` and `currentUserWantsThis`. But I noticed in testing that, if the search result has no items, so those fields aren't actually part of the _response_, then the private header doesn't get set. So this mainly makes sure we don't accidentally cache an empty result from a user who didn't have anything they owned/wanted yet!
Some queries, like on `/your-outfits`, had the cache hint `max-age=0, private` set. In this case, our cache code sent no cache header, on the assumption that no header would result in no caching.
This was true on Vercel, but isn't true on our new Fastly setup! (Which makes sense, Vercel was a bit more aggressive here I think.)
This was causing an arbitrary user's data to be cached by Fastly as the result for `/your-outfits`. (We found this bug before launching the Fastly cache though, don't worry! No actual user data leaked!)
Now, as of this change, the `/your-outfits` query correctly sends a header of `Cache-Control: max-age=0, private`. This directs Fastly not to cache the result.
To fix this, we made a change to our HTTP header code, which is forked from Apollo's stuff.
Comments explain most of this! Vercel changed around the Cache-Control headers a bit to always essentially apply max-age:0 when scope:PRIVATE was true.
I'm noticing this isn't *fully* working yet though, because we're not getting a `Cache-Control: private` header, we're just getting no header at all. Fastly might aggressively choose to cache it anyway with etag stuff! I bet that's the fault of our caching middleware plugin thing, so I'll check on that!
Hmm, I see, Vercel chews on Cache-Control headers a bit more than I'm used to, so anything marked `scope: PRIVATE` would not be cached at all.
But on a more standard server, this was coming out as privately cacheable, but for an actual amount of time (1 hour in the homepage case), because of the `maxAge` on other fields. That meant the device browser cache would hold onto the result, and not always reflect Own/Want changes upon page reload.
In this change, we set `maxAge: 0`, because we want this field to be very responsive. I also left `scope: PRIVATE`, even though I think it doesn't really matter if we're saying the field isn't cacheable anyway, because I want to set the precendent that `currentUser` fields need it, to avoid a potential gotcha if someone creates a cacheable `currentUser` field in the future. (That's important to be careful with though, because is it even okay for logouts to not clear it? TODO: Can we clear the private HTTP cache somehow? I guess we would need to include the current user ID in the URL?)
Yeah ok, let's just run one browser instance and one pool.
I feel like I observed that, when I killed chromium in prod, pm2 noticed the abrupt loss of a child process and restarted the whole app process? which is rad? so maybe let's just trying relying on that and see how it goes
We used Playwright in the first place to try to work around a Vercel deploy issue, and I'm not sure it really ended up mattering lol :p
But yeah, I'm putting the new Puppeteer code through the same prod stress test, and it just doesn't seem to be getting into the same broken state that Playwright was. I'm guessing it's just that Puppeteer has more investment in edge-case handling? (There's also the fact that we're no longer running things as root, which could have been a fucky problem, too?)
Oh, I made a typo that caused pm2 to be running our processes as `root` instead of `matchu`! Let's very fix that!! 😳
I noticed this because I'm trying Puppeteer, and it got upset about running in sandboxed mode as root, and I'm like "as root??"
So yeah, good, fixed, lol 😬