I'm interested in ejecting from Vercel, so I'm trying to get off their proprietary-ish create-react-app + Vercel API thing, and onto Nextjs, which is very similar in shape, but more portable.
I had to disable `craCompat` in `next.config.js` to stop us from crashing on their webpack config, see https://github.com/vercel/next.js/discussions/25858#discussioncomment-1573822
The frontend seems to work at a basic level, but network requests fail, and images don't seem to be working. I'll work on those next!
Note that this commit was forced through despite failing lint checks. We'll need to fix that up too!
Also, after the codemod, I moved `src/pages` to the more canonical location `pages`. Lint tooling seemed surprised to not find a `pages` directory, and I didn't see a config that was making it work correctly in the other location, so I figured it's that Next is willing to check `pages` or `src/pages`? But this is more canonical so yeah!
Even in production, scrolling is a bit slow! This will preload the pagination one click ahead.
There is a bit of a perf downside, in that if you click through the pages too fast, you'll trigger _extra_ requests. I think that's a net win though, and I'm not gonna try to get cleverer than this right now.
My main inspiration for doing this is actually our potentially-huge upcoming Vercel bill lol
From inspecting my Honeycomb dashboard, it looks like the main offender for backend CPU time usage is outfit images. And it looks like they come in big spikes, of lots of low usage and then suddenly 1,000 requests in one minute.
My suspicion is that this is from users with many saved outfits loading their outfit page, which previously would show all of them at once.
We do have `loading="lazy"` set, but not all browsers support that yet, and I've had trouble pinning down the exact behavior anyway!
Anyway, paginating makes for a better experience for those huge-list users anyway. We've been meaning to do it, so here we go!
My hope is that this drastically decreases backend CPU hours immediately 🤞 If not, we'll need to investigate in more detail where these outfit image requests are actually coming from!
Note that I added the pagination to the existing `outfits` GraphQL endpoint, rather than creating a new one. I felt comfortable doing this because it requires login anyway, so I'm confident that other clients aren't using it; and because, while this kind of thing often creates a risk of problems with frontend and backend code getting out of sync, I think someone running old frontend code will just see only their first 30 outfits (but no pagination toolbar), and get confused and refresh the page, at which point they'll see all of them. (And I actually _prefer_ that slightly confusing UX, to avoid getting more giant spikes of outfit image requests, lol :p)
This is a minor change to clear a console warning, and make intended behavior clearer! You're not supposed to pass `null` as a select value, because it's ambiguous about whether you're looking for the first option or to make this an "uncontrolled component".
Here, I now provide a fallback value, which is an explicit string for the placeholder option. I made the string very explicit, to aid in debugging if it somehow leaks out from where it's supposed to be! (But I also added gating in the `onChange` event, just to be extra sure.)
Huh. `flexAlign` isn't a real Chakra style prop, because it's not a real CSS style. I wonder if I meant `alignItems`? Anyway, this was getting passed down to the DOM element and triggering a console warning. Removed!
Oops, our cutesy feature to show an outfit thumbnail sa a placeholder while the rest of the data is loading was making spurious requests!
I put the `skip` in the wrong place 😅
This caused a request to https://outfits.openneo-assets.net/outfits/null/v/NaN/300.png, which would return a 500.
The user wouldn't see anything, because the image wouldn't show because it failed. But it's a mistake, and it's sending extra requests from the client and to the server, and it's a good one to fix!
Hmm, right, okay, we *generally* should have all users imported to Auth0, but this can fail if the cron job is behind or Auth0 rejected the data (e.g. user data in a format it doesn't support).
Previously, this would apply the name change in the database, but return Auth0's "The user does not exist." error to the GraphQL client, making it look like the update fully failed.
In this change, we handle that case differently: when the Auth0 update fails with a 404, we proceed but log a warning; and when Auth0 fails with an unexpected error, we roll back the database change in addition to raising the error to the client, to keep the behavior obvious and consistent.
Oh oops, I missed this path change when I changed the route to `/user/:id/lists`! This caused searching by email to redirect to the homepage, but with a valid URL in the address bar; and refreshing the page would hit the redirect defined in `vercel.json`, redirect to the new route, and load the correct page.
Fixed!
Oh right, we have another prompt we need to not prompt for, lol :p
Here, I just assume that there's one database connection on the Auth0 account. If that's not true, the script will error—not because this is a fundamentally unresolvable problem, but because I don't want to write code for configuring a situation that doesn't exist yet :p
Huh, oops, there are a _few_ reasons the user sync cron job hasn't been running correctly.
I fixed some of the config in prod, but then discovered one more issue: the script prompts for an admin database password, so of _course_ it can't auto-run, lol.
Instead, I've now created a `impress2020-util` account, with just a few permissions (but specifically the `openneo_id.users` permission that I _don't_ give the app!), and added the username and password to the secret .env file, both locally and in prod. (In this case, prod means the Linode VPS, not Vercel, because that's where our cron runs.)
Like, the little magnifying glass in the "Search all items", you can click it to get taken to the _big_ search page with the autocomplete filters and stuff
Did some stuff in here for parsing the default list ID too. We skipped that when making the new list index page, but now maybe you could reasonably link to the default list? 🤔 not sure it's a huge deal though
I noticed someone using `<pre>` for styling, and thought, sure why not!
I haven't added support for the code block indent thing, and I think that's probably fine?
A lot of DTI lists use old URLs to anchor-link between lists! Here, we rewrite those URLs to match what DTI 2020 expects, so that they actually correctly jump you across the page and aren't filtered out!
I noticed when loading Your Outfits earlier (before I switched it to just use prod images even on dev), that there was a big memory leak slowing down my machine.
My hypothesis is that this is because I wasn't _waiting_ for the resources to tear down before finishing the request, so Vercel terminated the request early, and I further hypothesize that terminating a Playwright session partway through isn't guaranteed to clean up the browser.
Not sure about that! Could have just been that we spun up a lot at once, and a bunch of things went into swap! (But I thought it generally handles requests in serial in the dev server? So that feels unlikely.)
Anyway, I don't feel like extensively testing this again and maybe messing up my computer session again :p Just, when I first wrote this without awaits, I knew that it was a bit risky, but I wanted to _see_ if it was a problem before slowing down individual requests by awaiting. And now, my "it's likely to be a problem" threshold has been reached, lol!
So, I'm not _confident_ this is the best play, I don't know the internals well enough; but it seems like a better side to err on than the other, now that I know more!
The old URLs were glitchy because we weren't escaping the `layerUrls` param… and this will let us take better advantage of the same shared caching as other stuff!
I'm seeing a lot of 500 "Could not load image: undefined" lately, not sure why! Wondering if it's resources staying open too long, idk. This should at least be better than leaving it open!
Bump immer from 8.0.1 to 9.0.6
I tested the preview deploy a little bit with closet manipulation, seemed fine to me! That's where we do our more complex immer stuff anyway, with sets. Most of what we do with immer is very basic!
Well, holding onto the browser instance seems to be a source of bugs (my previous fix didn't seem to fix it), and I'm not _sure_ what the perf characteristics are, so let's just try a fresh instance each time!
I don't actually know what a browser "instance" in Playwright really is, I'm not sure it even necessarily creates a new process, I just don't know
and I saw some Vercel example code take this approach, which is definitely simpler, and I guess must not be _overtly_ bad perf if it's idiomatic?
So, like, ok, cool, let's see if this stops 500ing us with "Browser closed"! 😅
Whoops, `Promise.race` isn't quite what I wanted here. This meant that, if the image promise _fails_ before the movie _succeeds_, the outfit would crash even though it doesn't need to. (And this was happening too often, due to a bug in /api/assetImage!)
Now, we accept whichever _successful_ result loads first, or reject if they _both_ fail.
I tested this by having /api/assetImage always throw, and confirmed that it crashed the outfit before this change, and no longer does after this change!
Been seeing this in testing in prod, the first few images worked great, but then eventually they all started saying the browser was disconnected.
Here, we add a check to reconnect if it goes missing. This is actually kinda hard to test in dev, because the dev server creates a new process every time the function runs, so fingers crossed!
I also added explicit logic to close the page when we're done with it, I'm worried we crashed the browser by exceeding the RAM limit by leaving pages open. Not sure quite how their model works and whether things eventually get flushed out on their own!
We update /api/assetImage to accept size as a parameter (I make it mandatory to push people into HTTP caching happy paths), and we update the GraphQL thing to use it in those cases too!
This also means that, if these images seem to go well, we could swap Classic DTI over to them… I want to turn off those RAM-heavy image converters on the VPS lol
In this change, we start using our new API endpoint for movie image URLs, instead of the Classic DTI image.
This should make the little fade-in phase for certain movies a little bit less jarring (the part where we preload the image before the movie loads), though I suppose that won't necessarily load as fast until it gets into the cache the first time lol. (A good reason to maybe put a more long-lived cache like Fastly in front of this stuff long-term?)
Not doing it for the smaller image sizes yet, I'm a bit worried that I don't 100% know how to teach /api/assetImage to resize without tipping over the function limit…
…oh! I should have the webpage render at different sizes! Yeah that's a great idea lol
Okay cool, this one worked! We use this special Chrome package with AWS Lambda support, and then we use normal Playright in dev, and then we exclude `playwright` from the deployment (even though it got auto-detected by `require("playwright")`) to just barely sneak in under the 50MB limit for this function. Phew!
The preview deploys for this seem to be, actually working? So that's exciting!
So, just using normal playwright was crashing with this error: https://github.com/microsoft/playwright/issues/5862
I didn't understand why everyone was using playwright-core until I read the comments more carefully, and saw that it was because folks were using playwright-aws-lambda, because that's where Vercel functions run. (It has some special compat stuff.)
So I'm figuring that maybe the special case in Vercel's builder that fixes this for playwright-core maybe doesn't apply to normal playwright? But that people don't actually run into that issue in practice, because they're all using playwright-core for playwright-aws-lambda instead?
Idk, let's see how it goes! My hope is that this both fixes the immediate crasher about browsers.json being missing, _and_ fixes a problem we were _gonna_ have down the line about normal playwright not working in an AWS Lambda setting.
Marking this glitch on the Yellow Lutari head today, and oops there isn't UI copy for it yet! Added!
Also fixed some bugs in here, like old text about the position of the pose picker relative to the glitch badge, and I noticed while debugging that `layerUsesHTML5` returns a truthy string instead of a boolean which seems error-prone!
Hm, okay, so the documented way to not instrument anything doesn't actually stop them from patching Module._load. But this undocumented option sure does! So, woo, let's try it! lol
Huh, well, I can't figure out what in our production env stopped working with Honeycomb's automatic instrumentation… so, oh well! Let's try disabling it for now and see if it works.
This means our Honeycomb logs will no longer include _super helpful_ visualizations of how HTTP requests and MySQL queries create a request dependency waterfall… but I haven't opened Honeycomb in a while, and this bug is blocking all of prod, so if this fixes the site then I'm okay with that as a stopgap!
Btw the error message was:
```
Unhandled rejection: TypeError: Cannot read property 'id' of undefined at exports.instrumentLoad (/var/task/node_modules/honeycomb-beeline/lib/instrumentation.js:80:14) at Function._load (/var/task/node_modules/honeycomb-beeline/lib/instrumentation.js:164:16) at ModuleWrap.<anonymous> (internal/modules/esm/translators.js:199:29) at ModuleJob.run (internal/modules/esm/module_job.js:169:25) at Loader.import (internal/modules/esm/loader.js:177:24)
```
Oh also, this is the first time eslint has looked at scripts/build-cached-data.js I guess, so I fixed some lint errors in there.
I tried a "Redeploy" from the Vercel console, but it didn't seem to fix
the bug. That could mean that my theory about Node version as the
culprit is wrong? Or it could mean that we need fresh code for that
config change to take effect. So, this commit should simulate fresh
code! lol 😅
In production we're suddenly getting errors in module wrapping in honeycomb-beeline. I wonder if it's like, an incompatibility with Vercel's version of Node?
Well, this new version seems to still be playing nice on dev, so hopefully that's all it is and this fixes it! I give it like a 35% chance lol :p