Things seemed to mostly work at first glance! I haven't tested outfitPageSSR because we'll need to redo it though, and also the outfit image routes aren't working anymore (vercel.json isn't how next.js works)
I noticed when loading Your Outfits earlier (before I switched it to just use prod images even on dev), that there was a big memory leak slowing down my machine.
My hypothesis is that this is because I wasn't _waiting_ for the resources to tear down before finishing the request, so Vercel terminated the request early, and I further hypothesize that terminating a Playwright session partway through isn't guaranteed to clean up the browser.
Not sure about that! Could have just been that we spun up a lot at once, and a bunch of things went into swap! (But I thought it generally handles requests in serial in the dev server? So that feels unlikely.)
Anyway, I don't feel like extensively testing this again and maybe messing up my computer session again :p Just, when I first wrote this without awaits, I knew that it was a bit risky, but I wanted to _see_ if it was a problem before slowing down individual requests by awaiting. And now, my "it's likely to be a problem" threshold has been reached, lol!
So, I'm not _confident_ this is the best play, I don't know the internals well enough; but it seems like a better side to err on than the other, now that I know more!
I'm seeing a lot of 500 "Could not load image: undefined" lately, not sure why! Wondering if it's resources staying open too long, idk. This should at least be better than leaving it open!
Well, holding onto the browser instance seems to be a source of bugs (my previous fix didn't seem to fix it), and I'm not _sure_ what the perf characteristics are, so let's just try a fresh instance each time!
I don't actually know what a browser "instance" in Playwright really is, I'm not sure it even necessarily creates a new process, I just don't know
and I saw some Vercel example code take this approach, which is definitely simpler, and I guess must not be _overtly_ bad perf if it's idiomatic?
So, like, ok, cool, let's see if this stops 500ing us with "Browser closed"! 😅
Been seeing this in testing in prod, the first few images worked great, but then eventually they all started saying the browser was disconnected.
Here, we add a check to reconnect if it goes missing. This is actually kinda hard to test in dev, because the dev server creates a new process every time the function runs, so fingers crossed!
I also added explicit logic to close the page when we're done with it, I'm worried we crashed the browser by exceeding the RAM limit by leaving pages open. Not sure quite how their model works and whether things eventually get flushed out on their own!
We update /api/assetImage to accept size as a parameter (I make it mandatory to push people into HTTP caching happy paths), and we update the GraphQL thing to use it in those cases too!
This also means that, if these images seem to go well, we could swap Classic DTI over to them… I want to turn off those RAM-heavy image converters on the VPS lol
Okay cool, this one worked! We use this special Chrome package with AWS Lambda support, and then we use normal Playright in dev, and then we exclude `playwright` from the deployment (even though it got auto-detected by `require("playwright")`) to just barely sneak in under the 50MB limit for this function. Phew!
The preview deploys for this seem to be, actually working? So that's exciting!
So, just using normal playwright was crashing with this error: https://github.com/microsoft/playwright/issues/5862
I didn't understand why everyone was using playwright-core until I read the comments more carefully, and saw that it was because folks were using playwright-aws-lambda, because that's where Vercel functions run. (It has some special compat stuff.)
So I'm figuring that maybe the special case in Vercel's builder that fixes this for playwright-core maybe doesn't apply to normal playwright? But that people don't actually run into that issue in practice, because they're all using playwright-core for playwright-aws-lambda instead?
Idk, let's see how it goes! My hope is that this both fixes the immediate crasher about browsers.json being missing, _and_ fixes a problem we were _gonna_ have down the line about normal playwright not working in an AWS Lambda setting.
Hm, okay, so the documented way to not instrument anything doesn't actually stop them from patching Module._load. But this undocumented option sure does! So, woo, let's try it! lol
Huh, well, I can't figure out what in our production env stopped working with Honeycomb's automatic instrumentation… so, oh well! Let's try disabling it for now and see if it works.
This means our Honeycomb logs will no longer include _super helpful_ visualizations of how HTTP requests and MySQL queries create a request dependency waterfall… but I haven't opened Honeycomb in a while, and this bug is blocking all of prod, so if this fixes the site then I'm okay with that as a stopgap!
Btw the error message was:
```
Unhandled rejection: TypeError: Cannot read property 'id' of undefined at exports.instrumentLoad (/var/task/node_modules/honeycomb-beeline/lib/instrumentation.js:80:14) at Function._load (/var/task/node_modules/honeycomb-beeline/lib/instrumentation.js:164:16) at ModuleWrap.<anonymous> (internal/modules/esm/translators.js:199:29) at ModuleJob.run (internal/modules/esm/module_job.js:169:25) at Loader.import (internal/modules/esm/loader.js:177:24)
```
Oh also, this is the first time eslint has looked at scripts/build-cached-data.js I guess, so I fixed some lint errors in there.
I'm gonna try and see about getting Fastly to redirect these internally, so we can get all the benefits of CDN-caching the generated image, without forcing the user through another round-trip!
Someone asked to use the DTI API for a small client-side project, so I'm making this change to support it!
As explained in the comment, I think this should be safe regarding CSRF attacks. But it _does_ increase the risk that someday we change something elsewhere that creates a problem, like using cookies to authorize something. So, let's remember to be careful! (as I would hope we would be when adding another auth mechanism!)
Gonna have the /outfit-urls page start returning these instead, for feature parity with before
I might change the strategy on this at some point, like have it get `updatedAt` and redirect instead of generating the image. But this is simpler for now (and the Vercel cache doesn't seem to be as aggressive as I want anyway), and I can change it later!
Oops, we get a _lot_ of outfit image requests, and it's pushing the limits of our free Honeycomb plan! But I don't really need all that much detail, because there's so many.
So, we here apply sampling! `api/outfitImage` is getting a 1/10 rate, and for GraphQL, `ApiOutfitImage` is getting 1/10, and `SearchPanel` is getting 1/5.
I had to add a `addTraceContext` call, to give all the child events awareness of what operation they're being called in, too!
I haven't actually tested that this is working-working, just that the endpoints still return good data. We'll see how it shakes out in prod!
But I did add `console.log(sampleRate, shouldSample, data);` to the `samplerHook` briefly, to see the data flow through, and I reloaded a `SearchPanel` request a few times and observed a plausibly 20% success rate.
Oops, right, I meant to use the new `impress-outfit-images.openneo.net` host for this! It works just fine from `impress-2020.openneo.net` as the backing source right now, but I want these semi-permanent URLs to be a bit more decoupled.
This is mostly for decoupling's sake: I want pretty outfit URLs that aren't dependent on the `/api/outfitImage` path structure, so that we can serve them as ~permanent pretty URLs that don't actually indicate where or how the image is stored.
This reverts commit a0121f09ea.
I didn't realize that VERCEL_URL wouldn't use the canonical URL. And thinking further about the Open Graph semantics, I think having just one canonical host makes more sense, even if it makes testing a bit more annoying in some cases.
Ah right, this is how we get the deployed API code to know its own hostname! This way, when we do a preview deploy, /api/outfitImage will use the preview version of the GraphQL endpoint, too.
Hmm, the built copy of the HTML isn't deployed with the API function on Vercel. Ok! Let's just do the same trick as we did for the dev server, and make an HTTP request.
This is fine enough for perf because we're caching this result locally to the function, plus it should be a fast CDN-cached response anyway.
Size upgrade, wowie! (I figure sharing services probably prefer a higher-resolution image, they're not very big, and to then scale them down for the application-specific use case & device.)
Meta tags are a bit tricky in apps built with `create-react-app`! While some bots like Google are able to render the full page when crawling, not all bots are. Most will just see the empty-ish index.html that would normally load up the application.
But we want outfit sharing to work! And be cool! And use our new outfit thumbnails!
In this change, we add a new server-side rendering API route to handle `/outfits/:id`.
It's very weak server-side rendering: it just loads index.html, and makes a few small tweaks inside the `<head>` tag. But it should be enough for sharing to work in clients that support the basics of Open Graph, which I think most major providers respect! (I know Twitter has their own tags, but it also respects the basics of OG, so let's see whether there's anything we end up _wanting_ to tweak or not!)
Not using this anywhere in-app yet! But might swap it into the user outfits page, and use it to server-side-render social sharing meta tags!
Also eyeing this as a way to replace our nearly 1TB of outfit image S3 storage, and save $20/mo…
Oops, I forgot to grep outside `src` for this, lol! I'll keep the S3 domain support for now, because it's still fine to accept and some clients might be on old code, whatever!
Idk I was surprised to notice that /api/validPetPoses wasn't cached by the Vercel CDN… but I guess that makes sense, because it's only allowed to be an hour old, and client caches hold onto it, so I suppose it makes sense for the CDN cache to not bother.
But in practice, that means users loading the site are pretty _likely_ to see /api/validPetPoses taking its full load time, which can be up to 500ms.
So, I'm offering this additional cache hint, to be willing to serve /api/validPetPose responses up to a week old while reloading the latest data. My hope is that the cache algorithm is much more excited about holding onto that, and that it makes the 500ms delay very _rare_ in practice!
I'm learning that top-level traces should use operation_name as well as name, because name is the low-level thing that every trace gets (including child traces like db queries and net requests)!
Also there was an incorrect label in one of these, and validPetPoses was missing it altogether
This will help us disambiguate items like Butterfly Dress with a non-unique name, and also some where the real item name is glitchy so using it as a key is not wise lol
By printing out this logging data, of item names we have that Waka doesn't, vs item names Waka has that we don't, I was able to solve a lot of them with new code in `normalizeItemName`!
Here's the mismatches that are left at time of writing:
```
[Item: Y, Waka: N] No Waka value for NC DTI item "goldenlulumedallion" (43034)
[Item: Y, Waka: N] No Waka value for NC DTI item "faelliebirthdaybagsurprise" (51350)
[Item: Y, Waka: N] No Waka value for NC DTI item "neopets11thbirthdaycommemorativemysterycapsule" (53448)
[Item: Y, Waka: N] No Waka value for NC DTI item "dyeworksblue:iscawig-blue" (70896)
[Item: Y, Waka: N] No Waka value for NC DTI item "mysteriousdoorwithlocks" (75601)
[Item: Y, Waka: N] No Waka value for NC DTI item "grapefruitnecklace" (77779)
[Item: Y, Waka: N] No Waka value for NC DTI item "discofeverbackground" (79250)
[Item: Y, Waka: N] No Waka value for NC DTI item "featherflaredshoes" (80047)
[Item: Y, Waka: N] No Waka value for NC DTI item "dyeworksredradioactivemutantmarkings" (80441)
[Item: N, Waka: Y] No NC DTI data for Waka item "7thbirthdaycakeslice#1"
[Item: N, Waka: Y] No NC DTI data for Waka item "7thbirthdaycakeslice#2"
[Item: N, Waka: Y] No NC DTI data for Waka item "7thbirthdaycakeslice#3"
[Item: N, Waka: Y] No NC DTI data for Waka item "8thbirthdayrainbowcupcake"
[Item: N, Waka: Y] No NC DTI data for Waka item "8thbirthdaysparklercupcake"
[Item: N, Waka: Y] No NC DTI data for Waka item "8thbirthdaytiedwithabowcupcake"
[Item: N, Waka: Y] No NC DTI data for Waka item "babyspringdress"
[Item: N, Waka: Y] No NC DTI data for Waka item "butterflydress(fromfaeriefestivalevent)"
[Item: N, Waka: Y] No NC DTI data for Waka item "discofever"
[Item: N, Waka: Y] No NC DTI data for Waka item "dyeworksblue:iscawig"
[Item: N, Waka: Y] No NC DTI data for Waka item "dyeworksred:radioactivemutantmarkings"
[Item: N, Waka: Y] No NC DTI data for Waka item "featherflairedshoes"
[Item: N, Waka: Y] No NC DTI data for Waka item "festivebooktree"
[Item: N, Waka: Y] No NC DTI data for Waka item "floralblackcardigan"
[Item: N, Waka: Y] No NC DTI data for Waka item "grapefruitneckace"
[Item: N, Waka: Y] No NC DTI data for Waka item "mysteriousdoorwithlocksbackground"
[Item: N, Waka: Y] No NC DTI data for Waka item "valiantchampionwings"
[Item: N, Waka: Y] No NC DTI data for Waka item "waxcrayonwig"
[Item: N, Waka: Y] No NC DTI data for Waka item "youngsophiesdress"
```
Oops, I missed something important about the Sheets API here! This was causing us to set `{value: undefined}` for some items, which serialized as `{}`.
Just a little experiment to see this working in prod! I don't have permission to actually use this data yet, or a UI for it, just mostly want to see it hooked up to the Sheets API correctly and check the prod cache behavior.
New items have ?v= in the query string, which denied requesting them from the outfit thumbnail API route. Now, we allow any query string.
Also, I started allowing PNGs. I don't think any should actually come through here in practice? But it seems safe and forward-looking.
Huh, I guess TNT changed the folder name from `object` to `items` for new SVGs? Like, some of the old SVGs still are in `objects`, but…
Hmm, I wonder if I should re-run old manifests at some point? I wonder if maybe like, _everything_ changed, and we're just holding onto old URLs now…