According to our GlitchTip error tracker, every time we deploy, a
couple instances of `Async::Stop` and `Async::Container::Terminate`
come in, presumably because:
1. systemd sends a stop signal (presumably SIGTERM) to the `falcon host` process.
2. `falcon host` gives the in-progress requests some time to finish up.
3. Sometimes some requests take too long, and so something happens
(either a timer in Falcon or a KILL signal from systemd, not sure!)
that leads the ongoing requests to finally be terminated by raising
an `Async::Stop` or `Async::Container::Terminate`. (I'm not sure
when each happens, and maybe they happen at different points in the
process? Maybe one happens for the actual long-running requests,
while the other happens if more requests come in during the
spin-down and get caught in it?)
4. Rails bubbles up the errors, our Sentry library notices them and
sends them to GlitchTip, the user presumably receives the generic
500 error, and the app can finally close down gracefully.
It's hard for me to validate that this is *exactly* what's happening
here or that my mitigation makes sense, but my logic here is basically,
if these exceptions are bubbling up as "uncaught exceptions" and
spamming up our error log, then the best solution would be to catch
them!
So in this change, we add an error handler for these two error classes,
which hopefully will 1) give users a better experience when this
happens, and 2) no longer send these errors to our logging 🤞❗️
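Concretely, the handler is roughly this shape (a minimal sketch, assuming we hook in via `rescue_from` in `ApplicationController`; the helper name and copy here are placeholders, not the actual diff):

```ruby
class ApplicationController < ActionController::Base
  # When a request gets cut off mid-flight during a deploy, show a friendly
  # "please retry" message instead of the generic 500 page. Because the
  # exception is rescued here, Sentry/GlitchTip won't see it as unhandled.
  rescue_from Async::Stop, Async::Container::Terminate,
    with: :render_shutdown_error

  private

  def render_shutdown_error
    render plain: "Sorry, the server was restarting! Please try again.",
      status: :service_unavailable
  end
end
```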
That strange phenomenon where the best way to get a noisy bug out of
your logs is to fix it lmao
Oh rough, when moving an item list entry from one list to another, our
logic to merge quantities when the item is already in the destination
list was just fully crashing!
That is, moves with nothing to merge were working, but moves that
required a merge were raising a 500 Internal Server Error, because the
`list_id` attribute wasn't present on the loaded record.
I'm not sure why this ever worked; I'm assuming that using `list_id` in
the `where` condition would implicitly include it in the `select` in a
previous version of Rails? Or maybe Rails used to have fallback
behavior to run a second query, instead of raising
`MissingAttributeError` like it does now?
Well, in any case, this seems to fix it! Whew!
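For the record, here's the failure mode in miniature (hypothetical model and variable names, just to illustrate the `MissingAttributeError` behavior):

```ruby
# If the merge target gets loaded with a narrow SELECT that omits `list_id`…
entry = ClosetHanger.select(:id, :quantity)
                    .find_by(list_id: destination_list.id, item_id: item.id)

entry.quantity  # fine, it was selected
entry.list_id   # raises ActiveModel::MissingAttributeError in current Rails!

# …so the fix is to make sure `list_id` is actually selected (or to drop the
# narrow `select` entirely):
entry = ClosetHanger.find_by(list_id: destination_list.id, item_id: item.id)
```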
I don't think people see this very often visually, but it's showing up
in our error logging! The Rails API changed here long ago and we didn't
notice: to render public files, we should use the `file` argument
instead of `template`.
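The fix is basically a one-liner; something like this (the exact file and status here are a guess at the shape, not the actual diff):

```ruby
# Old style, which newer Rails no longer accepts for files outside the view paths:
#   render template: "public/404.html", layout: false, status: :not_found
# New style: point `file:` at the absolute path of the public file.
render file: Rails.root.join("public", "404.html"), layout: false, status: :not_found
```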
Trying something new and lightweight and more data-controlled!
I also turned down the sample rate for the performance traces feature,
because we hardly use it right now, and Sentry is always getting mad at
us for vastly exceeding our free plan quota—and like, we're not on
Sentry anymore so I imagine we have more wiggle room with that, but I
figure let's turn down the volume anyway, until we decide we want it.
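For reference, the knob lives in the Sentry initializer; something in this ballpark (the exact rate and options here are placeholders):

```ruby
# config/initializers/sentry.rb
Sentry.init do |config|
  config.dsn = ENV["SENTRY_DSN"]  # now pointed at our GlitchTip instance

  # Turn performance tracing way down until we decide we actually want it.
  config.traces_sample_rate = 0.05
end
```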
I previously added a warning to the item page, and thought about doing
one here too, but I was sicky, misjudged the complexity, and forgot you
don't need to hook into the `knownGlitches` API field to do it! Easy
peasy for a hacky little bug message!
I *think* what I'm observing is that:
1. The zone restrictions are different between these items.
2. The zone restrictions *change* when reloading the page sometimes. (I
assume from remodeling?)
3. The items look very buggy on many pets, because many appearances
seem to expect different zone restrictions than the item actually
has.
I think what this means is:
1. TNT has finally unbound restricted zones from the item level, and
allowed different appearances to have different restrictions. Neat!
2. The API still serves it the same way, as a field on the item.
So I think this means we need to update our schema to reflect the fact
that an item's `zones_restrict` field isn't *really* a property of the
item; it's a property of the combination of the item and the current
body ID.
My gut take here is that maybe this means it's time for the Large
Refactor that I've kinda been interested in for a while, but been
avoiding because of Impress 2020 compatibility issues: instead of a
`body_id` field on assets, and having them directly belong to items,
make an `ItemAppearance` record (closer to how 2020's GQL API modeled
it, I was looking ahead to this possibility!) that's keyed on item and
body ID, and assets belong to *that*.
Then, we could move the zones restriction field onto the
`ItemAppearance` record instead. And then it doesn't really matter to
us how TNT models it internally; whatever we saw is what we use.
(Again, I looked ahead to this in the 2020 app, and tried to use the
`restrictedZones` field on `ItemAppearance` when possible—even though
it secretly just reads directly from the `Item`!)
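To make the proposed shape concrete, something like this (a rough sketch, not a real migration or model yet; names like `appearance_for` are made up):

```ruby
# table: item_appearances (item_id, body_id, zones_restrict, …)
class ItemAppearance < ApplicationRecord
  belongs_to :item
  has_many :swf_assets  # assets would hang off the appearance, not the item

  # `zones_restrict` lives here, so it can vary per body ID.
end

class Item < ApplicationRecord
  has_many :appearances, class_name: "ItemAppearance"

  def appearance_for(body_id)
    appearances.find_by(body_id: body_id)
  end
end
```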
…but that's a pretty big departure from how things are modeled now, and
isn't something we can just throw together—especially coordinating it
across both apps. I was getting close to being able to shut off 2020
from a *front-facing* perspective (but still keeping a lot of the GQL
endpoints open for the wardrobe-2020 frontend), but I don't think we're
very close to being able to try to target turning off 2020's *backend*
as a prereq to this; or at least, if we do, we should expect that to
take a while. (Counting now, there are still 9 GQL queries; not as many
as I expected tbh, but still quite a few.)
So idk how to sequence this! But for now, let's put out a warning, and
start setting expectations.
The main *intended* user-facing effect of this is that "Items you own"
and "Items you want" filters should work in wardrobe-2020 now!
It is also possible that I messed something up and that this might
break unrelated searches somehow! We'll find out!! 😅
I did this before we had the ability to trigger searches from the app
itself, so that I could open up the browser JS console and call this
function directly. Now we can just do it in-app, goodbye!
Yay, we finally added it: the part where we include the appearance data
for the items, based on both the species/color and the alt style! Now,
switching to Faerie Acara correctly filters the search to only the items
that would fit. (I think that's literally just body_id=0 items right
now, but we're not banking on that!)
This only *really* shows up right now in the case where you construct
an Advanced Search form query (which only the wardrobe-2020 app does
now, and in limited form), and we return the query back (which only
gets used by the HTML view for item search, which doesn't have any way
to build one of these requests against it).
This is because, if you just type in `fits:alt-style-87305`, we always
keep your search string the same when outputting it back to you, to
avoid the weirdness of canonicalizing it and changing it up on you in
surprising ways!
But idk, this is just looking forward a bit, and keeping the system's
semantics in place. I hope someday we can bring robust text filter
and Advanced Search stuff back into the main app again, maybe!
I considered this at first, but decided to keep it simple until it
turned out to matter. Oops, it already matters, lol!
I want the item search code to be able to easily tell if the series
name is real or a placeholder, so we can decide whether to build the
filter text in `fits:$series-$color-$species` form or
`fits:alt-style-$id` form.
So in this change, we keep it that `AltStyle#series_name` returns the
placeholder string if none is set, but callers can explicitly ask
whether it's a real series name or not. Will use this in our next
change!
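So the model ends up looking something like this (a sketch; the placeholder text and predicate name are guesses):

```ruby
class AltStyle < ApplicationRecord
  PLACEHOLDER_SERIES_NAME = "Alt Style"  # hypothetical placeholder text

  # Callers who just want something to display still get a string either way…
  def series_name
    real_series_name? ? self[:series_name] : PLACEHOLDER_SERIES_NAME
  end

  # …but item search can ask this to decide between building
  # `fits:$series-$color-$species` vs `fits:alt-style-$id`.
  def real_series_name?
    self[:series_name].present?
  end
end
```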
Previously we did this hackily by comparing the ID to a hardcoded list
of IDs, but I think putting this in the database is clearer and more
robust, and it should also help with our upcoming item search stuff
that will filter by it!
Previously, passing in `fits:blue` would cause a crash, because the
`species_name` part of the split would be `nil`, oops!
In this change, we use a regex for more explicitness about the pattern
we're trying to match. We'll also add more cases next! (You'll note the
error message mentions `fits:nostalgic-faerie-draik`, which isn't
actually possible yet, but will be!)
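The parsing ends up shaped roughly like this (a sketch; the regex, error class, and message are approximations of the real ones):

```ruby
FITS_COLOR_SPECIES_PATTERN = /\A(?<color>[^-]+)-(?<species>[^-]+)\z/

def parse_fits_filter(value)
  match = value.match(FITS_COLOR_SPECIES_PATTERN)
  if match.nil?
    # Previously, "blue".split("-") left species_name as nil and we crashed
    # further down; now we fail fast with a useful message instead.
    raise SearchError, "We couldn't understand \"fits:#{value}\". " \
      "Try something like fits:blue-acara."
  end
  [match[:color], match[:species]]
end
```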
Previously, the query wouldn't fill into the search box or page title
if e.g. parsing had failed. Now it does!
I'm not sure why the rescue strategy we previously had here doesn't
work anymore (I'm sure it must've worked at some point?), but this is
simpler anyway, let's go!
I think this is a bit clearer and lets us clean up some of the syntax a
bit (don't need to always say `filters <<`), and also it will let us
use `return`, which I'm interested in for my next change!
Right, fitting isn't just `body_id` = this pet's body, it's also `body_id = 0` (the "fits all bodies" value)!
Anyway, doing this query on its own is still deathly slow, I wonder if
the idea I had about left joins (back when I was still working in a
Rails version that didn't support it lol) could help! Might poke at
that a smidge.
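In query terms, the core of the fix is just matching the sentinel value too; something like this (the scope name is a guess):

```ruby
class SwfAsset < ApplicationRecord
  # body_id = 0 is the "fits every body" value, so fitting a specific body
  # means matching either that body ID or 0.
  scope :fits_body_id, ->(body_id) { where(body_id: [body_id, 0]) }
end
```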
Oh right, `imageUrl` is the field name the app expects, but under the
hood `useOutfitAppearance` actually makes that an alias for
`imageUrlV2(idealSize: SIZE_600)`.
So we need to cache it as the same field with the same params, rather
than as just plain `imageUrl`!
This fixes the bug where wearing an item from search would require a
network round-trip and visually remove all items in the meantime.
(Also, none of this issue was visible to most users, because item
search is still feature-flagged onto the old GQL one for most people!)
This makes clicking on search results in the new mode actually work! It
correctly adds it to the outfit, and removes other items.
The thing that's behaving strangely is that, when you add the item, we
visually remove all items until we can finish a fresh network request
for what they should all look like. This probably means that the cache
lookup for `useOutfitAppearance` is not as satisfied with what we cache
here as `findItemConflicts` is? Something to investigate!
It'd be nice to customize the message a bit, but this should be rare
and I'd prefer the simplicity of just going with the default text.
I ran into this when I made a mistake in how I process the return value
of search results, so React Query caught and raised the error via
React, as intended! And I was annoyed that it wasn't logged anywhere,
so that's my motivation for this change—but also, the old message is
pretty meh and has some layout problems anyway.
The database did a weird thing today, where it wouldn't even respond to
the usual stop signal; I had to fully `kill -9` it??
I didn't see anything in the logs indicating what it was busy doing,
and people online seem to describe having this problem sometimes but
with no obvious solution.
For now, I'll try turning on the slow query logger, to see if that
might give us hints about whether there was like a denial-of-service
query attack hitting us or something?
I feel like this was part of `will_paginate` back before the Rails
community had figured out what belongs in a model?
But yeah, a default per-page value for search results does not belong
here. And I don't think anything references it anymore, because we pass
`per_page` to the `paginate` call in `ItemsController` explicitly! So,
goodbye!
First off, I think our code has converged on a convention of gracefully
returning `nil` for manifest-less situations, so we can do that instead
of raise! And then that lets us just simplify this check to whether
`manifest` is present, instead of `manifest_url`, so we stop crashing
in cases where we get to this point in the code and there's a manifest
URL but not a manifest.
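So the check ends up shaped something like this (a sketch; the loader method and manifest key names are approximate):

```ruby
class SwfAsset < ApplicationRecord
  def manifest
    return nil if manifest_url.blank?  # gracefully nil instead of raising
    @manifest ||= fetch_manifest       # hypothetical loader; may also return nil
  end

  def manifest_thumbnail_png_url
    # Was checking `manifest_url`, which crashed when a URL existed but the
    # manifest itself couldn't be loaded. Checking the manifest covers both.
    return nil if manifest.nil?
    manifest["thumbnail_png_url"]  # hypothetical key, just for illustration
  end
end
```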
This was a bit tricky! When I initially turned it on, running
`rails swf_assets:manifests:load` would trigger database errors of "oh
no we can't get a connection from the pool!", because too many records
were trying to save at once.
So now, we give ourselves the ability to say `save_changes: false`, and
then save them all in one batch after! That way, we're still saving by
default in the edge cases where we're downloading and saving a manifest
on the fly, but batching them in cases where we're likely to be dealing
with a lot of them!
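The shape of it is something like this (the `save_changes:` option is the real idea here; the method and task details are approximate):

```ruby
class SwfAsset < ApplicationRecord
  def load_manifest!(save_changes: true)
    self.manifest = fetch_manifest  # hypothetical downloader
    save! if save_changes           # default: persist immediately, like before
  end
end

# In the rake task: fetch them all without saving, then write in one batch
# afterward, instead of each fetch grabbing a pool connection to save right away.
assets = SwfAsset.where(manifest: nil)  # hypothetical "needs a manifest" subset
assets.each { |asset| asset.load_manifest!(save_changes: false) }
SwfAsset.transaction { assets.each(&:save!) }
```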
Now we're *really* duplicating Impress 2020's system lol, but I
need a way to not keep trying to load manifests that are actually 404,
which are surprisingly plentiful!
This doesn't actually stop us from loading anything yet, it just tracks
the timestamps and the HTTP status! But next I'll add logic to skip
when it was 4xx recently.
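The bookkeeping is roughly this (the column and method names here are guesses at the shape):

```ruby
# Migration sketch:
#   add_column :swf_assets, :manifest_loaded_at, :datetime
#   add_column :swf_assets, :manifest_status_code, :integer

class SwfAsset < ApplicationRecord
  def track_manifest_response(status_code)
    self.manifest_loaded_at = Time.current
    self.manifest_status_code = status_code
  end

  # Planned next step: skip re-fetching when we recently got a 4xx back.
  def recently_saw_manifest_4xx?
    manifest_status_code&.between?(400, 499) &&
      manifest_loaded_at.present? && manifest_loaded_at > 1.week.ago
  end
end
```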
This is not only unnecessary now, but it also caused a bug in the new
search stuff, where searching by zone would pass an extra `locale`
argument to a filter that doesn't need it!
Idk when this regressed exactly, but probably people didn't super
notice because I don't think it's a very common thing to type directly
into the Infinite Closet search box! (It used to be crucial to the old
wardrobe app.)
But I'm using it in the wardrobe app again now, so, fixed!
For now, I'm doing it with a secret feature flag, since I want to be
committing but it isn't all quite working yet!
Search works right, and the appearance data is getting returned, but I
don't have the Apollo Cache integrations yet, which we rely on more
than I remembered!
Also, alt styles will crash it for now!
`is:np` now means "is not NC and is not PB".
Note that it might be good to make NC and PB explicitly mutually
exclusive too? It would complicate queries though, and not matter in
most cases… the Burlap Usul Bow is the only item that we currently
return for `is:pb is:nc`, which is probably because of a rarity issue?
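In scope terms, the new `is:np` is just the negation of both (a sketch, assuming we already have `nc` and `pb` scopes on `Item`; the real change lives in the search filter code):

```ruby
class Item < ApplicationRecord
  scope :np, -> {
    where.not(id: Item.nc.select(:id)).where.not(id: Item.pb.select(:id))
  }
end
```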
Adding new functionality to the item search JSON endpoint, and adding
an adapter layer to match the GQL format!
Hopefully this will be pretty drop-in-able, we'll see!
Clearing the way to be able to delete the announcement banner, which is
currently the only link!
I feel like there's room to redo the site layout to find a place to
more properly link to this from, but I don't have one yet! And this is
enough of a niche reference that I think this is good enough?
A bit of a hack, because the thing triggering it was also a bit of a
hack? I feel like there's something we gotta do with refactoring how
our multiple concepts of state are managed… but in any case! This seems
to keep basic outfit-loading working, while no longer getting us
trapped in autosave loops!
Here's how I reproduced the bug:
1. Open a saved outfit.
2. Set the browser devtools to apply a latency of 5sec to all requests.
3. Add an item to the outfit, and wait for the autosave to start.
4. While it still says "Saving", remove the item again.
5. Watch how, when the first autosave request comes in, the item is
re-applied to the outfit, then autosave gets stuck looping forever.
The issue was that, when an outfit finishes saving, the change in
outfit data was triggering this effect in `useOutfitState` that was
*meant* to *initialize* local state from the saved outfit, not to keep
them in sync all the time. (In general, when saved outfit data comes
back from the server, we don't want to use it to "fix" local outfit
state in the case of discrepancies, because the most common source of
discrepancy will be the user having made further changes!)
But anyway, one thing I didn't realize is that we *were* depending on
this hacky hook to do more than I thought: it was responsible for
syncing `id` and `appearanceId` to the local state after saving the
outfit. So, I replaced the `rename` action dispatch here with a new
action that explicitly sets all fields the server is responsible for!
Ah yeah, if you're not on the wardrobe page (so we don't need the Alt
Styles UI), and the outfit's `altStyleId` is null (as is the case for
the item preview page), then there's no need to load the alt styles for
that species.
So before this change, going to `/items/123` would include an XHR
request to `/species/<id>/alt-styles.json`, which would not be used for
anything. After this change, that request is no longer sent. Hooray!
The alt styles controller is the one place we use this right now, but
I'm planning to generalize this to loading appearances during item
search, too!
I also add more `only` fields to the alt styles `as_json` call, because
idk, it feels like good practice to 1) say what we need in this
endpoint, rather than rely on default behavior upstream, and 2) avoid
leaking fields we didn't realize were on there. (And to save a bit of
bandwidth, too!)
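The render call ends up looking something like this (the exact field list here is illustrative, not the real one):

```ruby
render json: @alt_styles.as_json(
  only: [:id, :species_id, :color_id, :series_name],
)
```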
I think there are no call sites for these anymore, so now I can start
repurposing these methods for the new API endpoints I'm planning! :3
Now, `SwfAsset#image_url` approximately matches Impress 2020 logic: use
the thumbnail PNG from the manifest if one exists, or the Impress 2020
converter for canvas movies, or the old AWS copy generated by gnash if
necessary, or return nil.
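That cascade looks roughly like this (the helper names are my paraphrase of the steps, not the actual method names):

```ruby
def image_url
  manifest_thumbnail_png_url ||     # 1. the thumbnail PNG from the manifest, if any
    impress_2020_movie_image_url || # 2. Impress 2020's converter, for canvas movies
    legacy_gnash_image_url          # 3. the old AWS copy generated by gnash, or nil
end
```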
I built this API endpoint in anticipation of a change I never actually
made! I'll just remove it for now, leaning toward cleanuppery over
holding onto something I'm not sure about.