According to our GlitchTip error tracker, every time we deploy, a
couple instances of `Async::Stop` and `Async::Container::Terminate`
come in, presumably because:
1. systemd sends a stop signal (SIGTERM, presumably) to the `falcon host`
process.
2. `falcon host` gives the in-progress requests some time to finish up.
3. Sometimes some requests take too long, and so something happens (either
a timer in Falcon or a KILL signal from systemd, not sure!) that leads
the ongoing requests to finally be terminated by raising an `Async::Stop`
or `Async::Container::Terminate`. (I'm not sure when each happens, and
maybe they happen at different points in the process? Maybe one happens
for the actual long-running requests, and the other for requests that
come in during the meantime and get caught in the spin-down?)
4. Rails bubbles up the errors, our Sentry library notices them and
sends them to GlitchTip, the user presumably receives the generic
500 error, and the app can finally close down gracefully.
It's hard for me to validate that this is *exactly* what's happening
here, or that my mitigation makes sense, but my logic is basically: if
these exceptions are bubbling up as "uncaught exceptions" and spamming
up our error log, then the best solution is to catch them!
So in this change, we add an error handler for these two error classes,
which hopefully will 1) give users a better experience when this
happens, and 2) no longer send these errors to our logging 🤞❗️
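Concretely, the handler is something like this minimal sketch in
ApplicationController (the exact copy and status code here are just
illustrative, not necessarily what we shipped):

```ruby
class ApplicationController < ActionController::Base
  # If the server is spinning down mid-request, don't report a 500 to
  # GlitchTip; just ask the user to retry in a moment.
  rescue_from Async::Stop, Async::Container::Terminate do
    render plain: "The server is restarting. Please try again in a moment!",
      status: :service_unavailable
  end
end
```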
That strange phenomenon where the best way to get a noisy bug out of
your logs is to fix it lmao
I don't think people see this very often visually, but it's showing up
in our error logging! The Rails API changed here long ago and we didn't
notice: to render public files, we should use the `file` argument
instead of `template`.
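A minimal sketch of the change (the specific file here is illustrative):

```ruby
# Before: the style that quietly broke when the Rails API changed.
render template: "public/page"

# After: point `file:` at the actual file under public/.
render file: Rails.root.join("public/page.html")
```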
Yay, we finally added it, the part where we include the appearance data
for the items based on both the species/color and the alt style! Now,
switching to Faerie Acara correctly filters the search to only the items
that would fit (I think that's literally just body_id=0 items right now,
but we're not banking on that!)
Previously, the query wouldn't fill into the search box or page title
if e.g. parsing had failed. Now it does!
I'm not sure why the rescue strategy we previously had here doesn't
work anymore (I'm sure it must've worked at some point?), but this is
simpler anyway, let's go!
For now, I'm doing it with a secret feature flag, since I want to be
committing but it isn't all quite working yet!
Search works right, and the appearance data is getting returned, but I
don't have the Apollo Cache integrations yet, which we rely on more
than I remembered!
Also, alt styles will crash it for now!
Adding new functionality to the item search JSON endpoint, and adding
an adapter layer to match the GQL format!
Hopefully this will be pretty drop-in-able, we'll see!
The alt styles controller is the one place we use this right now, but
I'm planning to generalize this to loading appearances during item
search, too!
I also add more `only` fields to the alt styles `as_json` call, because
idk, it feels like good practice to 1) say what we need in this
endpoint, rather than rely on default behavior upstream, and 2) avoid
leaking fields we didn't realize were on there. (And to save bandwidth,
too!)
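Roughly the shape of the call, with an illustrative field list (not
necessarily the exact fields we ended up with):

```ruby
# Only serialize the fields this endpoint actually needs, instead of
# whatever the model happens to expose by default.
render json: @alt_styles.as_json(
  only: [:id, :species_id, :color_id, :body_id, :thumbnail_url]
)
```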
I built this API endpoint in anticipation of a change I never actually
made! I'll just remove it for now, leaning toward cleanuppery over
holding onto something I'm not sure about.
I'm gonna also use this for a task to try to warm up *all* the
manifests in the database! But to start, just a simple one, to prepare
the alt styles page quickly on first run. (This doesn't really matter
in production now that I've already visited the page once, but it helps
when resetting things in dev, and I think it's more about establishing
the pattern anyway!)
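Here's the kind of task I have in mind, as a hedged sketch; the task
name and the `preload_manifest!` method are illustrative, not
necessarily what ends up in the codebase:

```ruby
namespace :swf_assets do
  desc "Warm up asset manifests so pages don't fetch them on first view"
  task warm_manifests: :environment do
    SwfAsset.find_each do |asset|
      begin
        asset.preload_manifest!  # hypothetical: fetch + save the manifest if missing
      rescue StandardError => error
        puts "Failed to warm manifest for asset #{asset.id}: #{error.message}"
      end
    end
  end
end
```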
Preparing to finally move wardrobe-2020's item search to use the main
app's API endpoints instead!
One blocker I forgot about here: Impress 2020 has actual support for
knowing an item's true appearance, like by reading the manifest and
stuff, that we haven't really ported over. I feel like maybe I should
pause and work on the changes to manifest-archiving that I'd been
planning anyway? I'll think about it.
Okay, so I still don't know why rendering is just so slow (though
migrating away from item translations did help!), but I can at least
cache entire closet lists as a basic measure.
That way, the first user to see the latest version of a closet list
will still need just as much time to load it… but *only* for the lists
that have changed since last time (rather than re-rendering the full
page every time), and then subsequent users get to reuse the cached
version too!
Should help a lot for high-traffic lists, which incidentally are likely
to be the big ones belonging to highly active traders!
One big change we needed to make was to extract the `user-owns` and
`user-wants` classes (which we use for trade matches for *the user
viewing the list right now*) out of the cached HTML, and apply them
afterward with JavaScript instead. I always dislike moving stuff to JS,
but the wins here seem truly very, very good, all things considered!
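The core of it is view fragment caching, something like this minimal
ERB sketch (partial and variable names are illustrative): cache the
list's HTML keyed on the list record, and keep the viewer-specific
classes out of the fragment.

```erb
<% cache closet_list do %>
  <div class="closet-list" data-list-id="<%= closet_list.id %>">
    <%# Rendered *without* user-owns / user-wants classes; JavaScript adds
        those afterward for whoever is viewing, so this cached fragment can
        be shared by everyone. %>
    <%= render partial: "closet_hanger", collection: closet_list.hangers %>
  </div>
<% end %>
```

Since the default cache key includes the record's `updated_at`, only
lists that have actually changed get re-rendered.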
I think this was there to explain why `order` wasn't part of this
query, back when we probably used to sort in the controller? But now
that the item search module takes care of all that, it's just confusing
to say now imo!
Mostly this is just me testing out what it would look like to
modularize the app more… I've noticed that some concerns, like
fundraising, are just not relevant to most of the app, and being able
to lock them away inside subfolders feels like it'll help tidy up
long folder lists.
Notably, I haven't touched the models case yet, because I worry that
might be a bit more complex, whereas everything else seems pretty
well-isolated? We'll try it out!
Okay right, the wardrobe-2020 app treats `state` as a bit of an
override thing, and `pose` is the main canonical field for how a pet
looks. We were missing a few pieces here:
1. After loading a pet, we weren't including the `pose` field in the
initial query string for the wardrobe URL, but we _were_ including
the `state` field, so the outfit would get set up with a conflicting
pet state ID vs pose.
2. When saving an outfit, we weren't taking the `state` field into
account at all. This could cause the saved outfit to not quite match
how it actually looked in-app, because the default pet state for
that species/color/pose trio could be different; and regardless, the
outfit state would come back with `appearanceId` set to `null`,
which wouldn't match the local outfit state, which would trigger an
infinite loop.
Here, we complete the round-trip of the `state` field, from pet loading
to outfit saving to the outfit data that comes back after saving!
Yay it works(*)! But two major missing pieces:
- Outfit saving doesn't persist it at all
- Item compatibility is unaffected: items will still appear in search
and in the preview, even when they don't fit anymore.
To help with space, I'm just showing the word "Nostalgic" (or "???" if
it's from a series we don't recognize, this is hardcoded by ID), and
trusting that from context it will be obvious that it's the "Nostalgic
Faerie" case or whatever. (Moreover, in both the button and the select
we're omitting the species name, by similar reasoning!)
Note that this _still_ doesn't actually apply the style to the outfit
whatsoever; this is all just local state as we're continuing to play
with UI concepts. Actually applying it is probably next though! (Though
there's a couple more UI things I want to do, like some affordances to
clarify that a Style is applied and that Expression changes won't work.)
It was a bit tricky to figure out the right API for this, since I'm
looking ahead to the possibility of splitting these across multiple
pages with more detail, like we do in DTI 2020.
What I like about this API is that the caller gets to apply, or not
apply, whatever scopes they want to the underlying hanger set (like
`includes` or `order`), without violating the usual syntax by e.g.
passing it as a parameter to a method.
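To illustrate the shape I mean (all names here are made up for the
example, not the real API):

```ruby
# What I wanted to avoid: smuggling scopes in as method parameters.
groups = user.hanger_groups(includes: [:item], order: "items.name")

# What this API does instead: each group exposes its hanger set as a
# plain relation, and the caller chains whatever scopes it wants.
user.hanger_groups.each do |group|
  hangers = group.hangers.includes(:item).order("items.name")
  # ...render the group with these hangers...
end
```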
Oh yeah, a long-standing limitation. Good thing we're better at stuff
now!
This is also probably the real cause of the weird number of slight
discrepancies between main DTI and DTI 2020 when I eyeballed stuff lol.
Oh well, that and the missing default-lists. A bit messy!
Ahh I see, the way we got away with not having a `trading` scope before
was a weird metaprogramming `{owned/wanted}_trading` situation. Okay,
let's trash that in favor of our new stuff! And that helps us bulk-load
the queries too, which is nice.
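Something along these lines is what I mean (the implementation details
are illustrative, since the real definition depends on how list
visibility works):

```ruby
class ClosetHanger < ApplicationRecord
  # Hypothetical sketch: one composable `trading` scope instead of
  # separate metaprogrammed `owned_trading` / `wanted_trading` readers.
  # (Assumes a ClosetList.trading scope exists.)
  scope :trading, -> { where(list_id: ClosetList.trading) }
  scope :owned,   -> { where(owned: true) }
  scope :wanted,  -> { where(owned: false) }
end

# Both cases become chained scopes, easy to load in bulk:
user.closet_hangers.trading.owned.includes(:item)
user.closet_hangers.trading.wanted.includes(:item)
```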
Building toward replacing more of the 2020 data sources! I think this is
an endpoint that benefits from bulk loading, esp with the way the item
page previews work. I also like taking the concept of "canonical" out of
the GQL interface, and instead just loading for each of the 50 species
and letting the client decide. (And then it can fast-swap between them!)
Just moving more stuff over! I modernized Item's `as_json` method while
I was here. (Note that I removed the NC/own/want fields, because I
think the only other place this method is still called from is the
quick-add feature on the closet lists page, and I think it doesn't use
these fields to do anything: updating the page is basically a full-page
reload, done sneakily.)
Preparing a better endpoint for wardrobe-2020 to use! I deleted the
now-unused swf_assets#index endpoint, and replaced it with an
"appearances" concept that isn't exactly reflected in the database
models but is a _lot_ easier for clients to work with imo.
Note that this was a big part of the motivation for the recent
`manifest_url` work—in this draft, I'm probably gonna have the client
request the manifest, rather than use impress-2020's trick of caching
it in the database! There's a bit of a perf penalty, but I think that's
a simpler starting point, and I have a hunch I'll be able to make up
the perf difference once we have the impress-media-server managing more
of these responsibilities.
This hasn't worked for a while, and I don't know an API off the top of
my head to drop in for it. Let's just delete it for now, and revisit it
later if we want to!
Dang, I'm really wishing I'd opened this sooner cuz I didn't realize it
would be THIS easy!!
The bug was that the `t` method started taking Ruby keyword params
instead of a hash object for `options`, so the syntax changed.
Womp womp!
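For future reference, the gist of the syntax change (illustrative
example):

```ruby
options = { default: "Unknown item", locale: locale }

# Old style: passing the options as a single hash argument no longer
# matches `t`'s signature now that it takes keyword arguments.
t(".title", options)

# New style: splat the hash into keywords.
t(".title", **options)
```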
A really really simple change! It works on the item page, the item
index page, item search, the homepage, and the item lists page.
The main reason I avoided this for so long (even before modernizing the
Rails app) was that the ElasticSearch stuff felt like it made it messy?
But now it's pretty simple, and it works in search already cuz I did
that when I implemented item search, so, nice!
This came in a few parts!
1. Add meta tags to let us know we're logged in.
2. Install React Query, which has the data-loading sensibilities I like
about Apollo without the GraphQL that has honestly been a drag.
3. Replace the outfit-loading and outfit-saving calls with API calls to
the main app.
4. Update the main app's API calls to use our more flexible data
constructs like "pose".
Would've loved to do this more incrementally, but it's hard to! You
can't split out outfit-loading and outfit-saving, or auth from any of
that, or the state gets all out-of-sorts.
Still, this is a good nugget we've pulled out all-in-all, and one that
people have been asking for! Can maybe look to logged-in item search
soon too, for own/want data?
I used the new profiler tools on this page, and noticed a lot of
allocations in the Globalize library, which we use for translating
database records. I realized that we were loading all of the fields of
not just all of the items on the page, but all of their translation
records in all locales! We used to scrape data for lots of languages, so
that can be quite a lot!
Unfortunately, Rails's `includes` method to efficiently preload related
records always loads all fields, and simply can't be overridden.
So, in this change we write manual preloading code, to identify the
records we need, load them in big bulk queries, and assign them back to
the appropriate associations. Basically just what `includes` does, but
written out a bit more, to give us the chance to specify SELECT and
WHERE clauses!
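The shape of it is roughly this (a sketch with illustrative column
names; the real code covers more associations): find the records, load
just the translations we care about in one bulk query, then hand them
back to each item's association so Rails treats them as preloaded.

```ruby
items = Item.where(id: item_ids).to_a

# One bulk query, restricted to the locale and columns we actually need.
translations = Item::Translation.
  where(item_id: items.map(&:id), locale: I18n.locale).
  select(:id, :item_id, :locale, :name, :description)
translations_by_item_id = translations.group_by(&:item_id)

# Assign the preloaded records back, so later `item.translations` calls
# don't fire their own queries.
items.each do |item|
  association = item.association(:translations)
  association.target = translations_by_item_id.fetch(item.id, [])
  association.loaded!
end
```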
It always shows up in development, and in production it shows up if
you're logged in as Me Specifically!
I'm using this to poke at memory usage for pages that seem suspicious.
I don't know why our app reliably grows so large in RAM, but my hunch
is that maybe there are some pages that just use a truly large amount
to begin with. And I've learned that Ruby doesn't release memory back
to the OS after it's GC'd; it just grows the process and keeps the
freed space to itself in its own heap!
So I'm just eyeing pages that I know *can* have a lot going on, and
seeing what I find!
Now, like in DTI 2020, opening an outfit will go straight to the editor.
I'm not 100% on whether this is actually like. the superior behavior?
But I think it's good enough, and it's what the wardrobe-2020 code
expects, so let's just roll with it for now!
This was used by the Neopia server to send us the modeling data it requested out-of-band. But now we do all our modeling requests back in-app again, so we don't need this!
Okay, this is a process that, idk, may not have even been working for a while anyway; I don't think Neopets translates item names anymore?
And it's crashing when I try to model stuff now, so like, yeah, ok, I'm fine with just skipping this. It's a shame to lose out on potential data going forward, but *I think there just isn't data to get anyway*.
I hope this doesn't cause problems! But yeah, with Puma doing threading, and maybe switching to Falcon someday to get even better concurrency properties, I feel like this will probably be fine?
And it makes the UX a loootttt better, to be back in the world where all these forms just work, whew.
Oh okay, I was misinterpreting the error: it was that our NEOPETS_URL_ORIGIN secret value isn't the real Neopets.com IP address anymore, so amfphp requests were just plain *always* failing in production. Oops!
I've removed that environment variable from our production config, and now modeling is working in the bulk thing!
Also I'm noticing that we're using Puma these days, which does good threading stuff. I think there might be merit to switching over to Falcon because of just how async-y our stuff is, but having 5 threads going is honestly probably good enough that I don't need to worry too much about mutual blocking, and could probably just write stuff to get Neopia out of the picture like *right now*. Neat!
Okay so… I'm worried about this because of Rails' whole single-threaded situation, which doesn't really let it handle blocking on external network requests very well.
Ultimately I think we're gonna have to do a clever thing here, but idk quite what?
I should look into whether like, Puma + the new async stuff can enable Rails to be more tolerant of this, and handle a few requests at once, instead of having to have the Neopia server doing it. (Right now, the Neopia server isn't really doing its job quite right, because it depends on the Rails app being *local* to send stuff to it.)
But for now, let's just extend the timeout, cuz it's basically always getting hit in production (there's currently no other way to do modeling, oops lol).
In the login case, we save the `return_to` parameter in the session, because login can be a multi-step process.
In the logout case, we just read it directly from the form params.
Note that you *could* end up in a weird scenario where an old return_to value sticks around for a bit? But we have the sense to delete it when we use it on a successful sign-in, and most links to the login page come with a `return_to` param which should reset it. So, you'd have to 1) have started but not finished a sign-in, 2) during the same session, and 3) get to the login page by an unusual means.
Probably fine!
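A rough sketch of the flow, assuming a sessions-style controller (method bodies and names here are simplified for illustration):

```ruby
class SessionsController < ApplicationController
  # Login is multi-step, so stash return_to in the session for later.
  def new
    session[:return_to] = params[:return_to] if params[:return_to].present?
    # ...render the login form...
  end

  def create
    # ...authenticate...
    redirect_to(session.delete(:return_to) || root_path)
  end

  # Logout is a single form post, so read return_to straight from it.
  def destroy
    # ...sign out...
    redirect_to(params[:return_to] || root_path)
  end
end
```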
This is a bit more standard, and has the bonus of being compatible with Devise, which is using `flash[:notice]` and so its flashes were coming out unstyled, oops!
Hey nice!!
Note that I removed an account delete button from the settings page. You can still send a DELETE request to the right endpoint to do it, but it's not gonna delete all the associated records, and I wanna think a bit about how to handle that better before exposing that button.
A lot of rough edges here (e.g. no styles on the flash messages), but it's working and that's good!!
I tested this by temporarily switching to the production database and logging in as matchu!
Still missing a lot of big features too, like registration, password resets, settings page, etc.
This removes login/logout/session logic for integrating with OpenNeo ID, replacing them with stubs that just redirect to `/?TODO` when you click login, and helpers that act as if you're not logged in.
This gives us a clean slate to plug in new Devise logic to integrate with the `openneo_id` database directly!
This one was pretty straightforward yaay! Main thing was the change from `render file` to `render template` in a couple places, oh and a thing with complex `order()` clauses.
I ran `rails zeitwerk:check`, which eager-loads the app, and it found two problems: `closet_group.rb` doesn't define `ClosetGroup` (cuz it's empty), and I left in a reference to a cache sweeper observer oops. Goodbye!
Some important little upgrades but mostly straightforward!
Note that there's still a known issue where item searches crash; I was hoping it was a bug in Rails 4.2 that would be fixed by upgrading to 5, but nope, oh well!
Also uhh I just got a bit silly and didn't actually mean to go all the way to 5.2 in one go, I had meant to start at 5.0… but tbh the 5.1 and 5.2 changes seem small, and this seems to be working, so. Yeah ok let's roll!
We've already swapped out the backend for this stuff to Impress 2020, so the resque task and the broken image report UI aren't actually relevant anymore. Delete them!
This helps us delete Resque soon too.
Huh! This cache key seemed to only be referenced in checks and expirations, but was never actually used! So I guess we've been loading the modeling predictions every time for a while huh??
We'll get smarter about that someday, but anyway, that lets us delete our Item resque tasks and ItemObserver!
Again, I'm just not convinced of the perf win here, and removing it lets us delete some whole infra; we can improve it another time if it turns out to be useful!
Just removing some caching and the expiration of it! There's still more superfluous(?) caching on the item page to audit, but these seem a bit more sensible about avoiding loading extra data.
In the interest of clearing out Resque, I'm just gonna remove a lot of our more complex caching stuff, and we can do a perf pass for things like big item list pages once everything's upgraded. (I'm hopeful that the upgrades themselves improve perf; and if not, that some improved sensibilities 10 years later can find simpler approaches.)
I want to test some logged-in stuff, but the whole openneo_id app is a mess to integrate with (and I want to eliminate it down the line anyway), so here's a simple hacky thing that just gets you into a test user for development!
Turns out ~22% of our users initially land on a trade list.
We like to keep the campaign off the pages where space is at a
premium, so we try to whitelist it to major landing pages in order
to avoid accidentally creating a bad experience on some page :)
I've been doing this manually via email for a long time,
since building new stuff in the logged-in world was a pain in the old env.
But now here we are! Finally, finally :)
In particular, outfit_id == 0 would cause outfit_id? to
return false, so it wouldn't run the outfit presence
validation, so /donations/features would try to load
outfit #0 and fail.
Also, use flash[:alert] instead of flash[:error] when
outfit_id is bad.
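For the record, the gotcha is Rails's attribute query methods
treating zero as blank for numeric columns; a quick illustration
(the model name here is just for the example):

```ruby
donation = Donation.new(outfit_id: 0)

donation.outfit_id?          # => false! Query methods return false for 0,
                             #    so an `if: :outfit_id?` style guard skips
                             #    the outfit presence validation entirely.
donation.outfit_id.present?  # => true, which is the check we actually want
```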
Turns out we need to assign closeted to actual items, not
the item proxies, since that's what we check against. (I
would've thought they're backed by the same instance of
the item anyway, but, whatever. The fix works :P)