I'm starting to work with the OpenID Connect stuff in NeoPass, and the
library I'm using for discovery doesn't seem to want to do it over a
plain `http://` connection. (I dug into the source files, and it really
does seem to be hardcoded to only work with HTTPS, as far as I can tell?)
So, I've added logic to `neopass-server` to try to make an HTTPS
certificate with the `mkcert` utility (if installed). `mkcert` installs a
root certificate authority on your local machine, then helps you create
certificates signed by that authority, which are trusted only on that
machine.
I think I'll also be able to remove the "main" server in front of the
backing server, because the library I'm using now will be able to
"discover" the auth and token endpoints, so it won't matter that our
local one uses different URLs than live NeoPass does? We'll find out!
I also removed `neopass-server` from the `Procfile`, because I think
it's a bit rude to have it auto-try to run `mkcert`. We could, like, make
the process just a no-op when `mkcert` isn't installed? But I think I'd
prefer to just run `neopass-server` by hand when I want it, for
simplicity.
Hey nice, now we can actually get a round-trip auth success! This gets
us to the `Devise::OmniauthCallbacksController#neopass` action on
success, so now we need to add our logic to actually log in/sign up!
In this change, we wire up a new NeoPass OAuth2 strategy for OmniAuth,
and hook up the "Log in with NeoPass" button to use it!
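For a sense of the shape of that wiring, here's an illustrative sketch
built on the standard `omniauth-oauth2` pattern; the endpoint paths,
field names, and env var names are placeholders, not the exact strategy
we ship:

```
# app/lib/omniauth/strategies/neopass.rb (illustrative sketch)
module OmniAuth
  module Strategies
    class NeoPass < OmniAuth::Strategies::OAuth2
      option :name, :neopass

      # Placeholder endpoints; the real ones come from the NeoPass server
      # (or, later, from OpenID Connect discovery).
      option :client_options, {
        site: ENV["NEOPASS_ORIGIN"],
        authorize_url: "/oauth2/auth",
        token_url: "/oauth2/token",
      }

      uid { raw_info["sub"] }
      info { { email: raw_info["email"] } }

      def raw_info
        @raw_info ||= access_token.get("/userinfo").parsed
      end
    end
  end
end

# config/initializers/devise.rb (illustrative sketch)
# config.omniauth :neopass, ENV["NEOPASS_CLIENT_ID"], ENV["NEOPASS_CLIENT_SECRET"]
```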
The authentication currently fails with `invalid_credentials`, and
shows the `owo` response we hardcoded into the NeoPass server's token
response. We need to finally follow up on the little `TODO` written in
there!
If you pass `?neopass=1` (or a secret value in production), you can see
the "Log in with NeoPass" button, which currently takes you to
OmniAuth's "developer" login page, where you can specify a name and
email and be redirected back. (All placeholder UI!)
We're gonna strip the whole developer strategy out pretty fast and
replace it with one that uses our NeoPass test server. This is just me
checking my understanding of the wiring!
This is setting us up for NeoPass, but first we're just gonna try stuff
with the "developer" strategy that's built in for testing, rather than
using the NeoPass dev server!
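The "developer" strategy ships with OmniAuth itself, so the wiring is
tiny; something like this, assuming we register it through Devise (the
exact options are illustrative):

```
# config/initializers/devise.rb (sketch)
Devise.setup do |config|
  # OmniAuth's built-in "developer" strategy: a fake login form with no
  # real authentication, just for checking the callback wiring end-to-end.
  config.omniauth :developer, fields: [:name, :email], uid_field: :email
end
```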
Oh right, all these cute overrides should be scoped to the page!
I guess we skipped this because these styles had been pulled out into a
separate stylesheet. Curiously learning more about how Turbo handles
this kind of thing, like that it doesn't *unload* stylesheets that
*leave* the page when you navigate!
I noticed an issue where Turbo-loading between the Your Items page and
the homepage would clobber each other's copy of jQuery, sometimes
breaking things. E.g. go to Your Items, then go to home, then go to Your
Items, and the page's JS fails because `$.fn.live` isn't defined.
I briefly tested the homepage and it didn't seem to actually depend on
any features from the later version of jQuery? At least not that I
noticed! So I'll just downgrade for consistency. (I also tried
upgrading the Your Items page, but there's too much usage of
`$.fn.live`, which later jQuery versions removed in favor of the notably
different `.on()` syntax.)
First one: Turbo reasonably yelled at us in the JS console that we
should put its script tag in the `head` rather than the `body`, because
it re-executes scripts in the `body`, and we don't want to spin up Turbo
multiple times!
I also removed some scripts that aren't relevant anymore, fixed a bug
in `outfits/new.js` where failing to load a donation pet would cause
the preview thing to not work when you type (I think this might've
already been an issue?), reworked `item_header.js` to just run once in
the `head`, and split scripts into `:javascripts` (run once in `head`)
vs `:javascripts_body` (run every page load in `body`).
Finally getting around to this because, with Turbo in play, it applies
to subsequent *pages* too, oops! Previously we at least had the blast
radius of this known issue constrained to the item page itself lol :p
Got some questions in Discord about account unlinking, and seeing
people look ahead to other potential integrations. Want to clarify that
unlinking will work here (barring any surprises!), and that there's no
data sharing _just_ yet!
Turbo expects this to exist, for features we don't use! I didn't bother
to check if Turbo has a way to turn this off too, I just said fine lol
Turbo worked in development without this because Rails only autoloads
files there as they're needed, whereas in production eager loading
blocked the app from starting up. But you can see the error in
development by running `rails zeitwerk:check`, which attempts to preload
everything to make sure it all works, and you get this:
```
NameError: uninitialized constant ActionCable (NameError)
class Turbo::StreamsChannel < ActionCable::Channel::Base
```
Fixed now!
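(For context, the usual shape of a fix for this kind of error is making
sure Action Cable's engine actually gets loaded; something like the
below, assuming the app requires individual railties instead of
`rails/all`. Not necessarily the exact change here!)

```
# config/application.rb (sketch)
require "rails"

# turbo-rails defines Turbo::StreamsChannel, which inherits from
# ActionCable::Channel::Base, so Action Cable has to load even though we
# never open a websocket ourselves.
require "action_cable/engine"
```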
Someone requested this in Discord, and I figured why not! I'm still
planning to move stuff away from Impress 2020 over time, I just figure
may as well have them more linked while this is still The Reality.
This doesn't really matter; I just didn't realize the `.html` part was
optional, and I guess I omitted it here by accident? But let's add it
for consistency.
The modeling code is slow! I think in production it's being cached, and
tbh I thought I had development-mode caching turned on over here, but
it's quite evidently _not_ kicking in if so. Okay! Skip for now.
Oh right, we don't have Rails UJS going on anymore, which is what
handled the confirmation prompts for deleting lists. Turbo is the more
standard modern solution to that, and should speed up certain
pageloads, so let's do it!
Here I install the `turbo-rails` gem, then run `rails turbo:install` to
install the `@hotwired/turbo-rails` npm package. Then I move
`application.js`, which runs on all pages but the outfit editor, into
our section of JS that gets run through the bundler, and add Turbo to
it.
I had to fix a couple tricky things (both sketched below):
1. The outfit editor page doesn't play nice with being swapped into the
document, so I make it require a full page reload instead.
2. Prefetching the Sign In link can cause the wrong `return_to` address
to be written to the `session`. (It's a GET request that does, ever
so slightly, take its own actions, oops!) As a simple hacky answer,
we disallow prefetching on that link.
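Both of those boil down to data attributes on the links; roughly (path
helpers and link text are made up):

```
# 1. Opt the outfit editor link out of Turbo Drive entirely, forcing a
#    full page load when you navigate to it.
link_to "Outfit editor", wardrobe_path, data: { turbo: false }

# 2. Keep Turbo, but don't prefetch the Sign In link, since that GET
#    request has the side effect of writing `return_to` to the session.
link_to "Sign in", login_path, data: { turbo_prefetch: false }
```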
Haven't fixed up the UJS stuff for confirm prompts to use Turbo yet,
that's next!
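(When we do, it'll probably look roughly like this, since Turbo reads
`data-turbo-confirm` much like UJS read `data-confirm`; the list and
path names are made up:)

```
# Before, with Rails UJS:
#   link_to "Delete", list_path(list), method: :delete,
#     data: { confirm: "Delete this list?" }
# After, with Turbo:
link_to "Delete", list_path(list),
  data: { turbo_method: :delete, turbo_confirm: "Delete this list?" }
```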
This `.gif` format is used in the items list "export to petpage"
feature, as the image URL for items whose URLs are known to contain
blocked words that prevent them from being used in petpages.
But when doing some Rails upgrade long ago, we didn't notice the new
security feature that blocks redirects to other sites without a special
flag being set. It was triggering 500 errors, oops.
Now, we set the flag!
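It's just the `allow_other_host` option on the redirect; roughly (the
URL variable is a stand-in for however the real action computes it):

```
# Rails now raises ActionController::Redirecting::UnsafeRedirectError for
# redirects to other hosts unless we opt in explicitly.
redirect_to safe_image_url, allow_other_host: true
```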
I think this is the more canonical place for stuff like this these days!
It's nice to be able to just say the short name when calling `render`.
Here's the answer I looked up about it: https://stackoverflow.com/a/9892081/107415
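Assuming the place in question is `app/views/application/` (partials
there are in every controller's lookup path, since all controllers
inherit from `ApplicationController`), the short-name rendering is just:

```
# With app/views/application/_announcement.html.erb in place, any view
# can render it by its short name (the partial name here is made up):
render "announcement"
```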
My immediate motivation is that I'm looking at creating more About
pages, and thinking about where to put them; I think maybe we trash the
`StaticController`, move these partials out to here, and move terms
into a new `AboutController`?
When we moved more logic into the main app, we made some assumptions
about manifest art that were different than Impress 2020's, in hopes
that they would be More Correct for potential future edge cases.
Turns out, they were actually *less* correct for *current* edge cases!
Chips linked us to a few examples, including this Reddit post:
https://www.reddit.com/r/neopets/comments/1b8fd72/i_dont_think_thats_the_correct_image/
Fixed now!
The Sunday 1:15am time was chosen pretty arbitrarily; I think having it
happen at a "start of week" kind of weekday is clarifying for weekly
tasks, but I chose ":15" mostly to mitigate that thing where cron jobs
all run on the hour at the same time, while still feeling normal :p
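For reference, that's `15 1 * * 0` in cron syntax. If the schedule were
managed with the `whenever` gem (just an assumption for this sketch, and
guessing that the weekly task in question is the public data dump), it'd
look like:

```
# config/schedule.rb (sketch, assuming the `whenever` gem)
# Equivalent cron line: 15 1 * * 0
every :sunday, at: "1:15 am" do
  rake "public_data:commit"
end
```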
I'm starting to port over the functionality that was previously just,
me running `yarn db:export:public-data` in `impress-2020` and
committing it to Git LFS every time.
My immediate motivation is that the `impress-2020` git repository is
getting weirdly large?? Idk how these 40MB files have blown up to a
solid 16GB of Git LFS data (we don't have THAT many!!!), but I guess
there's something about Git LFS's architecture and disk usage that I'm
not understanding.
So, let's move to a simpler system in which we don't bind the public
data to the codebase, but instead just regularly dump it in production
and make it available for download.
This change adds the `rails public_data:commit` task, which when run in
production will make the latest available at
`https://impress.openneo.net/public-data/latest.sql.gz`, and will also
store a running log of previous dumps, viewable at
`https://impress.openneo.net/public-data/`.
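The task itself is basically "mysqldump the public tables, gzip it, and
drop it somewhere the web server can serve it." A simplified sketch (the
table list, paths, and credentials handling are all stand-ins for the
real thing):

```
# lib/tasks/public_data.rake (simplified sketch)
require "fileutils"

namespace :public_data do
  desc "Dump the public tables and publish them for download"
  task commit: :environment do
    db = ActiveRecord::Base.connection_db_config.configuration_hash
    tables = %w(species colors zones)  # stand-in list of public tables
    dir = Rails.root.join("public/public-data")
    FileUtils.mkdir_p(dir)

    timestamp = Time.now.utc.strftime("%Y-%m-%dT%H%M%S")
    dump_path = dir.join("#{timestamp}.sql.gz")

    # Dump just the public tables, gzipping on the fly. (The real version
    # would handle host/credentials more carefully than this!)
    sh "mysqldump --user=#{db[:username]} --password=#{db[:password]} " \
       "#{db[:database]} #{tables.join(' ')} | gzip > #{dump_path}"

    # Point `latest.sql.gz` at the newest dump.
    FileUtils.ln_sf(dump_path, dir.join("latest.sql.gz"))
  end
end
```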
Things left to do:
1. Create a `rails public_data:pull` task, to download `latest.sql.gz`
and import it into the local development database.
2. Set up a cron job to dump this out regularly, idk maybe weekly? That
will grow, but not very fast (about 2GB per year), and we can add
logic to rotate out old ones if it starts to grow too far. (If we
wanted to get really intricate, we could do like, daily for the past
week, then weekly for the past 3 months, then monthly for the past
year, idk. There must be tools that do this!)
A few pieces here:
1. Convert all tables to the `utf8mb4` charset with the
`utf8mb4_unicode_520_ci` collation.
2. Configure that as the server's default.
3. Configure the Rails database connection to use this encoding too.
Came together pretty well, whew! This has been a LONG time coming,
`latin1` is NOT a good charset for the year 2024!
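Piece 1 ends up being a migration full of statements like the one below
(one table shown as an example; the migration class name and Rails
version are illustrative), and piece 3 is the matching connection config:

```
# Sketch of piece 1, repeated for every table:
class ConvertTablesToUtf8mb4 < ActiveRecord::Migration[7.1]
  def up
    execute "ALTER TABLE items CONVERT TO CHARACTER SET utf8mb4 " \
            "COLLATE utf8mb4_unicode_520_ci"
    # ...and so on for each of the other tables.
  end
end

# Piece 3, in config/database.yml:
#   encoding: utf8mb4
#   collation: utf8mb4_unicode_520_ci
```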
Yeah what the heck, why is this here, we have the migration to drop it
and it's already dropped in production!
Y'know, sometimes I goof a migration and it sets things in a weird
state in development, maybe we didn't recover correctly from that or
something? But idk how we would have goofed this one. Whatever! I've
manually dropped it from my development machine, and it was already
correctly dropped on production, so, go figure!
This reverts commit cc339672b1.
Oh, wait, no, the state the schema file was in *was* correct. I'm not
sure why… huh ok, I need to debug my local state.
Did we end up behind on this migration in development? Idk! Weird! I'm
doing other migration stuff now and noticing this change slipping in
and I'm like. Huh! You should've already been there!
According to our GlitchTip error tracker, every time we deploy, a
couple instances of `Async::Stop` and `Async::Container::Terminate`
come in, presumably because:
1. systemd sends a STOP signal to the `falcon host` process.
2. `falcon host` gives the in-progress requests some time to finish up.
3. Sometimes some requests take too long, and so something happens
(either a timer in Falcon or a KILL signal from systemd, not sure!)
that leads the ongoing requests to finally be terminated by raising
an `Async::Stop` or `Async::Container::Terminate`. (I'm not sure
when each happens, and maybe they happen at different points in the
process? Maybe one happens for the actual long-running ones, vs the
other happens if more requests come in during the meantime but get
caught in the spin-down process?)
4. Rails bubbles up the errors, our Sentry library notices them and
sends them to GlitchTip, the user presumably receives the generic
500 error, and the app can finally close down gracefully.
It's hard for me to validate that this is *exactly* what's happening
here or that my mitigation makes sense, but my logic here is basically,
if these exceptions are bubbling up as "uncaught exceptions" and
spamming up our error log, then the best solution would be to catch
them!
So in this change, we add an error handler for these two error classes,
which hopefully will 1) give users a better experience when this
happens, and 2) no longer send these errors to our logging 🤞❗️
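Concretely, the handler is roughly this shape (a sketch; the real one
might live elsewhere or render something friendlier):

```
# app/controllers/application_controller.rb (sketch)
class ApplicationController < ActionController::Base
  # When Falcon/systemd shut us down mid-request, surface it as a normal
  # "service unavailable" response instead of an uncaught exception.
  rescue_from Async::Stop, Async::Container::Terminate do
    render plain: "The server is restarting. Please try again!",
      status: :service_unavailable
  end
end
```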
That strange phenomenon where the best way to get a noisy bug out of
your logs is to fix it lmao
Oh rough, when moving an item list entry from one list to another, our
logic to merge quantities when the item is already in the destination
list was just fully crashing!
That is, moves without anything to merge were working, but moves that
required a merge were failing with a 500 Internal Server Error, because
the `list_id` attribute wasn't present.
I'm not sure why this ever worked, I'm assuming using `list_id` in the
`where` condition would include it in the `select` implicitly in a
previous version of Rails? Or maybe Rails used to have fallback
behavior to run a second query, instead of raising
`MissingAttributeError` like it does now?
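To illustrate the failure mode (with made-up model and association
names, not our real ones): when a record is loaded with a partial
`select` that omits `list_id`, reading that attribute later raises.

```
# Hypothetical names, just to show the Rails behavior:
entry = ListEntry.where(list_id: source_list.id, item_id: item.id)
                 .select(:id, :quantity)  # note: no :list_id
                 .first
entry.quantity  # fine
entry.list_id   # raises ActiveModel::MissingAttributeError nowadays
```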
Well, in any case, this seems to fix it! Whew!
I don't think people see this very often visually, but it's showing up
in our error logging! The Rails API changed here long ago and we didn't
notice: to render public files, we should use the `file` argument
instead of `template`.
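That is, roughly this (the specific file and status code are just an
example):

```
# The old `render template: ...` call for a file outside the app's view
# paths now needs to be `render file: ...`, e.g.:
render file: Rails.root.join("public/404.html"),
  layout: false, status: :not_found
```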
Trying something new and lightweight and more data-controlled!
I also turned down the sample rate for the performance traces feature,
because we hardly use it right now, and Sentry is always getting mad at
us for vastly exceeding our free plan quota—and like, we're not on
Sentry anymore so I imagine we have more wiggle room with that, but I
figure let's turn down the volume anyway, until we decide we want it.
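The sample-rate part is a single knob in the Sentry SDK config, which
GlitchTip accepts as-is since it speaks the Sentry protocol (the DSN env
var name and the exact rate here are illustrative):

```
# config/initializers/sentry.rb (sketch)
Sentry.init do |config|
  config.dsn = ENV["GLITCHTIP_DSN"]
  config.traces_sample_rate = 0.05  # turned way down from before
end
```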
I previously added a warning to the item page, and thought about doing
one here but was sicky and misjudged the complexity and forgot you
don't need to hook into the `knownGlitches` API field to do it! Easy
peasy for a hacky little bug message!
I *think* what I'm observing is that:
1. The zone restrictions are different between these items.
2. The zone restrictions *change* when reloading the page sometimes. (I
assume from remodeling?)
3. The items look very buggy on many pets, because many appearances
seem to expect different zone restrictions than the item actually
has.
I think what this means is:
1. TNT has finally unbound restricted zones from the item level, and
allowed different appearances to have different restrictions. Neat!
2. The API still serves it the same way, as a field on the item.
So I think this means we need to update our schema to reflect the fact
that an item's `zones_restrict` field isn't *really* a property of the
item; it's a property of the combination of the item and the current
body ID.
My gut take here is that maybe this means it's time for the Large
Refactor that I've kinda been interested in for a while, but been
avoiding because of Impress 2020 compatibility issues: instead of a
`body_id` field on assets, and having them directly belong to items,
make an `ItemAppearance` record (closer to how 2020's GQL API modeled
it, I was looking ahead to this possibility!) that's keyed on item and
body ID, and assets belong to *that*.
Then, we could move the zones restriction field onto the
`ItemAppearance` record instead. And then it doesn't really matter to
us how TNT models it internally; whatever we saw is what we use.
(Again, I looked ahead to this in the 2020 app, and tried to use the
`restrictedZones` field on `ItemAppearance` when possible—even though
it secretly just reads directly from the `Item`!)
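To make the idea concrete, the rough shape would be something like this
(very much a sketch, not a design; the names are just obvious guesses):

```
# Sketch of the possible future modeling:
class ItemAppearance < ApplicationRecord
  belongs_to :item
  has_many :assets  # assets would belong here instead of to the item

  # Keyed on (item_id, body_id), with zone restrictions stored here so
  # they can differ per body instead of living on the item:
  #   t.integer :body_id
  #   t.string  :zones_restrict
end
```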
…but that's a pretty big departure from how things are modeled now, and
isn't something we can just throw together—especially coordinating it
across both apps. I was getting close to being able to shut off 2020
from a *front-facing* perspective (but still keeping a lot of the GQL
endpoints open for the wardrobe-2020 frontend), but I don't think we're
very close to being able to turn off 2020's *backend*, which would be a
prereq to this; or at least, if we try, we should expect that to take a
while. (Counting now, there's still 9 GQL queries—not as many as I
expected tbh, but still quite a few.)
So idk how to sequence this! But for now, let's put out a warning, and
start setting expectations.