Preparing a better endpoint for wardrobe-2020 to use! I deleted the
now-unused swf_assets#index endpoint, and replaced it with an
"appearances" concept that isn't exactly reflected in the database
models but is a _lot_ easier for clients to work with imo.
Note that this was a big part of the motivation for the recent
`manifest_url` work—in this draft, I'm probably gonna have the client
request the manifest, rather than use impress-2020's trick of caching
it in the database! There's a bit of a perf penalty, but I think that's
a simpler starting point, and I have a hunch I'll be able to make up
the perf difference once we have the impress-media-server managing more
of these responsibilities.
Ok so, impress-2020 guesses the manifest URL every time based on common
URL patterns. But the right way to do this is to read it from the
modeling data! But also, we don't have a great way to get the modeling
data directly. (Though as I write this, I guess we do have that
auto-modeling trick we use in the DTI 2020 codebase, I wonder if that
could work for this too?)
So anyway, in this change, we update the modeling code to save the
manifest URL, and also the migration includes a big block that attempts
to run impress-2020's manifest-guessing logic for every asset and save
the result!
It's uhh. Not fast. It runs at about 1 asset per second (a lot of these
aren't cache hits), and sometimes stalls out. And we have >600k assets,
so the estimated wall time is uhh. Seven days?
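A quick sketch of the arithmetic behind that estimate. The count is only the lower bound mentioned above and the rate is the rough observed throughput, so the result is approximate:

```ruby
# Rough wall-time estimate for the backfill, using the numbers above.
asset_count = 600_000       # lower bound; the real count is somewhat higher
assets_per_second = 1.0     # rough observed throughput

seconds = asset_count / assets_per_second
days = seconds / (60 * 60 * 24)

puts format("~%.1f days", days) # ~6.9 days
```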
I think there's something we could do here around like, concurrent
execution? Though tbqh with the nature of the slowness being seemingly
about hitting the slow underlying images.neopets.com server, I don't
actually have a lot of faith that concurrency would actually be faster?
I also think it could be sensible to like… extract this from the
migration, and run it as a script to infer missing manifest URLs. That
would be easier to run in chunks and resume if something goes wrong.
Cuz like, I think my reasoning here was that backfilling this data was
part of the migration process… but the thing is, this migration can't
reliably get a manifest for everything (both cuz it depends on an
external service and cuz not everything has one), so it's a perfectly
valid migration to just leave the column as null for all the rows to
start, and fill this in later. I wish I'd written it like that!
But anyway, I'm just running this for now, and taking a break for the
night. Maybe later I'll come around and extract this into a separate
task to just try this on all assets missing manifests instead!
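A minimal sketch of what that extracted task could look like, in plain Ruby rather than real model code. The `assets` hashes and the `guess_manifest_url` callable are hypothetical stand-ins for the asset rows and the manifest-guessing logic:

```ruby
# Hypothetical sketch: backfill missing manifest URLs in resumable chunks.
# `assets` stands in for database rows (plain hashes here), and
# `guess_manifest_url` is a made-up callable for the guessing logic.
def backfill_manifests(assets, guess_manifest_url, batch_size: 100)
  # Only touch rows that are still missing a manifest, so a rerun resumes
  # where the last run left off.
  missing = assets.select { |asset| asset[:manifest_url].nil? }
  failed_ids = []

  missing.each_slice(batch_size) do |batch|
    batch.each do |asset|
      begin
        # May return nil: not every asset has a manifest.
        asset[:manifest_url] = guess_manifest_url.call(asset)
      rescue StandardError
        # External service hiccup: leave the value null and retry next run.
        failed_ids << asset[:id]
      end
    end
  end

  failed_ids
end
```

One caveat with this shape: assets that genuinely have no manifest stay null, so they would be retried on every run unless we track them separately.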
Ahh, I guess I missed these, I think they're maybe not actually used in
the app is why? cuz they're all default values that are overridden at
the actual call sites. But I ran into it when running `Pet.load` in the
console, and yeah let's just fix 'em up!
This hasn't worked for a while, and I don't know an API off the top of
my head to drop in for it. Let's just delete it for now, and revisit it
later if we want to!
Dang, I'm really wishing I'd opened this sooner cuz I didn't realize it
would be THIS easy!!
The bug was that the `t` method started taking Ruby keyword params
instead of a hash object for `options`, so the syntax changed.
Womp womp!
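To illustrate the syntax change, here is a small sketch with a stand-in `t` method. This is not the real Rails helper, just a hypothetical method with the same keyword-params shape:

```ruby
# Stand-in for a `t`-style helper that takes keyword params (hypothetical;
# the real method comes from Rails).
def t(key, **options)
  details = options.map { |name, value| "#{name}=#{value}" }
  [key, *details].join(" ")
end

options = { name: "world", count: 3 }

# The old call style passed the hash positionally: t("greeting", options)
# The new call style splats it into keyword params instead:
puts t("greeting", **options)
```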
Really don't know why this wasn't a problem with Apollo (or was it??),
but yeah, don't save when there's a save error!! Then we reset the
mutation state when the outfit state changes.
It's weird to be reading this code and be like. was this not always an
issue? Maybe something in Apollo prevented this? Did we use optimistic
UI or something? Idk?
There's still an issue with it infinitely retrying in an error state
though.
Just sharing this out to gather info, since this might be coming kinda
soon!
I also moved the announcement higher up in the template, because it
gets broken on the user lists page which uses floats quite a bit for
the site header—and tbh I feel like this is better anyway lol.
In the impress-2020 app, we use this to prepopulate certain GraphQL
data into the Apollo cache when SSR'ing a page. We don't do that here,
so, goodbye!
The wardrobe-2020 app had a cute drawer that embeds the item page, but
honestly I don't think it was that valuable, and especially not when it
means we have to basically maintain two item pages lol. Let's decrease
the surface area!
A really really simple change! It works on the item page, the item
index page, item search, the homepage, and the item lists page.
The main reason I avoided this for so long (even before modernizing the
Rails app) was that the ElasticSearch stuff felt like it made it messy?
But now it's pretty simple, and it works in search already cuz I did
that when I implemented item search, so, nice!
Ok cool, I have just not been running any of this since moving out of
impress-2020, but now that we're doing serious JS work in here it's time
to turn it back on!!
1. Install eslint and the plugins we use
2. Set up a `yarn lint` command
3. Set up a git hook via husky to lint on pre-commit
4. Fix/disable all the lint errors!
Rather than letting the fact that the server API models outfits a bit
differently (underscore keys, integer IDs for things) leak into the rest
of the client code, I'd rather convert it to the familiar field names
and expected types!
This came in a few parts!
1. Add meta tags to let us know we're logged in.
2. Install React Query, which has the data-loading sensibilities I like
about Apollo without the GraphQL that has honestly been a drag.
3. Replace the outfit-loading and outfit-saving calls with API calls to
the main app.
4. Update the main app's API calls to use our more flexible data
constructs like "pose".
Would've loved to do this more incrementally, but it's hard to! You
can't split out outfit-loading and outfit-saving, or auth from any of
that, or the state gets all out-of-sorts.
Still, this is a good nugget we've pulled out all-in-all, and one that
people have been asking for! Can maybe look to logged-in item search
soon too, for own/want data?
Moreover, this code is in a bit of a flimsy state anyway: it'll kinda
work if you're logged in at impress-2020.openneo.net, but that wasn't
intentionally engineered, and I'm not sure exactly what circumstances
cause those cookies to send vs not send?
My intent is to clean this up and replace it all with login code that
references the main app instead… this'll require swapping out some
endpoints to refer to what the main app has, but I'm hoping that's not
so tricky to do, since the main app does offer basically all of this
functionality already anyway. (That's not to say it's simple, but I
think it's a migration in the right direction!)
I used the new profiler tools on this page, and noticed a lot of
allocations in the Globalize library, which we use for translating
database records. I realized that we were loading all of the fields of
not just all of the items on the page, but all of their translation
records in all locales! We used to scrape data for lots of languages, so
that can be quite a lot!
Unfortunately, Rails's `includes` method to efficiently preload related
records always loads all fields, and simply can't be overridden.
So, in this change we write manual preloading code, to identify the
records we need, load them in big bulk queries, and assign them back to
the appropriate associations. Basically just what `includes` does, but
written out a bit more, to give us the chance to specify SELECT and
WHERE clauses!
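The general shape of that manual preloading, sketched in plain Ruby. The Structs and the `load_translations` callable are hypothetical stand-ins for the models and for one bulk query with its own SELECT and WHERE clauses:

```ruby
# Plain-Ruby sketch of manual preloading (not the real ActiveRecord code).
SketchItem = Struct.new(:id, :translations)
SketchTranslation = Struct.new(:item_id, :locale, :name)

def preload_translations(items, locale, load_translations)
  # 1. Identify the records we need.
  item_ids = items.map(&:id)

  # 2. Load them in one bulk query (only this locale, only needed fields).
  translations = load_translations.call(item_ids, locale)

  # 3. Assign them back to the appropriate associations.
  by_item_id = translations.group_by(&:item_id)
  items.each { |item| item.translations = by_item_id.fetch(item.id, []) }
end
```

The point of writing it out is step 2: that is the spot where we get to narrow the query, which `includes` doesn't give us.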
It shows up in development always, and if you're logged in as Me
Specifically in production!
I'm using this to poke at memory usage for pages that seem suspicious.
I don't know why our app reliably grows so large in RAM, but my hunch is
that maybe there are some pages that just use a truly large amount to
begin with - and I've learned Ruby doesn't release memory back after
it's GC'd, it just grows the process and keeps the free space to itself
in its own heap!
So I'm just eyeing pages that I know *can* have a lot going on, and
seeing what I find!
We used to do this for weird clever caching tricks that I don't think
were actually very effective. We stopped using this a few months ago,
and now I'm finally cleaning up this supporting code!
Huh, Arel can *sometimes* handle just having an attribute stand in as
"X is true" in a condition, but sometimes gets upset about it. I guess
this changed in Rails since we recently wrote this?
Specifically, item search would crash on "is:nc" (but *not* "is:np"),
saying:
```
undefined method `fetch_attribute' for #<struct Arel::Attributes::Attribute relation=#<Arel::Table:0x0000000109a67110 @name="items", @klass=Item(…), @type_caster=#<ActiveRecord::TypeCaster::Map:0x0000000109a66e90 @klass=Item(…)>, @table_alias=nil>, name="is_manually_nc">
```
The traceback was a bit misleading (it happened at the part where we
merge all the scopes together), but that hinted to me that it was working
with an attribute in a place where it expected a conditional. So I
converted the attribute in the `is_nc` scope to a conditional, and made
the matching change in `is_np`, and that fixed it! Ok phew!
The URL anchors were getting like. double-encoded? The `closet[]` part
was encoding as `closet%255B%255D`. Maybe a thing in Rails, where you
need to mark them `html_safe` to insert them in a URL like that?
Well anyway, those URLs are redundant now, I just have it link straight
to the same outfit page as the big link!
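For reference, the double-encoding is easy to reproduce in plain Ruby. This only shows the symptom, not where in the app the second encoding pass was happening:

```ruby
require "uri"

once = URI.encode_www_form_component("closet[]")
twice = URI.encode_www_form_component(once)

puts once   # closet%5B%5D
puts twice  # closet%255B%255D (the "%" itself got encoded as "%25")
```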
Now, like in DTI 2020, opening an outfit will go straight to the editor.
I'm not 100% on whether this is actually like. the superior behavior?
But I think it's good enough, and it's what the wardrobe-2020 code
expects, so let's just roll with it for now!
Ohh I see, I made a mistake converting this from Next.js routing. It's
not that we had a URL search parameter named `outfitId`; it's that if
you were coming from the `/outfits/:outfitId` route, it would use that!
I still haven't gotten the rest of the site to point that route to this
page, but I'll do that in a later change.
Notable things:
- We used to have the parameters in the hash (`#`) part of the URL.
- We used to use the key `outfit=123` instead of `outfitId=123`.
In this change, we add backwards-compatibility for these things, while
still keeping the latest behavior too, with no change to the URLs we
generate!
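The lookup rules can be sketched like this. It's written in Ruby to match the rest of these notes, even though the real change lives in the client-side code. The helper name, the example URLs, and the precedence (query string before hash) are all assumptions for illustration:

```ruby
require "uri"

# Hypothetical helper: find the outfit ID in either the query string or the
# legacy hash part, under either the current or the legacy key name.
def outfit_id_from_url(url)
  uri = URI.parse(url)
  # Assumed precedence: current location (query) first, then legacy (hash).
  [uri.query, uri.fragment].compact.each do |encoded_params|
    params = URI.decode_www_form(encoded_params).to_h
    id = params["outfitId"] || params["outfit"]
    return id if id
  end
  nil
end

puts outfit_id_from_url("https://example.com/outfits/new?outfitId=123")
puts outfit_id_from_url("https://example.com/outfits/new#outfit=456")
```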
Looks like the version of Prettier I just installed is v3, whereas our
last run in the impress-2020 repo was with v2. I don't think we had any
special config in that project, I think these are just changes to
Prettier's defaults, and I'm comfortable accepting them! (Mostly seems
like a lot of trailing commas.)
Idk if this used to be different or what, but it looks like the current
behavior is: if you delete a closet list, it'll leave the hangers
present, but Classic DTI would not show them anywhere; but Impress 2020
(until recently) would crash about it.
Now, we use `dependent: :destroy` to delete the hangers when you delete
the list (which I think makes sense, and is different than what I
decided in the past but that's ok, and is what the current behavior
*looks* like to people!), and we add a migration that deletes orphaned
hangers.
The migration also outputs the deleted hangers as JSON, for us to hold
onto in case we made a mistake! I'm also backing up the database in
advance of running this migration, just in case we gotta roll back HARD!
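A plain-Ruby sketch of the delete-and-dump idea, using in-memory hashes instead of real rows. The field names are hypothetical, and it assumes a hanger with no list at all is not an orphan:

```ruby
require "json"

# Hypothetical in-memory stand-ins for the hanger rows and list IDs.
def delete_orphaned_hangers(hangers, list_ids)
  orphaned, kept = hangers.partition do |hanger|
    # Orphaned: points at a list that no longer exists. A nil list_id is
    # assumed to be valid (no list), so those hangers are kept.
    hanger[:list_id] && !list_ids.include?(hanger[:list_id])
  end

  # Emit the deleted rows as JSON, so they can be restored if needed.
  backup_json = JSON.pretty_generate(orphaned)
  [kept, backup_json]
end
```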
This is an important workflow for people doing art stuff, I'm told! They used to use the Classic DTI broken image UI for this, but now that that's uhh Fully Gone, let's add this more explicitly!
Ah right, the CSS reset only applies in the ScopedCSSReset container, which doesn't work for elements portaled out with the <Portal> component (which a LOT of Chakra components use for things like tooltips etc).
Here, we take advantage of <Portal> having a hardcoded classname `.chakra-portal`, and apply the reset to those elements too!
This was used by the Neopia server to send us the modeling data it requested out-of-band. But now we do all our modeling requests back in-app again, so we don't need this!
Okay, this is a process that idk if it's even been working for a while anyway, I don't think Neopets translates item names anymore?
And it's crashing when I try to model stuff now, so like. yeah ok I'm fine with just skipping this, it's a shame to lose out on potential data going forward but *I think there just isn't data to get anyway*
I think we used this for both conversion to image, and also for CORS stuff when rendering Flash-based previews… let's trash it, I don't want to be growing our hard drive with files I don't think we use anymore!
If I'm wrong and it turns out we do use them for something, then like. hey I'm sure we'll find out soon enough, and it's a very recoverable operation.
I hope this doesn't cause problems! But yeah, with Puma doing threading, and maybe switching to Falcon someday to get even better concurrency properties, I feel like this will probably be fine?
And it makes the UX a loootttt better, to be back in the world where all these forms just work, whew.
Oh okay, I was misinterpreting the error: it was that our NEOPETS_URL_ORIGIN secret value isn't the real Neopets.com IP address anymore, so amfphp requests were just plain *always* failing in production. Oops!
I've removed that environment variable from our production config, and now modeling is working in the bulk thing!
Also I'm noticing that we're using puma these days, which does good threading stuff. I think there might be merit to switching over to Falcon because of just how async-y our stuff is, but having 5 threads going is honestly probably good enough that I don't need to worry too much about mutual blocking, and could probably just write stuff to get Neopia out of the picture like *right now*. Neat!
Okay so… I'm worried about this because of Rails's whole single-threaded situation, which doesn't really let it handle blocking on external network requests very well.
Ultimately I think we're gonna have to do a clever thing but idk quite what?
I should look into whether like, puma + the new async stuff can enable Rails to be more tolerable about this, and handle a few requests at once, instead of having to have the Neopia server doing it. (Right now, the Neopia server isn't really doing its job quite right, because it depends on the Rails app being *local* to send stuff to it.)
But for now, let's just extend the timeout, cuz it's basically always getting hit in production—because there's currently no other way to do modeling, oops lol
Just find_all_by's that I never cleaned up
Oddly enough, I still got a "neopets seems down" message out of this, idk if that's an actual bug or just sluggishness rn
Okay, right, if we're just using www.neopets.com (like we are for now), it fails on http://www.neopets.com because it triggers a redirect that we don't follow.
So here I 1) change the default to HTTPS, and 2) add HTTPS support to our little RocketAMF lib
Just cleaning up a bit! I'm sure there's more to remove, these were just some clear candidates: old wardrobe code, and stuff in `public` that I just fully don't recognize and don't think is doing anything? (We'll find out if something crashes though lol!)
Oops, this was causing the page to render in a weird zoomed-out way on mobile!
Note that, for most of the site, we intentionally haven't added this tag yet because most of our pages aren't especially responsively-designed; so we _want_ the device's best attempt to work with that, rather than trying to enforce something.
This required a buncha fixes to how SASS scoping works! Needed to add a bunch of imports for stuff that previously would get read from the global scope by being imported *after* the constants and mixins etc.
There's clearly a lot of refactor opportunity here, but I'm not gonna worry about it!!
I wasn't sure what we were actually using it for, turns out it was mostly polyfills for CSS features that are very standard now!
I didn't audit these changes very carefully tbqh because they seemed pretty simple? Fingers crossed!
Eyyy tasty! There were some issues with conflicting styles with the main app, but I think we got it!
Scoping Chakra's CSS reset was a big deal to not accidentally overwrite the app's own styles lol, and we had to solve a specificity problem for that, thanks Aria for the :where tip!! <3
We never had a specific reason why we didn't use the router for this I don't think? Not that I wrote down anyway. Let's just switch it over and see what happens!
I mainly did this as a misdiagnosis of the page reload problem fixed in c162864, but it seems like a good idea to try out anyway!
This I think is why the page was reloading when you try to item search? The failed import was triggering our "hey maybe this is an old module URL that got deleted" code?
We add jsbuilding-rails to get esbuild running in the app, and then we copy-paste the files we need from impress-2020 into here!
I stopped at the point where it was building successfully, but it's not running correctly: it's not sure about `process.env` in `next`, and I think the right next step is to delete the NextJS deps altogether and use React Router instead.
Nice, just turning it on seemed to do all we need for now!
Fair questions to be asked about like, should you be able to look up by username instead of email? But like idk, this feels simpler *and* more solid, to give you feedback on if it's the right email.
In the login case, we save the `return_to` parameter in the session, because login can be a multi-step process.
In the logout case, we just read it directly from the form params.
Note that you *could* end up in a weird scenario where an old return_to value sticks around for a bit? But we have the sense to delete it when we use it on a successful sign-in, and most links to the login page come with a `return_to` param which should reset it. So, you'd have to 1) have started but not finished a sign-in, 2) during the same session, and 3) get to the login page by an unusual means.
Probably fine!
This is a bit more standard, and has the bonus of being compatible with Devise, which is using `flash[:notice]` and so its flashes were coming out unstyled, oops!
Hey nice!!
Note that I removed an account delete button from the settings page. You can still send a DELETE request to the right endpoint to do it, but it's not gonna delete all the associated records, and I wanna think a bit about how to handle that better before exposing that button.
I noticed this was blocking you from changing your default list visibility, bc the contact neopets connection can't be empty, so I fixed that!
And then I just decided to scroll through every `belongs_to` relationship and add optional to the ones that jumped out at me lol
A lot of rough edges here (e.g. no styles on the flash messages), but it's working and that's good!!
I tested this by temporarily switching to the production database and logging in as matchu!
Still missing a lot of big features too, like registration, password resets, settings page, etc.
This removes login/logout/session logic for integrating with OpenNeo ID, replacing them with stubs that just redirect to `/?TODO` when you click login, and helpers that act as if you're not logged in.
This gives us a clean slate to plug in new Devise logic to integrate with the `openneo_id` database directly!
No user-facing functionality here yet, just configuring the database connection to work with openneo_id records.
This is a first step in integrating Devise stuff into this app instead of connecting with a weird second app.
My basic testing for this was to temporarily connect to production `openneo_id`, and see `AuthUser.first` correctly return a user!
I had added this many Rails versions ago during the recent upgrade process, because it was in latest Rails but not in the version of Rails I was using when replacing Elasticsearch with MySQL queries. We can remove it now!
lmao I keep forgetting things! note that the negative case of this filter, like the negative case of `fits`, is currently broken because Rails changed the default SQL mode and I didn't notice! We'll need to add a `database.yml` file and set `sql_mode: TRADITIONAL`.
Whew! Seems like a pretty clean one? Ran `rails app:upgrade` and stuff, and made some corrections to keyword arguments for `translate` calls. There might be more such problems elsewhere? But that's hard to search for, and we'll have to see.
This one was pretty straightforward yaay! Main thing was the change from `render file` to `render template` in a couple places, oh and a thing with complex `order()` clauses.
I ran `rails zeitwerk:check`, which eager-loads the app, and it found two problems: `closet_group.rb` doesn't define `ClosetGroup` (cuz it's empty), and I left in a reference to a cache sweeper observer oops. Goodbye!
Rails 5 added new validation on `belongs_to` to ensure the corresponding record exists. In the case of moving to the null list, this shouldn't trigger!
I wish we could flag that specifically `nil` is okay, but other values should be validated? But oh well, this is fine!
Ok so weird little situation, usually Arel will accept an attribute as a param to `order()`, but not when it's in a very specific situation of all of the following:
`Item.joins(:translations).includes(:translations).limit(30).order(Item::Translation.arel_table[:name])`
For some reason, it's all like "hey I can't call `to_sql` on an attribute!", but only in the scenario where all 3 of those other things are present. Weird!
Anyway, explicitly saying `.asc` fixes this. Ok!
Some important little upgrades but mostly straightforward!
Note that there's still a known issue where item searches crash, I was hoping that this was a bug in Rails 4.2 that would be fixed on upgrading to 5, but nope, oh well!
Also uhh I just got a bit silly and didn't actually mean to go all the way to 5.2 in one go, I had meant to start at 5.0… but tbh the 5.1 and 5.2 changes seem small, and this seems to be working, so. Yeah ok let's roll!
Some tricks required here to get the dependencies to work out, but we got it!!
Oh also, we move away from the rbenv in Ubuntu's package manager, because it doesn't support more recent Rubies like 2.4.10.
This labeling technique hasn't worked in a long time bc it requires being logged in. These days we just manually label them with the 2020 support tools I think!
Clearing out the Neopets gem should help us manage some gem dep conflicts in the 4.2 upgrade too (I think the nokogiri one gets tricky?)
At one point we piloted a "Camo" service to proxy HTTPS image urls for us, but it doesn't exist anymore.
We already have proxies and stuff for this, so I left `Image` as a placeholder for this, but it's not working yet!
This also deletes our final reference to the Addressable gem, so we can remove it!
I don't think these work anymore, and our volunteers get new items into the db fast anyway, Impress 2020 is doing better spidering these days. And then we get to remove the cron job `whenever` gem!
Using `s3_path` and stuff made it sound like we were still referencing the original Amazon S3 images - but actually our new asset proxy just uses the same path structure, and we didn't change anything about it.
Oh also I deleted an after_conversion method that isn't used anymore, forgot about that!
We've already swapped out the backend for this stuff to Impress 2020, so the resque task and the broken image report UI aren't actually relevant anymore. Delete them!
This helps us delete Resque soon too.
Idk this one might actually be a bit of a pain to load? But I'd want to optimize it differently anyway, and there's overhauls we're already planning to do here.
Huh! This cache key seemed to only be referenced in checks and expirations, but was never actually used! So I guess we've been loading the modeling predictions every time for a while huh??
We'll get smarter about that someday, but anyway, that lets us delete our Item resque tasks and ItemObserver!
Again I'm just not convinced of the perf on this, and it enables us to delete some whole infra over it, we can improve it another time if it's useful to!
Just removing some caching and the expiration of it! There's still more superfluous(?) caching on the item page to audit, but these seem a bit more sensible about avoiding loading extra data.
In the interest of clearing out Resque, I'm just gonna remove a lot of our more complex caching stuff, and we can do a perf pass for things like big item list pages once everything's upgraded. (I'm hopeful that the upgrades themselves improve perf; and if not, that some improved sensibilities 10 years later can find simpler approaches.)
We uninstalled Flex, our Elasticsearch gem, to replace item search with direct DB queries; but I forgot these calls, oops!
I also kinda want to see about deleting the resque tasks altogether, since I'm not sure how to get Resque installed on latest rails bc there seems to be a conflict over the version of Rack? And it'd be nice to get rid of the complexity if we can.
Back in the day, `all` would immediately load up a query into an array, but now I think it's an alias for what `scoped` used to be: a relation that contains everything.
I want to test some logged-in stuff, but the whole openneo_id app is a mess to integrate with (and I want to eliminate it down the line anyway), so here's a simple hacky thing that just gets you into a test user for development!
Not being a subquery is better! I realized later that a LEFT JOIN would probably do it even betterer? with like `HAVING count(x) = 0`? but the `left_outer_joins` method doesn't seem to be in Rails 4, and I don't want to do stringy joins, so this is fine for now!
Right, previously we were querying "has *at least one asset* that is not in zone X" instead of "has NO assets that are in zone X".
I don't know a fast way to query for that, this will have to do for now!
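The difference between the two conditions is easy to see on in-memory data. This is a plain-Ruby sketch with a made-up Struct, not the actual query:

```ruby
SketchAsset = Struct.new(:zone_id)

# The buggy meaning: "has at least one asset that is not in zone X".
def asset_outside_zone?(assets, zone_id)
  assets.any? { |asset| asset.zone_id != zone_id }
end

# The intended meaning: "has NO assets that are in zone X".
def no_assets_in_zone?(assets, zone_id)
  assets.none? { |asset| asset.zone_id == zone_id }
end

mixed = [SketchAsset.new(1), SketchAsset.new(2)]
puts asset_outside_zone?(mixed, 1) # true, even though one asset is in zone 1
puts no_assets_in_zone?(mixed, 1)  # false
```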
Not doing the tricks with `is_positive` anymore, instead just calling different functions altogether at the call site.
Also, instead of classes, I feel like this is a lot more concise to just write as class methods that create certain instances of a trivial `Filter` data class. Without the tricks of `is_positive` in play, the value of classes goes way down imo.
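Roughly the shape described here, as a sketch. The filter names and fields are made up for illustration and aren't the real ones from the search code:

```ruby
# Hypothetical sketch: one trivial data class, plus class methods that
# build specific instances. Positive and negative cases are separate
# constructors instead of an `is_positive` flag.
SketchFilter = Struct.new(:description, :predicate) do
  def self.name_includes(text)
    new("name includes #{text.inspect}", ->(item) { item[:name].include?(text) })
  end

  def self.name_excludes(text)
    new("name excludes #{text.inspect}", ->(item) { !item[:name].include?(text) })
  end

  def matches?(item)
    predicate.call(item)
  end
end

filter = SketchFilter.name_excludes("hat")
puts filter.description
puts filter.matches?({ name: "red wig" })
```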
Ohh ok, without this change all of our `scope`s were just immediately evaluating the argument and fetching _all_ such matching records immediately, instead of waiting to actually be called. This led to bugs like `pet_type.as_json` returning ALL pet states in the whole db, because the `PetState.emotion_order` scope was being treated as a single predefined query, rather than a query fragment to merge into the current context.
This also explains what happened in 724ed83: that's why things before the scope in the query were being ignored.
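A toy model of the eager-vs-lazy difference, in plain Ruby. `ToyModel` is a made-up stand-in and not how ActiveRecord is implemented; in the app itself, I believe the fix takes the form of giving `scope` a lambda body:

```ruby
# Toy stand-in to show why a scope body needs to be deferred.
class ToyModel
  RECORDS = [3, 1, 2].freeze

  # Eager: evaluated once, at class-definition time, against everything.
  EAGER_SORTED = RECORDS.sort

  # Lazy: a fragment applied to whatever records we have in hand when it's
  # actually called.
  LAZY_SORTED = ->(records) { records.sort }
end

subset = [2, 1]
p ToyModel::EAGER_SORTED              # [1, 2, 3]: ignores our subset entirely
p ToyModel::LAZY_SORTED.call(subset)  # [1, 2]
```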
lol again this is hard to test so uhh I hope this didn't break it all!! though tbh I feel like we removed this feature or something anyway? idk it stopped working in some way
Tbh I'm not 100% sure this is a fix, I'm not sure what `haml_concat` was doing here, and the page is still crashing so it's hard to say. But fingers crossed!
Idk why, but when the `select` was the first thing in the query, it was getting ignored. I wonder if there's something about the `object_assets` scope that I'm not understanding that's overwriting it? Or the `joins`? But whatever, this works, I'm not worried about it for now!
The controller was like "oh yeah we have that cached" (from previous renders of the app on Rails 3 I think?), but the view disagreed, bc it was appending a template digest to the cache key. That's a smart feature, but not compatible with how we skip queries in the controller, so disable it for now!
We'll need to replace the item search query stuff with direct MySQL queries, but that's not ready yet bc the app still isn't booting, so we're committing this in a known broken state for now!
Rather than figure out how to upgrade the Stripe gem to be compatible with future Rails, I'd rather just delete the references, since it's currently unused.
I'm not so bold as to go in and fully trash all our donation code; I just want to ensure we're not sending people down broken codepaths, and that if they reach them, the error messages are clear enough.
We set up `impress-asset-images.openneo.net` to redirect to the right asset, without needing to depend on AWS anymore for HTML5-converted items!
Our quick fix for this: always serve `has_image: true` to the frontend, so it always tries to use the image, regardless of whether we've marked it as converted in the database. (We've turned off the converters too!)
Oh, yeah, shit, okay, when we set `self.url` like that, it's supposed to be the _canonical_ URL for the SWF, not our proxied one—this is the URL that's gonna go in the database.
We do proxying late in the process, like when we're actually setting up to download something, but for just referencing where the asset lives, we use `images.neopets.com`.
In this change, we revert the use of `NEOPETS_IMAGES_URL_ORIGIN`, but we _do_ update this to `https` for good measure. (We currently have both HTTP and HTTPS urls in the database, I guess neopets.com started serving different URLs at some point, this is probably the future! And anything interpreting these URLs will need to handle both cases anyway, unless we do some kind of migration update situation thing.)
We're migrating the incorrect assets with the following query (with the limit changed to match the number we currently see in the DB, just as a safety check):
```
UPDATE swf_assets SET url = REPLACE(url, 'http://images.neopets-asset-proxy.openneo.net', 'https://images.neopets.com') WHERE url LIKE 'http://images.neopets-asset-proxy.openneo.net%' ORDER BY id LIMIT 2000;
```
Okay, like in the previous commit, we're dealing with forced HTTPS, on a server that isn't going to cooperate with our dependencies' HTTPS version. And this time, I don't think there's a secret origin server that will accept `http://` requests for us.
Thankfully, we have the perfect hack in our back pocket: our own pre-existing images.neopets.com proxy server! I set the following in our secret `.env` file, and now we're good:
```
NEOPETS_IMAGES_URL_ORIGIN=http://images.neopets-asset-proxy.openneo.net
```
Oops, neopets.com finally stopped accepting `http://` connections, so our AMFPHP requests stopped working! And our current dependencies make it hard to make modern HTTPS requests :(
Instead, we're doing this quick-fix: we have a connection who knows the internal address for the Neopets origin server behind their CDN, which *does* still accept `http://` requests!
So, when `NEOPETS_URL_ORIGIN` is specified in the secret `.env` file (not committed to the repository), we'll use it instead of `http://www.neopets.com`. However, we still have that in the code as a fallback, just to be a bit less surprising to some theoretical future dev so they can see the real error message, and to self-document a bit of what that value is semantically doing! (The documentation angle is more of why it's there, rather than an actual expectation that any actual person in the future will run the code and get the fallback.)
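The lookup-with-fallback amounts to something like the following sketch. The helper name is hypothetical, and the `env` parameter is only there to make it easy to try out:

```ruby
# Hypothetical helper: prefer the secret origin from the environment, and
# fall back to the public URL so the default is visible in the code.
def neopets_url_origin(env = ENV)
  env.fetch("NEOPETS_URL_ORIGIN", "http://www.neopets.com")
end
```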
There's a bug on Neopets.com that breaks links and images for *.openneo.net, on petpages specifically.
So, we've registered a new domain, and we're using that to serve outfit images now.
I'm a bit hesitant to add a new domain name to our like, permanent URL surface area, lol… but I'm not hearing back from TNT, and I already closed the doors on S3, so… here we are, whatever 😅
TNT started using HTTPS URLs! And our old Ruby version (lol 😬) still requires explicit invocation to perform SSL during a request, so requests were failing!
Now, we explicitly build the `Net::HTTP` object, and turn on `use_ssl` if it's an HTTPS URL! (The shorthand invocation didn't seem to have an option for this, that I could find!)
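A minimal sketch of that pattern. The `build_http` helper name is hypothetical, and on very old Ruby versions you may also need `require "net/https"` for the SSL options to be available:

```ruby
require "net/http"
require "uri"

# Build the HTTP object by hand, so we can opt into SSL when the URL needs it.
def build_http(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port) # no connection is opened yet
  http.use_ssl = (uri.scheme == "https")
  http
end
```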
Here, we turn off the hooks that enqueue outfit image updates, and we disconnect the `OutfitImageUploader` that manages uploaded S3 URLs, instead replacing it with an `image` method that simulates the same basic API.
This should cause _all_ views on Classic DTI to use the new outfit URLs. Some notable examples:
- The user's Outfits page
- The donations page
- The outfit page, and its sharing metadata
I hope I didn't miss anything in the views that will make this crash stuff! I tested the new model code in the Rails console, and checked it against invocations that I noticed when searching the codebase for `outfit.image` 🤞
Oops, right, I meant to use the new `impress-outfit-images.openneo.net` host for this! It works just fine from `impress-2020.openneo.net` as the backing source right now, but I want these semi-permanent URLs to be a bit more decoupled.
As part of our project to get off S3 and dramatically reduce costs, we're gonna start serving outfit images that Impress 2020 generates, fronted by Vercel's CDN cache! This should hopefully be just as fast in practice, without requiring an S3 storage cost. (Outfits whose thumbnails are pretty much unused will be evicted from the cache, or never stored in the first place—and regenerated back into the cache on-demand if needed.)
One important note is that the image at the URL will no longer be guaranteed to auto-update to reflect the changes to the outfit, because we're including `updated_at` in the URL for caching. (It also isn't guaranteed to _not_ auto-update, though 😅) Our hope is that people aren't using it for that use case so much! If so, though, we have some ways we could build live URLs without putting too much pressure on image generation, e.g. redirects 🤔
This change does _not_ disable actual outfit generation, because I want to keep that running until we see these new URLs succeed for folks. Gonna wait a bit and see if we get bug reports on them! Then, if all goes well, we'll stop enqueueing outfit image jobs altogether, and maybe wind down some of the infrastructure accordingly.
Oops, if you saved `SwfAsset` outside of modeling code, the `item` field would be empty, and so `item.body_specific?` wouldn't happen.
This would trigger when you even just report a broken image!
Now, we always run the SQL query to check for that flag.
Okay so, userlookup stuff hasn't worked in years, because it requires a login now.
But apparently, somewhere recently, the code inside our `neopets` gem started hard crashing, because of assumptions we made about the document we'd get back.
I'm not sure why it only recently started crashing? or if I'm even necessarily right about that?
But anyway, I'm just doing the easiest safest (🤞🏻) change possible: being more generous with the errors we swallow.
Test Plan:
Deploy and cross fingers.
Okay, fine, finally making this controllable from the db without requiring a deploy :P Setting this new field will cause `item.special_color` to return the corresponding color. This mainly affects what we show on the item page, and what colors we request for modeling on the homepage.
We recently flipped the switch for various hosts to force HTTPS, yay! This includes `neopia.openneo.net`.
However, I forgot to change the URL scheme in this file. This meant that the form submit from the homepage would go to `http://neopia.openneo.net/`, then redirect to `https://neopia.openneo.net/`, but only preserve the form data in certain browsers. This change should fix that!
Note: This probably breaks the dev environment, where we don't have a cert for `https://neopia.dev.openneo.net`. I'll fix that some other time!
Interestingly, these items *are* correctly detecting their special
color on the homepage for model progress. So, we *do* have the ability
to detect this. But I don't have good item data locally, so it would
be hard to test this, so I'm just gonna go with the cheap solution
again, sorry XP
In bfd825d, we refactored the "is item body-specific?" check. In the process, we dropped the check for the manual override flag, `explicitly_body_specific?`. Not sure if it was an accident or if I was just _so_ confident that it was gonna work :P In any case, re-add the check!
Okay, surprise, the bug was unrelated to Camo config (though I'm glad I cleaned
that up anyway :P). We now, at a low level, serve a placeholder image for item
thumbnail URL if, for some reason, we don't have a good thumbnail URL on hand.
One time I did a thing called Camo to try to get our HTTPS pages working,
because images.neopets.com not supporting HTTPS is crazy >_> I've disabled it
these days, but it had debug behavior to append `?NO_CAMO_CONFIG` to all
proxied URLs when Camo was not configured.
When an item had no thumbnail URL for some reason (mall spider needs fixing,
maybe?), this caused Rails to try to map that empty string into the path
`/assets/?NO_CAMO_CONFIG`, which made Rails complain that it was trying to load
an asset that doesn't exist. This is probably a sign that using `image_tag` for
URLs that *should* be external URLs, but aren't strictly *guaranteed* to be, is
unwise - but, for now, I've just disabled that behavior. I hope Rails has a
better escape hatch for the empty string :P
Ooh, this one was nasty, and only one symptom ever got noticed:
1. Pick "Occupies: Collar" in Advanced Search.
You get the text query "occupies:necklace".
2. And, if you try to do "occupies:collar" even in text-based search,
you *also* get the results from "occupies:necklace" mixed in with
the correct results.
The trick is that, in Spanish, zone 24 (necklace) is named "collar",
as is zone 27 (collar). Not sure what to do for Spanish, but this
issue also leaked into English: we really don't want English to return
results for Spanish-named zones.
This is a tricky problem, though, because it'd be nice for es users
to be able to type "occupies:hat". I think we'll have to do the quick
fix for now, though, and just only interpret the query in the current
locale.
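Here's a small sketch of the quick fix, assuming zone labels live in a per-locale lookup. The `ZONE_LABELS` table and `zone_ids_for` helper are hypothetical; only the zone IDs and the "collar" collision come from the description above:

```ruby
# Illustrative zone labels per locale. In Spanish, zones 24 and 27 share
# the label "collar", which is what leaked into English searches.
ZONE_LABELS = {
  en: { "necklace" => [24], "collar" => [27] },
  es: { "collar" => [24, 27] },
}

# Resolve an "occupies:<label>" value using ONLY the current locale's labels.
def zone_ids_for(label, locale)
  ZONE_LABELS.fetch(locale, {}).fetch(label.downcase, [])
end
```

The tradeoff is visible right in the table: an `es` lookup for an English label like "hat" comes back empty, which is the convenience we're giving up for now.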
Turns out ~22% of our users initially land on a trade list.
We like to keep the campaign off the pages where space is at a
premium, so we try to whitelist it to major landing pages in order
to avoid accidentally creating a bad experience on some page :)
I've been doing this manually via email for a long time,
since building new stuff in the logged-in world was a pain in the old env.
But now here we are! Finally, finally :)
The "fits:8-bit-chomby" search filter was being read as color=8, species=bit.
Now, we split from the right-hand side of the filter instead.
Still a problem for anyone who explicitly types the Spanish/Portuguese
ordering of "fits:chomby-8-bits", but I'm okay with this cheap fix, since
I bet literally nobody has done that in the past month, if ever :P
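The right-hand split can be sketched with `String#rpartition`. The `parse_fits_value` helper name is hypothetical:

```ruby
# Split a "fits:" value into [color, species], splitting on the LAST hyphen
# so that hyphenated color names like "8-bit" stay intact.
def parse_fits_value(value)
  color, _separator, species = value.rpartition("-")
  [color, species]
end
```

As noted above, this still misreads the species-first ordering: "chomby-8-bits" splits into "chomby-8" and "bits".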
In particular, outfit_id == 0 would cause outfit_id? to
return false, so it wouldn't run the outfit presence
validation, so /donations/features would try to load
outfit #0 and fail.
Also, flash[:alert] instead of flash[:error] when outfit_id
is bad.
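For context, ActiveRecord's `attribute?` query methods treat a numeric zero like a missing value. Below is a plain-Ruby stand-in for that behavior, plus one plausible shape for the fixed condition. Both helper names are hypothetical, and this isn't necessarily how the actual fix is written:

```ruby
# Stand-in for the relevant ActiveRecord behavior: for numeric attributes,
# the `attribute?` query is false for both nil and zero.
def attribute_query(value)
  !value.nil? && !value.zero?
end

# Hypothetical fixed condition: validate whenever an ID was given at all,
# including 0, so that outfit #0 gets rejected up front.
def should_validate_outfit?(outfit_id)
  !outfit_id.nil?
end
```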
Mostly this was because of Mac's bug where you, in Firefox:
1. Load a real pet with the default appearance (probs Happy Male) into the wardrobe
2. Use a search query containing ":"
3. See the pet biology vanish before your eyes!
I observed that this only happened in cases where the biology stuff in the URL
wasn't replaced by a state number, so figured that it'd probably be good to do
that anyway because biology fields are annoying, and it for some reason seemed
to fix the bug. (Something to do with query parsing and stupid internal state
issues, probably. Ugh. One of these days, I'll re-rewrite all this :P)
Turns out we need to assign closeted to actual items, not
the item proxies, since that's what we check against. (I
would've thought they're backed by the same instance of
the item anyway, but, whatever. The fix works :P)
It turns out that some pets for seemingly nonstandard colors have the
standard body type anyway, and vice-versa. This implies that we should
stop relying on a color's standardness, but, for the time being, we've
just revised the prediction model:
Old model:
* If I see a body_id, I find the corresponding color_ids, and it's wearable
  by all pet types with those color_ids.
New model:
* If I see a body_id,
  * If it also belongs to a basic pet type, it's a standard body ID.
    * It therefore fits all pet types of standard color (if there's
      more than one body ID modeled already). (Not really,
      because of weird exceptions like Orange Chia. Should that be
      standard or not?)
  * If it doesn't also belong to a basic pet type, it's a nonstandard
    body ID.
    * It therefore only belongs to one color, and therefore the item
      fits all pet types of the same color.
We used get_multi when preparing the proxies to decide which to
load from the database, but then sent multiple get requests to
Memcache to re-fetch the same data from that get_multi. Silly!
Use the data that's already stored on the proxy anyway.
Right now we're spending too much time expiring cache keys when
getting contributions. The longer-term fix is to move it to a
background task, but it's good to restrict deletions only to usable
locales rather than all the ones that Rails theoretically supports.
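A tiny sketch of the idea, with made-up names: the `USABLE_LOCALES` list and the cache key format are assumptions for illustration, not the real ones.

```ruby
# Hypothetical: the locales the site actually serves, as opposed to
# everything the i18n backend knows about.
USABLE_LOCALES = [:en, :es, :pt]

# Build the cache keys to expire for one item: one key per usable locale.
def item_cache_keys(item_id, locales = USABLE_LOCALES)
  locales.map { |locale| "items/#{item_id}/#{locale}" }
end
```

The number of deletions per item drops from "every locale Rails ships" to the length of that short list.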
Fun bug! If you edit an outfit, but the outfit loads before the
closet items do, then we clone the outfit to give it its new
identity and therefore forget about its item load callbacks.
Now we have a cheap hack to forward item load data to the
outfit's clones. Hooray! Hope this doesn't break tons of things!
That is, Neopets.com will raise an error when you try to equip a
Kyrii Mage Cape to a pet who's already wearing Ceremonial Shenkuu
Warrior Armour, since the armor restricts the Collar zone which
the cape occupies. DTI, however, would just hide the Collar zone,
as if it were biology. Now, however, DTI will unwear the armor
when you wear the cape, and vice-versa (despite the restriction
relationship being one-directional).
Some lame benchmarking on my box, dev, cache classes, many items:
No proxies:
Fresh JSON: 175, 90, 90, 93, 82, 88, 158, 150, 85, 167 = 117.8
Cached JSON: (none)
Fresh HTML: 371, 327, 355, 328, 322, 346 = 341.5
Cached HTML: 173, 123, 175, 187, 171, 179 = 168
Proxies:
Fresh JSON: 175, 183, 269, 219, 195, 178 = 203.17
Cached JSON: 88, 70, 89, 162, 80, 77 = 94.3
Fresh HTML: 494, 381, 350, 334, 451, 372 = 397
Cached HTML: 176, 170, 104, 101, 111, 116 = 129.7
So, overhead is significant, but the gains when cached (and that should be
all the time, since we currently have 0 evictions) are definitely worth
it. Worth pushing, and probably putting some future effort into reducing
overhead.
On production (again, lame), items#index was consistently averaging
73-74ms when super healthy, and 82ms when pets#index was being louder
than usual. For reference is all. This will probably perform
significantly worse at first (in JSON, anyway, since HTML is already
mostly cached), so it might be worth briefly warming the cache after
pushing.
That is, once we get our list of IDs from the search engine, only
fetch records whose JSON we don't already have cached.
It's simpler here to use as_json, but it'd probably be even faster
if I figure out how to serve a plain JSON string from a Rails
controller. In the meantime, requests of entirely cached items
are coming in at about 85ms on average on my box (dev, cache
classes, many items), about 10ms better than the last
iteration.
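The "only fetch what isn't cached" step can be sketched like so. Here `cache` is any Hash-like store and `load_items` is a hypothetical callable standing in for the database query:

```ruby
# `cache` maps item ID => JSON-ready Hash.
# `load_items` takes an array of IDs and returns a Hash of ID => JSON-ready
# Hash for the records it found.
def items_json(ids, cache, load_items)
  missing_ids = ids.reject { |id| cache.key?(id) }
  unless missing_ids.empty?
    load_items.call(missing_ids).each { |id, json| cache[id] = json }
  end
  # Preserve the order the search engine gave us; skip IDs nobody could load.
  ids.map { |id| cache[id] }.compact
end
```

When every ID is already cached, `load_items` is never called at all, which is the case the timing above is measuring.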
Specifically, we were running a find_or_initialize_by for all 50
hangers, which isn't great. Collation logic is more complicated this
way, but query count is way lower.
Additionally, compare against hanger.list_id instead of hanger.list,
because hanger.list will fire a query if list_id is non-nil, but that
nil ID tells us everything we needed to know, anyway.
Bug report that this resolves:
...However, when I was using the "Import from SDB" tool just a few
minutes ago, it ended up adding EVERY neocash item into the "Not
In A List" section, regardless if I already that item imported
into my "Your Items". So, basically.. I had duplicates of
everything and it would not allow me to move them around into
separate catergories or anything. I know that every other time i've
used the import tool, it would only add NEW items that are not
currently already in my lists yet.
Most of the reasoning is documented in the big comment. In short, we tried
to solve the problem with caching, but the caching should hardly be necessary
now that the bottleneck should be fixed. We'll see on production if it
actually solves the whole problem, but I've confirmed in the console that
redefining this function makes random_basic_per_species (as called during
rendering) a ton faster. And this way we keep our randomness, woo!
This is a surprisingly huge performance gain. On my testing (with
cache_classes set to true to also cache templates), this sped up
closet_hangers#index rendering by a factor of 2 when there were a
significant number of items. Cool beans.
I think we can even hold off on the individual hanger caching now:
we've made the closet hanger partial tons faster by moving forms out
of them and doing this cache check earlier. I'm expecting significant
performance gains both here and on items#index (though less so there).
I'll deploy and see how much it helps in production; if not enough, we
can look at the layered caching of hangers, lists, groups, full pages,
etc.
So glad we don't *have* to move to a pagination model!
We lose no-JS support, which I kinda miss, but caching is gonna be more
important down the line. Delete form moves next, then we cache.
CSRF token changes: it looks like, by setting a data attribute in AJAX, I
was overwriting the CSRF token. I don't remember it working that way, but
now we use beforeSend to add the X-CSRF-Token header instead, which is nicer,
anyway. The issue might've been something else, but this worked :/
The CSS was also not showing the loading ellipsis properly. I think that's a
dev-only issue in how live assets are being served versus static assets, but
may as well add UTF-8 charset directives everywhere, anyway.
items#show has been very slow recently, and I think it's because there's a lot
of querying to be done. Another option would have been to attempt to
short-circuit Item#supported_species if not body specific, but that would
still leave us with 1s load times for body specific items, which is not
satisfactory. The short-circuiting might still be worth doing, but probably
not now.
I'm also not sure that this is actually the core performance problem, but
we'll see. It definitely helped on the dev server: items#show took about
200ms on item pages where everything but species images were cached, then
took about 30ms on subsequent loads. Looking like a good candidate.
TNT has started serving half-removed Corridor of Chance effects:
it has the asset ID and URL and all, but the zone ID is blank.
RocketAMF has patched the empty key bug, and now we ignore assets
associated with empty keys.
Specifically, the Tyrannian Meerca Spear is a pb item that contains
"pea", so its item page is only willing to show a Pea Chia. Now,
a color must be a whole word in the item name for special color
determination to work.
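A sketch of the whole-word check using regex word boundaries. The `names_color?` helper name is hypothetical:

```ruby
# True only when `color_name` appears as a whole word in `item_name`,
# case-insensitively. "Spear" contains "pea", but not as its own word.
def names_color?(item_name, color_name)
  /\b#{Regexp.escape(color_name)}\b/i.match?(item_name)
end
```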
A few key changes:
* Don't reload the whole pet 8 times!! Sooo many bad things
happen, including redundant lookups of everything else and
too many item saves and reindexes. Instead, fetch the item
data, apply it to the items, and then save the items (once
each!)
* Updated my branch of globalize3 to be even better at avoiding
redundant queries when saving. Woo.
* Last realization: wrapping all the item saves in a single
transaction works wonders. COMMIT seems to have high overhead,
so doing only one took it from 50ms * 10 or whatever to 60ms.
Good stuff.
We were joining to the translations table to sort records
alphabetically, but then it sorted by *all* of the translations in
some strange way. Now use with_translations to restrict the join
to the current locale.
In particular, pet#load was handling locale-switching itself, but wasn't
switching back to original locale on error. We could've used a rescue
block, but, when I18n.with_locale is so cool, may as well use it fully.
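The pattern that `I18n.with_locale` gives us is basically set, yield, then restore in an `ensure`. Here's a tiny stand-in module (not the real i18n gem) showing why the locale comes back even when the block raises:

```ruby
# Minimal stand-in for the I18n module, just to show the ensure-based pattern.
module MiniI18n
  class << self
    attr_writer :locale

    def locale
      @locale ||= :en
    end

    # Run the block under `tmp_locale`, restoring the previous locale even
    # if the block raises.
    def with_locale(tmp_locale)
      previous = locale
      self.locale = tmp_locale
      yield
    ensure
      self.locale = previous
    end
  end
end
```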
We originally had a regression on name-matching, where, among
other issues, `straw hat` returned items containing either "straw"
or "hat", which isn't really helpful behavior since we're sorting
alphabetically. Now, `straw hat` behaves as expected.
Additionally, "phrases like these" behave as expected, too.
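In plain Ruby, the expected matching behavior looks something like the sketch below. The real thing runs through the search engine, so this only illustrates the semantics, and both helper names are hypothetical:

```ruby
# Split a query into terms: quoted "phrases" stay whole, other words split.
def name_terms(query)
  query.scan(/"([^"]+)"|(\S+)/).map { |phrase, word| (phrase || word).downcase }
end

# An item name matches only if it contains EVERY term.
def name_matches?(item_name, query)
  name = item_name.downcase
  name_terms(query).all? { |term| name.include?(term) }
end
```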
Confirmed features:
* Output (retrieval, sorting, etc.)
* Name (positive and negative, but new behavior)
* Flags (positive and negative)
Planned features:
* user:owns, user:wants
Known issues:
* Sets are broken
  * Don't render properly
  * Shouldn't actually be done as joined sets, anyway, since
    we actually want (set1_zone1 OR set1_zone2) AND
    (set2_zone1 OR set2_zone2), which will require breaking
    it into multiple terms queries.
* Name has regressed: ignores phrases, doesn't require *all*
  words. While we're breaking sets into multiple queries,
  maybe we'll do something similar for name. In fact, we
  really kinda have to if we're gonna keep sorting by name,
  since "straw hat" returns all hats. Eww.
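The boolean shape we want for sets, (set1_zone1 OR set1_zone2) AND (set2_zone1 OR set2_zone2), can be sketched in plain Ruby. This only illustrates the logic, not the search engine query itself, and the helper name is hypothetical:

```ruby
# `zone_sets` is an array of arrays of zone IDs. An item matches when, for
# EVERY set, it occupies AT LEAST ONE of that set's zones.
def matches_zone_sets?(item_zone_ids, zone_sets)
  zone_sets.all? do |zone_ids|
    zone_ids.any? { |zone_id| item_zone_ids.include?(zone_id) }
  end
end
```

Each inner array would correspond to one terms query, which is why a single joined set can't express it.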
For example, the Meerca Maid Tray is a foreground item, so the SWF is marked
as compatible with all body types, but the item itself is clearly marked as
Meercas-only. items#show reflected this properly, but the swf_assets#index
call that the wardrobe uses ignored item.species_support_ids.
So, /bodies/:body_id/swf_assets.json?item_ids[]=... was deprecated in favor
of /pet_types/:pet_type_id/items/swf_assets.json?item_ids[]=..., which is
much like the former route but, before loading assets, also loads the pet
type and items, then filters the items by compatibility, then only loads
assets for the compatible items.
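A rough sketch of that filter-then-load order, using stand-in structs. The struct names, the "empty `species_support_ids` means every species" convention, and the "body ID 0 means all bodies" convention are all assumptions of this sketch:

```ruby
SketchItem = Struct.new(:id, :species_support_ids)
SketchAsset = Struct.new(:item_id, :body_id)

# Assumption: an empty support list means the item supports every species.
def compatible?(item, species_id)
  item.species_support_ids.empty? || item.species_support_ids.include?(species_id)
end

# Filter the items by compatibility first, then only keep assets that belong
# to a compatible item and fit this body (or all bodies, marked here as 0).
def assets_for(species_id, body_id, items, assets)
  compatible_ids = items.select { |item| compatible?(item, species_id) }.map(&:id)
  assets.select do |asset|
    compatible_ids.include?(asset.item_id) && [0, body_id].include?(asset.body_id)
  end
end
```

In the Meerca Maid Tray case, the asset fits every body, but the item check drops it for any non-Meerca pet type before the asset is ever considered.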
This one was actually pretty darn clever - nobody's abused it, but
I was reading a blog post where someone described this type of
issue, I realized it was a brilliant attack, and then realized
DTI was vulnerable. Oops. Thanks for the solution, Jamie!
http://jamie-wong.com/2012/08/22/what-i-did-at-khan-academy/#XSS+Fix
Many forms on the site contain a hidden authenticity_token field,
unique to each visitor. If a user submits a request with an
invalid authenticity_token, Rails assumes that it's a CSRF attempt
and logs out the user. So, if we happen to cache those forms with
authenticity_token fields, all users who use that form will have
the same authenticity_token (valid for only the first user who
saw the form, invalid for everyone else), and all requests made
through that form will log out the user. Bad news.
So, we stopped caching those forms. Yay!
Use the ImageMagick flatten command to generate the output all at
once instead of compositing each layer individually, and download
the layers in parallel. On my box, saving roopal27 five times took
a total of 30 seconds before, whereas now it takes 7 seconds. I
expect it to be even better on the production box, where latency
is even lower.
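The two halves of the speedup, sketched with hypothetical helpers. `fetch` stands in for whatever actually downloads a layer, and the command assumes ImageMagick's classic `convert` binary is on the PATH:

```ruby
# Download every layer concurrently. `fetch` is a callable that takes a URL
# and returns the local path it saved the image to. Thread#value waits for
# each thread, and the results stay in the same order as `urls`.
def download_layers(urls, fetch)
  urls.map { |url| Thread.new { fetch.call(url) } }.map(&:value)
end

# Build ONE ImageMagick invocation that flattens all layers, bottom layer
# first, instead of compositing them one at a time.
def flatten_command(layer_paths, output_path)
  ["convert", "-background", "none", *layer_paths, "-flatten", output_path]
end
```

The command comes back as an argument array so it can be handed to something like `system(*flatten_command(paths, "outfit.png"))` without shell quoting worries.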
Sharing pane works, everything is great for guests. Logged in
users are on the way, since right now Share Outfit re-saves
anonymously rather than showing sharing data for the existing
outfit.
For example, the site was throwing a 500 error when loading pets
belonging to frozen users. Instead, we'll now rescue that
Neopets::User::AccountDisabledError and ignore it, since it's not
*vital* that we load gender/mood data from this pet; we can still
proceed to load its customization data without it.
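In sketch form, with the error class stubbed out so the example runs on its own. The `load_pet` helper and both callables are hypothetical:

```ruby
# Stand-in for the gem's error class, defined here only for the sketch.
module Neopets
  class User
    class AccountDisabledError < StandardError; end
  end
end

# `load_gender_and_mood` and `load_customization` are hypothetical callables.
def load_pet(load_gender_and_mood, load_customization)
  gender_and_mood =
    begin
      load_gender_and_mood.call
    rescue Neopets::User::AccountDisabledError
      nil # nice to have, not vital, so we carry on without it
    end
  { gender_and_mood: gender_and_mood, customization: load_customization.call }
end
```

Only that one error class is rescued, so anything unexpected still surfaces as a real failure.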
The "Abominable Snowball Winter Onesie" can get blocked for including the string " On".
So, we meant to filter that to " O<b></b>n" so that the filter wouldn't return that false
positive on an XSS attempt, but were accidentally filtering it to " o<b></b&;gtn".
Fixed :)
So it turns out this was just one of those things I forgot to fix
when the big database restructure came along: we were comparing
swf_asset.remote_id against parents_swf_assets.swf_asset_id, which
are two different identifiers entirely. Now using swf_asset.id,
so fixed :)
At first I thought this was an error in the data migration process when moving SWF assets
to having their own unique IDs, but then realized that the query for a pet state's SWFs
didn't include the (parent_type = 'Item') condition. Oops. Turns out, I only connected the
items to parent_swf_asset_relationships polymorphically. Pet states were still doing it the
hackish way. Set the pet states to use the lovely polymorphic relationship and we're good
to go.
After changing the database structure, we lost the feature where, once we discover
new assets for an item for a given body ID, we disconnect previously connected
assets. This commit reinstates that feature.
Due to a silly slip-up involving Javascript object literal syntax, we were
sending {csrf_param: "token"} instead of {authenticity_token: "token"} with
wardrobe AJAX requests. This would cause users to be auto-logged-out for
failing to provide a proper token. Oops.
Lots of scary bugs were being caused by the fact that the possibly-duplicate Neopets ID
was being treated as an SWF's real primary key, meaning that a save meant for object swf
number 123 could be saved to biology swf number 123. Which is awful.
This update gives SWFs their own unique internal ID numbers. All external lookups still use
the remote ID and the type, meaning that the client side remains totally unchanged (phew).
However, all database relationships with SWFs use the new ID numbers, making everything
cleaner. Yay.
There are probably a few places where it would be appropriate to optimize certain lookups
that still depend on remote ID and type. Whatever. Today's goal was to remove crazy
glitches that have been floating around like mad. And I think that goal has been met.