In the interest of clearing out Resque, I'm just gonna remove a lot of our more complex caching stuff, and we can do a perf pass for things like big item list pages once everything's upgraded. (I'm hopeful that the upgrades themselves improve perf; and if not, that some improved sensibilities 10 years later can find simpler approaches.)
We uninstalled Flex, our Elasticsearch gem, to replace item search with direct DB queries; but I forgot these calls, oops!
I also kinda want to see about deleting the resque tasks altogether, since I'm not sure how to get Resque installed on latest rails bc there seems to be a conflict over the version of Rack? And it'd be nice to get rid of the complexity if we can.
Back in the day, `all` would immediately load up a query into an array, but now I think it's an alias for what `scoped` used to be: a relation that contains everything.
Not being a subquery is better! I realized later that a LEFT JOIN would probably do it even betterer? with like `HAVING count(x) = 0`? but the `left_outer_joins` method doesn't seem to be in Rails 4, and I don't want to do stringy joins, so this is fine for now!
Right, previously we were querying "has *at least one asset* that is not in zone X" instead of "has NO assets that are in zone X".
I don't know a fast way to query for that, this will have to do for now!
Not doing the tricks with `is_positive` anymore, instead just calling different functions altogether at the call site.
Also, instead of classes, I feel like this is a lot more concise to just write as class methods that create certain instances of a trivial `Filter` data class. Without the tricks of `is_positive` in play, the value of classes goes way down imo.
Ohh ok, without this change all of our `scope`s were just immediately evaluating the argument and fetching _all_ such matching records immediately, instead of waiting to actually be called. This led to bugs like `pet_type.as_json` returning ALL pet states in the whole db, because the `PetState.emotion_order` scope was being treated as a single predefined query, rather than a query fragment to merge into the current context.
This also explains what happened in 724ed83: that's why things before the scope in the query were being ignored.
Idk why, but when the `select` was the first thing in the query, it was getting ignored. I wonder if there's something about the `object_assets` scope that I'm not understanding that's overwriting it? Or the `joins`? But whatever, this works, I'm not worried about it for now!
We'll need to replace the item search query stuff with direct MySQL queries, but that's not ready yet bc the app still isn't booting, so we're committing this in a known broken state for now!
We set up `impress-asset-images.openneo.net` to redirect to the right asset, without needing to depend on AWS anymore for HTML5-converted items!
Our quick fix for this: always serve `has_image: true` to the frontend, so it always tries to use the image, regardless of whether we've marked it as converted in the database. (We've turned off the converters too!)
Oh, yeah, shit, okay, when we set `self.url` like that, it's supposed to be the _canonical_ URL for the SWF, not our proxied one—this is the URL that's gonna go in the database.
We do proxying late in the process, like when we're actually setting up to download something, but for just referencing where the asset lives, we use `images.neopets.com`.
In this change, we revert the use of `NEOPETS_IMAGES_URL_ORIGIN`, but we _do_ update this to `https` for good measure. (We currently have both HTTP and HTTPS urls in the database, I guess neopets.com started serving different URLs at some point, this is probably the future! And anything interpreting these URLs will need to handle both cases anyway, unless we do some kind of migration update situation thing.)
We're migrating the incorrect assets with the following query (with the limit changed to match the number we currently see in the DB, just as a safety check):
```
UPDATE swf_assets SET url = REPLACE(url, 'http://images.neopets-asset-proxy.openneo.net', 'https://images.neopets.com') WHERE url LIKE 'http://images.neopets-asset-proxy.openneo.net%' ORDER BY id LIMIT 2000;
```
Okay, like in the previous commit, we're dealing with forced HTTPS, on a server that isn't going to cooperate with our dependencies' HTTPS version. And this time, I don't think there's a secret origin server that will accept `http://` requests for us.
Thankfully, we have the perfect hack in our back pocket: our own pre-existing images.neopets.com proxy server! I set the following in our secret `.env` file, and now we're good:
```
NEOPETS_IMAGES_URL_ORIGIN=http://images.neopets-asset-proxy.openneo.net
```
Oops, neopets.com finally stopped accepting `http://` connections, so our AMFPHP requests stopped working! And our current dependencies make it hard to make modern HTTPS requests :(
Instead, we're doing this quick-fix: we have a connection who knows the internal address for the Neopets origin server behind their CDN, which *does* still accept `http://` requests!
So, when `NEOPETS_URL_ORIGIN` is specified in the secret `.env` file (not committed to the repository), we'll use it instead of `http://www.neopets.com`. However, we still have that in the code as a fallback, just to be a bit less surprising to some theoretical future dev so they can see the real error message, and to self-document a bit of what that value is semantically doing! (The documentation angle is more of why it's there, rather than an actual expectation that any actual person in the future will run the code and get the fallback.)
There's a bug on Neopets.com that breaks links and images for *.openneo.net, on petpages specifically.
So, we've registered a new domain, and we're using that to serve outfit images now.
I'm a bit hesitant to add a new domain name to our like, permanent URL surface area, lol… but I'm not hearing back from TNT, and I already closed the doors on S3, so… here we are, whatever 😅
TNT started using HTTPS URLs! And our old Ruby version (lol 😬) still requires explicit invocation to perform SSL during a request, so requests were failing!
Now, we explicitly build the `Net::HTTPS` object, and turn on `use_ssl` if it's an HTTPS URL! (The shorthand invocation didn't seem to have an option for this, that I could find!)
Here, we turn off the hooks that enqueue outfit image updates, and we disconnect the `OutfitImageUploader` that manages uploaded S3 URLs, instead replacing it with an `image` method that simulates the same basic API.
This should cause _all_ views on Classic DTI to use the new outfit URLs. Some notable examples:
- The user's Outfits page
- The donations page
- The outfit page, and its sharing metadata
I hope I didn't miss anything in the views that will make this crash stuff! I tested the new model code in the Rails console, and checked it against invocations that I noticed when searching the codebase for `outfit.image` 🤞
Oops, right, I meant to use the new `impress-outfit-images.openneo.net` host for this! It works just fine from `impress-2020.openneo.net` as the backing source right now, but I want these semi-permanent URLs to be a bit more decoupled.
As part of our project to get off S3 and dramatically reduce costs, we're gonna start serving outfit images that Impress 2020 generates, fronted by Vercel's CDN cache! This should hopefully be just as fast in practice, without requiring an S3 storage cost. (Outfits whose thumbnails are pretty much unused will be evicted from the cache, or never stored in the first place—and regenerated back into the cache on-demand if needed.)
One important note is that the image at the URL will no longer be guaranteed to auto-update to reflect the changes to the outfit, because we're including `updated_at` in the URL for caching. (It also isn't guaranteed to _not_ auto-update, though 😅) Our hope is that people aren't using it for that use case so much! If so, though, we have some ways we could build live URLs without putting too much pressure on image generation, e.g. redirects 🤔
This change does _not_ disable actual outfit generation, because I want to keep that running until we see these new URLs succeed for folks. Gonna wait a bit and see if we get bug reports on them! Then, if all goes well, we'll stop enqueueing outfit image jobs altogether, and maybe wind down some of the infrastructure accordingly.
Oops, if you saved `SwfAsset` outside of modeling code, the `item` field would be empty, and so `item.body_specific?` wouldn't happen.
This would trigger when you even just report a broken image!
Now, we always run the SQL query to check for that flag.
Okay so, userlookup stuff hasn't worked in years, because it requires a login now.
But apparently, somewhere recently, the code inside our `neopets` gem started hard crashing, because of assumptions we made about the document we'd get back.
I'm not sure why it only recently started crashing? or if I'm even necessarily right about that?
But anyway, I'm just doing the easiest safest (🤞🏻) change possible: being more generous with the errors we swallow.
Test Plan:
Deploy and cross fingers.
Okay, fine, finally making this controllable from the db without requiring a deploy :P Setting this new field will cause `item.special_color` to return the corresponding color. This mainly affects what we show on the item page, and what colors we request for modeling on the homepage.
Interestingly, these items *are* correctly detecting their special
color on the homepage for model progress. So, we *do* have the ability
to detect this. But I don't have good item data locally, so it would
be hard to test this, so I'm just gonna go with the cheap solution
again, sorry XP
In bfd825d, we refactored the "is item body-specific?" check. In the process, we dropped the check for the manual override flag, `explicitly_body_specific?`. Not sure if it was an accident or if I was just _so_ confident that it was gonna work :P In any case, re-add the check!
Okay, surprise, the bug was unrelated to Camo config (though I'm glad I cleaned
that up anyway :P). We now, at a low level, serve a placeholder image for item
thumbnail URL if, for some reason, we don't have a good thumbnail URL on hand.
One time I did a thing called Camo to try to get our HTTPS pages working,
because images.neopets.com not supporting HTTPS is crazy >_> I've diasbled it
these days, but it had debug behavior to append `?NO_CAMO_CONFIG` to all
proxied URLs when Camo was not configured.
When an item had no thumbnail URL for some reason (mall spider needs fixing,
maybe?), this caused Rails to try to map that empty string into the path
`/assets/?NO_CAMO_CONFIG`, which made Rails complain that it was trying to load
an asset that doesn't exist. This is probably a sign that using `image_tag` for
URLs that *should* be external URLs, but aren't strictly *guaranteed* to be, is
unwise - but, for now, I've just disabled that behavior. I hope Rails has a
better escape hatch for the empty string :P
Ooh, this one was nasty, and only one symptom ever got noticed:
1. Pick "Occupies: Collar" in Advanced Search.
You get the text query "occupies:necklace".
2. And, if you try to do "occupies:collar" even in text-based search,
you *also* get the results from "occupies:necklace" mixed in with
the correct results.
The trick is that, in Spanish, zone 24 (necklace) is named "collar",
as is zone 27 (collar). Not sure what to do for Spanish, but this
issue also leaked into English: we really don't want English to return
results for Spanish-named zones.
This is a tricky problem, though, because it'd be nice for es users
to be able to type "occupies:hat". I think we'll have to do the quick
fix for now, though, and just only interpret the query in the current
locale.
I've been doing this manually via email for a long time,
since building new stuff in the logged-in world was a pain in the old env.
But now here we are! Finally, finally :)
The "fits:8-bit-chomby" search filter was being read as color=8, species=bit.
Now, we split from the right-hand side of the filter instead.
Still a problem for anyone who explicitly types the Spanish/Portuguese
ordering of "fits:chomby-8-bits", but I'm okay with this cheap fix, since
I bet literally nobody has done that in the past month, if ever :P
In particular, outfit_id == 0 would cause outfit_id? to
return false, so it wouldn't run the outfit presence
validation, so /donations/features would try to load
outfit #0 and fail.
Also, flash[:alert] instead of flash[:error] when outfit_id
is bad.
It turns out that some pets for seemingly nonstandard colors have the
standard body type anyway, and vice-versa. This implies that we should
stop relying on a color's standardness, but, for the time being, we've
just revised the prediction model:
Old model:
* If I see a body_id, I find the corresponding color_ids, and it's wearable
by all pet types with those color_ids.
New model:
* If I see a body_id,
* If it also belongs to a basic pet type, it's a standard body ID.
* It therefore fits all pet types of standard color (if there's
more than one body ID modeled already). (Not really,
because of weird exceptions like Orange Chia. Should that be
standard or not?)
* If it doesn't also belong to a basic pet type, it's a nonstandard
body ID.
* It therefore only belongs to one color, and therefore the item
fits all pet types of the same color.
We used get_multi when preparing the proxies to decide which to
load from the database, but then sent multiple get requests to
Memcache to re-fetch the same data from that get_multi. Silly!
Use the data that's already stored on the proxy anyway.
Right now we're spending too much time expiring cache keys when
getting contributions. The longer-term fix is to move it to a
background task, but it's good to restrict deletions only to usable
locales rather than all the ones that Rails theoretically supports.
Some lame benchmarking on my box, dev, cache classes, many items:
No proxies:
Fresh JSON: 175, 90, 90, 93, 82, 88, 158, 150, 85, 167 = 117.8
Cached JSON: (none)
Fresh HTML: 371, 327, 355, 328, 322, 346 = 341.5
Cached HTML: 173, 123, 175, 187, 171, 179 = 168
Proxies:
Fresh JSON: 175, 183, 269, 219, 195, 178 = 203.17
Cached JSON: 88, 70, 89, 162, 80, 77 = 94.3
Fresh HTML: 494, 381, 350, 334, 451, 372 = 397
Cached HTML: 176, 170, 104, 101, 111, 116 = 129.7
So, overhead is significant, but the gains when cached (and that should be
all the time, since we currently have 0 evictions) are definitely worth
it. Worth pushing, and probably putting some future effort into reducing
overhead.
On production (again, lame), items#index was consistently averaging
73-74ms when super healthy, and 82ms when pets#index was being louder
than usual. For reference is all. This will probably perform
significantly worse at first (in JSON, anyway, since HTML is already
mostly cached), so it might be worth briefly warming the cache after
pushing.
That is, once we get our list of IDs from the search engine, only
fetch records whose JSON we don't already have cached.
It's simpler here to use as_json, but it'd probably be even faster
if I figure out how to serve a plain JSON string from a Rails
controller. In the meantime, requests of entirely cached items
are coming in at about 85ms on average on my box (dev, cache
classes, many items), about 10ms better than the last
iteration.
Specifically, we were running a find_or_initialize_by for all 50
hangers, which isn't great. Collation logic is more complicated this
way, but query count is way lower.
Additionally, compare against hanger.list_id instead of hanger.list,
because hanger.list will fire a query if list_id is non-nil, but that
nil ID tells us everything we needed to know, anyway.
Bug report that this resolves:
...However, when I was using the "Import from SDB" tool just a few
minutes ago, it ended up adding EVERY neocash item into the "Not
In A List" section, regardless if I already that item imported
into my "Your Items". So, basically.. I had duplicates of
everything and it would not allow me to move them around into
separate catergories or anything. I know that every other time i've
used the import tool, it would only add NEW items that are not
currently already in my lists yet.
Most of the reasoning is documented in the big comment. In short, we tried
to solve the problem with caching, but the caching should hardly be necessary
now that the bottleneck should be fixed. We'll see on production if it
actually solves the whole problem, but I've confirmed in the console that
redefining this function makes random_basic_per_species (as called during
rendering) a ton faster. And this way we keep our randomness, woo!
TNT has started serving half-removed Corridor of Chance effects:
it has the asset ID and URL and all, but the zone ID is blank.
RocketAMF has patched the empty key bug, and now we ignore assets
associated with empty keys.
Specifically, the Tyrannian Meerca Spear is a pb item that contains
"pea", so its item page is only willing to show a Pea Chia. Now,
a color must be a whole word in the item name for special color
determination to work.
A few key changes:
* Don't reload the whole pet 8 times!! Sooo many bad things
happen, including redundant lookups of everything else and
too many item saves and reindexes. Instead, fetch the item
data, apply it to the items, and then save the items (once
each!)
* Updated my branch of globalize3 to be even better at avoiding
redundant queries when saving. Woo.
* Last realization: wrapping all the item saves in a single
transaction works wonders. COMMIT seems to have high overhead,
so doing only one took it from 50ms * 10 or whatever to 60ms.
Good stuff.
We were joining to the translations table to sort records
alphabetically, but then it sorted by *all* of the translations in
some strange way. Now use with_translations to restrict the join
to the current locale.
In particular, pet#load was handling locale-switching itself, but wasn't
switching back to original locale on error. We could've used a rescue
block, but, when I18n.with_locale is so cool, may as well use it fully.
We originally had a regression on name-matching, where, among
other issues, `straw hat` returned items containing both "straw"
and "hat", which isn't really helpful behavior since we're sorting
alphabetically. Now, `straw hat` behaves as expected.
Additionally, "phrases like these" behave as expected, too.
Confirmed features:
* Output (retrieval, sorting, etc.)
* Name (positive and negative, but new behavior)
* Flags (positive and negative)
Planned features:
* users:owns, user:wants
Known issues:
* Sets are broken
* Don't render properly
* Shouldn't actually be done as joined sets, anyway, since
we actually want (set1_zone1 OR set1_zone2) AND
(set2_zone1 OR set2_zone2), which will require breaking
it into multiple terms queries.
* Name has regressed: ignores phrases, doesn't require *all*
words. While we're breaking sets into multiple queries,
maybe we'll do something similar for name. In fact, we
really kinda have to if we're gonna keep sorting by name,
since "straw hat" returns all hats. Eww.
For example, the Meerca Maid Tray is a foreground item, so the SWF is marked
as compatible with all body types, but the item itself is clearly marked as
Meercas-only. items#show reflected this properly, but the swf_assets#index
call that the wardrobe uses ignored item.species_support_ids.
So, /bodies/:body_id/swf_assets.json?item_ids[]=... was deprecated in favor
of /pet_types/:pet_type_id/items/swf_assets.json?item_ids=[]..., which is
much like the former route but, before loading assets, also loads the pet
type and items, then filters the items by compatibility, then only loads
assets for the compatible items.
Use the ImageMagick flatten command to generate the output all at
once instead of compositing each layer individually, and download
the layers in parallel. On my box, saving roopal27 five times took
a total of 30 seconds before, whereas now it takes 7 seconds. I
expect it to be even better on the production box, where latency
is even lower.
Sharing pane works, everything is great for guests. Logged in
users are on the way, since right now Share Outfit re-saves
anonymously rather than showing sharing data for the existing
outfit.
For example, the site was throwing a 500 error when loading pets
belonging to frozen users. Instead, we'll now rescue that
Neopets::User::AccountDisabledError and ignore it, since it's not
*vital* that we load gender/mood data from this pet; we can still
proceed to load its customization data without it.
So it turns out this was just one of those things I forgot to fix
the big database restructure came along: we were comparing
swf_asset.remote_id against parents_swf_assets.swf_asset_id, which
are two different identifiers entirely. Now using swf_asset.id,
so fixed :)
At first I thought this was an error in the data migration process when moving SWF assets
to having their own unique IDs, but then realized that the query for a pet state's SWFs
didn't include the (parent_type = 'Item') condition. Oops. Turns out, I only connected the
items to parent_swf_asset_relationships polymorphically. Pet states were still doing it the
hackish way. Set the pet states to use the lovely polymorphic relationship and we're good
to go.
After changing the database structure, we lost the feature where, once we discover
new assets for an item for a given body ID, we disconnect previously connected
assets. This commit reinstates that feature.
Lots of scary bugs were being caused by the fact that the possibly-duplicate Neopets ID
was being treated as an SWF's real primary key, meaning that a save meant for object swf
number 123 could be saved to biology swf number 123. Which is awful.
This update gives SWFs their own unique internal ID numbers. All external lookups still use
the remote ID and the type, meaning that the client side remains totally unchanged (phew).
However, all database relationships with SWFs use the new ID numbers, making everything
cleaner. Yay.
There are probably a few places where it would be appropriate to optimize certain lookups
that still depend on remote ID and type. Whatever. Today's goal was to remove crazy
glitches that have been floating around like mad. And I think that goal has been met.