At this point, I've gone through all the assets, and the only ones
without manifests are:
1. The ones that truly have no manifest yet (that we know of)
2. The ones where execution happened to time out
I think the 5-second timeout is a very reasonable default for starting
the backfill, since it prioritizes moving forward; but now that we
have most things, I'd rather be able to re-run it with a more generous
timeout. So here we are!
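Something like this is what I have in mind for making the timeout configurable; a rough sketch only, with made-up task and method names (not necessarily what ends up in the repo):

```ruby
# Hypothetical sketch: expose the per-request timeout as a rake task argument,
# defaulting to the original 5 seconds, so re-runs can be more patient.
namespace :swf_assets do
  desc "Backfill manifest URLs for assets that don't have one yet"
  task :backfill_manifests, [:timeout] => :environment do |_t, args|
    timeout = (args[:timeout] || 5).to_f
    SwfAsset.backfill_manifests!(timeout: timeout)  # hypothetical entry point
  end
end
```

Then a more patient re-run is just something like `rake swf_assets:backfill_manifests[30]`.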
I noticed an issue where an asset that (I think) referenced an item that
doesn't exist caused an error in the `body_specific?` validation step.
Tbh that validation step needs to be fixed up in a number of ways, but I'm
scared to touch it, since it's hard to know what will break modeling lol.
But in any case, more graceful handling is nice! If something goes wrong,
I'd rather leave the field null and try again later than have the job crash!
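Here's roughly what I mean by graceful handling; a sketch only, where `find_manifest_url` and the `http` client argument are placeholders:

```ruby
# Hypothetical sketch: one bad asset (e.g. one referencing a nonexistent item)
# shouldn't crash the whole backfill. Leave manifest_url null and move on;
# a later re-run of the task can retry it.
def backfill_manifest_for(asset, http)
  asset.update!(manifest_url: find_manifest_url(asset, http))
rescue StandardError => e
  Rails.logger.warn "Skipping asset #{asset.id}: #{e.class}: #{e.message}"
end
```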
It's not just that none of them were 200 OK, it's that they were all 404.
If something returns a status other than 200 or 404, we immediately
abort, so we shouldn't get to this case unless they were all 404!
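Spelled out, the status policy is roughly this (helper and method names are placeholders):

```ruby
# Hypothetical sketch of the status handling described above: 200 means we
# found a manifest, 404 means this candidate URL doesn't have one, and
# anything else aborts the whole run rather than guessing.
def choose_manifest_url(candidate_urls)
  candidate_urls.each do |url|
    case fetch_status(url)      # placeholder for the actual HTTP request
    when 200 then return url    # found it!
    when 404 then next          # not at this URL; try the next candidate
    else raise "Unexpected response for #{url}, aborting"
    end
  end
  nil  # every candidate was 404: as far as we can tell, there's no manifest
end
```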
Okay, I've simplified the migration to *just* add the column, and
instead added a task to find assets without manifest URLs and backfill
them.
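For reference, the migration itself now amounts to just this kind of thing (Rails version annotation and column type are my guesses, not copied from the actual file):

```ruby
# Sketch of the simplified migration: add the column only, no data changes.
class AddManifestUrlToSwfAssets < ActiveRecord::Migration[7.0]
  def change
    add_column :swf_assets, :manifest_url, :string
  end
end
```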
Performance is a lot better now, using the `async-http` library, which
as I understand it supports both persistent connections when invoked
like this, and maybe also HTTP/2 multiplexing?? (Though I'm not
actually sure images.neopets.com supports that lol)
I'm not sure about the number of concurrent tasks I picked here; 100
seems okay for an internet thing and for such small requests, but I
worry that the CDN is gonna get annoyed or something. Well, we'll see!
This task is very resumable if it turns out we get frozen out or
something.
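For the record, here's roughly how the concurrency is shaped; a sketch assuming `async-http`'s `Async::HTTP::Internet` client shared across tasks (that's where the persistent connections come from), with a semaphore capping in-flight requests at 100:

```ruby
require "async"
require "async/barrier"
require "async/semaphore"
require "async/http/internet"

# Hypothetical sketch: one shared HTTP client (so connections get reused), a
# semaphore capping concurrency at 100, and a barrier to wait for completion.
# (A real version would also need the DB pool sized for this concurrency.)
Sync do
  internet = Async::HTTP::Internet.new
  barrier = Async::Barrier.new
  semaphore = Async::Semaphore.new(100, parent: barrier)

  SwfAsset.where(manifest_url: nil).find_each do |asset|
    semaphore.async do
      backfill_manifest_for(asset, internet)  # per-asset helper sketched earlier
    end
  end

  barrier.wait
ensure
  internet&.close
end
```

Since failures just leave `manifest_url` null, re-running the same task picks up wherever it left off, which is what makes it so easy to resume.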