Commit graph

2 commits

Author SHA1 Message Date
b9b0db8b3a Some archive:create tweaks
I'm looking into what it would take to update the archive on a regular basis. The commands right now *are* pretty good at avoiding duplicate work… but the S3 upload still seems like it's taking very long even to just validate what's in the archive already. We might have to build our own little cache rather than using `aws s3 sync`, if we want faster incremental updates?

Here, I make a few quality-of-life changes to add a `archive:create` command that runs everything in a straight line. That way, I can let it run and see how much wall-time it takes, to be able to decide whether speeding it up feels necessary. (vs whether it's a few-hours task I can just set a reminder to manually run every week or something)
2022-10-02 07:08:40 -07:00
d9ca07c9b0 Use wget for archive:create:download-urls
Hey this is an exciting development! A list of URLs, that we want to clone onto our hard drive, turns out to be something `wget` is already very good at!

Originally I used `wget`'s `--input-file` option to process the `urls-cache.txt` file, but then I learned how to parallelize it from this StackOverflow answer: https://stackoverflow.com/a/11850469/107415. (Following the guidance in the comments, I removed `-n 1`, to avoid the overhead of extra processes and allow `wget` instances to keep using shared connections over time. Idk why it was in there, maybe the author didn't know `wget` accepts multiple args?)

Anyway yeah, it's working great, except for the weird images.neopets.com downtime! 😅 Specifically I'm noticing that all the item thumbnail images came back really fast, but the customization images are taking for-EV-er. I wonder if that's just caching properties, or if there's a different backing server for it and it's responding much more slowly? Who's to say!

In any case, I'm keeping the timeout in this script pretty low (10 seconds), and just letting failures fail. We can try re-running it again sometime when the downtime is resolved or the cache is warmed up.
2022-09-12 21:47:28 -07:00