impress-2020

OpenNeo/impress-2020

Commit graph

Author	SHA1	Message	Date

Matchu

ea8715cd90

Sanitize URLs saved by archive:create:list-urls

Especially in our item thumbnails, there's a lot of messiness about what the URL protocol is. There are also some SWF assets whose "URLs" are just saved as paths.

In this change, we start processing all our outputted URLs through a `sanitizeUrl` function, which tries to massage each one into an `https://images.neopets.com` URL, and warns if it cannot.

This also warns on some intentionally-different URLs, like our April Fools prank item lol

Anyway, I love functions like this, because the warnings always help me discover the data problems! I wasn't aware of the path-only SWF URLs, for example, until this script started warning about the URL parse errors!
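
To make the idea concrete, here is a minimal sketch of what a function like `sanitizeUrl` could look like. It is not the code from this commit: the `warn` parameter, the `IMAGES_ORIGIN` constant, and the choice to return unrecognized values unchanged are all assumptions made for illustration.

```javascript
// Hypothetical sketch; the real sanitizeUrl in the repo may differ.
const IMAGES_ORIGIN = "https://images.neopets.com";

function sanitizeUrl(rawUrl, warn = console.warn) {
  let url;
  try {
    // Path-only values ("/cp/items/a.swf") and protocol-relative values
    // ("//images.neopets.com/a.gif") are resolved against the base origin.
    url = new URL(rawUrl, IMAGES_ORIGIN);
  } catch (error) {
    warn(`Could not parse URL, leaving as-is: ${rawUrl}`);
    return rawUrl;
  }

  if (url.hostname !== "images.neopets.com") {
    // Assumption: intentionally-different URLs are kept, but flagged.
    warn(`Unexpected host, leaving as-is: ${rawUrl}`);
    return rawUrl;
  }

  // Normalize http:// to https://.
  url.protocol = "https:";
  return url.toString();
}
```

Passing the warning callback in as a parameter keeps the warnings easy to collect and review after a run, which is where the data problems tend to show up.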

2022-09-12 20:52:45 -07:00

Matchu

ef9958c11e

Add asset URLs to archive:create:list-urls script

Here, we read URLs out from the swf_assets table, including SWFs, manifests, and everything referenced by the manifests.

There are a few data-polishing tricks we needed to do to get this to work! Most notably, newer manifests reference themselves, but older ones don't; so we try to infer the manifest URL from the other URLs. (Our database caches the manifest content, but not the manifest URL it came from.)
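
One way the inference could work is sketched below. The commit message does not say how the real script does it, so the helper name `inferManifestUrl`, the `manifest.json` file name, and the rule that the manifest sits in the same directory as its assets are all assumptions here. The input is assumed to be a list of absolute URLs.

```javascript
// Hypothetical helper; the "manifest.json" file name and the
// same-directory rule are assumptions, not taken from the commit.
function inferManifestUrl(assetUrls) {
  // Newer manifests list themselves, so prefer an explicit entry.
  const selfReference = assetUrls.find((u) =>
    new URL(u).pathname.endsWith("/manifest.json")
  );
  if (selfReference) return selfReference;

  // Nothing to infer from.
  if (assetUrls.length === 0) return null;

  // Older manifests: guess a manifest.json beside the first asset.
  // Resolving a relative name against an absolute URL replaces the
  // last path segment and drops any query string.
  return new URL("manifest.json", assetUrls[0]).toString();
}
```

A guess like this is only as good as the directory assumption, so a real script would probably want to warn when the referenced assets do not all share one directory.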

2022-09-12 17:26:11 -07:00

Matchu

3ce895d52f

Start building archive:create:list-urls script

Just working on making an images.neopets.com mirror, just in case! To start, I'm extracting all the URLs we need to back up; and then I'll make a separate script whose job is to mirror all of the URLs in the list.

2022-09-12 15:53:22 -07:00

3 commits