impress-2020/scripts/archive/create/download-urls.sh
Matchu 35713069fa Delta version of archive scripts
I like running the full `archive:create` to help us be _confident_ we've got the whole darn thing, but it takes multiple days to run on my machine and its slow HDD, which… I'm willing to do _sometimes_, but not frequently.

But if we had a version of the script that ran faster, and only on URLs we still _need_, we could run that more regularly and keep our live archive relatively up-to-date. This would enable us to build reliable fallback infra for when images.neopets.com isn't responding (like today lol)!

Anyway, I stopped early in this process because images.neopets.com is bad today, which means I can't really run updates today, lol :p but the delta-ing stuff seems to work, and takes closer to 30min to get the full state from the live archive, which is, y'know, still slow, but will make for a MUCH faster process than multiple days, lol
2022-10-13 15:08:29 -07:00
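
A rough sketch of the "only the URLs we still need" idea from the message above, assuming both the needed URLs and the already-archived URLs are available as plain text files with one URL per line. The function name `delta_urls` and the file names in the usage line are hypothetical, and this is not necessarily how the actual delta script in this commit works.

```shell
# Print every URL listed in file $1 that does not appear in file $2.
# Both arguments are text files with one URL per line (an assumption).
delta_urls() {
  # sort -u: de-duplicate the needed list.
  # grep -F: fixed strings, -x: whole-line match, -v: invert the match,
  # -f: read the patterns (the archived URLs) from a file.
  # grep exits with 1 when nothing is left over; treat that as success,
  # but still report real errors (status 2 or higher).
  sort -u "$1" | grep -Fvxf "$2" || [ "$?" -eq 1 ]
}

# Hypothetical usage:
#   delta_urls needed-urls.txt archived-urls.txt > urls-cache.txt
```

The output could then be fed to the download step in place of the full URL list, so wget only ever sees URLs that are missing from the archive.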


echo 'Starting! (Note: If many of the URLs are already downloaded, it will take some time for wget to quietly check them all and find the new ones.)'
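# URLS_CACHE and ARCHIVE_DIR can be overridden from the environment; both
# default to paths next to this script. The quotes keep paths with spaces intact.
xargs --arg-file="${URLS_CACHE=$(dirname "$0")/urls-cache.txt}" -P 8 \
  wget --directory-prefix="${ARCHIVE_DIR=$(dirname "$0")}" \
  --force-directories --no-clobber --timeout=10 \
  --retry-connrefused --retry-on-host-error \
  --no-cookies --compression=auto --https-only --no-verbose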
# It's expected that xargs will exit with code 123 if wget failed to load some
# of the URLs. So, if it exited with 123, exit this script with 0 (success).
# Otherwise, exit with the code that xargs exited with.
# (It would be nice if we could tell wget or xargs that a 404 isn't a failure?
# And have them succeed instead? But I couldn't find a way to do that!)
XARGS_EXIT_CODE=$?
if [ "$XARGS_EXIT_CODE" -eq 123 ]
then
  exit 0
else
  exit "$XARGS_EXIT_CODE"
fi
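
One possible way to get closer to "a 404 isn't a failure", sketched under assumptions: wget documents exit status 8 as "server issued an error response", which covers 404s but also other HTTP errors such as 500s, so this is coarser than a true 404 check. The function name `tolerate_server_errors` is hypothetical and is not part of the script above.

```shell
# Run the given command and map exit status 8 to success.
# With wget, 8 means the server answered with an error response (e.g. 404).
# Every other exit status is passed through unchanged.
tolerate_server_errors() {
  "$@"
  tse_status=$?
  if [ "$tse_status" -eq 8 ]
  then
    return 0
  fi
  return "$tse_status"
}

# Hypothetical usage with a single URL:
#   tolerate_server_errors wget --no-clobber --no-verbose "$url"
```

Because xargs can only run programs, not shell functions, using this with the xargs call above would mean putting the logic in a small wrapper script and pointing xargs at that instead of at wget directly. Network failures (wget exit status 4) would then still surface as xargs exit code 123, while missing files would not.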