Matchu
35713069fa
I like running the full `archive:create` to help us be _confident_ we've got the whole darn thing, but it takes multiple days to run on my machine and its slow HDD, which I'm willing to do _sometimes_, but not frequently.

But if we had a version of the script that ran faster, and only on the URLs we still _need_, we could run it more regularly and keep our live archive relatively up-to-date. That would let us build reliable fallback infra for when images.neopets.com isn't responding (like today lol)!

Anyway, I stopped early in this process because images.neopets.com is bad today, which means I can't really run updates today, lol :p But the delta-ing stuff seems to work, and takes closer to 30 minutes to get the full state from the live archive. That's, y'know, still slow, but it'll make for a MUCH faster process than multiple days, lol
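The "delta-ing" idea described above might look something like this rough sketch: diff the full URL list against files already on disk, so a faster run only fetches what's missing. The file names, directory layout, and sample data here are illustrative assumptions, not the actual script.

```shell
# Sketch of the "delta" idea (names and layout are assumptions).
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Sample full URL list (stand-in for urls-cache.txt).
printf '%s\n' \
  'https://images.neopets.com/items/a.gif' \
  'https://images.neopets.com/items/b.gif' > urls-cache.txt

# Pretend we already archived a.gif. wget --force-directories stores
# https://host/path as host/path under the archive dir.
mkdir -p archive/images.neopets.com/items
touch archive/images.neopets.com/items/a.gif

# Reconstruct URLs for the files we already have, then keep only the
# URLs that appear in the full list but not on disk (comm -23 prints
# lines unique to the first file).
(cd archive && find . -type f | sed 's#^\./#https://#') | sort > have-urls.txt
sort urls-cache.txt > all-urls.txt
comm -23 all-urls.txt have-urls.txt > urls-needed.txt

cat urls-needed.txt
# prints only https://images.neopets.com/items/b.gif
```

Feeding `urls-needed.txt` (instead of the full cache) to the wget loop below would be the fast path.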
15 lines
875 B
Bash
Executable file
#!/usr/bin/env bash
echo 'Starting! (Note: If many of the URLs are already downloaded, it will take some time for wget to quietly check them all and find the new ones.)'
xargs --arg-file=${URLS_CACHE=$(dirname $0)/urls-cache.txt} -P 8 wget --directory-prefix=${ARCHIVE_DIR=$(dirname $0)} --force-directories --no-clobber --timeout=10 --retry-connrefused --retry-on-host-error --no-cookies --compression=auto --https-only --no-verbose
# It's expected that xargs will exit with code 123 if wget failed to load some
# of the URLs. So, if it exited with 123, exit this script with 0 (success).
# Otherwise, exit with the code that xargs exited with.
# (It would be nice if we could tell wget or xargs that a 404 isn't a failure?
# And have them succeed instead? But I couldn't find a way to do that!)
XARGS_EXIT_CODE=$?
if [ $XARGS_EXIT_CODE -eq 123 ]
then
  exit 0
else
  exit $XARGS_EXIT_CODE
fi
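For reference, the exit-code-123 behavior the script's comments describe is plain xargs behavior, nothing specific to wget: GNU xargs exits 123 when any invocation of the command exits with a status in 1–125. A tiny standalone experiment:

```shell
# Three invocations: the middle one exits 1, so xargs itself exits 123.
printf '0\n1\n0\n' | xargs -n 1 sh -c 'exit "$1"' sh
echo "xargs exit code: $?"
# prints "xargs exit code: 123"
```

That's why the script treats 123 as "some URLs failed, which is expected" and maps it to a success exit.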