Matchu
35713069fa
I like running the full `archive:create` because it helps us be _confident_ we've got the whole darn thing, but it takes multiple days to run on my machine and its slow HDD, which… I'm willing to do _sometimes_, but not frequently. If we had a version of the script that ran faster, and only on URLs we still _need_, we could run it more regularly and keep our live archive relatively up-to-date. That would enable us to build reliable fallback infra for when images.neopets.com isn't responding (like today lol)!

Anyway, I stopped early in this process because images.neopets.com is bad today, which means I can't really run updates today, lol :p But the delta-ing stuff seems to work, and takes closer to 30min to get the full state from the live archive, which is, y'know, still slow, but will make for a MUCH faster process than multiple days, lol
24 lines
No EOL
756 B
Bash
Executable file
# Sort urls-cache-backup.txt (what we already have backed up).
cat "$(dirname "$0")/urls-cache-backup.txt" \
  | sort \
  | uniq - "$(dirname "$0")/urls-cache-backup.sorted.txt" &&

# Sort urls-cache.txt (what's available on images.neopets.com).
cat "$(dirname "$0")/urls-cache.txt" \
  | sort \
  | uniq - "$(dirname "$0")/urls-cache.sorted.txt" &&

# Compute the diff between these two sorted files, filtering to lines that start
# with "> ", meaning the URL is in urls-cache.txt but not in urls-cache-backup.txt.
diff "$(dirname "$0")/urls-cache-backup.sorted.txt" "$(dirname "$0")/urls-cache.sorted.txt" \
  | grep '^>' \
  | sed 's/^> *//' |

# Output to urls-cache-delta.txt, and to the screen.
tee "$(dirname "$0")/urls-cache-delta.txt"
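To sanity-check the delta step in isolation, here's a tiny self-contained demo of the same `diff | grep | sed` pipeline on toy data. The file names and entries below are made up for illustration; they are not part of this repo:

```shell
# Toy inputs: pretend these are the two sorted URL lists.
printf 'a.png\nb.png\n' > /tmp/backup-demo.txt        # already backed up
printf 'a.png\nb.png\nc.png\n' > /tmp/live-demo.txt   # available upstream

# Same technique as the script: lines prefixed "> " by diff are present
# only in the second file, i.e. URLs we still need to fetch.
diff /tmp/backup-demo.txt /tmp/live-demo.txt | grep '^>' | sed 's/^> *//'
# → c.png
```

Since both inputs are already sorted, `comm -13 /tmp/backup-demo.txt /tmp/live-demo.txt` would produce the same result in one step; the `diff`-based version just makes the "new on the live site" direction explicit.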