Implement archive:upload:delta
Ok great, we can now run the delta archive process! It'd be nice to get this running on cron on the impress-2020 server, to a temporary folder? I *do* want to remember to run something regularly on my personal machine too, though, to keep my own copy up-to-date…
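As a rough sketch of what that cron job could look like (the schedule, working directory, and log path here are assumptions for illustration, not part of this commit):

# Hypothetical crontab entry: run the delta upload nightly at 03:00.
# The /srv/impress-2020 path and log location are guesses, not from this repo.
0 3 * * * cd /srv/impress-2020 && yarn archive:upload:delta >> /var/log/archive-delta.log 2>&1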
parent 12b5a56694
commit 88511d3dc6
2 changed files with 21 additions and 2 deletions
@@ -8,7 +8,7 @@ yarn aws s3 ls --recursive s3://dti-archive/ \
   sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}\s+[0-9]{2}:[0-9]{2}:[0-9]{2}\s+[0-9]+\s+/https:\/\//' \
   | \
   # Hacky urlencode; the only % value in URLs list today is %20, so...
-  sed -E 's/ /%20/' \
+  sed -E 's/ /%20/g' \
   | \
   # Output to manifest-remote.txt, and print to the screen.
   tee $(dirname $0)/../manifest-remote.txt
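The only change in this hunk is the added g flag, so every space in a URL gets encoded rather than only the first one per line. For illustration (the sample URL is made up):

$ echo 'https://example.com/a b c.png' | sed -E 's/ /%20/'
https://example.com/a%20b c.png
$ echo 'https://example.com/a b c.png' | sed -E 's/ /%20/g'
https://example.com/a%20b%20c.png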
@@ -1 +1,20 @@
-echo 'archive:upload:delta -- TODO!'
+cat $(dirname $0)/../manifest-delta.txt \
+  | \
+  # Remove the URL scheme to convert it to a folder path in our archive
+  sed -E 's/^https?:\/\///' \
+  | \
+  # Hacky urldecode; the only % value in the URLs list today is %20, so...
+  sed -E 's/%20/ /g' \
+  | \
+  # Upload each URL to the remote archive!
+  # NOTE: This is slower than I'd hoped, probably because each command has to
+  # set up a new connection? If we needed to be faster, we could refactor
+  # the `create` step to download to a temporary delta folder, then `cp`
+  # that into the main archive, but run `aws s3 sync` on just the delta
+  # folder (with care not to delete keys that are present in the remote
+  # archive but not in the delta folder!). But this seems to run at an
+  # acceptable speed (i.e. a few hours) when it's run daily.
+  while read -r path; do
+    yarn aws s3 cp $ARCHIVE_DIR/$path s3://$ARCHIVE_STORAGE_BUCKET/$path;
+  done
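For reference, a very rough sketch of the faster alternative described in the NOTE above ($ARCHIVE_DELTA_DIR is an assumed name for illustration; the other variables mirror the ones already used in the script):

# Hypothetical refactor: have the create step download into a delta folder,
# merge it into the main local archive, then sync only the delta folder.
# $ARCHIVE_DELTA_DIR is an assumption, not part of this commit.
cp -R $ARCHIVE_DELTA_DIR/. $ARCHIVE_DIR/
yarn aws s3 sync $ARCHIVE_DELTA_DIR s3://$ARCHIVE_STORAGE_BUCKET/
# No --delete flag, so keys that exist remotely but not in the delta folder
# are left alone, as the NOTE warns.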