Topic: E621 WIP Archive

Posted under General

Note to mods: I have posted magnet links (ones I made, nothing illegal) and offsite links; if any of this breaks site rules, please let me know.

I've been making an E621 archive for a few weeks now and I thought I would start sharing.

The way I've been doing it is downloading every post tagged by year (I know that misses the ones without datetags; I will get those as well).
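
For reference, a per-year grab looks roughly like this with gallery-dl (an illustration, not my exact command; --write-metadata is what produces the JSON files mentioned below):

gallery-dl --write-metadata "https://e621.net/posts?tags=2006"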

I have created torrents for these; I try to make them at least a few GB in size, so 1980 - 2006 are bundled together. I will try to seed the torrents as much as possible, but I know I won't be able to all the time. To help with that, I have uploaded the files to send.now so people can download them and help seed. The files expire after 15 days of no downloads, so everyone is welcome to get them while they last.

The files (pools not included; I'm doing those separately) are named with gallery-dl's default scheme, e621_<postid>_<md5hash>.<extension>, and E621's metadata JSON file is included with the files. They have also been compressed with 7-Zip to save space (7z a -mx9, max compression, if you were wondering).
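
For example, packing one year's folder looks something like this (the archive and folder names here are just illustrative):

7z a -mx9 e621_2006.7z 2006/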

If you would like to help datetag posts, just filter out posts that already have a datetag. Alternatively, I have a set (which I regularly update) of posts needing a datetag: https://e621.net/posts?tags=set%3Adatetag_please

It is important to note that since this is still a WIP, these archive files may be updated occasionally. I will post an update once there have been enough new posts.

Magnet links:
1980 - 2006: magnet:?xt=urn:btih:dalljs6isr6bjuujtnu62dfbtbkdnngf&dn=1980-2006&xl=6606084745&fc=8

Here are the send.now links:
1980 - 1999 - https://send.now/8q0gxe32n0a4
2000 - https://send.now/7s2okf9lk5yv
2001 - https://send.now/rvrpmte44zz7
2002 - https://send.now/r153vwy51bke
2003 - https://send.now/r46y4g6492ez
2004 - https://send.now/hfbc488hvpg4
2005 - https://send.now/v1a6r9istj3n
2006 - https://send.now/thpf551w5fpo

Hmm, good question. I wish I had unlimited online storage, but that is like wishing for non-cellphone broadband. XD

Having additional mirrors that don't get you rate-limited is never a bad option.

Plus, you have the option to pick and choose what to download, pulling from everyone else who seeds.

aacafah said:
Is there a reason you prefer to do it this way instead of using the daily database exports & then downloading the raw image/video files separately?

If I used the db export option (I did initially look at it), I'd have to figure out a way to get the URLs out of the enormous CSV, filter down to the ones I want, download them all, and then fetch the metadata.
edit: forgot to mention I'd also have to figure out a way to get pools put into their own folders.

What I'm doing right now is using gallery-dl and aria2 to download every (tagged) post by year.
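
Roughly, gallery-dl resolves the post URLs and aria2 does the heavy downloading; something like this sketch (not my exact setup, and since -g only prints URLs, the metadata JSONs still come from a separate gallery-dl run):

gallery-dl -g "https://e621.net/posts?tags=2006" | aria2c -i -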

Donovan DMC (Former Staff):

whowillevenreadthis said:
I'd have to figure out a way to get the urls from the enormous csv

All you need is the md5 and the file extension:

const ext = "png";
const md5 = "51c6730da8bd8fa657b6c15ee3c5e1a4";
const url = `https://static1.e621.net/data/${md5.slice(0, 2)}/${md5.slice(2, 4)}/${md5}.${ext}`;
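
So turning the posts export into a URL list is only a few more lines. A rough sketch (assuming Node with the csv-parse package, and that the export keeps its md5 and file_ext header columns; the filename is just an example):

import fs from "node:fs";
import { parse } from "csv-parse";

// Stream the daily posts export instead of loading the whole CSV into memory
const rows = fs.createReadStream("posts-2024-01-01.csv")
  .pipe(parse({ columns: true }));

for await (const row of rows) {
  if (!row.md5) continue; // skip rows without an md5 (e.g. deleted posts)
  console.log(`https://static1.e621.net/data/${row.md5.slice(0, 2)}/${row.md5.slice(2, 4)}/${row.md5}.${row.file_ext}`);
}

The output can go straight into aria2c -i -.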

For what it's worth, I've been sorting Paheal archives by major tag/franchise, then having, say, Lion_King\00.ZIP for hash=00xx...xx, Lion_King\01.ZIP for 01xx...xx, and so on. So, up to 256 archives per group. The other thing is I use exclusions, adding a - (minus) to the beginning of a tag to reduce overlap. Downloading the same files over and over is stupid. XD
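
In code terms, the bucket is just the first two hex digits of the hash (toy example):

// Which ZIP a file lands in, given its md5
const bucketFor = (md5) => md5.slice(0, 2).toLowerCase() + ".zip";
bucketFor("51c6730da8bd8fa657b6c15ee3c5e1a4"); // -> "51.zip"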

alphamule said:
Downloading the same files over and over is stupid. XD

If you're using gallery-dl, then look up how to write downloaded links to a local database (iirc sqlite). It'll automatically skip what's in said database unless the --no-skip argument is used. Much better than excluding tags.
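
The flag is --download-archive; point it at a file and gallery-dl keeps its sqlite record of finished downloads there (the tag here is just an example):

gallery-dl --download-archive e621.sqlite3 "https://e621.net/posts?tags=lion_king"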

justkhajiit said:
If you're using gallery-dl, then look up how to write downloaded links to a local database (iirc sqlite). It'll automatically skip what's in said database unless the --no-skip argument is used. Much better than excluding tags.

Well, I was doing that to make collections by franchise. I'd do a bunch of characters, and by the end I had something like "-series -character1 -character2 -character3 -character4 -character5 character6". I was also going back and tagging the ones that were missing a series tag. XD There are some that had related series that didn't normally get tagged at the same time, but I wanted them all in the same folder, so the query would also have "-series1 -series2 series3".

I was using a browser-extension method: I would import text files with lists of URLs and have the extension save them to specific folders. I was also filtering and de-duping the download lists, which is pretty much the same thing gallery-dl does with that flag. I have a huge folder full of lists of confirmed downloads and missing ones.

Original page: https://e621.net/forum_topics/58052?page=1