Topic: Best, most recent e621 downloader?

Posted under e621 Tools and Applications

I'm not from the US, but whatever this FCC group is doing with Net Neutrality right now is scaring me quite a bit. I'm not sure exactly what's giving me this feeling, but I think e6 won't survive because of it, as much as I hope that won't happen :(

That's why I'm thinking of downloading all of e6's posts, in case this site has to shut down. I've seen people talking about the same idea as well, being just as concerned as me. Call me crazy, but I'd rather be safe than sorry!

However, most downloaders I've tried so far seem to be really outdated or buggy, so I thought I'd ask here for a proper working one.

Judging from the official stats, downloading everything would require around 1.2TB of disk space. I'd have to shave off a few images from that, so I'm looking for a downloader that ideally has blacklist support. I have a lot of tags in my account's blacklist, so I can easily see myself saving ~0.4TB.

I mean, even if it doesn't have something fully fledged like that, I wouldn't mind having a downloader that at least grabs images in their full original quality. That's the most important part, and downloaders I've tried in the past have failed to do so.

Anything you guys can recommend? Thank you!

Updated by Wulfre

Sorry for the creepy five-minutes-after-posting response; I just happened to be lurking when you opened this thread. But here is my biased answer: my downloader. https://github.com/Wulfre/e621dl

Has blacklist support and was last updated today. If you need any extra features just let me know and I'll see what I can do.

Updated by anonymous

Wulfre said:
Sorry for the creepy five-minutes-after-posting response; I just happened to be lurking when you opened this thread. But here is my biased answer: my downloader. https://github.com/Wulfre/e621dl

Has blacklist support and was last updated today. If you need any extra features just let me know and I'll see what I can do.

Woah, thank you so much for the quick reply, you're a lifesaver! I'll definitely give this a shot later today. I'll come back and tell you how it goes. You're the best!! (ノ´ヮ´)ノ*:・゚✧

Updated by anonymous

I just gave it a quick shot with the windows executable (v4.2.2) and I don't think it's working for me sadly.

It made itself a config.ini file, so I put in "tags = matotoma" as a quick test.
When I start the .exe again, it says there's a new version and it asks me if I want to run it anyway. But when I confirm with yes (y), nothing else happens.

I also tried the source code version (v4.2.3) with the latest version of python, but starting it up doesn't give me anything at all. Just a cmd box for a split second.
Running it with cmd says "No module named 'requests'", and I can't figure out how to get the requests module.

I've read your instructions on the Github page, but nothing seems to help...

Updated by anonymous

tacklebox said:
I just gave it a quick shot with the windows executable (v4.2.2) and I don't think it's working for me sadly.

It made itself a config.ini file, so I put in "tags = matotoma" as a quick test.
When I start the .exe again, it says there's a new version and it asks me if I want to run it anyway. But when I confirm with yes (y), nothing else happens.

I also tried the source code version (v4.2.3) with the latest version of python, but starting it up doesn't give me anything at all. Just a cmd box for a split second.
Running it with cmd says "No module named 'requests'", and I can't figure out how to get the requests module.

I've read your instructions on the Github page, but nothing seems to help...

I just released a new binary version (4.3.0) about 10 minutes ago; it's what I had been working on today. The last version (4.2.3) had a bug when notifying the user about a new version, so that might have been the issue.

The source version doesn't work unless you have all of the dependencies installed, which I guess I should explain better in the documentation.
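
For the record, the missing module from your error can be installed with pip (assuming Python added pip to your PATH when you installed it):

pip install requests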

If you still have an issue, post your config here. I didn't have any testers during development so I don't have any reference to know if anything is particularly hard to understand.

Updated by anonymous

What about other websites, like wildcritters.ws, veebooru.com, or exhentai?

Updated by anonymous

Wulfre said:
I just released a new binary version (4.3.0) about 10 minutes ago; it's what I had been working on today. The last version (4.2.3) had a bug when notifying the user about a new version, so that might have been the issue.

The source version doesn't work unless you have all of the dependencies installed, which I guess I should explain better in the documentation.

If you still have an issue, post your config here. I didn't have any testers during development so I don't have any reference to know if anything is particularly hard to understand.

Thank you for your assistance, but I still don't think it's working out for me... Here's everything I did, in order:

Started the 4.3.0 executable for the first time.
A cmd box came up for a split second, then it created the config.ini file.

I edited the [defaults] section in the config.ini file with "ratings = q, e" (it was only 's' before).
Then I added a new line with "tags = hidoritoyama", just as a quick test.

When I start the .exe again, the cmd window just shows this:

e621dl  INFO  Running e621dl version 4.3.0
e621dl  INFO  Parsing config.

for a millisecond, then closes itself again. It took me a while to screencap it.

I uploaded my config.ini to a pastebin if you want to check it out. Am I doing something wrong? Thank you again!

Updated by anonymous

I still feel really creepy replying so fast, but we always seem to be on the forum at the same time.

Anyway, the tags go in their own sections below the defaults and blacklist; you can call the sections anything you want. I modified your config.ini for you.

The default section just fills in the days, ratings, and score if you leave them out of the searches you define below. Let me know if you just overlooked something in the docs or how I can explain the format better. Like I said, I didn't have anyone test the program while I was writing it, so everything makes sense to me because I already know what to expect.

https://pastebin.com/6H1tPgtp
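
To illustrate the general layout, a config along these lines should work (a rough example only; the exact keys and values may differ slightly from what your version expects, so treat the pastebin above as the authoritative version):

[defaults]
days = 1
ratings = q, e
score = 0

[blacklist]
tags = tags_you_never_want

[test]
tags = hidoritoyama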

Updated by anonymous

Wulfre said:
I still feel really creepy replying so fast, but we always seem to be on the forum at the same time.

Anyway, the tags go in their own section below the defaults and blacklist. I modified your config.ini for you. Let me know if you just overlooked something in the docs or how I can explain the format better.

https://pastebin.com/6H1tPgtp

Haha don't worry, you're not creepy! I'm just very thankful for your help. Will test this out now, many thanks!

Updated by anonymous

Alright, we're one step closer! \o/

I took your config.ini from the pastebin and replaced what I had with it entirely.
When I run the .exe this time, I get a completely new screen: imgur link

Just like what I was dealing with before, it only shows up for a split second as well, and then closes itself again.
It also made a directory under "e6dl\downloads\test" though, which sounds like a good sign to me! But there are no images inside.

I hope I'm not annoying you in any way with this, but I'm a bit new to programs like these.

Updated by anonymous

Okay, I'm getting the same result; let me look into it. It should work just fine with general tags for the time being. Also, you can run the exe through a command shell to keep the text on screen after the program is done.

It's not annoying at all, I'm actually pretty happy to have someone who doesn't know what they're doing (I hope that doesn't sound bad) because they are more likely to break the program and I can make changes to make it more friendly.

EDIT: I already found the issue. I didn't look at the config hard enough. If you want to download all posts ever put on the site, you need to change days to something huge like 999999999. The original intent of the program was to run it daily and only download posts that were posted that day.
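
Roughly, the two ways to set it (just an illustration):

[defaults]
; one-time grab of everything ever posted on the site:
days = 999999999
; or, if you run the program on a daily schedule:
; days = 1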

Updated by anonymous

Ah! I actually wanted to do both of those things, lucky!

So if I want to download everything, I'd have to set days to something huge and just let it do its thing.

But if I only want to fetch today's posts, I'd have to set days to 1 and run it daily.

Seems simple enough! I guess I'd have to run it at a specific time on a schedule though, so I won't miss out on anything. This is great!!

Any way I can make a donation to you? If you accept them of course.

Updated by anonymous

That's exactly how it works. If you wanna leave yourself some room for error, you can always set days to something like 2 or 3 and it will check back that many days.

I thought about leaving a donation link at the bottom of the documentation since the original writer that I forked from had one, but I didn't feel like it was appropriate for a program that not many people will use and I wrote for myself anyway. I'm just happy to have people using it and giving me feedback.

Updated by anonymous

Awwh, I would've donated without any hesitation. Either way, definitely keep up the amazing hard work! Can't wait until I try this out for real!

And thank you so much for the troubleshooting help! I'll definitely keep an eye on your Github page too.

Updated by anonymous

No problem. Here is my current config if you need more examples. I didn't put it on GitHub because even though the name implies explicit content, I don't need anyone who is just browsing code seeing my lists. ( ͡° ͜ʖ ͡°)

https://pastebin.com/raw/SpSj8ZJ2

Updated by anonymous

Sweet, seems like both our searches and blacklists are pretty similar.. hehehe

But I think this gave me an even better idea of how to use the config file! I hope I won't mess anything up if I do multiple searches like that.

One way or another it's really useful, big thanks for the extra tip! <3

Updated by anonymous

There's also RipMe, which works for more websites as well, like DeviantArt and Imgur, but it does require Java to run...

Updated by anonymous

tacklebox said:
Ah! I actually wanted to do both of those things, lucky!

So if I want to download everything, I'd have to set days to something huge and just let it do its thing.

But if I only want to fetch today's posts, I'd have to set days to 1 and run it daily.

Seems simple enough! I guess I'd have to run it at a specific time on a schedule though, so I won't miss out on anything. This is great!!

Any way I can make a donation to you? If you accept them of course.

This tool will NOT work to download the full site archive. It uses page numbers, and you can only go to page 750 before you start getting 403 errors. If you continue to generate 403 errors at a high rate, you might get yourself IP banned by CloudFlare. Until the tool is fixed to use before_id and abort on errors, it's not something I'd suggest.

I also generally suggest that people don't make full site rips. I guarantee that you're not interested in a lot of the content on here, and you're wasting your time, the site's bandwidth, and about 1.2TB of storage space. You'll end up deleting most of it anyway. Pick some tags that interest you and go from there; you'll be MUCH happier.
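
To make the difference concrete, the two query styles look roughly like this (illustrative URLs only; the parameter names are from the API, the values are made up):

page-based, stops working past page 750:
https://e621.net/post/index.json?tags=mammal&limit=320&page=751

before_id-based, no page cap, pass the lowest post id from the previous batch:
https://e621.net/post/index.json?tags=mammal&limit=320&before_id=1000000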

Updated by anonymous

KiraNoot said:
This tool will NOT work to download the full site archive. It uses page numbers, and you can only go to page 750 before you start getting 403 errors. If you continue to generate 403 errors at a high rate, you might get yourself IP banned by CloudFlare. Until the tool is fixed to use before_id and abort on errors, it's not something I'd suggest.

I also generally suggest that people don't make full site rips. I guarantee that you're not interested in a lot of the content on here, and you're wasting your time, the site's bandwidth, and about 1.2TB of storage space. You'll end up deleting most of it anyway. Pick some tags that interest you and go from there; you'll be MUCH happier.

Thanks for specifying that actually. I based this program off of one I used to use before I wanted to add my own (admittedly sloppy) features, so I kept the same format and only glanced at the API documentation for things that I did not already have the structure for. I'll switch to before_id right away, it actually seems much nicer to use.

Updated by anonymous

Wulfre said:
Thanks for specifying that actually. I based this program off of one I used to use before I wanted to add my own (admittedly sloppy) features, so I kept the same format and only glanced at the API documentation for things that I did not already have the structure for. I'll switch to before_id right away, it actually seems much nicer to use.

The only caveat you have is that sort order doesn't work anymore under before_id. before_id forcefully sets the sorting order to post ID descending. That generally doesn't matter for a downloader, since you want all of the posts for the query and the order you get them in doesn't matter. However, you should be aware of the caveat in case it matters for something you plan to do with the tool later on.

Also make sure that your tool doesn't continually try things over and over again if it is getting non-200 HTTP response codes. An easy way to test that is to see if it aborts when requesting a page above 750, as that will immediately give you a 403 error.
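
The test really only needs to be a few lines, something like this sketch using the requests library (the exact structure of your code will obviously differ):

import requests

# Requesting a page beyond 750 should immediately come back as a 403.
resp = requests.get("https://e621.net/post/index.json",
                    params={'limit': '320', 'page': '751'})
if resp.status_code != 200:
    raise SystemExit("Got HTTP %d, aborting instead of retrying." % resp.status_code)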

Updated by anonymous

KiraNoot said:
The only caveat you have is that sort order doesn't work anymore under before_id. before_id forcefully sets the sorting order to post ID descending. That generally doesn't matter for a downloader, since you want all of the posts for the query and the order you get them in doesn't matter. However, you should be aware of the caveat in case it matters for something you plan to do with the tool later on.

Also make sure that your tool doesn't continually try things over and over again if it is getting non-200 HTTP response codes. An easy way to test that is to see if it aborts when requesting a page above 750, as that will immediately give you a 403 error.

I don't think that I'll be using the sort order for anything in the future. I also have a really simple way to check the response codes and I feel dumb for not just using them in the first place.

I'm pretty much already set to use before_id; it was only a few lines that I needed to change. A quick question though: is there any way to get the highest post ID other than just checking https://e621.net/post/index.json?limit=1

EDIT: I take back that last question about finding the highest post id. I figured out a better way to get the result I wanted. I was just throwing code together and trying to get my questions out while I knew you might be around to see them.

Updated by anonymous

Wulfre said:
I don't think that I'll be using the sort order for anything in the future. I also have a really simple way to check the response codes and I feel dumb for not just using them in the first place.

I'm pretty much already set to use before_id; it was only a few lines that I needed to change. A quick question though: is there any way to get the highest post ID other than just checking https://e621.net/post/index.json?limit=1

EDIT: I take back that last question about finding the highest post id. I figured out a better way to get the result I wanted. I was just throwing code together and trying to get my questions out while I knew you might be around to see them.

At the current time it is safe to provide no before_id for the first query, or, if the code isn't flexible enough to do that, you can provide the maximum signed 32-bit integer, which is about 2.15 billion (2,147,483,647).

My architectural suggestion is to create a dictionary, fill it with fields only if they should be present, and submit them using requests' post() and data= functionality. There is no requirement that fields appear in the URL query section, and POST requests are just as valid for the read APIs.

import requests

tags = input_tags  # e.g. 'male female mammal'
if score is not None:
    tags += ' score:>%d' % score  # append a minimum score filter as a metatag
payload = {'limit': '320', 'tags': tags}
if before_id is not None:
    payload['before_id'] = before_id
requests.post("https://e621.net/post/index.json", data=payload)

You could also collect the tags in a list and then ' '.join(tags) to make sure you include everything that way.

This will save you from having to define defaults, and actually will make your searches faster.

P.S. When using before_id, it's a good idea to make sure that your loop exits if before_id doesn't change on an iteration, or you can easily end up in an infinite loop over result sets.
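
Putting it together, the paging loop might look roughly like this. It's only a sketch under the assumptions above; fetch_page is a made-up helper wrapping the request example, not anything from your code:

import requests

def fetch_page(tags, before_id=None):
    # Hypothetical helper built around the payload example above.
    payload = {'limit': '320', 'tags': tags}
    if before_id is not None:
        payload['before_id'] = before_id
    resp = requests.post("https://e621.net/post/index.json", data=payload)
    if resp.status_code != 200:
        raise SystemExit("Got HTTP %d, aborting." % resp.status_code)
    return resp.json()

before_id = None
while True:
    posts = fetch_page('male female mammal', before_id)
    if not posts:
        break  # no results left
    lowest_id = min(post['id'] for post in posts)
    if before_id is not None and lowest_id >= before_id:
        break  # before_id stopped moving; bail out instead of looping forever
    # ...download or record the posts in this batch here...
    before_id = lowest_id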

Updated by anonymous

KiraNoot said:
This tool will NOT work to download the full site archive. It uses page numbers, and you can only go to page 750 before you start getting 403 errors. If you continue to generate 403 errors at a high rate, you might get yourself IP banned by CloudFlare. Until the tool is fixed to use before_id and abort on errors, it's not something I'd suggest.

I also generally suggest that people don't make full site rips. I guarantee that you're not interested in a lot of the content on here, and you're wasting your time, the site's bandwidth, and about 1.2TB of storage space. You'll end up deleting most of it anyway. Pick some tags that interest you and go from there; you'll be MUCH happier.

Oh damn, I was just about to prepare my config until I read this.
I wanted to blacklist a lot of tags anyway, but now I'm a bit worried about the potential errors and IP bans. So should I not use it at all, to stay safe?

Updated by anonymous

tacklebox said:
Oh damn, I was just about to prepare my config until I read this.
I wanted to blacklist a lot of tags anyway, but now I'm a bit worried about the potential errors and IP bans. So should I not use it at all, to stay safe?

I'd wait for Kira to respond if you are worried, since they know way more about how the site works than I do, but I did fix everything that they mentioned in the last 3 posts and it will be in the next release.

Updated by anonymous

Wulfre said:
I'd wait for Kira to respond if you are worried, since they know way more about how the site works than I do, but I did fix everything that they mentioned in the last 3 posts and it will be in the next release.

Alright I'll just wait then, thanks!

Updated by anonymous

tacklebox said:
Alright I'll just wait then, thanks!

Other than the missing sanity check that before_id is changing on each request, it looks like it should be acceptable. The added checks for the HTTP status code should be enough that it will stop if it encounters problems, and that will keep you out of trouble if it goes too fast or starts to do something the site doesn't like.

I haven't looked through the whole thing from top to bottom, but overall it seems like it should get the job done.

I still suggest setting a list of tags you're interested in instead of just a blacklist, but you're not going to get blocked for something like that. :P

Updated by anonymous

tacklebox said:
Alright I'll just wait then, thanks!

New release is published now. I'll keep fixing things as I find them or am told about them.

KiraNoot said:
Other than the missing sanity check that before_id is changing on each request, it looks like it should be acceptable. The added checks for the HTTP status code should be enough that it will stop if it encounters problems, and that will keep you out of trouble if it goes too fast or starts to do something the site doesn't like.

I haven't looked through the whole thing from top to bottom, but overall it seems like it should get the job done.

I still suggest setting a list of tags you're interested in instead of just a blacklist, but you're not going to get blocked for something like that. :P

Of course, I wasn't expecting a full code review for my hacky program. Just making sure that no one gets in trouble for using it while I keep tweaking it.

Updated by anonymous

Wulfre said:
New release is published now. I'll keep fixing things as I find them or am told about them.

Great! I'm still in the middle of completely formatting my hard drive, which can take a while... I should've done it while I was sleeping last night.

Either way, I'll finally get everything started once it's done. Thank you very much!

Updated by anonymous

111111111 said:
And to think, I started panic-downloading my porn the hard way.

I got you covered.😉

Updated by anonymous

Just wanted to update really quick and say everything went simply wonderfully!
No issues, silky smooth and super fast, plus it barely even filled up my drive!

I definitely feel like I should've made multiple directories like [muscular] and [pokemon], but I was sort of in a rush due to panicking back then.
I might do it another day, but for now I definitely feel more relieved! I'll run it daily to keep my collection updated too~

Are you really sure you don't accept any donations? You pretty much just saved my life over here and I couldn't be more thankful.

Updated by anonymous

I would put up my downloader, but next to nobody uses Ubuntu or knows how to install Bash for Windows. Plus, it's still a work in progress, adapted from my organizer program, Faux-Boxer.

Updated by anonymous

Edit: I forgot that I have a windows version already up. If you want to give it a try, you'll need to run it from the command line, and there's no blacklist support.

https://github.com/Youboontoo/faux-boxer-downloader-win/releases/tag/v1.0.0

Extract everything into a folder, then double-click the .bat file. Follow the instructions to proceed with downloading. Still working on it, so please be patient with any bugs you might encounter.

Updated by anonymous

tacklebox said:
Just wanted to update really quick and say everything went simply wonderfully!
No issues, silky smooth and super fast, plus it barely even filled up my drive!

I definitely feel like I should've made multiple directories like [muscular] and [pokemon], but I was sort of in a rush due to panicking back then.
I might do it another day, but for now I definitely feel more relieved! I'll run it daily to keep my collection updated too~

Are you really sure you don't accept any donations? You pretty much just saved my life over here and I couldn't be more thankful.

Thanks for the update. Glad everything went well! You're the first person to tell me that it actually works as expected from the point of view of an end user.

I'm pretty sure that I won't be accepting donations, at least for this project. All I did was wrap a bit of logic around the e621 API, and it was for myself first and foremost. I just decided to share it because what's the point of keeping something to myself that could be useful to other people?

20-Shades-Of-Faux-Pa said:
Edit: I forgot that I have a windows version already up. If you want to give it a try, you'll need to run it from the command line, and there's no blacklist support.

https://github.com/Youboontoo/faux-boxer-downloader-win/releases/tag/v1.0.0

Extract everything into a folder, then double-click the .bat file. Follow the instructions to proceed with downloading. Still working on it, so please be patient with any bugs you might encounter.

Oh hey, didn't realize you were Youboontoo on GitHub. I actually really like how simple this looks. I never bothered learning all of Microsoft's commands or PowerShell, so this could be really useful for someone who doesn't want to mess around with prerequisites (like Python for my own program).

Updated by anonymous

You can try setting up Hydrus. I think it's not remotely as user-friendly or intuitive as it should be, but maybe in a year or two it will be the best tool for everyone. It can subscribe to multiple sites, rip chan threads, and act as a private (or public) booru. As it turned out, I don't quite care enough about or need to use a program like that, but I can tell it suits people who want to spend time organizing and searching their media in an offline repository.

I highly recommend putting the .db files on an SSD or your fastest storage device to greatly speed up maintenance tasks, like the regular, automatic syncing with public repositories. A sync I had started hours before falling asleep still hadn't finished overnight, but it completed in an hour once I moved my .db files over.

Updated by anonymous

I'll respond to this thread with another question: does there exist a downloader that can actually parse the tags of an image, and include those tags in the metadata of the image?

Updated by anonymous

HarryBenson said:
I'll respond to this thread with another question: does there exist a downloader that can actually parse the tags of an image, and include those tags in the metadata of the image?

It's already difficult to add regular metadata like Author, Program Used, and whatnot to PNG files, so I'd say no. But just because I say so doesn't mean there aren't any.
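
That said, for PNGs specifically, something like Pillow can write text chunks, so the idea is at least possible. A very rough sketch (not something any downloader in this thread actually does, just an illustration):

from PIL import Image
from PIL.PngImagePlugin import PngInfo

img = Image.open("123456.png")
meta = PngInfo()
meta.add_text("Tags", "mammal male solo")  # whatever the post's tag string is
img.save("123456_tagged.png", pnginfo=meta)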

Updated by anonymous

I forgot that some pictures can save me thousands of words...

Album of Hydrus after setup, and some demo of basic operations. NSFW, cub, and there was an incidental shota human.

Hydrus also supports blacklisting and aliases and implications, because it's basically a private booru like I had said.

I believe the tags are stored in client.mappings.db, not directly as media metadata.

Updated by anonymous

I have a complete archive of the site (all images and image metadata), and there's another website which copies all of e621's posts (anyone remember its address?). Some of the tags will be outdated, as I don't constantly hit the site for updates. Instead I download about a day's worth at a time, with 5 seconds between pictures, within around 24 hours of an image being posted, so I miss any tag changes made after that. Originally I was going to make an AI to help with automatic tagging, and I'll get to it someday...

If you're downloading just to make sure the content is never fully gone, don't bother. I can release a torrent if Bad Dragon ever shuts down and closes the site. It'll take you a long time to download the entire site without negatively affecting its performance.

I'm not saving the wiki or forum posts.

Updated by anonymous

Wulfre said:
Sorry for the creepy five-minutes-after-posting response; I just happened to be lurking when you opened this thread. But here is my biased answer: my downloader. https://github.com/Wulfre/e621dl

Has blacklist support and was last updated today. If you need any extra features just let me know and I'll see what I can do.

Thanks for that. I've been looking for something simple like this for a while.

Updated by anonymous

Gave this tool a bash and it is quite handy for bulk downloading images according to tags. I was wondering, though, if it might be possible to add some additional capability to it, say creating sub-folders according to certain tags. I proposed this idea in another thread, to have something with the same functionality as the Furaffinity Extension add-on that is available for Google Chrome and Firefox (see link): https://e621.net/forum/show/250500

What I want to be able to do is download images with the My Little Pony tag and filter them into subfolders according to the Character tag, and if need be into another subfolder for the Artist tag. Is this something that could be done with the tool as it currently is, or do some modifications need to be made?

Updated by anonymous

Since most people know of my script from this thread, I'll put an update about it here. The script now retries a couple of times if it fails to connect, I changed the output to be more informative, and, for all the nerds out there, I updated the code to use the accepted PEPs from 3.6 and 3.7.
https://github.com/Wulfre/e621dl
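
For the curious, the retry behaviour is along these lines; this is a simplified sketch rather than the literal code, and the attempt count and delay here are just illustrative:

import time
import requests

def get_with_retries(url, params=None, attempts=3):
    for attempt in range(attempts):
        try:
            return requests.get(url, params=params, timeout=30)
        except requests.ConnectionError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(5)  # wait a bit before trying again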

I also made the script its own thread, but it got buried. I'll still leave a link to it in case anyone wants to give feedback there.
https://e621.net/forum/show/247767

Updated by anonymous
