Topic: Any functioning e621 downloaders anymore?

Posted under e621 Tools and Applications

I've been looking around a lot and have downloaded a few programs, but none of them seem to work anymore. I look at a lot of images on this site and like to keep a local directory of pieces I like, but it's very tedious when I want to browse EVERY page and the search is so broad that there are over 600 pages.

I was wondering if there were any functioning e621 downloaders that use tags?

I've looked at NeBuR's e621downloader as well as e621dl, and a couple of other programs I had from months ago, but they didn't work either. I can't remember their names anymore since they were also just called e621downloader.

Updated by savageorange

I'm using e6collector from forum #142112. I have version 1.0.2 and it seems to be working, but I can't figure out how to search for multiple tags. Instead, I'm adding posts I like to a temporary set, downloading the set's contents, then deleting the set and creating it again.
I've just tried using the newest version, but it doesn't work for me.

Updated by anonymous

How about coming up with some description of what kind of tool you'd like to see?
What kind of search requests do you need to handle?

Updated by anonymous

hslugs said:
How about coming up with some description of what kind of tool you'd like to see?
What kind of search requests do you need to handle?

Well just something that can download pages of posts based on tags that are searched. Like Digimon rating:e, or something like that.

It doesn't need to handle pools, but that would be a plus. There are more images than pools, and there's already a tab for pools, so I can use that.

I wonder if it's possible to have it not download things blacklisted? Not sure if it's possible, but would be good to see.

Edit: Also, I use Windows 10 and the ones I've found seem to be python or Linux based. Kind of a bummer, but hopefully there can be a tool that works on it. Wish I was good enough at coding to do it myself. I can make a GUI in Visual Studio if someone knew how to do this stuff in C#. I know a little, but not enough.

Updated by anonymous

DarkSpyro92 said:
Well just something that can download pages of posts based on tags that are searched. Like Digimon rating:e, or something like that.

So you want a tool to download say all 50*127 full-size images that a search for "digimon rating:e" returns right?

I wonder if it's possible to have it not download things blacklisted? Not sure if it's possible, but would be good to see.

Yeah, but you'd need to supply the blacklist as a file, and the tool would need to process it.

Updated by anonymous

Granberia said:
I can't figure out how to search for multiple tags.

I haven't used e6collector before, but I read through the source and it indicates that you have to URL-encode the tags first. Perhaps this is why you couldn't search multiple tags? Spaces need to be encoded as %20. Some other special characters might need encoding for some tags, so the easiest way to be sure is probably to make the search on e6 and then copy the part of the URL after https://e621.net/post/index/1/
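For reference, Python's standard library can do this percent-encoding with urllib.parse.quote. A minimal sketch (the encode_tags name is just for illustration):

```python
from urllib.parse import quote


def encode_tags(tags: str) -> str:
    """Percent-encode a space-separated tag search for use in a URL."""
    # quote() turns spaces into %20 and escapes characters such as "<" and ":".
    # safe="" makes it escape "/" too, which some tags can contain.
    return quote(tags, safe="")


print(encode_tags("digimon rating:e"))
```

With that search, the output is digimon%20rating%3Ae, which you could paste after the /post/index/1/ part of the URL.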

DarkSpyro92 said:
Also, I use Windows 10 and the ones I've found seem to be python or Linux based

Why not just install python?

Updated by anonymous

hslugs said:
So you want a tool to download say all 50*127 full-size images that a search for "digimon rating:e" returns right?

Yeah. I'd like a tool like that.

Yeah, but you'd need to supply the blacklist as a file, and the tool would need to process it.

Would a text file with the tags work? I think I made a program like that in class before.

Updated by anonymous

purple.beastie said:
Why not just install python?

I have installed Python but can't seem to figure out how to use it. I'm looking up tutorials now. e621dl uses Python and says it works on Windows, but given the errors I'm getting, I think it might be deprecated.

Updated by anonymous

DarkSpyro92 said:
I have installed Python but can't seem to figure out how to use it. I'm looking up tutorials now. e621dl uses Python and says it works on Windows, but given the errors I'm getting, I think it might be deprecated.

I'm not sure what e621dl is. It only turns up in this thread and two others in search, and it doesn't seem to have its own thread. Perhaps e621:tools can help you? There is a list of deprecated tools at the bottom. I think e6collector.py is the only general purpose tag downloader that is still functional though.

If you get stuck with Python, describe the problem and maybe I can help.

Updated by anonymous

purple.beastie said:
I'm not sure what e621dl is. It only turns up in this thread and two others in search, and it doesn't seem to have its own thread. Perhaps e621:tools can help you? There is a list of deprecated tools at the bottom. I think e6collector.py is the only general purpose tag downloader that is still functional though.

If you get stuck with Python, describe the problem and maybe I can help.

https://github.com/wwyaiykycnf/e621dl

Updated by anonymous

I have an in-house solution that I'll be putting up on my GitHub soon, but it's for Linux, and it's not quite up to snuff with the rate-of-requests rule... If you're interested, I'll post a link to it in my thread sometime today or tomorrow.
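For anyone adapting a script like this, a simple way to respect a request-rate rule is to enforce a fixed delay between API calls. This is only a sketch: the Throttle class is made up, and the 1.0 second interval is a placeholder, so check the site's current API rules for the actual limit.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive wait() calls."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # monotonic time of the previous request, if any

    def wait(self) -> None:
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Placeholder interval: check the site's API rules for the real limit.
throttle = Throttle(min_interval=1.0)
# Call throttle.wait() immediately before every API request.
```

Using time.monotonic rather than time.time keeps the delay correct even if the system clock is adjusted while a long download runs.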

Updated by anonymous

DarkSpyro92 said:
Yeah. I'd like a tool like that.

https://gist.github.com/anonymous/78d4ba06c148d7bfbea62263f58441d6
Create blacklist.txt in current directory. Only "tag" and "-tag" are supported.
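For readers wondering what "processing" a blacklist file involves, here is one way to read blacklist.txt. It is a sketch, not the gist's code: treating each line as a group of tags that must all match, with "-tag" meaning the post must not have that tag, is an assumption, and the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# One rule per blacklist line: (tags the post must have, tags it must not have)
Rule = Tuple[Set[str], Set[str]]


def parse_blacklist(lines: Iterable[str]) -> List[Rule]:
    """Turn lines such as "gore", "tag1 tag2" or "comic -english_text" into rules."""
    rules = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        required = {t for t in tokens if not t.startswith("-")}
        excluded = {t[1:] for t in tokens if t.startswith("-") and len(t) > 1}
        rules.append((required, excluded))
    return rules


def is_blacklisted(post_tags: Set[str], rules: List[Rule]) -> bool:
    """A post is skipped if any rule matches: all required tags, no excluded ones."""
    return any(req <= post_tags and not (exc & post_tags) for req, exc in rules)


# Typical use, with the file name from the post above:
# with open("blacklist.txt", encoding="utf-8") as f:
#     rules = parse_blacklist(f)
```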

Would a text file with the tags work? I think I made a program like that in class before.

Yeah, it's really a simple problem, and exactly the kind that makes scripting languages worth learning. Scripts like this are usually written as needed, and often follow personal preferences.

Here's another version, this is how I prefer Danbooru-like downloaders to work. Although I'm not sure symlinks are available in Windows.

https://gist.github.com/anonymous/c5cfd7b55f9b52d8cd5829e022f581ad

Updated by anonymous

purple.beastie said:
So, perhaps give Wulfre's fork a try?
https://github.com/Wulfre/e621dl

I tried that and apparently it's supposed to create a config.ini file. When I run it, it says I should use pip install -r requirements.txt, which apparently isn't acknowledged by either the command line or Python 3.6.

hslugs said:
https://gist.github.com/anonymous/78d4ba06c148d7bfbea62263f58441d6
Create blacklist.txt in current directory. Only "tag" and "-tag" are supported.

Yeah, it's really a simple problem, and exactly the kind that makes scripting languages worth learning. Scripts like this are usually written as needed, and often follow personal preferences.

Here's another version, this is how I prefer Danbooru-like downloaders to work. Although I'm not sure symlinks are available in Windows.

https://gist.github.com/anonymous/c5cfd7b55f9b52d8cd5829e022f581ad

I looked through that and saw where the blacklist is acknowledged and that it searches for tags, but not where the tags are to be input. Could you tell me where that is? I see multiple places where I might put tags, but I'm not sure any of them are right. Is it on

    def parse_tags(tagstring):
        if not tagstring:
            return [ ]

Updated by anonymous

DarkSpyro92 said:
I tried that and apparently it's supposed to create a config.ini file. When I run it, it says I should use pip install -r requirements.txt, which apparently isn't acknowledged by either the command line or Python 3.6.

Does just pip in the command line say something like command not recognized? If so the path to pip is not in your PATH environment variable for some reason. You could try doing a file search for pip.exe where you installed python and then use "absolute path to pip.exe goes here" install -r requirements.txt.

If pip by itself is recognized, maybe you are not executing the command in the right directory and it can't find requirements.txt?
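A related trick: running pip as a module of the Python you already have sidesteps the PATH problem entirely. This small sketch (pip_diagnostics is a made-up helper name) shows whether pip is on PATH and builds the equivalent "python -m pip" command:

```python
import shutil
import sys


def pip_diagnostics():
    """Return (pip found on PATH or None, a pip command that ignores PATH)."""
    on_path = shutil.which("pip")  # None means the shell can't find pip either
    # "-m pip" runs the pip that belongs to this exact interpreter, so it works
    # even when pip's Scripts folder was never added to PATH.
    command = [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]
    return on_path, command


on_path, command = pip_diagnostics()
print("pip on PATH:", on_path)
print("try instead:", " ".join(command))
```

From a command prompt, the same thing is python -m pip install -r requirements.txt, or py -m pip install -r requirements.txt with the Windows py launcher, run from the folder that contains requirements.txt.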

I looked through that and saw where the blacklist is acknowledged and that it searches for tags, but not where the tags are to be input. Could you tell me where that is?

If I'm not mistaken, the command for graball, for example, should be python graball.py "tags go here -minus -these -ones"

Updated by anonymous

DarkSpyro92 said:

I looked through that and saw where the blacklist is acknowledged and that it searches for tags, but not where the tags are to be input. Could you tell me where that is?

You pass the query to the program when you run it.
As in ./graball.py "humor".
This is apparent from looking at the last 10 lines of the program.
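For anyone new to this: the quoted search ends up in sys.argv, and graball.py passes sys.argv[1] to fetch_results. A minimal, standalone sketch of that pattern (the main function here is hypothetical, not graball's code):

```python
import sys


def main(argv):
    """Read the search from the command line, the way graball.py uses sys.argv[1]."""
    # argv[0] is the script's own name; the quoted search string is argv[1].
    if len(argv) < 2:
        print('usage: python graball.py "tag1 tag2 -excluded_tag"')
        return 1
    tags = argv[1].split()
    print("searching for:", tags)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

The quotes matter: without them, the shell splits "digimon rating:e" into two separate arguments and only the first would land in argv[1].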

Updated by anonymous

savageorange said:
You pass the query to the program when you run it.
As in ./graball.py "humor".
This is apparent from looking at the last 10 lines of the program.

I get this error.

Traceback (most recent call last):
  File "graball.py", line 3, in <module>
    import urllib, requests, json
ImportError: No module named requests

I seem to get that error with a lot of these programs. What are the requests? Are those the queries I pass?

Updated by anonymous

DarkSpyro92 said:
I seem to get that error with a lot of these programs. What are the requests? Are those the queries I pass?

requests is a third party python library. Normally pip is used to install such third party dependencies.

Updated by anonymous

purple.beastie said:
requests is a third party python library. Normally pip is used to install such third party dependencies.

Now that I installed it, I get this error.

> https://e621.net/post/index.json?tags=digimon rating:e

Traceback (most recent call last):
  File "graball.py", line 136, in <module>
    fetch_results(sys.argv[1])
  File "graball.py", line 117, in fetch_results
    posts = e6get('/post/index', tags=search)
  File "graball.py", line 17, in e6get
    resp = requests.get(url, args)
AttributeError: module 'requests' has no attribute 'get'

I searched up this error but all it says is that you have to call the requests module in the code? At least that's what I'm getting from this.

https://stackoverflow.com/questions/12258816/module-object-has-no-attribute-get-python-error-requests

You probably know more about how to solve this issue, though. Thank you for helping me, btw.

Updated by anonymous

DarkSpyro92 said:
You probably know more about how to solve this issue, though. Thank you for helping me, btw.

Happy to help. Not quite sure how to diagnose this though. I tried to reproduce the issue on my end without success. You could try reinstalling requests in case it was just a fluke. Here's the command for that: pip install requests -I.

Otherwise, could you post the output of these commands?:

python -c "import requests, sys; print(sys.executable, requests.__file__, dir(requests))"
where pip
pip show requests
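A hedged guess at the cause, not confirmed in this thread: "module 'requests' has no attribute 'get'" often means Python imported something other than the real library, for example a file or folder named requests sitting next to the script. A folder without an __init__.py (such as the top-level folder of a git checkout) can import as an empty "namespace package". This sketch reports where a module would be loaded from without importing it; locate_module is a hypothetical helper.

```python
import importlib.util


def locate_module(name):
    """Report where `import name` would load from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return ("missing", None)  # not installed for this interpreter
    if spec.origin in (None, "namespace"):
        # A plain folder with no __init__.py imports as an empty
        # "namespace package", which has no get() function.
        return ("namespace", list(spec.submodule_search_locations or []))
    # For a normal install this is usually .../site-packages/requests/__init__.py.
    return ("module", spec.origin)


print(locate_module("requests"))
```

If it reports "namespace" with a path inside your own folder, moving or renaming that folder is worth trying before reinstalling.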

Updated by anonymous

purple.beastie said:
Happy to help. Not quite sure how to diagnose this though. I tried to reproduce the issue on my end without success. You could try reinstalling requests in case it was just a fluke. Here's the command for that: pip install requests -I.

Otherwise, could you post the output of these commands?:

python -c "import requests, sys; print(sys.executable, requests.__file__, dir(requests))"
where pip
pip show requests

Pip doesn't work for me. I get an error when using it.

pip : The term 'pip' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ pip install requests -I
+ ~~~
    + CategoryInfo          : ObjectNotFound: (pip:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

Instead I used git with a link from that link you gave me.
$ git clone git://github.com/requests/requests.git

Updated by anonymous

In that case I suspect you have installed requests wrongly (for example, by simply copying files/directories into the Python site-packages directory).

I'd suggest trying to install pip via these instructions, and then trying again to install requests using pip.

(pip is not a builtin part of python; I don't think purple-beastie mentioned this fact. But it is fairly easy to install.)

EDIT: git clone git://github.com/requests/requests.git ? -- yeah, that won't by itself get you a working install of requests. I can go into more detail, but I'd strongly recommend the pip route, it's simpler.

Updated by anonymous

DarkSpyro92 said:
Pip doesn't work for me. I get an error when using it.

Oh, sorry, I thought you got pip working. I'd try installing it, as savageorange said. Python has bundled pip since 3.4, though, so I suspect that it is installed but that your command line is simply unaware of the path to it.

If you want to try an alternative you could call python setup.py install in the directory you cloned from github. No guarantee this will work though.

Updated by anonymous

savageorange said:
I'd suggest trying to install pip via these instructions, and then trying again to install requests using pip.

purple.beastie said:
Oh, sorry, I thought you got pip working. I'd try installing it, as savageorange said. Python has bundled pip since 3.4, though, so I suspect that it is installed but that your command line is simply unaware of the path to it.

If you want to try an alternative you could call python setup.py install in the directory you cloned from github. No guarantee this will work though.

I got it working now. I had to type in py pip install requests -I. I tried putting py before on another pip request but it didn't work. I'm glad it did this time.

Edit: The program started working, but ran into an error.

Traceback (most recent call last):
  File "graball.py", line 136, in <module>
    fetch_results(sys.argv[1])
  File "graball.py", line 121, in fetch_results
    fetch_single(post)
  File "graball.py", line 113, in fetch_single
    link_image_tags(img, tags, id, ext)
  File "graball.py", line 61, in link_image_tags
    os.symlink(pname, tname)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: '../../post/42/d6/42d6665688211a722622e59b76db7765.jpg' -> 'tags/<3/1288685.jpg'

Updated by anonymous

.. that looks like a valid path, but I don't know the details of Windows' behaviour here. Possibly the use of / rather than \ is a problematic linuxism here. ... Looking at your code, purple-beastie, it does seem to hardcode / rather than using os.path.sep and/or os.path.join[1]. IMO you cannot rely on that working.

[1] specifically, lines 40, 44, 56, 57
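A sketch of building the same kind of relative link path portably with os.path.join and os.path.relpath instead of hardcoded slashes. The helper name and the "dragon" tag folder are made up for the example; on Windows, creating the symlink itself may additionally require administrator rights or Developer Mode.

```python
import os


def relative_link_target(image_path, link_path):
    """Relative path a symlink at link_path needs in order to reach image_path."""
    # relpath works out the ".." steps from the link's folder and joins the
    # pieces with the separator of the OS it runs on (backslash on Windows).
    return os.path.relpath(image_path, start=os.path.dirname(link_path))


image = os.path.join("post", "42", "d6", "42d6665688211a722622e59b76db7765.jpg")
link = os.path.join("tags", "dragon", "1288685.jpg")
print(relative_link_target(image, link))
```

On Linux this prints the same ../../post/42/d6/... form seen in the traceback; on Windows the separators come out as backslashes.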

Updated by anonymous

DarkSpyro92 said:
I got it working now. I had to type in py pip install requests -I. I tried putting py before on another pip request but it didn't work. I'm glad it did this time.

Edit: The program started working, but ran into an error.

Traceback (most recent call last):
  File "graball.py", line 136, in <module>
    fetch_results(sys.argv[1])
  File "graball.py", line 121, in fetch_results
    fetch_single(post)
  File "graball.py", line 113, in fetch_single
    link_image_tags(img, tags, id, ext)
  File "graball.py", line 61, in link_image_tags
    os.symlink(pname, tname)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: '../../post/42/d6/42d6665688211a722622e59b76db7765.jpg' -> 'tags/<3/1288685.jpg'

Looks like Windows does not allow directories named "<3".

Try this version, it should skip tags with forbidden characters:
https://gist.github.com/anonymous/6ec9b5739ad50f589330f4bcdd02e93a
Better yet, start with fetchall instead. It's simpler, and I'm not even sure you'll need the kind of directory structure graball creates.

Also fixed max/minid in graball; I'm pretty sure it should be minid.

Update: oh, and savageorange may be right as well; I haven't used Windows for so long that I didn't even think of that. Just stick with fetchall then, it doesn't do directory stuff.

Updated by anonymous

hslugs said:
Looks like Windows does not allow directories named "<3".

..TIL. And this wasn't obvious because of the probably overly-broad except OSError: pass. Checking the strerror attribute of the exception for specific OK cases is probably saner, if you can't just check for specific subclasses.

Updated by anonymous

Some more fixes, blacklisting "tag1 tag2" should work properly now.

https://gist.github.com/anonymous/e3818dd963f9bc1cb59d5ae39f349939
https://gist.github.com/anonymous/7463668a4d684a015e281d108e133ab4

Also a simple pool downloader for static or growing pools, like comics and such. Takes pool id.

https://gist.github.com/anonymous/ad7c432c40580bc423f2fec1ad92a3d6

And this wasn't obvious because of the probably overly-broad except OSError: pass.

Yeah that should have been FileExistsError.

Updated by anonymous

hslugs said:
Some more fixes, blacklisting "tag1 tag2" should work properly now.

https://gist.github.com/anonymous/e3818dd963f9bc1cb59d5ae39f349939
https://gist.github.com/anonymous/7463668a4d684a015e281d108e133ab4

Also a simple pool downloader for static or growing pools, like comics and such. Takes pool id.

https://gist.github.com/anonymous/ad7c432c40580bc423f2fec1ad92a3d6

Yeah that should have been FileExistsError.

Hey I forgot to ask. Where are the files saved when I download them? I don't see anything about destination in the code.

Edit: Never mind, I found them. They're organized in a way I've never seen, but I can manually move them around.

http://imgur.com/a/CSzoS

Unless you know of a simple bit of code that will organize them by pictures instead of these folders? It seems to be organized by path.

Downloading post 1255504 post/6a/16/6a165ded3bb6fe0617a4a176513b4450.png

In this case it would be a folder labeled 6a>folder labeled 16>file labeled 6a165ded3bb6fe0617a4a176513b4450.png with another file with the same label but with extension .json

Also, thank you for making this. I'm at least happy to have a program that functions. While it will require a bit of tedious work on my end, at least it will save me the time of having to go through numerous amounts of pages. That's a plus, and I'm grateful for it.

Updated by anonymous

DarkSpyro92 said:
They're organized in a way I've never seen, but I can manually move them around.

Check any image URL on e6 lol.
http://static1.e621proxy.ru/data/sample/f6/67/f6674de8fdc0a2a9e34658e3338edb5c.jpg
"f6674de8fdc0a2a9e34658e3338edb5c" is the hex md5 of the file contents, and it starts like f6-67-...

Also use fetch-all if you plan to move them around.
The kind of structure grab-all makes is useful if you are trying to keep a partial mirror of the site.
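To make that layout concrete, here's a sketch that derives the same folder and file name from a file's bytes with Python's hashlib. It assumes, as described above, that the name is the md5 of the file contents; the helper name is made up.

```python
import hashlib
import os


def e6_style_path(data: bytes, ext: str) -> str:
    """Build the aa/bb/<md5>.<ext> layout from a file's contents."""
    digest = hashlib.md5(data).hexdigest()  # 32 lowercase hex characters
    # First two hex chars, next two, then the full digest as the file name.
    return os.path.join(digest[:2], digest[2:4], digest + "." + ext)


print(e6_style_path(b"", "png"))
```

A side benefit of this naming: hashing a downloaded file and comparing the result with its file name is a quick way to spot truncated or corrupted downloads.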

Unless you know of a simple bit of code that will organize them by pictures instead of these folders?

What do you mean by pictures?

Updated by anonymous

hslugs said:
Check any image URL on e6 lol.
http://static1.e621proxy.ru/data/sample/f6/67/f6674de8fdc0a2a9e34658e3338edb5c.jpg
"f6674de8fdc0a2a9e34658e3338edb5c" is the hex md5 of the file contents, and it starts like f6-67-...

Also use fetch-all if you plan to move them around.
The kind of structure grab-all makes is useful if you are trying to keep a partial mirror of the site.

Seem to have run into one more error. I hope that this isn't too much trouble for you having to do this. I'm sure I won't be the only one grateful for this, though.

Traceback (most recent call last):
  File "graball.py", line 139, in <module>
    fetch_results(sys.argv[1])
  File "graball.py", line 124, in fetch_results
    fetch_single(post)
  File "graball.py", line 116, in fetch_single
    link_image_tags(img, tags, id, ext)
  File "graball.py", line 64, in link_image_tags
    os.symlink(pname, tname)
FileNotFoundError: [WinError 3] The system cannot find the path specified: '../../post/b3/b9/b3b959f47757643a029392d892fa79cd.png' -> 'tags/.../1245837.png'

Updated by anonymous

I don't think ... is a valid name either. The rules involved are overly complicated, IMO.
(in this case, . is permitted but filenames may not consist solely of .)
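A graball-style script could check each tag before using it as a folder name, which would cover both the "<3" and "..." cases. A sketch of such a check, based on Microsoft's documented naming rules (forbidden characters < > : " / \ | ? *, reserved device names like CON and NUL, no trailing dot or space); the function name is hypothetical:

```python
import re

# Characters Windows rejects in file and folder names, plus ASCII control chars.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves, even with an extension (e.g. "con.txt").
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)


def is_valid_windows_name(name: str) -> bool:
    """Return True if name can safely be used as a folder name on Windows."""
    if not name or _FORBIDDEN.search(name):
        return False  # empty, or contains e.g. "<" as in the "<3" tag
    if name.endswith((".", " ")):
        return False  # covers "...", and names Windows would silently trim
    if name.split(".")[0].upper() in _RESERVED:
        return False
    return True
```

Tags that fail the check could simply be skipped, as in the fixed graball, or renamed with some escaping scheme of your choosing.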

Yeah that should have been FileExistsError.

If that was all the try/except clause was designed for, it can be entirely removed -- pass exist_ok=True when calling os.makedirs, instead.
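Putting those two suggestions together, here's a sketch of a link helper that lets makedirs tolerate existing folders and only swallows FileExistsError from os.symlink. The function and the demo paths are hypothetical; on Windows, os.symlink may also need administrator rights or Developer Mode.

```python
import os
import tempfile


def make_link(target: str, link_path: str) -> bool:
    """Create link_path pointing at target; return False if it already exists."""
    parent = os.path.dirname(link_path)
    if parent:
        # exist_ok=True replaces a try/except around makedirs.
        os.makedirs(parent, exist_ok=True)
    try:
        os.symlink(target, link_path)
    except FileExistsError:
        return False  # linked on an earlier run, safe to skip
    # Other OSErrors (invalid names, missing symlink rights on Windows) are not
    # caught, so problems like the "<3" folder surface instead of being hidden.
    return True


if __name__ == "__main__":
    # Demo in a throwaway folder; the second call finds the link already there.
    with tempfile.TemporaryDirectory() as root:
        link = os.path.join(root, "tags", "dragon", "1288685.jpg")
        target = os.path.join("..", "..", "post", "x.jpg")
        print(make_link(target, link), make_link(target, link))
```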

Updated by anonymous

hslugs said:
Check any image URL on e6 lol.
http://static1.e621proxy.ru/data/sample/f6/67/f6674de8fdc0a2a9e34658e3338edb5c.jpg
"f6674de8fdc0a2a9e34658e3338edb5c" is the hex md5 of the file contents, and it starts like f6-67-...

Also use fetch-all if you plan to move them around.
The kind of structure grab-all makes is useful if you are trying to keep a partial mirror of the site.

What do you mean by pictures?

Well, never mind. Using fetch-all gets just the pictures instead of separate folders. And holy crap, is it going to make things confusing, since it just saves everything into the folder fetch-all is in. However, that means I can just move fetch-all to different folders and run it in each one to do what I want.

Edit: Huzzah! By moving fetch-all.py to the folder I want the program to occur in and opening a command line in that folder makes it download to that folder. That way I don't have to move the pictures, I just have to move the program. Or do what I did and just copy the program to different locations, which also works.

Edit 2: Holy fuck there's so many posts. o_o It's downloading all of them that aren't blacklisted, and even then that's over 600 pages of posts I'm downloading. This is probably going to take a few hours or more.

Updated by anonymous

Yeah.. that kind of problem is why I prefer to use a download manager (eg. DownThemAll, aria) with lists of URLs. Easier to control (pause/resume downloading, limit download speed..).

Programs don't need to be in current directory to execute them BTW.
Suppose you had a bunch of subdirectories named after searches, like

  • mydir\digimon
  • mydir\shark herm
  • etc..

You could put the script in mydir. Then you could change into the digimon directory and run it like ..\fetchall.py "digimon".

(another solution is to put the directory fetchall.py is in on your PATH, so that you can run fetchall.py "mysearchterms" no matter where you are. This is slightly more complicated to do but makes things easy in the longer term)

Updated by anonymous

savageorange said:
Yeah.. that kind of problem is why I prefer to use a download manager (eg. DownThemAll, aria) with lists of URLs. Easier to control (pause/resume downloading, limit download speed..).

Programs don't need to be in current directory to execute them BTW.
Suppose you had a bunch of subdirectories named after searches, like

  • mydir\digimon
  • mydir\shark herm
  • etc..

You could put the script in mydir. Then you could change into the digimon directory and run it like ..\fetchall.py "digimon".

(another solution is to put the directory fetchall.py is in on your PATH, so that you can run fetchall.py "mysearchterms" no matter where you are. This is slightly more complicated to do but makes things easy in the longer term)

That's possible? Huh. Also, I looked at DownThemAll. If it saves a web page's pictures, if you linked it to each page of the search, wouldn't it only save the thumbnails at that size and not the full image?

Updated by anonymous

That action you are talking about is just a particular part of DownThemAll. In the DTA Download Manager, you can right click and in Advanced submenu, it gives the option to import downloads (eg. as a .txt file with one URI per line)

If you consider what fetchall/graball does currently (get post info from e621, download each file named in each post's "file_url"), it wouldn't be hard to modify it to generate a list of URLs instead. Then you can import that .txt file into DTA.

(this is what my system does. I've got one program that gets the post info matching a query, and another program that takes post info and outputs a list of URLs as a text file I can import.)
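For illustration, a minimal sketch of the second half of that setup: turning post records into a URL list for DTA. It assumes each post is a dict with a "file_url" key, as described above; the function names and file names are made up.

```python
def urls_from_posts(posts):
    """Yield each post's full-size file URL, skipping posts without one."""
    for post in posts:
        url = post.get("file_url")
        if url:
            yield url


def write_url_list(posts, path):
    """Write one URL per line, the format DTA's import option reads."""
    with open(path, "w", encoding="utf-8") as out:
        for url in urls_from_posts(posts):
            out.write(url + "\n")


# Hypothetical usage, assuming another script saved the API results as a JSON list:
# import json
# with open("posts.json", encoding="utf-8") as f:
#     write_url_list(json.load(f), "downloadqueue.txt")
```

Keeping the two steps separate means the slow part (the actual downloads) is entirely in the download manager's hands, with its pause/resume and speed limits.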

Updated by anonymous

savageorange said:
That action you are talking about is just a particular part of DownThemAll. In the DTA Download Manager, you can right click and in Advanced submenu, it gives the option to import downloads (eg. as a .txt file with one URI per line)

If you consider what fetchall/graball does currently (get post info from e621, download each file named in each post's "file_url"), it wouldn't be hard to modify it to generate a list of URLs instead. Then you can import that .txt file into DTA.

(this is what my system does. I've got one program that gets the post info matching a query, and another program that takes post info and outputs a list of URLs as a text file I can import.)

But I was wanting the actual images, not URLs, because I'd like to be able to access the images offline. Or do you mean alter fetch-all to get URLs, then use the URLs with DTA?

Updated by anonymous

DarkSpyro92 said:
But I was wanting the actual images, not URLs, because I'd like to be able to access the images offline. Or do you mean alter fetch-all to get URLs, then use the URLs with DTA?

Yes, the second.

Updated by anonymous

savageorange said:
Yes, the second.

Hum... I can try to modify it myself. I've never worked with Python before but I do know a little bit of code.

Updated by anonymous

Just curious: why don't just fav them and download with any fav downloader?
NVM, I got it.

Updated by anonymous

DarkSpyro92 said:
Hum... I can try to modify it myself. I've never worked with Python before but I do know a little bit of code.

I'd be happy to offer some hints if you want, but I never found a link for fetchall.py. Am I blind, or is it not linked in this thread?

Updated by anonymous

Granberia said:
I'm using e6collector from forum #142112. I have version 1.0.2 and it seems to be working, but I can't figure out how to search for multiple tags. Instead, I'm adding posts I like to a temporary set, downloading the set's contents, then deleting the set and creating it again.
I've just tried using the newest version, but it doesn't work for me.

It works fine for me; I even use a batch file to organize the pictures into folders named after the tags I let it download.

Updated by anonymous

DarkSpyro92 said:
It's linked, but it's fetch-all.py. Let me grab it.

https://gist.github.com/anonymous/e3818dd963f9bc1cb59d5ae39f349939

Thanks.
Well, the structure here seems quite simple to me. fetch_single is called for each post in the post list. It handles various checks (which are irrelevant to this change) and then it calls fetch_image to actually perform the download.

You can see that fetch_image consists of just one line urllib.request.urlretrieve(url, filename=img).
fetch_image is only ever called from fetch_single, so it is simpler in this case to ignore fetch_image entirely; but I thought I'd point out this is where the download actually happens in the current program.

In fetch_single you see the line fetch_image(img, url). Since we know this is the call that (in the present system) causes the download, this is the line we should replace in order to write the url to file instead.

Now, supposing we had a file handle already open, we could change this line to something like
downloadqueue.write(url + '\n')
(\n meaning newline, which is how lines are separated. Omitting it would mean the URLs were all concatenated onto one line, which the download manager would interpret as one colossal, and invalid, URL.)

So then there is the question of when to open the filehandle which I have named downloadqueue. I consider fetch_results to be the most appropriate place for this, for a variety of reasons. After opening it, we need to get the downloadqueue file handle to fetch_single in order to use it.

We have to open the downloadqueue outside of the while loop, because we only want to open it once, not repeatedly.
In Python the best way to open files is usually via
with, as in with open('c:/downloadqueue.txt', 'wt') as downloadqueue:

And then, because with opens a block and downloadqueue will only exist within that block, all the statements from while posts: to posts = e6get... need to be indented one level.

Going back to fetch_single, the def line needs to be adjusted to receive a second parameter, like this: def fetch_single(post, downloadqueue):

After making this modification, all we have to do is to actually pass our downloadqueue as a second parameter after post, when we call fetch_single from within fetch_results. I won't describe exactly how to do this as I believe it's pretty obvious.

I hope this has been clear; feel free to ask for further clarification if needed.
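The changes described above can be sketched like this. To keep it runnable on its own, this fetch_results takes a ready-made list of pages, a hypothetical stand-in for graball's "while posts: ... posts = e6get(...)" loop, and the queue path is a parameter rather than c:/downloadqueue.txt.

```python
def fetch_single(post, downloadqueue):
    """Queue one post's URL instead of downloading it."""
    # (graball's existing per-post checks, such as the blacklist, would stay above.)
    url = post["file_url"]
    # Replaces fetch_image(img, url): write the URL plus a newline to the queue.
    downloadqueue.write(url + "\n")


def fetch_results(pages, queue_path="downloadqueue.txt"):
    """Open the queue once, then pass the handle to fetch_single for every post."""
    with open(queue_path, "wt", encoding="utf-8") as downloadqueue:
        # `pages` stands in for the real paging loop; the loop body is indented
        # one level so that it sits inside the with block.
        for posts in pages:
            for post in posts:
                fetch_single(post, downloadqueue)
```

Because the with block closes the file automatically when the loop ends (even on an error), everything written so far is flushed to disk and ready to import into DTA.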

---

As a side note, it is worth considering whether you want to open the file with wt mode (write, text) or at mode (append, text). I chose 'wt' because it generally is more straightforward to deal with as a user - it empties the file when you first open it.
However, by appending to the end of the file, you can 'pile together' a collection of downloads from multiple searches, by simply running the tool with those different searches one after another.
The downside of that is when you are done 'piling up downloads' and have imported them into DTA, you need to manually empty the downloads queue or those old items will just be left there.
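The difference between the two modes in a few lines (queue_urls and the file names are hypothetical):

```python
def queue_urls(urls, path, append=False):
    """Write URLs to a queue file, either replacing or extending it."""
    # "wt" empties the file first; "at" keeps earlier lines and adds to the end.
    mode = "at" if append else "wt"
    with open(path, mode, encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
```

Running the first search with append=False and later ones with append=True gives the "piling up" behaviour; running with append=False again starts a fresh queue.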

Updated by anonymous
