Hey there. This is a project I've spent the past few weeks building the infrastructure for, and I've only recently gotten it to the point where it actually does something other than use up bandwidth and CPU time.
The project is, for lack of a better name, known as SuperiorBOT.
What does it do?
Currently, not much. Most of the time went into setting up data collection from popular furry sites (currently only e621, SoFurry, and FurAffinity), and so far the only thing I've had it do is flag exact pixel-for-pixel duplicate images. You might have noticed this if you've looked at the flag history over the past few days.
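For anyone wondering what an "exact pixel-for-pixel match" means in practice, here's a minimal sketch (not necessarily how SuperiorBOT does it). The idea is to hash the decoded pixel data instead of the file bytes, so two files that differ only in metadata or compression still collide. It assumes decoding to raw pixels already happened elsewhere; `DecodedImage`, `decoded_md5`, and `find_exact_duplicates` are hypothetical names for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodedImage:
    """Hypothetical container for an already-decoded image.

    Decoding (PNG/JPEG/GIF -> raw pixels) is assumed to happen elsewhere;
    only the raw pixel bytes are hashed here.
    """
    post_id: int
    width: int
    height: int
    mode: str      # e.g. "RGB" or "RGBA"
    pixels: bytes  # raw pixel bytes, row by row


def decoded_md5(img: DecodedImage) -> str:
    """MD5 over dimensions, mode, and decoded pixels.

    The dimensions are included so a 2x2 and a 4x1 image with the same
    pixel bytes do not collide.
    """
    h = hashlib.md5()
    h.update(f"{img.width}x{img.height}:{img.mode}:".encode("ascii"))
    h.update(img.pixels)
    return h.hexdigest()


def find_exact_duplicates(images):
    """Group post IDs whose decoded pixels are identical."""
    groups = defaultdict(list)
    for img in images:
        groups[decoded_md5(img)].append(img.post_id)
    # Only groups with more than one post are duplicates worth flagging.
    return [ids for ids in groups.values() if len(ids) > 1]


# Usage: posts 1 and 2 are the same 2x2 red image, post 3 is blue.
red = bytes([255, 0, 0]) * 4
blue = bytes([0, 0, 255]) * 4
images = [
    DecodedImage(1, 2, 2, "RGB", red),
    DecodedImage(2, 2, 2, "RGB", red),
    DecodedImage(3, 2, 2, "RGB", blue),
]
print(find_exact_duplicates(images))  # [[1, 2]]
```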
What is it going to do?
Quite a bit, hopefully. There are a lot of things on e621 that I feel could be automated (or at least made easier) with enough data available, so I've essentially built myself a local copy of the e621, SoFurry, and FurAffinity databases. Below is a list of potential ideas that may or may not be implemented.
Current work
6/28/16 - Replacing the image cache system so images are stored in zip files instead of directly on the filesystem. Disk I/O is starting to become a problem with millions of 20 KB PNG files scattered around...
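A rough sketch of the zip-based cache idea, assuming cached images are keyed by a string path and sharded into 256 archives by the first two hex characters of the key's MD5. `ZipImageCache` and the `cache_XX.zip` naming are hypothetical, not the bot's actual layout. Entries use `ZIP_STORED` because PNGs are already compressed.

```python
import hashlib
import os
import tempfile
import zipfile


class ZipImageCache:
    """Hypothetical sketch: keep many small cached images inside a few
    zip archives instead of one file per image on disk."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _archive_path(self, key: str) -> str:
        # Shard by the first two hex chars of the key's MD5 (256 archives).
        shard = hashlib.md5(key.encode("utf-8")).hexdigest()[:2]
        return os.path.join(self.root, f"cache_{shard}.zip")

    def put(self, key: str, data: bytes) -> None:
        # Mode "a" creates the archive if it doesn't exist yet.
        # Zip entries can't be replaced in place: writing the same key twice
        # appends a duplicate entry (Python warns), so check get() first.
        with zipfile.ZipFile(self._archive_path(key), mode="a",
                             compression=zipfile.ZIP_STORED) as zf:
            zf.writestr(key, data)

    def get(self, key: str):
        path = self._archive_path(key)
        if not os.path.exists(path):
            return None
        with zipfile.ZipFile(path) as zf:
            try:
                return zf.read(key)
            except KeyError:  # key not present in this archive
                return None


# Usage with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    cache = ZipImageCache(tmp)
    cache.put("e621/12345.png", b"\x89PNG fake bytes")
    stored = cache.get("e621/12345.png")
    missing = cache.get("e621/99999.png")
    archives = sorted(os.listdir(tmp))
```

Reopening an archive on every call keeps the sketch short; a real cache would more likely batch writes or keep handles open.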
Unimplemented Dangerous Stuff
If something's listed in this section, I'll consult with the admins and get their approval before implementing any of it.
- Add year tags (2008, 2013, etc.) to posts missing them by matching decoded image MD5 sums against images found on FurAffinity.
- Add artist tags to images based on their source links.
- Automatically create new artist entries when posts by them are uploaded for the first time.
- Add additional sources to images when matches are found on FurAffinity or SoFurry.
- Remove or fix dead sources (for example, data.furaffinity.net links).
- Transfer applicable tags from inferior posts to superior posts.
- Automatically add color-based tags such as greyscale, monochrome, sepia, black_and_white, restricted_palette, and alpha_channel.
- Add ratio and resolution tags to new posts.
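To give an idea of how some of the color, ratio, and resolution tags could be derived automatically, here's a small sketch. It only covers greyscale, black_and_white, and alpha_channel (sepia, monochrome, and restricted_palette need fuzzier rules). It assumes pixels arrive as decoded `(r, g, b, a)` tuples, and the resolution thresholds in `RESOLUTION_TAGS` are placeholders, not e621's official cutoffs.

```python
from fractions import Fraction

# Placeholder thresholds for illustration only; real cutoffs should come
# from e621's tag wiki. Checked against (longer side, shorter side).
RESOLUTION_TAGS = [("absurd_res", 3200, 2400), ("hi_res", 1600, 1200)]


def ratio_tag(width: int, height: int) -> str:
    """Reduce width:height to lowest terms, e.g. 1920x1080 -> '16:9'."""
    r = Fraction(width, height)
    return f"{r.numerator}:{r.denominator}"


def resolution_tag(width: int, height: int):
    """Return the first (largest) resolution tag the image qualifies for."""
    long_side, short_side = max(width, height), min(width, height)
    for tag, min_long, min_short in RESOLUTION_TAGS:
        if long_side >= min_long and short_side >= min_short:
            return tag
    return None


def color_tags(pixels):
    """Simple color heuristics over decoded (r, g, b, a) pixel tuples."""
    tags = set()
    # greyscale: every pixel has r == g == b (exact, no tolerance)
    if all(r == g == b for r, g, b, _ in pixels):
        tags.add("greyscale")
        # black_and_white: only pure black or pure white pixels
        if all(r in (0, 255) for r, _, _, _ in pixels):
            tags.add("black_and_white")
    # alpha_channel: at least one pixel is not fully opaque
    if any(a < 255 for _, _, _, a in pixels):
        tags.add("alpha_channel")
    return tags
```

A real version would probably want a small tolerance for JPEG noise before calling something greyscale.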
Unimplemented Safe Stuff
Some less interesting stuff that poses no risk to e621's operation.
- Fix decoded image MD5 sums for GIF images so they take into account more than just the first frame.
- Add support for data collection from more furry sites!
- Build a list of images that are visually similar to each other without being parented or linked in any way (using the same technique that powers http://iqdb.harry.lu/). It's still undecided whether any action taken on these posts will be automated or fully manual.
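Two small sketches for the safe items above, both under stated assumptions. The first hashes every composited frame of an animated GIF instead of just the first one; it assumes frames are already decoded to equal-length raw RGBA bytes and ignores frame delays. The second finds visually similar images with a simple 64-bit average hash and Hamming distance. That is a deliberately simpler swapped-in technique, not what iqdb itself uses, and it assumes images were already downscaled to 8x8 greyscale. All function names and the `max_distance=10` cutoff are hypothetical.

```python
import hashlib


def animated_decoded_md5(width: int, height: int, frames) -> str:
    """MD5 over all decoded frames (each raw RGBA bytes of equal length)."""
    h = hashlib.md5()
    h.update(f"{width}x{height}:{len(frames)}:".encode("ascii"))
    for frame in frames:
        h.update(frame)
    return h.hexdigest()


def average_hash(gray) -> int:
    """64-bit average hash of an 8x8 greyscale grid (values 0-255).

    Each bit is 1 if that pixel is brighter than the image's mean.
    """
    flat = [v for row in gray for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_similar_pairs(hashes, max_distance=10):
    """hashes: {post_id: average_hash}. O(n^2) scan, fine for a sketch."""
    ids = sorted(hashes)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if hamming(hashes[a], hashes[b]) <= max_distance
    ]


# Usage: a gradient, a slightly brightened copy, and an inverted copy.
g = [[r * 8 + c for c in range(8)] for r in range(8)]
brighter = [[v + 3 for v in row] for row in g]
inverted = [[255 - v for v in row] for row in g]
hashes = {1: average_hash(g), 2: average_hash(brighter), 3: average_hash(inverted)}
print(find_similar_pairs(hashes))  # [(1, 2)]
```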
Changelog
- 6/20/16 The post below now updates automatically with status information about the bot.
- 6/20/16 Created this post.
Updated by user 59725