Topic: Fluffle - Reverse image search service

Posted under e621 Tools and Applications

Fluffle

Fluffle is a reverse image search service, like TinEye and SauceNAO, but scoped only to the furry community.
You give Fluffle a picture, and it will attempt to find that picture on e621, Fur Affinity, Weasyl, Furry Network and Twitter.
A few use cases are finding the artist of an image you saved locally or quickly finding the other places an artist might’ve posted a piece.

You can give Fluffle a try at fluffle.xyz if you like!

Feel free to leave a comment here or shoot me a message on Telegram @NoppesTheFolf if you have questions, spotted a bug or whatever.
Feedback in general is also very much appreciated!

Browser extension

To streamline the process of reverse searching an image, Fluffle also has a browser extension for Chrome and Firefox.
The extension is very simple: right-click an image on any website and in the menu that pops up, there will be an option to reverse search the image.
Click on it, and the browser will open a new tab in which the image is reverse searched using Fluffle.

Telegram bot

On top of the website and the browser extension, there is also a bot for Telegram.
In private chats (the conversation between you and the bot) it will simply reverse search any images you send to it.
The bot is most useful in channels and group chats where it can automatically append sources to any images sent.
There are also limited customization options as to how the reverse search results are presented in your chats.
On the website you can read more about the Telegram bot.

API

For fellow developers: there is also an API available which you can use in your own applications to make use of Fluffle’s reverse search capabilities.
Check out the documentation for more information on that.

Since the project is open source I'd also be happy to collaborate with others to create new integrations.
For example, bots for Discord and Reddit would be neat, but I don't have the time to also maintain all of that.

It... works for twitter... I love it already.
Edit: And the extension will speed up so much. Thank you for that.

I just figured out, it doesn't work on artstation, and deviantart. Can we hope to have these websites added in the future? It has the potential to become the best reverse search engine.

dubsthefox said:
I just figured out, it doesn't work on artstation, and deviantart. Can we hope to have these websites added in the future? It has the potential to become the best reverse search engine.

If possible I'll also suggest Inkbunny (altho I'd understand not wanting to interact with that site either) and itaku.ee.

strikerman said:
If possible I'll also suggest Inkbunny (altho I'd understand not wanting to interact with that site either) and itaku.ee.

because of cub content or is there another reason?

This is neat! Probably a fair amount of work to set up, too.

Feature ideas:

1. Search by md5, sha1, or equal. I know that doesn't let you do a "similarity" search, but it will find exact matches. This might not be as useful for searching sites like FA, which recompresses many images, but it would probably work OK for searching other sites. On the positive side, it makes the uploaded data size real small. If you do this, you probably only need to support one or two hash algorithms; md5 seems to be the most popular.

2. It might be interesting to have a per-site image count displayed somewhere on Fluffle, sort of like what e621 does on its front page. Something like "currently searching 1,234,567 images on e621, 7,891,011 images on FA, 121,314 images on Weasyl", etc. This would help show what Fluffle's reach or scope is. It's kind of "tourist information", so it probably only needs to show up interactively, and not on the API. I know that when you do a search interactively, Fluffle displays the total count of images it is searching, but it isn't broken down by site.

One (possible) bug report:

Drag-and-drop isn't working for me, at least for some use cases.

1. Go to an image post here on e621, such as https://e621.net/posts/62687 .
2. Click on the image on e621, drag it to the "drag 'n drop a fluffy critter here" box on Fluffle, and let go.
3. Fluffle gives the red "Did you drop a file which originates from the browser?" warning box. The expected action is that Fluffle would accept the image and start a search.

I see the same thing in step 3 if I do any of the following three things:

(a) Drag only the URL of the e621 post ( https://e621.net/posts/62687 ) to Fluffle.

(b) On an e621 post, right-click the image on the post page and select "Open in new tab". Then drag the image from the new tab to Fluffle.

(c) In that same new tab, drag just the URL of the image ( http://static1.e621proxy.ru/data/sample/b5/90/b590959363fc38fda82a52dc3fc1fed9.jpg ) to Fluffle.

Bringing up an image in my OS file manager, and dragging-and-dropping that to Fluffle, works as expected - Fluffle uploads the image and does a search.

This is all with Firefox 102.4.0esr, 64-bit, desktop, on Linux.

Thank you!

One of the reasons why SauceNAO didn't originally do sites like FA was because they couldn't get permission to crawl those websites.
With Twitter the problem is that, well, it's Twitter: you would have to whitelist individual accounts, and even those accounts could be posting memes and IRL images as well.

Inkbunny is one of the few sites that can already be reverse searched, but it's limited to MD5 hash and not visuals.

dubsthefox said:
because of cub content or is there another reason?

Do people just magically forget that e621 hosts cub content when talking about inkbunny?

mairo said:
Do people just magically forget that e621 hosts cub content when talking about inkbunny?

Meant more in terms of the dev's personal comfort

mairo said:
Do people just magically forget that e621 hosts cub content when talking about inkbunny?

Of course, I am aware of that. I just couldn't imagine another reason why Strikerman would have said "altho I'd understand not wanting to interact with that site either" when talking about Inkbunny. I just wanted to make sure there isn't another scandal in this fandom that I am not aware of.

dubsthefox said:
I just figured out, it doesn't work on artstation, and deviantart. Can we hope to have these websites added in the future? It has the potential to become the best reverse search engine.

strikerman said:
If possible I'll also suggest Inkbunny (altho I'd understand not wanting to interact with that site either) and itaku.ee.

The vision I have for Fluffle is for it to really only be scoped to the furry community. So ArtStation and itaku.ee are very unlikely candidates to be added due to the wider audience they have. DeviantArt might be feasible to some extent, but I'd have to look into it more. I'll keep it in mind!

Regarding Inkbunny, the cub content is indeed the issue. Besides personal opinions there are also legal reasons. To my knowledge, posts on there don't get tagged that well either, meaning I can't reliably filter out NSFW cub content like I do with e621.

furbitron said:
Search by md5, sha1, or equal. I know that doesn't let you do a "similarity" search, but it will find exact matches. This might not be as useful for searching sites like FA, which recompresses many images, but it would probably work OK for searching other sites. On the positive side, it makes the uploaded data size real small. If you do this, you probably only need to support one or two hash algorithms; md5 seems to be the most popular.

Adding those hashes would actually be quite a big undertaking for several technical reasons. Why would you use this over a similarity search if I may ask? I ask because Fluffle can already determine which images are an exact match based on the results of the similarity search. When it comes to efficiency, Fluffle also already resizes and compresses the images in your browser before sending them over to save you data.

furbitron said:
It might be interesting to have a per-site image count displayed somewhere on Fluffle, sort of like what e621 does on its front page. Something like "currently searching 1,234,567 images on e621, 7,891,011 images on FA, 121,314 images on Weasyl", etc. This would help show what Fluffle's reach or scope is. It's kind of "tourist information", so it probably only needs to show up interactively, and not on the API. I know that when you do a search interactively, Fluffle displays the total count of images it is searching, but it isn't broken down by site.

Hey that's a neat idea! It could show those counts when you click or hover over the summed number. If you're looking for tourist information, you should check out Fluffle's status page. It reveals how many images per website have been indexed so far and on a given day. Just know that I created that status page for myself to keep an eye on indexation progress. It's not supposed to be intuitive. One day I do want to create a public dashboard with fancy stats, but one thing at a time.

furbitron said:
Drag-and-drop isn't working for me, at least for some use cases.

Sadly, that's partially expected behavior for Firefox. I ran across it as well, hence the error message. If you try it with Chrome (or presumably any Chromium-based browser), you'll see it works as you expected it to. Good thing Fluffle has a browser extension that you can use as a workaround for Firefox!

It is weird that it gives you the same error message when you drag a URL into it. That is not supposed to happen, so thanks for pointing that out. I'll see if I can fix that!

noppes said:
The vision I have for Fluffle is for it to really only be scoped to the furry community. So ArtStation and itaku.ee are very unlikely candidates to be added

As an uploader, I can say it would save quite some time while uploading by reducing the number of tabs I have to open. It's maybe 5 to 10 seconds each time, but it adds up.

noppes said:
Adding those hashes would actually be quite a big undertaking for several technical reasons.

md5sum ~/porn/* >md5s.txt & and come back tomorrow? :D

"Adding a database column is a PITA" and "The originals got deleted, or backed up to punched tape, and would be a PITA to restore" are the other leading alternatives. :)

Why would you use this over a similarity search if I may ask?

To find an exact match for an image, if one is available. This might be important if you want to feed the Fluffle results into further automation, like renaming or moving files on your local disk, or similar.

I'm not sure why, but the similarity searches I've seen don't seem to report "100%" even when they should. Going back to my previous example, if you start at https://e621.net/posts/62687 , and click "reverse SauceNAO search", you get https://saucenao.com/search.php?url=http://static1.e621proxy.ru/data/sample/b5/90/b590959363fc38fda82a52dc3fc1fed9.jpg . The result that says "Creator: keishinkae" and "Source: www.furaffinity.net" links back to that same post here on e621... but SauceNAO only rates the similarity at 89.76%.

Sharp-eyed readers may notice that SauceNAO is searching for the e621 thumbnail/resize, rather than the original. If you feed SauceNAO the original, at http://static1.e621proxy.ru/data/b5/90/b590959363fc38fda82a52dc3fc1fed9.jpg, you get https://saucenao.com/search.php?url=http://static1.e621proxy.ru/data/b5/90/b590959363fc38fda82a52dc3fc1fed9.jpg , which again finds the e621 post... but with a slightly worse similarity ranking of 89.05%.

Fluffle does better - if I download the original of that post from e621, and then upload it to Fluffle, Fluffle finds the post here and rates it 97.66% similar.

It would be possible for further automation to have a rule like "anything rated 97.0% or above can be assumed to be the same", but I think that would eventually get confused. I'm pretty sure I've seen SauceNAO give ratings in the 90s for two different versions of the same picture - like, pants and no-pants. I don't have an example handy, though.

Some of it is just to have more knobs and buttons. :D Some sites don't support search-by-md5 natively, so if you're creating a third-party search, it'd be a neat feature to add. For some sites, it does mean touching each original image twice - once for the similarity data, once for the md5 - but if you do that as you scrape images, it's probably not completely horrible on CPU or disk. Some sites, like e621, can tell you what they think the md5 is via the API. Other sites provide it indirectly - part of the path on Weasyl appears to be the sha256 of the file involved - but it's not always documented, so it's liable to change on you. Also, knowing the md5 of some images and the sha256 of others might be a little less than helpful.
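
For anyone who wants to try the exact-match idea locally, here is a minimal Python sketch that computes an md5 and a sha256 in a single pass over the data, so each image only has to be read once. The function name stream_digests and the sample bytes are made up for illustration; this is not Fluffle's code or any site's API.

```python
import hashlib
import io


def stream_digests(stream, algorithms=("md5", "sha256"), chunk_size=1 << 20):
    """Return {algorithm: hex digest} for a binary stream, reading it once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    # Read in chunks so a large image never has to fit in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Stand-in for an open image file, e.g. open("picture.jpg", "rb").
sample = io.BytesIO(b"not really an image")
for name, digest in stream_digests(sample).items():
    print(name, digest)
```

Two files only count as an exact match if their digests are identical, so a recompressed copy of the same picture will not match.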

When it comes to efficiency, Fluffle also already resizes and compresses the images in your browser before sending them over to save you data.

I'm on desktop and Ethernet, so I kind of don't care how much data it takes. :) It would be less inbound data to you, in case that's important for your hosting plan.

It could show those counts when you click or hover over the summed number.

Click is a little better for mobile, because mobile can't hover. On desktop it doesn't really matter.

If you're looking for tourist information, you should check out Fluffle's status page.

That's the kind of thing I was thinking of - thanks! I understand that it's not "production".

It will be interesting to see what the (image) occupancy rate on FA ends up being. Your estimate of "roughly 50%" is probably as good as anyone who doesn't have root on FA can make. :) Other sites that publish(ed) a count range from about 60% (Furiffic, RIP), to 67% (SoFurry), to 85% (e621).

[re: Firefox errors] I ran across it as well, hence the error message.

I don't know exactly what it is, but Firefox seems to enforce some kind of same-origin or cross-site protection stuff that Chrome doesn't, by default. I've seen at least one other site say that sometimes it doesn't work with Firefox (RedditMetis - it needs to hit Reddit to get some data, crunches that data, and then shows it to you.)

The error message was a little confusing to me, specifically the "a file which originates from the browser" part. I would understand it if Fluffle couldn't deal with an image from a file:// URL, since that really would "originate from the browser", instead of the net. Since I was trying to use it on images I had loaded in the browser from a web site (e621), I wasn't sure why those counted as originating from the browser.

It is weird that it gives you the same error message when you drag a URL into it.

To be more specific on the steps to reproduce:

1. Open a new tab in Firefox, and in that tab, go to an image on e621, such as https://e621.net/posts/62687 .
2. Either a) open a new window in Firefox (File > New Window) or b) open a new tab in Firefox, and then drag the new tab out of the existing Firefox window, to create a new Firefox window.
3. In the new Firefox window, go to https://fluffle.xyz/ .
4. Back in the first window, switch to the tab with the e621 image.
5. Click in the address bar on that tab until the entire URL is selected.
6. Click on the URL, drag it to the Fluffle drop target in the other Firefox window, and drop it.
7. Fluffle gives the "Did you drop a file...?" error box. Expected: Fluffle should have started a search based on the dropped URL.

As a test, I also tried putting a URL in a plain text file, and then selecting the URL in the text file and dragging it to Fluffle's drop target. Same result - Fluffle gives me the "Did you drop a file...?" error box.

I do also have uBlock Origin installed in Firefox, but it says the only thing it's blocking on Fluffle is a Cloudflare tracker, at static.cloudflareinsights.com/beacon.min.js .

Thanks!

noppes said:
To my knowledge, posts on there don't get tagged that well either, meaning I can't reliably filter out NSFW cub content like I do with e621.

Inkbunny is one of the only websites that mandates tagging, and users can add additional tags to artists' submissions. This is why I fucking love inkbunny: it's so technically advanced and actually has decent rules and guidelines in place. Yet somehow it has the stigma of being "that" website, hence my confusion, because twitter and e621 have so much more cub material than inkbunny does. Even cub people sometimes seem to be abandoning it, as it does not allow humans in pornographic artwork, which also limits the scope for cub artists, while twitter allows this as well. I guess you get around that on twitter by having a whitelist of furry artists and not whitelisting artists that do young-looking artwork.

Considering this fact, I guess it's one new screwdriver in the toolbox which I might remember if old tools don't fit or stop working in future, but as it is right now there are several alternatives now that already work currently.

mairo said:
...alternatives now that already work currently.

For twitter as well? Every one I have asked (two people, hah), said there is no good way to reverse search on twitter.

dubsthefox said:
For twitter as well? Every one I have asked (two people, hah), said there is no good way to reverse search on twitter.

SauceNAO does (Index #41: Twitter), but because they do basically everything, the majority of the sites they cover are for weeb/human stuff, and the index hasn't updated since 2019, it hasn't been that useful. I would imagine they have a similar problem where indexing anything from twitter is a nightmare because it's social media and not an art site (ARTISTS! FUCKING STOP USING TWITTER EXCLUSIVELY!) so the engine will start finding food pics every time you try to reverse search furries because of the amount of food pics.
Google also really often indexes tweets.

But the main problem with twitter will continue to be that it needs to be whitelisted per creator unless you want an infinite amount of data. And usually with reverse search I'm looking for stuff I have no idea where it even might be from, so it wouldn't work with new/unknown artists that obviously wouldn't be on that whitelist.

mairo said:
(ARTISTS! FUCKING STOP USING TWITTER EXCLUSIVELY!)

Given recent events, this may fix itself. It somewhat depends on how fast Twitter turns into /b/ .

[...] so it wouldn't work with new/unknown artists that obviously wouldn't be on that whitelist.

Twitter has tags too, and at least in theory, it would be possible to find "new" artwork or artists to index by running searches on #furryart, #anthroart, etc via the Twitter API. If you were feeling really excited, you could keep a count of how many times a "new" artist (not previously followed by your search-bot) gets found via a tag search, and once that count hits 5 or 10 or 20, then you either have your search-bot follow that artist automatically, or have it give you a report of "artists you should probably add to my follow list".

That would fall down a little bit if there was an artist that was brand new to Twitter and didn't know to tag their art with the "popular" keywords yet. It might also fall down if some of the tags are language-specific - do German furry artists tag their images with #pelzig or #furry on Twitter?
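
The counting idea above could be sketched roughly like this in Python. The class name, the threshold default and the account names are all hypothetical, and the tag search itself (the call to the Twitter API) is left out; the sketch only shows the bookkeeping once a search has returned an author.

```python
from collections import Counter


class ArtistDiscovery:
    """Counts tag-search hits for accounts the search-bot does not follow yet."""

    def __init__(self, followed, threshold=5):
        self.followed = set(followed)
        self.threshold = threshold
        self.hits = Counter()

    def record_hit(self, artist):
        """Record one hit; return True exactly when the artist reaches the threshold."""
        if artist in self.followed:
            return False  # already on the follow list, nothing to discover
        self.hits[artist] += 1
        return self.hits[artist] == self.threshold

    def suggestions(self):
        """Accounts at or above the threshold, most frequently seen first."""
        return [name for name, count in self.hits.most_common()
                if count >= self.threshold]


discovery = ArtistDiscovery(followed={"known_artist"}, threshold=2)
for author in ["new_artist", "known_artist", "new_artist", "one_off"]:
    discovery.record_hit(author)
print(discovery.suggestions())
```

Whether a suggestion gets followed automatically or just lands in a report is then a policy choice on top of suggestions().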

I am not the greatest at understanding C#, but I looked at Fluffle's code a little bit, and I think it's currently operating by following a certain list of users on Twitter.

so the engine will start finding food pics every time you try to reverse search furries because of the amount of food pics.

There is some logic that appears to be trying to decide if a particular image is furry artwork or not, before it gets indexed by Fluffle. Unfortunately I can't tell how it works. :D I can see requests called is-furry-art-v2 and is-furry-artist-v2 getting sent somewhere, but I can't figure out who is replying to those requests, or what criteria are used when replying.

I think the reply might come from the Postgres database that Fluffle uses, but I'm not sure. I feel like there's a Postgres database schema, and maybe some stored procedures, that aren't currently published in the GitHub project, but I'm not sure. Again, I'm not a C# expert, so maybe that stuff is in there somewhere and I'm just not seeing it.

furbitron said:
md5sum ~/porn/* >md5s.txt & and come back tomorrow? :D

"Adding a database column is a PITA" and "The originals got deleted, or backed up to punched tape, and would be a PITA to restore" are the other leading alternatives. :)

In short, Fluffle doesn't download the original images when it doesn't need to for indexing and also does not store any of the downloaded images permanently. So, adding a hash like MD5 would require redownloading every image Fluffle has ever indexed. Sure, it is possible and I might someday because I want to improve the image fingerprinting algorithm behind Fluffle (which requires downloading everything again anyway), but right now that does not have priority.

furbitron said:
To find an exact match for an image, if one is available.

Fair enough, if you really want to make sure it is 100% the exact same image.

furbitron said:
I'm not sure why, but the similarity searches I've seen don't seem to report "100%" even when they should.

Ah, I guess that is a side effect from me developing this mostly inside of my own bubble. Maybe it would be better to not even expose the scores to users.

Fluffle does some magic in the background to categorize all images into four different groups: exact matches, toss-ups, alternatives and improbable matches. The exact matches are the ones the website shows with the green indicator. Fluffle is very confident those are the same image as submitted. The alternative category is pretty self-explanatory: images with changes which make them deviate from the submitted one. Sometimes Fluffle is just not able to accurately determine whether it is an exact match or an alternative because the data it is basing the results on is inconclusive, hence the toss-up category. Both toss-ups and alternatives are shown using an orange indicator. Improbable matches speak for themselves and are shown in red.

How it exactly determines those categories... I couldn't tell you anymore. You will have to look at the source code for that one :P
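
For anyone feeding results into further automation, the four groups and their indicator colors could be modelled like this in Python. The enum, the dictionary and the shape of a result (a dict with a "category" key) are assumptions for illustration only; the sketch does not show how Fluffle assigns a category.

```python
from enum import Enum


class MatchCategory(Enum):
    EXACT = "exact"
    TOSS_UP = "toss-up"
    ALTERNATIVE = "alternative"
    IMPROBABLE = "improbable"


# Indicator colors as described in the post above.
INDICATOR_COLOR = {
    MatchCategory.EXACT: "green",
    MatchCategory.TOSS_UP: "orange",
    MatchCategory.ALTERNATIVE: "orange",
    MatchCategory.IMPROBABLE: "red",
}


def exact_matches(results):
    """Keep only the results labelled as exact matches."""
    return [r for r in results if r["category"] is MatchCategory.EXACT]


# Hypothetical results, only to show the filtering.
results = [
    {"source": "e621", "category": MatchCategory.EXACT},
    {"source": "Weasyl", "category": MatchCategory.ALTERNATIVE},
]
print([r["source"] for r in exact_matches(results)])
```

Filtering on the category rather than on a score cut-off avoids having to pick a magic percentage.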

furbitron said:
I'm on desktop and Ethernet, so I kind of don't care how much data it takes. :) It would be less inbound data to you, in case that's important for your hosting plan.

Since you seem curious, inbound data is not the reason Fluffle does that. It is actually to reduce load on the server's CPU (smaller images are less work to process) as processing power is going to be a bottleneck before bandwidth. A nice side effect is that people on metered and/or slower connections also benefit from it.

furbitron said:
The error message was a little confusing to me.

Regarding the entire bug report, that is indeed the main problem with it. It is clear to me what the problem is and how to reproduce it. Thanks again!

mairo said:
while twitter allows this as well

You are... actually right. I didn't realize Twitter allowed that. That kinda forced my hand to look into the whole legal situation more carefully again and under my country's law there are actually no objections to that sort of "virtually created" (drawn art and such) content. So I guess my whole legal argument is moot now. I don't think search engines, like Fluffle, should be opinionated like that even though it conflicts with my own personal ethics in this particular case. Hence I suppose adding support for Inkbunny isn't something I can object to without also taking a stance against Twitter.

mairo said:
Google also really often indexes tweets.

Ehh, personally I haven't had much luck with that. I just tried it again with some art from Wagnermutt, but it couldn't find it. Of course that is a small sample, but considering their popularity you'd expect it to find it.

furbitron said:
Twitter has tags too, and at least in theory, it would be possible to find "new" artwork or artists to index by running searches on #furryart, #anthroart

Why haven't I thought of this? I literally used this method when working on another project of mine regarding reverse searching fursuits, but instead with hashtags like #FursuitFriday and such lol.

furbitron said:
I am not the greatest at understanding C#, but I looked at Fluffle's code a little bit, and I think it's currently operating by following a certain list of users on Twitter.

You are correct. Fluffle does indeed work based on a whitelist. That whitelist is generated based on data pulled from e621 at the moment. So there is definitely room for improvement there.

The short version is that it gets all Twitter users ever mentioned on e621 and uses some AI magic to check if those users post furry art. All artists get their timeline scraped as far back as Twitter allows and on every tweet that contains images, Fluffle uses more AI magic to determine if those images are furry art. If so, it indexes them. And that is how Fluffle prevents all of those food pictures from showing up!
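
As a rough illustration of that flow, here is a minimal Python sketch. The function build_index and its three callables are hypothetical stand-ins (for the artist check, the timeline scrape and the image check), and the tweet shape is invented; it only shows the order of the steps, not Fluffle's actual code.

```python
def build_index(candidate_users, is_furry_artist, fetch_image_tweets, is_furry_art):
    """Return (whitelist, indexed) following the two-stage filtering described above."""
    # Stage 1: keep only the accounts that appear to post furry art.
    whitelist = [user for user in candidate_users if is_furry_artist(user)]

    # Stage 2: walk each whitelisted timeline and keep only furry-art images.
    indexed = []
    for user in whitelist:
        for tweet in fetch_image_tweets(user):
            for image in tweet["images"]:
                if is_furry_art(image):
                    indexed.append((user, tweet["id"], image))
    return whitelist, indexed


# Toy data standing in for the real classifiers and scraper.
timelines = {
    "artist": [{"id": 1, "images": ["fox_drawing.png", "lunch.jpg"]}],
    "foodie": [{"id": 2, "images": ["pizza.jpg"]}],
}
whitelist, indexed = build_index(
    candidate_users=["artist", "foodie"],
    is_furry_artist=lambda user: user == "artist",
    fetch_image_tweets=lambda user: timelines[user],
    is_furry_art=lambda image: image.endswith("drawing.png"),
)
print(whitelist, indexed)
```

The second check is what keeps an artist's lunch photos out of the index even though the account itself is whitelisted.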

furbitron said:
I can't figure out who is replying to those requests, or what criteria are used when replying.

That code is in a confusing place indeed.

noppes said:
In short, Fluffle doesn't download the original images when it doesn't need to for indexing and also does not store any of the downloaded images permanently.

I keep forgetting that there are now people old enough to create web sites who didn't grow up with dial-up Internet. :D

e621 says they have about 3.6 million posts, and the total file size is 5 TB. I expect that 5 TB is rounded, so it could be anywhere from 4.5 TB to 5.5 TB. Doing the math, that says the average post here is around 1.2 to 1.5 MB. Fluffle currently searches about 27.5 million posts, so that would be roughly 33 TB to 41 TB.

I can go down to the local computer store and buy 16 TB of hard drive for about US$305 with tax, or roughly about €330 with VAT. So, only about €990 to store all of the images Fluffle currently indexes, plus a few TB for expansion. :D (Getting those drives plugged in to a shared server in a rack somewhere is an exercise for the student...)
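
The back-of-the-envelope numbers above can be reproduced in a few lines of Python. All inputs are the figures quoted in this post; decimal units (1 TB = 1,000,000 MB) and the variable names are assumptions.

```python
import math

E621_POSTS = 3.6e6            # posts on e621
E621_TOTAL_TB = (4.5, 5.5)    # "5 TB" taken as a rounded figure
FLUFFLE_POSTS = 27.5e6        # posts Fluffle currently searches
DRIVE_TB = 16                 # capacity of one hard drive
DRIVE_EUR = 330               # price of one drive including VAT
MB_PER_TB = 1e6               # decimal units

# Average post size on e621, low and high estimate.
avg_mb = tuple(tb * MB_PER_TB / E621_POSTS for tb in E621_TOTAL_TB)

# The post rounds those averages to 1.2 and 1.5 MB before scaling up.
rounded_mb = (1.2, 1.5)
fluffle_tb = tuple(FLUFFLE_POSTS * mb / MB_PER_TB for mb in rounded_mb)

# Whole drives needed for the high estimate, and what they cost.
drives_needed = math.ceil(max(fluffle_tb) / DRIVE_TB)
cost_eur = drives_needed * DRIVE_EUR

print("average post size (MB):", [round(mb, 2) for mb in avg_mb])
print("Fluffle storage estimate (TB):", [round(tb, 2) for tb in fluffle_tb])
print("drives:", drives_needed, "cost (EUR):", cost_eur)
```

Three 16 TB drives give 48 TB, which covers the high estimate with a few TB left over for expansion.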

Ah, I guess that is a side effect from me developing this mostly inside of my own bubble. Maybe it would be better to not even expose the scores to users.

I still think showing the scores is a good idea.

I think SauceNAO is the only other online reverse search I have used that shows the scores, so that's what I'm comparing to. I think other sites (TinEye, Google) usually sort the results so that the higher scores are first, but they don't tell you what the scores are.

For comparing images on my local disk, I usually use Geeqie. The similarity search gives you a list of images it thinks are the same, along with a percentage similarity score. It's mostly for Linux, but you could probably build it on WSL if you wanted to.

Its similarity search involves slicing the original image into a 32x32 grid, finding the average R, G, and B value in each cell, and then comparing the grids of averages to each other. I find it works pretty well on color images, and not at all on black-and-white images, especially sketches - it often thinks several wildly different black-and-white images are the same image.
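
The grid-of-averages approach described above can be sketched in plain Python. This is an illustration of the general technique, not Geeqie's actual code: the function names, the pixel format (rows of (r, g, b) tuples) and the scoring formula are assumptions.

```python
def grid_signature(pixels, grid=32):
    """Average (R, G, B) per cell of a grid x grid partition of the image.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    The image must be at least `grid` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    signature = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            signature.append(
                tuple(sum(p[channel] for p in cell) / len(cell) for channel in range(3))
            )
    return signature


def similarity(sig_a, sig_b):
    """1.0 for identical signatures, 0.0 for maximally different ones."""
    total = sum(
        abs(a - b)
        for cell_a, cell_b in zip(sig_a, sig_b)
        for a, b in zip(cell_a, cell_b)
    )
    return 1.0 - total / (255.0 * 3 * len(sig_a))


# Two tiny synthetic 64x64 images: solid black and solid white.
black = [[(0, 0, 0)] * 64 for _ in range(64)]
white = [[(255, 255, 255)] * 64 for _ in range(64)]
print(similarity(grid_signature(black), grid_signature(black)))
print(similarity(grid_signature(black), grid_signature(white)))
```

A plausible reason for the weakness on sketches, offered as a guess: line art on white paper averages out to nearly the same near-white value in every cell, so very different drawings end up with almost identical grids.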

It is actually to reduce load on the server's CPU (smaller images are less work to process) as processing power is going to be a bottleneck before bandwidth.

Since the main part of Fluffle is in C#, I'm guessing that you're using Windows hosting. You can gain back a little bit of CPU, and a lot of outgoing bandwidth, by finding the service that is doing the equivalent of zip -r new.zip C:\Users\Noppes; scp new.zip nsa.gov:/incoming every day, and shutting that off. (Finding that service in the first place, and shutting it off so it stays shut off, is an exercise for the student...)

From your second message:

noppes said:
instead with hashtags like #FursuitFriday and such lol.

There will probably be some manual work to build the initial hashtag list. From what I've seen, artists tend to use different tags on Twitter than they do on (say) FA or e621. Partly it's because Twitter isn't a dedicated furry site, so artists there include lots of tags like #furryart, #anthroart, #furryartwork, etc, to grab eyeballs.

You could also add some code to the existing Twitter scraping to remember the hashtags on posts that Fluffle decides are probably furry art. Let it run for a week, then sort that list, and add the most frequently seen tags to your tag search.
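
A minimal Python sketch of that hashtag tally follows. The tweet shape (a dict with a "hashtags" list) and the is_furry_art callable are hypothetical stand-ins for whatever the scraper and the classifier actually provide.

```python
from collections import Counter


def top_hashtags(tweets, is_furry_art, limit=20):
    """Return the most frequent hashtags on tweets judged to be furry art."""
    counts = Counter()
    for tweet in tweets:
        if is_furry_art(tweet):
            # Lower-case and de-duplicate so each tweet counts a tag only once.
            counts.update({tag.lower() for tag in tweet["hashtags"]})
    return counts.most_common(limit)


# Toy data: the "furry" flag stands in for the classifier's verdict.
tweets = [
    {"hashtags": ["FurryArt", "anthroart"], "furry": True},
    {"hashtags": ["furryart"], "furry": True},
    {"hashtags": ["foodporn"], "furry": False},
]
print(top_hashtags(tweets, is_furry_art=lambda tweet: tweet["furry"], limit=1))
```

After a week of scraping, the head of that list would be the candidate tags to add to the tag search.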

All artists get their timeline scraped as far back as Twitter allows

It might be an idea to keep a count of how many artists Fluffle has scraped, and display that count to Fluffle users. It might also be an idea to keep either a per-artist or global record of the oldest tweet Fluffle has scraped, and display that date to Fluffle users.

Both of those things will help users know that Fluffle doesn't search all of Twitter since forever. If the user is looking for some artwork that (for example) they think they saw in 2018, and Fluffle has only scraped back to 2020, then the user can tell the difference between "it's not on Twitter at all, so I can stop looking" and "it might still be on Twitter, but I'll have to search some other way".

That code is in a confusing place indeed.

So, the AI models are done in Python, and there's a Python script running under Docker, which main Fluffle posts requests to and gets replies from?

Why is there logic about the Simpsons in there? :D

I also found the results of a test run, which explains a little better how Fluffle makes decisions.

Thanks!

noppes said:
The vision I have for Fluffle is for it to really only be scoped to the furry community. So ArtStation and itaku.ee are very unlikely candidates to be added due to the wider audience they have. DeviantArt might be feasible to some extent, but I'd have to look into it more, I'll keep it in mind!

Regarding Inkbunny, the cub content is indeed the issue. Besides personal opinions there are also legal reasons. To my knowledge, posts on there don't get tagged that well either, meaning I can't reliably filter out NSFW cub content like I do with e621.

I remember itaku used to be furry-centric, and I remember its ads on fa.net. I've had it bookmarked since 2020.
If itaku will never be indexed, then DeviantArt shouldn't be indexed either, on principle, if the scope is furry-only sites.

furbitron said:
I keep forgetting that there are now people old enough to create web sites who didn't grow up with dial-up Internet. :D

e621 says they have about 3.6 million posts, and the total file size is 5 TB. I expect that 5 TB is rounded, so it could be anywhere from 4.5 TB to 5.5 TB. Doing the math, that says the average post here is around 1.2 to 1.5 MB. Fluffle currently searches about 27.5 million posts, so that would be roughly 33 TB to 41 TB.

I can go down to the local computer store and buy 16 TB of hard drive for about US$305 with tax, or roughly about €330 with VAT. So, only about €990 to store all of the images Fluffle currently indexes, plus a few TB for expansion. :D (Getting those drives plugged in to a shared server in a rack somewhere is an exercise for the student...)

I still think showing the scores is a good idea.

I think SauceNAO is the only other online reverse search I have used that shows the scores, so that's what I'm comparing to. I think other sites (TinEye, Google) usually sort the results so that the higher scores are first, but they don't tell you what the scores are.

For comparing images on my local disk, I usually use Geeqie. The similarity search gives you a list of images it thinks are the same, along with a percentage similarity score. It's mostly for Linux, but you could probably build it on WSL if you wanted to.

Its similarity search involves slicing the original image into a 32x32 grid, finding the average R, G, and B value in each cell, and then comparing the grids of averages to each other. I find it works pretty well on color images, and not at all on black-and-white images, especially sketches - it often thinks several wildly different black-and-white images are the same image.

Since the main part of Fluffle is in C#, I'm guessing that you're using Windows hosting. You can gain back a little bit of CPU, and a lot of outgoing bandwidth, by finding the service that is doing the equivalent of zip -r new.zip C:\Users\Noppes; scp new.zip nsa.gov:/incoming every day, and shutting that off. (Finding that service in the first place, and shutting it off so it stays shut off, is an exercise for the student...)

From your second message:

There will probably be some manual work to build the initial hashtag list. From what I've seen, artists tend to use different tags on Twitter than they do on (say) FA or e621. Partly it's because Twitter isn't a dedicated furry site, so artists there include lots of tags like #furryart, #anthroart, #furryartwork, etc, to grab eyeballs.

You could also add some code to the existing Twitter scraping to remember the hashtags on posts that Fluffle decides are probably furry art. Let it run for a week, then sort that list, and add the most frequently seen tags to your tag search.
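A minimal sketch of that tallying step in Python, assuming the scraper can hand over the text of each tweet it already judged to be furry art. The function names and the sample tweets are made up for illustration and have nothing to do with Fluffle's real code.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

HASHTAG = re.compile(r"#(\w+)")


def tally_hashtags(tweet_texts: Iterable[str]) -> Counter:
    """Count how many tweets each hashtag appears in, ignoring case."""
    counts: Counter = Counter()
    for text in tweet_texts:
        # Use a set so a tag repeated within one tweet is counted only once.
        counts.update({tag.lower() for tag in HASHTAG.findall(text)})
    return counts


def top_hashtags(counts: Counter, n: int) -> List[Tuple[str, int]]:
    """Return the n most frequently seen hashtags with their counts."""
    return counts.most_common(n)


# Hypothetical tweets that the classifier decided were probably furry art.
accepted_tweets = [
    "New piece! #FurryArt #anthroart",
    "#furryart commission for a friend",
    "wip #sketch #furryart #furryart",
]
counts = tally_hashtags(accepted_tweets)
best = top_hashtags(counts, 1)
```

After a week of collecting, the top of that list would be the candidate set for the tag search.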

It might be an idea to keep a count of how many artists Fluffle has scraped, and display that count to Fluffle users. It might also be an idea to keep either a per-artist or global record of the oldest tweet Fluffle has scraped, and display that date to Fluffle users.

Both of those things will help users know that Fluffle doesn't search all of Twitter since forever. If the user is looking for some artwork that (for example) they think they saw in 2018, and Fluffle has only scraped back to 2020, then the user can tell the difference between "it's not on Twitter at all, so I can stop looking" and "it might still be on Twitter, but I'll have to search some other way".

So, the AI models are done in Python, and there's a Python script running under Docker, which main Fluffle posts requests to and gets replies from?

Why is there logic about the Simpsons in there? :D

I also found the results of a test run, which explains a little better how Fluffle makes decisions.

Thanks!

The site can download the thumbnails; they're not that big. Most sites use thumbnails that are kilobytes or bytes in size, and e6 is doing it too.

Apologies for the late response!

furbitron said:
I can go down to the local computer store and buy 16 TB of hard drive for about US$305 with tax, or roughly about €330 with VAT. So, only about €990 to store all of the images Fluffle currently indexes, plus a few TB for expansion. :D (Getting those drives plugged in to a shared server in a rack somewhere is an exercise for the student...)

Ah yes, only about 1000 EUR currently, not taking into account further growth or my desire for data redundancy. Not really something I’m willing to spend that much money on, haha. However, I’d happily send you over all the images Fluffle indexes if you want to have that as a “little” hobby project yourself ;)

furbitron said:
Its similarity search involves slicing the original image into a 32x32 grid, finding the average R, G, and B value in each cell, and then comparing the grids of averages to each other. I find it works pretty well on color images, and not at all on black-and-white images, especially sketches - it often thinks several wildly different black-and-white images are the same image.

What you’re describing here is an algorithm more generally known as aHash. If you’re curious, you can find a reference to aHash and other algorithms on the GitHub page of a neat Python library called imagehash.

I considered using aHash for Fluffle too, but as you've already experienced, it has quite some drawbacks.
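For reference, this is roughly what aHash looks like in its usual form, written as a small Python sketch. It assumes the image has already been converted to grayscale and shrunk to a tiny grid (commonly 8x8); the function names are mine, and this is not code from Fluffle or from the imagehash library.

```python
from typing import List


def average_hash(gray: List[List[int]]) -> int:
    """aHash: one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a smaller distance means more similar images."""
    return bin(a ^ b).count("1")


# Tiny 2x2 stand-ins for an already downscaled grayscale image.
original = [[0, 255], [255, 0]]
dimmed = [[10, 245], [245, 10]]      # same pattern with lower contrast
inverted = [[255, 0], [0, 255]]

same = hamming_distance(average_hash(original), average_hash(dimmed))
different = hamming_distance(average_hash(original), average_hash(inverted))
```

Because each bit only records "brighter or darker than average", the hash survives contrast changes but throws away most of the detail.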

furbitron said:
Since the main part of Fluffle is in C#, I'm guessing that you're using Windows hosting. You can gain back a little bit of CPU, and a lot of outgoing bandwidth, by finding the service that is doing the equivalent of zip -r new.zip C:\Users\Noppes; scp new.zip nsa.gov:/incoming every day, and shutting that off. (Finding that service in the first place, and shutting it off so it stays shut off, is an exercise for the student...)

Nuh-uh, I don’t think I would’ve continued using C# if they hadn’t added cross-platform support to .NET. Fluffle runs on Debian 11 inside of Docker containers. Not a fan of the licensing costs or any of the technologies that are only available on Windows (Server).

But thanks for the suggestion regardless!

furbitron said:
You could also add some code to the existing Twitter scraping to remember the hashtags on posts that Fluffle decides are probably furry art. Let it run for a week, then sort that list, and add the most frequently seen tags to your tag search.

Indeed, that is the approach I’m taking with adding support for DeviantArt as well, which I’m currently working on.

furbitron said:
So, the AI models are done in Python, and there's a Python script running under Docker, which main Fluffle posts requests to and gets replies from?

The Twitter scrape application is responsible for deciding which tweets get submitted to Fluffle (main). So that’s also the application using the AI models. You're correct about the rest.

furbitron said:
Why is there logic about the Simpsons in there? :D

It’s referring to this formula. It’s used to measure how diverse a Twitter account’s posts are when it comes to who drew the art posted on said account. Accounts that repost art from a variety of artists (bots and such) can be filtered out that way.
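The link to the formula did not survive here, so as an assumption: given the name, it is presumably Simpson's diversity index. A minimal Python sketch of the Gini-Simpson form (1 minus the sum of squared proportions) follows. How the artist behind each post is identified is not covered, and the labels and function name are hypothetical.

```python
from collections import Counter
from typing import Iterable


def simpson_diversity(artists: Iterable[str]) -> float:
    """Gini-Simpson index: 0.0 when one artist drew everything, near 1.0 when
    the posts are spread over many different artists."""
    counts = Counter(artists)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one post")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical per-post artist labels for two accounts.
artist_account = ["fox"] * 5                      # posts only their own art
repost_account = ["fox", "wolf", "cat", "deer"]   # reposts from many artists

low = simpson_diversity(artist_account)
high = simpson_diversity(repost_account)
```

An account scoring above some chosen threshold would be treated as a repost account and skipped; the threshold itself would have to be tuned.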

But yeah, the whole Twitter ingest client is a bit of a mess. I’d want to do it over using a new approach I’m going to use for DeviantArt as well. But right now I’m not so sure what the future of Twitter is and the integration is working fine. So I want to wait it out and give more important matters attention first.

wolfmanfur said:
I remember itaku used to be furry-centric and I remember its ads on fa.net. I've had it bookmarked since 2020.
If itaku will never be indexed, deviantart shouldn't be indexed on principle if our scope is furry-only sites.

Uhh, yeah, fair point actually. It comes down more to whether I can filter out the non-furry stuff somewhat reliably and how much furry art is on the website in the first place. DeviantArt has quite a lot of furry art, and some of that art Fluffle cannot find anywhere else at the moment, making it a good candidate to index. I also don’t have that much free time, so I have to be a bit strategic about what I spend my time on.

noppes said:
Ah yes, only about 1000 EUR currently, not taking into account further growth or my desire for data redundancy.

RAID-1 can be had for the low, low price of €1,980, including VAT. :D

Also, since you originally got the images online, can't you just store one copy locally, and download the images again if one of your hard drives crashes? <_< (BRB, hiding from angry T-mblr, P-rnh-b, and possibly Tw-tt-r users...)

Nuh-uh, I don’t think I would’ve continued using C# if they hadn’t added cross-platform support to .NET.

Huh, I didn't know that existed. Embrace, extend, extinguish, I guess. The last time I tried to do anything like that was when I tried to run some .NET applications under Mono on Linux a few years ago. It reminded me of Wine 10+ years ago - things like Notepad worked, but things like Excel were hopeless.

Indeed, that is the approach I’m taking with adding support for DeviantArt as well, which I’m currently working on.

You may have discovered this already, but if you're using dA's "/_napi/" API, you now need to do CSRF. Lack of CSRF on dA broke gallery-dl and Idem's Sourcing Suite a few weeks ago; they have both since been patched.

It’s referring to this formula.

TIL. I thought it had something to do with Matt Groening. :)

But right now I’m not so sure what the future of Twitter is and the integration is working fine.

I agree that "wait and see" is a good idea for Twitter right now.

One thing you might consider doing: when you're scraping Twitter, grab the bio/profile text for each account that you scrape an image from, and stick that in a file somewhere, for later use. I feel that if there is a migration away from Twitter, people on Twitter will probably put their new accounts in their profile (unless links like that get banned, or anger the algorithm). If Twitter does fall over completely, or becomes useless for hosting furry art, you would then have a list of "where the artists on Twitter said they were going", which might be useful to people that are trying to follow those artists.
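As a rough illustration of the "for later use" part, here is a small Python sketch that pulls links out of saved bio text. The account names, bios and function name are invented for the example, and a real version would also want to handle handles that are not written as full URLs.

```python
import re
from typing import Dict, List

URL = re.compile(r"https?://\S+")


def links_in_bio(bio: str) -> List[str]:
    """Return the URLs found in a profile text, minus trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL.findall(bio)]


# Hypothetical bios saved while scraping, keyed by account name.
saved_bios: Dict[str, str] = {
    "example_fox": "Artist. Now on https://example.social/@fox, commissions: http://example.com/c.",
    "example_wolf": "Just here for the art.",
}

forwarding = {account: links_in_bio(bio) for account, bio in saved_bios.items()}
```

Storing the raw bio text next to the extracted links keeps the option open to re-parse it later with better rules.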

Edit: Fluffle is working fine; it was a Firefox configuration problem on my end. See below for more.

Is Fluffle still working? I tried to do a couple of searches just now, based on uploading an image from my hard drive. The "results" page always thinks my image is just a series of black, green, or pink vertical bars, rather than the actual image.

The original image I tried it with was a JPEG. I thought maybe it had an odd format that Fluffle didn't understand, so I tried converting that JPEG to a PNG three times on my PC, using ImageMagick, GraphicsMagick, and GIMP. Fluffle couldn't handle any of the PNGs, though - it always had the vertical-bar images.

kora_viridian said:
The "results" page always thinks my image is just a series of black, green, or pink vertical bars, rather than the actual image.

Do you have privacy.resistFingerprinting set to true, or an add-on installed that tries to prevent canvas fingerprinting? Firefox and derivatives can be configured to return garbage if an attempt is made to read canvas data, and it looks exactly like you described. If this happens, you should see something in the console like "Blocked https://fluffle.xyz/ from extracting canvas data because no user input was detected.", and an icon of a picture in the address bar you can click on to grant permission.

arbg9lemqb said:
Do you have privacy.resistFingerprinting set to true, or an add-on installed that tries to prevent canvas fingerprinting?

This was it exactly!

I found out about privacy.resistFingerprinting maybe a week ago, and turned it on. I noticed that the typefaces on some sites now look a little different, and I've seen a couple of new pop-ups from the address bar, like "Do you want this site to access your HTML5 canvas?". Discord-in-browser, in particular, makes Firefox ask me that. (They are similar to the "Do you want this site to know your location?" pop-ups I already get from Firefox.)

If this happens, you should see something in the console like "Blocked https://fluffle.xyz/ from extracting canvas data because no user input was detected.", and an icon of a picture in the address bar you can click on to grant permission.

I see exactly that warning message in the console, yes. When I first saw the problem, I looked at the console and didn't see any outright errors, so I didn't look very hard. (I have a fairly aggressive UBO and hosts configuration, so I'm familiar with a site breaking because I wouldn't let it load five megs of Javascript from some random server; that usually shows up as an error in the console.)

I clicked the picture in my address bar, gave Fluffle permission to see my canvas, re-uploaded the image I was trying to match, and Fluffle worked just fine.

Thanks!

Sometimes I use Fluffle to search for an artwork and the result is "Looks like we couldn't find what you're looking for".

Later (sometimes the same day) I find an official source for that image.
How about an option on the Fluffle website to tell Fluffle the URL(s) where the artwork can be found?
(For people searching for that art in the future.)

Thanks.

noppes said:
... If you're looking for tourist information, you should check out Fluffle's status page. It reveals how many images per website have been indexed so far and on a given day. Just know that I created that status page for myself to keep an eye on indexation progress. It's not supposed to be intuitive. One day I do want to create a public dashboard with fancy stats, but one thing at a time.
...

Thanks very much for that webpage. (It's comforting to know which websites are being indexed.)

On that Fluffle status webpage, I noticed the line chart for Twitter starts taking a nosedive on April 1 and seems to be at zero now (it doesn't seem to have affected the tweets currently indexed).

kora_viridian said:
This was it exactly!

I found out about privacy.resistFingerprinting maybe a week ago, and turned it on. I noticed that the typefaces on some sites now look a little different, and I've seen a couple of new pop-ups from the address bar, like "Do you want this site to access your HTML5 canvas?". Discord-in-browser, in particular, makes Firefox ask me that. (They are similar to the "Do you want this site to know your location?" pop-ups I already get from Firefox.)

I see exactly that warning message in the console, yes. When I first saw the problem, I looked at the console and didn't see any outright errors, so I didn't look very hard. (I have a fairly aggressive UBO and hosts configuration, so I'm familiar with a site breaking because I wouldn't let it load five megs of Javascript from some random server; that usually shows up as an error in the console.)

I clicked the picture in my address bar, gave Fluffle permission to see my canvas, re-uploaded the image I was trying to match, and Fluffle worked just fine.

Thanks!

Nice to see you two were able to solve that problem! Doubt I would’ve been able to figure that one out myself to be honest.

listerthesquirrel said:
Sometimes I use Fluffle to search for an artwork and the result is "Looks like we couldn't find what you're looking for".

Later (sometimes the same day) I find an official source for that image.
How about an option on the Fluffle website to tell Fluffle the URL(s) where the artwork can be found?
(For people searching for that art in the future.)

Thanks.

Perhaps someday! That could be useful if Fluffle’s user base was a little larger, but I wouldn’t have the time to implement it currently anyhow ^^'

listerthesquirrel said:
Thanks very much for that webpage. (It's comforting to know which websites are being indexed.)

On that Fluffle status webpage, I noticed the line chart for Twitter starts taking a nosedive on April 1 and seems to be at zero now (it doesn't seem to have affected the tweets currently indexed).

Fluffle has stopped indexing Twitter because Twitter revoked everyone’s access to their API. You’ve probably heard about those changes if you’re an active Twitter user. They now want you to pay for the API if you want to read tweets, but even their basic access tier costs 100 USD a month and only allows you to read 10,000 tweets a month, which is nothing for a project like Fluffle.

Honestly it’s all quite stupid, because I would’ve just paid for the API if it cost something like 10 USD a month and let me keep my access, but instead they made it unaffordable for most hobbyists.

Anyhow, I am currently trying to get Fluffle to work again with Twitter, but it will basically be a rewrite of all the code. Sadly that is going to take a long time because Twitter was one of Fluffle’s most complex integrations.

noppes said:
Doubt I would’ve been able to figure that one out myself to be honest.

In case it comes up in the future, I just now took a couple of screenshots of what it looks like, in desktop Firefox: https://imgur.com/a/oFs63p4

For the first two, I downloaded the post you use as your avatar here on e621, and then uploaded it to Fluffle. I found that Fluffle pretty consistently reads Firefox's bogus canvas data as a set of thin vertical bars, but the apparent overall color varies - in the first one, it looks gray or dull pink, and in the second one, it looks green.

Also note that Fluffle duly tries to make sense out of the bogus data, and comes up with images it's scanned that have vertical stripes of a solid color. (I clicked the "show unlikely" button for both screenshots, so you could see all the images Fluffle found.)

The third screenshot shows what it looks like on Google Maps. All of those stripey rectangles on the yellow roads should be an Interstate or US highway number, which Google normally draws as a close approximation of the real-life signs. Note that some of the rectangles have diagonal or horizontal stripes, as opposed to the vertical ones Firefox feeds to Fluffle. I don't know how Firefox picks a design to use.

I thought the map example might be helpful in case you ever need to demonstrate to someone that it's probably their browser and not your site; ask them to pull up Google Maps and see if they get the stripey rectangles there as well. I checked several countries and most of them had at least a few roads where the route marker rendered that way.

(The exception is some Western European countries that use route names like "E1", "A2", "B3", in a red, blue, yellow, or green rectangle. Google seems to use regular CSS and fonts to draw those, instead of a canvas. Germany shows the "stripey rectangles" problem on the roads, though.)

I don't live in the area of the screenshot. :D

noppes said:
Fluffle has stopped indexing Twitter because Twitter revoked everyone’s access to their API.

You might look into Nitter, if you haven't already. They seem to be using a Twitter API that isn't paywalled yet, to present an alternative view and search for Twitter. It seems to still work after Twitter started charging for their regular API. Example at https://nitter.1d4.us/NoppesTheFolf ; list of public instances at https://github.com/xnaas/nitter-instances .

Maybe you can run a Nitter instance on your own, possibly privately, and just point Fluffle's Twitter code at that Nitter instance. Of course, if Twitter paywalls that API as well, it will stop working.

I am not affiliated with Nitter.

kora_viridian said:
You might look into Nitter, if you haven't already. They seem to be using a Twitter API that isn't paywalled yet, to present an alternative view and search for Twitter. It seems to still work after Twitter started charging for their regular API. Example at https://nitter.1d4.us/NoppesTheFolf ; list of public instances at https://github.com/xnaas/nitter-instances .

Maybe you can run a Nitter instance on your own, possibly privately, and just point Fluffle's Twitter code at that Nitter instance. Of course, if Twitter paywalls that API as well, it will stop working.

I am not affiliated with Nitter.

https://github.com/zedeus/nitter/issues/783 Relevant post on the dev page. It seems that the way forward is wrappers that sit between your code and whatever mess the site changed to. Sigh, back to the bad old days of scraping and replacing official JS/CSS.

Pixiv Omina has an interesting design for dealing with URL format changes. If the site changes, it just has a single module that does nothing but deliver URLs. The code is actually kind of neat. Compared to Pixiv Toolkit, it's kind of basic, though.

hey Noppes,
belated* thanks for Fluffle. When I use Fluffle, not only does it find sources for the artwork I've asked it about, but it will usually include artworks by artists I wasn't previously aware of (which I will sometimes fav at FurAffinity ... time permitting).

* = perhaps I should have thought of this during Thanksgiving season ... oh well.
