Topic: I made a thing to suggest posts you may like

Posted under e621 Tools and Applications

This topic has been locked.

Furrin_Gok said:
Thirteen images got deleted before the favorite could be unregistered, and so it became a negative once it did.

Before I started the 1000+ image test, the count was at 4 even though my favourites were empty at the time. So I've somehow gone from empty at 4 to empty at -13.

But we're leaning a bit off-topic.

Updated by anonymous

I just want you to know, zoranu, this is amazing! The hit rate for me is about 50% which is far better than what I alone can do browsing the site most of the time. Thanks for making this!

Updated by anonymous

trying this thing again just now and it looks like it's been improved a lot. 382 favorites currently and it only took about 3-8 seconds for the results to come up (vastly faster than it used to be) and the thumbnail images are now sized like they are here so they no longer look squished or stretched.

Updated by anonymous

I've made some more changes to how weights are attached to both individual tags and entire posts, so let me know if this is better or worse than the old way. Also, the color scheme has been updated to look a little more similar to how things look here.

Updated by anonymous

Genjar

Former Staff

This new algorithm seems good, and many of the bad matches seem to be caused by mistags and undertagged things.

I wouldn't favorite most of the matches (largely because of art quality), but the content matching is working pretty well.

Too many generic anthro felines and canines for my tastes, but that's probably because I have some 'canine' Pokemon and such among my favorites. Can't blame the algorithm for thinking that I like canines and felines in general.

Updated by anonymous

Genjar said:
(largely because of art quality)

Too bad poor quality is something you can't blacklist...

Genjar said:
Too many generic anthro felines and canines for my tastes, but that's probably because I have some 'canine' Pokemon and such among my favorites. Can't blame the algorithm for thinking that I like canines and felines in general.

Tag numbers could also have something to do with that. The 305523 canines have a much higher chance of showing up than, say, the 4558 amphibians.

Updated by anonymous

Seems to be timing out for me. No results on Firefox or Chrome after 10+ minutes, just spinning circle. I tried without my blacklist and with ESET's protection disabled from the system tray to no avail.

It would be interesting to see the results of the algorithm's analysis in human-readable terms. I suppose I'm referring to the algorithm's weighting of our favorites? For instance, my favorites might imply a 7.## weighting for the young tag and a 5.## weighting (out of 10.00) for pokemon. Also, general diagnostic details would be useful in cases like mine, such as favorites fetch time, analysis time, results generation time, and then displaying that information immediately to monitor the search's progress.

Edit: Still no results after keeping the search open for 3+ hours. I tried searching for suggestions for other users with fewer favorites (<1000), and I got their results in a timely manner.

Edit2: I tried someone else with more favorites than me, got their result in a couple minutes, tried myself again, and also got a result after a few minutes.

Edit3: Searched again with my blacklist, which took 2-3 minutes. I started adding my impressions and thoughts to this edit, but it's enough for its own post.

Updated by anonymous

abadbird said:
Seems to be timing out for me. No results on Firefox or Chrome after 10+ minutes, just spinning circle. I tried without my blacklist and with ESET's protection disabled from the system tray to no avail.

There was an issue causing a timeout when loading large favorite lists. That has been corrected.

abadbird said:
It would be interesting to see the results of the algorithm's analysis in human-readable terms.

I'll try to add some info in the next version, if I can come up with a way to format it in a way that makes some sense.

abadbird said:
Also, general diagnostic details would be useful in cases like mine, such as favorites fetch time, analysis time, results generation time, and then displaying that information immediately to monitor the search's progress.

I'll try to work this in too.

Updated by anonymous

zoranu said:
There was an issue causing a timeout when loading large favorite lists. That has been corrected.

That's great. The sudden change from non-working to working a few hours later with no changes in approach on my part suggested some back-end activity.

Here's my feedback + talking out my ass as usual.

General:

  • oldest result that I noticed was posted 6 months ago (Dec. 12)
  • results seem more relevant than the other times I've tried this
    • after checking all 300 results, I have 15 tabs still open of posts I would probably fav whereas the earlier incarnations of this project netted at most 3 lucky posts that were borderline to taste
    • there's another 5 tabs for artists I don't think I've seen whose galleries seem deserving of a close look

Some filtering:

  • skip posts that are flagged for deletion
    • if possible and where applicable, switch the result to the original/higher quality post
  • to reduce redundant results, skip parent/child if one is already selected as a result, their tags are highly similar (90%?), and they were uploaded close together
    • might want to give preference to child posts (excluding edits) since those tend to be the more finished or final versions
    • can also evaluate parent-child post scores, fav counts, and fav'd users to identify redundant results and the "better" version
  • for posts in pools, only return the post with the best combination of score and weighted tags
    • I really didn't need to see all or most of these
  • here's a thought: ignore posts that the user has voted on but not fav'd to filter out posts they've already seen
    • if possible, might as well filter out posts that a user has commented on or uploaded too, but perhaps not posts that the user has tagged (?)
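
For illustration only, the parent/child filter suggested above could be sketched like this. Nothing here comes from the actual analyzer: the `Post` fields, the 0.9 threshold (the "90%?" guess), and the seven-day window standing in for "uploaded close together" are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Post:
    # Hypothetical record; field names are illustrative only.
    id: int
    tags: frozenset
    created_at: datetime
    parent_id: Optional[int] = None


def tag_similarity(a: Post, b: Post) -> float:
    """Jaccard similarity: shared tags divided by all distinct tags."""
    union = a.tags | b.tags
    return len(a.tags & b.tags) / len(union) if union else 0.0


def is_redundant(candidate, kept, threshold=0.9, window=timedelta(days=7)):
    """True if candidate is a direct parent/child of kept, uploaded close
    together, and tagged almost identically."""
    related = candidate.parent_id == kept.id or kept.parent_id == candidate.id
    close = abs(candidate.created_at - kept.created_at) <= window
    return related and close and tag_similarity(candidate, kept) >= threshold


def drop_redundant(results):
    """Keep the first post of each redundant parent/child pair."""
    kept = []
    for post in results:
        if not any(is_redundant(post, k) for k in kept):
            kept.append(post)
    return kept
```

This keeps whichever post of a pair ranks first in the results. Preferring the child, or the version with the better score or fav count, would need an extra comparison step.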

Use score as a stronger indicator (and maybe fav counts):

  • check the range and "density" of the scores (and fav counts?) of the analyzed favorites to establish a "score floor" below which posts are negatively weighted
    • in my case, I have 3844 favs yet only 376 (+6 deleted by takedowns...) have <20 score (arbitrarily chosen), so I shouldn't have gotten as many results with low scores as I had due to my apparent materialism or conformity
      • posts outside the user's typical range can be analyzed separately for tags deserving high weighting because they were fav'd in spite of their aberrance
        • just look at my aforementioned <20 score favs for some glaring similarities and shout "pervert" like a cliche anime girl
        • moreover, the fav analyzer identifying such an outlier group presents the opportunity for a secondary search with weightings strongly mirroring that group
      • caveat: lower score (5-20 after a few hours) is less of an indicator on new posts (probably normalizes after a week)
      • caveat: scores on older posts should be adjusted to align them with e621's current voting userbase => from eyeballing this search per year increment, probably something like a 10% increase after [3 years], a 50% increase after [6 years], a 100% increase after [10 years] as coordinates plotted on a graph
      • caveat: I expect safe posts get upvoted less than questionable and explicit posts, so they should be handled separately
        • perhaps the ratings of results should be roughly proportional to that of the user's favs
  • ultimate => measure the affinity of user favs against other users as a predictor
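
As a rough sketch of the age caveat above, the eyeballed coordinates (10% after 3 years, 50% after 6, 100% after 10) can be joined with straight lines. The starting point of 0% at age 0 and the cap at 100% for anything older than 10 years are assumptions added to make the curve complete, and the function names are hypothetical.

```python
from bisect import bisect_right

# (post age in years, score multiplier).
# The (0, 1.00) point and the cap beyond 10 years are assumptions.
CURVE = [(0.0, 1.00), (3.0, 1.10), (6.0, 1.50), (10.0, 2.00)]


def age_multiplier(age_years: float) -> float:
    """Linearly interpolate the multiplier between the plotted points."""
    if age_years <= CURVE[0][0]:
        return CURVE[0][1]
    if age_years >= CURVE[-1][0]:
        return CURVE[-1][1]
    xs = [x for x, _ in CURVE]
    i = bisect_right(xs, age_years)
    (x0, y0), (x1, y1) = CURVE[i - 1], CURVE[i]
    return y0 + (y1 - y0) * (age_years - x0) / (x1 - x0)


def adjusted_score(score: float, age_years: float) -> float:
    """Scale an old post's score toward the current voting userbase."""
    return score * age_multiplier(age_years)
```

A post aged 4.5 years sits halfway between the 3-year and 6-year points, so its score would be scaled by about 1.3.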

Updated by anonymous

abadbird said:

  • here's a thought: ignore posts that the user has voted on but not fav'd to filter out posts they've already seen

That one requires your login info to work. Only someone logged into your account can see your votes and perform searches on them.

abadbird said:

  • check the range and "density" of the scores (and fav counts?) of the analyzed favorites to establish a "score floor" below which posts are negatively weighted
    • other stuff about scores...

The score:# metatag works. Try playing around with that.

Updated by anonymous

BlueDingo said:

  • here's a thought: ignore posts that the user has voted on but not fav'd to filter out posts they've already seen

That one requires your login info to work. Only someone logged into your account can see your votes and perform searches on them.

I did quickly check votedup:[notmyusername] when I was typing that post and saw many more pages than my own voted-on posts could fill, so I assumed my search worked. What the hell. Tried other usernames and only got posts I voted on, showing 2 pages. A bug.

  • check the range and "density" of the scores (and fav counts?) of the analyzed favorites to establish a "score floor" below which posts are negatively weighted
    • other stuff about scores...

The score:# metatag works. Try playing around with that.

I was using score:# to check some stuff while composing that section. I was talking about the analyzer using the user's favs to establish a baseline of scores that it could use to find better results, because a fair chunk of the analyzer's results was scored well below the bottom 10% of my favs (i.e., 19 and under). Say what you will, the score distribution of my favs clearly demonstrates that I'm unlikely to fav something scored that low, so I should have seen much less of such posts from the analyzer than I had. The rest of that group of points was observations and conditions I would factor into a score-weighting algorithm.

The score distribution of my favs looks like:
-1000..10 = 86
10..20 = 358
20..30 = 480
30..40 = 535
40..50 = 500
50..60 = 411
60..70 = 329
70..80 = 280
80..90 = 230
90..100 = 177
100..110 = 162
110..120 = 133
120..130 = 104
130..140 = 81
...and so on.

I know that kind of data can be used to determine a lower threshold, in my case, where posts scored beneath it are outliers, and the similarities of those outliers inform some facet(s) of the user's taste. I can conceptualize some ideas for sussing out meaningful information from such data, but of course I don't have the math or programming knowledge needed to fit that into the fav analyzer's algorithm.
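
For illustration, one simple way to turn a list of favorite scores into a "score floor" is a plain percentile cut. This is only a sketch under assumptions: the 10% fraction mirrors the "bottom 10%" mentioned above, the function names are hypothetical, and it says nothing about how the analyzer actually weights scores.

```python
def score_floor(fav_scores, fraction=0.10):
    """Return the score sitting at the given lower fraction of the favorites."""
    if not fav_scores:
        raise ValueError("need at least one favorite")
    ordered = sorted(fav_scores)
    index = int(len(ordered) * fraction)
    return ordered[min(index, len(ordered) - 1)]


def split_outliers(favs, fraction=0.10):
    """favs is a list of (post_id, score) pairs.

    Returns the floor and the favorites scored below it, which could then be
    analyzed separately for tags that were fav'd in spite of a low score.
    """
    floor = score_floor([score for _, score in favs], fraction)
    outliers = [(post_id, score) for post_id, score in favs if score < floor]
    return floor, outliers
```

Candidate posts scored under the floor would then get a negative weight, while the outlier group could drive a secondary search of its own.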

Updated by anonymous

abadbird said:

  • ultimate => measure the affinity of user favs against other users as a predictor

I actually wrote and implemented this as a test a while back, and it doesn't produce very good results. It's almost identical to sorting by post score and removing things you have already marked as favorite or voted on. Including voted posts makes the results worse when used in a predictor model because users use votes in different ways. It produced a gradient of trash: if you had too few favorites, it would toss you in with everyone and only list posts with huge favorite counts and high scores; if you had too many, it couldn't get a clear understanding of what category you fell in and gave the same results.

Updated by anonymous

abadbird said:
I did quickly check votedup:[notmyusername] when I was typing that post and saw many more pages than my own voted-on posts could fill, so I assumed my search worked. What the hell. Tried other usernames and only got posts I voted on, showing 2 pages. A bug.

Weird, it works perfectly fine for me. I get 54 pages no matter what I type in.

abadbird said:

Numbers

The score distribution of my favs looks like:
-1000..10 = 86
10..20 = 358
20..30 = 480
30..40 = 535
40..50 = 500
50..60 = 411
60..70 = 329
70..80 = 280
80..90 = 230
90..100 = 177
100..110 = 162
110..120 = 133
120..130 = 104
130..140 = 81
...and so on.

More numbers

These are my numbers:

-9001..-01 = 9
00..09 = 1147 (likely due to me tagging so many gentags:<12 images)
10..19 = 699
20..29 = 277
30..39 = 157
40..49 = 89
50..59 = 45
60..69 = 46
70..79 = 28
80..89 = 30
90..99 = 24
100..109 = 16
110..119 = 17
120..129 = 11
130..139 = 16
140..9001 = 46

Updated by anonymous

welp, worked for me. then again like, 90% of my favorites are just butt pics so...not really a hard nut to crack. still cool though

Updated by anonymous

Updated again.
Changed up the analysis process a little bit.
Progress is now a bit more informative than a generic progress bar.
Added an option to include order:random in some API calls. Leads to more diverse results, but due to the random nature, results may be better or worse than normal.
Added maximum output. Doesn't affect generation times, but may improve load times on slower computers/connections.
Still trying to come up with an organized way to show a breakdown of how favorites affect results, but have nothing good yet.

Updated by anonymous

Any ideas on how to exclude lots of specific images from the search without favouriting them or blacklisting every post ID? Votes don't work for this (so voted:whatever is out), sets don't seem to work as far as I can tell and adding every single ID to the blacklist would require at least 26000 characters which I doubt it can handle. Blacklisting tags can only get you so far, plus it would be nice to stop images I've already upvoted from constantly appearing in the results. Luckily, the ones I've downvoted don't seem to show up unless I make them.

Edit: Blacklisting post IDs doesn't work anyway.

Updated by anonymous

this thing just keeps getting better. :)

Updated by anonymous

BlueDingo said:
Just for shits and giggles, I tried the following individually: (v0.4)

1. post #1049426 - 0 results. No surprise here.
2. post #1005112 - 0 results. Kinda surprised. I figured it would find at least 1 other chair image.
3. post #974624 - 0 results. I think zero_pictured images stop it from working...
4. post #521328 - 0 results. Has a copyright this time (Star Fox) but still found nothing.
5. post #1144516 - 0 results. animate_inanimate this time, no species or characters.
6. post #1157879 - 0 results. Character and copyright tags present but no species tags (or many other tags for that matter).
7. post #1157971 - 3 results. Character and copyright tags present but no species tags. Lots of general tags. Finally found something but what exactly did it look for?
8. post #1159014 - 0 results. Species tagged, others not. I'm guessing character, species and copyright tags don't make a difference. Looks like solo was ignored as well.
9. post #286706 - 0 results. big_breasts tag present, but not much else. Low tag count seems to kill it.
10. post #315433 - 0 results. Zero pictured, ~40 general tags.
11. post #765898 - 6 results. Some basic gen tags present. All results had the same character, maybe character tags aren't ignored?
12. post #207584 - 0 results. Previous test without character tag.
13. post #1146671 - 23 results. Gentags only, over 45. Not zero_pictured. See results section.

Results:
What I learned:
  • zero_pictured kills it.
  • Displayed result count is always 1 higher than actual result count.
  • It only seems to care about certain general tags and ignore everything else, though test 11 suggests otherwise.
  • There may be a minimum tag requirement for gentags. Too few and it won't work.
  • Well-tagged images (>30 tags) are way more likely to be found.
  • There may be a priority system in place, ensuring at least one tag is in every result.

Try your favourites again and see if a tag shows up in every result. If it does, post it.

Random shit:
  • Gotta love the numbers in the results section:
    • Test 7 (lucky 7) with 3 results (good things come in threes).
    • Test 7 and 11 next to each other (7-Eleven).
    • Test 11 had 6 results. The 6th prime number is 11 (1,2,3,5,7,11,13).
    • Test 13 (Unlucky 13) which had 23 results (23 enigma).
    • 7+11-13 = 5 (Law of Fives)
    • 7, 11 and 13 are consecutive prime numbers.

Kek.

Updated by anonymous

Minor update again. Nothing user side this time just performance improvements.
New version is 10-15% faster in my tests.

This is likely the last update for this incarnation of the tool. It's coded in PHP and I'm hitting a wall in terms of performance. The single thread nature of PHP really isn't suited well for quickly pushing through large data sets like the ones being used here. So, I'm going to start working on a new version in a different language that has better multi-threading support.

Updated by anonymous

zoranu said:
Minor update again. Nothing user side this time just performance improvements.
New version is 10-15% faster in my tests.

This is likely the last update for this incarnation of the tool. It's coded in PHP and I'm hitting a wall in terms of performance. The single thread nature of PHP really isn't suited well for quickly pushing through large data sets like the ones being used here. So, I'm going to start working on a new version in a different language that has better multi-threading support.

I'm going to lean in and whisper "Go" and then see myself out for the unpopular opinion of languages to learn and enjoy.

In all seriousness, for something with built in concurrency primitives, it's a pretty solid language, and has a batteries included standard library.

Updated by anonymous

Alright, I've got something a bit different here

http://zoranu.ddns.net/favs/favs.php

There is a new slider that will adjust how much recent favorites are preferred over older ones.

It still takes about as long as the previous version, but is capable of more calculations in the same time. (7200 posts analysed in 90-120 seconds compared to ~4500 in the same amount of time in older versions)

Yes, it is still PHP, but I think I've solved the concurrency issue for now. However, this version only operates if my home server is online. I'm unemployed right now, so I can't afford to rent a server that has enough power to run this.

Updated by anonymous

Create Blacklist button/page is broken:

Object not found!

The requested URL was not found on this server. The link on the referring page seems to be wrong or outdated. Please inform the author of that page about the error.

If you think this is a server error, please contact the webmaster.

And a search gave me this with no pictures:

Hey, abadbird, in 109.1235 seconds 7200 images have been anaylzed.

On Firefox 56.0.2.

Updated by anonymous

Yeah the blacklist button isn't working from that page, I'll get that fixed soon. For now though blacklists set on the old version are still used by the new one.

As for the zero results, I've checked the logs and everything seemed to be working as expected but somehow managed to generate a profile that couldn't find a single match. This should be fixed now.

Updated by anonymous

notawerewolf said:
page only times out and refuses to load

Had some storms and power outages here last night, so my server went down. However, after checking the logs I can see that your request completed, just in an unexpected way.

Every time someone hits submit, the server running the UI checks to see if a processing server is online. If it is, a request is generated and then that server is checked one more time before the request is sent. If the processing server goes down in those few moments between the first and second check, the UI server will handle the request in the background while it waits for the other server to come back (usually just a minute or two), but in this case it didn't come back for several hours. So, the UI server handled the entire process. The UI server isn't really fast enough for that though. It runs a single thread @ 2.4GHz, while the processing server runs 8 threads @ 4.1GHz.
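
The double-check-then-fall-back flow described above could be sketched roughly as follows. This is not the real code (which is PHP). The callback names, the idea of splitting the job into steps, the hand-off once the processing server returns, and the "unavailable" result when the first check fails are all assumptions made for the sake of a small runnable example.

```python
def dispatch(steps, is_online, send_remote, run_local_step):
    """Route a job to the processing server, falling back to local work.

    steps          -- list of work items making up one request
    is_online      -- callable returning True if the processing server is up
    send_remote    -- callable that ships a list of steps to that server
    run_local_step -- callable that runs one step on the (slow) UI server
    """
    # First check: is a processing server available at all?
    if not is_online():
        return "unavailable"  # assumption: the post does not cover this case

    # Second check, right before the request is sent.
    if is_online():
        send_remote(steps)
        return "remote"

    # The server dropped between the two checks: grind through the job
    # locally, handing the rest off if the server comes back in time.
    for i, step in enumerate(steps):
        if is_online():
            send_remote(steps[i:])
            return "handed_off"
        run_local_step(step)
    return "local"
```

In the outage described here, the last branch is what happened: the server never came back, so every step ran on the UI server.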

So, your request did complete eventually, 24 minutes and 33 seconds later.

You can see it by entering your username here: http://zoranu.ddns.net/favs/loadold.php

This can also be used if you want to see your last generated results without having to wait for the system to calculate them again.

Updated by anonymous

I tried http://zoranu.ddns.net/favs/favs.php today. I clicked Submit, the GUI said 'Done' but nothing was shown. I tried again, nothing was shown. About half an hour later, I tried the loadold link you just gave; it resulted in 'Couldn't find savageorange. This file may not exist, or generation may not be complete.'

Server status at this time says

S1: Online
S2: Offline
S3: Online

Most recently, I tried toggling SFW mode, as I wasn't absolutely certain whether the display indicated it was on or off. This made no difference to the outcome.

On the positive side, I did figure out the 'broken thumbnails' issue I had before was due to Privacy Badger addon -- I needed to adjust the slider for static.e621.net (I set it to 'disallow cookies' rather than 'block completely'). Confirmed by loading notawerewolf's suggested posts via 'loadold'.

Edit: also tested in Min (rather than Firefox), on the off chance my addons were interfering. No difference in result.

Updated by anonymous

was gonna post my results from newest link but 96% were posts from an artist whose work makes up 0.0072 of my favourites. I don't like any of them, or the artist it slapped everywhere for some reason. I also haven't favourited one of their posts in quite a while, either.

for the standard site (10 depth setting):


1. post #989913 - HELL YES, faved
2. post #1334034 - hate urethral but this gives me the weirdest boner, i guess fav
3. post #883981 - eh... good but no fav
4. post #384369 - HELL YES, faved
5. post #1176302 - eh... good but no fav
6. post #1083305 - no idea how it was smart enough to grab a pic of a chick that looked entirely ambiguous and was SUPER hot. faved and wtf that's nuts
7. post #1079322 - swear i favourited every page in this pool, guess i missed this one. a total given, and of course faved
8. post #1172027 - sucks, no fav
9. post #1326729 - meh, no fav
10. post #382283 meh, no fav
11. post #77819 - HELL YES, faved
12. post #711548 - HELL YES, faved
13. post #204245 - sucks, no fav
14. post #1072494 - art is so hot but group is my biggest turn off. it got the art on point but definitely no fav
15. post #1270845 - good, faved
16. post #1196362 - good, faved
17. post #168922 - meh, no fav
18. post #1136804 - good, faved
19. post #834670 - HELL YES, faved
20. post #854575 - good, faved. sucks it's supposedly a chick tho

new one was awful on default settings. totally messed up for me. old one on high setting was pristine, given I favourite about one in twenty posts I see (and I only search under the gay tag) and it got 12/20. plus there was only one of them I disliked at all. i'm impressed

Updated by anonymous

Checked up on the new thing again today, verified that my results weren't in cache (since Zoranu helpfully left the cache directory publicly accessible). This time, it didn't return immediately, and eventually (~270 sec) came up with a list of results.

Of 300 recommendations, ~23 were interesting enough to look at, and 1 was interesting enough to fav (post #1286550).

About 20 had no obvious connection to the themes in my current favs.

Went to the old one and set search depth to 10. That produced about 16 of 300 results that were interesting enough to look at, but no favs. The results list was more consistently on-theme; aside from that I would say it's roughly the same quality-wise as the new version.

Updated by anonymous

I was playing around with some of the code today and came up with this.

https://i.imgur.com/6rvGwzO.png

It's a Chrome extension that tries to find images similar to the one you're looking at. It's an early version right now, so it has some issues, but I thought I'd share it anyway.

Known issues:

  • Results will sometimes be duplicated.
  • Extension works on e926 as well, but still draws results from e621 so results may contain mature content.
  • Code is similar to that in the Favorite Analyzer, so results may occasionally be wildly inaccurate.

Download: https://rpg.gs/suggester_ext.crx (plugin is hosted there because I have an SSL certificate for that domain, and chrome throws errors without https)
Chrome won't auto-install this, so download it then drag and drop onto chrome.

Updated by anonymous

zoranu said:
I was playing around with some of the code today and came up with this.

https://i.imgur.com/6rvGwzO.png

It's a Chrome extension that tries to find images similar to the one you're looking at. It's an early version right now, so it has some issues, but I thought I'd share it anyway.

Sounds like something I can use. Let us know when you've ironed out all the kinks and made a Greasemonkey version.

Updated by anonymous

I have an idea.
You can get someone's upvotes with the tag votedup:username, so why not add a version of your tool that uses the votes instead of the favorites?

Well, it does not work.

Updated by anonymous

Zenti said:
I have an idea.
You can get someone's upvotes with the tag votedup:username, so why not add a version of your tool that uses the votes instead of the favorites?

Well, it does not work.

Yeah, the voted/votedup/voteddown search tags only work when logged in, and I'm trying to avoid requiring a login.

BlueDingo said:
Sounds like something I can use. Let us know when you've ironed out all the kinks and made a Greasemonkey version.

Working on it.

Update: Done. Well, the Greasemonkey part anyway. The duplication should be fixed too, just let me know if it comes up. As for the inaccurate results, that's something I'm always working on, but there's only so much information that can be gathered from tags alone.
https://rpg.gs/suggest.user.js

Update2: Just checked, also works on firefox mobile with tampermonkey

Updated by anonymous

Update time!

v2.2.0.2 Changes:
(This is still on the test page at http://zoranu.ddns.net/favs/favs.php)

  • Analysis method changed again. Tag types (species, artist, general, etc) are now handled as separate things, influencing scores differently.
  • Blacklist can now be updated without leaving page.
  • Webm posts now have a border to distinguish them from still image posts.
  • "Confidence" rating under images now a bit more meaningful.

Other:

  • Analysis produces a detailed dump of why a post was rated the way it was. Currently looking into making this available in a non-messy way.
  • Backend processes moved to Google Cloud to improve speed.

Updated by anonymous

I have made an attempt at creating a Greasemonkey script version of the analyzer. This one works a bit differently. Instead of checking your favorites, then showing results this version compares posts as you browse and sorts the page by similarity to your favorites.

Preview

Download

Updated by anonymous

I tried this tool out, so far i'm liking it!
It is a bit "wonky" at times but it gets the job done.
Only complaint is that the first time i used it, i got like 40 images in a row of cows, not that i dislike them but considering what i have on favorites... it's inaccurate.

Ultimate score: 8/10

Updated by anonymous

Quick update:

v2.2.1.0

  • Minor bug fixing and code cleanup on back end.
  • No longer considered a 'test' version, so it has been moved to the home page (http://zoranu.ddns.net)

NEW! v3 test (http://zoranu.ddns.net/beta/favs.php)

  • Has sort of an 'AI' now, results WILL be slow while it builds its data sets (no idea how long this will take, depends on how many people use it)
  • Results should become more accurate as more people use it.

Updated by anonymous

Doesn't seem as confident about me this time around, really wants to give me images that have lots of characters in one panel, or comics, but that doesn't match what is in my favorites at all. Confidence scores are all very low. Not sure if it just needs more input.

You might want to boost solo and duo tags based on how often they show up in the favorites. Character count plays heavily into my favorites.

Updated by anonymous

I'm very glad to see development of this thing is still going on! This thing has found me some of my favorite artists. Just a few notes on the new UI.

In the latest version, there doesn't seem to be an option to narrow results down by tags.

Also, it would be quite useful if all links opened in a new tab, as the calculated results are lost upon reloading, which can prove tricky during... one-handed browsing.

Again, keep it up!

Updated by anonymous

KiraNoot said:
Doesn't seem as confident about me this time around, really wants to give me images that have lots of characters in one panel, or comics, but that doesn't match what is in my favorites at all. Confidence scores are all very low. Not sure if it just needs more input.

You might want to boost solo and duo tags based on how often they show up in the favorites. Character count plays heavily into my favorites.

I'm trying to avoid manually adjusting the values of specific tags, but I'll see if I can nudge the calculations a bit to improve that.

anonymousanalogue said:
I'm very glad to see development of this thing is still going on! This thing has found me some of my favorite artists. Just a few notes on the new UI.

In the latest version, there doesn't seem to be an option to narrow results down by tags.

Also, it would be quite useful if all links opened in a new tab, as the calculated results are lost upon reloading, which can prove tricky during... one-handed browsing.

Again, keep it up!

I'll roll out an update in a few days to do both of these things.

Updated by anonymous

Cool! Another possibly useful feature would be to filter the actual input that the system analyzes. Right now it defaults to every post a user has ever favorited, but being able to filter those by their upload date or relevant tags could be useful in case you wanted the bot to only select images based on the tag preferences you expressed in solo or male/female images.

I don't know how complicated that would be on the backend side, but if the bot just feeds search results, it might work to just replace the "Username" field with a tag field that defaults to "fav:yourusernamehere". This might also have a few other neat applications, like suggesting images based on a set instead.

Updated by anonymous

I actually use this quite a lot. Thanks for this! It is very useful

Updated by anonymous

Pushed out a new update.
Adjusted how tag types are handled again. Character count is now taken into consideration. Also tried to set up some load balancing, let me know if you have issues.
or maybe not... Everything worked for a few minutes, then killed the server. On a related note, I wouldn't recommend Elastic Beanstalk. It shouldn't take 20 minutes to deploy an update to one file across 4 servers.

Updated by anonymous

AWS anything for this seems like a bad idea. At least for one's wallet anyways.

Hope you can get it working again. It's been interesting watching this project progress.

Updated by anonymous

Yeah, AWS isn't ideal. But unfortunately my home internet isn't reliable enough for this kind of thing anymore, so I can't use my own server. I was using Google Cloud for a while, but they changed policies and now PHP's timeout can't be adjusted, so my scripts keep failing. Before that I was using another host, but had to stop after receiving some strongly worded emails about 'resource usage' and 'terms of service violations'. Some places get quite upset when you use an 'unlimited' account for more than just a basic website...

But, anyway things are back up for now. New calculations are in place and seem to be stable. But, as always, let me know if there are any issues.

Updated by anonymous

Latest version is at http://zoranu.ddns.net/favs.php

The backend for the beta version is a completely different monster that I haven't had a chance to update yet. It has a 600 MB data set built (when a user favorites x, they are more likely to like y and z, etc) and I want to be sure any changes aren't going to invalidate that data or make it incompatible.
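
To give a feel for what a "favorites x, so probably likes y and z" data set can look like, here is a toy co-occurrence sketch. It is only a guess at the general idea, not the beta's actual backend: the function names and the plain pair-counting approach are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(user_favs):
    """user_favs maps a username to the set of post ids they favorited.

    Returns co[a][b] = number of users who favorited both a and b.
    """
    co = defaultdict(lambda: defaultdict(int))
    for favs in user_favs.values():
        for a, b in combinations(sorted(favs), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def suggest(co, my_favs, limit=10):
    """Rank posts that co-occur most often with the user's own favorites."""
    totals = defaultdict(int)
    for fav in my_favs:
        for other, count in co.get(fav, {}).items():
            if other not in my_favs:
                totals[other] += count
    # Highest count first; ties broken by post id for a stable order.
    return sorted(totals, key=lambda post: (-totals[post], post))[:limit]
```

Pair counting like this grows quickly with the number of favorites per user, which is one reason such a data set can get large.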

Updated by anonymous

Overall, I'd say the search is returning the best and most balanced results to date.

I've had this idea for a long time

Maybe you're already doing this, but I can't know if I don't say something.

I've had this idea for a long time, but expressing it is challenging. Basically, I think a user's preferences can be identified to some degree by (1) adding up all instances of a tag in their favorites and comparing it against (2) the number of posts in their favorites and (3) the total number of posts with that tag. In theory, that can identify fetishes and other appealing concepts while also accounting for a given tag's popularity (e.g., anthro vs taur, m/f vs impregnation, etc.). I had thought of another factor today, but it slipped away :/.

If a tag is uncommon but a user has a significant number of favorites with it, then the FavoriteAnalyzer should recommend posts with it. If a tag is common (e.g., human) but it doesn't appear often among a user's favorites, then it gets negative weighting. If a tag is common and also occurs with regularity among a user's favorites, perhaps that isn't so indicative of their preferences unless other tags belonging to the same logical group (e.g., anthro, feral, human, etc.) have weaker relative representation.

That gets interesting when considering artists. Does a user prefer more "mainstream" artists (need to consider scores and favs too, I guess) or do they fav a healthy amount of obscure artists too? Maybe a user only sees and favs popular artists but they would really like to see art from lesser known artists? That can also apply to characters and franchises.
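
The tag weighting idea above can be sketched roughly like this. It is only one way to read it, not what the search actually does. It compares how often a tag shows up in a user's favorites with how often it shows up site-wide, and takes the log of that ratio so over-represented tags come out positive and under-represented ones negative. The total number of posts on the site, the smoothing constant, and every number in the example are assumptions made up for illustration.

```python
import math

def tag_affinity(fav_tag_count, num_favs, site_tag_count, site_posts, smoothing=1.0):
    """Log-ratio of a tag's rate in a user's favorites to its site-wide rate.

    Positive: the tag is over-represented in the favorites.
    Negative: the tag is under-represented.
    Smoothing pulls the user's rate toward the site rate so a tag with
    zero favorites still gets a finite (negative) score.
    """
    site_rate = site_tag_count / site_posts
    user_rate = (fav_tag_count + smoothing * site_rate) / (num_favs + smoothing)
    return math.log(user_rate / site_rate)

# Hypothetical numbers: 100 favorites on a site with 1,000,000 posts.
SITE_POSTS = 1_000_000
NUM_FAVS = 100
examples = {
    # tag: (count in favorites, count site-wide)
    "taur": (10, 5_000),       # uncommon tag, favorited a lot
    "human": (5, 200_000),     # common tag, rarely favorited
    "anthro": (60, 600_000),   # common tag, favorited at the site rate
}
affinity = {
    tag: tag_affinity(fav_count, NUM_FAVS, site_count, SITE_POSTS)
    for tag, (fav_count, site_count) in examples.items()
}
for tag, value in affinity.items():
    print(f"{tag}: {value:+.2f}")
```

With these made-up numbers the uncommon but heavily favorited tag scores positive, the common but rarely favorited tag scores negative, and the common tag favorited at the site-wide rate lands at about zero, which matches the three cases described above. Comparing tags within a logical group (anthro, feral, human, etc.) would need an extra step on top of this.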

Issues and Suggestions

1. I clicked on the Score Details of the very first result, and it loaded a page that, while interesting itself, caused me to lose my search results. Needs to open in a separate tab or some sort of popup, or search results need to be preserved.

2. If I click Submit without entering a "required" username, the search timer begins and "Please wait..." appears. I got results in 1:25. The Score Details of the first post showed no "Tags increasing score" but most of the post's tags under "Tags decreasing score". What gives? lol

Doing that a second time gave me results in 0:38. With minimal scrolling, a few results were repeated between the searches, like post #1219080, post #264133, and post #464478. And there was also a urethral post... need that blacklist.

3. Going back to issue #1 in the same tab, I clicked the back button from the Score Details page and the username field was empty, but I clicked on it and my username reappeared. If this page has a username in memory, shouldn't it always be displayed? If I hadn't clicked on the username field, would it still use my favs if I clicked Submit?

4. Shouldn't "Tags increasing score" order those tags by their score? There does appear to be some sort of ordering where some, but not all, broad tags are placed at the end of the lists and other tags are grouped together (e.g., cum_* tags). Regardless, the list is difficult to parse and the results are difficult to interpret. I highly doubt some basic tags that I'm neutral toward should have such high scores compared to some more niche tags that I do like.

5. Some posts have no "Tags decreasing score" and when that column does have something, their negative score is minuscule compared to tags in the other column. Feels WIP. Kind of incredulous and curious as to why I apparently have marginal negative affinity toward ferrin, zangoose, entei, and corsac_fox. I suppose that means the search expects me to have more of those fav'd than I do. Nonetheless, that may be correct. The real test is if the search recognizes when a user strongly dislikes something, like MLP or Undertale, without relying on their blacklist.

6. Since a tag's score is the same across results' Score Details--at least for the first tag I checked--can you also produce a profile page for the given username with the rest of the search results? It would have something like [username's] Top 50 Tags and Bottom 50 Tags.

7. Filtering search results. Have the search tabulate all the results' tags and allow users to filter those tags in or out, preferably without reloading the page if that means losing filtered results.

8. The ability to support user accounts and allow them to adjust preferences, probably not specific scores but rather "don't calculate for this tag" and "I [like/dislike] this tag [somewhat/strongly/very strongly]". Also, the ability for users to rate their affinity for posts' content (i.e., subject matter, not quality) as [like/neutral/dislike] to better calibrate future searches.

9. Maybe the search can give browser or even phone notifications if on mobile when the search completes? That's a recent thing. Be [current year]!

Updated by anonymous

Ok, I've fixed the no username issue. I wrote "Required", but forgot to actually require it. Not sure why it was showing those results, as the script shouldn't even run without a username.

Score details now open in a box without leaving search results.
Score details should be a little less nonsensical now.

As for negative scoring tags being small, larger negatives can exist but generally push the image's score so far down that it won't appear in search results due to being outnumbered by higher scoring images. The script searches through around 10,000 images before showing any results, so low scoring images get weeded out fairly early in the process.
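
To illustrate why large negatives rarely show up in Score Details, here is a rough sketch rather than the real script. It assumes a post's score is simply the sum of its tags' weights and that only the highest scoring posts out of the candidates are kept. The weights, tag names, and post IDs are placeholders.

```python
import heapq

def score_post(tags, weights):
    """Sum the weights of a post's tags; unknown tags count as zero."""
    return sum(weights.get(tag, 0.0) for tag in tags)

def top_results(candidates, weights, limit=2):
    """Keep only the `limit` highest scoring posts out of all candidates."""
    scored = ((score_post(tags, weights), post_id)
              for post_id, tags in candidates.items())
    return [post_id for _, post_id in heapq.nlargest(limit, scored)]

# Hypothetical per-user tag weights.
weights = {"taur": 3.0, "anthro": 0.1, "human": -1.4, "disliked_tag": -6.0}

# Hypothetical candidate posts (the real search looks at around 10,000).
candidates = {
    101: ["taur", "anthro"],        # liked tag, mild positive
    102: ["anthro", "human"],       # small negative overall
    103: ["taur", "disliked_tag"],  # liked tag, but one large negative
    104: ["anthro"],                # barely positive
}
results = top_results(candidates, weights)
print(results)
```

Post 103 has a liked tag, but the one large negative drags it below everything else, so it never reaches the results page. The posts that do appear can only carry small negatives, which is why the "Tags decreasing score" column looks so mild.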

Accounts with preferences are something that I've been considering for a while, just trying to decide on the best way to implement it.

Notifications? Maybe, but I'd rather just not have the search take so long.

abadbird said:

I've had this idea for a long time

Maybe you're already doing this, but I can't know if I don't say something.

I've had this idea for a long time, but expressing it is challenging. Basically, I think a user's preferences can be identified to some degree by (1) adding up all instances of a tag in their favorites and comparing it against (2) the number of posts in their favorites and (3) the total number of posts with that tag. In theory, that can identify fetishes and other appealing concepts while also accounting for a given tag's popularity (e.g., anthro vs taur, m/f vs impregnation, etc.). I had thought of another factor today, but it slipped away :/.

If a tag is uncommon but a user has a significant number of favorites with it, then the FavoriteAnalyzer should recommend posts with it. If a tag is common (e.g., human) but it doesn't appear often among a user's favorites, then it gets negative weighting. If a tag is common and also occurs with regularity among a user's favorites, perhaps that isn't so indicative of their preferences unless other tags belonging to the same logical group (e.g., anthro, feral, human, etc.) have weaker relative representation.

That gets interesting when considering artists. Does a user prefer more "mainstream" artists (need to consider scores and favs too, I guess) or do they fav a healthy amount of obscure artists too? Maybe a user only sees and favs popular artists but they would really like to see art from lesser known artists? That can also apply to characters and franchises.

I've already tried this and it works pretty well. A stripped down version is what the current search is based on. However, the server that's running the live version can't handle the full implementation right now. I'm working on making it more efficient, but I'm stretching the limits of my programming knowledge.

Update
I've made a few more changes.
You can now leave/come back/refresh without losing results/search progress.
Username is now remembered.
There is now a progress bar.
URL is now http://zoranu.ddns.net not http://zoranu.ddns.net/favs.php

Updated by anonymous

Please pardon my ignorance, but it seems like the server is down? I haven't seen any other forums talking about this software, so I don't know if something has happened since a year ago.

Updated by anonymous
