Topic: Automating Year Tags

Posted under Tag/Wiki Projects and Questions

Pup

Privileged

A few months ago Zambs messaged me about a good way to automate year tags based on site IDs, and I've finally gotten round to it.

With the info they gave and looking into it more I've found that:

Twitter/X uses snowflake IDs, which encode the time and date of posting for anything after Nov 2010. You can convert one to a Unix epoch timestamp in milliseconds (rounding the result down) with:
(Tweet ID / 2 ** 22) + 1288834974657

Discord has done the same since the start of 2015, converted (again to milliseconds) with this code:
(Discord ID >> 22) + 1420070400000
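
As a rough Python sketch of the two conversions above: the function names are made up for illustration, the epoch constants are the ones quoted in this post, and the result is treated as Unix milliseconds. It only applies to IDs issued after each site switched to snowflakes.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # constant quoted in the post
DISCORD_EPOCH_MS = 1420070400000  # constant quoted in the post


def snowflake_to_unix_ms(snowflake_id: int, epoch_ms: int) -> int:
    """Drop the low 22 bits and add the site's epoch offset."""
    return (snowflake_id >> 22) + epoch_ms


def snowflake_year(snowflake_id: int, epoch_ms: int) -> int:
    """Return the UTC year encoded in a snowflake ID."""
    unix_ms = snowflake_to_unix_ms(snowflake_id, epoch_ms)
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc).year


# Example ID taken from Discord's developer documentation.
print(snowflake_year(175928847299117063, DISCORD_EPOCH_MS))
```

The integer shift gives the same result as dividing by 2 ** 22 and rounding down, without going through floating point.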

FurAffinity has the timestamp in their image URLs, plus the post IDs are sequential, so I can find the first and last post ID for each year and use that as well.

Inkbunny also uses sequential IDs, so I can do the same as FA, just without the timestamp.
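
A minimal sketch of the sequential-ID lookup, assuming a table of the first post ID seen in each year has already been collected. The boundary numbers below are placeholders, not real FurAffinity or Inkbunny values, and the function name is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (first post ID seen in that year, year), sorted by ID.
# Placeholder numbers only, not real site boundaries.
EXAMPLE_YEAR_STARTS: List[Tuple[int, int]] = [(1, 2020), (1000, 2021), (2500, 2022)]


def year_for_sequential_id(
    post_id: int,
    year_starts: List[Tuple[int, int]] = EXAMPLE_YEAR_STARTS,
) -> Optional[int]:
    """Map a sequential post ID to a year using per-year starting IDs."""
    first_ids = [first_id for first_id, _ in year_starts]
    # Index of the last boundary that is <= post_id.
    index = bisect_right(first_ids, post_id) - 1
    if index < 0:
        return None  # ID is older than the first known boundary
    return year_starts[index][1]
```

Only the first ID per year is needed, since each year's range ends where the next one starts.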

The idea is that if an image doesn't already have a year tag, check whether it has any of those sources. If it does, check them all for the earliest time, then add the year tag if the E6 post wasn't created before that year.
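
The check described above could be sketched roughly like this in Python. The function name and inputs are hypothetical: it assumes the source timestamps have already been extracted as Unix milliseconds, and that an existing year tag is any four-digit tag.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def year_of(unix_ms: int) -> int:
    """UTC year of a Unix timestamp given in milliseconds."""
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc).year


def suggest_year_tag(
    existing_tags: Iterable[str],
    source_times_ms: Iterable[int],
    e6_created_ms: int,
) -> Optional[str]:
    """Return a year tag to add, or None if the post should be left alone."""
    # Assumption: year tags are plain four-digit tags such as "2023".
    if any(tag.isdigit() and len(tag) == 4 for tag in existing_tags):
        return None
    times = list(source_times_ms)
    if not times:
        return None  # no usable sources
    source_year = year_of(min(times))  # earliest time across all sources
    if year_of(e6_created_ms) < source_year:
        return None  # E6 post predates every known source, so skip it
    return str(source_year)
```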

Of course I'd keep a log of what posts it changes so I can undo things if needed, and I'd be able to test it offline with the DB Exports first.
Given most uploads are from either FA or Twitter/X, I thought it'd be a pretty good way to do it. I was hoping for some feedback before actually implementing it, in case I've missed something or there are other sites I could add to the list.

Watsit

Privileged

Art isn't always posted when it's created. Particularly for artists that have a patreon or subscribestar with delayed releases, something posted today may be from last year. Sometimes an artist will have old art lying around that they forgot about, and decide to post months or years later when they notice it again. On places like Twitter, it's not unusual for art to be reposted months or years later.

I only worry about year tags when it's written in the image, as part of a signature or watermark for example. Otherwise, I try to not assume when it was made regardless of when it was posted because it's easy to make a bad assumption.

Updated

Pup

Privileged

watsit said:
Art isn't always posted when it's created. [..]

True, but I feel 99.9% of art posted would be commissions or recent art, and even with a month delay it'd only be 1 in 12 posts from those specific artists.
Then given artists that post with a delay often add "this was the patreon exclusive for March", I could check for months in the description, or the words "patreon", "subscribestar", or "exclusive", and not tag those, further reducing mistags.
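
A rough sketch of that description check, assuming a simple whole-word keyword match. The word list and function name are illustrative only, and month names like "may" and "march" will produce false positives, which here just means the post is skipped.

```python
import re

# Keywords that hint at a delayed release; matched as whole words, any case.
DELAY_PATTERN = re.compile(
    r"\b(patreon|subscribestar|exclusive|"
    r"january|february|march|april|may|june|july|"
    r"august|september|october|november|december)\b",
    re.IGNORECASE,
)


def looks_delayed(description: str) -> bool:
    """True if the description mentions a month or a paid-release keyword."""
    return DELAY_PATTERN.search(description) is not None
```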

Quick edit: I just wanted to add that with reposts from Twitter, E6 would still need uploads with sources and that duplicates are flagged and removed pretty quickly, so taking the oldest source would still be the most accurate, especially if it's a repost from FA and an older post lists the FA source.

Updated

Watsit

Privileged

pup said:
True, but I feel 99.9% of art posted would be commissions or recent art, and even with a month delay it'd only be 1 in 12 posts from those specific artists.

I think it might be a bit more than that. I'm not sure anyone would be able to give an accurate accounting for mistags though, since I don't think many people double-check year tags. I've noticed a couple being mistagged with the wrong year, due to the uploader tagging the year it was posted but it was clearly signed with a different year, but if there's no date in the image and no description, I'm not very likely to go to the source and check (and even if there is a description, I may not notice a discrepancy between the tagged year and what the description may say about when it was made). Such mistagged years are likely not noticed or fixed all that often, which is my biggest concern. If a bot (or an overzealous tagger) starts mass tagging posts with a year based on best-guesses, there would be little double-checking and fixing when it ends up wrong... if such information is even available and it's instead left as pure guesswork with no way to ever know if the tagged year is correct.

pup said:
Then given artists that post with a delay often add "this was the patreon exclusive for March", I could check for months in the description, or the words "patreon", "subscribestar", or "exclusive", and not tag those, further reducing mistags.

That would still fail with posts like post #4705312 and post #4716876. They're even someone who has a patreon, but aren't always shilling it in their descriptions. I'm also noticing several posts of their images having the wrong year tagged (post #4430880 post #4430871 post #4430858 and probably more all tagged 2023, despite the sources having titles saying they were from AC22).

pup said:
Quick edit: I just wanted to add that with reposts from Twitter, E6 would still need uploads with sources and that duplicates are flagged and removed pretty quickly, so taking the oldest source would still be the most accurate, especially if it's a repost from FA and an older post lists the FA source.

I was more thinking art that gets posted on Twitter but doesn't get posted here initially, then it gets reposted months later and someone notices it to post here, sourcing that repost (and Twitter being Twitter, doesn't make it easy to find previous uploads of the same image to know there's an earlier post).

Updated

martinegrass57

As much as I'd love to have year tags generated automatically, that's going to fail for a lot of images shared the year after they were originally created. Plus, if an older upload is replaced with a cleaner version from a different year, this merges the tags and leaves the wrong years on the post. (Yes, this happens already, but it'll be worse if years are tagged automatically.)
However, a suggestion to add a year during upload, based on the timestamp, filename, URL, etc., could be useful (along the same lines as selecting sex, rating, etc. during upload).

Pup

Privileged

watsit said:
I think it might be a bit more than that. I'm not sure anyone would be able to give an accurate accounting for mistags though, since I don't think many people double-check year tags.
[..]

Yeah, it's definitely a concern that once tagged it'd be hard to know which are correct and which aren't without going through all of them.

That would still fail with posts like post #4705312 and post #4716876. They're even someone who has a patreon, but aren't always shilling it in their descriptions. I'm also noticing several posts of their images having the wrong year tagged (post #4430880 post #4430871 post #4430858 and probably more all tagged 2023, despite the sources having titles saying they were from AC22).

I wouldn't want to make something that mistags often and cases like that would be really hard to catch. I was thinking I could filter out an artist if they mention patreon in any post description or in their artist wiki entry, which wouldn't be too difficult, but it wouldn't be foolproof. In that artist's case it's not listed there and they don't even have a conditional DNP, which they probably should have given their wiki entry, so even avoiding those wouldn't help.

I was more thinking art that gets posted on Twitter but doesn't get posted here initially, then it gets reposted months later and someone notices it to post here, sourcing that repost (and Twitter being Twitter, doesn't make it easy to find previous uploads of the same image to know there's an earlier post).

There are reverse searches like https://fluffle.xyz/ that also work with Twitter, but I don't know how far back it goes, or if it's only certain artists. It's certainly not something you could rely on and use for every post.

Overall I think I'll avoid this for now then, unless someone has a good idea to reduce the number of mistags it could cause. I was going to suggest adding an uploaded_2024 style tag, as date:2024 gives an error, but after reading the search help it looks like date:>2024-01-01 date:<2025-01-01 works, even if it's awkward.

Pup

Privileged

martinegrass57 said:
As much as I'd love to have year tags generated automatically, that's going to fail for a lot of images shared the year after they were originally created. Plus, if an older upload is replaced with a cleaner version from a different year, this merges the tags and leaves the wrong years on the post. (Yes, this happens already, but it'll be worse if years are tagged automatically.)
However, a suggestion to add a year during upload, based on the timestamp, filename, URL, etc., could be useful (along the same lines as selecting sex, rating, etc. during upload).

Adding it as a suggestion on the upload page could be good, but it'd still suffer from a lot of the points raised earlier. It'd be easier to add an optional "year" text box, but then I doubt many people would source the original image and instead just use the year they're uploading it.

martinegrass57

pup said:
A few months ago Zambs messaged me about a good way to automate year tags based on site IDs, and I've finally gotten round to it.

With the info they gave and looking into it more I've found that:

Twitter/X uses snowflake IDs, which encode the time and date of posting for anything after Nov 2010. You can convert one to a Unix epoch timestamp in milliseconds (rounding the result down) with:
(Tweet ID / 2 ** 22) + 1288834974657

Discord has done the same since the start of 2015, converted (again to milliseconds) with this code:
(Discord ID >> 22) + 1420070400000

FurAffinity has the timestamp in their image URLs, plus the post IDs are sequential, so I can find the first and last post ID for each year and use that as well.

Inkbunny also uses sequential IDs, so I can do the same as FA, just without the timestamp.

Adding to this, I recently found a Tumblr post ID to epoch conversion:
(Tumblr ID >> 20) + 1000000000000

Pup

Privileged

I got thinking about this a bit more, and what about tags like earliest_upload_2024?

It'd be more useful than searching E6 upload dates, a lot easier to do given the syntax I mentioned earlier, while also not filling the current year tags and making them less accurate. Plus these tags would be 99.9% accurate given it'd be based on their earliest upload date across sites, rather than when it was drawn.
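
A small sketch of how such a tag could be derived, assuming the upload times from each site are already available as Unix milliseconds. The function name is hypothetical and the tag format simply follows the earliest_upload_2024 example above.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def earliest_upload_tag(upload_times_ms: Iterable[int]) -> Optional[str]:
    """Build an earliest_upload_YYYY tag from the oldest known upload time."""
    times = list(upload_times_ms)
    if not times:
        return None  # nothing to base a tag on
    earliest = datetime.fromtimestamp(min(times) / 1000, tz=timezone.utc)
    return f"earliest_upload_{earliest.year}"
```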

alphamule

Privileged

Hmm, I noticed that some images have metadata from when generated, that indicates that they had specific years. Sometimes (rarely) 2+ years before the earliest known source because the creator held onto it or the site vanished. This is actually right up Pup's alley to find, given that you've made other scripts for metadata.
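
Which metadata is meant here isn't specified. As one possible illustration, a PNG file may carry an optional tIME chunk holding its last modification time, and the sketch below reads the year from it using only the standard library. The function name is made up, CRCs are not verified, and other formats (EXIF in JPEGs, for example) would need different handling.

```python
import struct
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_time_year(data: bytes) -> Optional[int]:
    """Return the year stored in a PNG tIME chunk, or None if absent."""
    if not data.startswith(PNG_SIGNATURE):
        return None
    pos = len(PNG_SIGNATURE)
    # Each chunk: 4-byte length, 4-byte type, data, 4-byte CRC.
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tIME" and len(body) == 7:
            # tIME layout: year (2 bytes), month, day, hour, minute, second.
            return struct.unpack(">H", body[:2])[0]
        if chunk_type == b"IEND":
            break
        pos += 8 + length + 4  # skip header, data, and CRC
    return None
```

Since this is a modification time, it can be later than the creation date if the file was re-saved, so it would only be a hint.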

Pup

Privileged

alphamule said:
Hmm, I noticed that some images have metadata from when generated, that indicates that they had specific years. Sometimes (rarely) 2+ years before the earliest known source because the creator held onto it or the site vanished. This is actually right up Pup's alley to find, given that you've made other scripts for metadata.

Thank you, that's actually a really good catch, and with that I could actually add the proper year tag alongside the earliest upload year.

martinegrass57 said:
Adding to this, I recently found a Tumblr post ID to epoch conversion:
(Tumblr ID >> 20) + 1000000000000

And thanks as well, I'll make a note to add that to my list of checks :3
