Topic: guide for helping find sources through image filenames

Posted under Off Topic

in the interest of archiving artwork, finding sources when things get reposted without one in discord channels, or working out where an image in my own folders came from, I've learned a lot about different sites' filename patterns. they can help trace an image back to where it came from even when saucenao, google reverse image search or fluffle.xyz fail. this post details some of that info. note that I tried to keep all examples sfw in case you want to share this elsewhere.

twitter

example filenames:
GE6xTJ9bEAALVIf.jpg
GRMaPbuWQAA4aDQ.jpg
FeJ1H1qacAADK6B.jpg
EV41jd8U0AAETHw.jpg
twitter builds its IDs with a scheme called "snowflake", and the image filename starts with one of those IDs, encoded as base64. the ID carries some information like which of their machines generated it but, more important to us, a timestamp. notice how the older files start with E, then F, and the newer ones with G?
if you want to read up more on it: https://en.wikipedia.org/wiki/Snowflake_ID https://hackerfactor.com/blog/index.php?/archives/634-Name-Dropping.html

here is a python script (not written by me) that you can use to convert said snowflakes to a timestamp https://files.catbox.moe/6n8xqh.py
or

code
import base64
import datetime

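def convert_twitter_image_filename_to_datetime(filename):
    # drop an extension such as ".jpg" so only the base64 part gets decoded
    name = filename.strip().split('.')[0]
    # base64 input has to be a multiple of 4 characters long
    name_padded = name + '=' * ((4 - len(name) % 4) % 4)
    bytes_sequence = base64.urlsafe_b64decode(name_padded)
    # the first 8 bytes are the 64-bit snowflake ID
    timestamp_bytes = bytes_sequence[:8]
    timestamp_int = int.from_bytes(timestamp_bytes, byteorder='big')
    # drop the low 22 bits (machine id + sequence) to get milliseconds, then add Twitter's epoch
    timestamp_ms = (timestamp_int >> 22) + 1288834974657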

    # Convert to seconds and then to datetime object
    timestamp_s = timestamp_ms / 1000
    dt = datetime.datetime.fromtimestamp(timestamp_s, tz=datetime.timezone.utc)

    return dt

filename = input("Enter the Twitter image filename: ")
dt = convert_twitter_image_filename_to_datetime(filename)
print("Decoded timestamp:", dt)

you can use this with twitter's advanced search, but be aware that the decoded timestamp reflects when the image was uploaded, not exactly when the tweet was posted (eg the poster kept typing after the image finished uploading, or the tweet was scheduled), so I usually inch forward second by second in the search.
let's use GE6xTJ9bEAALVIf.jpg as an example.
with the script it decodes as 2024-01-28 09:06:44.414000+00:00
first, to check if the image still exists, we can rebuild the direct image url
https://pbs.twimg.com/media/GE6xTJ9bEAALVIf?format=jpg&name=orig
if it does, you can use advanced search and will probably find it
since:2024-01-28_09:06:44_UTC until:2024-01-28_09:06:46_UTC
then go to the media tab. try again if it shows no results at all (twitter may refuse to work or have hard ratelimiting on this kind of search, idk). if you're lucky you may have already found the image.
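a minimal sketch of the steps above in one place: decode the time, rebuild the direct url, and print a since/until search window. the function names and the 2 second default window are my own picks, not anything official, and the url/search formats are just the ones shown in this post.

```python
import base64
from datetime import datetime, timedelta, timezone

TWITTER_EPOCH_MS = 1288834974657  # same constant the decoding script uses


def media_name_to_datetime(filename: str) -> datetime:
    """Decode the time stored at the start of a twitter image filename."""
    name = filename.split(".")[0]
    raw = base64.urlsafe_b64decode(name + "=" * (-len(name) % 4))
    snowflake = int.from_bytes(raw[:8], "big")
    ms = (snowflake >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def direct_url(filename: str) -> str:
    """Rebuild the pbs.twimg.com url in the format shown in this post."""
    name, _, ext = filename.partition(".")
    return f"https://pbs.twimg.com/media/{name}?format={ext or 'jpg'}&name=orig"


def search_window(filename: str, seconds_after: int = 2) -> str:
    """Build a since/until query starting at the decoded second."""
    start = media_name_to_datetime(filename).replace(microsecond=0)
    end = start + timedelta(seconds=seconds_after)
    fmt = "%Y-%m-%d_%H:%M:%S_UTC"
    return f"since:{start.strftime(fmt)} until:{end.strftime(fmt)}"


print(direct_url("GE6xTJ9bEAALVIf.jpg"))
print(search_window("GE6xTJ9bEAALVIf.jpg"))
```

if nothing turns up, widen the window by raising seconds_after a bit at a time rather than jumping straight to minutes.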

you can also use filters like min_faves:20 to filter by minimum like count and narrow the results down. adding words before the search can also help, like if you're lucky the poster mentioned the character's name in the tweet text.
EDIT: tarrgon is epic and made a tool to more easily do this!!! https://yiff.today/twitterimage

I don't know how to decode the video filename pattern, but here are some examples anyway
7wKou0krk3SDGrUa
0Y3lJlLEEqTY73Kv
rWRijKG-4edAWUFg
ZH58QMK9ntLqRtZM
since it's not easy to get twitter videos directly, it's worth noting that some extensions/downloaders change the filename you end up with and just use the tweet ID instead, like 1764425818468495419
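tweet IDs are snowflakes as well, so a file named after a tweet ID can be dated the same way, and the ID alone is enough to open the tweet. a small sketch, assuming the ID is from after twitter switched to snowflakes (roughly late 2010); the helper names are made up, and the /i/web/status/ url form is an assumption on my side that may stop working.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # same constant the image decoding script uses


def tweet_id_to_datetime(tweet_id: int) -> datetime:
    """Read the post time out of a snowflake tweet ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def tweet_url(tweet_id: int) -> str:
    """Link to a tweet without knowing the poster's handle (assumed url form)."""
    return f"https://twitter.com/i/web/status/{tweet_id}"


print(tweet_id_to_datetime(1764425818468495419))
print(tweet_url(1764425818468495419))
```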

furaffinity

example filenames:
1407129145.melangetic_tsundereflareonsunaxe.png
1536765858.hexe_koivap.png
1674833958.posexe_shapes.png
furaffinity does it nicely: a timestamp, the uploader's name, then whatever original filename the poster had on upload. you can easily check the time with any unix epoch timestamp converter, then check the uploader's gallery to see if the image is there around that time.
to verify that it still exists you can also rebuild the url, like the first example would be https://d.furaffinity.net/art/melangetic/1407129145/1407129145.melangetic_tsundereflareonsunaxe.png
if it happens to be deleted, it may still be possible to recover it from a furaffinity archiving site, since you at least know the artist name.
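the same thing as a rough sketch: split the filename, convert the timestamp, and rebuild the links. parse_fa_filename is a made-up helper, and it assumes the uploader name is everything between the first dot and the first underscore, which I haven't checked against every possible username.

```python
from datetime import datetime, timezone


def parse_fa_filename(filename: str) -> dict:
    """Split '<unix time>.<uploader>_<original name>' into useful pieces."""
    timestamp, rest = filename.split(".", 1)
    uploader = rest.split("_", 1)[0]  # assumption: no underscore in the name
    return {
        "uploaded": datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
        "uploader": uploader,
        "gallery": f"https://www.furaffinity.net/gallery/{uploader}/",
        "direct": f"https://d.furaffinity.net/art/{uploader}/{timestamp}/{filename}",
    }


info = parse_fa_filename("1407129145.melangetic_tsundereflareonsunaxe.png")
for key, value in info.items():
    print(key, value)
```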

inkbunny

example filenames:
4899055_PlaymanRGS_chillet.png
1760948_Sorrynothing_furks2.png
3920719_unsignedNEZ_img_3089.png
it seems similar to furaffinity: a number, poster name, original filename. the number is too short to be a unix timestamp, so I'm not 100% sure what it is, but it seems to go up over time, so you can still use it as a rough reference for when something was posted while browsing the poster's gallery.
rebuilding the url is also possible, like https://nl1.ib.metapix.net/files/full/3920/3920719_unsignedNEZ_img_3089.png
/full/ first 4 digits of the number / full filename with extension
I do not know as much about inkbunny, so if anyone knows what the number actually is (an upload id? a timestamp with some offset?) do tell.
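a tiny sketch of that url rule. it only follows the "first 4 digits" pattern from the one example here, so treat it as a guess for numbers with a different digit count; the host name is just the one from the example url and the function name is made up.

```python
def inkbunny_direct_url(filename: str, host: str = "nl1.ib.metapix.net") -> str:
    """Rebuild a full-size url from '<number>_<poster>_<original name>'."""
    number = filename.split("_", 1)[0]
    folder = number[:4]  # assumption: folder is always the first 4 digits
    return f"https://{host}/files/full/{folder}/{filename}"


print(inkbunny_direct_url("3920719_unsignedNEZ_img_3089.png"))
```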

e621 (and some other booru sites)

example filenames:
db7bcf7761ebfeac8c81a5c55c892f9c.jpg
8937cc10553dd4a83bb54f37e9624916.png
f374c6d70ca0192b77ae22cd6a8e76c8.png
now while this may look like a long mess, it's actually just the image's md5 hash!
since everything is tagged, images are usually easy to search for, but you can also search the site by hash, like md5:8937cc10553dd4a83bb54f37e9624916, or open https://e621.net/posts/?md5=f374c6d70ca0192b77ae22cd6a8e76c8
searching by md5 also works on rule34, eg md5:e20decc11b12b8a224279bca225ea0aa, which is lucky since the tagging there tends to be more barebones. md5: searches also work on danbooru, so I assume it's just a thing on the majority of danbooru forks lol
e621 direct image urls can be rebuilt easily too. https://static1.e621.net/data/db/7b/db7bcf7761ebfeac8c81a5c55c892f9c.jpg is just
static1.e621.net/data/ first 2 characters of md5 / next 2 characters of md5 / full md5 . extension
on e6, you can add status:any to see deleted posts if the search doesn't show any results by default. the tag info / source links of a deleted post may still be available, which can help find the image either at the source or on any site that imports posts from e621 (tbib, rule34, probably some more idk)
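a short sketch for the md5 route: check whether a name looks like an md5, build the lookup and direct urls described above, and hash file contents yourself in case the file got renamed but not re-encoded. all function names here are made up for illustration.

```python
import hashlib
import re

MD5_NAME = re.compile(r"^[0-9a-f]{32}$")


def looks_like_md5(stem: str) -> bool:
    """True if the filename (without extension) is 32 hex characters."""
    return bool(MD5_NAME.match(stem.lower()))


def e621_links(md5: str, ext: str) -> dict:
    """Build the md5 lookup url and the direct url pattern from this post."""
    return {
        "lookup": f"https://e621.net/posts/?md5={md5}",
        "direct": f"https://static1.e621.net/data/{md5[:2]}/{md5[2:4]}/{md5}.{ext}",
    }


def md5_of_bytes(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str) -> str:
    """Hash a local file so it can be searched with md5:<hash>."""
    with open(path, "rb") as handle:
        return md5_of_bytes(handle.read())


print(e621_links("db7bcf7761ebfeac8c81a5c55c892f9c", "jpg"))
```

hashing only helps when the file is byte for byte what the site stores, so a screenshot or a recompressed discord preview will not match.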

danbooru filenames look like __umbreon_pokemon_drawn_by_merino_merino_9999__1de710b1da8907abb57d20efb0e95326, for example. the last bit is of course the md5, and the artist info in the filename also helps.
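a rough sketch for pulling those pieces apart. the regex only covers the one shape shown in the example, the "_drawn_by_" split is my assumption about how the name is laid out, and the search url is just the md5: search written as a link.

```python
import re

DANBOORU_NAME = re.compile(r"^__(?P<tags>.+)__(?P<md5>[0-9a-f]{32})(?:\.\w+)?$")


def parse_danbooru_filename(filename: str):
    """Return the tag part, the artist guess and the md5, or None."""
    match = DANBOORU_NAME.match(filename)
    if match is None:
        return None
    tags = match["tags"]
    # assumption: the artist follows "_drawn_by_" when it is present at all
    artist = tags.split("_drawn_by_", 1)[1] if "_drawn_by_" in tags else None
    return {
        "tags": tags,
        "artist": artist,
        "md5": match["md5"],
        "search": f"https://danbooru.donmai.us/posts?tags=md5:{match['md5']}",
    }


print(parse_danbooru_filename(
    "__umbreon_pokemon_drawn_by_merino_merino_9999__1de710b1da8907abb57d20efb0e95326"
))
```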

itaku

examples:
pipi_xcKm6oT.png
20241128_PMD_PFP_4_DragonseekerArt_563px_7uFr0UD.png
380_ENEceGp.png or 380_ENEceGp_xl.png
frankly I don't know how these work, but images saved directly via the (view uncompressed) button don't have _xl or _sm etc at the end. the name seems to just be the original filename + a random bunch of letters and numbers. if you know more please tell. at least the tagging still helps you find any image whose filename you recognise as the itaku pattern.
(offtopic side note to anyone reading this, pls use itaku if you upload art, uncompressed image uploading + simple tagging is great)

deviantart

example filenames:
_fanart__sylveon_by_ayinai-dejxrwu.png
moondance_by_lynx3000-dffjkys.png
snorkel_bunny___art_trade_by_estefanoida_dd8c63u-fullview.png
summer_day_by_ko_yuki_chan_dadzgjk-pre.png
db97cui-cd520873-4a89-4c58-b115-e32618405124.jpg
the pattern is post title, by (artist), then a random letter jumble that usually seems to start with d (is this snowflake ids again? idk). -fullview at the end means it was saved after clicking on the image to maximise it, and -pre at the end means it was saved before clicking on it, I think. I've also seen 414w-2x at the end, so it just seems to be a size-specific suffix. I don't know when the entire filename becomes one long letter jumble instead, but the jumble at the end of filenames that contain the artist name and the start of the ones that are a complete jumble seem to overlap:
dhojivp-cbc9e303-fc98-47e9-bdeb-7afc7f4d3a49
dance_of_fire_and_ice_by_mrsilveralpha_dhojivp-pre
dhojivp here definitely has to mean something but I don't know what. pls help if you know more about those.
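one guess that fits the examples: the part after the d looks like a number written in base 36 (digits plus a to z), and decoding it gives values that grow with newer posts, which is what a post id would do. the sketch below pulls the code out of both filename styles and decodes it. treat the whole thing as an unverified assumption, including the /deviation/<id> link form, and note the regexes can misfire on titles that happen to end in a short word starting with d.

```python
import re

# 'dhojivp-cbc9e303-fc98-...' style (code first, then a uuid-looking part)
UUID_STYLE = re.compile(r"^(d[0-9a-z]{5,7})-[0-9a-f]{8}-[0-9a-f]{4}-")
# 'title_by_artist_dhojivp-pre' style (code last, optional size suffix)
TITLE_STYLE = re.compile(r"[_-](d[0-9a-z]{5,7})(?:-[0-9a-z]+)*$")


def deviantart_code(filename: str):
    """Pull the 'dxxxxxx' jumble out of either filename style."""
    stem = re.sub(r"\.(jpe?g|png|gif)$", "", filename, flags=re.IGNORECASE)
    match = UUID_STYLE.search(stem) or TITLE_STYLE.search(stem)
    return match.group(1) if match else None


def code_to_number(code: str) -> int:
    """Read everything after the leading 'd' as a base 36 number."""
    return int(code[1:], 36)


for name in ("dhojivp-cbc9e303-fc98-47e9-bdeb-7afc7f4d3a49",
             "dance_of_fire_and_ice_by_mrsilveralpha_dhojivp-pre"):
    code = deviantart_code(name)
    number = code_to_number(code)
    # assumed link form, not confirmed
    print(code, number, f"https://www.deviantart.com/deviation/{number}")
```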

bluesky

example filenames:
bafkreic6gqwadvawd6vgk37ufkv2wdohoxxmxd6pisrfiybvgxbz4pfkue.jpg
bafkreidjdsmtudkzsrzxepreavg6dtz2de7bz53hmupv7lu5ua33ecvzj4.jpg
bafkreielltqpjcbaltbcfmhup3cwav64lon7swaqm2dr64k2riiej42fam.jpg
I don't fucking know bro :crylaugh: but it's recognisable enough. it might be encoding some timestamp, except it doesn't line up with post age at all, like check this

trying to find a pattern

a space is put at the position where each name starts to differ, for easier comparison
bafkreie bwnwtm4kdvvqum2ayy3rr3vmgqdbaibdlmo2frxam4imqybjvbi 3 days ago
bafkreih g2v6xsee2cpsrgxqr3vp7g2yagfpr7khrg5s4udgy4kmf62dmwu 4 days ago
bafkreicm ej32uyuafp32jwfuxkfd7y3f2oumht7pyxb4qyekbnbtzlpo34 6 days
bafkreif tm7cxljbuowkkgofa2iusox2lgfhsxx46fdaqa6u2bbrbxeut34 7 days
bafkreig jvwwkbxesneohghpbiczu5os753qzrce4aoxz7huthbz7born7a 13 days
bafkreic g6wmq2lcc2bfjdm3rq4ab25nexvxh7l3zqtiowyqq2aiqr43kpy 6 months
bafkreid rraglj2vnvvmu52zkvjkhnuug7zxpdrmiglrtopejhfnspxrc7e 11 months

bluesky is open source, so if anyone's bothered to dig through the github to find wtf this letter jumble is, please tell
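for what it's worth, the constant bafkrei start matches the "CID" (content identifier) format from IPFS, which bluesky's protocol uses to refer to uploaded blobs: a b for base32, then a version byte, a content type byte, a hash type byte and a hash length byte, followed by the hash itself. that would make the name a hash of the upload rather than a timestamp, which fits the lack of any age pattern. a minimal sketch that unpacks one of the example names (decode_bluesky_name is a made-up helper, and I have not checked whether hashing a downloaded copy reproduces the digest):

```python
import base64


def decode_bluesky_name(filename: str) -> dict:
    """Unpack a 'bafkrei...' name as a base32 CID."""
    cid = filename.split(".")[0]
    if not cid.startswith("b"):
        raise ValueError("expected a lowercase base32 CID starting with 'b'")
    body = cid[1:].upper()
    # python's decoder wants uppercase input padded to a multiple of 8
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    version, codec, hash_fn, hash_len = raw[:4]
    return {
        "version": version,    # 1 for CIDv1
        "codec": codec,        # 0x55 means raw bytes
        "hash_fn": hash_fn,    # 0x12 means sha2-256
        "hash_len": hash_len,  # 0x20 means 32 bytes
        "digest": raw[4:].hex(),
    }


print(decode_bluesky_name(
    "bafkreic6gqwadvawd6vgk37ufkv2wdohoxxmxd6pisrfiybvgxbz4pfkue.jpg"
))
```

this also explains why the character right after bafkrei only ever varies a little: most of its bits still belong to the fixed header.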
(offtopic but also pls don't use bluesky for art. its compression and its image size limit of 2000 pixels in the largest dimension are harsh, and it's easy for ai to scrape anyway; more people need to be aware of this. though I've seen some posts as png before, so I'm not completely sure it's always jpgs hmm)

pixiv

example filenames
124223250_p0.jpg
66054458_p0.png
74762136_p4_master1200.jpg
master1200 means it's a preview-size copy, not the original
the long number is literally just the post id
pX is which image in the post it is (p0 is the 1st, p1 is the 2nd, etc)
since pixiv post ids count up, the filenames can also easily be sorted chronologically (I like seeing old artwork with a low number)
very simple, thanks pixiv.
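a quick sketch that turns a pixiv filename back into a post link. the regex only covers the shapes listed above, the helper name is made up, and the /artworks/<id> url form is an assumption on my part.

```python
import re

PIXIV_NAME = re.compile(
    r"^(?P<post_id>\d+)_p(?P<page>\d+)(?:_(?P<variant>\w+))?\.\w+$"
)


def parse_pixiv_filename(filename: str):
    """Return post id, image position and preview flag, or None."""
    match = PIXIV_NAME.match(filename)
    if match is None:
        return None
    post_id = int(match["post_id"])
    return {
        "post_id": post_id,
        "image_number": int(match["page"]) + 1,  # p0 is the 1st image
        "is_preview": match["variant"] == "master1200",
        "url": f"https://www.pixiv.net/artworks/{post_id}",  # assumed form
    }


print(parse_pixiv_filename("74762136_p4_master1200.jpg"))
```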

other

most phones: IMG_0189.jpg, IMG_1356.jpg
many art programs' defaults: untitled106, titelloses, sans_titre (untitled in other languages)
catbox: 6n8xqh, or other random letter/number strings around this length
reddit: mzzub78jxq3e1, yall-think-im-ready-for-ul-yet-fluff-v0-k61ew342nn3e1, what-i-dont-get-it-v0-6wi5ltg2jq3e1
imgur: sUr25YE.jpg, qajln3h.jpg, JcU6v1d.jpg, IYnI3PK.jpg
4chan: 1711706058270696.jpg, 1732280500859323.jpg (the first 10 digits read as a unix timestamp in seconds, and both of these land in 2024; the remaining digits look like fractions of a second)
tumblr: tumblr_e2f2b5e5917d7c52e24e3cc868ac95b1_6fd84f31_540.jpg, tumblr_f41b7aeed8a15d09a12f0e99d2158fd9_6868b7ef_2048.jpg, Tumblr_l_2363424764721848.jpg
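to tie the whole post together, here is a rough sketch that guesses the site from a filename using the patterns described in this post, plus a helper for the 4chan timestamp. the regexes are my own loose approximations, they overlap (an itaku name with a numeric prefix can look like inkbunny, any 15 character name looks like twitter), and the first match wins, so treat the answer as a hint and not proof.

```python
import re
from datetime import datetime, timezone

EXTENSION = re.compile(r"\.(jpe?g|png|gif|webp|webm|mp4)$", re.IGNORECASE)

# (label, pattern) pairs checked in order, more specific shapes first
PATTERNS = [
    ("furaffinity", re.compile(r"^\d{10}\.[^_]+_.+$")),
    ("pixiv", re.compile(r"^\d+_p\d+(_master1200)?$")),
    ("danbooru", re.compile(r"^__.+__[0-9a-f]{32}$")),
    ("booru md5", re.compile(r"^[0-9a-f]{32}$")),
    ("bluesky", re.compile(r"^bafkrei[a-z2-7]+$")),
    ("tumblr", re.compile(r"^tumblr_", re.IGNORECASE)),
    ("deviantart", re.compile(
        r"_by_.+[_-]d[0-9a-z]{5,7}(-[0-9a-z]+)*$|^d[0-9a-z]{5,7}-[0-9a-f]{8}-")),
    ("inkbunny", re.compile(r"^\d+_[^_]+_.+$")),
    ("4chan", re.compile(r"^\d{13}$|^\d{16}$")),
    ("twitter", re.compile(r"^[A-Za-z0-9_-]{15}$")),
]


def guess_source(filename: str):
    """Return the first site label whose pattern fits, or None."""
    stem = EXTENSION.sub("", filename)
    for label, pattern in PATTERNS:
        if pattern.search(stem):
            return label
    return None


def fourchan_time(filename: str) -> datetime:
    """Read the first 10 digits of a 4chan filename as unix seconds."""
    return datetime.fromtimestamp(int(filename[:10]), tz=timezone.utc)


for name in ("GE6xTJ9bEAALVIf.jpg", "1536765858.hexe_koivap.png",
             "66054458_p0.png", "1711706058270696.jpg"):
    print(name, "->", guess_source(name))
print(fourchan_time("1711706058270696.jpg"))
```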

________________
end. for now.

these filename patterns are why I absolutely recommend not altering art filenames you download (if you rename, just add to the name instead). or just use downloaders that automatically save useful info like the post id in the filename, such as this extension https://greasyfork.org/en/scripts/430132-twitter-click-n-save or gallery-dl https://github.com/mikf/gallery-dl. these tools download at the best quality while keeping the source in the filename, as long as you can be bothered to set them up in the first place.

anyways, lots of rambling done. please spread this knowledge and share any corrections or extra information. these kinds of things being well known would help the overall art community a lot, and help with archival and retrieval of unsourced content in general.
