Forum - My journey here, or how I ended up making my own selfhosted e621like

GreyCat8

Member

-----------------------------------------------
UPDATE: A lot has changed since the first version, see README.md at https://gitlab.com/greycat8-dev/d-glut. Gitlab is always the latest version
-----------------------------------------------

This was kind of a necessity for me. e621.net has been blocked in my country for years, and now, with bans on VPN, "misfire" on Cloudflare and unsolvable captchas that do not load in, doing anything remotely useful on the internet has become a problem.

It was back then when I decided to download like 1% of all posts as a "fun little project". WHICH LASTED I KID YOU NOT 4 ENTIRE MONTHS (yeah my ISP kinda sucks) of my phone downloading over TOR 24/7.

In the end, using Zepiwolf's The Wolf's Stash, i've downloaded 221981 posts (~4% of e621) which, even compressed (transcoded into webp/avif and ditching the alpha-channel), take up 150 GB.

An obvious problem arose - how to organize this huge ass library.

TheWolfsStash is a great app for browsing e621 and downloading some posts, but it can't do much beyond that.

e621-ng would probably be literally perfect, but I'm too dumb to figure out Docker.

Jellyfin literally crashed when i told it to scan a folder with ~100000 images (not even half of my library!)

And other "stock" imageboards/webgalleries would probably also be either not suited for such scales, too hard to set up, have incompatibility in formats or require you to manually import and tag each image.

That's when I realised if I want a perfect offline e621 library manager, i would have to make it myself.

TheWolfsStash downloads posts with name format <artist>-<id>.jpg, which already was a good start, but it did not provide other tag info in a usable way.

Initially, i tried to code some bullshit to fetch tags by post id using e621 api, but that did not work because

1. I had no idea what am i even trying to do (normally I only make desktop software, not anything web-related)
2. Cloudflare captcha.

I spent quite a lot of time trying to replicate TheWolfsStash trick of swapping in a legit Android WebView cookie file to bypass Cloudflare, but with pure Python on Windows and Firefox. And it did not work, not even once, no matter how hard i tried.

And them I finally RTFM and learned that I can just download a complete DB dump. * Facepalm *

So, three days ago i finally set on a mission to finally make a self-hosted offline e621 knockoff.

And here it is, open-source.

gitlab.com/greycat8-dev/d-glut

(Had to put it on gitlab - cant even register on github because stupid captcha never loads in)

D-Glut, which i called so because it is intended as a partial mirror of e621. E621 is sodium glutamate, and D-glutamate is a normally rare mirrorred form of L-glutamate, so this name is also a meta-pun.

Rough prototype with very basic functionality was vibe-coded with duck.ai. Almost everything was then manually refined with many swear words and somewhat ugly-looking, but real and working code.
All done in pure python3, only standard library, no pip modules, no django, with SQLite3 for database.

Features:

- Uses e621 DB dumps (e621net/db_export) to provide tag info
- Mandatory API key access so nobody can randomly stumble onto your "homework folder" server.
- Can convert AVIF into JPG if your browser does not support it.
- Sorts your posts into folders on disk by artist
- 8 themes in the signature e621 and e6ai colors
- No cookies - all the little data it needs (theme, page, search term, posts per page) - is stored in query parameters

Anti-features (things not implemented, but planned for later)

- No tag panel, so memorize your favorite tags
- No individual post pages (yes, it is all just a single search page for now)
- No thumbnailing for images (your router is gonna have a bad time) and videos (enjoy [VIDEO PLACEHOLDER] rectangle for now)
- No parent/child relations for posts and no pools.
- Can't upload to website (you have to put posts in a folder, via ftp/smb/direct access to the machine)

Requirements:

- Linux (almost everything works on Windows too. Not all tho, but I plan on fixing soon)
- python3 (dunno starting from which subversion, but works fine on 3.11-3.13)
- avifdec (apt install libavif)

Adding images for the first time:

1. Create folders named with author names
2. Download images using The Wolf's Stash
3. If you want to save disk space, convert them to avif/webp (tested, up to 10x reduction for JPGs and 2x for PNGs).

I found this cool command-line tool on github (github.com/A-Sverdrup/minsizer) and it's mostly great, but some files produced by it (specifically some PNGs converted with cavif) cause "This image cannot be displayed, because it contains errors" in Firefox. The tradeoff of not using avif however was much worse so i stuck with it and just implemented AVIF decoding in D-Glut.

4. Put 'em into __unknown__
5. Run autosort.py

Setting up database:

1. Download e621 database dumps
2. Unpack csv's from gz's
3. Run depost.py
4. Run initfts.py

Setting up:

1. Open Notepad/Python IDLE/your other favorite IDE and open D-Glut.py
2. Change ADMIN_KEY
3. Change some other constants if you want.
4. Save.
5. Run D-Glut.py
6. Go to <your server>:<your_port>/admin?api_key=<ADMIN_KEY you just set>. If on same machine and default port, this is localhost:621/admin?api_key=<your ADMIN_KEY>
7. Generate and enroll api_keys to hand out to users (by the way, your ADMIN_KEY is also always a valid api_key)
8. Go to <your server>:<your_port>/admin?api_key=<Any of api keys you enrolled> and enjoy

Adding more images:

1. Create more folders for artists you want to sort by
2. Put new images into __unknown__
3. Run autosort.py manually or from admin panel ("Autosort library")
4. Run initfts.py manually or from admin panel ("Update search DB")
5. Click "Rescan" in admin panel or restart server

Updated 8 days ago

GreyCat8

Member

16 days ago

Ok a new update just dropped.

Implemented a proper db schema, tag aliases, individual post pages, tag panel, better temp file handling.

TODO/SOON: parent/child post relationship

TODO: pools, wiki pages, general ui (scaling, download button)

Also sorry db changes ARE breaking.

Aacafah

Moderator

16 days ago

Impressive work for a largely from scratch project. If you need help with e6 backend stuff, feel free to ask me, I love helping community devs with weird stuff like this.

On the topic of parent/child relationships, you could probably get those & most other metatags working just from the data in the exports, but definitely not all of them.

If you really want a crazy task, although I wouldn't imagine this needs grouped searches, you'd probably be able to hack something together pretty simply if you just take our regular expression tokenizer & get fancy with recursion.

Congrats on what you've got done so far!

GreyCat8

Member

13 days ago

More DB breaking changes yippee!!!

DB now takes up just 6 GB instead of 12 thanjs to dedump overhaul which dropped the need for temp tables

Finally implemented pools, now it's just basically the wiki_pages and remaining properties of posts (which there is still a lot of - so far i only implemented parent/child and description)

GreyCat8

Member

13 days ago

aacafah said:
Impressive work for a largely from scratch project. If you need help with e6 backend stuff, feel free to ask me, I love helping community devs with weird stuff like this.
On the topic of parent/child relationships, you could probably get those & most other metatags working just from the data in the exports, but definitely not all of them.
If you really want a crazy task, although I wouldn't imagine this needs grouped searches, you'd probably be able to hack something together pretty simply if you just take our regular expression tokenizer & get fancy with recursion.
Congrats on what you've got done so far!

Thanks!

This was a great learning experience for me as well - i never worked with SQL, CSS and Web development before (i used to make only desktop apps with tkinter).

Yeah, i already figured out some metatags, but didn't have time to implement them yet (i wanted to get pools working first). I targeted initially at id: score: and rating:, but looking at cheatsheet again, seems i still have a LOT of work ahead of me.

Surprisingly, this project DOES need group searches because of tag_aliases - due to how i parse them, tag pairs that are aliased to each other (i.e. vulva <-> pussy or batoid <-> ray_(fish)) cause problems (unclear, which one should take priority over the other, and posts are tagged with just one that is not known apriori resulting in empty searches half of the time), so i replace those (as i called them) "alias loops" with forks on import (vulva -> ~pussy ~vulva + pussy -> ~pussy ~vulva), but now that means you can't use - or ~ with those. I have a workaround in mind, but am still yet to test it.

Also, jeez, that regex is absolutely brutal!

Speaking of regexes, here's a real problem i encountered - since python's sqlite comes without REGEXP and LIKE is way too slow, i use MATCH on a subset FTS table which has entries only for images that are actually present in the library.

And MATCH for some reason considers an underscore (_) a valid delimiter, which results in, for example, "vaginal" matching "vaginal_penetration", "vaginal_masturbation" and "vaginal_fluids", essentially acting like "vaginal_*". This is a something i still can't quite figure out how to fix.

Aacafah

Moderator

13 days ago

Oof, using pure SQL searches sounds painful.

I wouldn't be too worried about most of them; there's a lot that are probably redundant in this context or just can't be done with the info provided by the exports, & most of the remainder are more tedious to implement than difficult to implement.

I'm not too familiar with Python & SQL myself, so this might not be helpful, but in case they are, here's some thoughts I had about some of the problems you've run up against.

greycat8 said:
And MATCH for some reason considers an underscore (_) a valid delimiter, which results in, for example, "vaginal" matching "vaginal_penetration", "vaginal_masturbation" and "vaginal_fluids", essentially acting like "vaginal_*". This is a something i still can't quite figure out how to fix.

It'd be really odd if you can't use a backslash to escape that character, but I'm not familiar enough with SQL or Python to help much with that; that's a bizarre limitation for it to not have a workaround for.

greycat8 said:
Surprisingly, this project DOES need group searches because of tag_aliases - due to how i parse them, tag pairs that are aliased to each other (i.e. vulva <-> pussy or batoid <-> ray_(fish)) cause problems (unclear, which one should take priority over the other, and posts are tagged with just one that is not known apriori resulting in empty searches half of the time), so i replace those (as i called them) "alias loops" with forks on import (vulva -> ~pussy ~vulva + pussy -> ~pussy ~vulva), but now that means you can't use - or ~ with those. I have a workaround in mind, but am still yet to test it.

I think that might be a misunderstanding; no 2 active aliases should be able to point to one another. It might just be that it hasn't been filtered to solely active aliases beforehand? Otherwise, we might be doing something funky on our end.

While that might improve matters, that is a pretty rough situation; we automatically update posts with the antecedent tag when an alias removing it is approved, so we just convert it to the consequent tag (as all instances of the antecedent tag should be replaced by the consequent tag), but your local alias & post data would slowly get out of sync with ours. I guess the simplest solution to that would be to update your local alias data at a set interval, SELECT posts currently tagged with newly aliased antecedent tags, & either resync their data or manually replace the old tag with the new one (or just update all local post data at a set interval)? That sounds like a tough one to manage.

greycat8 said:
Speaking of regexes, here's a real problem i encountered - since python's sqlite comes without REGEXP and LIKE is way too slow, i use MATCH on a subset FTS table which has entries only for images that are actually present in the library.

I'm not too familiar with SQL myself, but I'm surprised it's too slow; can you create indexes for the tables to speed up LIKE queries?

Donovan DMC

Former Staff

13 days ago

greycat8 said:
And MATCH for some reason considers an underscore (_) a valid delimiter, which results in, for example, "vaginal" matching "vaginal_penetration", "vaginal_masturbation" and "vaginal_fluids", essentially acting like "vaginal_*". This is a something i still can't quite figure out how to fix.

It should be noted that e6 escapes underscores in sql queries
https://github.com/e621ng/e621ng/blob/c4e8bbd7e3b5eb1cd11723f9a2e432325d7dc637/config/initializers/core_extensions.rb#L6-L16 (when matching with LIKE, usually for wildcards)
and otherwise uses tsvectors
https://github.com/e621ng/e621ng/blob/c4e8bbd7e3b5eb1cd11723f9a2e432325d7dc637/app/models/application_record.rb#L104-L108 (when searching text, without wildcards)

I'd highly recommend against using sqlite for a task as complicated as this, sqlite is missing a lot of useful functionalities
Postgres is what the site uses and should be very easy to replace sqlite with, and I'd wager would help avoid a lot of headaches in the future

If storage space is an issue, I don't think it should really be any different
overall my tables for the imports add up to <6GB
Screenshot (there's more in the db, which is why du reports larger)

Updated 13 days ago

GreyCat8

Member

9 days ago

New update

Implemented wiki_pages
Implemented DText (~95% complete - links are finnicky, [Itable] straight up does not work, and [section] is not even implemented)Ironically, the only page it can't render 95% correctly is help:dtext.
Implemented REGEXP, so issue with MATCH is now resolved.
Posts and pools now use DText
D-Glut can now distinguish types of non-existent posts (Missing: Exists on e621 but is not present in library; Invalid: does not exist on e621; Future: does not exist yet)
Finally update documentation in admin panel

SOON:

Proper documentation in Wiki (added when creating db)
More mascots?
Finish implementing DText!
Posts score? (insane hopium this will not break pools (it probably will))

UNSOON:

API (i'm already completely fucked enough with my own internal api, so this will wait a long time)
InitFTS and dedump from within admin panel?
Ability to show all posts, even those not present (may sound like a good idea until you realise with <5% of e621 in library you will get 1 real post per >20 [MISSING POST] and even less for more casual users who probably won't try to download entire esix)
Metatags (there are just too much + why would you even want most of them?)
Group searches (they are basically a requirement, but it is indeed a crazy task to figure out)

GreyCat8

Member

9 days ago

aacafah said:
Oof, using pure SQL searches sounds painful.

It was.

I think that might be a misunderstanding; no 2 active aliases should be able to point to one another. It might just be that it hasn't been filtered to solely active aliases beforehand? Otherwise, we might be doing something funky on our end.

Yeah, i didn't filter them. What is probably more bizzare is that i encountered duplicate entries (probably not active ones, but still strange that they remain in the db).

but your local alias & post data would slowly get out of sync with ours. I guess the simplest solution to that would be to update your local alias data at a set interval, SELECT posts currently tagged with newly aliased antecedent tags, & either resync their data or manually replace the old tag with the new one (or just update all local post data at a set interval)? That sounds like a tough one to manage.

I just rebuild the whole DB weekly from new dumps lmao

I'm not too familiar with SQL myself, but I'm surprised it's too slow; can you create indexes for the tables to speed up LIKE queries?[/section]

I think i tried that, and it did not help much.
I do index my tables anyway.

GreyCat8

Member

9 days ago

donovan_dmc said:
It should be noted that e6 escapes underscores in sql queries

Wait, so you CAN just use backslashes? What a bummer. I didn't know, and tried to fuck around with quotation marks as escape characters which only half-worked, and which made me switch away from that.

and otherwise uses tsvectors

Not a something i know about, may look into it. Although would've been better if i did back when i used MATCH on fts5. Now that i switched to REGEXP, i probably won't go back.

I'd highly recommend against using sqlite for a task as complicated as this, sqlite is missing a lot of useful functionalities
Postgres is what the site uses and should be very easy to replace sqlite with, and I'd wager would help avoid a lot of headaches in the future

In my case, portability is a larger concern. With SQLite you can just take and move all files, and it will still work. Can't do that with MySQL or Postgres though.

Also, for my use case, just REGEXP was missing, but python has a rather easy way to implement and provide it.

If storage space is an issue

It does not matter whether DB takes up 6 or 11 GB, when you have 150 GB of images.

The 11-gb file was caused by me still using vibecoded shitty import script until recently, which inpomted everything AS TEXT INTO temp_tables; thus using up twice the space it actually needed, once i cleaned up those and did all INSERT and CAST AS in one step, all this unneded taken space was gone

GreyCat8

Member

9 days ago

aacafah said:
I'm not too familiar with SQL myself, but I'm surprised it's too slow; can you create indexes for the tables to speed up LIKE queries?

On my hardware, LIKE would take up on order of minutes on the full 5.9-million posts table.

Currently, REGEXP on a subset table (just those 228000 posts that are in my library) takes ~5-10 seconds

Donovan DMC

Former Staff

9 days ago

greycat8 said:
In my case, portability is a larger concern. With SQLite you can just take and move all files, and it will still work. Can't do that with MySQL or Postgres though.

I mean, you can
Just run a docker container for postgres right next to wherever your app is with the data directory as a folder there and it's freely portable

greycat8 said:
On my hardware, LIKE would take up on order of minutes on the full 5.9-million posts table.
Currently, REGEXP on a subset table (just those 228000 posts that are in my library) takes ~5-10 seconds

This sounds like you have zero indexing

GreyCat8

Member

9 days ago

aacafah said:
If you need help with e6 backend stuff, feel free to ask me, I love helping community devs with weird stuff like this.

I found var(--color-tag-pool) and var(--color-tag-pool-alt) while ripping tag link colors.

This seems to imply there is/was a extra tag category "pool", but from what i figured out there is'nt, with actual categories being:
0: general
1: artist
2: contributor (this one is not displayed on e621 for some reason?)
3: copyright
4: character
5: species
6: invalid
7: meta
8: lore

Is it actually used anywhere?

GreyCat8

Member

9 days ago

donovan_dmc said:
I mean, you can
Just run a docker container for postgres right next to wherever your app is with the data directory as a folder there and it's freely portable

Docker is toooo haaaaard :(

Also, while container ia portable, docker engine isn't. With python there's at least what's known as Python Embeddable package

RedPhoenix42

Member

9 days ago

Is duck ai smarter than chatgpt?

I swear chatgpt got dumber and worse at coding lately.

GreyCat8

Member

9 days ago

redphoenix42 said:
Is duck ai smarter than chatgpt?
I swear chatgpt got dumber and worse at coding lately.

Dunno, never used chatgpt

Duck.ai is honestly quite good at python. The code almost always works (without obvious errors), but produces wrong results fairly often because either the AI misunderstood the task or you didn't explain the task specifics deeply enough

A great tool anyway.

For me - especially for working with unfamiliar libraries. With it, i can code some shit from what is basically stock code samples in just a couple of days rather than spending a couple of weeks RTFM or a couple of months to figure out everything by just messing with stuff

GreyCat8

Member

9 days ago

Quick update

DText is properly implemented.

Fixed a fatal error with lists which prevented some pages from even loading.
Fixed (hopium) links.
It still can't render e621:dtext correctly because implementing Escaping DText had unforeseen consequences.
Sections are now implemented.
Tag colors in [color=...] are implemented.

GreyCat8

Member

8 days ago

aacafah said:
think that might be a misunderstanding; no 2 active aliases should be able to point to one another. It might just be that it hasn't been filtered to solely active aliases beforehand? Otherwise, we might be doing something funky on our end.

I checked now, with filter for only active aliases there are no more duplicate entries and there is only a single alias loop remaining:

There is just one bad active alias - tag "ears.", which is aliased to itself.

Updated 8 days ago

Donovan DMC

Former Staff

8 days ago

greycat8 said:
There is just one bad active alias - tag "ears.", which is aliased to itself.

There is no active alias from ears, and the only inactive alias from it is it invalid_tag

Watsit

Privileged

8 days ago

donovan_dmc said:
There is no active alias from ears, and the only inactive alias from it is it invalid_tag

ears. with the period. It indeed shows as being aliased to itself.

Donovan DMC

Former Staff

8 days ago

watsit said:
ears. with the period. It indeed shows as being aliased to itself.

Huh, I'm surprised that doesn't cause thrashing issues when adding it to posts
https://e621.net/post_versions?commit=Search&search%5Btags_added%5D=ears.

It's been used a grand total of 3 times in the last 13 years (with 2 of them being 13 years ago), why was it even aliased

alias #8878 says "Placeholder" as a reason, which is not explanatory at all

GreyCat8

Member

8 days ago

greycat8 said:

I found var(--color-tag-pool) and var(--color-tag-pool-alt) while ripping tag link colors.
This seems to imply there is/was a extra tag category "pool", but from what i figured out there is'nt, with actual categories being:
0: general
1: artist
2: contributor (this one is not displayed on e621 for some reason?)
3: copyright
4: character
5: species
6: invalid
7: meta
8: lore
Is it actually used anywhere?

Also, does tag_string in posts db_export include implied tags or barely those the post was initially tagged with?

Donovan DMC

Former Staff

8 days ago

greycat8 said:

I found var(--color-tag-pool) and var(--color-tag-pool-alt) while ripping tag link colors.
This seems to imply there is/was a extra tag category "pool", but from what i figured out there is'nt
Is it actually used anywhere?

No, that isn't used anywhere

Also rather than picking things out of the final compiled product you could just look at the source

greycat8 said:
2: contributor (this one is not displayed on e621 for some reason?)

What do you mean by that? I'm very certain I didn't forget anything when implementing that category

Aacafah

Moderator

8 days ago

Turns out it's a bit annoying to search the source for these because we use theming & interpolation with Sass; here are all the places we use those tag colors. Contributor is used, but not in plain source text; it's added to the output CSS programmatically, along with the other tag categories. Pool isn't used anywhere from what I can tell.

greycat8 said:
Also, does tag_string in posts db_export include implied tags or barely those the post was initially tagged with?

That contains all the tags on the post when the DB was exported. Implied tags are automatically added when the implication is approved; consequently, searching on the site doesn't handle implied tags in any special way. Only aliased tags need to be converted to their consequent tag, as all occurrences of the antecedent tag should be converted to the consequent tag.

GreyCat8

Member

8 days ago

donovan_dmc said:
Also rather than picking things out of the final compiled product you could just "look at the source"

At least i did not colorpick them from a screenshot of post count digit art on front page! XD

(I did this for my overall theme - #00417b, #00549b, #002f5b, #be973a, #fdba31, #8e7c41, #c4cbc3, #fff9e1 and #bfc6be all come from https://e621.net/images/counter/5.png)

GreyCat8

Member

8 days ago

aacafah said:
Only aliased tags need to be converted to their consequent tag, as all occurrences of the antecedent tag should be converted to the consequent tag.

Wait, so can they NOT be converted in posts' tag_string??

GreyCat8

Member

8 days ago

donovan_dmc said:
What do you mean by that? I'm very certain I didn't forget anything when implementing that category

Huh, i don't remember this one being on e621 (~2019), but it apparently is. Also, TheWolfStasth does not support it which is probably why i did not even know about this category before making D-Glut

Donovan DMC

Former Staff

8 days ago

greycat8 said:
Wait, so can they NOT be converted in posts' tag_string??

The tag string always has aliases resolved

greycat8 said:
Huh, i don't remember this one being on e621 (~2019), but it apparently is. Also, TheWolfStasth does not support it which is probably why i did not even know about this category before making D-Glut

The category is fairly new (added in December 2024)
lore/meta/invalid are also new since 2019, they were added with e621ng in March 2020

Aacafah

Moderator

8 days ago

E.g. If tag_x is aliased to tag_y, then every place tag_x is tagged on a post will be automatically changed to tag_y in the tag_string.

greycat8 said:
Huh, i don't remember this one being on e621 (~2019), but it apparently is.

Donovan DMC himself added that relatively recently.

greycat8 said:
Also, TheWolfStasth does not support it which is probably why i did not even know about this category before making D-Glut

Most people don't use TWS (I'm the only one I know who does); most people use binaryfloof's e1547.

GreyCat8

Member

8 days ago

Update.

Changed completely how search works.
Implemented group search, which is theoretically not even limited to 10 groups like e621's.
Regexes finally work EXACTLY as they should, no more false positives, no more false negatives.

Breaking DB change to tag_aliases (?)

GreyCat8

Member

8 days ago

aacafah said:
Most people don't use TWS (I'm the only one I know who does); most people use binaryfloof's e1547.

I'll take a look at that.

TheWolfsStash downloads posts with name format <artist>-<id>.jpg, which already was a good start, but it did not provide other tag info in a usable way.

Oh wait that's probably a problem

Guess i'll really take a look at that.

GreyCat8

Member

8 days ago

Hotfix

Fixed a DUMB DUMB STUPID IDIOT mistake in the tokenizer regexp

Man i really should change my bad habit of not tracking versions (the only version that exists is the one i'm currently working on)

GreyCat8

Member

8 days ago

greycat8 said:
I'll take a look at that.
Oh wait that's probably a problem
Guess i'll really take a look at that.

Well,

~~This is a great app with some features TWS can't even dream of~~_{nevermind, i just didn't dig deep enough in TWS' menus, e1547 straight up just sucks}, but man, this UI design is horrendous. Zero customizability, most essential buttons hidden in menus, excessive margins, buttons both on top and bottom so that you have to reach across the whole screen, infintely scrolling feed instead of pages (also unclear what "select all" actually does there), downloader which is very easy to accidentaly dismiss (which stops the downloads).

Thankfully the downloads' naming scheme is nearly identical to TWS

Updated 4 days ago

GreyCat8

Member

8 days ago

Update

Finally, multithreading.

Previously the server handled each request consequently, which meant each new tab/new user would have to wait longer and longer until all previous are finished. No longer the case, all is parallel, boom.

Propper logging.

That's it for now

GreyCat8

Member

4 days ago

Update:

Breaking DB changes: fts_tags renamed to fts_posts. tag_implications implemented and required from now on.

Implemented some metatags. Improved wiki pages for tags. Some error-proofing.

GreyCat8

Member

3 days ago

donovan_dmc said:
No, that isn't used anywhere

But is it (pool tag category) planned still?
~~Was it ever planned? And how many years ago?~~

GreyCat8

Member

3 days ago

watsit said:
ears. with the period. It indeed shows as being aliased to itself.

This got me thinking. Are longer alias/imply chain loops possile?

-> tag1 -> is not (at least anymore because "ears." somehow existed and now can't be removed because of the same checks which should've prevented something like this from happening in the first place)

tag1 <-> tag2 is not

but what about longer chains? (How deep do the checks actually go?)

Is
-> tag1 -> tag2 -> tag3 ->
possible?

What about 10 long?

-> tag1 -> tag2 ->... -> tag10 ->

Donovan DMC

Former Staff

3 days ago

greycat8 said:
This got me thinking. Are longer alias/imply chain loops possile?
-> tag1 -> is not (at least anymore because "ears." somehow existed and now can't be removed because of the same checks which should've prevented something like this from happening in the first place)
tag1 <-> tag2 is not
but what about longer chains? (How deep do the checks actually go?)
Is
-> tag1 -> tag2 -> tag3 ->
possible?
What about 10 long?
-> tag1 -> tag2 ->... -> tag10 ->

Implication chains can be infinitely long, though tags with transitives (implications/aliases) cannot be aliased to another tag
The checks don't go any level deep, it's just a check for antecedent and consequent being the same
Any other method should be caught by other mechanisms, like preventing circular implications, and preventing implying to/from aliased tags

Donovan DMC

Former Staff

3 days ago

greycat8 said:
But is it (pool tag category) planned still?
~~Was it ever planned? And how many years ago?~~

No, why would we need a category for that? If pools were to have some tagging system they'd either using the existing tags or some completely different system

As far as I can tell it's a forgotten idea that never got written down, it didn't exist before theme changes were made at the beginning of ng, and all code before ng is closed source

GreyCat8

Member

3 days ago

donovan_dmc said:
No, why would we need a category for that? If pools were to have some tagging system they'd either using the existing tags or some completely different system
As far as I can tell it's a forgotten idea that never got written down, it didn't exist before theme changes were made at the beginning of ng, and all code before ng is closed source

Yeah makes sense because we already have that as a separate db table and a whole site section.

(Although inpool: metatag already works like a kind of a remnant of this idea)

GreyCat8

Member

3 days ago

Plans for the closest future:

- thumbnailing via ImageMagick

- post properties ~~favcount~~, source, is_deleted, is_pending

- mimic e621's post frame closer ~~(yellow/green/red border based on rating~~_{WRONG, it's based on status}, score attached to bottom edge)

- keep downloading posts

- think of ways to improve search since regexp will get slower and slower on large scale

- ~~improve autosorter~~ done

- ~~overall simplify/cleanup code~~ done

- add walkthrough for first installation (currently D-glut just errors out with "you sure you didn't forget dedump?")

Updated 1 day ago

savageorange

Member

3 days ago

Your minimum version of Python might be (technically) 3.7 (or *specifically CPython* 3.6), as your DText processing assumes that dictionaries return items in the order they were inserted. Since 3.7 release was 7+ years ago, I'd guess just about anyone can run it.

You are doing some things with (non-)spacing around colons that I didn't know were legal. That might also be version gated.

GreyCat8

Member

3 days ago

savageorange said:
Your minimum version of Python might be (technically) 3.7 (or *specifically CPython* 3.6), as your DText processing assumes that dictionaries return items in the order they were inserted. Since 3.7 release was 7+ years ago, I'd guess just about anyone can run it.

Oh great, i was hoping for >3.8 as that's the last* version to suppport Windows 7 (which some still use)

_{*with pypy it's possible to get 3.10 running, but i don't believe anyone other than me would actually dp that.}

You are doing some things with (non-)spacing around colons that I didn't know were legal. That might also be version gated.

And I didn't know that was not legal.
Can you tell where exactly do i have weird spacing?

savageorange

Member

3 days ago

OK.
Well first I should mention that the code is definitely not PEP8 -compliant (PEP8 is the style guide for Python) -- which is not version-related, but will affect people's willingness to read your code.

Examples of smooshing keywords or expressions onto the end of brackets,parentheses, quotes, and colons:

for img in reversed(RIsort(selected_names))if reverse else RIsort(selected_names):

IMAGE_NAME_LIST = [os.path.split(i)[-1].replace('.png.avif','.jpg')for i in IMAGE_LIST]

title=f'{img}\n\n'+'\n'.join([f'{TAG_TYPE[i]}: {" ".join(tags[i])}'for i in range(len(tags))if tags[i]])

if os.path.isfile(os.path.join(TEMP_DIR, name)):logging.info(f'create_images: {name} is already in temp.')

if isinstance(ids,(int,str)):return([IMAGE_DICT[int(ids)][0]]if int(ids) in IMAGE_DICT else ([None]if keep else []),[IMAGE_DICT[int(ids)][1]]if int(ids) in IMAGE_DICT else [reason(ids)]if keep else []) 
    elif isinstance(ids,(tuple,list)):return RIsortJ([(IMAGE_DICT[int(i)]if int(i) in IMAGE_DICT else(None,reason(i)))for i in ids]if keep else[IMAGE_DICT[int(i)] for i in ids if int(i) in IMAGE_DICT])

while stack:html.append('</ul>');stack.pop()

That's the kind of stuff that surprised me. I think it occurs quite often in your code.

On the more cosmetic side, I tried running the code through black, which is a code formatter that tries to comply with PEP8.
Couldn't pastebin it due to profanity, so here's some subsections illustrating the difference:

Space Saver

def scan():
    global IMAGE_DICT, IMAGE_ZIP
    log("Scanning D-Glut directory...")
    IMAGE_LIST = RIsort(
        i for i in glob(os.path.join("*", "*.*"), root_dir=ROOT_DIR) if "__" not in i
    )
    IMAGE_NAME_LIST = [
        os.path.split(i)[-1].replace(".png.avif", ".jpg") for i in IMAGE_LIST
    ]
    IMAGE_DICT = {
        int(i.split("-")[-1].split(".")[0]): (
            i,
            os.path.split(i)[-1].replace(".png.avif", ".jpg"),
        )
        for i in IMAGE_LIST
    }
    IMAGE_ZIP = [*zip(IMAGE_LIST, IMAGE_NAME_LIST)]
    log(f"{len(IMAGE_LIST)} images found")
    return len(IMAGE_LIST)


def RIsort(lst):
    return sorted(lst, key=lambda j: int(j.split("-")[-1].split(".")[0]), reverse=True)


def RIsortJ(lst):
    return sorted(
        lst, key=lambda j: int(j[-1].split("-")[-1].split(".")[0]), reverse=True
    )

def generate_image_html(selected_names, class_, reverse=False):
    images_html = ""
    for img in reversed(RIsort(selected_names)) if reverse else RIsort(selected_names):
        if img.startswith("SPECIAL:NOT"):
            id = int(img.split("-")[-1].split(".")[0])
            R = reason(id)
            if R.startswith("SPECIAL:NOTPRESENT"):
                tags = get_tags(id, "posts")
                title = f"{img}\n\n" + "\n".join(
                    [
                        f'{TAG_TYPE[i]}: {" ".join(tags[i])}'
                        for i in range(len(tags))
                        if tags[i]
                    ]
                )
                errortype = "MISSING POST"
            elif R.startswith("SPECIAL:NOTFUTURE"):
                tags = get_tags(id, "posts")
                title = f"{img}\n\nThis post is from the future!"
                errortype = "FUTURE POST"
            elif R.startswith("SPECIAL:NOTEXIST"):
                title = f"{img}\n\nThis post does not exist!"
                errortype = "INVALID POST"
            images_html += ERROR_TEMPLATE.format(id, title, errortype)
        else:
            id = int(img.split("-")[-1].split(".")[0])
            tags = get_tags(id)
            counts = get_property(
                id,
                "posts",
                "score,up_score,down_score",
                col0=False,
                on_fail=["?", "?", "?"],
            )
            title = (
                f"{img}\n\n"
                + f"Score: {counts[0]} (+{counts[1]} {counts[2]})\n\n"
                + "\n".join(
                    [
                        f'{TAG_TYPE[i]}: {" ".join(tags[i])}'
                        for i in range(len(tags))
                        if tags[i]
                    ]
                )
            )
            if img.endswith((".mp4", ".webm")):
                images_html += VIDEO_TEMPLATE.format(
                    id, title, counts[0], counts[1], counts[2]
                )
            else:
                images_html += IMAGE_TEMPLATE.format(
                    class_,
                    id,
                    class_,
                    TEMP,
                    img,
                    title,
                    counts[0],
                    counts[1],
                    counts[2],
                )
    return images_html

def select_images(ids, keep=False):  # Select image files based on ids
    if isinstance(ids, (int, str)):
        return (
            (
                [IMAGE_DICT[int(ids)][0]]
                if int(ids) in IMAGE_DICT
                else ([None] if keep else [])
            ),
            (
                [IMAGE_DICT[int(ids)][1]]
                if int(ids) in IMAGE_DICT
                else [reason(ids)] if keep else []
            ),
        )
    elif isinstance(ids, (tuple, list)):
        return RIsortJ(
            [
                (IMAGE_DICT[int(i)] if int(i) in IMAGE_DICT else (None, reason(i)))
                for i in ids
            ]
            if keep
            else [IMAGE_DICT[int(i)] for i in ids if int(i) in IMAGE_DICT]
        )


def pagify(images, num_images, is_random, page):
    if images:
        if is_random:
            files, names = zip(*random.sample(images, min(num_images, len(images))))
            pages = 1
        else:
            start_index = page * num_images
            if truncated := images[start_index : start_index + num_images]:
                files, names = zip(*truncated)
            else:
                filtered_files = []
                filtered_names = []
            pages = int(len(images) // num_images)
    else:
        files = []
        names = []
        pages = 0
    return files, names, pages


def get_tags(id, table="fts_posts"):
    words = get_property(
        id, table, "tag_string", on_fail="Your DB is outdated!"
    ).split()
    tags = [[], [], [], [], [], [], [], [], []]
    for tag in words:
        tags[get_property(tag, "tags", "category", key="name", on_fail=6)].append(tag)
    tags.insert(5, tags.pop(0))
    return tags

One functional thing I noticed:

r"(?i)\[color=general\](.*?)\[/color\]": r'<span style="color:var(--general)">\1</span>',  # tag colors
        r"(?i)\[color=artist\](.*?)\[/color\]": r'<span style="color:var(--artist)">\1</span>',
        r"(?i)\[color=contributor\](.*?)\[/color\]": r'<span style="color:var(--contributor)">\1</span>',
        r"(?i)\[color=copyright\](.*?)\[/color\]": r'<span style="color:var(--copyright)">\1</span>',
        r"(?i)\[color=character\](.*?)\[/color\]": r'<span style="color:var(--character)">\1</span>',
        r"(?i)\[color=species\](.*?)\[/color\]": r'<span style="color:var(--species)">\1</span>',
        r"(?i)\[color=invalid\](.*?)\[/color\]": r'<span style="color:var(--invalid)">\1</span>',
        r"(?i)\[color=meta\](.*?)\[/color\]": r'<span style="color:var(--meta)">\1</span>',
        r"(?i)\[color=lore\](.*?)\[/color\]": r'<span style="color:var(--lore)">\1</span>',

This looks like it could be written as a single k:v pair, using an additional capture with an alternation:

r"(?i)\[color=(general|artist|contributor|copyright|character|species|invalid|meta|lore)\](.*?)\[/color\]": r'<span style="color:var(--\1)">\2</span>',  # tag colors

Probably there are others that this principle could be applied to. The 'Table' regexps look like a possible candidate.

Updated 3 days ago

Donovan DMC

Former Staff

3 days ago

savageorange said:
Couldn't pastebin it due to profanity

Does.. does pastebin not allow profanity?

savageorange

Member

3 days ago

donovan_dmc said:
Does.. does pastebin not allow profanity?

Well, you can.. but the paste has to be set to Private. It will detect the profanity and tell you so. So it's useless for sharing with other people.

GreyCat8

Member

2 days ago

donovan_dmc said:
Does.. does pastebin not allow profanity?

It's my comment on firefox & avifs, probably

GreyCat8

Member

2 days ago

savageorange said:
OK.
Well first I should mention that the code is definitely not "PEP8"

but will affect people's willingness to read your code.

Examples of smooshing keywords or expressions onto the end of brackets,parentheses, quotes, and colons:

Yeah my bad.

That's the kind of stuff that surprised me. I think it occurs quite often in your code.

What's worse, i write code like that in general. this is what 10 years of self-taught python ("fuck around and find out") without any RTFM does to you, apparently

I will try to make it clearer

This looks like it could be written as a single k:v pair, using an additional capture with an alternation:
r"(?i)\[color=(general|artist|contributor|copyright|character|species|invalid|meta|lore)\](.*?)\[/color\]": r'<span style="color:var(--\1)">\2</span>',  # tag colors
Probably there are others that this principle could be applied to. The 'Table' regexps look like a possible candidate.

Huge thanks. I wanted to do something like this but was so afraid to fuck up everything else again (did i tell here how much pain was making dtext links properly work?)

Updated 2 days ago

Donovan DMC

Former Staff

2 days ago

savageorange said:
Well, you can.. but the paste has to be set to Private. It will detect the profanity and tell you so. So it's useless for sharing with other people.

In the near decade that I've had my account (my oldest paste is in March 2016), I have never noticed that

What I find even more insane is that even with having a pro account I don't bypass it

Though I find it very hard to believe that none of the bot error logs that I have get automatically uploaded to pastebin have never had swearing them, I saw the popup firsthand

_{...maybe my error reporting can silently fail if pastebin just rejects the paste}

Donovan DMC

Former Staff

2 days ago

Also FYI on color tags, short names for categories can also be used

https://github.com/e621ng/dtext/blob/d3595043f5d605cb0b52b1d02e618cb719e8c43d/ext/dtext/dtext.cpp.rl#L125-L135
gen general
art artist
cont contributor
copy copyright
char character
spec species
inv invalid
meta
lor lore

GreyCat8

Member

2 days ago

donovan_dmc said:
Also FYI on color tags, short names for categories can also be used
https://github.com/e621ng/dtext/blob/d3595043f5d605cb0b52b1d02e618cb719e8c43d/ext/dtext/dtext.cpp.rl#L125-L135
gen general
art artist
cont contributor
copy copyright
char character
spec species
inv invalid
meta
lor lore

Thanks, implemented

GreyCat8

Member

2 days ago

greycat8 said:
I just rebuild the whole DB weekly from new dumps lmao

It takes ~600-700 seconds (WITH indexing now), which, if you do weekly and shut the server down (which is not even required because this is sqlite! You can build a separate DB and hot-swap posts.db!) is still less than 0,002% downtime

GreyCat8

Member

2 days ago

Update:

Made my shitty code readable

Slightly improved dedump+autosorter+initfts

uhh i forgor

Updated readme

savageorange

Member

2 days ago

greycat8 said:
I will try to make it clearer

Thanks, consistent spacing makes it a lot easier for me to tell what is going on.

Huge thanks. I wanted to do something like this but was so afraid to fuck up everything else again (did i tell here how much pain was making dtext links properly work?)

pyTest or Nose2 could help you with that, though I know constructing test cases is tedious.
All you need to make your code importable, given that you're already doing the if __name__ == "__main__": thing, is to have a proper filename (_ is allowed in module names, - isn't)

Example testing:

import dtext from D_Glut 

# this is written for pyTest, which allows a pretty minimalist way of writing test-suites: each test is just a function whose name starts with 'test_'

def test_foo():
    # the easiest way to make test cases is probably to (after confirming your confidence in the code currently being correct) ..
    # make dtext() log its input and output in exactly the format that could be pasted in here. 
    # (I'm using triple-quotes here because I expect most actual input/output pairs will have multi-line strings.)
    assert dtext("""foo""") == """bar"""

# I guess you could also make a test case for the failing 'DText help' page by pulling the correct output from E621's rendering.

Updated 2 days ago

Donovan DMC

Former Staff

2 days ago

DText is built to C++, so you can probably make bindings directly to the dtext code e6 uses
https://github.com/e621ng/dtext/blob/master/ext/dtext/dtext.cpp

The ruby bindings are here, it returns [string, post_ids[]]

GreyCat8

Member

2 days ago

savageorange said:
All you need to make your code importable, given that you're already doing the if __name__ == "__main__": thing, is to have a proper filename (_ is allowed in module names, - isn't)

I know. I had to use dglut = __import__('D-glut') a couple of prototypes i used for implementing new features based on SQL.

pyTest or Nose2 could help you with that, though I know constructing test cases is tedious.

I just had several tabs open with each type of request (front page, posts (listing), posts (search), single post, post (parent/child posts), pools, pool, wikis, wiki page (e621:index, e621:cheatsheet, e621:dtext)) and manually tested - restarted server, waited up to 60 seconds because of Address already in use, refreshed pages, saw if anything is broken, fix broken stuff, grind and repeat.

Updated 1 day ago

GreyCat8

Member

1 day ago

Huge update

Restructured python code

Fixes to control.js which caused anchor links to not work

Tweak themes for better readability (also removed style_ivory2 because it sucks)

Tweak UI

Implement fav_count

Simplify & improve dedump scripts

e1547toTWSrename is now integrated into autosort

Address the fact that i accidentally included avifdec.exe - now it's intended, not accidental. Include dwebp.exe (even though it's unused).

Update admin panel, implement maintenance mode.

greycat8 said:
Man i really should change my bad habit of not tracking versions

Pre-restructure code is now on legacy branch and will not be updated anymore

GreyCat8

Member

1 day ago

greycat8 said:
In the end, using Zepiwolf's The Wolf's Stash, i've downloaded 221981 posts (~4% of e621) which, even compressed (transcoded into webp/avif and ditching the alpha-channel), take up 150 GB.

If this is not a secret, just how big is e621?

From my estimates on my 4,5% subset (which is compressed tho) the whole e621 would take up somewhere in 3.5-5 TB range, which could probably even fit on one disk if i was i fucking millionaire or something. Despite how good ai is for coding, i'll never forget and never forgive ram and ssd prices

savageorange

Member

about 23 hours ago

greycat8 said:
If this is not a secret, just how big is e621?

https://e621.net/stats

Currently 11.3 TB of files. (if you are asking about DB size, stats page doesn't have that.)

GreyCat8

Member

about 12 hours ago

Update

I can't believe adding thumbnailing was this easy

News: May 04, 2025 Show

Topic: My journey here, or how I ended up making my own selfhosted e621like