Topic: Process for uploading a superior version of a post

Posted under General

Hi! I was curious about the procedure for uploading a superior version of an existing post where the sole difference is file size—not quality, not even file format, just the number of bytes.

For some background, I recently searched order:filesize and noticed how massive some of the images are (over 100 MiB each for the first few posts). post #4572247 is a perfect example: a PNG over 96 MiB in size. I downloaded the image, ran it through a program called ImageOptim, and got a pixel-for-pixel identical PNG under 30 MiB, and that’s without stripping any metadata or other information. This size reduction would improve load times for users, especially those with poor connections, and, presumably, save a few cents on server storage. The program I use can losslessly recompress PNGs, JPGs, GIFs, and SVGs (should they ever be accepted here). In my experience, JPG photographs usually come out about 9% smaller (artwork might compress more), and PNGs anywhere from less than 1% to 80% smaller, depending on the complexity and detail of the image. To get the smallest possible PNGs, I run them through the program a second time; that second pass almost always shaves off several (hundred) more kilobytes.
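
(For what it’s worth, here is roughly how I double-check that a recompressed file really is pixel-for-pixel identical. It’s just a sketch in Python using Pillow; the file names are made up, and it only compares decoded pixels, not metadata.)

```python
# Rough sketch: confirm two PNGs decode to identical pixel data.
# Requires Pillow (pip install Pillow); the file names below are only examples.
from PIL import Image

def pixels_identical(path_a: str, path_b: str) -> bool:
    """True if both images have the same dimensions, mode, and raw pixel bytes."""
    with Image.open(path_a) as a, Image.open(path_b) as b:
        return a.size == b.size and a.mode == b.mode and a.tobytes() == b.tobytes()

print(pixels_identical("original.png", "optimized.png"))
```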

Given that, is there a (relatively easy) way to upload these optimized files? Or is that even acceptable to do, per e621’s current policies? To avoid overwhelming mods/janitors/staff, I would only upload if the space reduction is significant, as with the example above. I would almost certainly upload no more than once per day, if that. (It can easily take 15 minutes just to run the program on a file anyway.)

Thank you!

manitka said:
Optimized files are not accepted, I’m pretty sure.

Okay, then. I have to admit, the tech nerd in me is a little disappointed, but I can understand. Thank you!

perihelia said:
Okay, then. I have to admit, the tech nerd in me is a little disappointed, but I can understand. Thank you!

You can check with a mod or janitor, but I think we prefer the original files as they were made.

If you want to understand why optimised files (of already existing posts) aren't accepted, it's to prevent people from taking an existing post and going down an optimisation spiral. It also wouldn't be fair to the original uploader, since they would lose a bit of their uploading limit.

Thank you for that explanation! That rationale makes perfect sense, although I am curious now why it affects the original uploader’s limits, and not mine, for instance. I assume it’s just a limitation of the system, or perhaps meant to penalize those who frequently upload low-quality files.

perihelia said:
meant to penalize those who frequently upload low-quality files.

Hit the nail on the head there
It's only really a problem for people when they start racking up dozens of deletions; it takes 40 deletions (assuming zero approvals) to lose the ability to upload.

donovan_dmc said:
It's only really a problem for people when they start racking up dozens of deletions; it takes 40 deletions (assuming zero approvals) to lose the ability to upload.

Oh, wow. I see how that’s problematic. (I’ve never uploaded anything, so I’m incredibly ignorant about how that whole system works.)

It’d be cool if at some point in the future there was a well-regulated way to upload optimizations. (Perhaps only files larger than 1 MiB would be eligible for reupload, and only when the new file is at least 20% smaller than before, and all the embedded metadata is identical, and only by privileged users, etc.) I say this, in part, because I always browse e621 in a private tab, and images (seemingly) aren’t cached on my computer. For small images, that’s barely noticeable; for large ones though, it’s definitely surprising whenever it happens.
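
(Purely to illustrate what I mean, here is a quick sketch of that kind of eligibility check. Every name and threshold in it is hypothetical; this is not how e621 actually works.)

```python
# Hypothetical eligibility check for the reupload rules imagined above.
# None of these thresholds or names come from e621; they are made up for illustration.
import os

MIN_ORIGINAL_BYTES = 1 * 1024 * 1024  # only posts larger than 1 MiB would qualify
MIN_REDUCTION = 0.20                  # the new file must be at least 20% smaller

def reupload_allowed(original_path: str, optimized_path: str,
                     metadata_identical: bool, user_is_privileged: bool) -> bool:
    """Return True only if every (hypothetical) rule is satisfied."""
    if not user_is_privileged or not metadata_identical:
        return False
    original_size = os.path.getsize(original_path)
    optimized_size = os.path.getsize(optimized_path)
    if original_size <= MIN_ORIGINAL_BYTES:
        return False
    return optimized_size <= original_size * (1 - MIN_REDUCTION)
```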

perihelia said:
Oh, wow. I see how that’s problematic. (I’ve never uploaded anything, so I’m incredibly ignorant about how that whole system works.)

It’d be cool if at some point in the future there was a well-regulated way to upload optimizations. (Perhaps only files larger than 1 MiB would be eligible for reupload, and only when the new file is at least 20% smaller than before, and all the embedded metadata is identical, and only by privileged users, etc.) I say this, in part, because I always browse e621 in a private tab, and images (seemingly) aren’t cached on my computer. For small images, that’s barely noticeable; for large ones though, it’s definitely surprising whenever it happens.

Beyond the previously mentioned reasons, optimization is also unwanted because it changes file data, adding an additional step in verifying that posts are identical between source and e621. Inkbunny addresses this by storing the pre-optimization MD5 hash for comparison and reverse search purposes, but it also strips color profile information, affecting the visual information of a post despite purporting to be lossless.

In the interest of archival, I don't think it's a good idea to allow users (or even the site) to modify the data of a resource. The file is no longer intact as it was, and it can't be studied later outside of the visual information, reducing its value as a piece of history. A lot of metadata includes information such as which art program was used to create it, providing valuable insight into the tools and processes of the time that would otherwise be lost, especially if the original source is later deleted.


song said:
Beyond the previously mentioned reasons, optimization is also unwanted because it changes file data, adding an additional step in verifying that posts are identical between source and e621. Inkbunny addresses this by storing the pre-optimization MD5 hash for comparison and reverse search purposes…

Ah, I wasn’t thinking about hashes. I can see how that’s tricky to get right.

song said:
…but it also strips color profile information, affecting the visual information of a post despite purporting to be lossless.

…A lot of metadata includes information such as which art program was used to create it, providing valuable insight into the tools and processes…

For what it’s worth, the level of optimization that I use (and am in favor of) is truly lossless and does not strip metadata or reduce/remove any color channels. There is a setting I could enable to remove all that, but then you lose all the important things you mentioned: copyright information, camera settings (not relevant for e621, of course, but otherwise very useful and near-ubiquitous), information about the program used, and color space information, which, unless it’s sRGB or Generic Gray 2.2, will affect rendering, as you said. So I think we’re largely in agreement on that point.
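
(If it helps, this is more or less how I sanity-check that the metadata and the color profile survive recompression. Again, just a sketch with Pillow; it only compares the chunks Pillow happens to expose, so it isn’t a complete guarantee.)

```python
# Sketch: check that the ICC profile and other metadata chunks survive recompression.
# Pillow only exposes some PNG chunks through img.info, so this is a partial check.
from PIL import Image

def metadata_preserved(path_a: str, path_b: str) -> bool:
    with Image.open(path_a) as a, Image.open(path_b) as b:
        # "icc_profile" holds the embedded color profile; tEXt/zTXt entries
        # appear as ordinary string values in .info.
        keys = set(a.info) | set(b.info)
        return all(a.info.get(k) == b.info.get(k) for k in keys)

print(metadata_preserved("original.png", "optimized.png"))
```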

If I ignore everything but MD5 hashing for a moment, and if I assume it’s possible for e621 to decode specific data from inside an accepted image format and hash it (I’ve no doubt that’s incredibly challenging, if not entirely off limits), could you hash the metadata and the decompressed pixel data separately, to later verify that optimizations are in fact lossless? (I completely respect e621’s position and approach on this issue; I’m just fascinated by the technical factors you mentioned.) Since the ordering of the data being hashed affects the resulting hash, this would enforce a slightly stronger notion of “lossless” than what I originally had in mind: the relative order of the components in the metadata would also have to be preserved. As you mentioned with Inkbunny, storing a hash of the original file is also necessary, so in my hypothetical, three separate hashes would be stored for every post.
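
(To make that concrete, something like the sketch below is what I had in mind. The three-hash scheme is purely hypothetical and my own invention, not anything e621 or Inkbunny actually does as far as I know.)

```python
# Hypothetical three-hash scheme: one hash for the file as uploaded, one for the
# decoded pixel data, and one for the metadata chunks (in the order they appear).
import hashlib
from PIL import Image

def post_hashes(path: str) -> dict:
    with open(path, "rb") as f:
        file_md5 = hashlib.md5(f.read()).hexdigest()
    with Image.open(path) as img:
        pixel_md5 = hashlib.md5(img.tobytes()).hexdigest()
        # Serialize the metadata in the order Pillow exposes it, so reordering
        # the chunks changes the hash (the "stronger" notion of lossless above).
        meta_blob = repr(list(img.info.items())).encode()
        meta_md5 = hashlib.md5(meta_blob).hexdigest()
    return {"file": file_md5, "pixels": pixel_md5, "metadata": meta_md5}

print(post_hashes("original.png"))
```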

Stop trying to force the optimisation question; it is never going to happen.

MD5 is important as it helps to keep track of what the original file is.
Without it, malicious users can:

  • Flag any post for being "unoptimised",
  • Mask pirated content in the guise of it being "optimised",
  • Repeatedly upload previously-deleted content.

Bear in mind that the janitors also have to check every source before approving a post to make sure the poster got it from where they said they did.

thegreatwolfgang said:
Stop trying to force the optimisation question; it is never going to happen.

Taking a step back, I can see how the direction and tone of the thread started to derail with my response to moderator Donovan. (Specifically, with my last paragraph: “It’d be cool if…”.) That’s entirely my fault, and I apologize. In hindsight, I should have ended the conversation there; it’s too late for that, so I’ll end it here.

Thank you again to Manitka, SNPtheCat, Donovan DMC, Song, and TheGreatWolfgang. I really appreciate you all taking the time to inform me.
