Comments (22)
I finally got out of that GitHub cathole, and have a mostly working metadata check. (My struggle, which completely trashed the commit history of one of my repositories, seems to have more to do with GitHub webhooks than anything purely git-related.)
- https://github.com/doofus-01/wesnoth-Bad_Moon_Rising/actions/runs/9238915366/job/25417430535 passes because the touched image has an artist tag and an accepted copyright tag.
- https://github.com/doofus-01/wesnoth-Bad_Moon_Rising/actions/runs/9238927894/job/25417459952 fails because the copyright is not an accepted license.
Figuring out how to make a PR for this sounds like another cathole I don't want to squish around in right now, probably best to get something mostly finished first. I'll update the first post when there's progress.
But for now, the specific check I'm looking at is:
- Fail if an image is updated or added without Artist or Copyright EXIF tag
- Fail if an image has both tags, but the copyright isn't an acceptable one. In the checks linked above, the list is
- GNU GPL v2+
- CC BY-SA 4.0
But other licenses should probably be added. Public Domain and CC0 are obvious candidates, but I'm not sure what that means as far as generative AI goes, so I'd rather keep that in a separate issue - one that may need to be resolved before this is mainlined, but still not here.
- Should the Artist tag be checked against some list, so the check fails if there is an unrecognized contributor? I think so, this would be a flag that the credits/about need to be updated, but maybe this can be added later.
- Is there any other tag that should be included?
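The check described above could be sketched as a small pure function. This is illustrative, not the actual CI code; it assumes the EXIF tags have already been extracted into a dict, and the function name is made up:

```python
# Sketch of the proposed check. Input is a dict of already-extracted EXIF
# tags, e.g. {"Artist": "...", "Copyright": "..."}. The license list matches
# the two licenses accepted in the linked CI runs.

ACCEPTED_LICENSES = {"GNU GPL v2+", "CC BY-SA 4.0"}

def check_image_metadata(tags: dict) -> list[str]:
    """Return a list of problems; an empty list means the image passes."""
    problems = []
    if not tags.get("Artist"):
        problems.append("missing Artist tag")
    copyright_tag = tags.get("Copyright")
    if not copyright_tag:
        problems.append("missing Copyright tag")
    elif copyright_tag not in ACCEPTED_LICENSES:
        problems.append(f"unaccepted license: {copyright_tag!r}")
    return problems
```

The CI job would run this over every added/changed image and fail if any image returns a non-empty list.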
from wesnoth.
One problem with this is that some of the tools for keeping image size down will strip EXIF data; I'm not sure if woptipng.py does this, but many similar tools do...
from wesnoth.
Even if the copyright information is kept in the images themselves, we could still keep a file around with the hashes of all the images files and check against that.
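A minimal sketch of that hash-file idea, assuming the bookkeeping file maps image paths to SHA-256 hashes (the manifest format and function names are hypothetical, not the existing script):

```python
import hashlib
from pathlib import Path

def file_hash(path: str) -> str:
    """SHA-256 of a file's contents, hex-encoded."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def check_manifest(manifest: dict[str, str], image_paths: list[str]) -> list[str]:
    """Return the paths whose hash is missing from, or differs from, the manifest."""
    return [p for p in image_paths if manifest.get(p) != file_hash(p)]
```

Anything returned by `check_manifest` is an image that was added or changed without the bookkeeping file being updated.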
For the actual question: IIRC it might have something to do with the shallow checkout (--depth=1)? I vaguely remember that being a problem when I tried implementing the current python script, which is why it keeps track of the file hashes to avoid needing to clone the entire repository history, since that can take multiple minutes by itself.
from wesnoth.
Just like @cooljeanius said, I remember that one of the things that utils/woptipng.py (and probably also utils/optiwebp.py) does to reduce the file size is to remove all that metadata. Most image editors can also edit the EXIF fields, even putting in some default values that can overwrite what's already there.
To me, it looks like there are too many ways to accidentally mess up the EXIF metadata, so we shouldn't rely on them.
from wesnoth.
I could also remove it from the CI check and then update everything en masse as part of the release work for each point release.
from wesnoth.
I'm not convinced by either of those excuses.
- There are ways to mess up anything; how is that a reason? CI fails if there's an extra white space, so it should be able to handle this.
- Removing the 200 bytes or whatever it is of EXIF data is useless hyper-optimization.
from wesnoth.
> I could also remove it from the CI check and then update everything en masse as part of the release work for each point release.
That sounds like it would shift the meta-work to you, and doesn't seem right if there is a better way. Maybe there isn't a better way, but I'd like to try.
A side benefit of using the EXIF data is that the artists, Wesnoth, and CC licenses could get marginally more exposure, as the image is grabbed and reused.
from wesnoth.
> Just like @cooljeanius said, I remember that one of the things that utils/woptipng.py (and probably also utils/optiwebp.py) does to reduce the file size is to remove all those metadata.
Probably using EXIF data in the webp files would be fine, since optiwebp doesn't do anything besides check if converting from png/jpg to webp results in a meaningful size reduction. It doesn't touch existing webp images or do anything else to reduce file size, though I have no idea if EXIF data would survive the conversion to webp for existing images.
from wesnoth.
I think this is definitely worth exploring. Having author and copyright info in the files themselves would mean they are still there even if someone uses them in their own project without copying the copyright.csv entries in some form.
Currently utils/woptipng.py calls `convert -strip`, which removes nearly all metadata. I didn't test utils/optiwebp.py, since you can't even run it on a path you want. cwebp itself appears to only care about metadata when given the `-metadata` option.
So our tooling would certainly need adjusting. Btw, not stripping metadata at all is not a good option, as can be seen in e3ee472 for example, where images grow by thousands of percent in size just because of the added metadata.
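One possible adjustment could be to strip everything except the tags the check cares about. This is a sketch using Pillow rather than the actual woptipng/ImageMagick pipeline, and it assumes a JPEG for simplicity (PNG EXIF support varies by tool); the tag IDs 315 (Artist) and 33432 (Copyright) are the standard EXIF/TIFF values, everything else here is illustrative:

```python
from io import BytesIO
from PIL import Image

# Standard EXIF/TIFF tag IDs: 315 = Artist, 33432 = Copyright.
KEEP_TAGS = (315, 33432)

def strip_except_credits(jpeg_bytes: bytes) -> bytes:
    """Re-save a JPEG, keeping only the Artist and Copyright EXIF tags."""
    img = Image.open(BytesIO(jpeg_bytes))
    old = img.getexif()
    kept = Image.Exif()
    for tag in KEEP_TAGS:
        if tag in old:
            kept[tag] = old[tag]
    out = BytesIO()
    img.save(out, format="JPEG", exif=kept)
    return out.getvalue()
```

Something equivalent could presumably be done with exiftool in the existing shell pipeline; the point is only that "strip metadata" and "keep the credits" aren't mutually exclusive.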
from wesnoth.
I've got a bit of a learning curve to get through to actually make this happen, but it looks like it should be possible to do this in CI:
- Read new/changed images for specific EXIF tags
- Fail CI if the specific metadata is missing (and maybe there should be a test that there isn't extra data?)
Then as a manually run script (not necessarily all one script):
- Read the existing CSV and update the existing images to have the metadata.
- Read through the images and export the metadata to some human-readable file, whether that's the current CSV or some updated format.
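The export half of that manual script could look something like this. The column names are a guess at a copyrights.csv-style layout, so treat them (and the function name) as placeholders:

```python
import csv
from io import StringIO

def export_metadata(rows: list[dict]) -> str:
    """Serialize per-image metadata dicts as CSV text.

    Each dict is assumed to look like
    {"path": ..., "artist": ..., "license": ...};
    the real copyrights.csv columns may differ.
    """
    out = StringIO()
    writer = csv.DictWriter(out, fieldnames=["path", "artist", "license"])
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return out.getvalue()
```

The import half would do the reverse: read the CSV with `csv.DictReader` and write the fields into each image's EXIF tags.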
from wesnoth.
Well, I'm stuck... Nothing really specific to this image stuff, just git fun and inexperience with GitHub workflows.
I'm trying to get a list of files in a pull request that differ from the merge base - a very basic and common task, one would think. There's an example in the map-diff tool, and a few others on Stack Overflow, but I can't seem to get this. It doesn't help that there's not really a way to test this locally, AFAICT.
All my attempts to get a diff result in either a clean & empty diff or a fatal error.
How do I reference the thing in the blue box (triggered by actions/checkout@v4)?
git merge-base --fork-point sounds promising, but hasn't helped yet.
Any help appreciated. Thanks.
from wesnoth.
> Even if the copyright information is kept in the images themselves, we could still keep a file around with the hashes of all the images files and check against that.
Right, I was trying to say that was part of the deal: there is a bookkeeping file that is human readable, and there are scripts to deal with it. It's just not part of CI; the images and whatever .cfg or .lua references them are all that need be touched in a PR. Storing the image hashes there makes sense, and there is no reason the bookkeeping file can't be automatically updated. Eventually... maybe...
> For the actual question: IIRC it might have something to do with the shallow checkout (--depth=1)? I vaguely remember that being a problem when I tried implementing the current python script, which is why it keeps track of the file hashes to avoid needing to clone the entire repository history since that can take multiple minutes by itself.
I hadn't looked at that, it's something to try, thanks. It's very frustrating that the info is right there, it just can't be touched without a whole lot of BS overhead.
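For what it's worth, assuming the checkout is made with full history (fetch-depth: 0 in actions/checkout, rather than the default shallow clone), the changed-file list can be sketched like this; the branch name and function are illustrative, not the actual workflow:

```python
# Sketch: list files changed in a PR relative to the merge base with the
# target branch. This only works on a full clone; with --depth=1 there is
# no history for `git merge-base` to walk.
import subprocess

def changed_files(base_ref: str, head_ref: str = "HEAD", repo_dir: str = ".") -> list[str]:
    """Return paths changed between merge-base(base_ref, head_ref) and head_ref."""
    merge_base = subprocess.run(
        ["git", "merge-base", base_ref, head_ref],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.strip()
    diff = subprocess.run(
        ["git", "diff", "--name-only", merge_base, head_ref],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return [line for line in diff.stdout.splitlines() if line]
```

The same two git commands could of course live directly in a workflow `run:` step instead of going through Python.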
from wesnoth.
I am a bit worried that this will result in a noticeable size increase of the wesnoth github repo, since it could mean that we will change the image files more often in the future. Unlike text files, git usually cannot handle binary files very well, and afaik any change to them is stored as a full replacement of the image file.
from wesnoth.
> I am a bit worried that this will result in a noticeable size increase of the wesnoth github repo, since it could mean that we will change the image files more often in the future. Unlike text files, git usually cannot handle binary files very well, and afaik any change to them is stored as a full replacement of the image file.
How worried are you, really?
I'm not seeing it, you've got to explain.
from wesnoth.
@gfgtdf I assume that'd be pretty easy to test out?
from wesnoth.
> How worried are you, really?
> I'm not seeing it, you've got to explain.
So I'm not really an expert on the git details, so what I'm saying might be wrong, but I think I remember from an earlier discussion that it's a bad idea to change binary files too often in git repos, because the (git history) compression cannot really store diffs of them, so the git history can become very large if binary files get changed often. (A quick search on the internet gave me contradictory results on this topic, so I actually don't know. Some sources say this only applies to extra-compressed binary files and that the compression part of git doesn't really make a difference between binary and text files, and other pages say git simply cannot do this for any kind of binary file at all.)
I also don't know how often this change will actually change the files (except once, of course, to add the first metadata, which would be fine in any case, I guess). My worries were more about cases where we later decide to add extra data, change the format, add authors, etc.
> @gfgtdf I assume that'd be pretty easy to test out?
It's a bit tricky, actually, since one needs to make sure to compare compressed git history to compressed git history, but surely possible.
from wesnoth.
> It's a bit tricky, actually, since one needs to make sure to compare compressed git history to compressed git history, but surely possible.
I mean that all you'd need to do would be to make a metadata change and then check how much data changed in the diff, wouldn't you? If the diff only says that 50 bytes were added, then surely git doesn't do a full replacement anyway.
from wesnoth.
> If the diff only says that 50 bytes were added, then surely git doesn't do a full replacement anyway.
I doubt you can make such an assumption. It's certainly possible to do a binary diff which works out very similar in functionality to a text diff, but even if only 50 bytes were added, there's a pretty high chance that a lot more bytes were changed. And there's a point where the diff would end up being bigger than the original file. I'm not sure what it would take to hit that point with image or sound files though.
from wesnoth.
Based on https://stackoverflow.com/a/59346690 and https://stackoverflow.com/a/53648035, the answer seems to be:
- initially, binary files are stored as completely separate copies
- after git packs them (either automatically every so often, or after running `git gc`), it will store the changes as a delta between the two binary files
So I don't think ballooning the git history is a reason this can't be done.
from wesnoth.
> I also don't know how often this change will actually change the files (except once, of course, to add the first metadata, which would be fine in any case, I guess). My worries were more about cases where we later decide to add extra data, change the format, add authors, etc.
This was the source of my confusion; I don't understand why we'd suddenly start changing this info frequently, if at all. Moot point, I guess, if what Pentarctagon found is true. BTW, where does one find a repository's resource usage?
If we continue to use copyrights.csv as the readable export list, it would be the author/artist tag and license/copyright that are used, and there's no reason a comment/notes tag can't be added where needed. The date, directory, and hash are for tracking; I guess they could still be added to the CSV, but I don't see a reason to add them to the EXIF tags.
from wesnoth.
What do you mean by a repository's resource usage, specifically?
from wesnoth.
Probably something similar to the total size of the repository history on disk?
from wesnoth.