Comments (22)
I finally got out of that GitHub cathole, and have a mostly working metadata check. (My struggle, which completely trashed the commit history of one of my repositories, seems to have more to do with GitHub webhooks than anything purely git-related.)
- https://github.com/doofus-01/wesnoth-Bad_Moon_Rising/actions/runs/9238915366/job/25417430535 passes because the touched image has an artist tag and an accepted copyright tag.
- https://github.com/doofus-01/wesnoth-Bad_Moon_Rising/actions/runs/9238927894/job/25417459952 fails because the copyright is not an accepted license.
Figuring out how to make a PR for this sounds like another cathole I don't want to squish around in right now, probably best to get something mostly finished first. I'll update the first post when there's progress.
But for now, the specific check I'm looking at is:
- Fail if an image is updated or added without Artist or Copyright EXIF tag
- Fail if an image has both tags, but the copyright isn't an acceptable one. In the checks linked above, the list is
- GNU GPL v2+
- CC BY-SA 4.0
But other licenses should probably be added. Public Domain and CC0 are obvious candidates, but I'm not sure what that means as far as generative AI goes, so I'd rather keep that in a separate issue - one that may need to be resolved before this is mainlined, but still not here.
- Should the Artist tag be checked against some list, so the check fails if there is an unrecognized contributor? I think so, this would be a flag that the credits/about need to be updated, but maybe this can be added later.
- Is there any other tag that should be included?
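The check described above could be sketched as a small pure function. This is illustrative, not the actual CI code; it assumes the EXIF tags have already been extracted into a dict, and the function name is made up:

```python
# Sketch of the proposed check. Input is a dict of already-extracted EXIF
# tags, e.g. {"Artist": "...", "Copyright": "..."}. The license list matches
# the two licenses accepted in the linked CI runs.

ACCEPTED_LICENSES = {"GNU GPL v2+", "CC BY-SA 4.0"}

def check_image_metadata(tags: dict) -> list[str]:
    """Return a list of problems; an empty list means the image passes."""
    problems = []
    if not tags.get("Artist"):
        problems.append("missing Artist tag")
    copyright_tag = tags.get("Copyright")
    if not copyright_tag:
        problems.append("missing Copyright tag")
    elif copyright_tag not in ACCEPTED_LICENSES:
        problems.append(f"unaccepted license: {copyright_tag!r}")
    return problems
```

The CI job would run this over every added/changed image and fail if any image returns a non-empty list.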
from wesnoth.
One problem with this is that some of the tools for keeping image size down will strip EXIF data; I'm not sure if woptipng.py does this, but many similar tools do...
from wesnoth.
Even if the copyright information is kept in the images themselves, we could still keep a file around with the hashes of all the images files and check against that.
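A minimal sketch of that hash-file idea, assuming the bookkeeping file maps image paths to SHA-256 hashes (the manifest format and function names are hypothetical, not the existing script):

```python
import hashlib
from pathlib import Path

def file_hash(path: str) -> str:
    """SHA-256 of a file's contents, hex-encoded."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def check_manifest(manifest: dict[str, str], image_paths: list[str]) -> list[str]:
    """Return the paths whose hash is missing from, or differs from, the manifest."""
    return [p for p in image_paths if manifest.get(p) != file_hash(p)]
```

Anything returned by `check_manifest` is an image that was added or changed without the bookkeeping file being updated.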
For the actual question: IIRC it might have something to do with the shallow checkout (--depth=1)? I vaguely remember that being a problem when I tried implementing the current python script, which is why it keeps track of the file hashes to avoid needing to clone the entire repository history, since that can take multiple minutes by itself.
from wesnoth.
Just like @cooljeanius said, I remember that one of the things that utils/woptipng.py (and probably also utils/optiwebp.py) does to reduce the file size is to remove all that metadata. Most image editors can also edit the EXIF fields, even putting in some default values that can overwrite what's already there.
To me, it looks like there are too many ways to accidentally mess up the EXIF metadata, so we shouldn't rely on them.
from wesnoth.
I could also remove it from the CI check and then update everything en masse as part of the release work for each point release.
from wesnoth.
I'm not convinced by either of those excuses.
- There are ways to mess up anything; how is that a reason? CI fails if there's an extra white space, so it should be able to handle this.
- Removing the 200 bytes or whatever it is of EXIF data is useless hyper-optimization.
from wesnoth.
> I could also remove it from the CI check and then update everything en masse as part of the release work for each point release.
That sounds like it would shift the meta-work to you, and doesn't seem right if there is a better way. Maybe there isn't a better way, but I'd like to try.
A side benefit of using the EXIF data is that the artists, Wesnoth, and CC licenses could get marginally more exposure, as the image is grabbed and reused.
from wesnoth.
> Just like @cooljeanius said, I remember that one of the things that utils/woptipng.py (and probably also utils/optiwebp.py) does to reduce the file size is to remove all those metadata.
Probably using EXIF data in the webp files would be fine, since optiwebp doesn't do anything besides check if converting from png/jpg to webp results in a meaningful size reduction. It doesn't touch existing webp images or do anything else to reduce file size, though I have no idea if EXIF data would survive the conversion to webp for existing images.
from wesnoth.
I think this is definitely worth exploring. Having author and copyright info in the files themselves would mean they are still there even if someone uses them in their own project without copying the copyright.csv entries in some form.
Currently utils/woptipng.py calls `convert -strip`, which removes nearly all metadata. I didn't test utils/optiwebp.py, since you can't even run it on a path you want. cwebp itself appears to only care about metadata when given the `-metadata` option.
So our tooling would certainly need adjusting. Btw, not stripping metadata at all is not a good option, as can be seen in e3ee472 for example, where images grow by thousands of percent in size just because of the added metadata.
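One possible adjustment could be to strip everything except the tags the check cares about. This is a sketch using Pillow rather than the actual woptipng/ImageMagick pipeline, and it assumes a JPEG for simplicity (PNG EXIF support varies by tool); the tag IDs 315 (Artist) and 33432 (Copyright) are the standard EXIF/TIFF values, everything else here is illustrative:

```python
from io import BytesIO
from PIL import Image

# Standard EXIF/TIFF tag IDs: 315 = Artist, 33432 = Copyright.
KEEP_TAGS = (315, 33432)

def strip_except_credits(jpeg_bytes: bytes) -> bytes:
    """Re-save a JPEG, keeping only the Artist and Copyright EXIF tags."""
    img = Image.open(BytesIO(jpeg_bytes))
    old = img.getexif()
    kept = Image.Exif()
    for tag in KEEP_TAGS:
        if tag in old:
            kept[tag] = old[tag]
    out = BytesIO()
    img.save(out, format="JPEG", exif=kept)
    return out.getvalue()
```

Something equivalent could presumably be done with exiftool in the existing shell pipeline; the point is only that "strip metadata" and "keep the credits" aren't mutually exclusive.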
from wesnoth.
I've got a bit of a learning curve to get through to actually make this happen, but it looks like it should be possible to do this in CI:
- Read new/changed images for specific EXIF tags
- Fail CI if the specific metadata is missing (and maybe there should be a test that there isn't extra data?)
Then as a manually run script (not necessarily all one script):
- Read the existing CSV and update the existing images to have the metadata.
- Read through the images and export the metadata to some human-readable file, whether that's the current CSV or some updated format.
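The export half of that manual script could look something like this. The column names are a guess at a copyrights.csv-style layout, so treat them (and the function name) as placeholders:

```python
import csv
from io import StringIO

def export_metadata(rows: list[dict]) -> str:
    """Serialize per-image metadata dicts as CSV text.

    Each dict is assumed to look like
    {"path": ..., "artist": ..., "license": ...};
    the real copyrights.csv columns may differ.
    """
    out = StringIO()
    writer = csv.DictWriter(out, fieldnames=["path", "artist", "license"])
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return out.getvalue()
```

The import half would do the reverse: read the CSV with `csv.DictReader` and write the fields into each image's EXIF tags.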
from wesnoth.
Well, I'm stuck... Nothing really specific to this image stuff, just git fun and inexperience with GitHub workflows.
I'm trying to get a list of files in a pull request that differ from the merge base - a very basic and common task, one would think. There's an example in the map-diff tool, and a few others on Stack Overflow, but I can't seem to get this. It doesn't help that there's not really a way to test this locally, AFAICT.
All my attempts to get a diff result in either a clean & empty diff or a fatal error.
How do I reference the thing in the blue box (triggered by actions/checkout@v4)?
git merge-base --fork-point sounds promising, but hasn't helped yet.
Any help appreciated. Thanks.
from wesnoth.
> Even if the copyright information is kept in the images themselves, we could still keep a file around with the hashes of all the images files and check against that.
Right, I was trying to say that was part of the deal: there is a bookkeeping file that is human readable, and there are scripts to deal with it. It's just not part of CI; the images and whatever .cfg or .lua references them are all that need be touched in a PR. Storing the image hashes there makes sense, and there is no reason the bookkeeping file can't be automatically updated. Eventually... maybe...
> For the actual question: IIRC it might have something to do with the shallow checkout (--depth=1)? I vaguely remember that being a problem when I tried implementing the current python script, which is why it keeps track of the file hashes to avoid needing to clone the entire repository history since that can take multiple minutes by itself.
I hadn't looked at that, it's something to try, thanks. It's very frustrating that the info is right there, it just can't be touched without a whole lot of BS overhead.
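For what it's worth, assuming the checkout is made with full history (fetch-depth: 0 in actions/checkout, rather than the default shallow clone), the changed-file list can be sketched like this; the branch name and function are illustrative, not the actual workflow:

```python
# Sketch: list files changed in a PR relative to the merge base with the
# target branch. This only works on a full clone; with --depth=1 there is
# no history for `git merge-base` to walk.
import subprocess

def changed_files(base_ref: str, head_ref: str = "HEAD", repo_dir: str = ".") -> list[str]:
    """Return paths changed between merge-base(base_ref, head_ref) and head_ref."""
    merge_base = subprocess.run(
        ["git", "merge-base", base_ref, head_ref],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.strip()
    diff = subprocess.run(
        ["git", "diff", "--name-only", merge_base, head_ref],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return [line for line in diff.stdout.splitlines() if line]
```

The same two git commands could of course live directly in a workflow `run:` step instead of going through Python.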
from wesnoth.
I am a bit worried that this will result in a noticeable size increase of the wesnoth github repo, since it could mean that we will change the image files more often in the future. Unlike text files, git usually cannot handle binary files very well, and afaik any change to them is stored as a full replacement of the image file.
from wesnoth.
> I am a bit worried that this will result in a noticeable size increase of the wesnoth github repo, since it could mean that we will change the image files more often in the future. Unlike text files, git usually cannot handle binary files very well, and afaik any change to them is stored as a full replacement of the image file.
How worried are you, really?
I'm not seeing it, you've got to explain.
from wesnoth.
@gfgtdf I assume that'd be pretty easy to test out?
from wesnoth.
> How worried are you, really?
> I'm not seeing it, you've got to explain.
So I'm not really an expert on the git details, so what I'm saying might be wrong, but I think I remember from an earlier discussion that it's a bad idea to change binary files too often in git repos, because the (git history) compression cannot really store diffs of them, so the git history can become very large if binary files get changed often. (A quick search on the internet gave me contradictory results on this topic, so I actually don't know. Some sources say this only applies to extra-compressed binary files and that the compression part of git doesn't really make a difference between binary and text files, and other pages say git simply cannot do this for any kind of binary file at all.)
I also don't know how often this change will actually change the files (except once, of course, to add the first metadata, which would be fine in any case, I guess). My worries were more about cases where we later decide to add extra data, change the format, add authors, etc.
> @gfgtdf I assume that'd be pretty easy to test out?
It's a bit tricky, actually, since one needs to make sure to compare compressed git history to compressed git history, but surely possible.
from wesnoth.
> It's a bit tricky, actually, since one needs to make sure to compare compressed git history to compressed git history, but surely possible.
I mean that all you'd need to do would be to make a metadata change and then check how much data changed in the diff, wouldn't you? If the diff only says that 50 bytes were added, then surely git doesn't do a full replacement anyway.
from wesnoth.
> If the diff only says that 50 bytes were added, then surely git doesn't do a full replacement anyway.
I doubt you can make such an assumption. It's certainly possible to do a binary diff which works out very similar in functionality to a text diff, but even if only 50 bytes were added, there's a pretty high chance that a lot more bytes were changed. And there's a point where the diff would end up being bigger than the original file. I'm not sure what it would take to hit that point with image or sound files though.
from wesnoth.
Based on https://stackoverflow.com/a/59346690 and https://stackoverflow.com/a/53648035, the answer seems to be:
- initially, binary files are stored as completely separate copies
- after git packs them (either automatically every so often, or after running `git gc`), it will store the changes as a delta between the two binary files
So I don't think ballooning the git history is a reason this can't be done.
from wesnoth.
> I also don't know how often this change will actually change the files (except once, of course, to add the first metadata, which would be fine in any case, I guess). My worries were more about cases where we later decide to add extra data, change the format, add authors, etc.
This was the source of my confusion; I don't understand why we'd suddenly start changing this info frequently, if at all. Moot point, I guess, if what Pentarctagon found is true. BTW, where does one find a repository's resource usage?
If we continue to use copyrights.csv as the readable export list, it would be the author/artist tag and license/copyright that are used, and there's no reason a comment/notes tag can't be added where needed. The date, directory, and hash are for tracking; I guess they could still be added to the CSV, but I don't see a reason to add them to the EXIF tags.
from wesnoth.
What do you mean by a repository's resource usage, specifically?
from wesnoth.
Probably something similar to the total size of the repository history on disk?
from wesnoth.