picdedupe's Introduction

WARNING This is very, very early work. It is probably not useful to you at this point.

picdedupe

A command line tool (as well as a re-usable library) to aid in dealing with a large number of incoming pictures (e.g. from a camera or phone), before adding them to your collection.

The Sales Pitch

Imagine you have one or more nicely kept collections of pictures (e.g. your family pics). Now imagine you find an SD card around the house, or you want to back up your mom's iPhone. You might want to add some of these pictures to your collection. But you probably don't want to add all of them. More specifically:

  • You DON'T want exact duplicates.
  • You DON'T want lower quality images of what you have already:
    • JPEG versions (when you already have the HEIC).
    • Lower resolution or cut-down versions.
  • You DON'T want screenshots.
  • You MAY NOT want 10 pictures of the exact same thing, even if they are slightly different.
  • You DON'T want the .mov part of a Live Photo you have removed.

picdedupe helps you with all of these, in a modular, expandable way!

It will try to do The Right Thing™. But you can change this if you don't agree. You can make it ask you for every individual case as well!

picdedupe is fast! On its first run, it needs to index your collection, which might take a while, depending on the size. But after that, it can reuse that index to cut down on redundant steps.

Fixits

picdedupe has a modular architecture. The common problems described above are detected by a Fixit. Each one can offer one or more solutions in the form of a FixitAction. You can easily write these yourself, but picdedupe already comes with a few out of the box:

  • Exact duplicate file

    • default: move the candidate file to ./_dupes, leaving a note
    • or: soft-delete the candidate file to ./_trash
    • or: ignore and do nothing
  • The file date does not match the image date

    • default: update the file date to match
    • or: ignore and do nothing
  • Similar image, worse quality

    • default: move the candidate file to ./_similar, leaving a note
    • or: soft-delete the candidate file to ./_trash
    • or: ignore and do nothing
  • Similar image, better quality

    • default: ignore and do nothing
    • or: add the file into the collection & move the collection file to ./_similar, leaving a note
  • Found the .mov file of a Live Photo (and the .jpg is there)

    • default: Rename the file to .jpg.mov
    • or: move to ./_live so we can double-check if we like them.
    • or: soft-delete the candidate file to ./_trash
    • or: ignore and do nothing
  • Found the .mov file of a Live Photo (and the .jpg is missing!)

    • default: soft-delete the candidate file to ./_trash
    • or: ignore and do nothing
  • Found a .png file (most likely a screenshot)

    • default: soft-delete the candidate file to ./_trash
    • or: ignore and do nothing
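
As a rough illustration of how a Fixit and its FixitActions might fit together, here is a minimal sketch. The class fields, the `apply` method, and the `detect_screenshot`, `soft_delete`, and `ignore` helpers are all assumptions made for this example, not picdedupe's actual API. It models only the ".png file" case from the list above, with the first action treated as the default.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional


@dataclass
class FixitAction:
    """One possible response to a detected problem (hypothetical shape)."""
    name: str
    run: Callable[[Path], None]


@dataclass
class Fixit:
    """A detected problem plus the actions offered for it."""
    description: str
    candidate: Path
    actions: List[FixitAction]  # assumption: actions[0] is the default

    def apply(self, choice: Optional[str] = None) -> str:
        # Look up the requested action by name, or fall back to the default.
        by_name = {action.name: action for action in self.actions}
        action = self.actions[0] if choice is None else by_name[choice]
        action.run(self.candidate)
        return action.name


def soft_delete(path: Path) -> None:
    """Move the file into a ./_trash folder next to it instead of deleting it."""
    trash = path.parent / "_trash"
    trash.mkdir(exist_ok=True)
    path.rename(trash / path.name)


def ignore(path: Path) -> None:
    """Ignore and do nothing."""


def detect_screenshot(candidate: Path) -> Optional[Fixit]:
    """Flag .png files as likely screenshots; return None for anything else."""
    if candidate.suffix.lower() != ".png":
        return None
    return Fixit(
        description="Found a .png file (most likely a screenshot)",
        candidate=candidate,
        actions=[FixitAction("soft-delete", soft_delete), FixitAction("ignore", ignore)],
    )
```

Calling `fixit.apply()` runs the default action, while `fixit.apply("ignore")` picks an alternative by name. A frontend that asks the User for every individual case could simply pass the chosen name through.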

Series & Groups

picdedupe can work on individual files. But it is also aware of files belonging together.

A FileGroup is a collection of files that represent the same image. For example:

  • A .jpg and .mov may form a single Live Photo.
  • A .jpg and its RAW should probably stay together.
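
One simple way to find such groups is to collect files that share a folder and a base name and differ only in extension. The sketch below assumes exactly that rule. The `group_by_stem` name is made up for this example, and the real FileGroup logic may well use more signals (metadata, for instance).

```python
from collections import defaultdict
from pathlib import Path


def group_by_stem(paths):
    """Group files that share a folder and a base name (hypothetical rule)."""
    groups = defaultdict(list)
    for raw in paths:
        path = Path(raw)
        # Compare base names case-insensitively: IMG_0001.JPG and img_0001.mov match.
        groups[(path.parent, path.stem.lower())].append(path)
    # Return the groups in a stable order, each with its members sorted.
    return [sorted(members) for _, members in sorted(groups.items())]


groups = group_by_stem(["IMG_0001.jpg", "IMG_0001.mov", "IMG_0002.jpg"])
```

Here the first two files end up in one group and `IMG_0002.jpg` in a group of its own.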

A FileSeries is a collection of FileGroups that were taken around the same time, at the same location, by the same camera. The file numbering cannot have gaps.

Series allow picdedupe to make bigger decisions. For example, if it finds exact dupes of multiple files in the same FileSeries, it might assume that all files in it were already considered. And so it might consider the entire FileSeries as a dupe (even though not all the files can be found in the collection), saving you a lot of time!
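
The sketch below illustrates only two parts of this idea: splitting numbered file names into runs without gaps, and guessing that a whole run was already considered once several of its members are known exact dupes. It ignores time, location, and camera entirely. The function names and the `min_hits=2` threshold are assumptions for this example, not picdedupe's actual behavior.

```python
import re


def split_into_series(names):
    """Split file names into runs whose trailing numbers have no gaps.

    Names without a number directly before the extension are ignored.
    """
    numbered = []
    for name in names:
        match = re.search(r"(\d+)(?=\.[^.]+$)", name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort()

    series, current, previous = [], [], None
    for number, name in numbered:
        # A jump of more than one in the numbering starts a new series.
        if previous is not None and number - previous > 1:
            series.append(current)
            current = []
        current.append(name)
        previous = number
    if current:
        series.append(current)
    return series


def series_probably_considered(series, known_dupes, min_hits=2):
    """Guess that a series was already reviewed if several members are exact dupes."""
    hits = sum(1 for name in series if name in known_dupes)
    return hits >= min_hits
```

A decision like this should stay a suggestion: a series flagged this way still contains files that are not in the collection, so the User should be able to override it.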

Groups allow for similar operations across the group. Examples:

  • Rename the .mov part of a Live Photo to .jpg.mov
  • Rename a .jpg version of an existing HEIC to .heic.jpg
  • Treat them as one when renaming, moving or deleting.

Configurable & Modular & Expandable

  • YOU set the default actions for all encountered scenarios.
  • YOU can write your own custom Fixit
  • YOU can write your own custom FixitAction
  • YOU can write another (GUI?) frontend
  • etc...

And if you do, feel free to add them into this project for everyone to use!

WARNING: THIS IS NOT READY TO BE USED BY ANYONE.

WARNING: USE THIS AT YOUR OWN RISK (OF LOSING YOUR PICTURES)! MAKE BACKUPS!

picdedupe's People

Contributors

superkoder

picdedupe's Issues

Feature: Auto-Correct Wrong File Dates

When we have high-quality metadata that clearly indicates that an image was captured on a different date than what the filesystem says, we should be able to correct the filesystem's ctime/mtime.

We should still add a {filename}.txt so the User is aware of our change.
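
A minimal sketch of this step, assuming the capture time has already been extracted from the metadata and is passed in as a `datetime`. The `correct_file_date` name, the one-second tolerance, and the choice to name the note `<file name>.txt` are assumptions for this example. Note that Python's `os.utime` sets only the access and modification times. It does not set ctime.

```python
import os
from datetime import datetime
from pathlib import Path


def correct_file_date(path, captured_at, tolerance_seconds=1.0):
    """Set the file's mtime to the capture time and leave a note next to it.

    Returns True if the date was changed, False if it already matched.
    """
    path = Path(path)
    stat = path.stat()
    target = captured_at.timestamp()
    if abs(stat.st_mtime - target) <= tolerance_seconds:
        return False

    # Keep the access time as it was; only the modification time changes.
    os.utime(path, (stat.st_atime, target))

    old = datetime.fromtimestamp(stat.st_mtime).isoformat()
    note = path.with_name(path.name + ".txt")
    note.write_text(
        f"picdedupe changed the modification time of {path.name}\n"
        f"from {old}\n"
        f"to {captured_at.isoformat()}\n",
        encoding="utf-8",
    )
    return True
```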

Implement "Live Photo" Steps

If a short movie of a picture (e.g. Apple's Live Photo) is detected, it should take the right steps:

  1. Name the file appropriately, and move it to the same location as the original.
  2. The picture should be (re)named filename.jpg (or similar)
  3. The short movie should be (re)named filename.jpg.mov (or similar)
  4. Locally, leave a {filename}.txt with info on what it did and where, for the User's verification.

As always, it is up to the User to verify this and to only do this when backups have been made.

Implement "Exact Dupe" steps

It can currently detect file dupes. But nothing is done yet, apart from printing that info.

It should probably:

  1. Double-check if it is indeed a dupe (rather than trusting MD5)
  2. Move the candidate file into a local ./_DUPES subfolder.
  3. Add a {filename}.txt with info on why it is considered a dupe.

It will be the User's task to check this folder and, eventually, remove it.
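
The three steps could look roughly like the sketch below. It uses the standard library's `filecmp.cmp` with `shallow=False` for the byte-by-byte double-check. The `quarantine_exact_dupe` name, leaving the note in the candidate's original folder, and refusing to overwrite an existing file in `./_DUPES` are assumptions for this example.

```python
import filecmp
import shutil
from pathlib import Path


def quarantine_exact_dupe(candidate, original, dupes_dirname="_DUPES"):
    """Move `candidate` into a local dupes folder if it is byte-identical to `original`.

    Returns the new path, or None if the files differ after all.
    """
    candidate, original = Path(candidate), Path(original)

    # Step 1: compare the actual contents rather than trusting a hash match.
    if not filecmp.cmp(candidate, original, shallow=False):
        return None

    # Step 2: move the candidate into a subfolder next to where it was found.
    dupes_dir = candidate.parent / dupes_dirname
    dupes_dir.mkdir(exist_ok=True)
    target = dupes_dir / candidate.name
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.move(str(candidate), str(target))

    # Step 3: leave a note explaining why the file was moved.
    note = candidate.parent / (candidate.name + ".txt")
    note.write_text(
        f"{candidate.name} was moved to {target}\n"
        f"because it is byte-identical to {original}\n",
        encoding="utf-8",
    )
    return target
```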

Implement "Higher Resolution Version" Steps

If a higher quality version of the same image is found, it should take the right steps:

  1. Name the file appropriately, and move it to the same location as the original.
  2. Rename the original to reflect that it is a derivative (e.g. filename_lowres.jpg or filename_small.jpg).
  3. Locally, leave a {filename}.txt with info on what it did and where, for the User's verification.

As always, it is up to the User to verify this and to only do this when backups have been made.

Implement "Better File Type Version" steps

If a better file format of the same image is detected (e.g. an HEIC of an existing JPEG), it should take steps:

  1. Name the file appropriately, and move it to the same location as the original.
  2. Rename the original to reflect that it is a derivative (e.g. filename.heic.jpg).
  3. Locally, leave a {filename}.txt with info on what it did and where, for the User's verification.

As always, it is up to the User to verify this and to only do this when backups have been made.

Resilience Against Weak Metadata

In today's version, we find a lot of false positives for similar pictures. That is because we compare metadata, and for some pictures, there is hardly any metadata to work with. It could be that pictures from the same day, with the same resolution, all map to each other.

We should be resilient against this, by not trying to do anything when the metadata is simply too weak to be conclusive. We should add a {filename}.txt file to explain the problem, though. Just so that the User is aware and can take it into account.
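
One possible guard, sketched below, is to refuse to decide unless both pictures share a minimum number of populated metadata fields. The field names, the `compare_metadata` function, and the threshold of three shared fields are all assumptions for this example. The metadata picdedupe actually compares is not specified here.

```python
# Hypothetical field names; the real metadata keys may differ.
IDENTIFYING_FIELDS = ("capture_time", "camera_model", "width", "height", "gps")


def compare_metadata(a, b, min_shared_fields=3):
    """Return 'similar', 'different', or 'inconclusive' for two metadata dicts."""
    shared = [
        field
        for field in IDENTIFYING_FIELDS
        if a.get(field) is not None and b.get(field) is not None
    ]
    # Too little evidence: do nothing rather than risk a false positive.
    if len(shared) < min_shared_fields:
        return "inconclusive"
    if all(a[field] == b[field] for field in shared):
        return "similar"
    return "different"


weak = {"width": 4032, "height": 3024}
verdict = compare_metadata(weak, dict(weak))
```

With only the resolution available, `verdict` is `"inconclusive"`, so two unrelated pictures of the same size would no longer map to each other.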

Detect RAWs of pictures

We are currently only looking for the most popular picture & video formats. We should expand this to include RAW files.

If we are certain about a RAW, we should take the right steps:

  1. Name the file appropriately, e.g. filename.jpg.CR2, and move it to the same location as the original. Alternatively, we might want to put all RAWs into a ./_RAW subfolder?
  2. Locally, leave a {filename}.txt with info on what it did and where, for the User's verification.

As always, it is up to the User to verify this and to only do this when backups have been made.

Resilience Against Not Running on Mac

This is made for macOS specifically (it uses the Spotlight commands). So it needs to check that it is, in fact, running on macOS to prevent terrible things from happening.
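
A start-up guard along these lines could do the check. This is a sketch: the `ensure_macos` name is made up, and the assumption that `mdls` and `mdfind` are the Spotlight commands in use is a guess, since the commands are not named here. The `platform_name` and `which` parameters exist only so the check can be exercised on any machine.

```python
import shutil
import sys

# Assumption: these are the Spotlight command line tools the program relies on.
REQUIRED_COMMANDS = ("mdls", "mdfind")


def ensure_macos(platform_name=sys.platform, which=shutil.which):
    """Raise RuntimeError unless running on macOS with the Spotlight tools available."""
    if platform_name != "darwin":
        raise RuntimeError(
            f"picdedupe only supports macOS, but this platform is {platform_name!r}"
        )
    for command in REQUIRED_COMMANDS:
        if which(command) is None:
            raise RuntimeError(f"required command {command!r} was not found")
```

Calling `ensure_macos()` before touching any file means the program stops with a clear message instead of running with missing tools.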
