Soph

This is a simple utility to import pictures while handling duplicates gracefully.

Note that this is not yet stable. Collections created by the current version will almost certainly not work with future versions, due to differences in hash formatting (until I’ve added version migration support). This isn’t a big problem however, because it’s possible to just reimport all pictures.

Usage

To import all pictures from directory imports to collection:

$ soph imports collection

If similar images have been found after processing, a feh window will open with all of them (the new picture and the similar ones in the collection). Use the arrow keys to scroll through them and delete the one(s) you don’t want by pressing <Enter>, then quit with q.

Importing means: Copy the file into collection while giving it a hash-based filename. A simple directory listing of collection is then enough to recover the hashes of every imported image. So in a way the filenames act as a database.
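To illustrate the "filenames as a database" idea, here is a minimal sketch of decoding such a filename. It only assumes what the example output shows: two runs of hex digits separated by a dash, followed by an extension. The type HashedName, its field names and the function parseHashedName are hypothetical and not taken from soph's source, and the README does not say what each of the two fields encodes.

```haskell
module Main where

import Data.Char (isHexDigit)

-- | The two hex fields of a collection filename plus its extension.
-- The meaning of each field is not stated in the README, so the
-- names here are deliberately neutral (and hypothetical).
data HashedName = HashedName
  { firstField  :: String  -- hex digits before the dash
  , secondField :: String  -- hex digits between the dash and the dot
  , extension   :: String  -- extension without the dot, e.g. "png"
  } deriving (Show, Eq)

-- | Parse names of the form "<hex>-<hex>.<ext>".
-- Anything that does not match yields Nothing.
parseHashedName :: FilePath -> Maybe HashedName
parseHashedName name =
  case break (== '-') name of
    (a, '-' : rest) ->
      case break (== '.') rest of
        (b, '.' : ext)
          | valid a, valid b, not (null ext) -> Just (HashedName a b ext)
        _ -> Nothing
    _ -> Nothing
  where
    valid s = not (null s) && all isHexDigit s

main :: IO ()
main = do
  print (parseHashedName "dea63899c760cc0b-f9f81f001c3980f81fc1fc07e0ce0ce0cecc.png")
  print (parseHashedName "notes.txt")
```

Mapping a parser like this over a directory listing would rebuild the in-memory state without any separate database file.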

Here is an example run with three files to import: one new image, one similar to an image already in the collection, and one exact duplicate.

$ soph imports collection
[Info#init] Reading hashdir, decoding filenames and initializing database
[Info#process] Starting image processing of 3 files in import directory
[Info#process] New image at imports/002.jpg: 1 similar image(s) found
[Info#process] New image at imports/001.png: Already present as collection/b0ec08147fc1b495-0e02fe1b61760fa06703f87e8388780b01ff.png, removing the import file
[Info#process] New image at imports/003.png: New image, importing it
[Info#process] Importing new file imports/003.png into library to path collection/dea63899c760cc0b-f9f81f001c3980f81fc1fc07e0ce0ce0cecc.png
[Info#similars] Processing 1 similar images
[Info#process] New image at imports/002.jpg: 1 similar image(s) found
[Info#similar] File imports/002.jpg to import has 1 similar images:
[Info#similar]   collection/3cc098ca1efed3c5-18f307b231870071ff0f20f17f01fb01f00f.png
[Info#similar]   Opening them and the one to import in feh, delete the ones you don't want with <Enter>, then quit feh with <q>
[Info#process] Importing new file imports/002.jpg into library to path collection/6d384ae4863fc970-18f307b233870071df0f20f17f01fb01f00f.jpg
[Info] Finished import of 2 images

Note: The output is a bit misleading, since it reports the same picture twice. This is because soph makes two passes. The first pass goes through all images without asking the user anything and simply skips the ones that have similar pictures. The second pass goes through the skipped pictures, this time asking the user and taking the appropriate action. This is really desirable on large imports, since all questions come at the end. Output will be made cleaner in future versions.

How it works

The above command will do the following:

  1. Do a file listing of collection to know previously imported images
  2. Process every picture in imports (recursively)
  3. Import every picture in imports into collection
    • If the new picture is already present (same content hash), delete it
    • If the new picture is not yet present, import it
    • If the new picture has similar images present, open all of them in feh, allowing you to delete the ones you don’t want. If after that the new one wasn’t deleted, import it.
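The three cases in step 3 can be sketched as a pure decision function. This is only an illustration under stated assumptions: the README does not say how similarity is measured, so the sketch assumes a 64-bit perceptual hash compared by Hamming distance against a threshold. The names Fingerprint, Decision, hammingDistance and decide are hypothetical, not soph's actual API.

```haskell
module Main where

import Data.Bits (popCount, xor)
import Data.Word (Word64)

-- | Hypothetical fingerprint: an exact content hash plus a
-- 64-bit perceptual hash (the latter is an assumption).
data Fingerprint = Fingerprint
  { contentHash    :: String
  , perceptualHash :: Word64
  } deriving (Show, Eq)

data Decision
  = AlreadyPresent FilePath  -- same content hash: delete the import file
  | HasSimilar [FilePath]    -- let the user choose in feh
  | BrandNew                 -- import it
  deriving (Show, Eq)

-- | Number of differing bits between two perceptual hashes.
hammingDistance :: Word64 -> Word64 -> Int
hammingDistance a b = popCount (a `xor` b)

-- | Decide what to do with a new picture, given the collection.
-- An exact match takes priority over similarity.
decide :: Int -> [(FilePath, Fingerprint)] -> Fingerprint -> Decision
decide threshold collection new =
  case [p | (p, f) <- collection, contentHash f == contentHash new] of
    (p : _) -> AlreadyPresent p
    []      ->
      case [ p | (p, f) <- collection
               , hammingDistance (perceptualHash f) (perceptualHash new) <= threshold ] of
        [] -> BrandNew
        ps -> HasSimilar ps

main :: IO ()
main = do
  let collection =
        [ ("collection/a.png", Fingerprint "aa" 0xFF00)
        , ("collection/b.png", Fingerprint "bb" 0x0F0F)
        ]
  print (decide 4 collection (Fingerprint "aa" 0))
  print (decide 4 collection (Fingerprint "cc" 0xFF01))
  print (decide 4 collection (Fingerprint "dd" 0xFFFFFFFF00000000))
```

Keeping the decision pure leaves the side effects (copying, deleting, opening feh) to a separate step.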

Installing

Nix (recommended)

The preferred way to install is with Nix:

$ nix-env -if .

Most derivations are cached by cache.nixos.org so it won’t take too long to build.

This also automatically makes sure feh is available.

Stack

If you have Stack but not Nix, you can (probably) install it with:

$ stack install

You also need to install feh.

soph's People

Contributors

infinisil

soph's Issues

Find better name

The initial version was a terminal where you could enter a hash and it would search for it, which is why I named it hashsearch. That has since changed, and it's now a smart deduplicating importer.

Potential candidates:

  • ddimp (DeDuplicating IMPorter)
  • himp (Haskell IMPorter)
  • infimp (INFinisil IMPorter)

They all kinda suck

Progress information

The logs currently only output which item is being processed right now. It would be nice to see the progress too: how many items are done and how many are left.
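One possible shape for such a progress line, as a rough sketch. The "[Info#process]" prefix is copied from the existing output; the "(done/total, n left)" part and the function name progressLine are made up for illustration.

```haskell
module Main where

import Data.Foldable (forM_)

-- | Hypothetical log line with a progress counter.
progressLine :: Int -> Int -> FilePath -> String
progressLine done total path =
  "[Info#process] (" ++ show done ++ "/" ++ show total ++ ", "
    ++ show (total - done) ++ " left) " ++ path

main :: IO ()
main = do
  let files = ["imports/001.png", "imports/002.jpg", "imports/003.png"]
      total = length files
  -- Number the files from 1 and print one line per file.
  forM_ (zip [1 ..] files) $ \(i, f) ->
    putStrLn (progressLine i total f)
```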

Other set operations

Picture collections can be viewed as a set of distinct images. What the current version does (deduplicating import from the import directory into the hash directory) is just a combination of reading a list of pictures into a set, and doing the union between two sets. It should be possible to implement other set-like operations like intersection, difference.

  • Intersection corresponds to "having these two image collections, which ones are present in both of them?"
  • Difference corresponds to "Which pictures are in one but not in the other collection?"

Both of those are useful from a practical standpoint.
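A small sketch of the idea, assuming each collection is represented by the set of its content hashes (here just strings). It uses Data.Set from the containers package; the names Collection, merged, inBoth and onlyInFirst are hypothetical.

```haskell
module Main where

import qualified Data.Set as Set

-- | A collection viewed as a set of content hashes (an assumption;
-- real fingerprints would also need a notion of similarity).
type Collection = Set.Set String

-- | What the current import does: union of both collections.
merged :: Collection -> Collection -> Collection
merged = Set.union

-- | Pictures present in both collections.
inBoth :: Collection -> Collection -> Collection
inBoth = Set.intersection

-- | Pictures in the first collection but not in the second.
onlyInFirst :: Collection -> Collection -> Collection
onlyInFirst = Set.difference

main :: IO ()
main = do
  let a = Set.fromList ["h1", "h2", "h3"]
      b = Set.fromList ["h2", "h3", "h4"]
  print (Set.toList (merged a b))
  print (Set.toList (inBoth a b))
  print (Set.toList (onlyInFirst a b))
```

This only covers exact duplicates. Handling similar images would need a different membership test than plain equality.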

Output summary of actions

After execution, soph should output a summary of all actions taken.

This should contain information such as which pictures were duplicates of each other, how long it took, how many errors were encountered, how many similar images were found, what files have been moved where.
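A rough sketch of how such counters could be accumulated, assuming each processed file yields one outcome. The Summary and Outcome types and their fields are hypothetical. Timing and the list of moved files are left out to keep the sketch short.

```haskell
module Main where

-- | Hypothetical counters for one import run.
data Summary = Summary
  { importedCount  :: Int
  , duplicateCount :: Int
  , similarCount   :: Int
  , errorCount     :: Int
  } deriving (Show, Eq)

-- | Two summaries combine by adding their counters.
instance Semigroup Summary where
  Summary a b c d <> Summary a' b' c' d' =
    Summary (a + a') (b + b') (c + c') (d + d')

instance Monoid Summary where
  mempty = Summary 0 0 0 0

-- | Hypothetical per-file result.
data Outcome = Imported | Duplicate | Similar | Failed
  deriving (Show, Eq)

fromOutcome :: Outcome -> Summary
fromOutcome Imported  = mempty { importedCount = 1 }
fromOutcome Duplicate = mempty { duplicateCount = 1 }
fromOutcome Similar   = mempty { similarCount = 1 }
fromOutcome Failed    = mempty { errorCount = 1 }

-- | Fold all per-file outcomes into one summary.
summarize :: [Outcome] -> Summary
summarize = foldMap fromOutcome

main :: IO ()
main = print (summarize [Imported, Duplicate, Imported, Similar])
```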

Support deferred action modes

Deferred action should first run all calculations without moving any files and then when done, the user can decide whether to execute the moves.
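One way to picture this, as a sketch only: the calculation phase produces a list of planned actions as plain data, and nothing touches the file system until the user confirms. The Action type, describe and runPlan are hypothetical names. The demo passes in a function that only prints, so running it moves no files.

```haskell
module Main where

-- | A planned file operation, recorded but not yet performed.
data Action
  = CopyInto FilePath FilePath  -- source, destination
  | DeleteImport FilePath       -- import file that is already present
  deriving (Show, Eq)

-- | Human-readable form of an action, for showing the plan.
describe :: Action -> String
describe (CopyInto src dst) = "copy " ++ src ++ " -> " ++ dst
describe (DeleteImport p)   = "delete " ++ p

-- | Perform the plan only if the user confirmed. Returns the number
-- of actions performed. The function that performs a single action
-- is passed in, so the plan itself stays pure data.
runPlan :: Monad m => (Action -> m ()) -> Bool -> [Action] -> m Int
runPlan perform confirmed plan
  | confirmed = mapM_ perform plan >> pure (length plan)
  | otherwise = pure 0

main :: IO ()
main = do
  let plan =
        [ CopyInto "imports/003.png" "collection/example.png"  -- made-up destination
        , DeleteImport "imports/001.png"
        ]
  putStrLn "Planned actions:"
  mapM_ (putStrLn . ("  " ++) . describe) plan
  -- A real tool would ask the user here; the demo just confirms.
  n <- runPlan (\a -> putStrLn ("would run: " ++ describe a)) True plan
  putStrLn (show n ++ " action(s) executed")
```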

Make nix-build'able without any -A

Currently default.nix evaluates to a Haskell package set, so you need nix-build -A hashsearch to build only this package. It would be nice if a plain nix-build built it.

Best done via passthru.
