Semantic photos

Make your photo albums semantically searchable. Scene captioning, combined with tagged people and locations, makes your albums work better for you and helps you recall memories faster and more naturally.

Photo searching is typically restricted to tag filtering, or to keyword searching when images have manually added description notes. This app aims to make photo libraries easier and more natural to search by creating a search index based on

  • image-to-text auto-captioning
  • reverse geocoding to add place name descriptions
  • descriptions of named people in the image
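As a rough illustration of how these three text sources could be merged into one searchable document per photo, here is a minimal sketch. `PhotoDescription` and `to_search_text` are hypothetical names invented for this example, not part of the `semantic_photos` package, and the real index may combine the fields differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhotoDescription:
    """Hypothetical container for the three text sources listed above."""
    caption: str
    geo_description: Optional[str] = None
    people_description: Optional[str] = None


def to_search_text(photo: PhotoDescription) -> str:
    """Join whichever descriptions are present into one searchable string."""
    parts = [photo.caption, photo.geo_description, photo.people_description]
    return " ".join(p.strip() for p in parts if p and p.strip())


photo = PhotoDescription(
    caption="three people swimming in a pool",
    geo_description="Taken near Pacific Beach, California, United States.",
    people_description="Alice and Bob are in the photo.",
)
print(to_search_text(photo))
```

Missing fields are skipped, so a photo with no GPS data or tagged people is indexed on its caption alone.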

This app leverages your existing photo libraries and creates a lightweight database to run searches without the need to copy, move or modify your original images. It currently supports JPEG, PNG, and HEIC image types.

Getting started

Clone the repository

git clone https://github.com/brainsqueeze/semantic-photos.git

Photo search databases can be built automatically using the builder. CLI options include

  -h, --help            show this help message and exit
  --geonames_user GEONAMES_USER
                        Username for Geonames API (default: None)
  --type {MACOS_PHOTO_LIBRARY,DIGIKAM_PHOTO_LIBRARY}
  --photo_lib_path PHOTO_LIB_PATH
                        Absolute path to the photo library to process (default: None)
  --chroma_path CHROMA_PATH
                        Override the path to the ChromaDB database (default: None)
  --album ALBUM         Album name to process (default: None)
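For readers who want to see how such an interface fits together, the following is a sketch of an `argparse` parser that would accept the same flags. It is a reconstruction from the help text above, not the project's actual source. In particular, treating repeated `--album` flags as an accumulating list (`action="append"`) is an assumption.

```python
import argparse

LIBRARY_TYPES = ("MACOS_PHOTO_LIBRARY", "DIGIKAM_PHOTO_LIBRARY")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented CLI options."""
    parser = argparse.ArgumentParser(
        prog="semantic_photos.build",
        # Appends "(default: ...)" to every option that has help text.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--geonames_user", help="Username for Geonames API")
    parser.add_argument("--type", choices=LIBRARY_TYPES)
    parser.add_argument("--photo_lib_path", help="Absolute path to the photo library to process")
    parser.add_argument("--chroma_path", help="Override the path to the ChromaDB database")
    # Assumption: each repeated --album flag adds one name to a list.
    parser.add_argument("--album", action="append", help="Album name to process")
    return parser


args = build_parser().parse_args(
    ["--type", "DIGIKAM_PHOTO_LIBRARY", "--album", "album_1", "--album", "album_2"]
)
print(args.type, args.album)
```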

An example build looks like

python -m semantic_photos.build \
    --type MACOS_PHOTO_LIBRARY \
    --photo_lib_path "$HOME/Pictures/Photos Library.photoslibrary" \
    --chroma_path ./database \
    --album album_1 \
    --album album_2 \
    --album album_3 \
    ...

This project uses the Geonames API to do reverse geocoding on lat/lon coordinates. You will need to create an account and use your username to authenticate with the service. Once you have a username you can pass it to the build tool as a CLI parameter like --geonames_user=<user_name>. The username can also be set as an environment variable:

export GEONAMES_USERNAME=<your_username>
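To show where the username ends up, here is a minimal sketch of building a reverse-geocoding request against the GeoNames `findNearbyPlaceNameJSON` web service. The helper name `nearby_place_url` and the sample coordinates are made up for illustration. How `GeonamesReverseGeocoder` issues its requests internally is not documented here, so this may differ from what the package does.

```python
from urllib.parse import urlencode

GEONAMES_ENDPOINT = "http://api.geonames.org/findNearbyPlaceNameJSON"


def nearby_place_url(latitude: float, longitude: float, username: str) -> str:
    """Build the request URL for a nearby-place lookup (no network call is made)."""
    if not username:
        raise ValueError("A GeoNames username is required")
    query = urlencode({"lat": latitude, "lng": longitude, "username": username})
    return f"{GEONAMES_ENDPOINT}?{query}"


# "demo_user" is a placeholder; in practice read os.getenv("GEONAMES_USERNAME").
url = nearby_place_url(32.7978, -117.2403, "demo_user")
print(url)
```

The JSON response carries a top-level `geonames` list, which is the key the build script reads with `geos.get("geonames", [])`.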

Optionally set a path to cache the transformer models, image-to-text models, and ChromaDB files

export MODEL_CACHE_DIR=/cache_dir/<some_path>

Building a search database on your own photos

import os

from tqdm import tqdm

from semantic_photos.galleries.database import DigikamReader
from semantic_photos.geographies import GeonamesReverseGeocoder
from semantic_photos.models.caption import ImageCaption
from semantic_photos.models.documents import ImageVectorStore
from semantic_photos.models.schema import ImageData
from semantic_photos.utils import describe_people_in_scene, describe_geo_location

captioner = ImageCaption(device="cpu")  # if CUDA is available then set `device="cuda"`
rev_geo_coder = GeonamesReverseGeocoder(geonames_user=os.getenv("GEONAMES_USERNAME"))
docs = ImageVectorStore(os.environ["MODEL_CACHE_DIR"])

album_names = [...]

metadata = []
with DigikamReader(path="/absolute/path/to/photo/library") as db:
    albums = db.albums
    for album in album_names:

        for record in tqdm(
            db.stream_media_from_album(album_id=albums[album]["album_id"]),
            total=albums[album]["count"],
            desc=f"Loading {album}"
        ):
            meta = albums[record.relative_path]
            # Resolve the full image path once and reuse it for both the record and the captioner
            image_path = os.path.join(meta["path"], record.image_file_name)
            img_data = ImageData(
                path=image_path,
                album_name=meta["name"],
                file_name=record.image_file_name,
                created=record.creation_date,
                caption=captioner.caption(image_path)[0]["generated_text"],
            )

            if record.people_names:
                img_data.people_description = describe_people_in_scene(record.people_names.split(','))
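            # Compare against None so that a valid 0.0 latitude or longitude is not skipped
            if record.lat is not None and record.lon is not None: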
                geos = rev_geo_coder.find_nearby_place_name(
                    latitude=record.lat,
                    longitude=record.lon
                )
                img_data.geo_description = describe_geo_location(geos.get("geonames", []))

            metadata.append(img_data)

            if len(metadata) > 256:
                docs.add_images(metadata)
                metadata.clear()
        
        if len(metadata) > 0:
            docs.add_images(metadata)
            metadata.clear()

rev_geo_coder.teardown()
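The script above writes to the vector store in batches rather than one image at a time. That pattern can be factored into a small generator, sketched below. `batched` is a hypothetical helper, not something `semantic_photos` provides, and it flushes at exactly `batch_size` items whereas the script flushes once the list exceeds 256.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` items, including the final partial batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# With 600 records and a batch size of 256, the last batch holds the remainder.
sizes = [len(batch) for batch in batched(range(600), 256)]
print(sizes)
```

In the build loop, each yielded batch would be handed to `docs.add_images(...)`, which removes the need for the separate flush of leftovers.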

This will create a ChromaDB database within the directory set by MODEL_CACHE_DIR. The database can be queried like

docs.query("Three people wearing sunglasses swimming in a pool near Pacific Beach")
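To give an intuition for why a free-text query like this can match a photo, here is a toy ranking example. It is only a stand-in: it scores documents by bag-of-words cosine similarity using the standard library, whereas the real store relies on transformer embeddings in ChromaDB. All function names and sample documents are invented for this sketch.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, documents: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    query_vector = bag_of_words(query)
    scored = [(name, cosine(query_vector, bag_of_words(text))) for name, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


documents = {
    "IMG_0001.jpg": "three people swimming in a pool near Pacific Beach",
    "IMG_0002.jpg": "a dog running on a snowy trail near Lake Tahoe",
}
results = rank("people wearing sunglasses in a pool", documents)
print(results[0][0])
```

Embedding-based search goes further than this word overlap, since it can also match paraphrases that share no tokens with the stored description.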

Searching

To start the search app simply run

python -m semantic_photos.app

This will start a Gradio app server at http://localhost:7860. This looks for a ChromaDB database located at MODEL_CACHE_DIR. You can override this location if your Chroma file is located elsewhere by running

python -m semantic_photos.app --chroma_path=<absolute_path>
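The lookup order described above (an explicit `--chroma_path` first, then `MODEL_CACHE_DIR`) could be expressed as follows. This is a sketch under that assumption. `resolve_chroma_path` is a hypothetical helper, and the app's actual resolution logic may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_chroma_path(cli_value: Optional[str], env: Mapping[str, str] = os.environ) -> Path:
    """Return the database location, preferring the CLI override over the environment."""
    if cli_value:
        return Path(cli_value)
    cache_dir = env.get("MODEL_CACHE_DIR")
    if cache_dir:
        return Path(cache_dir)
    raise ValueError("Set MODEL_CACHE_DIR or pass --chroma_path")


# A plain dict stands in for the environment so the example is reproducible.
fake_env = {"MODEL_CACHE_DIR": "/cache_dir/models"}
print(resolve_chroma_path(None, fake_env))
print(resolve_chroma_path("/data/chroma", fake_env))
```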

Limitations

Currently this works with macOS Photos and digiKam photo libraries. If you would like to enable support for another library type, please open a pull request.
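As a starting point for such a contribution, the sketch below shows the shape a new reader might take, inferred only from how `DigikamReader` is used in the build script: a context manager exposing `albums` and `stream_media_from_album`. `MediaRecord` and `InMemoryReader` are hypothetical, and the project's real base classes and field names may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class MediaRecord:
    """Hypothetical record carrying the fields the build loop reads."""
    relative_path: str
    image_file_name: str
    creation_date: str
    people_names: Optional[str] = None
    lat: Optional[float] = None
    lon: Optional[float] = None


class InMemoryReader:
    """Toy reader that serves records from a dict instead of a library database."""

    def __init__(self, records_by_album: Dict[str, List[MediaRecord]]):
        self._records = records_by_album

    def __enter__(self) -> "InMemoryReader":
        return self

    def __exit__(self, exc_type, exc, traceback) -> None:
        # A real reader would close its database connection here.
        return None

    @property
    def albums(self) -> Dict[str, dict]:
        """Map album name to its id, display name, and image count."""
        return {
            name: {"album_id": name, "name": name, "count": len(records)}
            for name, records in self._records.items()
        }

    def stream_media_from_album(self, album_id: str) -> Iterator[MediaRecord]:
        """Yield the records of one album lazily."""
        yield from self._records[album_id]


with InMemoryReader({"trip": [MediaRecord("/trip", "a.jpg", "2023-07-01")]}) as db:
    album_id = db.albums["trip"]["album_id"]
    names = [record.image_file_name for record in db.stream_media_from_album(album_id)]
print(names)
```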

