
Comments (3)

afong3 commented on August 13, 2024

@gustavo-alberto

Interesting idea with MongoDB - I'm curious how it'll turn out.

I went ahead with np.memmap and was able to stitch 29 GB of overlapping images (2,500 TIFF images) and write a 15 GB file. The resulting image has a shape of (62328, 79244, 3). My machine has a 13th Gen i9 processor and 16 GB of RAM. Alignment took approximately 18 minutes in a Jupyter notebook.

I edited the base Tile class, which I would need to make more elegant before sending a PR. I also changed some of the logic where np.memmap requires it, such as saving the file in chunks. I won't have time in the next few days to clean up the code for a PR, but I likely will eventually.
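As a rough illustration of saving by chunks, here is a minimal sketch that fills a disk-backed array one band of rows at a time. It is not the code from the modified Tile class; `write_mosaic_in_chunks`, `demo_chunk`, and the tiny demo shape are all hypothetical, and the output is a raw array file rather than a TIFF.

```python
import os
import tempfile

import numpy as np


def write_mosaic_in_chunks(path, shape, chunk_rows, fill_chunk, dtype=np.uint8):
    """Create a disk-backed array and fill it one band of rows at a time.

    fill_chunk(start, stop) is a hypothetical callback that returns the
    pixels for rows [start, stop); only that band is held in RAM.
    """
    mosaic = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    for start in range(0, shape[0], chunk_rows):
        stop = min(start + chunk_rows, shape[0])
        mosaic[start:stop] = fill_chunk(start, stop)
        mosaic.flush()  # push the finished band out to disk
    return mosaic


def demo_chunk(start, stop, width=48, channels=3):
    # Stand-in for real stitching output: every pixel holds its row index.
    rows = np.arange(start, stop, dtype=np.uint8)
    return np.broadcast_to(rows[:, None, None], (stop - start, width, channels))


path = os.path.join(tempfile.mkdtemp(), "mosaic.dat")
mosaic = write_mosaic_in_chunks(path, (100, 48, 3), chunk_rows=32,
                                fill_chunk=demo_chunk)
```

The same loop should work for a mosaic far larger than RAM, because only `chunk_rows` rows are materialized at once; the chunk size is the knob that trades memory for the number of writes.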

I'm not sure how this would behave with as many images as you're hoping for, but it does show that your assumption holds: you can stitch and save a mosaic that adds up to more data than you have RAM.

from stitch2d.

afong3 commented on August 13, 2024

Memory management is also critical for my use case. My hope is to produce a mosaic file larger than what normally fits in RAM. I've done some preliminary testing saving BigTIFF files outside of Stitch2d using np.memmap. From what I understand, np.memmap should allow all of the feature matching and other calculations to remain the same, so long as the images are loaded into this structure.

I'm currently in the process of making a new Tile class that stores data from a TIFF in an np.memmap instead of an np.ndarray. I imagine saving the stitched mosaic might have some tricky elements to it, but I will report back. Please let me know if you make any parallel progress.
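A minimal sketch of what such a disk-backed tile could look like is below. `MemmapTile` and its methods are hypothetical and do not reflect stitch2d's actual Tile interface; decoding pixels out of a real TIFF is left out, so the example starts from an array that is already in memory.

```python
import os
import tempfile

import numpy as np


class MemmapTile:
    """Hypothetical tile whose pixels live in a file instead of RAM."""

    def __init__(self, path, shape, dtype=np.uint8):
        self.path = path
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)

    @classmethod
    def from_array(cls, array, path):
        # Copy the pixels to disk once, then drop the writable mapping.
        disk = np.memmap(path, dtype=array.dtype, mode="w+", shape=array.shape)
        disk[:] = array
        disk.flush()
        del disk
        return cls(path, array.shape, array.dtype)

    @property
    def pixels(self):
        # Read-only mapping; pages are only read from disk when indexed.
        return np.memmap(self.path, dtype=self.dtype, mode="r", shape=self.shape)

    def region(self, rows, cols):
        # Return an in-memory copy of just the requested window.
        return np.array(self.pixels[rows, cols])


source = np.arange(6 * 8 * 3, dtype=np.uint8).reshape(6, 8, 3)
tile_path = os.path.join(tempfile.mkdtemp(), "tile.dat")
tile = MemmapTile.from_array(source, tile_path)
patch = tile.region(slice(2, 5), slice(1, 4))
```

Because `np.memmap` is an `ndarray` subclass, code that only indexes and reads the array should generally keep working, which matches the expectation above that the matching calculations can stay the same.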


gustavo-alberto commented on August 13, 2024

@afong3

I didn't know about memmaps. I've been thinking about the possibility of working with a non-relational database like MongoDB. MongoDB supports concurrent read/write operations from multiple processes, which avoids file concurrency issues. The idea is to store tile-related information (matching points, descriptors, etc.) in MongoDB rather than holding everything in memory at runtime.

Proposed Process Structure:

  1. Database Structuring:

    • Image Collection: Store information such as:
      • Unique image identifier
      • Image path or URL
      • Image position in the mosaic
      • Matching points (calculated with OpenCV)
      • Other relevant attributes (metadata, processing results, etc.)
  2. Image Loading and Unloading:

    • Image Buffer: Implement an image buffer to load a subset of images into memory. As processing of a subset is completed, unload it from the buffer and load the next subset.
    • Lazy Loading: Use lazy loading techniques to fetch data from the database only when necessary.
  3. Gradual Processing:

    • Block Division: Divide the mosaic into smaller blocks that can be processed individually. Each block can be loaded, processed, and unloaded independently.
    • Parallelization: Leverage parallelization to process multiple blocks simultaneously, depending on hardware capabilities.
  4. State Maintenance and Checkpoints:

    • Checkpoints: Implement checkpoints to save intermediate processing states in the database. This allows resuming the process from the last checkpoint in case of failures.
    • Progress Tracking: Maintain a progress log in the database for monitoring and restarting the process if needed.
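The structure above can be sketched without a running database. The example below shows one possible shape for an image document (a plain dict, which is what a MongoDB collection stores), a bounded buffer that loads images lazily and evicts the least recently used one, and a resume helper based on a status field. All names (`make_image_document`, `ImageBuffer`, `pending_ids`, the field names other than MongoDB's `_id` key) are assumptions for illustration, and the in-memory list stands in for the collection.

```python
from collections import OrderedDict


def make_image_document(image_id, path, position, matches=None, metadata=None):
    """Build the per-image record; "_id" is MongoDB's primary key field."""
    return {
        "_id": image_id,
        "path": path,
        "position": {"x": position[0], "y": position[1]},
        "matches": list(matches or []),   # e.g. keypoint pairs from OpenCV
        "metadata": dict(metadata or {}),
        "status": "pending",              # checkpoint marker: pending / done
    }


class ImageBuffer:
    """Keep at most `capacity` images in memory, loading them on demand."""

    def __init__(self, loader, capacity):
        self._loader = loader
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, image_id):
        if image_id in self._cache:
            self._cache.move_to_end(image_id)  # mark as recently used
            return self._cache[image_id]
        image = self._loader(image_id)         # lazy load on first access
        self._cache[image_id] = image
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)    # evict least recently used
        return image


def pending_ids(documents):
    """Ids still to process, so a restarted run can skip finished images."""
    return [doc["_id"] for doc in documents if doc["status"] != "done"]


loaded = []  # records every real load so evictions are visible


def fake_loader(image_id):
    loaded.append(image_id)
    return f"pixels of {image_id}"


documents = [make_image_document(name, f"/data/{name}.tif", (i, 0))
             for i, name in enumerate(["a", "b", "c"])]
documents[0]["status"] = "done"

buffer = ImageBuffer(fake_loader, capacity=2)
for name in ["a", "b", "a", "c", "b"]:
    buffer.get(name)
```

In this run the second request for `a` is served from the buffer, loading `c` evicts `b`, and the final request for `b` triggers a reload.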

My hardware setup includes an i5 14th generation CPU, 32GB DDR5 RAM, and an RTX 3050 GPU. Despite having sufficient resources, memory usage hits 100% due to attempting to load all images into memory. My test dataset comprises approximately 8000 images, each 1600x1600 pixels.
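A quick back-of-the-envelope calculation shows why this dataset saturates memory. The channel count and bit depth are not stated above, so both cases below are assumptions (8-bit grayscale and 8-bit RGB), and `dataset_bytes` is a hypothetical helper.

```python
def dataset_bytes(n_images, height, width, channels, bytes_per_sample=1):
    """Raw, uncompressed size of the pixel data alone."""
    return n_images * height * width * channels * bytes_per_sample


gray_bytes = dataset_bytes(8000, 1600, 1600, channels=1)  # assumed 8-bit gray
rgb_bytes = dataset_bytes(8000, 1600, 1600, channels=3)   # assumed 8-bit RGB

print(f"grayscale: {gray_bytes / 1e9:.2f} GB")  # 20.48 GB
print(f"RGB:       {rgb_bytes / 1e9:.2f} GB")   # 61.44 GB
```

Under the RGB assumption the pixels alone are nearly twice the 32 GB of RAM, and even the grayscale case leaves little headroom once keypoints, descriptors, and the output mosaic are added.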

Cluster Processing:

To optimize memory usage, I propose processing a fixed number of images per cluster (e.g., 9 images per cluster). Each cluster can be independently processed by the processor. Limiting the number of clusters processed concurrently will help manage memory usage. Additionally, utilizing the GPU for parallel processing could significantly speed up the process.
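One way to sketch the cluster idea is to group tiles by their grid position into 3x3 blocks and cap how many blocks run at once. This assumes each image has a known (row, col) grid position; `group_into_clusters`, `process_cluster`, and `process_all` are hypothetical names, and the per-cluster work is a placeholder rather than real stitching.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_into_clusters(positions, side=3):
    """Map each image to the side x side block that contains its grid cell."""
    clusters = defaultdict(list)
    for image_id, (row, col) in positions.items():
        clusters[(row // side, col // side)].append(image_id)
    return dict(clusters)


def process_cluster(image_ids):
    # Placeholder for loading, matching, and stitching one cluster.
    return len(image_ids)


def process_all(clusters, max_concurrent=2):
    """Run clusters with at most `max_concurrent` in flight at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = pool.map(process_cluster, clusters.values())
        return dict(zip(clusters.keys(), results))


# A 6x6 grid of tiles yields four clusters of nine images each.
positions = {f"img_{r}_{c}": (r, c) for r in range(6) for c in range(6)}
clusters = group_into_clusters(positions)
summary = process_all(clusters, max_concurrent=2)
```

Peak memory then scales with `max_concurrent * side * side` images rather than with the whole dataset. Note that these clusters do not overlap, so matches between tiles on either side of a cluster boundary would still need a separate pass.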

Challenges:

  1. Blank Images: Currently, blank images without matching points cause errors in processing. I suggest a brief preprocessing step to identify blank images and approximate their placement in the mosaic based on known x and y positions for filling the area.

  2. Manual Cluster Handling: When manually handling clusters (assembling two sub-mosaics and then merging them), the script fails, possibly due to differing sub-mosaic sizes. This needs to be addressed.
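For the first challenge, the preprocessing step could be as simple as flagging tiles with almost no intensity variation and giving them a nominal position from their grid coordinates. This is only a sketch: the standard-deviation test and its threshold are assumptions that would need tuning on real data, and `is_blank` and `nominal_offset` are hypothetical helpers.

```python
import numpy as np


def is_blank(image, std_threshold=2.0):
    """Treat a tile as blank when its pixel values barely vary."""
    return float(np.std(image)) < std_threshold


def nominal_offset(row, col, tile_shape, overlap=0):
    """Approximate pixel position of a tile from its grid coordinates."""
    height, width = tile_shape[:2]
    return row * (height - overlap), col * (width - overlap)


rng = np.random.default_rng(0)
textured = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)  # has features
empty = np.full((32, 32), 200, dtype=np.uint8)                  # uniform tile

tiles = {(0, 0): textured, (0, 1): empty}
# Blank tiles skip feature matching and fall back to their nominal position.
fallback = {
    cell: nominal_offset(*cell, tile_shape=tile.shape, overlap=4)
    for cell, tile in tiles.items()
    if is_blank(tile)
}
```

Tiles that pass the check would go through the normal matching path; only the flagged ones are placed from their grid coordinates.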

The main challenge lies in structuring everything and designing classes and methods for each part of the process, following certain design patterns.

