
Comments (4)

qarmin commented on June 2, 2024

I rechecked how exactly this works and I don't really see a big problem in the algorithm (at least with the "delete outdated results" option disabled).

  • Valid files are collected
  • The cache file is loaded with all its entries
  • If the option to remove outdated files is enabled, cache entries for files that no longer exist are removed
  • A structure with all entries is returned
  • The structure is split into two parts:
    • already cached files - files that have the same modification date, size and path as the tested files
    • non-cached files - files that changed or that were not cached
  • Non-cached files are checked; after the operation they are merged with the cached ones and saved to the cache file
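The split step described above can be sketched roughly as follows. This is a minimal illustration, not czkawka's actual code: `FileEntry`, its field names, and `split_by_cache` are hypothetical names made up for this example.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical cache record; the field names are illustrative only.
#[derive(Clone, Debug, PartialEq)]
pub struct FileEntry {
    pub path: PathBuf,
    pub size: u64,
    pub modified_date: u64,
}

/// Splits the files found on disk into (already cached, non-cached).
/// A file counts as cached only if the cache holds an entry with the
/// same path, size and modification date.
pub fn split_by_cache(
    found: Vec<FileEntry>,
    cache: &HashMap<PathBuf, FileEntry>,
) -> (Vec<FileEntry>, Vec<FileEntry>) {
    found.into_iter().partition(|file| {
        cache.get(&file.path).map_or(false, |cached| {
            cached.size == file.size && cached.modified_date == file.modified_date
        })
    })
}
```

Only the second vector needs the expensive check; its results would then be merged with the first one and written back to the cache file.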

A few things that can be improved (I'm working on it):

  • multithreaded checking for outdated files
  • code is duplicated in several parts
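As a rough idea of what a multithreaded existence check could look like, here is a sketch using only the standard library. The function name `existing_paths` and the chunking strategy are assumptions for illustration; this is not how czkawka implements it.

```rust
use std::path::PathBuf;
use std::thread;

/// Returns the paths that still exist on disk, checking them on up to
/// `workers` threads. Input order is preserved in the output.
/// Hypothetical helper, not part of czkawka.
pub fn existing_paths(paths: &[PathBuf], workers: usize) -> Vec<PathBuf> {
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are created.
    let chunk_size = ((paths.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = paths
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .filter(|path| path.exists())
                        .cloned()
                        .collect::<Vec<PathBuf>>()
                })
            })
            .collect();

        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Whether this helps in practice depends on the storage: on an SSD or a warm filesystem cache the parallel stat calls can overlap, while a single spinning disk may see little benefit.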

Additionally, czkawka stats each file in the cache even if --image_delete_outdated_cache_entries is false in ~/.config/czkawka/czkawka_gui_config_4.txt

This looks like a bug, but I cannot reproduce the problem with the current master branch. Can you reproduce it with e.g. #1072 (which should use the new cache file format)?

from czkawka.

qarmin commented on June 2, 2024

With #1064, loading ~100000 cached results from an SSD (in duplicate files mode), including testing whether all files exist, takes less than a second, so I'm not sure what exactly is/was the problem (I'm talking about the 12 hours of cache processing).

RUST_LOG=debug ./czkawka should provide more info about timings if the issue still persists.


chapmanjacobd commented on June 2, 2024

I don't really see a big problem in algorithm

I guess my main point in the prior message is that the algorithm should probably pre-filter the list of files via directory prefix before doing IO. It is much, much faster for the CPU to do a substring check than for the CPU to ask IO about a specific file. It really makes a big difference, especially in my specific case where the cache has many files outside of the directory that was specified for a specific run.

To put it into context, the user specifies a list of directories -d /dir1/ -d /dir2/. If the files from the cache are neither in /dir1/ nor /dir2/ it seems unnecessary for czkawka to do IO lookups outside of those two directories.
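A minimal sketch of that pre-filter, assuming the cache can be treated as a plain list of paths. `in_scope` and `prefilter_cache` are hypothetical names, not czkawka functions. It uses `Path::starts_with`, which compares whole path components rather than raw substrings, so /dir1 does not accidentally match /dir10:

```rust
use std::path::{Path, PathBuf};

/// True if `path` lies inside at least one of the included directories.
/// The comparison is purely in-memory and touches no files.
pub fn in_scope(path: &Path, included_dirs: &[PathBuf]) -> bool {
    included_dirs.iter().any(|dir| path.starts_with(dir))
}

/// Keeps only the cached paths that fall under the directories given for
/// this run, so that later IO checks skip everything else.
pub fn prefilter_cache(cached_paths: Vec<PathBuf>, included_dirs: &[PathBuf]) -> Vec<PathBuf> {
    cached_paths
        .into_iter()
        .filter(|path| in_scope(path, included_dirs))
        .collect()
}
```

With -d /dir1/ -d /dir2/, a cached entry such as /other/photo.jpg would be dropped from the candidate list before any filesystem call is made for it. Note that this only filters the working set for the current run; entries outside the included directories would still need to be kept in the saved cache file.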

But yes, I see you have spent a lot of time thinking about it in #1072 so I will try it out and see if I still experience the above issue.

Thank you for looking into this


chapmanjacobd commented on June 2, 2024

After building from loading_saving e9765e1

cargo run --release --bin czkawka_cli image -d /home/xk/d/98_Me

it runs in only a few seconds so I think you have fixed the issue :)

