Comments (4)
I rechecked how exactly this works and I don't really see a big problem in the algorithm (at least with the delete-outdated-results option disabled).
- Valid files are collected
- The cache file is loaded with all entries
- If the option to remove outdated files is enabled, cache entries for files that no longer exist are removed
- A structure with all entries is returned
- The structure is split into two parts:
  - already cached files: files that have the same modification date, size and path as the tested files
  - non-cached files: files that changed or that were not cached
- Non-cached files are checked, and after the operation they are merged with the cached ones and saved to the cache file
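The split step above can be sketched roughly as follows. This is only an illustration, not czkawka's actual code: `FileEntry` and `split_by_cache` are hypothetical names, and the cache is assumed to be a map keyed by path.

```rust
use std::collections::HashMap;

// Hypothetical entry type for illustration; the real czkawka structs differ.
#[derive(Clone, Debug, PartialEq)]
struct FileEntry {
    path: String,
    size: u64,
    modified_date: u64,
}

/// Splits scanned files into (already cached, non-cached).
/// A file counts as cached only if the cache holds an entry with the
/// same path, size and modification date.
fn split_by_cache(
    scanned: Vec<FileEntry>,
    cache: &HashMap<String, FileEntry>,
) -> (Vec<FileEntry>, Vec<FileEntry>) {
    scanned.into_iter().partition(|file| {
        cache.get(&file.path).map_or(false, |cached| {
            cached.size == file.size && cached.modified_date == file.modified_date
        })
    })
}
```

Only the second half of the returned tuple would need to be checked again; the first half can be reused as-is and merged back before saving.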
A few things that can be improved (I'm working on it):
- multithreaded checking for outdated files
- code is duplicated in several places
Additionally, czkawka stats each file in the cache even if --image_delete_outdated_cache_entries is false in ~/.config/czkawka/czkawka_gui_config_4.txt
This looks like a bug, but I cannot reproduce the problem with the current master branch. Can you reproduce it with e.g. #1072 (which should use the new cache file format)?
from czkawka.
With #1064, loading ~100000 cached results from an SSD (in duplicate files mode), including testing whether all files exist, takes less than a second, so I'm not sure what exactly is/was the problem (I'm talking about the 12 hours of cache processing).
Running
RUST_LOG=debug ./czkawka
should provide more info about timings if the issue still persists.
from czkawka.
I don't really see a big problem in algorithm
I guess my main point in the prior message is that the algorithm should probably pre-filter the list of files via directory prefix before doing IO. It is much, much faster for the CPU to do a substring check than for the CPU to ask IO about a specific file. It really makes a big difference, especially in my specific case where the cache has many files outside of the directory that was specified for a specific run.
To put it into context, the user specifies a list of directories: -d /dir1/ -d /dir2/. If the files from the cache are in neither /dir1/ nor /dir2/, it seems unnecessary for czkawka to do IO lookups outside of those two directories.
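A minimal sketch of the pre-filter I have in mind, assuming the cache can be treated as a list of paths. `filter_by_included_dirs` is a made-up name for illustration and not something that exists in czkawka. It uses `Path::starts_with`, which compares whole path components, so /dir1/ does not accidentally match /dir10/.

```rust
use std::path::PathBuf;

/// Keeps only cached paths that lie under one of the included directories.
/// This is pure path comparison and never touches the filesystem.
fn filter_by_included_dirs(
    cached_paths: Vec<PathBuf>,
    included_dirs: &[PathBuf],
) -> Vec<PathBuf> {
    cached_paths
        .into_iter()
        .filter(|path| included_dirs.iter().any(|dir| path.starts_with(dir)))
        .collect()
}
```

Any existence or metadata check would then run only on the returned paths, so entries outside the requested directories cost no IO at all.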
But yes, I see you have spent a lot of time thinking about it in #1072 so I will try it out and see if I still experience the above issue.
Thank you for looking into this
from czkawka.
After building from loading_saving e9765e1
cargo run --release --bin czkawka_cli image -d /home/xk/d/98_Me
it runs in only a few seconds so I think you have fixed the issue :)
from czkawka.