sreedevk / deduplicator
Filter, Sort & Delete Duplicate Files Recursively
License: MIT License
Is your feature request related to a problem? Please describe.
Currently, there is no fine-grained control over the directory scanning process. Using globwalk's GlobWalkerBuilder, we could add options like --max-depth and --follow-links to give users more control over how directories are scanned.
Describe the solution you'd like
https://docs.rs/globwalk/latest/globwalk/#advanced-globbing
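globwalk's `GlobWalkerBuilder` exposes `max_depth` and `follow_links` settings. The std-only sketch below is not globwalk's code. It only illustrates what the two proposed flags would control. `ScanOptions` and `collect_files` are hypothetical names, and the depth convention (files directly inside the root are depth 1) is an assumption.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical options mirroring the proposed --max-depth / --follow-links flags.
#[derive(Clone, Copy, Debug)]
pub struct ScanOptions {
    /// Deepest level to report; files directly inside the root are depth 1.
    pub max_depth: usize,
    /// Whether to follow symlinks (to files or directories); if false they are skipped.
    pub follow_links: bool,
}

/// Recursively collects regular files under `root`, honouring `opts`.
/// Entries that cannot be read are skipped instead of aborting the scan.
pub fn collect_files(root: &Path, opts: ScanOptions) -> Vec<PathBuf> {
    let mut out = Vec::new();
    walk(root, 0, opts, &mut out);
    out
}

fn walk(dir: &Path, depth: usize, opts: ScanOptions, out: &mut Vec<PathBuf>) {
    // Children of `dir` sit one level deeper than `dir` itself.
    if depth + 1 > opts.max_depth {
        return;
    }
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(_) => return, // permission denied, directory vanished, ...
    };
    for entry in entries.flatten() {
        let path = entry.path();
        // symlink_metadata does not follow links, so symlinks are visible here.
        let meta = match fs::symlink_metadata(&path) {
            Ok(m) => m,
            Err(_) => continue,
        };
        let meta = if meta.file_type().is_symlink() {
            if !opts.follow_links {
                continue;
            }
            // Resolve the link target; broken links are skipped.
            // This sketch has no cycle detection; only max_depth bounds loops.
            match fs::metadata(&path) {
                Ok(m) => m,
                Err(_) => continue,
            }
        } else {
            meta
        };
        if meta.is_dir() {
            walk(&path, depth + 1, opts, out);
        } else if meta.is_file() {
            out.push(path);
        }
    }
}
```

With `max_depth: 1`, only files directly inside the root are returned. Following links without a depth limit would need the kind of loop detection that a dedicated walker library provides.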
cc: @beeb adding this issue here to discuss options
Version: deduplicator 0.2.1
Compiled from the git master branch.
System: Windows 10
Cmdline: deduplicator.exe --follow-links --json "c:/" > "c:\temp\deduplicator-report_C.txt"
I scanned the entire C: drive, but after running for 2h30m the scan fails with a UTF-8 error. The error does not say which file or folder caused it.
[00:21:59] 3232274 paths mapped
[00:00:08] ###### 2692395/2692395 indexed files sizes
[02:30:50] ###### 2613147/2613147 indexed files hashes
Error: path contains invalid UTF-8 characters
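The error above does not name the offending file. One option is a strict conversion whose error carries the path, paired with a lossy conversion wherever the text is only displayed. This is a hedged sketch: `NonUtf8Path` and both helpers are hypothetical, not deduplicator's code.

```rust
use std::fmt;
use std::path::{Path, PathBuf};

/// Hypothetical error type that keeps the offending path so the user
/// can see which file could not be converted.
#[derive(Debug)]
pub struct NonUtf8Path(pub PathBuf);

impl fmt::Display for NonUtf8Path {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `display()` replaces invalid sequences with U+FFFD, so it never fails.
        write!(f, "path contains invalid UTF-8 characters: {}", self.0.display())
    }
}

impl std::error::Error for NonUtf8Path {}

/// Strict conversion: the path as `&str`, or an error naming the path.
pub fn path_to_str(path: &Path) -> Result<&str, NonUtf8Path> {
    path.to_str().ok_or_else(|| NonUtf8Path(path.to_path_buf()))
}

/// Lenient conversion for output: never fails, substitutes U+FFFD.
pub fn path_to_display_string(path: &Path) -> String {
    path.to_string_lossy().into_owned()
}
```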
Because the scanner code relies heavily on `unwrap()`, the app panics when I run it against my home directory:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', [...]/deduplicator-0.0.3/src/scanner.rs:52:65
stack backtrace:
0: rust_begin_unwind
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
1: core::panicking::panic_fmt
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
2: core::result::unwrap_failed
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/result.rs:1791:5
3: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
4: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
5: rayon::iter::plumbing::Folder::consume_iter
6: rayon::iter::plumbing::bridge_producer_consumer::helper
7: <rayon::vec::IntoIter<T> as rayon::iter::IndexedParallelIterator>::with_producer
8: rayon::iter::extend::<impl rayon::iter::ParallelExtend<T> for alloc::vec::Vec<T>>::par_extend
9: deduplicator::scanner::duplicates
10: deduplicator::app::App::init
11: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
12: tokio::runtime::park::CachedParkThread::block_on
13: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
14: tokio::runtime::runtime::Runtime::block_on
15: deduplicator::main
Fallible actions (especially filesystem operations) should be handled properly, most likely by skipping the affected files during the scan.
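Here is a minimal sketch of the skip-on-error approach, assuming the scanner needs each file's size. `fs::metadata` failures are reported and filtered out with `filter_map` instead of being unwrapped. `sizes_skipping_errors` is a hypothetical name. rayon's parallel iterators also provide `filter_map`, so the same shape should carry over to the parallel code seen in the backtrace.

```rust
use std::fs;
use std::path::PathBuf;

/// Returns `(path, size)` for every path whose metadata can be read.
/// Failures (e.g. a file deleted between listing and stat'ing it, or
/// permission denied) are reported on stderr and skipped instead of
/// panicking via `unwrap()`.
pub fn sizes_skipping_errors(paths: &[PathBuf]) -> Vec<(PathBuf, u64)> {
    paths
        .iter()
        .filter_map(|p| match fs::metadata(p) {
            Ok(meta) => Some((p.clone(), meta.len())),
            Err(err) => {
                eprintln!("skipping {}: {}", p.display(), err);
                None
            }
        })
        .collect()
}
```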
Is your feature request related to a problem? Please describe.
To keep fixed issues from resurfacing, unit tests need to be added for each module.
Describe the solution you'd like
Add Tests for:
Describe alternatives you've considered
N/A
Additional context
As the application is going through major changes, it's essential to make sure that it doesn't break.
This is probably an issue with Cyrillic characters in the path:
thread 'main' panicked at 'byte index 11 is not a char boundary; it is inside 'с' (bytes 10..12) of `/фото с iPhone/IMG_6740 1.JPG`', src/output.rs:12:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I installed clap 4.0.32 from https://github.com/clap-rs/clap, but it didn't help.
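The panic comes from slicing a `&str` at a byte index that falls inside a multi-byte character. In `/фото с iPhone/IMG_6740 1.JPG`, each Cyrillic letter takes two bytes, so byte 11 lands in the middle of 'с'. Below is a hedged sketch of two boundary-safe alternatives. The helper names are hypothetical, not the actual src/output.rs code. Counting `char`s still does not equal terminal width for wide characters such as CJK.

```rust
/// Returns at most `max_chars` characters of `s` without ever slicing
/// through a multi-byte UTF-8 sequence.
pub fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // Byte offset of the first character that is *not* kept.
        Some((byte_idx, _)) => &s[..byte_idx],
        // The string is already short enough.
        None => s,
    }
}

/// Byte-budget variant: the longest prefix of at most `max_bytes` bytes
/// that ends on a character boundary.
pub fn truncate_bytes(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // `is_char_boundary(0)` is always true, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For the path in the panic message, `truncate_bytes(path, 11)` steps back to byte 10 and returns `"/фото "`, where `&path[..11]` panics.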
Since this application is likely to be used to reclaim disk space, it could be useful to have an option to ignore files smaller than a user-defined threshold.
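A sketch of such a threshold follows. It assumes sizes are already collected during the scan and that the value arrives from a hypothetical `--min-size` flag. `parse_size` accepts binary suffixes (K, M, G) as an illustrative choice.

```rust
use std::path::PathBuf;

/// Parses sizes such as "512", "10K", "5M" or "1G" (binary multiples).
pub fn parse_size(s: &str) -> Option<u64> {
    let s = s.trim();
    // Suffixes are ASCII, so `s.len() - 1` is always a char boundary here.
    let (digits, mult) = match s.chars().last()? {
        'K' | 'k' => (&s[..s.len() - 1], 1u64 << 10),
        'M' | 'm' => (&s[..s.len() - 1], 1u64 << 20),
        'G' | 'g' => (&s[..s.len() - 1], 1u64 << 30),
        _ => (s, 1),
    };
    digits.parse::<u64>().ok()?.checked_mul(mult)
}

/// Keeps only files whose size is at least `min_size` bytes.
/// `files` holds `(path, size_in_bytes)` pairs gathered during the scan.
pub fn filter_min_size(files: Vec<(PathBuf, u64)>, min_size: u64) -> Vec<(PathBuf, u64)> {
    files.into_iter().filter(|(_, size)| *size >= min_size).collect()
}
```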
I think this is a generally expected feature of this tool. It could work interactively with an --interactive option (e.g. "select file to delete: 1, 2, 3"), or automatically delete all matches except the last one with a --remove option.
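Here is a hedged sketch of the two behaviours described above. `parse_selection` turns a reply to the prompt into zero-based indices, and `all_but_last` picks what `--remove` would delete. Both functions are hypothetical. Refusing to delete every copy in a group is a safety choice of this sketch, not part of the request.

```rust
/// Parses a reply such as "1, 3" to the prompt "select file to delete: 1, 2, 3"
/// into zero-based indices. Rejects numbers outside `1..=count`.
pub fn parse_selection(input: &str, count: usize) -> Result<Vec<usize>, String> {
    let mut picked: Vec<usize> = Vec::new();
    for part in input.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let n: usize = part
            .parse()
            .map_err(|_| format!("not a number: {part:?}"))?;
        if n == 0 || n > count {
            return Err(format!("{n} is out of range 1..={count}"));
        }
        if !picked.contains(&(n - 1)) {
            picked.push(n - 1); // store zero-based, ignore repeats
        }
    }
    // Design choice of this sketch: never delete every copy of a file.
    if count > 0 && picked.len() == count {
        return Err("refusing to delete every copy".to_string());
    }
    Ok(picked)
}

/// Non-interactive --remove behaviour as requested: every match except the last.
pub fn all_but_last<T>(group: &[T]) -> &[T] {
    &group[..group.len().saturating_sub(1)]
}
```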
Describe the bug
When using the stable version of Rust, cargo cannot build the application.
** Runtime Info **
Install Type: Install with cargo
App Version: [e.g. v1.0.6]
Expected behavior
The application installs without errors.
Platform Details (please complete the following information):
Additional context
Erroneous code example:
#[repr(u128)] // error: use of unstable library feature 'repr128'
enum Foo {
    Bar(u64),
}
If you're using a stable or a beta version of rustc, you won't be able to use
any unstable features. In order to do so, please switch to a nightly version of
rustc (by using rustup).
If you're using a nightly version of rustc, just add the corresponding feature
to be able to use it:
#![feature(repr128)]
#[repr(u128)] // ok!
enum Foo {
    Bar(u64),
}
I don't own or have access to a Windows machine. If anybody does, please help out by testing deduplicator on Windows.
Required Information:
Is it possible to replace duplicates with hard links or symbolic links instead of deleting them, as the Duplicate Searcher program (http://malich.ru/duplicate_searcher) does?
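Here is a minimal std-only sketch of replacing a duplicate with a hard link. It assumes both files live on the same filesystem, since hard links cannot cross filesystems. The helper name and the `.dedup-tmp` suffix are hypothetical. For symbolic links on Unix, `std::os::unix::fs::symlink` could be used in place of `fs::hard_link`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Replaces `duplicate` with a hard link to `original`, so both names share
/// one copy of the data on disk.
///
/// The link is first created under a temporary name next to `duplicate` and
/// then renamed over it, so `duplicate` is not left missing if linking fails.
pub fn replace_with_hard_link(original: &Path, duplicate: &Path) -> io::Result<()> {
    let mut tmp_name = duplicate.as_os_str().to_owned();
    tmp_name.push(".dedup-tmp");
    let tmp = Path::new(&tmp_name);
    fs::hard_link(original, tmp)?;
    // On Unix, `rename` atomically replaces the existing destination.
    if let Err(e) = fs::rename(tmp, duplicate) {
        let _ = fs::remove_file(tmp); // clean up the stray link
        return Err(e);
    }
    Ok(())
}
```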
Just had an idea that might improve performance. Instead of hashing all files, how about first grouping them by file size (two different large files are unlikely to have exactly the same size) and only computing hashes for files that share a size?
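The size-first strategy can be sketched as a two-pass grouping. First, bucket files by `fs::metadata(..).len()`. Then hash only the buckets that hold two or more files. The content hash is passed in as a closure because this page does not specify which hash deduplicator uses. `find_duplicates` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Groups `paths` into sets of identical files. Files are first bucketed by
/// size; `hash` is only called for files whose size matches another file's,
/// since files of different sizes cannot be identical.
pub fn find_duplicates<F>(paths: &[PathBuf], hash: F) -> Vec<Vec<PathBuf>>
where
    F: Fn(&Path) -> io::Result<u64>,
{
    // Pass 1: one cheap metadata call per file.
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for p in paths {
        // Unreadable paths and non-files are skipped.
        if let Ok(meta) = fs::metadata(p) {
            if meta.is_file() {
                by_size.entry(meta.len()).or_default().push(p.clone());
            }
        }
    }
    // Pass 2: hash only the candidates.
    let mut groups = Vec::new();
    for same_size in by_size.into_values() {
        if same_size.len() < 2 {
            continue; // unique size: no hashing needed
        }
        let mut by_hash: HashMap<u64, Vec<PathBuf>> = HashMap::new();
        for p in same_size {
            if let Ok(h) = hash(p.as_path()) {
                by_hash.entry(h).or_default().push(p);
            }
        }
        groups.extend(by_hash.into_values().filter(|g| g.len() > 1));
    }
    groups
}
```

In a typical media folder, most files have a unique size, so most files are never read in full.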
Describe the bug
While scanning deep directory trees, there's a short delay before the file-scanning progress bar appears, during which deduplicator displays no output. The goal is to add a ProgressBar::new_spinner() so that users can see that deduplicator is traversing directories to find files to process.
Describe the bug
While scanning a 127 GB directory of PDFs, images and videos, memory usage rose from 4.8 GiB (initial) to 26 GiB, causing the display manager (lightdm) to crash and restart.
Runtime Info
App Arguments: none
Install Type: cargo install
App Version: 0.0.8
Expected behavior
Reduced memory consumption.
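The report does not identify what holds the memory. One common cause in duplicate finders is reading entire files into memory before hashing. As a hedged sketch of the alternative, the function below streams each file through a fixed 64 KiB buffer. FNV-1a is used only because it fits in a few std-only lines. It is not collision resistant and is not necessarily what deduplicator uses.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Size of the reusable read buffer: memory used per hashed file stays at
/// this value regardless of file size (64 KiB is an arbitrary choice).
const CHUNK: usize = 64 * 1024;

/// 64-bit FNV-1a hash of a file's contents, read in fixed-size chunks so the
/// whole file is never held in memory at once.
pub fn hash_file_streaming(path: &Path) -> io::Result<u64> {
    const OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; CHUNK];
    let mut hash = OFFSET;
    loop {
        let n = match file.read(&mut buf) {
            Ok(0) => break, // end of file
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // FNV-1a processes bytes one at a time, so chunk boundaries
        // do not affect the result.
        for &byte in &buf[..n] {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(PRIME);
        }
    }
    Ok(hash)
}
```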
Platform Details (please complete the following information):
Describe the bug
After scanning completes, the app hangs for about a second before printing the output. This is more noticeable with large directories.
** Runtime Info **
Install Type: [e.g. cargo install]
App Version: [e.g. v0.1.1]
Expected behavior
Printing should be fast.
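The report does not identify the cause of the pause. One contributor worth checking is per-line output. Rust's `Stdout` is line-buffered, so printing many short lines issues many small writes. Below is a sketch, with a hypothetical `print_groups` helper and output format, that batches output through a `BufWriter`. In the app, `out` could be `io::stdout().lock()`.

```rust
use std::io::{self, BufWriter, Write};

/// Writes one block per duplicate group through a `BufWriter`, so output is
/// written in large chunks instead of one write per line.
pub fn print_groups<W: Write>(out: W, groups: &[Vec<String>]) -> io::Result<()> {
    let mut w = BufWriter::new(out);
    for (i, group) in groups.iter().enumerate() {
        writeln!(w, "group {}:", i + 1)?;
        for path in group {
            writeln!(w, "  {}", path)?;
        }
    }
    // Flush explicitly so write errors are reported rather than lost on drop.
    w.flush()
}
```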
Platform Details (please complete the following information):
OS: Windows 11 Enterprise
OS version: 22000.1335
When I try to run the app, I get the error "Error: unable to open database file (code 14)".
Example:
cargo run --release -- --dir=test_data
warning: unused imports: `Frame`, `Rect`
--> src\app\ui.rs:5:45
|
5 | layout::{Constraint, Direction, Layout, Rect},
| ^^^^
...
9 | Frame, Terminal,
| ^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused import: `std::thread`
--> src\app\mod.rs:14:5
|
14 | use std::thread;
| ^^^^^^^^^^^
warning: unused import: `std::time::Duration`
--> src\app\mod.rs:15:5
|
15 | use std::time::Duration;
| ^^^^^^^^^^^^^^^^^^^
warning: unused imports: `Block`, `Borders`, `Widget`
--> src\app\mod.rs:18:15
|
18 | widgets::{Block, Borders, Widget},
| ^^^^^ ^^^^^^^ ^^^^^^
warning: unused import: `Backend`
--> src\app\ui.rs:4:15
|
4 | backend::{Backend, CrosstermBackend},
| ^^^^^^^
warning: associated function `cleanup` is never used
--> src\app\mod.rs:39:8
|
39 | fn cleanup(term: &mut Terminal<CrosstermBackend<io::Stdout>>) -> Result<()> {
| ^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: associated function `render_cycle` is never used
--> src\app\mod.rs:51:8
|
51 | fn render_cycle(term: &mut Terminal<CrosstermBackend<io::Stdout>>) -> Result<()> {
| ^^^^^^^^^^^^
warning: associated function `init_render_loop` is never used
--> src\app\mod.rs:58:8
|
58 | fn init_render_loop(term: &mut Terminal<CrosstermBackend<io::Stdout>>) -> Result<()> {
| ^^^^^^^^^^^^^^^^
warning: associated function `init_terminal` is never used
--> src\app\mod.rs:69:8
|
69 | fn init_terminal() -> Result<Terminal<CrosstermBackend<io::Stdout>>> {
| ^^^^^^^^^^^^^
warning: struct `EventHandler` is never constructed
--> src\app\event_handler.rs:7:12
|
7 | pub struct EventHandler;
| ^^^^^^^^^^^^
warning: associated function `init` is never used
--> src\app\event_handler.rs:10:12
|
10 | pub fn init() -> Result<events::Event> {
| ^^^^
warning: associated function `handle_keypress` is never used
--> src\app\event_handler.rs:21:8
|
21 | fn handle_keypress(keyevent: KeyEvent) -> Result<events::Event> {
| ^^^^^^^^^^^^^^^
warning: enum `Event` is never used
--> src\app\events.rs:1:10
|
1 | pub enum Event {
| ^^^^^
warning: struct `Ui` is never constructed
--> src\app\ui.rs:12:12
|
12 | pub struct Ui;
| ^^
warning: associated function `generate_file_list` is never used
--> src\app\ui.rs:15:8
|
15 | fn generate_file_list() -> impl Widget {
| ^^^^^^^^^^^^^^^^^^
warning: associated function `generate_info_bar` is never used
--> src\app\ui.rs:27:8
|
27 | fn generate_info_bar() -> impl Widget {
| ^^^^^^^^^^^^^^^^^
warning: associated function `generate_file_desc` is never used
--> src\app\ui.rs:31:8
|
31 | fn generate_file_desc() -> impl Widget {
| ^^^^^^^^^^^^^^^^^^
warning: associated function `render_frame` is never used
--> src\app\ui.rs:35:12
|
35 | pub fn render_frame(term: &mut Terminal<CrosstermBackend<io::Stdout>>) -> Result<()> {
| ^^^^^^^^^^^^
warning: `deduplicator` (bin "deduplicator") generated 18 warnings
Finished release [optimized] target(s) in 0.30s
Running `target\release\deduplicator.exe --dir=test_data`
Error: unable to open database file (code 14)
error: process didn't exit successfully: `target\release\deduplicator.exe --dir=test_data` (exit code: 1)
I tried installing the app via cargo and also building it from source, but I got the same error both times.
How can I fix it?
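SQLite's result code 14 is SQLITE_CANTOPEN, meaning SQLite could not open or create the database file. This page does not state where the app stores its database. As a hedged sketch, assuming a database path whose parent directory may not exist yet, that directory can be created before the database is opened. `ensure_db_parent` is a hypothetical helper, and a missing directory is only one possible cause.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Makes sure the directory that should contain `db_path` exists before the
/// database is opened. A missing parent directory is one common reason for
/// SQLITE_CANTOPEN (code 14).
pub fn ensure_db_parent(db_path: &Path) -> io::Result<()> {
    match db_path.parent() {
        // `create_dir_all` also succeeds if the directory already exists.
        Some(dir) if !dir.as_os_str().is_empty() => fs::create_dir_all(dir),
        // A bare file name lives in the current directory, which exists.
        _ => Ok(()),
    }
}
```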
Is your feature request related to a problem? Please describe.
Currently, deduplicator can only be installed via cargo (Rust's build tool). Pre-built binaries need to be made available for download so that deduplicator is easily accessible to more people.
Describe the solution you'd like
Create workflows to cross compile binaries for the following platforms
Describe alternatives you've considered
Additional context
N/A
Describe the bug
#29 added path autocomplete for the --dir option. Autocomplete works in bash but not in zsh.
** Runtime Info **
App Arguments: --dir
Install Type: cargo install && cargo run
App Version: 0.1.1
Expected behavior
Autocomplete should work for --dir on zsh & other shells
Platform Details (please complete the following information):
Is your feature request related to a problem? Please describe.
If I want to exclude a single file type from the duplicate search, it's very difficult to do so.
Describe the solution you'd like
Add an --exclude-types / -x option to exclude certain file types from being scanned by deduplicator.
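Below is a sketch of how such a filter might work, matching file extensions case-insensitively. The comma-separated argument format and both helper names are assumptions, not an existing deduplicator interface.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Parses a hypothetical `--exclude-types jpg,PNG,.mp4` value into a set of
/// lowercase extensions without leading dots; empty entries are ignored.
pub fn parse_exclude_types(arg: &str) -> HashSet<String> {
    arg.split(',')
        .map(|s| s.trim().trim_start_matches('.').to_lowercase())
        .filter(|s| !s.is_empty())
        .collect()
}

/// True if `path` has an extension listed in `excluded` (case-insensitive).
/// Files without an extension are never excluded.
pub fn is_excluded(path: &Path, excluded: &HashSet<String>) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .map(|e| excluded.contains(&e.to_lowercase()))
        .unwrap_or(false)
}
```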
Is your feature request related to a problem? Please describe.
The Interactive mode allows the deletion of files one duplicate group at a time. When working with millions of files, this can be tedious.
Describe the solution you'd like
To automate this process and use deduplicator in scripts, options like --keep-latest and --keep-oldest could help.
Describe alternatives you've considered
Adding custom config files written in a DSL that decides which files to keep [idea for the future]
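Here is a hedged sketch of the selection logic behind `--keep-latest` / `--keep-oldest`. It uses each file's modification time, which is one possible reading of "latest"; creation time would be another. `KeepPolicy` and `index_to_keep` are hypothetical.

```rust
use std::time::SystemTime;

/// Which copy of a duplicate group to keep (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum KeepPolicy {
    Latest,
    Oldest,
}

/// Given `(name, modified_time)` pairs for one duplicate group, returns the
/// index of the file to keep; every other file would be deleted.
/// Returns `None` for an empty group. On ties, `max_by_key` returns the last
/// maximal entry and `min_by_key` the first minimal one.
pub fn index_to_keep<T>(group: &[(T, SystemTime)], policy: KeepPolicy) -> Option<usize> {
    let times = group.iter().map(|(_, t)| *t).enumerate();
    let best = match policy {
        KeepPolicy::Latest => times.max_by_key(|&(_, t)| t),
        KeepPolicy::Oldest => times.min_by_key(|&(_, t)| t),
    };
    best.map(|(i, _)| i)
}
```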
Is your feature request related to a problem? Please describe.
I have a clear but deep folder structure, and some files have long names, so it is important for me to see where the duplicates are located.
Describe the solution you'd like
I would suggest adding a new flag that shows the full path to each file.
Moreover, I use a laptop with a Full HD screen, and some paths are too long to be displayed properly. So I think it would be a good idea to wrap long lines onto the next line.
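Below is a sketch of the proposed line wrapping. It assumes output width is measured in `char`s and that breaking after a path separator reads best. `wrap_path` is a hypothetical helper. For the full-path flag itself, `std::fs::canonicalize` can turn a relative path into an absolute one.

```rust
/// Splits `text` into lines of at most `width` characters so long paths wrap
/// onto the next line instead of being cut off. Counts `char`s, which matches
/// terminal columns for most Latin/Cyrillic text but not for wide (e.g. CJK)
/// characters. Prefers to break right after a '/' or '\\'.
pub fn wrap_path(text: &str, width: usize) -> Vec<String> {
    assert!(width > 0, "width must be positive");
    let chars: Vec<char> = text.chars().collect();
    let mut lines = Vec::new();
    let mut start = 0;
    while chars.len() - start > width {
        let window = &chars[start..start + width];
        // Break after the last separator in the window, else at `width`.
        let cut = window
            .iter()
            .rposition(|&c| c == '/' || c == '\\')
            .map(|i| i + 1)
            .unwrap_or(width);
        lines.push(chars[start..start + cut].iter().collect());
        start += cut;
    }
    lines.push(chars[start..].iter().collect());
    lines
}
```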
Is your feature request related to a problem? Please describe.
deduplicator --dir ~/Media
vs
deduplicator ~/Media
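To illustrate accepting both invocations, here is a hand-rolled std-only sketch. It takes the directory from `--dir PATH`, `--dir=PATH`, or the first positional argument. `resolve_dir` is hypothetical. A real CLI would more likely express this with its argument-parsing library. The sketch also assumes no other flag takes a separate value, since such a value would be mistaken for the positional directory.

```rust
use std::path::PathBuf;

/// Returns the scan directory from `--dir PATH`, `--dir=PATH`, or the first
/// positional argument; `None` if none is given (the caller could then fall
/// back to the current directory).
pub fn resolve_dir(args: &[String]) -> Option<PathBuf> {
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if let Some(value) = arg.strip_prefix("--dir=") {
            return Some(PathBuf::from(value));
        }
        if arg == "--dir" {
            return iter.next().map(PathBuf::from);
        }
        if !arg.starts_with('-') {
            return Some(PathBuf::from(arg));
        }
    }
    None
}
```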