deduplicator's Issues

[Bug] "Error: path contains invalid UTF-8 characters"

Version: deduplicator 0.2.1 compiled from git master branch.
System: Windows 10
Cmdline: deduplicator.exe --follow-links --json "c:/" > "c:\temp\deduplicator-report_C.txt"

Scanned an entire C: drive, but after running for 2h30m the scan fails with a UTF-8 error. The error gives no details about the offending file name or folder.

[00:21:59] 3232274 paths mapped   
[00:00:08] ###### 2692395/2692395 indexed files sizes  
[02:30:50] ###### 2613147/2613147 indexed files hashes   
Error: path contains invalid UTF-8 characters
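
For illustration, here is a minimal sketch of two ways a path can be turned into text without losing track of which path was the problem. Both helper names are hypothetical and are not taken from deduplicator's source; the sketch only uses the standard library's `Path::to_str`, `Path::to_string_lossy` and `Path::display`.

```rust
use std::path::Path;

/// Hypothetical helper: convert a path to printable text without failing.
/// Invalid UTF-8 sequences are replaced with U+FFFD instead of raising an error.
fn printable_path(path: &Path) -> String {
    path.to_string_lossy().into_owned()
}

/// Hypothetical helper: strict conversion that names the offending path
/// in the error message, so the user can find the file.
fn utf8_path(path: &Path) -> Result<&str, String> {
    path.to_str().ok_or_else(|| {
        format!("path contains invalid UTF-8 characters: {}", path.display())
    })
}
```

With the lossy variant the report could still be written for every file; with the strict variant the error would at least say which path triggered it.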

Panic when scanning

The scanner code contains a lot of unwrap calls, so the app panics when I run it against my home directory:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', [...]/deduplicator-0.0.3/src/scanner.rs:52:65
stack backtrace:
   0: rust_begin_unwind
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
   2: core::result::unwrap_failed
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/result.rs:1791:5
   3: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
   4: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
   5: rayon::iter::plumbing::Folder::consume_iter
   6: rayon::iter::plumbing::bridge_producer_consumer::helper
   7: <rayon::vec::IntoIter<T> as rayon::iter::IndexedParallelIterator>::with_producer
   8: rayon::iter::extend::<impl rayon::iter::ParallelExtend<T> for alloc::vec::Vec<T>>::par_extend
   9: deduplicator::scanner::duplicates
  10: deduplicator::app::App::init
  11: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  12: tokio::runtime::park::CachedParkThread::block_on
  13: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
  14: tokio::runtime::runtime::Runtime::block_on
  15: deduplicator::main

Fallible actions (especially against the filesystem) should be handled properly, most likely by skipping the affected files during the scan.
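
A minimal sketch of the "skip the affected files" approach, using only the standard library. The function name is hypothetical and this is not the actual code in scanner.rs; it only shows how a failed metadata lookup can be logged and skipped instead of unwrapped.

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical helper: return (path, size) for every path whose metadata
/// can be read. Paths that fail (deleted mid-scan, permission denied, ...)
/// are reported on stderr and skipped rather than causing a panic.
fn sizes_skipping_errors(paths: Vec<PathBuf>) -> Vec<(PathBuf, u64)> {
    paths
        .into_iter()
        .filter_map(|path| match fs::metadata(&path) {
            Ok(meta) => Some((path, meta.len())),
            Err(err) => {
                eprintln!("skipping {}: {}", path.display(), err);
                None
            }
        })
        .collect()
}
```

The same pattern should carry over to a parallel iterator, since the closure does not depend on iteration order.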

[Feature] Add Unit Tests

Is your feature request related to a problem? Please describe.
To keep fixed issues from resurfacing, unit tests need to be added for each module.

Describe the solution you'd like
Add Tests for:

  • hashing
  • file scanning
  • finding duplicates by size
  • finding duplicates by hashes
  • printing output (integration tests)
  • interactive mode (integration tests)

Describe alternatives you've considered
N/A

Additional context
As the application is going through major changes, it's essential to make sure that it doesn't break.

Crash while searching

Probably an issue with Cyrillic characters in the path.

thread 'main' panicked at 'byte index 11 is not a char boundary; it is inside 'с' (bytes 10..12) of `/фото с iPhone/IMG_6740 1.JPG`', src/output.rs:12:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
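
The panic message is what Rust produces when a string is sliced at a byte index that falls inside a multi-byte character. Assuming the slice in output.rs is meant to shorten the path for display (an assumption, the source of that file is not shown here), a sketch of a boundary-safe cut could look like this. The function name is hypothetical.

```rust
/// Hypothetical helper: cut `s` to at most `max_bytes` bytes without
/// splitting a character in the middle.
fn truncate_on_char_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // Step back until `end` is the start of a character.
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For the path from the report, cutting at byte 11 backs off to byte 10 and returns "/фото " instead of panicking.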

[feature] minimum file size

Since this application might be useful to regain disk space, I think it could be interesting to have an option to ignore files smaller than a user-defined threshold.
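
A sketch of what such a filter might look like, assuming the scanner already has each file's size available as a (path, size) pair. The function name and the `min_size` parameter are hypothetical, not an existing deduplicator option.

```rust
use std::path::PathBuf;

/// Hypothetical helper: drop every file smaller than `min_size` bytes
/// before the more expensive duplicate checks run.
fn drop_small_files(files: Vec<(PathBuf, u64)>, min_size: u64) -> Vec<(PathBuf, u64)> {
    files
        .into_iter()
        .filter(|(_, size)| *size >= min_size)
        .collect()
}
```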

[feature] to remove duplicates

I think this is a generally expected feature of this tool. It could work interactively, like "select file to delete: 1, 2, 3", with some --interactive option, or just auto-delete the first match (keeping only the last one) with --remove.

[Bug] compilation error

Describe the bug
When using the stable version of rust, cargo cannot build the application.

Runtime Info
Install Type: Install with cargo
App Version: [e.g. v1.0.6]

Expected behavior
The application installs without errors.

Platform Details (please complete the following information):

  • OS: RED OS
  • Terminal Emulator: mate-terminal
  • Shell: ZSH

Additional context
Erroneous code example:

#[repr(u128)] // error: use of unstable library feature 'repr128'
enum Foo {
    Bar(u64),
}

If you're using a stable or a beta version of rustc, you won't be able to use
any unstable features. In order to do so, please switch to a nightly version of
rustc (by using rustup).

If you're using a nightly version of rustc, just add the corresponding feature
to be able to use it:

#![feature(repr128)]

#[repr(u128)] // ok!
enum Foo {
    Bar(u64),
}

Test Deduplicator on Windows OS

I don't own or have access to a Windows machine. If anybody does, please help out by testing deduplicator on Windows.

Required Information:

  1. Benchmarks - Speed & Memory Efficiency
  2. Bug Reports

Idea for better performance

Just had an idea of how to maybe improve performance. Instead of hashing all files, how about first differentiating them by using the file size (which is unlikely to be identical for two large files that are different) and only relying on the hash when they have the same size?
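
The idea can be sketched as a pre-pass that groups files by size and discards every group with a single member, so that only the remaining candidates need to be hashed. The function name is hypothetical and the sketch assumes sizes have already been collected.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical helper: group files by size and keep only groups that
/// contain more than one file. Only these groups need to be hashed.
fn candidates_by_size(files: Vec<(PathBuf, u64)>) -> Vec<Vec<PathBuf>> {
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for (path, size) in files {
        by_size.entry(size).or_default().push(path);
    }
    // A file with a unique size cannot have a duplicate.
    by_size
        .into_values()
        .filter(|group| group.len() > 1)
        .collect()
}
```

Equal size does not prove equal content, so a hash (or byte-by-byte) comparison is still needed within each group.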

[Bug] Progress Bar for Globwalker

Describe the bug
While scanning deep directory trees, there's a small delay before the scanning files progress bar kicks in. For this period, deduplicator does not display any output. The goal is to add a ProgressBar::new_spinner() to make sure that the users can see that deduplicator is trying to traverse the directories to find files to process.

[Bug] Excessive Memory Consumption

Describe the bug
The memory usage while scanning a 127 GB directory of PDFs, Images & Videos shot up to 26 GiB from 4.8 GiB (Initial), causing the desktop manager (lightdm) to crash & restart.

Runtime Info
App Arguments: none
Install Type: cargo install
App Version: 0.0.8

Expected behavior
Reduced Memory Consumption.

Platform Details (please complete the following information):

  • OS: Arch Linux (Kernel: 5.15.86-1-lts)
  • Terminal Emulator: Kitty
  • Shell: Zshell
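
The report does not say what was holding the memory. One common precaution, shown here purely as a sketch, is to hash files in fixed-size chunks so that no file is ever loaded into memory in full. The function name is hypothetical, and FNV-1a is used only as a stand-in because it fits in a few lines; it is not the hash deduplicator uses.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper: compute a 64-bit FNV-1a hash of a file while
/// holding at most one 64 KiB chunk in memory at a time.
fn fnv1a_file(path: &Path) -> io::Result<u64> {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;

    let mut file = File::open(path)?;
    let mut buf = [0u8; 64 * 1024];
    let mut hash = OFFSET_BASIS;
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // end of file
        }
        for &byte in &buf[..n] {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(PRIME);
        }
    }
    Ok(hash)
}
```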

[Bug] Output Printing Slow

Describe the bug
After the scanning is complete, the app hangs for a second before printing the output. This is clearer with large directories.

Runtime Info
Install Type: [e.g. cargo install]
App Version: [e.g. v0.1.1]

Expected behavior
Printing should be fast

Platform Details (please complete the following information):

  • OS: Arch Linux
  • Terminal Emulator: Kitty
  • Shell: Bash

Error: unable to open database file (code 14)

OS: Windows 11 Enterprise
OS version: 22000.1335

When I try to run the app, I get the error "Error: unable to open database file (code 14)".

Example:

cargo run --release -- --dir=test_data
warning: unused imports: `Frame`, `Rect`
 --> src\app\ui.rs:5:45
  |
5 |     layout::{Constraint, Direction, Layout, Rect},
  |                                             ^^^^
...
9 |     Frame, Terminal,
  |     ^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: unused import: `std::thread`
  --> src\app\mod.rs:14:5
   |
14 | use std::thread;
   |     ^^^^^^^^^^^

warning: unused import: `std::time::Duration`
  --> src\app\mod.rs:15:5
   |
15 | use std::time::Duration;
   |     ^^^^^^^^^^^^^^^^^^^

warning: unused imports: `Block`, `Borders`, `Widget`
  --> src\app\mod.rs:18:15
   |
18 |     widgets::{Block, Borders, Widget},
   |               ^^^^^  ^^^^^^^  ^^^^^^

warning: unused import: `Backend`
 --> src\app\ui.rs:4:15
  |
4 |     backend::{Backend, CrosstermBackend},
  |               ^^^^^^^

warning: associated function `cleanup` is never used
  --> src\app\mod.rs:39:8
   |
39 |     fn cleanup(term: &mut Terminal<CrosstermBackend<io::Stdout>>) -> Result<()> {
   |        ^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: associated function `render_cycle` is never used
  --> src\app\mod.rs:51:8
   |
51 |     fn render_cycle(term: &mut Terminal<CrosstermBackend<io::Stdout>>) -> Result<()> {
   |        ^^^^^^^^^^^^

warning: associated function `init_render_loop` is never used
  --> src\app\mod.rs:58:8
   |
58 |     fn init_render_loop(term: &mut Terminal<CrosstermBackend<io::Stdout>>) -> Result<()> {
   |        ^^^^^^^^^^^^^^^^

warning: associated function `init_terminal` is never used
  --> src\app\mod.rs:69:8
   |
69 |     fn init_terminal() -> Result<Terminal<CrosstermBackend<io::Stdout>>> {
   |        ^^^^^^^^^^^^^

warning: struct `EventHandler` is never constructed
 --> src\app\event_handler.rs:7:12
  |
7 | pub struct EventHandler;
  |            ^^^^^^^^^^^^

warning: associated function `init` is never used
  --> src\app\event_handler.rs:10:12
   |
10 |     pub fn init() -> Result<events::Event> {
   |            ^^^^

warning: associated function `handle_keypress` is never used
  --> src\app\event_handler.rs:21:8
   |
21 |     fn handle_keypress(keyevent: KeyEvent) -> Result<events::Event> {
   |        ^^^^^^^^^^^^^^^

warning: enum `Event` is never used
 --> src\app\events.rs:1:10
  |
1 | pub enum Event {
  |          ^^^^^

warning: struct `Ui` is never constructed
  --> src\app\ui.rs:12:12
   |
12 | pub struct Ui;
   |            ^^

warning: associated function `generate_file_list` is never used
  --> src\app\ui.rs:15:8
   |
15 |     fn generate_file_list() -> impl Widget {
   |        ^^^^^^^^^^^^^^^^^^

warning: associated function `generate_info_bar` is never used
  --> src\app\ui.rs:27:8
   |
27 |     fn generate_info_bar() -> impl Widget {
   |        ^^^^^^^^^^^^^^^^^

warning: associated function `generate_file_desc` is never used
  --> src\app\ui.rs:31:8
   |
31 |     fn generate_file_desc() -> impl Widget {
   |        ^^^^^^^^^^^^^^^^^^

warning: associated function `render_frame` is never used
  --> src\app\ui.rs:35:12
   |
35 |     pub fn render_frame(term: &mut Terminal<CrosstermBackend<io::Stdout>>) -> Result<()> {
   |            ^^^^^^^^^^^^

warning: `deduplicator` (bin "deduplicator") generated 18 warnings
    Finished release [optimized] target(s) in 0.30s
     Running `target\release\deduplicator.exe --dir=test_data`
Error: unable to open database file (code 14)
error: process didn't exit successfully: `target\release\deduplicator.exe --dir=test_data` (exit code: 1)

I tried installing the app via cargo and also building it from source, but I got the same error both times.

How can I fix it?

[Feature] Add Pre-Built Binary Download

Is your feature request related to a problem? Please describe.
Currently, deduplicator can only be installed via cargo (Rust's build tool). Pre-built binary downloads are needed to make deduplicator easily accessible to more people.

Describe the solution you'd like
Create workflows to cross compile binaries for the following platforms

  • x86_64
  • AArch64

Describe alternatives you've considered

  • Distributing through Linux package repositories (plans for the future)

Additional context
N/A

[Bug] --dir autocomplete not working on zsh

Describe the bug
#29 added path autocomplete for the --dir option. The autocomplete works in bash but not in zsh.

Runtime Info
App Arguments: --dir
Install Type: cargo install && cargo run
App Version: 0.1.1

Expected behavior
Autocomplete should work for --dir on zsh & other shells

Platform Details (please complete the following information):

  • OS: Arch Linux
  • Terminal Emulator: Kitty
  • Shell: Zsh

[Feature] Add Flag to Exclude Filetypes

Is your feature request related to a problem? Please describe.
If I want to exclude a single file type from the duplicate search, it is very difficult to do.

Describe the solution you'd like
Add an --exclude-types / -x flag to exclude certain file types from being scanned by deduplicator
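
A sketch of how the check behind such a flag might work, assuming file types are identified by extension and compared case-insensitively. Both are assumptions, since the request does not specify either, and the function name is hypothetical.

```rust
use std::path::Path;

/// Hypothetical helper: true if the path's extension matches one of the
/// excluded types. Comparison ignores ASCII case (an assumption).
fn is_excluded(path: &Path, excluded: &[&str]) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| excluded.iter().any(|x| x.eq_ignore_ascii_case(ext)))
        .unwrap_or(false) // no extension, or not valid UTF-8: keep the file
}
```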

[Feature] Mass Processing Options --keep-latest --keep-oldest

Is your feature request related to a problem? Please describe.
The Interactive mode allows the deletion of files one duplicate group at a time. When working with millions of files, this can be tedious.

Describe the solution you'd like
To automate this process when using deduplicator in scripts, options like --keep-latest and --keep-oldest can help.
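
As a sketch of the selection logic, assuming "latest" and "oldest" refer to each file's modification time (the request does not say which timestamp is meant). The `Keep` enum and `split_group` function are hypothetical names; the sketch only decides which files would be deleted and does not touch the filesystem.

```rust
use std::path::PathBuf;
use std::time::SystemTime;

/// Hypothetical policy corresponding to --keep-latest / --keep-oldest.
#[derive(Debug, Clone, Copy)]
enum Keep {
    Latest,
    Oldest,
}

/// Splits one duplicate group into (file to keep, files to delete).
/// Returns None for an empty group.
fn split_group(
    mut group: Vec<(PathBuf, SystemTime)>,
    keep: Keep,
) -> Option<(PathBuf, Vec<PathBuf>)> {
    if group.is_empty() {
        return None;
    }
    // Sort oldest first by modification time.
    group.sort_by_key(|(_, modified)| *modified);
    let (kept, _) = match keep {
        Keep::Oldest => group.remove(0),
        Keep::Latest => group.pop()?,
    };
    let to_delete = group.into_iter().map(|(path, _)| path).collect();
    Some((kept, to_delete))
}
```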

Describe alternatives you've considered
Adding custom config files with a DSL that decides which files to keep (idea for the future)

[Feature] Adding new flag to show full file path

Is your feature request related to a problem? Please describe.
I have a clear, deep folder structure, and some files have long names. It is also important for me to see where the duplicates are located.

Describe the solution you'd like
I suggest adding a new flag that shows the full path to each file.

Moreover, I use a laptop with a Full HD display, and some paths are too long to be shown properly. So I think it would be a good idea to add the ability to wrap a long line onto the next line.
