evrignaud / fim Goto Github PK

File Integrity Manager -

Home Page: https://evrignaud.github.io/fim

License: GNU General Public License v3.0

Shell 2.18% Java 96.57% Batchfile 0.60% CSS 0.65%

integrity hash sha-512 selinux deduplicate linux windows files hardware-corruption-detection docker-image

fim's People

yoimbert flopraden jokerland jeremie-lesage darkmatter26 drautureau raiden9 bruno-a ptemplier msmtmsmt123 dr-chaos tool-recommender-bot crasm image-et tianjiayue olivierpaul

fim's Issues

Deduplicate files not only based on hash comparison

You need to compare file sizes first, before comparing hashes, in order to reduce the risk of hash collisions.
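A minimal sketch of the idea, assuming duplicates are detected from a plain list of file paths. The function names `sha512_of` and `find_duplicates` are hypothetical and are not part of fim. Files are grouped by size first, and only files that share a size are hashed:

```python
import hashlib
import os
from collections import defaultdict


def sha512_of(path, chunk_size=1024 * 1024):
    """Return the SHA-512 hex digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group files by size first, then by hash within each size group."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    duplicate_sets = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[sha512_of(path)].append(path)
        duplicate_sets.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return duplicate_sets
```

Two files can only be reported as duplicates when both size and hash agree, and files with a unique size are never read at all.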

Not possible to run or install fim-1.2.3 in new system

When using the 1.2.3 distribution:

$ /usr/local/bin/fim/fim fdup
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.rits.cloning.Cloner (file:/usr/local/bin/fim/bin/fim-1.2.3.jar) to field java.util.TreeSet.m
WARNING: Please consider reporting this to the maintainers of com.rits.cloning.Cloner
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" java.lang.UnsupportedOperationException: Path not associated with default file system.
        at java.base/java.nio.file.Path.toFile(Path.java:771)
        at org.fim.internal.SettingsManager.load(SettingsManager.java:62)
        at org.fim.internal.SettingsManager.<init>(SettingsManager.java:49)
        at org.fim.command.AbstractCommand.checkHashMode(AbstractCommand.java:51)
        at org.fim.command.FindDuplicatesCommand.execute(FindDuplicatesCommand.java:48)
        at org.fim.Fim.run(Fim.java:258)
        at org.fim.Fim.main(Fim.java:75)

When trying to install from sources:

[INFO] Copying file images/hands-on-little.png
Jul 09, 2020 9:15:02 PM org.asciidoctor.internal.JRubyAsciidoctor renderFile
SEVERE: (RuntimeError) asciidoctor: FAILED: /home/lupe/recolhidos/fim-1.2.3/src/main/asciidoc/slides/fr.adoc: Failed to load AsciiDoc document - Could not find a converter to handle transform: inline_quoted
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.690 s
[INFO] Finished at: 2020-07-09T21:15:02+01:00
[INFO] Final Memory: 49M/118M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.asciidoctor:asciidoctor-maven-plugin:1.5.5:process-asciidoc (generate-slides) on project fim: Execution generate-slides of goal org.asciidoctor:asciidoctor-maven-plugin:1.5.5:process-asciidoc failed: org.jruby.exceptions.RaiseException: (RuntimeError) asciidoctor: FAILED: /home/lupe/recolhidos/fim-1.2.3/src/main/asciidoc/slides/fr.adoc: Failed to load AsciiDoc document - Could not find a converter to handle transform: inline_quoted -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException

As I don't know Java or maven, I'd appreciate some step-by-step help on how to solve this issue.
I'm using OpenJDK 11 on 64-bit Linux.

build: fimDoesNotExist(org.fim.FimTest): Unexpected exception

Hello,

I wanted to build Fim myself, not that I can maintain a fork, but in order to use a build that resolves issue #9.
Anyway, it doesn't work and I don't know what is wrong.
Thank you for your help!

Here is the end of the output, but I can send more information.

- State #1: 2017/06/03 21:57:28 (10 files - 121,4 MB - generated using hash mode hashAll)
	Comment: Using hash mode hashAll

Added:            file01
Added:            file02
Added:            file03
Added:            file04
Added:            file05
Added:            file06
Added:            file07
Added:            file08
Added:            file09
Added:            file10

10 added

- State #2: 2017/06/03 21:57:50 (20 files - 230,3 MB - generated using hash mode hashSmallBlock)
	Comment: Using hash mode hashSmallBlock

Added:            .fimignore
Added:            dir01/.fimignore
Added:            dir01/ignored_type1
Added:            file12
Added:            ignored_type2
Added:            media.mp4
Copied:           file11 	(was file05)
Duplicated:       file03.dup1 = file03 
Duplicated:       file03.dup2 = file03 
Duplicated:       file07.dup1 = file07 
Date modified:    file02 	last modified: 2017/06/03 21:57:27 -> 2017/06/03 21:57:30
Content modified: file04 	last modified: 2017/06/03 21:57:27 -> 2017/06/03 21:57:29
Content modified: file05 	last modified: 2017/06/03 21:57:28 -> 2017/06/03 21:57:30
Renamed:          file01 -> dir01/file01 
Deleted:          file06

6 added, 1 copied, 3 duplicated, 1 date modified, 2 content modified, 1 renamed, 1 deleted

- State #3: 2017/06/03 21:57:53 (20 files - 234,3 MB - generated from dir01 using hash mode hashSmallBlock)
	Comment: Using hash mode hashSmallBlock

Added:            dir01/file15

1 added

- State #4: 2017/06/03 21:57:54 (22 files - 267,3 MB - generated using hash mode hashSmallBlock)
	Comment: From from sub directory dir01

Added:            file13
Added:            file14

2 added

2017/06/03 21:57:57 - Info  - Searching for duplicate files

2017/06/03 21:57:57 - Info  - Scanning recursively local files, using 'full' mode and automatic scaling
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
OO
2017/06/03 21:57:59 - Info  - Scanned 22 files (267,3 MB), using 2 threads, hashed 267,3 MB (avg 267,3 MB/s), during 00:00:01

- Duplicate set #1: duplicated 2 times, 11,4 MB each, 22,8 MB of wasted space
      file03
  [-] file03.dup1
  [-] file03.dup2

- Duplicate set #2: duplicated 1 time, 12,6 MB each, 12,6 MB of wasted space
      file07
  [-] file07.dup1

Removed 3 files and freed 35,4 MB
No duplicate file remains
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 61.791 sec

Results :

Tests in error: 
  fimDoesNotExist(org.fim.FimTest): Unexpected exception, expected<org.fim.command.exception.BadFimUsageException> but was<org.fim.command.exception.RepositoryException>

Tests run: 249, Failures: 0, Errors: 1, Skipped: 6

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:50 min
[INFO] Finished at: 2017-06-03T21:57:59+02:00
[INFO] Final Memory: 69M/580M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.9:test (default-test) on project fim: There are test failures.
[ERROR] 
[...]

Provide a new starting shell script for Mac OS X

As the "readlink" command on this platform is not the GNU version, we have to replace it with "greadlink".
The content of the script should be:

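# Remember the directory the script was started from
currentDir=$(pwd)

# Resolve the real location of this script; greadlink is the GNU readlink
baseDir=$(dirname "$(greadlink -f "$0")")

JAVA_OPTIONS="-Xmx4g -XX:MaxMetaspaceSize=4g"

# Pick the application jar, skipping the sources jar
JAR_FILE=$(ls -1 "${baseDir}/target"/fim-*.jar | grep -v sources)

# Forward all command-line arguments unchanged
java ${JAVA_OPTIONS} -jar "${JAR_FILE}" "$@"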

fim and metadata changes

I manage my image collection with Digikam, and I often modify image metadata:

because my taxonomy evolves
because of the progresses of face recognition
and so on.

Would it be possible to let fim hash only the part of the image file that holds the image data, and not the part that holds the metadata? This request is relevant for many file formats such as JPEG, TIFF, MP3, etc.

I have no idea whether it's possible, but I think some file formats probably have one part reserved for content and another for metadata.

Thanks for the good job.

Provide a CSV output for duplicates

Hello,

Currently, the text output of the fdup sub-command is not easily usable by other tools. It would be useful to provide other formats using a dedicated switch (e.g. --output-type csv). CSV is a good candidate. Columns could be:

set #
number of duplicates in that set
size of the duplicated file
total size of wasted space in the set
full path of the file
directory where the file is located
filename (including file extension)
filetype (= its extension)
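The columns listed above could be produced along these lines. This is only a sketch: the column names, the `duplicates_to_csv` function and the input shape (a list of `(size, paths)` tuples) are assumptions for illustration, not an existing fim feature:

```python
import csv
import io
import os

COLUMNS = ["set", "duplicates", "size_bytes", "wasted_bytes",
           "path", "directory", "filename", "filetype"]


def duplicates_to_csv(duplicate_sets):
    """Render duplicate sets as CSV text, one row per file.

    duplicate_sets: list of (size_in_bytes, [paths]) tuples; the first
    path of each set is treated as the original.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for number, (size, paths) in enumerate(duplicate_sets, start=1):
        copies = len(paths) - 1  # how many times the original is duplicated
        for path in paths:
            directory, filename = os.path.split(path)
            filetype = os.path.splitext(filename)[1].lstrip(".")
            writer.writerow([number, copies, size, size * copies,
                             path, directory, filename, filetype])
    return out.getvalue()
```

Output in this shape can be piped straight into tools such as sort or a spreadsheet.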

Thank you.

fim rdup -y ; Ask me to choose the file I want to keep

I tried under Linux and Windows, and the remove duplicates command "fim rdup -y" asks me which file I want to remove despite the --always-yes option. Is this normal? Did I misunderstand the documentation?

Check the second block

To ensure that file headers don't increase the collision probability during a rapid check, it would be more effective to hash the second 1 MB / 4 KB block of the file rather than the first.
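A small sketch of what this could look like, assuming a 4 KB block and a fallback to the first block for short files. The `hash_block` function and its parameters are hypothetical and do not describe fim's actual block selection:

```python
import hashlib


def hash_block(path, block_index=1, block_size=4096):
    """Hash one block of a file; block_index=1 is the second block.

    Falls back to the first block when the file is too short to
    contain the requested one.
    """
    with open(path, "rb") as handle:
        handle.seek(block_index * block_size)
        block = handle.read(block_size)
        if not block:  # file ends before the requested block
            handle.seek(0)
            block = handle.read(block_size)
    return hashlib.sha512(block).hexdigest()
```

Two files that share an identical header but differ afterwards get the same hash for block 0 and different hashes for block 1.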

Used space is wrong in "fim log"

Hi,

I removed several files, but the used space is not updated in the corresponding commit.

State 1 250 MB
State 2 252 MB
State 3 163 MB

However, I removed the files in State 2, so the used space should have been updated in State 2, not in State 3.

λ fim log -q
- State #1: 2018/05/03 22:34:38 (1801 files - 250.3 MB - generated using hash mode hashMediumBlock)

1801 added

- State #2: 2018/05/03 23:20:01 (1950 files - 252.6 MB - generated using hash mode hashAll)

149 added, 1768 content modified, 33 deleted

- State #3: 2018/05/05 21:47:20 (1935 files - 163.5 MB - generated using hash mode hashMediumBlock)

18 added, 2 content modified, 2 deleted

Note: the two deleted files in state 3 are small.

.fimignore ignored with last option (-l)

Hello,
Maybe a little bug?
With the command fim fdup -l, .fimignore isn't taken into account.
With the command fim fdup, .fimignore is used.
And a small difficulty with Windows:
To create .fimignore, I have to enter ".fimignore.{space}".
It took me some time to find that.
Is it possible to access the JSON database directly?
Thank you, very, very useful program.

git push/pull kind of interface

I like your idea and it's a very well cared repo with a good Readme section - thank you evrignaud for your work.

I would like to see a git push/pull kind of interface inside fim, in order to synchronise between repos.

I'm facing this problem:

I have 3 HD: A, B and C
The content of A and B is backed up on C
I want to keep C synchronised and be sure about data integrity, ideally with one or two commands!

It would be cool to have a git push/pull kind of interface.
Maybe, a UUID or network path could be used as an identifier:
On backup:

fim remote add workspace uuid://13152fae-d25a-4d78-b318-74397eb08184

On the workstation:

fim remote add backup smb://home-nas/backup/A

And then run:

fim push backup
fim pull origin

How could this work?

fim remote COMMAND [ALIAS] [LOCATION]

COMMAND
add [ALIAS] [LOCATION]
It will check location for a .fim folder
If so use it, otherwise initialise a new work space

delete [ALIAS]
Delete saved locations

set [ALIAS] [LOCATION]
Change location of ALIAS

ALIAS
An alias name for a repo

LOCATION
ftp://, smb://, etc for network location
uuid:// for a disk location (generally, uuid for linux fs, sn for ntfs)
why use uuid? because drives can be mounted in different locations
using the mount point will create the need for the drive to be in the correct position each time

fim remote add backup smb://home-nas/backup/A

fim push/pull [SOURCE][/SUBDIR] [ORIGIN][/SUBDIR]

push/pull will:

load the manifest/log of the two locations
check that the two locations share a common history, with the source at or ahead of the latest commit of the destination
update the destination with data from the source, passing the whole commit history plus only the files that changed
if the two locations have diverged into different branches, stop and inform the user about the error (for example with a list of the files that differ)
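The history check in the steps above can be sketched as follows, assuming each location's log is available as an ordered list of commit identifiers. The function name and the list representation are hypothetical; fim has no such command today:

```python
def commits_to_transfer(source_history, destination_history):
    """Return the commits the destination is missing.

    Raises ValueError when the destination history is not a prefix of
    the source history, which covers both diverged histories and a
    destination that is ahead of the source.
    """
    shared = len(destination_history)
    if source_history[:shared] != destination_history:
        raise ValueError("histories diverged; refusing to synchronise")
    return source_history[shared:]
```

An empty result means the two locations are already synchronised, so a push or pull would have nothing to copy.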

push and pull are the same command, although they work in opposite directions.
push from SOURCE to ORIGIN
pull from ORIGIN to SOURCE

SOURCE and ORIGIN are ALIAS
If SOURCE is omitted, it will default to the current work space
ORIGIN cannot be omitted. This differs from Git, but because there is no rollback possibility, it ensures the user actually wants to perform the operation

SUBDIR
It can be used to restrict the operation to a given sub-directory of the repo

fim pull server/subfolder1

If omitted, it defaults to the current position relative to the repo root

Other use case:

A media production company has a server at smb://server/media-root

A new employee, Mr. Smith, starts working with the company.
He wants to get a subfolder of media-root into a local folder such as %USERPROFILE%\My Documents\WorkingDir or ~/Documents/WorkingDir:

Smith ~/Documents/WorkingDir $ fim init
Smith ~/Documents/WorkingDir $ fim remote add server smb://server/media-root
Smith ~/Documents/WorkingDir $ fim pull server/subfolder1

Mr Smith goes to his assignment

Smith ~/Documents/WorkingDir $ cd subfolder1

When finished he will commit his work:

Smith ~/Documents/WorkingDir/subfolder1 $ fim ci -m "My job done"

Check that his work will not conflict with others

Smith ~/Documents/WorkingDir/subfolder1 $ fim pull server

Because of the defaulting rules for omitted arguments, this actually translates to
fim pull localrepo/subfolder1 server/subfolder1

Push the data to the server

Smith ~/Documents/WorkingDir/subfolder1 $ fim push server

And it is really cool now because fim has taken care that everything is in place and will not conflict with the work of Mr. Smith's colleagues!
At the same time, Mr. Smith's office will know which files he has used and why!

How to install it from `git clone`?

Hi,

I don't know the Java ecosystem. I did a git clone of this repository. When I tried to run fim:

$ ./fim
ls: cannot access '/wd0/home/lpvm/recol/fim/target/fim-*.jar': No such file or directory
Error: Unable to access jarfile

$ which java
/sbin/java

$ java -version
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

What should I do?

Ignore sub-folder in the root .fimignore

Hello,

It would be great to be able to ignore a sub-folder directly in the .fimignore at the root of the repository/directory you are monitoring.

Currently, to ignore, let's say, foo/sub/ you must create a .fimignore file in foo/ and write sub inside it. Would it be possible to allow foo/sub/ in the "root" .fimignore, so that a single ignore file is enough for the whole repository?
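A rough sketch of how root-level patterns such as foo/sub/ could be matched, using Python's `fnmatch` as a stand-in for fim's own pattern syntax. The `is_ignored` helper is hypothetical, and note that `*` in `fnmatch` also matches `/`, which may differ from fim's behaviour:

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_ignored(relative_path, patterns):
    """True if the path, or any directory above it, matches a pattern.

    relative_path is given relative to the repository root, with '/'
    as separator.
    """
    path = PurePosixPath(relative_path)
    candidates = [str(p) for p in (path, *path.parents) if str(p) != "."]
    for pattern in patterns:
        pattern = pattern.rstrip("/")  # "foo/sub/" and "foo/sub" are equivalent
        if any(fnmatch(candidate, pattern) for candidate in candidates):
            return True
    return False
```

Checking every parent directory is what makes a single entry in the root file cover everything below foo/sub.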

Thanks for reading,
N4th4

Exception in thread "main" java.lang.IllegalStateException

Hello,

I run Fim version 1.2.2 on Ubuntu 16.04.2 LTS with java -version

openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

I had this error:

fim st -n
2017/03/26 17:39:18 - Info  - Scanning recursively local files, using 'do not hash' mode and automatic scaling
2017/03/26 17:39:40 - Info  - Scanned 274436 files (1 TB), during 00:00:21

Exception in thread "main" java.lang.IllegalStateException: Duplicated entries: Size=273632, MapSize=273585
	at org.fim.util.FileStateUtil.buildHashCodeMap(FileStateUtil.java:52)
	at org.fim.internal.StateComparator.searchForAddedOrModified(StateComparator.java:191)
	at org.fim.internal.StateComparator.compare(StateComparator.java:162)
	at org.fim.command.StatusCommand.execute(StatusCommand.java:56)
	at org.fim.Fim.run(Fim.java:258)
	at org.fim.Fim.main(Fim.java:75)

Another strange thing is that the directory has a size of 1.7 TB, which is larger than the 1 TB displayed.

Sort out duplicate results

Hello,

Currently duplicate results are

sorted by wasted size in decreasing order
in a format not easily manipulable with standard tools like the unix sort command.

Is it possible to add a --sort option to the fdup sub-command with a parameter that will sort results by

number of files in the duplicated set
size of individual file in a set
filepath similarity
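A possible shape for such a sort option, sketched under the assumption that each duplicate set is a dict with a `size` and a list of `paths`. The key names are hypothetical and "filepath similarity" is left out because the request does not define it:

```python
SORT_KEYS = {
    # wasted space: every copy beyond the first one
    "wasted": lambda s: s["size"] * (len(s["paths"]) - 1),
    # number of files in the set
    "count": lambda s: len(s["paths"]),
    # size of an individual file in the set
    "size": lambda s: s["size"],
}


def sort_duplicate_sets(duplicate_sets, sort_by="wasted", descending=True):
    """Return the duplicate sets ordered by the chosen criterion."""
    return sorted(duplicate_sets, key=SORT_KEYS[sort_by], reverse=descending)
```

The default reproduces the current ordering (wasted size, decreasing), so the option would not change existing behaviour unless requested.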

Thank you

Detect unexpected changes (AKA "hardware corruption")

fim is good at finding which file changed in a normal way.

It would be nice to have a way to find changes most likely caused by a hardware corruption or a filesystem bug: change in content, but not in any of the timestamps (ctime, mtime). This could be done by filtering fim diff to only show changes to files having unmodified timestamp since last commit.
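The proposed filter can be sketched like this, assuming each state is available as a mapping from path to a `(mtime, ctime, content_hash)` tuple. The data layout and the `suspicious_changes` name are assumptions for illustration and do not reflect fim's internal state format:

```python
def suspicious_changes(previous, current):
    """Return paths whose content hash changed while timestamps did not.

    previous/current: dict mapping path -> (mtime, ctime, content_hash).
    """
    suspects = []
    for path, (mtime, ctime, digest) in current.items():
        old = previous.get(path)
        if old is None:
            continue  # newly added file, nothing to compare against
        old_mtime, old_ctime, old_digest = old
        if digest != old_digest and (mtime, ctime) == (old_mtime, old_ctime):
            suspects.append(path)
    return sorted(suspects)
```

A normal edit updates at least one timestamp, so only changes that bypassed the usual write path end up in the result.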

rdup -M inside a fim repo

Hello,

For a lot of reasons, I have some legitimate duplicates inside my fim repository, so running rdup on the whole repository is not an option. I would like to run it against two specific directories. Furthermore, rdup keeps the first entry, which is sometimes not the right one to keep.

I would like to be able to use the -M option of the rdup sub-command inside a sub(-sub-sub) directory of a fim repository.

E.g. based on your documentation terminology (section 10.2), I would like to do the following:

$ pwd
source #contains the .fim directory
$ cd some/sub/dir
$ pwd
source/some/sub/dir
$ fim rdup -M ../../other/sub/dir
[...]

Could be very useful
Thank you.

How do you ignore or remove a file or directory after it's already been added?

I initialized a fim repository and added everything. I'm trying to specifically ignore the file keys/pwdb.kdbx and the directory notebook/todo, and even though I have an entry for them in my .fimignore, they still show up in the status.

With git, it'd be git rm --cached. Is there an equivalent for fim, or do I need to reinitialize the repository and start over?

Details

Data structure

/data
├── archives
...
├── keys
│   └── pwdb.kdbx
...
├── notebook
│   ├── misc
│   └── todo
...

.fimignore

# Wildcard directories
**/.stversions
**/.tmsu

# Wildcard files
**/*.kdbx.lock

# Specific files and directories
# (also tried /data/...)
keys/pwdb.kdbx
notebook/todo

fim st -n

2017/08/11 22:31:16 - Info  - Scanning recursively local files, using 'do not hash' mode and automatic scaling                                                
2017/08/11 22:31:19 - Info  - Scanned 37277 files (69 GB), using 1 thread, during 00:00:02                                                                    

Comparing with the last committed state from 2017/08/10 17:07:55               
Comment: Fix up permissions and set up syncthing                               

Added:            notebook/misc/howto/make-goo-gone.txt                        
Added:            tmp/.stfolder        
Content modified: .fimignore    last modified: 2017/08/09 21:18:50 -> 2017/08/11 22:31:12                                                                     
Content modified: keys/pwdb.kdbx        PosixFilePermissions: rw-rw-r-- -> rw-r--r--                                                                          
                                        last modified: 2017/08/08 00:13:38 -> 2017/08/11 08:33:48                                                             
Content modified: notebook/todo/todo.txt        last modified: 2017/08/10 14:47:41 -> 2017/08/11 11:33:50                                                     

2 added, 3 content modified

Add par2 infos to be able to repair files that are damaged

Once the fim dcor command has detected hardware corruption, it would be nice to be able to repair the damaged files using par2 files.
More details here, in French: http://linuxfr.org/news/parchive-les-pr%C3%A9mices-dune-norme

[Question] How to save the report generated during a commit

Hello, thanks for this good software.

Maybe I use it poorly, but I want to save the report generated during the commit.
For example

Added: test_1.txt
Added: test_2.txt
Content modified: test_3.txt

It seems that the log command doesn't return it. Only this message appears:
2 added, 1 content modified

My current, poor workaround: run the commit command once, get the report, cancel the commit, then run the commit again with the report pasted into the comment.

[Question] Is it possible to 'cancel' then 'restart' a fim init without restarting from 0?

Hello!

My question is related to fim init taking nearly 3 hours to complete as seen in the documentation.

Scenario

The fim init is taking up a lot of CPU power and slowing down the user's computer. I want to 'pause' the scanning and resume later.

My Testing

When I perform a test and cancel the fim init, I am not able to 'restart' the init without removing the .fim directory.

Question

1.) Is it possible to 'pause' the initial fim init indexing and resume at a later time?

Thanks

Cannot create a fim repository on a CIFS share

When attempting to create a fim repository, I get the error message "Not allowed to create the 'X:.fim' directory that holds the Fim repository"
This happens when accessing a CIFS share located on a Synology NAS with an admin read/write account.

Error deleting files

fim finds duplicates, but cannot remove them, probably because of 'strange' (accented - bad UTF) characters in file or directory names.
I include one of those files for your tests.

https://www.dropbox.com/s/rg7zs9oqvjniz2d/Audiolivro%20-%20Livro%20-%20Curso%20de%20Orat%C3%B3ria%20-%2009%20A%20Import%C3%A2ncia%20da%20Palavra%20Di%C3%A1ria.mp3?dl=0

Filter out duplicate results

Hello,

Using the fdup sub-command, it would be nice to have an option to filter results to exclude some directories/filetype inside the repository. The --exclude / --include options could use ant-like patterns to define paths, separated by : for example.
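A hedged sketch of such a filter, with patterns separated by `:` as suggested. Python's `fnmatch` is used here in place of ant-like patterns, so `**` has no special meaning and `*` also matches `/`. The `filter_paths` function and its parameters are hypothetical:

```python
from fnmatch import fnmatch


def filter_paths(paths, include="*", exclude=""):
    """Keep paths matching any include pattern and no exclude pattern.

    include/exclude are ':'-separated lists of glob patterns.
    """
    includes = [p for p in include.split(":") if p]
    excludes = [p for p in exclude.split(":") if p]
    return [
        path for path in paths
        if any(fnmatch(path, p) for p in includes)
        and not any(fnmatch(path, p) for p in excludes)
    ]
```

Applied to the paths of each duplicate set before printing, a set would simply be dropped once fewer than two of its files remain.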

Thank you

fdup - show files with same name but different content.

Issue created from #20

Add the option to record SELinux labels on files

SELinux labels files so that they are accessible to certain processes and not to others.
As files can contain sensitive information and are already checked for tampering (integrity), metadata such as the SELinux label is also worth keeping an eye on.
