
yahb's Introduction

YAHB - Yet Another Hardlink-based Backup-tool

YAHB is a deduplicating file copy tool, intended for backup use. Deduplication works on the file-level with NTFS hardlinks.

Download & Installation

The latest release is available HERE.

YAHB is also available via winget. Simply open a command prompt and run

winget install asdfjkl.YAHB

Example

Consider the following scenario: You have a folder

C:\MyFiles

for which you want to create backups. Assume for simplicity that the folder contains only two files:

C:\MyFiles\movie.avi (huge 600 MB movie file)
C:\MyFiles\todo.txt  (your todo-list, few kilobytes)

The large movie.avi doesn't change. Your todo.txt changes almost daily, but it's only a very small file. Let's further assume it's March 1st, 2019, the current time is 15:12, and you are creating a backup with YAHB to F:\Backup. YAHB will then simply copy C:\MyFiles as follows:

F:\Backup\201903011512\C__\MyFiles\movie.avi
F:\Backup\201903011512\C__\MyFiles\todo.txt
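The folder naming in this example can be expressed as a small path mapping. The Python sketch below assumes the layout seen in the example paths: a 12-digit YYYYMMDDHHMM timestamp folder, then the drive letter followed by two underscores (C__ for C:), then the original path. The function name backup_path is hypothetical and not part of YAHB.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def backup_path(target_root: str, source_file: str, when: datetime) -> PureWindowsPath:
    """Map a source path to its place inside a timestamped backup folder.

    Hypothetical helper: the layout <target>/<YYYYMMDDHHMM>/<drive>__/<path>
    is inferred from the README's example paths, not from YAHB's source.
    """
    src = PureWindowsPath(source_file)
    drive_letter = src.drive.rstrip(":")     # "C:" -> "C"
    stamp = when.strftime("%Y%m%d%H%M")       # minute resolution, e.g. 201903011512
    relative = src.relative_to(src.anchor)    # drop the "C:\" anchor
    return PureWindowsPath(target_root) / stamp / f"{drive_letter}__" / relative
```

For the first backup above, backup_path(r"F:\Backup", r"C:\MyFiles\movie.avi", datetime(2019, 3, 1, 15, 12)) yields F:\Backup\201903011512\C__\MyFiles\movie.avi.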

Suppose the next day (March 2nd, same time) you want to create another backup to the same location. The file todo.txt has changed in between, but the file movie.avi has not. YAHB will locate the most recent previous backup folder and identify the files that changed and those that didn't. Running YAHB again will result in the following backup:

F:\Backup\201903021512\C__\MyFiles\movie.avi -> hardlink to F:\Backup\201903011512\C__\MyFiles\movie.avi
F:\Backup\201903021512\C__\MyFiles\todo.txt

The folder F:\Backup\201903021512 now only takes a few kilobytes, instead of 600 MB, since movie.avi is only stored once on the drive F:, but two NTFS hardlinks are pointing to it.
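The per-file decision described above can be sketched as follows. This is a minimal Python illustration, not YAHB's implementation: it assumes a file counts as unchanged when its size and modification time match the copy in the previous backup (the README does not state YAHB's exact criterion), and the names backup_file and demo_incremental are hypothetical.

```python
import os
import shutil
import tempfile
from typing import Optional


def backup_file(source: str, previous: Optional[str], target: str) -> str:
    """Hardlink target to previous if source is unchanged, else copy source.

    Assumption: "unchanged" means same size and same modification time
    (whole seconds). Returns "link" or "copy".
    """
    os.makedirs(os.path.dirname(target), exist_ok=True)
    if previous is not None and os.path.exists(previous):
        s, p = os.stat(source), os.stat(previous)
        if s.st_size == p.st_size and int(s.st_mtime) == int(p.st_mtime):
            os.link(previous, target)   # new directory entry, no new file data
            return "link"
    shutil.copy2(source, target)        # copy2 also copies the modification time
    return "copy"


def demo_incremental() -> dict:
    """Replay the movie.avi / todo.txt example in a temporary directory."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "MyFiles")
        os.makedirs(src)
        for name, data in (("movie.avi", b"x" * 4096), ("todo.txt", b"buy milk")):
            with open(os.path.join(src, name), "wb") as f:
                f.write(data)
        day1 = os.path.join(root, "Backup", "201903011512")
        day2 = os.path.join(root, "Backup", "201903021512")
        for name in ("movie.avi", "todo.txt"):
            backup_file(os.path.join(src, name), None, os.path.join(day1, name))
        with open(os.path.join(src, "todo.txt"), "ab") as f:
            f.write(b", call the plumber")  # todo.txt changes, movie.avi does not
        result = {name: backup_file(os.path.join(src, name),
                                    os.path.join(day1, name),
                                    os.path.join(day2, name))
                  for name in ("movie.avi", "todo.txt")}
        result["movie.avi links"] = os.stat(os.path.join(day2, "movie.avi")).st_nlink
        return result
```

Because shutil.copy2 preserves the modification time, the second run recognises movie.avi as unchanged and links it, while the modified todo.txt is copied; the linked movie.avi then has two directory entries.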

Moreover:

  • If at some point you decide to delete the folder F:\Backup\201903011512 (but keep F:\Backup\201903021512), NTFS will detect that another hardlink still points to movie.avi. It will delete the folder but keep movie.avi on the disk. The same holds the other way round.
  • You always have a 1:1 copy of your current files at hand. In case of a disaster, there is no proprietary backup format to extract from and no file structure to reassemble. In the above example, just copy the latest version of MyFiles back, and all your data are there: maximum recoverability.
  • If a file is currently locked (i.e. opened for reading/writing), YAHB can still create a copy of that file using the Windows Volume Shadow Copy Service. This is useful if you want to create a backup in the background while working with the computer, e.g. backing up documents while they are still open in Word/LibreOffice, or backing up your Thunderbird or Firefox profile folder while still writing mails or browsing the web.
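The deletion behaviour in the first point follows from how hardlinks work: every link is an equal directory entry for the same file data, and the data is only freed when the last link is removed. The Python sketch below (folder names and the function name are hypothetical) shows the link count dropping from 2 to 1 when the older backup folder is deleted, while the file stays readable through the remaining link.

```python
import os
import shutil
import tempfile


def delete_older_backup_demo() -> tuple:
    """Return (links before, links after, content) for the surviving hardlink."""
    with tempfile.TemporaryDirectory() as root:
        day1 = os.path.join(root, "201903011512")
        day2 = os.path.join(root, "201903021512")
        os.makedirs(day1)
        os.makedirs(day2)
        with open(os.path.join(day1, "movie.avi"), "wb") as f:
            f.write(b"movie data")
        # second backup: a hardlink instead of a second copy
        os.link(os.path.join(day1, "movie.avi"), os.path.join(day2, "movie.avi"))
        before = os.stat(os.path.join(day2, "movie.avi")).st_nlink
        shutil.rmtree(day1)                     # delete the older backup folder
        after = os.stat(os.path.join(day2, "movie.avi")).st_nlink
        with open(os.path.join(day2, "movie.avi"), "rb") as f:
            content = f.read()
        return before, after, content
```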

Installation

Just unzip to a folder, open a command prompt and run yahb.

Requirements

YAHB is currently 64 bit only. YAHB will likely run fine on Windows 7 and 8.1, but only Windows 10 is supported.

  • When copying to a locally attached drive, the target drive MUST be NTFS-formatted. Otherwise hardlinks cannot be created.
  • When copying to a network share, things are more complicated. Basically, the underlying file system must support hardlinks and must expose hardlink creation in such a way that the Windows API can be used to create them. This is supported e.g. by Samba when Unix Extensions are enabled. Fortunately, most typical NAS solutions like Synology or QNAP support this and work out of the box.
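Whether a target supports hardlinks can be checked empirically before a backup run by creating a small file and trying to link it. The Python sketch below is a hypothetical helper, not a YAHB feature; os.link raises OSError on targets that cannot create hardlinks (for example FAT32 or exFAT drives, or shares without hardlink support).

```python
import os
import tempfile


def supports_hardlinks(target_dir: str) -> bool:
    """Return True if a hardlink can be created inside target_dir.

    Hypothetical probe: creates a temporary file, tries to link it,
    and removes both entries afterwards.
    """
    fd, probe = tempfile.mkstemp(dir=target_dir, prefix="yahb-probe-")
    os.close(fd)
    link = probe + ".link"
    try:
        os.link(probe, link)
        return os.path.samefile(probe, link)   # both names must refer to one file
    except OSError:
        return False                           # file system or share refused the link
    finally:
        for path in (link, probe):
            if os.path.exists(path):
                os.remove(path)
```

Running supports_hardlinks(r"F:\Backup") against the intended target directory gives a quick yes/no before the first backup.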

YAHB requires Microsoft .NET Framework 4.7.2 or higher. The following versions of Windows ship with a suitable version of the .NET Framework by default, i.e. you don't need to install anything if you run:

  • Windows 10, version 1809 and later

If you are running an earlier version of Windows, download and install the latest Microsoft .NET Framework here.

Only if you want to use the Windows Volume Shadow Copy Service to copy files that are currently in use do you additionally need to install the Microsoft Visual C++ 2019 Redistributable. You need the version for 64-bit systems, i.e. vc_redist.x64.exe. Note that it is very likely already installed on your system by other programs.

Restrictions

By default, Windows has a MAX_PATH restriction, i.e. it can't handle path names longer than 260 characters, a relic from old MS-DOS times. Since YAHB keeps the original folder structure but additionally adds a timestamp and drive letter (e.g. F:\Backup\201903021512\C__\MyFiles), the destination path can become longer than 260 characters even if the source path is not.

There are two possible workarounds:

  • Keep the maximum path length in mind and, if required, shorten folder names before creating a backup.
  • For YAHB version 1.0.5 or later: Windows 10, version 1607 and later can lift the MAX_PATH restriction via a registry entry. Locate HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem and look for an entry called LongPathsEnabled. Set its value to 1 if it isn't already.
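To find files affected by the MAX_PATH restriction before running a backup, the destination length can be computed from the source path. This Python sketch assumes the layout <target>\<YYYYMMDDHHMM>\<drive>__\<path> from the examples above and source paths of the form X:\...; destination_for and paths_over_limit are hypothetical names. Because MAX_PATH (260) includes the terminating NUL character, the sketch treats 260 characters or more as too long.

```python
from typing import Iterable, List, Tuple

MAX_PATH = 260  # Windows limit, including the terminating NUL character


def destination_for(source: str, target_root: str, stamp: str) -> str:
    """Assumed layout: <target>\\<stamp>\\<drive>__<rest of source path>."""
    root = target_root.rstrip("\\")
    drive, rest = source[0], source[2:]     # "C:\\MyFiles" -> "C", "\\MyFiles"
    return f"{root}\\{stamp}\\{drive}__{rest}"


def paths_over_limit(sources: Iterable[str], target_root: str,
                     stamp: str = "201903021512") -> List[Tuple[str, int]]:
    """Return (source, destination length) for every path that would hit MAX_PATH."""
    hits = []
    for src in sources:
        length = len(destination_for(src, target_root, stamp))
        if length >= MAX_PATH:
            hits.append((src, length))
    return hits
```

The added overhead is the target directory plus 17 characters (timestamp, separators and the two underscores), so a source path that is comfortably below the limit on C: can still exceed it inside the backup.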

Note: When backing up to a network drive (e.g. Samba), it is unclear whether long paths (> 260 characters) work in practice with YAHB 1.0.5 and the above-mentioned registry entry. Feedback is appreciated!

Donate

You can support the development of YAHB by donating via PayPal.


Usage

Note: To use the option /vss you MUST run YAHB with elevated rights, i.e. from an elevated command prompt (Run as Administrator).

YAHB (Yet Another Hardlink-based Backup-Tool)
Version 1.0.7.0
Copyright (c) 2019 - 2021 Dominik Klein

     Syntax:: yahb.exe <source-dir> <target-dir> [<options>]

 source-dir:: source directory (e.g. C:\MyFiles)
 target-dir:: target directory (e.g. D:\Backups)

TYPICAL EXAMPLE:

 yahb c:\MyFiles d:\Backup /s /xf:*.tmp

will copy all files and the directory structure from c:\MyFiles
to d:\Backup\YYYYMMDDHHMM, including all subdirectories. Yahb will
also look for previous backups of c:\MyFiles in d:\Backup, and if
a file has not changed, it will create a hardlink to that location.
Moreover, all files with the extension .tmp will be skipped.

OPTIONS

  /copyall                 :: copy ALL files. Otherwise the following directory
                              patterns and file types are excluded:

                              DIRECTORIES:
                              - 'System Volume Information'
                              - 'AppData\Local\Temp'
                              - 'AppData\Local\Microsoft\Windows\INetCache'
                              - 'C:\Windows'
                              - '$Recycle.Bin'

                              FILES AND PLACEHOLDERS:
                              - hiberfil.sys
                              - pagefile.sys
                              - swapfile.sys
                              - *.~
                              - *.temp

  /files:PAT1;PAT2;...     :: copy only files that match the supplied
                              file patterns (like *.exe)

  /help                    :: display this help screen

  /id:FILENAME             :: supply a list of Input Directories to copy which
                              are stored line by line in a textfile FILENAME.
                              If this option is used, <source-dir> can be
                              omitted. If both <source-dir> and /id:FILENAME
                              are present, all directories will be copied.
                              NOTE that if /s is provided, it will be 
                              applied to the list of input directories, and
                              will also be applied to <source-dir>.

  /list                    :: do not copy anything, just list all files

  /log:FILENAME            :: write all output (log) to a textfile FILENAME.
                              If FILENAME exists, it will be overwritten

  /+log:FILENAME           :: same as /log:FILENAME, but always append, i.e.
                              do not overwrite FILENAME if it exists.

  /pause                   :: after finishing, wait for the user to press
                              ENTER before closing the program. This
                              prevents a command prompt from vanishing
                              after finishing if run e.g. by Windows' RUNAS
                              command

  /s                       :: also copy all SUBDIRECTORIES of <source-dir>

  /tee                     :: even if /log:FILENAME or /+log:FILENAME is
                              chosen, still write everything additionally
                              to console output.

  /verbose                 :: by default, only the progress and errors 
                              are output to the console/log. In verbose
                              mode, all created files and directories
                              are listed - note that for large copy
                              operations, this frequent output to console
                              will slow down the overall operation

  /vss                     :: If a file is currently in use, and cannot be
                              accessed, try to still copy that file by using
                              Windows' Volume Shadow Copy Service.
                              YOU NEED TO RUN YAHB WITH ELEVATED (ADMIN)
                              RIGHTS FOR THIS TO WORK.

  /xd:DIR1;DIR2;...        :: eXclude directories dir1, dir2, and so forth.
                              I.e. if DIR is provided here, any (full)
                              directory path that contains DIR is skipped

  /xf:PAT1;PAT2;...        :: eXclude files with filename PAT1, PAT2 and so
                              forth. PAT can also be a file pattern like *.tmp

  /?                       :: display this help screen
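The /xd and /xf descriptions above translate into two simple checks. In this Python sketch, /xd is a substring test on the full directory path and /xf is a wildcard match on the file name. Case-insensitive matching and fnmatch-style wildcards are assumptions about YAHB's behaviour, and the helper names are hypothetical.

```python
import fnmatch
from typing import List


def parse_list(option_value: str) -> List[str]:
    """Split a ';'-separated option value such as '*.tmp;tmp;temp'."""
    return [p for p in option_value.split(";") if p]


def excluded_dir(dir_path: str, xd: List[str]) -> bool:
    """/xd: skip any full directory path that contains one of the entries."""
    low = dir_path.lower()
    return any(d.lower() in low for d in xd)


def excluded_file(file_name: str, xf: List[str]) -> bool:
    """/xf: match the file name against each name or wildcard pattern."""
    name = file_name.lower()
    return any(fnmatch.fnmatch(name, p.lower()) for p in xf)
```

With /xf:*.tmp;tmp;temp, a file named Notes.TMP matches the first pattern, and a file named exactly tmp matches the second.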

yahb's People

Contributors

asdfjkl, creedflan738, thomasschroeder

yahb's Issues

Check for double-call of yahb

If, by design or erroneously, one calls yahb twice within the same minute, it treats the backup folder both as the source for the hardlink similarity check and as the target folder. I guess it would be easy to check for that.
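A guard against a second run in the same minute could compare the new timestamp with the existing backup folders before choosing the previous backup. In this Python sketch (plan_backup is a hypothetical name), existing is the list of folder names in the target directory; the function returns the new folder name and the most recent strictly older backup, or refuses to run if the folder for the current minute already exists.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def plan_backup(existing: Iterable[str], now: datetime) -> Tuple[str, Optional[str]]:
    """Return (new folder name, most recent older backup or None).

    Raises FileExistsError if a backup folder for the current minute exists,
    since that folder would otherwise be both source and target.
    """
    stamp = now.strftime("%Y%m%d%H%M")
    stamps = sorted(n for n in existing if len(n) == 12 and n.isdigit())
    if stamp in stamps:
        raise FileExistsError(f"backup folder {stamp} already exists, retry in a minute")
    older = [s for s in stamps if s < stamp]   # fixed-width stamps sort chronologically
    return stamp, (older[-1] if older else None)
```

Refusing is the simplest option; an alternative design would append seconds to the folder name, but that would change the folder naming scheme.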

Problem with R/O Files

There is a problem with (incremental?) backups if the source file is flagged read-only.
Even with admin rights an error is generated. If one removes the read-only flag from the source file, everything is fine:

Unhandled Exception: System.UnauthorizedAccessException: Access to the path 'w:\BACKUP\202002182353\d__\test.pdf' is denied.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize)
at System.IO.File.OpenFile(String path, FileAccess access, SafeFileHandle& handle)
at System.IO.File.SetCreationTimeUtc(String path, DateTime creationTimeUtc)
at System.IO.FileSystemInfo.set_CreationTimeUtc(DateTime value)
at System.IO.FileSystemInfo.set_CreationTime(DateTime value)
at yahb.CopyModule.doCopy()
at yahb.Program.Main(String[] args)
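The trace shows .NET opening the copied file (File.OpenFile) in order to change its creation time, which fails when the file carries the read-only attribute. One possible fix is to clear the read-only attribute, set the timestamps, and restore the attribute afterwards. The Python sketch below illustrates that pattern with os.utime, which only sets access and modification times (not the creation time); the helper names are hypothetical. Note that attributes belong to the file, not to a link, so on a hardlinked file the temporary change is visible through every link.

```python
import os
import stat
import tempfile


def set_times_even_if_readonly(path: str, atime: float, mtime: float) -> None:
    """Set access/modification times, temporarily lifting a read-only flag."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    readonly = not (mode & stat.S_IWRITE)
    if readonly:
        os.chmod(path, mode | stat.S_IWRITE)   # make the file writable for a moment
    try:
        os.utime(path, (atime, mtime))
    finally:
        if readonly:
            os.chmod(path, mode)               # restore the original read-only flag


def demo_readonly() -> tuple:
    """Return (mtime, writable?) for a read-only file after setting its times."""
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "test.pdf")
        with open(p, "wb") as f:
            f.write(b"%PDF")
        os.chmod(p, stat.S_IREAD)                       # simulate the read-only source
        set_times_even_if_readonly(p, 1_000_000_000, 1_000_000_000)
        st = os.stat(p)
        result = (int(st.st_mtime), bool(st.st_mode & stat.S_IWRITE))
        os.chmod(p, stat.S_IREAD | stat.S_IWRITE)       # allow cleanup on Windows
        return result
```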

yahb dies with "UnauthorizedAccessException"

I guess the file should be skipped and the Error logged.

If you fix this error, I would be really thankful for a Win7 release. (It would be a pity if I couldn't use this very nice solution...)

yahb C:\ K:\YAHB\C /vss /+log:K:\YAHB\C.log /s /xf:*.tmp;tmp;temp
copying files: [####      ]  40%  ETR: 03:18:53
Unhandled Exception: System.UnauthorizedAccessException: Access to the path 'C:\Users\All Users\Microsoft\Diagnosis\events00.rbs' is denied.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalCopy(String sourceFileName, String destFileName, Boolean overwrite, Boolean checkHost)
   at yahb.CopyModule.doCopy()
   at yahb.Program.Main(String[] args)
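Skipping the file and logging the error, as suggested above, could look like the following Python sketch. It is a hypothetical helper (copy_all), not yahb's code; the ERR:<path>:<message> format mirrors the lines yahb already prints when it cannot list a directory. Catching OSError covers both access-denied and file-in-use errors.

```python
import shutil
from typing import Callable, Iterable, List, Tuple


def copy_all(pairs: Iterable[Tuple[str, str]], log: Callable[[str], None]) -> List[str]:
    """Copy (source, target) pairs; log and skip files that cannot be copied.

    Returns the list of sources that failed, so the caller can report them.
    """
    failed = []
    for src, dst in pairs:
        try:
            shutil.copy2(src, dst)
        except OSError as exc:   # PermissionError, sharing violations, missing files, ...
            log(f"ERR:{src}:{exc}")
            failed.append(src)
    return failed
```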

Check for old Backup

This is a wish for one additional feature: I'd like a flag to determine a maximum age x. If the last backup file is older than x, then copy instead of hardlink.

The reason: I feel unsafe if an important file was written to storage only once, years ago, and has only been hardlinked since then. I'd feel better if it were written anew once in a while. Of course this relies on the original file aging better than the old backup copy. So additionally it would make sense to verify the versions against each other, but I guess that is a lot more work than my proposal above.
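Because later backups only hardlink to the first physical copy, the date that matters for this wish is the backup in which the current data was originally written. Assuming each backup's copy of a file can be identified by a file ID (for example st_ino from os.stat), that date is the earliest backup in the trailing run of identical IDs. The Python sketch below uses hypothetical names and a plain list as input.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def first_written(history: List[Tuple[str, int]]) -> datetime:
    """Return when the current data of a file was physically written.

    history holds (backup folder name, file ID) pairs for one file, oldest
    first. Later backups hardlink to earlier ones, so the newest run of equal
    IDs starts at the backup that actually copied the data.
    """
    newest_id = history[-1][1]
    stamp = history[-1][0]
    for folder, file_id in reversed(history):
        if file_id != newest_id:
            break
        stamp = folder
    return datetime.strptime(stamp, "%Y%m%d%H%M")


def should_copy_fresh(history: List[Tuple[str, int]], now: datetime,
                      max_age: timedelta) -> bool:
    """Hypothetical rule: copy instead of hardlink once the data is older than max_age."""
    return now - first_written(history) > max_age
```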

YAHB doesn't create Hardlinks

My drive is NTFS and I'm using Windows 10. Everything looks normal, but the space used shows that no hardlinks have been used. I used DU by Sysinternals to check: the -u flag results in the same count and size as running without it.
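Another way to verify whether hardlinks were created is to compare the apparent size of the backup folder with the size counted once per unique file. The Python sketch below (hypothetical function names) identifies files by (st_dev, st_ino) and reads the link count from st_nlink; on Windows, os.stat fills these fields from the NTFS file index. If the backups contain hardlinks, the unique size is smaller than the apparent size.

```python
import os
import tempfile


def usage(root: str) -> tuple:
    """Return (apparent bytes, bytes counted once per file, entries with >1 link)."""
    apparent = unique = multi_linked = 0
    seen = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.stat(os.path.join(dirpath, name))
            apparent += st.st_size
            if st.st_nlink > 1:
                multi_linked += 1
            key = (st.st_dev, st.st_ino)      # identifies the file, not the link
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return apparent, unique, multi_linked


def demo_usage() -> tuple:
    """Two backup folders sharing one 100-byte file through a hardlink."""
    with tempfile.TemporaryDirectory() as root:
        a = os.path.join(root, "201903011512", "movie.avi")
        b = os.path.join(root, "201903021512", "movie.avi")
        os.makedirs(os.path.dirname(a))
        os.makedirs(os.path.dirname(b))
        with open(a, "wb") as f:
            f.write(b"x" * 100)
        os.link(a, b)
        return usage(root)
```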

Error: Division by zero

Hello,

I wanted to try yahb, but it did not really work.

Scenario: In Windows 10, a share from my Synology NAS is mounted as J:; it contains a directory named Backups. (Maybe it's important: Windows 10 runs inside VirtualBox.)

I tried
yahb c:\Users\joerg\Documents j:\Backups /s

Output:
creating list of directories ...
ERR:c:\Users\joerg\Documents\Eigene Videos:Access to the path 'c:\Users\joerg\Documents\Eigene Videos' is denied.
ERR:c:\Users\joerg\Documents\Eigene Musik:Access to the path 'c:\Users\joerg\Documents\Eigene Musik' is denied.
ERR:c:\Users\joerg\Documents\Eigene Bilder:Access to the path 'c:\Users\joerg\Documents\Eigene Bilder' is denied.
creating list of directories ... DONE
creating list of files ...
creating list of files ... DONE
unable to identify a previous backup location, copying all
copying files: [ ] 0%
Unhandled Exception: System.DivideByZeroException: Attempted to divide by zero.
at yahb.CopyModule.doCopy()
at yahb.Program.Main(String[] args)

Result: The backup directory is created as expected:
j:\Backups\202004031108\c__\Users\joerg\Documents
But this contains only the directories of the source, which are all empty. No files were copied.

Am I making a mistake here? Should I change or check something? Or is there a bug?

I hope my comment helps to improve your project,
regards and good health,
Jörg
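The trace does not show which division failed. One detail narrows it down: in .NET, DivideByZeroException is raised by integer and decimal division, while floating-point division by zero yields infinity or NaN. Since the crash happens right after "copying files: [ ] 0%", a plausible but unconfirmed suspect is an integer progress or ETR calculation whose divisor is zero, e.g. zero bytes copied so far or a total of zero. A guarded version in Python, with a hypothetical progress function:

```python
from typing import Optional, Tuple


def progress(done_bytes: int, total_bytes: int,
             elapsed_s: float) -> Tuple[int, Optional[float]]:
    """Return (percent done, estimated seconds remaining or None).

    Every division is guarded, so empty sources and the very first
    progress update cannot divide by zero.
    """
    if total_bytes <= 0:
        return 100, 0.0                 # nothing to copy: report as finished
    percent = done_bytes * 100 // total_bytes
    if done_bytes <= 0 or elapsed_s <= 0:
        return percent, None            # no transfer rate known yet, so no ETR
    rate = done_bytes / elapsed_s       # bytes per second
    return percent, (total_bytes - done_bytes) / rate
```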

Control over memory usage

I use yahb to back up the results of an indefinitely running algorithm. Right now it has 7 GB of results in a folder, which are refined again and again and keep growing. As loading and saving takes additional time, I designed my program to carefully keep ~3 GB of the currently used results in memory.

YAHB takes ~800 MB to back up that folder. Most often this results in >4 GB total memory usage, so Windows starts using the paging file. Of course I could tell my program to use less memory, but that would slow it down even more.

The best solution would be an option telling YAHB to use at most X MB for its operation, like 500 MB in this case.

/verbose:level

A verbose level would help to reduce the log file size. Something like
/verbose -> all operations (as currently implemented)
/verbose:1 -> only new
/verbose:2 -> only new and nonexistent
or something like this.

greetings

Backup failed, SystemIO error

Below is the full error message, which appeared in the output after about 68% of the backup had completed successfully. There is enough space on the drive.

unable to create hardlink, copying instead

Unhandled Exception: System.IO.IOException: Insufficient system resources exist to complete the requested service.

at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.__ConsoleStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.IO.StreamWriter.Write(Char[] buffer, Int32 index, Int32 count)
at System.IO.TextWriter.SyncTextWriter.WriteLine(String value)
at System.Console.WriteLine(String value)
at yahb.Config.addToLog(String message)
at yahb.CopyModule.doCopy()
at yahb.Program.Main(String[] args)
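The trace shows the exception coming from Console.WriteLine inside yahb.Config.addToLog, so a failure to write one log line to the console aborts the whole backup. One way to make logging non-fatal is to catch errors from the console write and continue. The Python sketch below is hypothetical: the Log class and its parameters only mirror the /log, /+log and /tee options conceptually, and it counts failed console writes instead of raising.

```python
import sys
from typing import Optional, TextIO


class Log:
    """Hypothetical logger loosely mirroring /log, /+log and /tee."""

    def __init__(self, path: Optional[str] = None, append: bool = False,
                 tee: bool = False, console: TextIO = sys.stdout):
        self.file = open(path, "a" if append else "w", encoding="utf-8") if path else None
        # without a log file, output always goes to the console
        self.console = console if (tee or self.file is None) else None
        self.console_failures = 0

    def add(self, message: str) -> None:
        if self.file is not None:
            self.file.write(message + "\n")
            self.file.flush()
        if self.console is not None:
            try:
                self.console.write(message + "\n")
                self.console.flush()
            except OSError:
                # A failing console write is counted, not propagated,
                # so the backup itself keeps running.
                self.console_failures += 1

    def close(self) -> None:
        if self.file is not None:
            self.file.close()
```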

Copying too slow

I accidentally backed up to a new folder, so all files were copied. This took exactly 5 hours. Copying all files with robocopy takes ~30 minutes, so copying is much slower than strictly necessary.

Idea: If it's too difficult to make your algorithm more efficient, you could let it create all the hardlinks and then copy the remaining files with robocopy.

System.IO.IOException on yahb.CopyModule.createDirectoryList()

There is a problem with recent Adobe Acrobat Licensing Service that makes yahb and other backup software break when trying to access the log folders of the software for backup purposes. This has been reported to Adobe and will hopefully be worked on (https://community.adobe.com/t5/illustrator-discussions/com-adobe-dunamis-folder-cannot-be-backed-up-by-backup-solutions/td-p/14559757)

However, it would probably be possible to fix this (or work around it) in yahb as well in case this happens in the future with this or other software.

Here is the output from yahb just before the crash

Unhandled Exception: System.IO.IOException: The process cannot access the file 'C:\Users\flo\AppData\Roaming\com.adobe.dunamis\d225650b-9c1f-4738-97e3-94e805951049\v1\0' because it is being used by another process.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileSystemEnumerableIterator`1.CommonInit()
   at System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost)
   at System.IO.Directory.EnumerateDirectories(String path)
   at yahb.CopyModule.createDirectoryList()
   at yahb.Program.Main(String[] args)

Whatever Acrobat Licensing Service does here, there is currently no catch for IOException in createDirectoryList(). So maybe add this and keep working on the remaining directories as is already done for some other exceptions.

Maybe if I get this right we could also try a fix with different EnumerationOptions.
Update: It turns out, after following the .NET code to the point where the exception is thrown, that there is no way to prevent this with different EnumerationOptions. I'll leave the analysis here anyway in case someone wants to follow the thought process.

yahb calls EnumerateDirectories() with only one argument, the directory to start from

subdirs = new List<string>(Directory.EnumerateDirectories(currentDir));

This means that the unused parameters are filled with default options, the EnumerationOptions being set to EnumerationOptions.Compatible
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Directory.cs#L216

Notably EnumerationOptions.Compatible means that IgnoreInaccessible is set to false
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/EnumerationOptions.cs#L20-L21

We thus end up calling EnumerateDirectories() with all available parameters internally
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Directory.cs#L223-L224

Which then calls InternalEnumeratePaths() defined just above the different EnumerateDirectories() definitions
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Directory.cs#L196-L214

This leads to another internal call to FileSystemEnumerableFactory.UserDirectories() defined here
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerableFactory.cs#L128-L140

Creating a new FileSystemEnumerable instance defined here
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerable.cs#L14-L38

And here is the interesting part, at the end of the constructor we create a DelegateEnumerator which according to the source code comment ensures that we get possible IO exceptions for the target directory right at the beginning
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerable.cs#L35-L37

This DelegateEnumerator creates a FileSystemEnumerator
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerable.cs#L60-L68

At the end of the FileSystemEnumerator constructor, it calls its method Init()
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerator.cs#L31-L43

Which is implemented in the Windows specific file and creates a directory handle to check for any IO exceptions
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerator.Windows.cs#L48-L50

So unfortunately there is no try/catch here and changing the EnumerationOptions won't help, we just have to catch the IOException in yahb.

Maybe custom EnumerationOptions would help at locations where IgnoreInaccessible is actually used?
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerator.Windows.cs#L115

I think with IgnoreInaccessible and RecurseSubdirectories set to true in EnumerationOptions we may get the full directory list with much less going back and forth between yahb's createDirectoryList() and the .NET functions.

But certainly not related to this bug then.
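The suggested fix (catch the IOException in createDirectoryList() and continue with the remaining directories) has a close Python analogue: os.walk accepts an onerror callback that receives the OSError for any directory it cannot list, and then continues with the rest. This is an analogy, not yahb's C# code; directory_list is a hypothetical name, and the ERR: prefix mirrors yahb's existing log lines.

```python
import os
from typing import Callable, List


def directory_list(root: str, log: Callable[[str], None]) -> List[str]:
    """Collect all directories below root, logging and skipping unreadable ones."""
    dirs = []

    def on_error(exc: OSError) -> None:
        # report the directory that could not be listed and keep going
        log(f"ERR:{exc.filename}:{exc.strerror}")

    for dirpath, _subdirs, _files in os.walk(root, onerror=on_error):
        dirs.append(dirpath)
    return dirs
```

A directory that cannot be opened (in use, access denied, or vanished) is reported once and left out of the list, while its siblings are still enumerated.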

System.ArgumentOutOfRangeException: Not a valid Win32 FileTime

Thank you for this nice piece of code!
Unfortunately, from the second backup run on, I receive this error message partway through:


copying files: [## ] 21% ETR: 00:03:43
Unhandled Exception: System.ArgumentOutOfRangeException: Not a valid Win32 FileTime.
Parameter name: fileTime
at System.DateTime.FromFileTimeUtc(Int64 fileTime)
at yahb.CopyModule.doCopy()
at yahb.Program.Main(String[] args)

Unfortunately, the log file contains no information about which file is affected.

Can you help me with this?
Thank you very much in advance!
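DateTime.FromFileTimeUtc throws ArgumentOutOfRangeException when the FILETIME value is negative or lies beyond DateTime.MaxValue. Which file timestamp triggers this here is unknown, but a guarded conversion would let yahb log the offending path and carry on instead of crashing. The Python sketch below mirrors that range check; the constant names and filetime_to_datetime are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

FILETIME_EPOCH = datetime(1601, 1, 1)      # FILETIME counts 100 ns ticks from here (UTC)
MAX_FILETIME = 2650467743999999999         # DateTime.MaxValue expressed as a FILETIME


def filetime_to_datetime(ft: int) -> Optional[datetime]:
    """Convert a FILETIME to a naive UTC datetime, or None if out of range.

    The range check matches the values .NET's DateTime.FromFileTimeUtc accepts
    (0 up to DateTime.MaxValue); sub-microsecond ticks are truncated.
    """
    if ft < 0 or ft > MAX_FILETIME:
        return None
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)
```

A caller receiving None could write an ERR: line naming the file and skip only that timestamp, which would also answer the question of which file is affected.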

return an errorlevel

If something goes wrong it would be helpful to get back an errorlevel <> 0.

greetings gmlltg
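Returning a nonzero exit code only requires counting failures during the run. The Python sketch below is hypothetical: main runs a list of copy tasks, logs each OSError, and returns 0 on success or 1 if anything failed. The program's entry point would pass that value to sys.exit, so a calling batch file could check it with if errorlevel 1.

```python
import sys
from typing import Callable, Iterable


def main(tasks: Iterable[Callable[[], None]]) -> int:
    """Run copy tasks; return 0 if all succeeded, 1 if any failed (hypothetical codes)."""
    errors = 0
    for task in tasks:
        try:
            task()
        except OSError as exc:
            print(f"ERR:{exc}", file=sys.stderr)
            errors += 1
    return 1 if errors else 0
```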

report errors

Currently the old setting is still in place (i.e. errors are reported only in verbose mode); reverse this behaviour.
