boar's Issues

Feature Request: Filtering/Grouping of Sessions

For purposes of organization and efficiency it makes sense to create separate 
sessions for directories that are more or less self-contained: particular 
photoshoots/projects, different events, etc. So rather than having a session 
called "pics", I find that I have sessions called personal_xmas_2012, 
work_client1_images, etc.

This is fine for a smaller number of sessions, but I'm in the process of 
putting more or less all of my media under revision control and the session 
list quickly balloons out of control. 

It would be nice to be able to append some kind of "tag" metadata to sessions 
in order to quickly filter logical session groups, for example showing only 
sessions tagged +work, or filtering on multiple tags like +work +2012. At the 
moment I'm overloading the session name to accomplish this, but it would be 
much nicer to do it with a separate metadata attribute that could easily be 
added to the session.json files, as sketched below.
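
For illustration only, a minimal sketch of how such a filter could work. It 
assumes a hypothetical "tags" list inside each session.json and a 
sessions/<revision>/session.json layout; neither is part of boar today.

# Hypothetical sketch: filter sessions by a "tags" list in session.json.
import json, os

def sessions_with_tags(repo_path, wanted_tags):
    # Yield session names whose session.json carries all wanted tags.
    sessions_dir = os.path.join(repo_path, "sessions")
    for revision in sorted(os.listdir(sessions_dir)):
        props_file = os.path.join(sessions_dir, revision, "session.json")
        if not os.path.exists(props_file):
            continue
        with open(props_file) as f:
            props = json.load(f)
        if set(wanted_tags) <= set(props.get("tags", [])):
            yield props.get("name", revision)

# Example: list(sessions_with_tags("/path/to/repo", ["work", "2012"]))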

On a related note, it would be nice to be able to specify a session by a 
partial name, as long as it is unambiguous. So in import -v -m "minor edits" 
"photos" "very_long_session_name_blah_blah_blah", the last argument could be 
"very_long*" (assuming nothing else starts with very_long).

Original issue reported on code.google.com by [email protected] on 15 Jan 2012 at 6:32

Option to skip errors during initial import

BOAR: 3-Mar-2011 daily
OS: Windows 7 x64
PYTHON: 2.7.1

When doing an initial import (in this case I'm importing a large document 
repository, network-mapped as drive letter X), boar quits with:

Traceback (most recent call last):
  File "C:\[home]\boar\boar", line 501, in <module>
    return_code = main()
  File "C:\[home]\boar\boar", line 458, in main
    return cmd_import(args[1:])
  File "C:\[home]\boar\boar", line 274, in cmd_import
    log_message = log_message)
  File "C:\[home]\boar\workdir.py", line 183, in checkin
    self.get_changes()
  File "C:\[home]\boar\workdir.py", line 353, in get_changes
    filelist[f] = self.cached_md5sum(fn)
  File "C:\[home]\boar\workdir.py", line 310, in cached_md5sum
    stat = os.stat(abspath)
WindowsError: [Error 2] The system cannot find the file specified: u'x:\\/JOHNDOE/DIR1/PDFsamTMPbufferELENL1.pdf'

I'm trying to do an import on a "live" document repository. At the moment, it 
seems boar quits when it finds a locked file or when a file it expects 
disappears. I realize there are concerns with regard to snapshot integrity, 
but is there any harm in giving the user the option to ignore/skip a file or 
folder rather than aborting? That would be a more graceful way to handle 
these cases. If a file is locked for editing, or is a tmp file that 
disappears during the import, it should be easy enough to add it when it 
becomes available by doing an update, correct?
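
As a rough illustration (not boar's actual code), a skip-on-error option 
could guard the per-file step that fails in the traceback above roughly like 
this:

# Illustration only: record and skip locked or vanished files instead of
# aborting the whole import.
import os

def safe_checksum(path, checksum_func, skipped):
    try:
        os.stat(path)                  # the call that fails in the traceback
        return checksum_func(path)
    except (OSError, IOError) as e:    # WindowsError is a subclass of OSError
        skipped.append((path, str(e)))
        return None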

Anyway, just wondering what your thoughts are on alternatives to simply 
aborting a large import process, given how long such a process can take.

Original issue reported on code.google.com by [email protected] on 8 Jun 2011 at 10:48

info command should work even if repo is unavailable

What steps will reproduce the problem?
1. Check out a workdir
2. Make the repository unavailable (move it or rename it)
3. Execute "boar info"

What is the expected output? What do you see instead?

Boar should give us any available info on the workdir. Instead, boar only 
reports an error because the repo is unavailable.

Original issue reported on code.google.com by [email protected] on 25 Sep 2011 at 9:10

When checking in files, Check for a crc32 hash in the filename

This is essentially just looking for the hash in the filename and seeing if 
it matches the computed hash. Here is a link to a python script that checks 
crc32s after finding them in the filename:
http://agafix.org/anime-crc32-checksum-in-linux-v20/
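
A standalone check in the same spirit might look like this (sketch only, 
assuming the CRC is embedded in the name as eight hex digits in brackets):

# Find an 8-hex-digit CRC32 tag in the filename and compare it with the
# checksum computed over the file contents.
import re, zlib

def crc32_matches(path):
    m = re.search(r'\[([0-9A-Fa-f]{8})\]', path)   # e.g. "episode [1A2B3C4D].mkv"
    if not m:
        return None                                 # no CRC tag in the name
    crc = 0
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            crc = zlib.crc32(chunk, crc)
    return ('%08X' % (crc & 0xffffffff)) == m.group(1).upper()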

Original issue reported on code.google.com by [email protected] on 4 Mar 2011 at 9:08

Add additional dir level to blobs/

As it stands, the blobs/ dir is subdivided into 256 folders. For use cases 
involving very large datasets (i.e. 1M+ files), having directories with 3000+ 
files in them gets unwieldy and can affect performance. What are your 
thoughts on allowing an upgrade path to /blobs/12/34/1234567890abcdef ?

This would allow for virtually any size of dataset (two levels of subdir 
nesting is what you often see in the URLs of file and image hosts that store 
files by hash). If you want to allow backward compatibility, you could 
specify a "repo version" property either in the main repo dir or in the 
session file?
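
For reference, a minimal sketch of the proposed two-level mapping (the 
current 256-folder layout uses only the first byte, i.e. blobs/12/1234...):

import os

def blob_path(repo_root, md5):
    # Proposed two-level nesting: blobs/<first byte>/<second byte>/<md5>
    assert len(md5) == 32
    return os.path.join(repo_root, "blobs", md5[0:2], md5[2:4], md5)

# blob_path("/repo", "1234567890abcdef1234567890abcdef")
# -> "/repo/blobs/12/34/1234567890abcdef1234567890abcdef"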

Anyway, really liking boar. I need to brush up on python a bit, but I'd like to 
submit patches sooner rather than later, and not just endless issues/requests :P

Original issue reported on code.google.com by [email protected] on 21 Dec 2011 at 5:57

Functionality duplicate

Am I right that you want to do EXACTLY the same thing that is already 
possible with:
1. rdiff-backup
2. rdiff-backup-fs

?

If not, I have an issue for you: the very limited description on the project 
homepage prevents me from judging whether I'm interested or not.

If you would like to do the same as rdiff-backup, but optimised for 
incremental space saving using object-specific techniques, then this project 
is VERY interesting. If not, it is a simple backup utility clone.

So, I would like an answer that is at least descriptive about plans for the 
future, if you want help developing such a thing.

Original issue reported on code.google.com by [email protected] on 26 May 2011 at 11:19

Feature Request: boar diff

A diff command to be able to see the differences between revisions.

I know there has been discussion of the issues with linked vs. independent 
snapshot states with regard to efficiency trade-offs. What are your thoughts 
on breaking up the single bloblist into separate components, more along the 
lines of the git object model?

http://book.git-scm.com/1_the_git_object_model.html

I understand that simplicity is a guiding principle of boar's design, but 
separate per-directory hashed "tree" objects could retain the same simple 
json format while allowing a diff operation to run more efficiently, by 
quickly surfacing only those directories that differ between revisions and 
letting the script drill down to those differences. Just a thought.
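
To make the idea concrete, here is a rough sketch (not boar's data model) of 
deriving a per-directory hash from a flat bloblist and using it to find the 
directories that differ between two revisions:

import hashlib, os

def tree_hashes(bloblist):
    # bloblist: iterable of dicts with "filename" and "md5sum" keys.
    per_dir = {}
    for entry in bloblist:
        d = os.path.dirname(entry["filename"])
        per_dir.setdefault(d, []).append((entry["filename"], entry["md5sum"]))
    hashes = {}
    for d, items in per_dir.items():
        h = hashlib.md5()
        for name, md5 in sorted(items):
            h.update(("%s %s\n" % (name, md5)).encode("utf-8"))
        hashes[d] = h.hexdigest()
    return hashes

def changed_dirs(bloblist_a, bloblist_b):
    a, b = tree_hashes(bloblist_a), tree_hashes(bloblist_b)
    return sorted(d for d in set(a) | set(b) if a.get(d) != b.get(d))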

Original issue reported on code.google.com by [email protected] on 19 Dec 2011 at 7:08

Cannot checkout a directory with spaces in its name

What steps will reproduce the problem?
1. Import a directory with spaces in its name, e.g
boar import "a directory" "session/a directory"

where session is the session's name 

2. Checkout the directory 
boar co "session/a directory"

(assuming the --repo option was set correctly)

What is the expected output? What do you see instead?
It should create a new directory "session/a directory"

Instead it outputs:
AssertionError: Offset was: a Path was: a directory/....

It seems that it failed on the first file.

What platform are you using? (Windows XP, Windows 7, Linux, ...)
Ubuntu 9.04

What version of Python are you using?
2.6.2

What version of boar are you using? (Mercurial change id or daily build
date)
boardaily 27-feb-2011

Please provide any additional information below.
If I try

boar import "a directory" "session/a directory" a_dir_name

it outputs: ERROR: too many arguments

Original issue reported on code.google.com by [email protected] on 2 Mar 2011 at 6:22

Purge Directory from the Repo

This enhancement request is similar to Issue 13, but the idea would be to be 
able to strike an entire directory from the "snapshot" of a given session 
without disrupting the rest of the snapshot.

The use case is similar to that of issue 13. In the case of a considerable 
reorganization of a media collection, and particularly when a large directory 
is permanently removed from the collection and there is no longer any 
need/desire to maintain copies of that data, it would be nice to be able to 
remove the directory from a snapshot.

It would require iterating through the bloblists of all linked sessions, 
removing any entries at or below the given directory, and then removing blobs 
that are no longer present in any of the resulting bloblists.

To guard against corruption the linked sessions can first be cloned and the 
operation can be conducted on this clone to create a fork of the session.

In fact the "purge" command can simply handle the bloblist cleanup, while a 
second "cleanup" command can created to remove blobs that no longer have any 
entries in the bloblist.
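
As a conceptual illustration of those two steps (operating on in-memory 
bloblists only, not on a real repository):

def purge_directory(bloblists, doomed_dir):
    # Strip every entry at or below doomed_dir from each snapshot's bloblist.
    prefix = doomed_dir.rstrip("/") + "/"
    return [[e for e in bl
             if e["filename"] != doomed_dir
             and not e["filename"].startswith(prefix)]
            for bl in bloblists]

def unreferenced_blobs(all_blobs, purged_bloblists):
    # Blobs that no remaining bloblist refers to: candidates for "cleanup".
    still_used = set(e["md5sum"] for bl in purged_bloblists for e in bl)
    return set(all_blobs) - still_used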

Given that a guiding use case for boar is managing large media collections, 
being able to gracefully retire content from boar sessions/repos is important. 
I may upgrade all my music to a lossless format and decide to retire my "mp3" 
folder from my boar music repo for instance.

Anyway, I can't think of any obstacles that should make this too difficult to 
implement, however I've been wrong before... once ;)

Regardless, I think this is an important/desirable functionality for a tool 
like boar.

Original issue reported on code.google.com by [email protected] on 10 Jun 2011 at 1:09

Feature request: encryption

Hi,

for my backups it is very important that I can store them encrypted.

Do you see any possibility of implementing that in the future?


Jan

Original issue reported on code.google.com by [email protected] on 26 Jun 2011 at 10:18

Feature Request: Display progress more verbosely

I notice boar spends a significant amount of time listing files and folders 
(common.py:get_tree); for instance, listing just 20000 files may take > 15s. 
This is a usability issue for folders with hundreds of thousands of files. I 
have a folder with 400,000 files and it takes forever to complete.

My suggestion is that boar should display verbose progress for those of its 
steps that can take a significant amount of time to complete. I think just 
counting the number of files listed so far in a folder would have a positive 
psychological effect, even though the total number of files can't be known in 
advance.
Also printing line after line like this: 

Remaining: 2053 files, 14959 Mb (0.0% complete, 0.0 Mb/s)
Remaining: 2052 files, 14415 Mb (3.6% complete, 16.3 Mb/s)
...

is not very user friendly. Instead it should update the progress on a single 
line, similar to this: http://pypi.python.org/pypi/progressbar/2.2
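
A bare-bones version of that single-line style (illustration only; the 
progressbar package linked above does it more nicely):

import sys

def show_progress(done_files, total_files, remaining_mb, rate_mb_s):
    # Overwrite the same console line instead of printing a new one each time.
    line = "Remaining: %d files, %d Mb (%.1f%% complete, %.1f Mb/s)" % (
        total_files - done_files, remaining_mb,
        100.0 * done_files / max(total_files, 1), rate_mb_s)
    sys.stdout.write("\r" + line.ljust(79))
    sys.stdout.flush()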


Original issue reported on code.google.com by [email protected] on 9 Feb 2012 at 9:34

  • Merged into: #37

Exception when giving only --repo command line option

What steps will reproduce the problem?
1. execute the command "boar --repo=/whatever"

What is the expected output? What do you see instead?

There should be a helpful message telling you that you need to supply a 
command word as well. Instead, there is an ugly exception.

Original issue reported on code.google.com by [email protected] on 2 Aug 2011 at 10:02

Erroneous "Deletion failed" message when updating offset workdir

What steps will reproduce the problem?
1. Create a new directory with the following files with any content: 
deleted_file.txt, subdir/file.txt.
2. Import the directory as a new session named "TestSession"
3. Check out the root of "TestSession" as workdir1
4. Check out "TestSession/subdir" as workdir2
5. Delete the file workdir1/deleted_file.txt
6. Commit the changes in workdir1
7. Execute an update command in workdir2

(These steps are also implemented in the attached test script)

What is the expected output? What do you see instead?

The update should run without any error messages. Instead, a message "Deletion 
failed: deleted_file.txt" is shown.

Please use labels and text to provide additional information.

Original issue reported on code.google.com by [email protected] on 11 Apr 2011 at 7:50

Attachments:

Continuous replication mode for clone command

The clone command will update an older copy of a repository with all the 
changes in a more recent version. It would be useful to be able to make boar 
automatically keep a clone updated.

The boar "clone" command shall take a new flag, "--continuous" which shall make 
boar simply repeat the clone operation every n seconds.

Original issue reported on code.google.com by [email protected] on 18 Aug 2011 at 2:21

Tab Completion in Bash shell

Definitely a low-priority "convenience" feature... but tab completion for 
commands and sessions would be great. It would be especially nice for 
sessions with subpath/offset components :)

In a related vein, a --color=WHEN style switch for shells that support color 
would be great too, as far as someday-maybe features go ;)

Original issue reported on code.google.com by [email protected] on 21 Dec 2011 at 5:29

Add Option to Ignore Files

The option to specify files to ignore would be very useful. Currently, I need 
to be able to ignore Thumbs.db in all directories, so a spec like "Thumbs.db" 
should ignore any file with that filename.

Similarly, other options might be "c:\temp" to ignore an entire subdirectory, 
"*.tmp" to ignore any file ending in .tmp, or "c:\temp\*.tmp" for a 
combination.

Original issue reported on code.google.com by [email protected] on 21 Feb 2011 at 2:50

Empty first revision?

I was just wondering if there is any purpose to the initial revision of a 
session being empty. Is that just an artifact of the fact that a newly 
"created" session is necessarily empty, or is there a use case or technical 
reason why having an empty revision is a good idea?

Original issue reported on code.google.com by [email protected] on 13 Jun 2011 at 7:26

There should be a warning message when checking out a non-existing subdir

What steps will reproduce the problem?
1. Check out a non-existing subdir from a session, like "boar co 
MySession/this_dir_does_not_exist"

What is the expected output? What do you see instead?
The operation completes without an error and an empty workdir is created. This 
is confusing if the user intended to check out an existing directory.

This is an accidental feature. It is a very convenient way to create a new 
directory in a session. But the behaviour is likely to confuse a user who 
simply mistyped the name of the directory. There should be a notification 
message explaining that a new directory will be created if you commit any 
changes in this workdir.

Original issue reported on code.google.com by [email protected] on 5 Feb 2012 at 10:55

ImportError: No module named bsddb

What steps will reproduce the problem?
1. Install ActivePython 2.7.1.3
2. Install Boar
3. execute boar

What is the expected output? What do you see instead?
C:\>boar.bat

C:\>C:\Python27\python.exe C:\Python27\boar\boar
Traceback (most recent call last):
  File "C:\Python27\boar\boar", line 35, in <module>
    import workdir
  File "C:\Python27\boar\workdir.py", line 40, in <module>
    import dbhash
  File "C:\Python27\lib\dbhash.py", line 7, in <module>
    import bsddb
ImportError: No module named bsddb

What platform are you using? (Windows XP, Windows 7, Linux, ...)
windows XP

What version of Python are you using?
2.7.1.3.

What version of boar are you using? (Mercurial change id or daily build date)
boar-daily.05-Feb-2011.zip

Please provide any additional information below.
from http://docs.python.org/library/bsddb.html
Deprecated since version 2.6: The bsddb module has been deprecated for removal 
in Python 3.0.
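
As a possible workaround sketch (not an official fix): workdir.py imports 
dbhash, which needs bsddb; falling back to anydbm, which always has the pure 
python dumbdbm backend available, gives a dict-like store with the same basic 
interface.

# Hedged workaround illustration for Python 2 installs that lack bsddb.
try:
    import dbhash as _dbmodule      # requires the bsddb module
except ImportError:
    import anydbm as _dbmodule      # falls back to whatever dbm is available

cache = _dbmodule.open("md5sum-cache.db", "c")   # dict-like persistent cache
cache["some/path"] = "d41d8cd98f00b204e9800998ecf8427e"
cache.close()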

Original issue reported on code.google.com by [email protected] on 9 Feb 2011 at 11:11

Add --ignore-errors switch to the ci convenience function

The --ignore-errors switch is great for the import command. Given that ci is 
simply a shortcut that performs an import for working dirs, the same 
--ignore-errors switch would be desirable there for the same reason it is on 
import: having a large commit abort after running for a long time, rather 
than just skipping locked/modified files, is very annoying.

Thanks and hope you enjoy your holidays :) 

Original issue reported on code.google.com by [email protected] on 21 Dec 2011 at 10:52

Exception when accessing locked archive.

What steps will reproduce the problem?
1. Create a new repo
2. Start some large writing operation, like importing a huge directory
3. Kill the boar process in some way that will not allow normal cleanup 
procedures (kill -9, or yank the drive)
4. Execute the same operation again (or any other writing operation)

What is the expected output? What do you see instead?

The operation should resume normally. Instead, the following exception is 
thrown:

Traceback (most recent call last):
  File "C:\Python26\boar\boar", line 584, in <module>
    return_code = main()
  File "C:\Python26\boar\boar", line 563, in main
    return cmd_clone(args[1:])
  File "C:\Python26\boar\boar", line 446, in cmd_clone
    repo2 = repository.Repo(repopath2)
  File "C:\Python26\boar\blobrepo\repository.py", line 117, in __init__
    self.repo_mutex.lock_with_timeout(60)
  File "C:\Python26\boar\common.py", line 317, in lock_with_timeout
    except MutexLocked:
NameError: global name 'MutexLocked' is not defined

A workaround is to make sure that no other process is accessing the repository, 
and then remove all the mutex-* files in the tmp directory in the repo.

Original issue reported on code.google.com by [email protected] on 20 Jul 2011 at 8:46

Concurrent commits on the same session can cause conflicts to go undetected

What steps will reproduce the problem?
1. Create a new session.
2. Check out the new session to two workdirs, workdir1 and workdir2.
3. Add some large files to workdir1 so that it will take a long time to commit.
4. Start committing the changes to workdir1.
5. Add some small files to workdir2 so that it will commit quickly.
6. Commit the changes to workdir2 before the workdir1 commit has completed.

What is the expected output? What do you see instead?

Both commits will succeed. The slow commit will be the last to finish and 
will become the latest revision of the session, hiding the changes committed 
in workdir2.

The commit to workdir2 should fail with an error message: "This session is 
currently being updated. Try again later." (Or, the commit could simply wait 
until the other commit is finished, but the likely result then would also be 
an error: "Your workdir is out of date. You need to update your workdir 
before you can commit.")


Original issue reported on code.google.com by [email protected] on 22 Mar 2011 at 11:35

Feature Request: Cygwin Path Support (Cross Platform Path Support)

I know you stated that boar is designed to work with bash and the Windows cmd 
shell, so I'm not sure if you ever have occasion to use cygwin on Windows. It 
would be nice if boar did something intelligent when it sees a cygwin-style 
path, converting it to a compatible path, so that regardless of the path 
style (*nix, Windows, or cygwin) boar behaves correctly cross-platform. 
Cygwin paths look like "/cygdrive/drive_letter/path/to/file".

When making a directory a working directory, I know the path value is stored 
in the info file; it is these stored paths that should be correctly 
interpreted if one shell/platform is used to make a directory a working 
directory and a different shell is used to check in from that directory.
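
A minimal conversion sketch for the /cygdrive form mentioned above (real 
Cygwin installations can remap the /cygdrive prefix, which this ignores):

import re

def cygwin_to_windows(path):
    m = re.match(r'^/cygdrive/([a-zA-Z])(/.*)?$', path)
    if not m:
        return path                      # not a cygwin-style path
    drive, rest = m.group(1), m.group(2) or "/"
    return drive.upper() + ":" + rest.replace("/", "\\")

# cygwin_to_windows("/cygdrive/c/Users/me/pics") -> 'C:\\Users\\me\\pics'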

Original issue reported on code.google.com by [email protected] on 15 Jan 2012 at 7:21

Add a --version command

Add a --version command to print out the current version of boar. When 
reporting issues or troubleshooting problems, this makes it easy to confirm 
the version you are running from the command line.

Original issue reported on code.google.com by [email protected] on 8 Jun 2011 at 10:17

Boar should not be affected by FAT32 limits

What steps will reproduce the problem?
1. Create a repository on a FAT32 drive
2. Import a directory with a file with size > 4GB

What is the expected output? What do you see instead?

There is an error message. The operation should complete normally.

Due to how boar stores files internally, file size limits imposed by the file 
system also affect boar. FAT32 is unfortunately still quite common as the 
default file system on portable media such as USB memory sticks and external 
HDDs. Boar should split large files so that a repository can always be stored 
on FAT32. FAT32 max file counts and filename limits are not expected to cause 
any problems.

There are possibly other file systems that should be supported by boar, but 
this issue covers only FAT32.
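
A rough sketch of the splitting idea (illustration only; chunk naming and 
bookkeeping in a real repository would differ):

import os

CHUNK_LIMIT = 2 ** 32 - 2 ** 20    # stay safely below the FAT32 4 GiB limit
READ_SIZE = 2 ** 20                # read in 1 MiB pieces to bound memory use

def write_chunks(source_path, dest_prefix):
    # Split source_path into dest_prefix.0, dest_prefix.1, ... files.
    index, written = 0, 0
    out = open("%s.%d" % (dest_prefix, index), "wb")
    with open(source_path, "rb") as src:
        while True:
            data = src.read(min(READ_SIZE, CHUNK_LIMIT - written))
            if not data:
                break
            out.write(data)
            written += len(data)
            if written == CHUNK_LIMIT:         # piece is full, start the next
                out.close()
                index += 1
                out = open("%s.%d" % (dest_prefix, index), "wb")
                written = 0
    out.close()
    if index > 0 and written == 0:             # drop an empty trailing piece
        os.remove("%s.%d" % (dest_prefix, index))
        index -= 1
    return index + 1                           # number of pieces created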

Original issue reported on code.google.com by [email protected] on 15 Aug 2011 at 2:03

Feature to show log for specific files or sub-tree

Currently, "list" shows all sessions, and "list session" shows the revisions 
within a session.

Extending this to "list session/offset" would make the /offset form much more 
useful (as it stands, I'm not exactly sure what functionality session/offset 
provides). Is there already some way to drill down and view things by 
session/offset only, or is that an option to allow for future functionality?

Original issue reported on code.google.com by [email protected] on 21 Dec 2011 at 11:11

Detecting and skipping unchanged imports/commits

At the moment, when a user runs import on a directory in which nothing has 
changed, the import is still committed. Is there a use case for this?

I'm not sure if this behavior is by design, or if it would perhaps be more 
intuitive to display an info/warning message to the user that nothing has 
changed and not create the new session or at the very least ask for 
confirmation from the user.

If this is by design, I'm just wondering what the use case for it is (sessions 
where nothing has changed).

Original issue reported on code.google.com by [email protected] on 15 Jan 2012 at 12:36

Block level data deduplication

Boar should be able to find identical blocks between files and store them 
only once. Boar currently performs only file-level deduplication; that is, 
identical files are stored only once in the repository, regardless of 
filename or session.

This feature will reduce overhead when performing small changes to large files, 
such as editing EXIF data in an image. Also, it will make it feasible to 
version control large data files with frequent small changes, such as virtual 
machine images.
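
To illustrate the idea with the simplest possible scheme, fixed-size block 
hashing (real implementations usually prefer content-defined, rolling-hash 
chunk boundaries so that an insert does not shift every following block):

import hashlib

BLOCK_SIZE = 64 * 1024

def block_digests(path):
    # md5 digest of each fixed-size block of the file.
    digests = []
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(BLOCK_SIZE), b""):
            digests.append(hashlib.md5(block).hexdigest())
    return digests

def shared_blocks(path_a, path_b):
    # Number of block digests the two files have in common.
    return len(set(block_digests(path_a)) & set(block_digests(path_b)))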


Original issue reported on code.google.com by [email protected] on 22 Mar 2011 at 11:48

Boar should support symbolic links

I primarily use boar in a windows 7 environment, however the lack of support 
for tracking symlinks/junctions makes it very difficult to use as a "backup" 
solution.

The issue I have is that by not at least storing metadata about the existence 
of junctions in the bloblist, information about the directory structure you are 
ostensibly checking in is irrevocably lost.

You want to know that your data is safe when using any version control system. 
Symlinks and junctions *are* important. They're usually there for a reason, and 
if they're missing, things can break in nonobvious ways. By not having any 
means to at least track or log their existence, boar is losing potentially 
important data. When restoring a session that included symlinks or junctions 
the user will not have any indication what files are missing or any place to 
look for hints about recreating them.

Even if boar doesn't allow you to recreate them fully (due to filesystem or 
permissions issues), simply providing a hint/placeholder file, or including 
an entry in the bloblist with symlink and reparse-point (name -> target) 
information, would allow the user to recreate these structures on whatever 
filesystem they are using with a fairly trivial script using mklink or ln 
(or what have you).
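
For what it's worth, collecting that (name -> target) information is cheap on 
POSIX; a sketch (Windows junctions/reparse points would need extra handling 
not shown here):

import os

def collect_links(root):
    # Record every symlink below root as a relative path plus its target.
    links = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            p = os.path.join(dirpath, name)
            if os.path.islink(p):
                links.append({"filename": os.path.relpath(p, root),
                              "target": os.readlink(p)})
    return links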

Also, does boar store information about empty directories? Again, this comes 
down to the issue of "losing" information. I realize boar is aimed at storing 
binary data, but as a practical matter the directory structure (i.e. the 
"where") in which blobs are stored matters. Even if a directory is empty, its 
name or location can provide important information that is otherwise 
irrevocably lost if not stored somehow.

Philosophically, I feel backup and especially archival tools should err on the 
side of caution when it comes to being able to faithfully restore the data they 
are entrusted with. If a user has a file in a directory the safest assumption 
is because they want it there and as such any VCS that deals with files should 
provide a mechanism for restoring or at least capturing information about it.

Any thoughts on this (even simply having an entry in the bloblist about 
directories and symlinks/junctions)?

Original issue reported on code.google.com by [email protected] on 19 Dec 2011 at 5:46

Missing repo or workdir prevents --help message from showing

Some commands that require a workdir or a repository to work on will not give 
a help message when --help is given. Instead, an error message about a 
missing repository is printed. (Also, a few commands do not accept the --help 
option at all.)

What is the expected output? What do you see instead?

All commands should always print a helpful message, and nothing else, when 
--help is specified.


Original issue reported on code.google.com by [email protected] on 9 Feb 2011 at 9:40

Commands ci, update and status should be able to work on part of a tree

The ci, update, and status commands currently always work on the full workdir 
tree. Scanning the full tree can be time-consuming. To make it easier to use 
work directories containing many files, it should be possible to execute 
operations on only part of the tree. For instance, if you know that you have 
only changed files in a specific directory, there is no point in scanning any 
files outside that directory.

Original issue reported on code.google.com by [email protected] on 9 Feb 2012 at 9:58

UnicodeEncodeError: 'ascii' codec can't encode character

What steps will reproduce the problem?
1. Import a directory with option -v -n

What is the expected output? What do you see instead?
Boar should dry-run through and print out all the necessary information. 
Instead it printed:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

and stopped.

What platform are you using? (Windows XP, Windows 7, Linux, ...)
Windows 7

What version of Python are you using?
2.6
What version of boar are you using? (Mercurial change id or daily build
date)
Boar daily 13 Feb-2011

Please provide any additional information below.
This is an issue with printing non-ascii characters to sys.stdout. A common 
fix is to use Django's smart_str.
Reference:
http://www.saltycrane.com/blog/2008/11/python-unicodeencodeerror-ascii-codec-cant-encode-character/
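
A minimal workaround in the same spirit (Python 2 sketch: encode explicitly 
with a replacement policy instead of letting the interpreter fall back to 
ascii):

import sys

def safe_print(text):
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    if isinstance(text, unicode):
        text = text.encode(encoding, "replace")   # '?' instead of a crash
    sys.stdout.write(text + "\n")

# safe_print(u"\xa1hola!") prints "?hola!" on an ascii-only console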


Original issue reported on code.google.com by [email protected] on 24 Feb 2011 at 4:30

Feature Request: Better Progress Output

Watching a cursor blink for 20+ minutes and *hoping* the program you're 
running is doing something and didn't just hang is a little unsettling. It's 
nice to be able to monitor (or at least get some sense of) the progress of 
any program that might take a particularly long time to complete, as is the 
case with boar when importing large directories.

Having a --progress switch, varying levels of verbosity (ie. -v, -vv, ..), or 
even a --debug switch (along the lines of wget) to print some sort of progress 
output to the console would be much appreciated.

Even if there's a performance hit for those who choose to enable it, having a 
(processed/total) style progress indicator can be very nice, especially on 
huge imports, as otherwise there is no way to gauge how far along boar is (or 
that it is in fact making progress during certain phases like the initial 
hash calculation).

Thanks :) 

Original issue reported on code.google.com by [email protected] on 21 Dec 2011 at 5:15

Enhancement: Meta-Sessions (ie. supersets of existing sessions) should leverage existing .boar folders

When checking in a directory, if boar finds .boar "working directory" 
metadata folders, it should intelligently leverage them to greatly increase 
the efficiency of check-ins of a parent directory that contains directories 
already under version control.

== Reasoning ==

Given the way the logging/reporting features currently work (revision count, 
etc.), it seems to make the most sense to set up sessions on a "per project" 
basis. Since file changes are usually localized per project, it makes sense 
to be able to "check in" changes to a project (e.g. work_john.doe_website, 
rev 3) rather than checking in a giant global folder every time (e.g. 
websites, rev 2042). When using a global session/folder that includes many 
subprojects (like a folder called pictures, or websites, etc.), it quickly 
becomes unwieldy to track down revisions for a specific project, since there 
is a lot of background noise from other projects.

That being said, it *is* nice to be able to have a "global" view of such 
folders sometimes. Even if you have subfolders in your pics or websites 
folder that correspond to specific subprojects, sometimes there are files 
that exist *only* in the root of the pics folder and are not specific to any 
particular project. Furthermore, if you ever reorganize the structure of the 
pics/websites/etc. folder, it's nice to have that under version control so 
you can return to the old layout if you decide you don't like the new one.

With all of this in mind, I realize that the "offset" feature is designed to 
allow the check-in of subfolders, but commits etc. are still all mashed 
together under a single session; there is no way to filter on a per 
project/subfolder basis at the moment. Also, it's nice to have a project 
exist as an atomic workdir that you can freely move and plop down wherever 
you want on your filesystem and still be able to continue making check-ins 
(since paths are based at the workdir level), without worrying that it is no 
longer an "offset" of the master pics/websites/etc. folder if you decide to 
move it to some new parent directory.

So, ultimately, per project sessions are just fine, but they don't obviate the 
usefulness of sessions to track much larger folders that are a superset of many 
smaller sessions (and which also include files that *don't* exist in any other 
session).

This brings me back around to the original feature request. If you have just 
spent several days checking in a bunch of sessions/directories corresponding 
to all your photoshoots, video projects, or whatever category you happened to 
use to draw session boundaries, and at the end of it all you want to create a 
"master" snapshot of the entire "photos" folder to catch/track any stray 
files that aren't included in existing sessions (and to track the global 
layout of your files within that folder), then it should *not* take another 
several days to import directories for which bloblist/cache files already 
exist (that defeats the point of precomputing such things if they aren't used 
later).

Original issue reported on code.google.com by [email protected] on 20 Jan 2012 at 2:11

Small bugs

In the function check_in_file in workdir.py (line 464), all occurrences of 
the "path" variable should be replaced with "abspath" or "sessionpath".

Original issue reported on code.google.com by [email protected] on 1 Jan 2012 at 7:30

Patch for /boarmount

boarmount now checks for sufficient arguments. Otherwise an ugly python 
exception is shown, because sys.argv[1:3] barfs if there aren't enough 
arguments.

Original issue reported on code.google.com by [email protected] on 10 Mar 2011 at 11:20

Attachments:

Unhandled Exception when Updating

When boar can't update a file, an unhandled exception occurs and stops the 
process.

What steps will reproduce the problem?
1. Attempt to update a directory containing files with the System and Hidden 
attributes.
2. Observe the exception.

What is the expected output? What do you see instead?
I expect the action taken to be configurable (e.g. "Can't update Thumbs.db: 
Skip/All/Ignore/Fail").


The Traceback is as follows:
Updating: photos/2009/100CANON/Thumbs.db
Traceback (most recent call last):
  File "C:\Python26\boar\boar", line 498, in <module>
    main()
  File "C:\Python26\boar\boar", line 470, in main
    cmd_update(args[1:])
  File "C:\Python26\boar\boar", line 280, in cmd_update
    wd.update(new_revision = options.revision)
  File "C:\Python26\boar\workdir.py", line 141, in update
    fetch_blob(front, b['md5sum'], target_abspath, overwrite = True)
  File "C:\Python26\boar\workdir.py", line 482, in fetch_blob
    f = open(target_path, "wb")
IOError: [Errno 13] Permission denied: u'E:\\Pictures\\photos/2009/100CANON/Thumbs.db'

What platform are you using? (Windows XP, Windows 7, Linux, ...)
Windows 7 Ultimate

What version of Python are you using?
Python 2.6.5

What version of boar are you using? (Mercurial change id or daily build date)
Daily Build 2/13


Original issue reported on code.google.com by [email protected] on 21 Feb 2011 at 3:00

Boar should detect md5 hash collisions

Boar uses the 128-bit md5 checksum algorithm. The odds against an accidental 
collision (two different files having the same checksum) are truly 
astronomical (if you have 10 000 000 000 files, the risk of at least one 
collision is about 10^-19). However, md5 does have weaknesses that make it 
possible to construct collisions intentionally. This feature is therefore 
mostly a security issue, since accidental collisions are rare enough.

Collisions will cause problems, as boar currently assumes that files with the 
same md5 checksum are always identical. Most likely, one of the files causing 
the collision will be lost. 

Boar should prevent such problems by storing an alternative checksum (maybe 
some variant of SHA) for every stored file, and use it to make sure that 
files with the same md5 checksum really are identical. There will be no 
attempt to make the boar repository store md5 collisions. If a collision is 
found during an import or checkin, boar will abort the operation and print an 
error message.
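
A sketch of the double-checksum idea, using sha256 as the secondary digest 
(the text above only says "some variant of SHA", so the exact choice here is 
an assumption):

import hashlib

def md5_and_sha256(path):
    # Compute both digests in a single pass over the file.
    m, s = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            m.update(chunk)
            s.update(chunk)
    return m.hexdigest(), s.hexdigest()

def check_collision(new_file, stored_md5, stored_sha256):
    md5, sha256 = md5_and_sha256(new_file)
    if md5 == stored_md5 and sha256 != stored_sha256:
        raise AssertionError("md5 collision detected for %s" % new_file)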

Original issue reported on code.google.com by [email protected] on 23 Mar 2011 at 8:34

cleanup empty directories

1. rm a dir [with subdirs] in WorkingCopyA
2. boar co
3. cd WorkingCopyB; boar update

What is the expected output? 
In WorkingCopyB: same structure as WorkingCopyA

What do you see instead?
In WorkingCopyB: old empty dirs

What platform are you using? (Windows XP, Windows 7, Linux, ...)
Ubuntu natty

What version of Python are you using?
2.7
What version of boar are you using? (Mercurial change id or daily build
date)
BOAR_VERSION = "boar-daily.11-Jul-2011"

It's nasty to do it myself, because I always have to check whether everything 
really is empty before deleting. I'd like to see boar handle this correctly.

great tool!

Original issue reported on code.google.com by [email protected] on 7 Oct 2011 at 3:49

Boar should be able to handle old read-only repositories

What steps will reproduce the problem?
1. Create a repository with a repo v0 version of boar (boar-daily.11-Jul-2011 
or earlier)
2. Make the repository read-only by changing the permissions or burn it to a 
dvd.
3. Try to check out the contents of the repository.

What is the expected output? What do you see instead?

The operation should complete normally. Instead, an exception occurs because 
the repository can't be upgraded to the current format due to write protection.

A workaround is to copy the repository to a location where it can be modified 
to allow the repository to be upgraded.

Original issue reported on code.google.com by [email protected] on 8 Oct 2011 at 11:14

"cleanup" command to clean up the tmp directory

When an import or checkin is taking place, the incomplete data is stored under 
tmp/ in the repository directory. If the operation is aborted, the data stays 
in the tmp directory indefinitely. There should be a "cleanup" command to 
delete old files from that directory. 

A manual workaround is to make sure that no boar command is currently running, 
and then simply delete the contents of the tmp directory. 

Automatic cleanup might be implemented in the future, but this item only covers 
an explicit cleanup command.

Original issue reported on code.google.com by [email protected] on 1 Sep 2011 at 8:08

UnicodeEncodeError

What steps will reproduce the problem?
1. Update a directory containing files that have special unicode characters 
in their names

What is the expected output? What do you see instead?
Boar should update normally. Instead we see a UnicodeEncodeError.

What platform are you using? (Windows XP, Windows 7, Linux, ...)
Windows 7. I tested on my Ubuntu machine and this doesn't occur; it seems 
that the default locale on my Ubuntu is some kind of unicode instead of ascii.

What version of Python are you using?
2.6

What version of boar are you using? (Mercurial change id or daily build
date)
    403cbb6bc635

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 14 Apr 2011 at 2:01

Boar should have an option for purging old revisions

Unlike normal revision control software that deals mostly with text files, 
boar is designed for large multimedia files or other types of binary files 
such as MS Office documents and other proprietary formats. While we do want 
to keep some revisions (e.g. for important documents), we normally don't want 
to keep ALL revisions, especially when a major reorganization of the 
repository occurs. For instance, we may want to move a large directory, say 
100 GB, out of the repository, and if we're quite certain we don't ever want 
it back, there's no point in keeping that directory in the history of the 
repository.

So with this reasoning, I propose a purge command that removes all history 
from a certain revision and earlier. Boar should be able to make it as if a 
certain revision in the middle were the initial import.

Original issue reported on code.google.com by [email protected] on 13 Mar 2011 at 9:22

Negation syntax or "include" syntax for new import "ignore" feature

This might just be a matter of me having trouble with the appropriate syntax 
to accomplish what I'd like, but ultimately I'd like to be able to say 
"ignore all files *except* .xyz".

For instance, if I wanted to import only video files or image files.

The only solution I was able to find for doing this in svn (which, as you 
mentioned, uses similar syntax) was here: http://www.thoughtspark.org/node/38

The syntax seemed rather clunky. Is there any chance you could implement a 
"+/-" syntax for inclusion/exclusion of file masks? The idea being someone 
could type:

-*
+*.xyz

The masks are applied in sequential order, so first you "exclude" everything 
and then you "include" files ending in ".xyz". This is just one example of a 
way the inclusion/exclusion filter might work (it is how HTTRACK implements 
file masks, but again, it's just one possible way to implement it).
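
For illustration, ordered "+/-" masks could be evaluated with fnmatch so that 
the last matching rule decides (a sketch only, not a proposal for the exact 
semantics):

import fnmatch

def included(path, rules):
    # rules: ordered list like ["-*", "+*.xyz"]; files are included by default.
    decision = True
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatch.fnmatch(path, pattern):
            decision = (sign == "+")
    return decision

# included("notes.txt", ["-*", "+*.xyz"]) -> False
# included("photo.xyz", ["-*", "+*.xyz"]) -> True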

Thanks 

Original issue reported on code.google.com by [email protected] on 23 Jun 2011 at 2:53

AssertionError: Offset was: workdir Path was: workdir_longer/file

What steps will reproduce the problem?
1. Import 2 directories where one directory's name is a prefix of the other's
2. Modify something in the directory with the longer name
3. Go to the directory with the shorter name and update

Run issue.sh to simulate this situation

What is the expected output? What do you see instead?
Boar should update normally; instead it spits out:

AssertionError: Offset was: workdir Path was: workdir_longer/file


What platform are you using? (Windows XP, Windows 7, Linux, ...)
Ubuntu 10

What version of Python are you using?
Python 2.6

What version of boar are you using? (Mercurial change id or daily build
date)
70b62d23db  

Please provide any additional information below.
My attempt: issue9.patch
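
For context (this is not a review of the attached patch), the assertion 
suggests a plain string-prefix test that treats "workdir_longer/file" as 
belonging to the offset "workdir"; a boundary-aware check avoids that:

def is_path_prefix(offset, path):
    # True only if path equals offset or lies below it as a directory.
    return path == offset or path.startswith(offset.rstrip("/") + "/")

# is_path_prefix("workdir", "workdir_longer/file") -> False
# is_path_prefix("workdir", "workdir/file")        -> True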

Original issue reported on code.google.com by [email protected] on 11 Mar 2011 at 1:46

Attachments:

Unreadable directories silently ignored

What steps will reproduce the problem?
1. Create a small test file tree containing an unreadable sub directory (chmod 
a-rx)
2. Import the tree with boar.

What is the expected output? What do you see instead?

The import completes without complaint. Boar should indicate in some way that 
the entire tree could not be imported.

This problem is not easy to fix in Python earlier than 3.0 because of the way 
the os.path.walk function behaves: there is no way to handle errors in a 
custom way, and the default behaviour is to ignore unreadable directories. 
Implementing a pure-python walker would be a solution, but it is likely to be 
too slow to be worth it.
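
If switching to the generator-based os.walk is acceptable, its onerror 
callback at least makes the problem visible (sketch only; whether this fits 
boar's walker is an open question):

import os

def walk_reporting_errors(top):
    problems = []
    for dirpath, dirnames, filenames in os.walk(top, onerror=problems.append):
        pass        # real code would feed dirpath/filenames to the import
    for err in problems:
        print "WARNING: could not read %s: %s" % (
            getattr(err, "filename", "?"), err)
    return problems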

Original issue reported on code.google.com by [email protected] on 7 Aug 2011 at 1:43
