bagit-python

bagit is a Python library and command line utility for working with BagIt style packages.

Installation

bagit.py is a single-file Python module that you can drop into your project as needed, or install globally with:

pip install bagit

Python v2.4+ is required.

Command Line Usage

When you install bagit you should get a command line program called bagit.py which you can use to turn an existing directory into a bag:

bagit.py --contact-name 'John Kunze' /directory/to/bag

You can pass in key/value metadata for the bag using options like --contact-name above, which are persisted to the bag's bag-info.txt file. For a complete list of bag-info.txt properties you can use as command line arguments, see --help.
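To make the shape of that metadata concrete, here is a minimal sketch of how a dictionary of properties could be rendered as "Label: value" lines of the kind bag-info.txt holds. The function name format_bag_info is hypothetical and this is not bagit's own code; it also ignores line wrapping and repeated labels.

```python
def format_bag_info(metadata):
    """Render a dict of bag-info properties as 'Label: value' lines.

    Hypothetical helper for illustration only; not part of bagit.
    """
    # Sort labels so the output is stable from run to run.
    lines = ["%s: %s" % (label, metadata[label]) for label in sorted(metadata)]
    return "\n".join(lines) + "\n"


print(format_bag_info({"Contact-Name": "John Kunze"}))
```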

Since calculating checksums can take a while when creating a bag, you may want to calculate them in parallel if you are on a multicore machine. You can do that with the --processes option:

bagit.py --processes 4 /directory/to/bag
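The idea behind hashing files concurrently can be sketched with the standard library alone. This is not how bagit implements --processes: the sketch assumes Python 3 and uses a thread pool instead of worker processes to stay short, and the names sha256_of_file and checksums_in_parallel are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash one file in chunks so large files are never read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_in_parallel(paths, workers=4):
    """Return {path: hex digest}, hashing up to `workers` files at a time."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input paths.
        return dict(zip(paths, pool.map(sha256_of_file, paths)))
```

Calling checksums_in_parallel(["a.bin", "b.bin"], workers=2) would return a dictionary keyed by those two paths.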

If you would like to validate a bag you can use the --validate flag.

bagit.py --validate /path/to/bag

For a quick check that only examines the bag's structure and compares its Payload-Oxum (total byte count and number of files), add the --fast flag:

bagit.py --validate --fast /path/to/bag
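The byte count and file count used by the fast check are cheap to compute, because no file contents are read. As a rough illustration (not bagit's own code), the sketch below walks a bag's data directory and builds a value in the "OctetCount.StreamCount" form used for Payload-Oxum. The function name payload_oxum is hypothetical.

```python
import os


def payload_oxum(bag_dir):
    """Return 'OctetCount.StreamCount' for the files under bag_dir/data."""
    total_bytes = 0
    total_files = 0
    data_dir = os.path.join(bag_dir, "data")
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            # Only file sizes are consulted; nothing is hashed.
            total_bytes += os.path.getsize(os.path.join(root, name))
            total_files += 1
    return "%d.%d" % (total_bytes, total_files)
```

Comparing this string with the Payload-Oxum recorded in bag-info.txt shows whether files were added, removed, or resized, but it cannot detect changes that leave both totals unchanged.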

Python Usage

You can also use bagit programmatically in your own Python programs. To create a bag you would do this:

import bagit
bag = bagit.make_bag('mydir', {'Contact-Name': 'John Kunze'})

make_bag returns a Bag instance. If you have a bag already on disk and would like to create a Bag instance for it, simply call the constructor directly:

import bagit
bag = bagit.Bag('/path/to/bag')

If you would like to see if a bag is valid, use its is_valid method:

bag = bagit.Bag('/path/to/bag')
if bag.is_valid():
    print("yay :)")
else:
    print("boo :(")

If you'd like a detailed list of validation errors, call the validate method and catch the BagValidationError exception. If the bag's manifest is invalid (and the problem was not already caught by the Payload-Oxum check), the exception's details property contains a list of ManifestError objects that you can inspect. Each ManifestError is a ChecksumMismatch, FileMissing, or UnexpectedFile.

So for example if you want to print out checksums that failed to validate you can do this:

import bagit

bag = bagit.Bag("/path/to/bag")

try:
    bag.validate()
except bagit.BagValidationError as e:  # on Python 2.4/2.5 write: except bagit.BagValidationError, e:
    # e.details holds one ManifestError per problem found
    for d in e.details:
        if isinstance(d, bagit.ChecksumMismatch):
            print("expected %s to have %s checksum of %s but found %s" %
                  (d.path, d.algorithm, d.expected, d.found))
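To make the three error kinds concrete, here is a standard-library sketch that compares a manifest with the files actually on disk and labels each problem with the matching kind. It is an illustration under assumptions, not bagit's implementation: check_manifest is a hypothetical name, the manifest is assumed to be a dict of relative path to SHA-256 hex digest, and problems are returned as plain tuples instead of exception objects.

```python
import hashlib
import os


def check_manifest(data_dir, manifest):
    """Compare manifest {relative_path: sha256_hex} with files in data_dir.

    Returns a list of (kind, relative_path) tuples, where kind is
    'FileMissing', 'UnexpectedFile', or 'ChecksumMismatch'.
    """
    found = set()
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), data_dir)
            found.add(rel.replace(os.sep, "/"))

    listed = set(manifest)
    problems = []
    # Listed in the manifest but absent from disk.
    for rel in sorted(listed - found):
        problems.append(("FileMissing", rel))
    # Present on disk but not listed in the manifest.
    for rel in sorted(found - listed):
        problems.append(("UnexpectedFile", rel))
    # Present in both places: the contents must hash to the recorded value.
    for rel in sorted(found & listed):
        with open(os.path.join(data_dir, *rel.split("/")), "rb") as handle:
            actual = hashlib.sha256(handle.read()).hexdigest()
        if actual != manifest[rel]:
            problems.append(("ChecksumMismatch", rel))
    return problems
```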

Development

% git clone git://github.com/LibraryOfCongress/bagit-python.git
% cd bagit-python
% python test.py

If you'd like to see how increasing the parallelization of bag creation affects the time it takes to create a bag on your system, try the included bench utility:

% ./bench.py

License

cc0
