xopen

This small Python module provides an xopen function that works like the built-in open function, but can also deal with compressed files. Supported compression formats are gzip, bzip2 and xz. They are automatically recognized by their file extensions .gz, .bz2 or .xz.
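The extension-based dispatch described above can be approximated with the standard library alone. The following is a minimal sketch, not xopen's implementation: the helper name `open_by_extension` is hypothetical, and it uses the built-in gzip, bz2 and lzma modules instead of pigz. It also accepts pathlib.Path objects by normalizing them with os.fspath.

```python
import bz2
import gzip
import lzma
import os
from pathlib import Path

# Hypothetical mapping from file name extension to an opener function.
_OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_by_extension(path, mode="rt"):
    """Open path with the compression module implied by its extension.

    Accepts str or os.PathLike (e.g. pathlib.Path). Files without a
    recognized extension are opened with the built-in open().
    """
    filename = os.fspath(path)
    suffix = Path(filename).suffix.lower()
    opener = _OPENERS.get(suffix, open)
    return opener(filename, mode)
```

For example, `open_by_extension(Path("reads.txt.gz"))` returns a gzip text stream, while a name without a known suffix is passed straight to open().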

The focus is on being as efficient as possible on all supported Python versions. For example, to open .gz files, xopen uses pigz, a parallel implementation of gzip, which is faster than the built-in gzip.open function. pigz can use multiple threads when compressing, but it is also faster when reading .gz files, so xopen uses it for both reading and writing whenever it is available.
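To illustrate reading through a pigz subprocess, here is a sketch that streams the decompressed lines from `pigz -d -c` when pigz is on the PATH and falls back to gzip.open otherwise. The function name `iter_gzip_lines` is hypothetical, and xopen's actual reader differs in detail; for instance, it returns a file object rather than a generator.

```python
import gzip
import os
import shutil
import subprocess


def iter_gzip_lines(path):
    """Yield the decompressed lines (as bytes) of a gzip-compressed file.

    If pigz is on the PATH, decompression runs in a pigz subprocess;
    otherwise the standard library's gzip module is used.
    """
    pigz = shutil.which("pigz")
    if pigz is None:
        with gzip.open(path, "rb") as f:
            yield from f
        return
    # -d: decompress; -c: write to stdout and leave the input file untouched
    proc = subprocess.Popen([pigz, "-d", "-c", os.fspath(path)], stdout=subprocess.PIPE)
    try:
        yield from proc.stdout
    finally:
        # Closing the pipe first lets pigz exit if the caller stopped reading early.
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise OSError("pigz exited with status {}".format(returncode))
```

Running decompression in a separate process lets it overlap with the Python code consuming the lines, which is one reason a subprocess can outperform gzip.open even without multithreaded decompression.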

This module was originally developed as part of the Cutadapt tool, which is used in bioinformatics to manipulate sequencing data. It has been in successful use within that software for a few years.

xopen is compatible with Python versions 3.5 to 3.8.

Usage

Open a file for reading:

from xopen import xopen

with xopen('file.txt.xz') as f:
    content = f.read()

Or without context manager:

from xopen import xopen

f = xopen('file.txt.xz')
content = f.read()
f.close()

Open a file in binary mode for writing:

from xopen import xopen

with xopen('file.txt.gz', mode='wb') as f:
    f.write(b'Hello')
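Writing can be delegated to pigz in a similar way, since pigz reads uncompressed data on standard input and writes gzip data to standard output. The sketch below assumes pigz's `-c` and `-p` options and applies the default cap of four threads mentioned in the v0.5.0 changelog entry; `write_gzip` and `default_threads` are hypothetical names, and using os.cpu_count() for smaller machines is an assumption. Without pigz, or with threads=0, it falls back to gzip.open.

```python
import gzip
import os
import shutil
import subprocess


def default_threads():
    """Cap at four threads (per the v0.5.0 changelog); fewer on small machines."""
    return min(4, os.cpu_count() or 1)


def write_gzip(path, data, threads=None):
    """Compress the bytes in data into path, using pigz when it is on the PATH."""
    if threads is None:
        threads = default_threads()
    pigz = shutil.which("pigz")
    if pigz is None or threads == 0:
        # Fallback: compress in-process with the standard library.
        with gzip.open(path, "wb") as f:
            f.write(data)
        return
    with open(path, "wb") as outfile:
        # pigz reads stdin and writes compressed data to stdout (-c),
        # using up to `threads` compression threads (-p).
        proc = subprocess.Popen(
            [pigz, "-c", "-p", str(threads)],
            stdin=subprocess.PIPE,
            stdout=outfile,
        )
        proc.communicate(data)  # sends data, closes stdin, waits for pigz
    if proc.returncode != 0:
        raise OSError("pigz exited with status {}".format(proc.returncode))
```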

Credits

The name xopen was taken from the C function of the same name in the utils.h file which is part of BWA.

Kyle Beauchamp <https://github.com/kyleabeauchamp/> has contributed support for appending to files.

Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to make reading gzipped files faster.

Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for format detection from content.

Some ideas were taken from the canopener project. If you also want to open S3 files, you may want to use that module instead.

Changes

v0.9.0

  • When a file opened for reading has no recognized file name extension, its content is inspected (if possible) to determine which compression format applies.
  • This release drops Python 2.7 and 3.4 support. Python 3.5 or later is now required.
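Content-based detection can rely on the magic bytes at the start of each format: gzip files begin with 1f 8b, bzip2 files with "BZh", and xz files with fd 37 7a 58 5a 00. The following is a minimal sketch using a hypothetical helper, `detect_format`; it is not necessarily how xopen implements the check.

```python
# Signatures ("magic numbers") at the start of each compressed format.
_MAGIC = [
    (b"\x1f\x8b", "gz"),
    (b"BZh", "bz2"),
    (b"\xfd7zXZ\x00", "xz"),
]


def detect_format(path):
    """Guess the compression format from the first bytes of a file.

    Returns "gz", "bz2", "xz", or None if no known signature matches.
    """
    with open(path, "rb") as f:
        head = f.read(6)  # the longest signature (xz) is six bytes
    for magic, fmt in _MAGIC:
        if head.startswith(magic):
            return fmt
    return None
```

Because it only looks at a few bytes, an uncompressed file that happens to start with "BZh" would be misclassified; a sketch like this is best used only as a fallback when the extension is missing.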

v0.8.4

  • When reading gzipped files, force pigz to use only a single process. pigz cannot use multiple cores for decompression anyway; by default it would use extra I/O processes, which slightly reduces wall-clock time but increases CPU time. Single-core decompression with pigz is still about twice as fast as regular gzip.
  • Allow threads=0 for specifying that no external pigz/gzip process should be used (then regular gzip.open() is used instead).
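The backend choice described in these two entries can be sketched as a small function that returns the command line for an external decompression process, or None when gzip.open() should be used instead. The name `gzip_read_command` is hypothetical, and passing `-p 1` to restrict pigz to a single process is an assumption about how such a limit would be expressed; xopen's own arguments may differ.

```python
import os
import shutil


def gzip_read_command(path, threads=None):
    """Return argv for decompressing path externally, or None for gzip.open()."""
    if threads == 0:
        return None  # threads=0: do not start any external pigz/gzip process
    filename = os.fspath(path)
    pigz = shutil.which("pigz")
    if pigz is not None:
        # Decompression is single-threaded in pigz; -p 1 avoids extra I/O threads.
        return [pigz, "-c", "-d", "-p", "1", filename]
    gzip_bin = shutil.which("gzip")
    if gzip_bin is not None:
        return [gzip_bin, "-c", "-d", filename]
    return None  # no external tool found: fall back to gzip.open()
```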

v0.8.3

  • When reading gzipped files, let pigz use at most four threads by default. This limit previously only applied when writing to a file.
  • Support Python 3.8.

v0.8.0

  • Speed improvements when iterating over gzipped files.

v0.6.0

  • For reading from gzipped files, xopen will now use a pigz subprocess. This is faster than using gzip.open.
  • Python 2 support will be dropped in one of the next releases.

v0.5.0

  • By default, pigz is now only allowed to use at most four threads. This hopefully reduces problems some users had with too many threads when opening many files at the same time.
  • xopen now accepts pathlib.Path objects.

Author

Marcel Martin (@marcelm_ on Twitter)
