Rubra: a bioinformatics pipeline.
---------------------------------

https://github.com/bjpop/rubra

License:
--------

Rubra is licensed under the MIT license. See LICENSE.txt.

Description:
------------

Rubra is a pipeline system for bioinformatics workflows. It is built on top
of the Ruffus (http://www.ruffus.org.uk/) Python library, and adds support
for running pipeline stages on a distributed compute cluster.

Authors:
--------

Bernie Pope, Clare Sloggett, Gayle Philip, Matthew Wakefield

Usage:
------

usage: rubra [-h] PIPELINE_FILE --config CONFIG_FILE
                [CONFIG_FILE ...] [--verbose {0,1,2}]
                [--style {print,run,touchfiles,flowchart}] [--force TASKNAME]
                [--end TASKNAME] [--rebuild {fromstart,fromend}]

A bioinformatics pipeline system.

optional arguments:
  -h, --help            show this help message and exit
  PIPELINE_FILE         Your Ruffus pipeline stages (a Python module)
  --config CONFIG_FILE [CONFIG_FILE ...]
                        One or more configuration files (Python modules)
  --verbose {0,1,2}     Output verbosity level: 0 = quiet; 1 = normal; 2 =
                        chatty (default is 1)
  --style {print,run,touchfiles,flowchart}
                        Pipeline behaviour: print; run; touchfiles; flowchart (default is
                        print)
  --force TASKNAME      tasks which are forced to be out of date regardless of
                        timestamps
  --end TASKNAME        end points (tasks) for the pipeline
  --rebuild {fromstart,fromend}
                        rebuild outputs by working back from end tasks or
                        forwards from start tasks (default is fromstart)
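
The help text above has the shape of Python argparse output. As a rough
illustration only, a parser accepting the same options could be sketched as
follows. This is not Rubra's actual source: the function name build_parser
and the attribute name "pipeline" are hypothetical, and --force and --end are
assumed to take a single task name, as the usage line suggests.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the usage message (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="rubra", description="A bioinformatics pipeline system.")
    # Positional argument: the Python module holding the Ruffus stages.
    parser.add_argument("pipeline", metavar="PIPELINE_FILE",
                        help="Your Ruffus pipeline stages (a Python module)")
    # One or more configuration modules; required according to the usage line.
    parser.add_argument("--config", metavar="CONFIG_FILE", nargs="+",
                        required=True,
                        help="One or more configuration files (Python modules)")
    parser.add_argument("--verbose", type=int, choices=[0, 1, 2], default=1)
    parser.add_argument("--style", default="print",
                        choices=["print", "run", "touchfiles", "flowchart"])
    # Assumption: a single task name each, as shown in the usage line.
    parser.add_argument("--force", metavar="TASKNAME")
    parser.add_argument("--end", metavar="TASKNAME")
    parser.add_argument("--rebuild", default="fromstart",
                        choices=["fromstart", "fromend"])
    return parser


args = build_parser().parse_args(
    ["example_pipeline.py", "--config", "example_config.py", "--style", "run"])
print(args.style, args.verbose, args.rebuild)
```

Options left off the command line fall back to the defaults stated in the
help text (verbosity 1, rebuild fromstart).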

Example:
--------

Below is a little example pipeline which you can find in the Rubra source
tree. It counts the number of lines in two files (test/data1.txt and
test/data2.txt), and then sums the results together.

   rubra example_pipeline.py --config example_config.py --style run

There are 2 lines in the first file and 1 line in the second file. So the
result is 3, which is written to the output file test/total.txt.

The PIPELINE_FILE argument is a Python script which contains the actual
code for each pipeline stage (using Ruffus notation). The --config
argument names one or more Python scripts which contain configuration
options for the whole pipeline, plus options for each stage (including the
shell command to run in the stage). The --style argument says what to do
with the pipeline: "run" means "perform the out-of-date steps in the
pipeline". The default style is "print", which just displays what the
pipeline would do if it were run. You can get a diagram of the pipeline
using the "flowchart" style. You can touch all files in order using the
"touchfiles" style, which is mostly useful for forcing Ruffus to
acknowledge that a set of steps is up to date.
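
To make the data flow of the example concrete, here is a plain-Python sketch
of the same computation: count the lines in each input file, then sum the
counts. It uses neither Ruffus nor Rubra. The function names count_lines and
total are hypothetical, and the file contents are stand-ins written to a
temporary directory (the real test/data1.txt and test/data2.txt may contain
different text).

```python
import os
import tempfile


def count_lines(path):
    """Return the number of lines in a text file (what `wc -l` reports)."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def total(counts):
    """Sum the per-file line counts."""
    return sum(counts)


with tempfile.TemporaryDirectory() as workdir:
    # Stand-in inputs: two lines in the first file, one line in the second.
    data1 = os.path.join(workdir, "data1.txt")
    data2 = os.path.join(workdir, "data2.txt")
    with open(data1, "w") as handle:
        handle.write("first line\nsecond line\n")
    with open(data2, "w") as handle:
        handle.write("only line\n")

    counts = [count_lines(data1), count_lines(data2)]
    result = total(counts)

print(result)  # 2 + 1 = 3
```

In the real pipeline each of these steps is a separate stage, so Ruffus can
skip whichever outputs are already up to date.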

Configuration:
--------------

Configuration options are written into one or more Python scripts, which
are passed to Rubra via the --config command line argument.

Some options are required, and some are, well, optional.

Options for the whole pipeline:
-------------------------------

    pipeline = {
        "logDir": "log",
        "logFile": "pipeline.log",
        "procs": 2,
        "end": ["total"],
    }
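
Because a configuration file is an ordinary Python script, its settings can
be read by executing it and inspecting the resulting names. The sketch below
does this with the standard library's runpy.run_path. It only illustrates
the idea and is an assumption about how such a file can be consumed, not a
description of how Rubra loads its configuration; the file is written to a
temporary directory so that the example is self-contained.

```python
import os
import runpy
import tempfile

# The same whole-pipeline options shown above, as the text of a config script.
CONFIG_SOURCE = '''
pipeline = {
    "logDir": "log",
    "logFile": "pipeline.log",
    "procs": 2,
    "end": ["total"],
}
'''

with tempfile.TemporaryDirectory() as workdir:
    config_path = os.path.join(workdir, "example_config.py")
    with open(config_path, "w") as handle:
        handle.write(CONFIG_SOURCE)
    # run_path executes the script and returns its global names as a dict.
    config = runpy.run_path(config_path)

pipeline_options = config["pipeline"]
print(pipeline_options["procs"], pipeline_options["end"])
```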


Options for each stage of the pipeline:
---------------------------------------

    stageDefaults = {
        "distributed": False,
        "walltime": "00:10:00",
        "memInGB": 1,
        "queue": "batch",
        "modules": ["python-gcc"]
    }

    stages = {
        "countLines": {
            "command": "wc -l %file > %out",
        },
        "total": {
            "command": "./test/total.py %files > %out",
        },
    }
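
The following sketch shows one plausible reading of these two dictionaries:
each stage starts from stageDefaults and overrides them with its own
entries, and the %name placeholders in a command are filled in, left to
right, with file names. Both behaviours are assumptions made for
illustration, not taken from Rubra's source. The helper names stage_options
and fill_command and the output name test/data1.count are hypothetical.

```python
import re

stage_defaults = {
    "distributed": False,
    "walltime": "00:10:00",
    "memInGB": 1,
    "queue": "batch",
    "modules": ["python-gcc"],
}

stages = {
    "countLines": {"command": "wc -l %file > %out"},
    "total": {"command": "./test/total.py %files > %out"},
}


def stage_options(defaults, stages, name):
    """Assumed behaviour: per-stage entries override the defaults."""
    merged = dict(defaults)
    merged.update(stages[name])
    return merged


def fill_command(template, *values):
    """Assumed behaviour: replace each %name placeholder, in order."""
    names = re.findall(r"%\w+", template)
    if len(names) != len(values):
        raise ValueError(
            "expected %d values, got %d" % (len(names), len(values)))
    remaining = iter(values)
    # A function replacement inserts each value literally, without escapes.
    return re.sub(r"%\w+", lambda match: next(remaining), template)


options = stage_options(stage_defaults, stages, "countLines")
command = fill_command(options["command"],
                       "test/data1.txt", "test/data1.count")
print(command)  # wc -l test/data1.txt > test/data1.count
```

Under this reading a stage only needs to list the settings that differ from
the defaults, which is why the two stages above specify nothing but their
commands.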
