chloroExtractor

Introduction

The chloroExtractor is a Perl-based pipeline that extracts chloroplast DNA from whole-genome plant sequencing data. Large amounts of chloroplast DNA can cause problems when assembling whole-genome data. One solution is to isolate nuclear DNA before sequencing, but this can be expensive. The chloroExtractor instead takes your whole-genome data and extracts the chloroplast DNA, so that the different DNA fractions can be separated easily. Furthermore, the chloroExtractor tries to assemble the extracted chloroplast DNA. This is possible because the primary and secondary structure of the chloroplast genome is highly conserved. K-mer filtering extracts the k-mers that contain chloroplast sequences, which can then be assembled in a guided assembly using several other chloroplast genomes.
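The idea behind k-mer filtering can be illustrated with a toy example. Chloroplast genomes are present in many copies per cell, so k-mers from chloroplast reads typically occur far more often than nuclear k-mers. The bash function below (high_abundance_kmers is a hypothetical name) counts the forward-strand k-mers in the sequence lines of an uncompressed FASTQ file and keeps those seen at least a given number of times. It is a sketch of the general technique, not chloroExtractor's implementation, which according to the example log uses jellyfish; unlike `jellyfish count -C` it ignores reverse complements, and it keeps all counts in memory.

```shell
# Toy abundance-based k-mer filter (NOT chloroExtractor's implementation).
# Assumes uncompressed FASTQ with 4 lines per record; counts forward-strand
# k-mers only (no canonical k-mers) and holds all counts in memory.
high_abundance_kmers() {
    local fastq="$1" k="$2" min_count="$3"
    awk -v k="$k" -v min="$min_count" '
        NR % 4 == 2 {                       # sequence line of each record
            for (i = 1; i <= length($0) - k + 1; i++)
                count[substr($0, i, k)]++   # slide a window of length k
        }
        END {
            for (kmer in count)             # keep only frequent k-mers
                if (count[kmer] >= min) print kmer "\t" count[kmer]
        }' "$fastq" | LC_ALL=C sort         # sort for stable output
}
```

For example, `high_abundance_kmers reads_1.fastq 31 100` would print every 31-mer seen at least 100 times; a sensible threshold depends on the sequencing depth.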

Requirements

Required Software

Required Perl modules

Installation

Install the requirements, then clone the repository recursively:

git clone --recursive https://github.com/chloroExtractorTeam/chloroExtractor
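A small bash helper can confirm that the programs you installed are on your PATH before cloning. check_tools is a hypothetical helper, not part of chloroExtractor, and the tool names in the usage example are assumptions: perl because chloroExtractor is Perl-based, and jellyfish because it appears in the example log. The authoritative list is the Required Software section above.

```shell
# Hypothetical helper: report every given program that is not on PATH.
# Returns 0 if all programs were found, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_tools git perl jellyfish && git clone --recursive https://github.com/chloroExtractorTeam/chloroExtractor`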

Docker

Our chloroExtractor is also available as a Docker image. Running chloroExtractor from that image requires Docker to be installed and permission to execute docker commands. The data are mapped into the container as a volume under /data, and chloroExtractor runs with /data as its working directory. Therefore, the output files are stored inside the directory that was mapped into the container. If you do not use a user mapping, chloroExtractor runs with root privileges and all created files will belong to the root user. For further information about Docker and its security implications, please visit the Docker website.

docker pull chloroextractorteam/chloroextractor
docker run -v /location-of-input-data:/data --rm chloroextractorteam/chloroextractor -1 first_read.fq -2 second_read.fq [other options]
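To avoid root-owned output files, the container can be started with Docker's --user option, which runs the container process under your own user and group IDs. The wrapper below (ptx_docker is a hypothetical name) is a minimal sketch; whether the image works correctly when run as a non-root user is an assumption that this README does not confirm.

```shell
# Sketch: run the chloroExtractor image as the calling user so that files
# written to the mounted directory are owned by that user, not by root.
# Assumption: the image works when started as a non-root user.
ptx_docker() {
    local data_dir="$1"; shift          # host directory mounted as /data
    docker run --rm \
        --user "$(id -u):$(id -g)" \
        -v "$data_dir":/data \
        chloroextractorteam/chloroextractor "$@"
}
```

Usage: `ptx_docker /location-of-input-data -1 first_read.fq -2 second_read.fq`. Use an absolute host path for the data directory (for example "$PWD").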

Usage

To use the chloroExtractor, run the ptx executable in the bin/ folder:

./ptx --help

or use the docker container:

docker run -v /location-of-input-data:/data --rm chloroextractorteam/chloroextractor --help

This prints a list of all mandatory parameters and optional settings:

$ ./ptx [<OPTIONS>] -1 <FQ_1> -2 <FQ_2> -d <OUTPUT-DIRECTORY>

Options:
    -1|--reads
        Input reads file, first of pair.

    -2|--mates
        Input reads file, second of pair.

    -d|--dir [ptx]
        Path to a working directory. Will be created. If it exists, it needs
        to be empty.

    --create-config
        Create a config file with default settings for user customization.

    -c|--config
        Use a user-customized config file. Supersedes the default config.

    --continue=[TASKID TASKID ...] [TRUE]
        By default, the pipeline will check for an incomplete previous run
        and, if possible, continue after the last successful task of that
        run. Additionally, you may provide task IDs to continue from a
        specific task instead of the last one.

    --redo [FALSE]
        Force pipeline to restart from the beginning, ignoring and
        overwriting previous results. Supersedes --continue.

    --stop-after=<TASKID>
        Stop the pipeline after the specified task.

    --skip=<TASKID/PATTERN TASKID/PATTERN ...>
        Skip specified tasks or tasks matching specified patterns (perl
        regex). If other tasks request results from skipped tasks, the
        pipeline will try to reuse results from previous runs. You need to
        make sure that these results still make sense in the current run.

    -V|--version
        Display version.

    -h|--help
        Display this help.
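The --stop-after option can be used to run the pipeline in stages, for example to inspect intermediate results before resuming. The sketch below assumes ptx is on your PATH and that jf0, the first task ID shown in the example log, is a valid stopping point; other task IDs are not listed in this README. Because --continue is enabled by default, the second call simply repeats the command without --stop-after and, as the --continue description implies, should resume after the last successful task in the same working directory. staged_ptx_run is a hypothetical helper.

```shell
# Sketch: run ptx in two stages using the documented --stop-after option.
# Assumes ptx is on PATH and that 'jf0' is a valid task ID.
staged_ptx_run() {
    local r1="$1" r2="$2" dir="$3"
    # stage 1: stop after the k-mer counting task 'jf0'; abort on failure
    ptx -1 "$r1" -2 "$r2" -d "$dir" --stop-after=jf0 || return
    # stage 2: --continue is the default, so rerunning resumes the run
    ptx -1 "$r1" -2 "$r2" -d "$dir"
}
```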

All options can, and should, be handled through the configuration file ptx.cfg, which is located in the main folder. With this config file you can set the options for each step and task individually. By default, the chloroExtractor uses this config file; you can edit it, or create your own and pass it with the -c parameter:

$ ./ptx -c ownptx.cfg -1 FQ_1 -2 FQ_2
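A custom config file can be derived from the default one. In this sketch, run_with_own_config is a hypothetical helper; it assumes that the cloned repository folder (passed as the first argument) contains ptx.cfg in its main folder and the ptx executable in bin/, as described above. The copy is made only once, so edits to the custom file are not overwritten on later calls.

```shell
# Hypothetical helper: copy the default ptx.cfg once, then run ptx with it.
run_with_own_config() {
    local ptx_home="$1" own_cfg="$2" r1="$3" r2="$4"
    # copy only if the custom config does not exist yet (keeps your edits)
    [ -e "$own_cfg" ] || cp "$ptx_home/ptx.cfg" "$own_cfg" || return
    # edit "$own_cfg" as needed before running, e.g. with a text editor
    "$ptx_home/bin/ptx" -c "$own_cfg" -1 "$r1" -2 "$r2"
}
```

Usage, assuming the repository was cloned to ~/chloroExtractor: `run_with_own_config ~/chloroExtractor ownptx.cfg FQ_1 FQ_2`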

Input data

The chloroExtractor expects unsorted FASTQ files with paired-end reads. Please make sure your reads are not sorted in any way; otherwise, there could be problems or even wrong results.
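A rough heuristic for spotting sorted input is to test whether the sequence lines of a FASTQ file are in lexicographic order, which is essentially never the case for unsorted sequencing output. fastq_seqs_sorted is a hypothetical helper, not part of chloroExtractor. It cannot detect other kinds of ordering, such as reads exported from a coordinate-sorted alignment, and it assumes uncompressed FASTQ with four lines per record.

```shell
# Heuristic: succeed (exit 0) if the FASTQ sequence lines are in
# lexicographic order, i.e. the reads were probably sorted by sequence.
fastq_seqs_sorted() {
    # sort -c only checks order; it exits 1 at the first out-of-order line
    awk 'NR % 4 == 2' "$1" | LC_ALL=C sort -c 2>/dev/null
}
```

Usage: `if fastq_seqs_sorted first_read.fq; then echo "warning: reads appear to be sorted" >&2; fi`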

Example

An example dataset can be downloaded from Zenodo. In this example, we download the dataset into a folder and run chloroExtractor on the input files.

For preparation, a folder will be created and an example dataset will be downloaded:

# create a folder for the testrun
mkdir -p /tmp/chloroExtractor-testrun
cd /tmp/chloroExtractor-testrun

# download the example set and extract the sequencing reads
wget 'https://zenodo.org/record/884449/files/SRR5216995_1M.tar.bz2' -O - | tar xjf -
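Before running the pipeline, you can check that both mate files were extracted completely and contain the same number of reads. The helpers below are a hypothetical sketch, not part of chloroExtractor, and assume uncompressed FASTQ with four lines per record.

```shell
# Number of records in an uncompressed 4-line FASTQ file.
fastq_read_count() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print both read counts; succeed only if the two mate files match.
check_pair_counts() {
    local n1 n2
    n1=$(fastq_read_count "$1") || return
    n2=$(fastq_read_count "$2") || return
    echo "$1: $n1 reads, $2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}
```

Usage: `check_pair_counts SRR5216995_1M_1.fastq SRR5216995_1M_2.fastq`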

Afterwards, chloroExtractor can be run in command line mode:

# run chloroExtractor via the command line (assuming all dependencies are installed and the folder containing ptx is in PATH)
ptx -1 SRR5216995_1M_1.fastq -2 SRR5216995_1M_2.fastq
[17-09-21 13:42:42] [PipeWrap] Running ptx from the beginning, no previous runs detected.
[17-09-21 13:42:42] [PipeWrap] Running 'jf0': jellyfish count -t 8 -m 31 -s 500M -C -o jf0.jf /data/SRR5216995_1M_1.fastq /data/SRR5216995_1M_2.fastq
[...]

or using the docker container:

# alternatively, use the Docker-based chloroExtractor (assuming that the user is allowed to run docker)
docker pull chloroextractorteam/chloroextractor # ensure the latest version from docker hub
docker run -v /tmp/chloroExtractor-testrun:/data --rm chloroextractorteam/chloroextractor -1 SRR5216995_1M_1.fastq -2 SRR5216995_1M_2.fastq
[17-09-21 13:52:30] [PipeWrap] Running ptx from the beginning, no previous runs detected.
[17-09-21 13:52:30] [PipeWrap] Running 'jf0': jellyfish count -t 8 -m 31 -s 500M -C -o jf0.jf /data/SRR5216995_1M_1.fastq /data/SRR5216995_1M_2.fastq
[...]

Both runs result in a final chloroplast assembly in the file fcg.fa.
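For a quick overview of the assembly, the sketch below prints the name and length of every sequence in a FASTA file such as fcg.fa. fasta_summary is a hypothetical helper; this README does not state exactly where fcg.fa is written (presumably below the working directory, ptx by default), so the path is passed explicitly.

```shell
# Print "<name>\t<length>" for each record of a FASTA file.
# The name is the first word of the header line, without the leading '>'.
fasta_summary() {
    awk '
        /^>/ { if (n) print name "\t" len; n++; name = substr($1, 2); len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }   # ignore stray whitespace
        END { if (n) print name "\t" len }' "$1"
}
```

Usage: `fasta_summary path/to/fcg.fa`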

A more detailed example is available in our demo.

Changelog

Version 1.0.0 is archived as and used for submission to The Journal of Open Source Software

License

For licensing information, please refer to the LICENSE file.
