Co-SELECT

The Co-SELECT pipeline uses the Python-based doit tool to automate the repetitive tasks of analyzing sequencing data from multiple rounds (also called cycles in many of the scripts) of multiple TF experiments. Basic knowledge of doit is helpful, and doit must be installed.
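doit reads a task file (a "dodo" file) and runs every function whose name starts with task_. A task function can yield one sub-task per item, which is how a single definition can cover many rounds of many experiments. The following is a minimal sketch of that pattern; the experiment names, round numbers, file paths and the gzip -t action are hypothetical placeholders, not taken from Co-SELECT's own dodo files.

```python
# Hypothetical dodo file sketch: one doit sub-task per (experiment, round).
# EXPERIMENTS, ROUNDS and the paths below are illustrative placeholders,
# not the names used by Co-SELECT's own task files.

EXPERIMENTS = ["TF_A", "TF_B"]  # hypothetical experiment identifiers
ROUNDS = [1, 2, 3, 4]           # SELEX rounds ("cycles")


def task_check_fastq():
    """Yield one sub-task per experiment and round."""
    for exp in EXPERIMENTS:
        for rnd in ROUNDS:
            fastq = f"../downloads/{exp}_cycle{rnd}.fastq.gz"
            yield {
                # doit shows this sub-task as check_fastq:TF_A_cycle1, etc.
                "name": f"{exp}_cycle{rnd}",
                # Shell action: test the integrity of the gzipped file.
                "actions": [f"gzip -t {fastq}"],
                "file_dep": [fastq],
                "verbosity": 2,
            }
```

Saved as dodo.py, `doit list --all` would list the eight sub-tasks and `doit -n 4` would run up to four of them in parallel.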

Directory structure

The top level directory has the following subdirectories:

DNAShapeR - A slightly modified version of the DNAShapeR program. In particular, we need the executable DNAShapeR/src/dnashpe to generate shape values from the oligo sequences.
src - This subdirectory contains all the source code of Co-SELECT.
downloads - This subdirectory should have all the gzipped fastq files for the HT-SELECT experiments downloaded from the ENA website.
data - This subdirectory contains all intermediate files generated by Co-SELECT. Make sure it has enough disk space: on our system it takes about 3 TB for the 131 TF experiments that we analyzed.
results - The results of Co-SELECT are kept here.
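Before running any task it can help to confirm that this layout is in place and that data has room to grow. The sketch below is a hypothetical helper (not part of Co-SELECT) that reports which of the five top-level directories are missing and how much free space is available where data lives; it assumes it is given the top-level directory as its argument.

```python
# Hypothetical helper, not part of Co-SELECT: check the top-level layout
# described in this README and report free disk space for data/.
import shutil
import sys
from pathlib import Path

REQUIRED_DIRS = ["DNAShapeR", "src", "downloads", "data", "results"]


def missing_dirs(root):
    """Return the required subdirectories that do not exist under root."""
    root = Path(root)
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]


def free_gb(path):
    """Free space in GB (10**9 bytes) on the file system holding path."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    top = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    missing = missing_dirs(top)
    print("Missing directories:", ", ".join(missing) if missing else "none")
    if (top / "data").is_dir():
        print(f"Free space for data/: {free_gb(top / 'data'):.0f} GB")
```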

The src subdirectory also contains the following essential files:

PRJEB14744.txt - It gives the details of the sequencing data of the project PRJEB14744 in ENA. We have mostly used the following columns: run_accession, fastq_ftp, and submitted_ftp.
PRJEB14744_nonzero_cycle.csv - It maps the TF experiment (and barcode/primer if there are multiple experiments for the same TF) to the accession number of the project PRJEB14744 in ENA.
PRJEB14744_zero_cycle.csv - It maps the initial pool (round 0) of the experiments (multiple experiments may share the same initial pool) to the accession number of the project PRJEB14744 in ENA.
tf_inventory_jolma_ronshamir.csv - It contains all information that we could glean from the previous two papers on the dataset.
tf_coremotif.csv - It gives the core motifs that we used for the experiments. One may change the core motifs and rerun the complete Co-SELECT analysis.
tf_run_coselect.csv - This gives the list of experiments on which Co-SELECT has to be run.
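Because PRJEB14744.txt links each run_accession to its download locations, a few lines of Python can turn it into a lookup table. This is a minimal sketch, assuming the file is a tab-separated ENA file report with a header row, that fastq_ftp may hold several ";"-separated paths without an ftp:// prefix, and that submitted_ftp (assumed to have the same format) is used when fastq_ftp is empty. The function name fastq_urls is hypothetical.

```python
# Sketch: build {run_accession: [download URLs]} from an ENA file report.
# Format assumptions (tab-separated, ";"-separated paths, no URL scheme)
# are assumptions, not guarantees; fastq_urls is a hypothetical name.
import csv


def fastq_urls(report_path):
    urls = {}
    with open(report_path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            # Prefer fastq_ftp; fall back to submitted_ftp when it is empty.
            field = row.get("fastq_ftp") or row.get("submitted_ftp") or ""
            paths = [p for p in field.split(";") if p]
            urls[row["run_accession"]] = ["ftp://" + p for p in paths]
    return urls
```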

Downloading the dataset

The example script dodo_downloads.py can be used to download all the experiments. Note that downloading all the datasets requires 136 GB of disk space.

All the doit task files are kept in the src directory, so first change the current working directory to src:

$ cd src

$ doit -f dodo_downloads.py

The download can be made faster using multiple processes, say n=10, as follows:

$ doit -n 10 -f dodo_downloads.py
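dodo_downloads.py itself is not reproduced here. As an illustration of how a doit task file can make downloads restartable and parallel, the sketch below yields one sub-task per file; the RUNS table, the example URL and the use of wget are assumptions, not the script's actual contents.

```python
# Sketch of a doit download task, one sub-task per file.
# RUNS, the example URL and the wget command are illustrative assumptions;
# dodo_downloads.py may be implemented differently.
import os

DOWNLOAD_DIR = "../downloads"

# Hypothetical accession -> URL table (e.g. built from PRJEB14744.txt).
RUNS = {
    "RUN_A": ["ftp://ftp.example.org/RUN_A.fastq.gz"],
}


def task_download():
    for urls in RUNS.values():
        for url in urls:
            fname = os.path.basename(url)
            target = os.path.join(DOWNLOAD_DIR, fname)
            yield {
                "name": fname,
                # -nc: do not overwrite a file that already exists.
                "actions": [f"wget -nc -P {DOWNLOAD_DIR} {url}"],
                "targets": [target],
                # With uptodate=[True], doit skips the sub-task once its
                # target file exists, so an interrupted run can resume.
                "uptodate": [True],
            }
```

With -n 10, doit runs up to ten of these sub-tasks at once, and rerunning the command skips files that are already present.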

Preprocessing the round 0 datasets

Co-SELECT needs to compute the round 0 probabilities using a simple Markov model. This is done by invoking the following command:

$ doit -n 50 -f dodo_round0.py
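As an illustration only, a k-th order Markov model can be fit to round-0 reads by counting (k+1)-mers, and a sequence s can then be scored as P(s[0:k]) times the product over i >= k of P(s[i] | s[i-k:i]). The order k, the pseudocount and the treatment of the first k bases below are assumptions; dodo_round0.py may make different choices. Sequences are assumed to contain only A, C, G and T.

```python
# Sketch of a k-th order Markov model trained on round-0 reads.
# Order, pseudocount and initial-state handling are illustrative assumptions.
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def train_markov(reads, k=2, pseudo=1.0):
    """Return (log P(initial k-mer), log P(base | preceding k-mer)) tables."""
    init, trans = Counter(), Counter()
    for read in reads:
        if len(read) <= k:
            continue
        init[read[:k]] += 1
        for i in range(k, len(read)):
            trans[read[i - k:i + 1]] += 1  # context k-mer followed by a base
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    n_init = sum(init.values()) + pseudo * len(kmers)
    log_init = {km: math.log((init[km] + pseudo) / n_init) for km in kmers}
    log_trans = {}
    for ctx in kmers:
        total = sum(trans[ctx + b] for b in ALPHABET) + pseudo * len(ALPHABET)
        for b in ALPHABET:
            log_trans[ctx + b] = math.log((trans[ctx + b] + pseudo) / total)
    return log_init, log_trans


def log_prob(seq, log_init, log_trans):
    """Log-probability of seq (len >= k) under the trained model."""
    k = len(next(iter(log_init)))  # recover the order from the table keys
    lp = log_init[seq[:k]]
    for i in range(k, len(seq)):
        lp += log_trans[seq[i - k:i + 1]]
    return lp
```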

Analyzing TF experiments using Co-SELECT

To run the Co-SELECT analysis on the TF experiments listed in the file tf_run_coselect.csv, invoke the following command:

$ doit -n 50 -f dodo_analyze.py
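doit orders tasks and skips finished work by comparing file_dep and targets: when one task's target is another task's file_dep, doit runs them in that order even with -n 50. The sketch below shows the pattern with hypothetical file names and hypothetical scripts (fit_round0.py, analyze_one.py); it does not reflect how dodo_analyze.py actually connects to the round-0 step.

```python
# Sketch: chaining two doit tasks through files. All paths and commands
# are hypothetical placeholders, not Co-SELECT's real ones.

ROUND0_FASTQ = "../downloads/pool_X_cycle0.fastq.gz"  # hypothetical
ROUND0_MODEL = "../data/pool_X/round0_markov.pkl"     # hypothetical
EXPERIMENTS = ["TF_A", "TF_B"]                        # hypothetical


def task_round0():
    return {
        "actions": [f"python fit_round0.py {ROUND0_FASTQ} {ROUND0_MODEL}"],
        # Rerun only when the round-0 reads change or the model is missing.
        "file_dep": [ROUND0_FASTQ],
        "targets": [ROUND0_MODEL],
    }


def task_analyze():
    for exp in EXPERIMENTS:
        out = f"../data/{exp}/analysis.tsv"
        yield {
            "name": exp,
            "actions": [f"python analyze_one.py {exp} {ROUND0_MODEL} {out}"],
            # ROUND0_MODEL is a target of task_round0, so doit runs round0
            # first and reruns analyze:<exp> only when the model file
            # changes or the output is missing.
            "file_dep": [ROUND0_MODEL],
            "targets": [out],
        }
```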

Generating the summary reports and plots of the analysis results

The comparison of experiment and control groups in the Co-SELECT analysis, and the generation of the results, are done by invoking the following command:

$ doit -n 50 -f dodo_results.py

The summary results and plots are saved in the ../results directory.

Generating the promiscuous shapemers

The promiscuity of shapemers in the motif-free oligos can be computed, and the corresponding plot from our paper generated, by invoking the following command:

$ doit -f dodo_promiscuous.py

The lists of highly promiscuous shapemers and the plots are saved in the ../results directory.
