This project is forked from ncbi/co-select.

This repository contains the source code of Co-SELECT, a computational tool for analyzing the results of in vitro HT-SELEX experiments on TF-DNA binding. It shows the role of DNA shape in TF-DNA binding using a novel method that deconvolutes the contributions of DNA sequence and DNA shape to the binding.

Co-SELECT

The Co-SELECT pipeline uses the Python-based doit software to automate the repetitive tasks of analyzing sequencing data from multiple rounds (called cycles in many of the scripts) of multiple TF experiments. A basic knowledge of doit is helpful, and doit must be installed before running the pipeline.
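For readers new to doit, the following is a minimal sketch of how a task file can generate one sub-task per TF and cycle. It is not taken from the Co-SELECT sources: the file name, the TF names, the file naming scheme, and the read-counting action are all hypothetical placeholders.

```python
# dodo_example.py -- hypothetical task file, NOT part of Co-SELECT.
# doit discovers any function named task_* and runs the tasks it describes.

TFS = ["TF_A", "TF_B"]   # placeholder TF names (assumption)
CYCLES = [1, 2, 3, 4]    # selection rounds, called "cycles" in the scripts


def task_count_lines():
    """Yield one doit sub-task per (TF, cycle) pair."""
    for tf in TFS:
        for cycle in CYCLES:
            # Assumed file naming; the real pipeline may name files differently.
            src = f"../downloads/{tf}_{cycle}.fastq.gz"
            dst = f"../data/{tf}_{cycle}.lines"
            yield {
                "name": f"{tf}_cycle{cycle}",
                "file_dep": [src],   # rerun only when the input changes
                "targets": [dst],    # file produced by the action
                "actions": [f"zcat {src} | wc -l > {dst}"],
            }
```

Because each sub-task declares its own file_dep and targets, doit can skip sub-tasks whose inputs have not changed and can run independent sub-tasks in parallel.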

Directory structure

The top level directory has the following subdirectories:

  • DNAShapeR - The DNAShapeR program, slightly modified for our needs. In particular, we need the executable DNAShapeR/src/dnashpe to generate the shape values from the oligo sequences.
  • src - This subdirectory contains all the source code of Co-SELECT.
  • downloads - This subdirectory should have all the gzipped fastq files for the HT-SELEX experiments downloaded from the ENA website.
  • data - This subdirectory contains all intermediate files generated by Co-SELECT. It must have enough space! On our system it takes about 3 TB for the 131 TF experiments that we analyzed.
  • results - The results of Co-SELECT are kept here.
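Since the data directory can grow to several terabytes, it may be worth checking the free space before starting a run. The helper below is a small sketch using only the Python standard library; the function name and the 3 TB threshold (interpreted here as 3 × 10^12 bytes) are assumptions for illustration.

```python
import shutil


def has_enough_space(path, required_bytes):
    """Return True if the filesystem holding `path` has the required free space."""
    return shutil.disk_usage(path).free >= required_bytes


# Assumed threshold: roughly the 3 TB reported for 131 TF experiments.
REQUIRED = 3 * 10**12

if __name__ == "__main__":
    # "." stands in for the data directory; adjust the path as needed.
    if not has_enough_space(".", REQUIRED):
        print("Warning: less than 3 TB free for intermediate files")
```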

The subdirectory src also contains the following essential files:

  • PRJEB14744.txt - It gives the details of the sequencing data of the project PRJEB14744 in ENA. We have mostly used the following columns: run_accession, fastq_ftp, and submitted_ftp.
  • PRJEB14744_nonzero_cycle.csv - It maps the TF experiment (and barcode/primer if there are multiple experiments for the same TF) to the accession number of the project PRJEB14744 in ENA.
  • PRJEB14744_zero_cycle.csv - It maps the initial pool (round 0) of the experiments (multiple experiments may share the same initial pool) to the accession number of the project PRJEB14744 in ENA.
  • tf_inventory_jolma_ronshamir.csv - It contains all information that we could glean from the previous two papers on the dataset.
  • tf_coremotif.csv - It gives the coremotifs that we used for the experiments. One may change the coremotifs and try rerunning the complete Co-SELECT analysis.
  • tf_run_coselect.csv - This gives the list of experiments on which Co-SELECT has to be run.
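As an illustration of how the run_accession and fastq_ftp columns of PRJEB14744.txt can be read, here is a short sketch. It assumes the file is a tab-separated report with a header row, which is the usual layout of ENA file reports but is not stated in this README. The function name and the example accession and URLs are hypothetical.

```python
import csv
import io


def read_ena_report(handle):
    """Map each run_accession to its list of fastq_ftp URLs."""
    runs = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # A single field may hold several ';'-separated URLs.
        urls = [u for u in row["fastq_ftp"].split(";") if u]
        runs[row["run_accession"]] = urls
    return runs


# Made-up example content; the real file would be opened with open("PRJEB14744.txt").
example = (
    "run_accession\tfastq_ftp\tsubmitted_ftp\n"
    "ERR0000001\tftp.example.org/ERR0000001.fastq.gz\tftp.example.org/raw_1.fastq.gz\n"
)
runs = read_ena_report(io.StringIO(example))
```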

Downloading the dataset

The example script dodo_downloads.py can be used to download all the experiments. Note that downloading all the datasets requires 136 GB of disk space.

All the doit task files are kept in the src directory, so first change the current working directory to src:

$ cd src
$ doit -f dodo_downloads.py

The download can be made faster using multiple processes, say n=10, as follows:

$ doit -n 10 -f dodo_downloads.py

Preprocessing the round 0 datasets

Co-SELECT needs to compute the round 0 probabilities using a simple Markov model. This is done by invoking the following command:

$ doit -n 50 -f dodo_round0.py
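To make the idea of a round 0 Markov model concrete, the sketch below fits a first-order Markov chain to a set of oligos and scores a sequence under it. The model order, the pseudocount, and the function names are assumptions for illustration; the model implemented in dodo_round0.py may differ.

```python
from collections import Counter

ALPHABET = "ACGT"


def fit_markov(seqs, pseudo=1.0):
    """Estimate initial and transition probabilities of a first-order chain."""
    starts, trans = Counter(), Counter()
    for s in seqs:
        # Skip empty oligos and oligos with ambiguous bases such as N.
        if not s or set(s) - set(ALPHABET):
            continue
        starts[s[0]] += 1
        for a, b in zip(s, s[1:]):
            trans[a, b] += 1
    total = sum(starts.values()) + pseudo * len(ALPHABET)
    init = {a: (starts[a] + pseudo) / total for a in ALPHABET}
    cond = {}
    for a in ALPHABET:
        row = sum(trans[a, b] for b in ALPHABET) + pseudo * len(ALPHABET)
        for b in ALPHABET:
            cond[a, b] = (trans[a, b] + pseudo) / row
    return init, cond


def sequence_probability(seq, init, cond):
    """Probability of `seq` under the fitted chain."""
    p = init[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= cond[a, b]
    return p


init, cond = fit_markov(["ACGT", "ACGT"])
p_acgt = sequence_probability("ACGT", init, cond)
```

The pseudocount keeps unseen transitions from getting zero probability, which matters when the round 0 probabilities are later used as a baseline in ratios.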

Analyzing TF experiments using Co-SELECT

Co-SELECT analysis on the selected TF experiments configured through the file tf_run_coselect.csv can be done by invoking the following command:

$ doit -n 50 -f dodo_analyze.py

Generating the summary reports and plots of the analysis results

The comparison of experiment vs control groups in the Co-SELECT analysis and the generation of results are done by invoking the following command:

$ doit -n 50 -f dodo_results.py

The summary results and plots are saved in the ../results directory.

Generating the promiscuous shapemers

To compute the promiscuity of shapemers in the motif-free oligos and generate the corresponding plot as in our paper, invoke the following command:

$ doit -f dodo_promiscuous.py

The lists of highly promiscuous shapemers and the plots are saved in the ../results directory.
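The notion of a motif-free oligo can be illustrated with a simple check: an oligo is treated as motif-free if none of the core motifs occurs in it on either strand. This is only a sketch under that assumed definition; Co-SELECT may use a different criterion (for example, allowing mismatches), and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_motif_free(oligo, core_motifs):
    """True if no core motif occurs in the oligo on either strand (exact match)."""
    for motif in core_motifs:
        if motif in oligo or reverse_complement(motif) in oligo:
            return False
    return True


# Example with a made-up core motif list.
motifs = ["CACGTG"]
free = is_motif_free("AAAAAAAA", motifs)
bound = is_motif_free("TTCACGTGTT", motifs)
```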

Contributors

  • soumitrakp
