iDeepS

iDeepS discovers sequence-structure motifs: it trains deep learning models to infer binding sequence and structure motifs from sequences simultaneously. We first one-hot encode the sequence and its secondary structure, and feed the encodings into CNNs to learn abstract motif features. We then use a bidirectional LSTM to capture the long-range dependencies between the binding sequence and structure motifs identified by the CNNs. Finally, the learned abstract features are fed into a classification layer to predict RBP binding sites on RNAs. We comprehensively evaluate iDeepS on verified RBP binding sites derived from large-scale representative CLIP-seq datasets.
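As a rough illustration of the one-hot encoding step, the sketch below turns a sequence string and a structure string into 0/1 matrices. It is not taken from ideeps.py: the function names `one_hot` and `encode_sequence`, the structure alphabet `FTIHMS`, and the choice to map unknown characters such as `N` to all-zero rows are assumptions made for this example.

```python
# Minimal one-hot encoding sketch (illustrative; not the code in ideeps.py).
SEQ_ALPHABET = "ACGU"        # RNA nucleotides
STRUCT_ALPHABET = "FTIHMS"   # assumed structure annotation symbols


def one_hot(text, alphabet):
    """Return one row per character; each row has a single 1 at the
    character's position in `alphabet`, or all zeros if it is unknown."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in text.upper():
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows


def encode_sequence(seq):
    """One-hot encode a nucleotide sequence, treating T as U."""
    return one_hot(seq.upper().replace("T", "U"), SEQ_ALPHABET)


if __name__ == "__main__":
    for row in encode_sequence("ACGTN"):
        print(row)
    for row in one_hot("FFHHT", STRUCT_ALPHABET):
        print(row)
```

Each matrix has one row per position, so a fixed-length input yields fixed-size matrices, which is the shape a CNN layer expects.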

Dependency

Python 2.7
NumPy v1.6.1
SciPy v0.9
Keras v1.1.2 with the Theano 0.9.0 backend
scikit-learn v0.17
EDeN. NOTE: use our uploaded EDeN.zip; decompress it and install it locally. The code structure of the latest EDeN has changed completely and does not work with our code.
RNAshapes

Content

./datasets: the training and testing datasets, with sequences and labels indicating whether each sequence is a binding site or not.
./motif/seq_cnn: binding sequence motifs detected by iDeepS. We also report the matched known motifs found using TOMTOM and the motif enrichment analysis using AME in the MEME Suite.
./motif/structure_cnn: binding structure motifs detected by iDeepS. It also reports the motif enrichment analysis using AME in the MEME Suite.
./ideeps.py: the Python code; it can be run to reproduce our results.

Usage

python ideeps.py [-h] [--data_file <data_file>] [--train TRAIN]
                 [--model_dir MODEL_DIR] [--predict PREDICT]
                 [--out_file OUT_FILE] [--batch_size BATCH_SIZE]
                 [--n_epochs N_EPOCHS] [--motif MOTIF]
                 [--motif_dir MOTIF_DIR]

where the input training file should be sequences.fa.gz, with the label given in the header line of each sequence.
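The header format can be inferred from the error message quoted in this README, where a header reads `> chr1,+,44951749,44951849; class:0`. The sketch below reads such a file and extracts the label from each header. It is an illustration written for Python 3, not the parser used in ideeps.py, and the function names `parse_label`, `read_labeled_fasta`, and `read_labeled_fasta_gz` are hypothetical.

```python
import gzip


def parse_label(header):
    """Return the integer that follows 'class:' in a FASTA header line."""
    marker = "class:"
    pos = header.rfind(marker)
    if pos == -1:
        raise ValueError("no class label in header: %r" % header)
    return int(header[pos + len(marker):].strip())


def read_labeled_fasta(lines):
    """Parse FASTA lines into (header, sequence, label) tuples."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if name is not None:
                records.append((name, "".join(chunks), parse_label(name)))
            name, chunks = line, []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks), parse_label(name)))
    return records


def read_labeled_fasta_gz(path):
    """Read a gzip-compressed FASTA file such as sequences.fa.gz."""
    with gzip.open(path, "rt") as handle:
        return read_labeled_fasta(handle)
```

Calling `read_labeled_fasta_gz` on a training file before starting a run is a quick way to confirm that every header carries a `class:` label.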

Usage examples

1. Train the model using your data (currently only fixed-length sequences are supported):
python ideeps.py --train=True --data_file=datasets/clip/10_PARCLIP_ELAVL1A_hg19/30000/training_sample_0/sequences.fa.gz --model_dir=models

--model_dir: the directory in which the trained model is saved; it is used in the prediction step.
--data_file: the training sequence file sequences.fa.gz, with label information in the header lines.

NOTICE: Before you run iDeepS, make sure there is no empty structure.gz in the corresponding directory; otherwise it fails with the error:
File "ideeps.py", line 163, in read_structure
structure = struct_dict[old_name[:-1]]
KeyError: '> chr1,+,44951749,44951849; class:0'
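One way to catch this before a run is to scan the dataset directory for empty structure.gz files. The sketch below is an assumption-laden helper written for Python 3, not part of iDeepS: the function names `is_empty_gzip` and `find_empty_structure_files` are hypothetical, and "empty" is taken to mean either a zero-byte file or a valid gzip file with no content.

```python
import gzip
import os


def is_empty_gzip(path):
    """Return True for a zero-byte file or a gzip file with no content."""
    if os.path.getsize(path) == 0:
        return True
    with gzip.open(path, "rb") as handle:
        return handle.read(1) == b""


def find_empty_structure_files(root, filename="structure.gz"):
    """Walk `root` and return the sorted paths of empty `filename` files."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            path = os.path.join(dirpath, filename)
            if is_empty_gzip(path):
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    # "datasets" is the dataset directory named in this README.
    for path in find_empty_structure_files("datasets"):
        print("empty structure file:", path)
```

Any path it reports can be inspected and removed by hand before training or prediction.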

2. Predict the binding probability for your sequences (you need to use the same model directory as in the training step):
python ideeps.py --predict=True --data_file=datasets/clip/10_PARCLIP_ELAVL1A_hg19/30000/test_sample_0/sequences.fa.gz --model_dir=models --out_file=YOUR_OUTFILE

--model_dir: the directory in which the models were saved in the training step.
--data_file: your testing sequence file sequences.fa.gz.

To search the identified motifs, you need to install WebLogo (http://weblogo.berkeley.edu/) and TOMTOM from the MEME Suite (http://meme-suite.org/doc/download.html?man_type=web).
3. Identify the binding sequence-structure motifs (you need to use the same model directory as in the training step):
python ideeps.py --motif=True --data_file=datasets/clip/10_PARCLIP_ELAVL1A_hg19/30000/test_sample_0/sequences.fa.gz --model_dir=models --motif_dir=YOUR_MOTIF_DIR

--model_dir: the directory in which the models were saved in the training step; you must specify the trained model.
--data_file: the sequence file sequences.fa.gz in which to identify binding sequence-structure motifs.



Update

We have also updated iDeepS to iDeepS2, which can handle variable-length sequences and encodes the sequences and structures as one-hot vectors. You can download iDeepS2 from https://github.com/xypan1232/iDeepS2.

Reference

Xiaoyong Pan^, Peter Rijnbeek, Junchi Yan, Hong-Bin Shen^. Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks. BMC Genomics, 2018, 19:511.

Contact: Xiaoyong Pan (xypan172436atgmail.com)
