nvwa's Introduction

NvWA

Code used for Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types

Nvwa is a deep learning-based strategy for predicting expression landscapes and deciphering regulatory elements (Filters) at the single-cell level.

Requirements

  • Python packages
h5py >= 2.7.0
numpy >= 1.14.2
pandas == 0.22.0
scipy >= 0.19.1
pyfasta >= 0.5.2
torch >= 1.0.0
captum
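
A quick way to compare an environment against the list above is sketched below. It is a minimal, unofficial helper and not part of the repository: the function names `parse_version` and `check_requirements` are hypothetical, and the version parsing is deliberately naive (it keeps only the leading digits of each dotted component).

```python
import re
from importlib import metadata

# Constraints copied from the requirements list above.
# captum has no stated version bound, so it is only checked for presence.
REQUIREMENTS = {
    "h5py": (">=", "2.7.0"),
    "numpy": (">=", "1.14.2"),
    "pandas": ("==", "0.22.0"),
    "scipy": (">=", "0.19.1"),
    "pyfasta": (">=", "0.5.2"),
    "torch": (">=", "1.0.0"),
    "captum": None,
}


def parse_version(text):
    """Turn '1.14.2' into (1, 14, 2); local suffixes such as '+cu118' are dropped."""
    parts = []
    for chunk in text.split("+")[0].split("."):
        match = re.match(r"\d+", chunk)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS):
    """Return {package: status string} for every listed package."""
    report = {}
    for name, constraint in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if constraint is None:
            report[name] = f"{installed} (ok)"
            continue
        operator, wanted = constraint
        have, want = parse_version(installed), parse_version(wanted)
        satisfied = have == want if operator == "==" else have >= want
        status = "ok" if satisfied else f"expected {operator} {wanted}"
        report[name] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

Note that pandas is pinned to an exact version in the list, so a newer pandas is reported as a mismatch rather than as satisfied.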

Descriptions

  • 0_preproc_dataset for preprocessing datasets
  • 1_train for initializing, training, and testing models
  • 1_train/utils.py contains the model architecture
  • 2_explain for explaining models
  • 2_explain/explainer.py contains the model explainer
  • 3_application for predicting genomic tracks
  • main examples for running the model in each species
  • Analysis_plotting analysis and plotting functions
  • Results results of the Nvwa analysis

Detailed description on Results Folder for Reproducing Figures

  • Test_Metrics AUROC and AUPR Metric values on held-out test set for eight species
  • scATAC_overlap_test Permutation test results of Nvwa whole-genome prediction and experimental functional genomics data
  • Filters Property information of filters/motifs for eight species
  • Filter_Annotation TomTom annotation results of filters/motifs against known motif databases
  • Influe Influence scores (the fold-change of in-silico filter nullification on predictions)
  • Influe_celltype detailed analysis of Influence scores
  • Species_motif_hit.csv homologous Filters/motifs identified by TomTom among eight species
  • tomtom_DBtfmodiscoTrimmed_NvwaConv1.html comparison of tfmodisco motifs and Nvwa featuremap-based motifs
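
The Influence score above is described as the fold-change of in-silico filter nullification on predictions. The sketch below illustrates that idea on a toy model only. It assumes that nullification means setting a filter's activation to zero and that the fold-change is the ratio of mean predictions after and before nullification; the repository's actual computation may differ. `predict` and `influence_scores` are hypothetical names.

```python
import numpy as np


def predict(activations, weights):
    """Toy stand-in for the layers after the first convolution (hypothetical).

    activations: (n_sequences, n_filters) pooled filter activations
    weights:     (n_filters, n_cells) readout weights
    returns:     (n_sequences, n_cells) sigmoid predictions
    """
    return 1.0 / (1.0 + np.exp(-(activations @ weights)))


def influence_scores(activations, weights, eps=1e-8):
    """Fold-change of the mean prediction per cell after nullifying each filter.

    Returns an array of shape (n_filters, n_cells). A value below 1 means that
    removing the filter lowers the mean prediction for that cell.
    """
    baseline = predict(activations, weights).mean(axis=0)
    n_filters = activations.shape[1]
    scores = np.empty((n_filters, weights.shape[1]))
    for f in range(n_filters):
        nullified = activations.copy()
        nullified[:, f] = 0.0  # assumption: nullify by zeroing the activation
        after = predict(nullified, weights).mean(axis=0)
        scores[f] = (after + eps) / (baseline + eps)
    return scores


rng = np.random.default_rng(0)
acts = rng.random((32, 5))      # 32 toy sequences, 5 filters
w = rng.normal(size=(5, 3))     # 3 toy cells
w[2] = 0.0                      # filter 2 is unused by the toy model
scores = influence_scores(acts, w)
```

In this toy setup, the unused filter has a fold-change of exactly 1 for every cell, which is a convenient sanity check before applying the same pattern to a trained model.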

To reproduce the Nvwa analysis from scratch, we recommend reading dmel.sh in the main folder and downloading the Drosophila dataset from the URL below.

Datasets for eight species

We provide single-cell labels for eight species at http://bis.zju.edu.cn/nvwa/dataset.html.

For the single-cell labels, we provide the expression labels and the corresponding cell and gene information. Ready-to-use machine learning datasets are also publicly available; they are paired with one-hot encoded sequences and cell annotation information, and are split into train, validation, and test sets. The detailed preprocessing procedures are also described step by step.
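
To make the dataset layout concrete, here is a minimal sketch of one-hot encoding DNA sequences and splitting them into train, validation, and test sets. The channel order `ACGT`, the all-zero encoding for unknown bases, the random split, and the split fractions are all assumptions for illustration; the provided datasets may use different conventions.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order


def one_hot(sequence):
    """Encode a DNA string as a (4, length) float32 array.

    Bases outside ACGT (for example N) are left as all-zero columns.
    """
    encoded = np.zeros((len(BASES), len(sequence)), dtype=np.float32)
    for position, base in enumerate(sequence.upper()):
        channel = BASES.find(base)
        if channel >= 0:
            encoded[channel, position] = 1.0
    return encoded


def split_indices(n_items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle item indices and cut them into train/validation/test parts."""
    order = np.random.default_rng(seed).permutation(n_items)
    n_test = int(n_items * test_fraction)
    n_val = int(n_items * val_fraction)
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test


# Ten toy "gene" sequences of length 5.
sequences = ["ACGTN", "TTGCA", "GGGCC", "ATATA", "CCGGA",
             "TGCAT", "AACCG", "GTGTG", "CACAC", "NNACG"]
X = np.stack([one_hot(s) for s in sequences])  # shape (10, 4, 5)
train_idx, val_idx, test_idx = split_indices(len(sequences))
```

The resulting array has the layout (n_genes, 4, sequence_length), with one channel per base.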

Running Nvwa

Example

python 1_train/1_hyperopt_BCE_best.py ./Dataset.Dmel_train_test.h5
python 1_train/1_hyperopt_BCE_best.py ./Dataset.Dmel_train_test.h5 --mode test
python 2_explain/1_run_explain.py ./Dataset.Dmel_train_test.h5

Details

./Dataset.Dmel_train_test.h5: example of Dataset.h5 file

./1_train/1_hyperopt_BCE_best.py: initializes, trains, and tests models

--mode: mode choice (train, test, or test_all_gene)

2_explain/1_run_explain.py: explains trained models

--help: print help info.
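
For readers unfamiliar with this style of command line, the sketch below shows how an interface with a dataset path and a `--mode` option could be declared with `argparse`. It is a hypothetical illustration, not the repository's code: the positional name `data`, the default mode `train` (inferred from the first example command, which omits `--mode`), and the `dispatch` helper are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options described above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a train/test command line; not the real script."
    )
    parser.add_argument(
        "data",
        help="path to a Dataset.h5 file, e.g. ./Dataset.Dmel_train_test.h5",
    )
    parser.add_argument(
        "--mode",
        default="train",  # assumption: training is the default mode
        choices=["train", "test", "test_all_gene"],
        help="which stage to run",
    )
    return parser


def dispatch(args):
    """Return a description of the action the chosen mode would trigger."""
    actions = {
        "train": "train a model",
        "test": "evaluate on the held-out test set",
        "test_all_gene": "evaluate on all genes",
    }
    return f"{actions[args.mode]} using {args.data}"


args = build_parser().parse_args(["./Dataset.Dmel_train_test.h5", "--mode", "test"])
message = dispatch(args)
```

With `argparse`, the `--help` option is generated automatically from the declared arguments.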

Note

Nvwa is currently closer to a set of in-house scripts for reproducing our work. If you find any problem running the Nvwa code, please contact me. If you run into errors loading trained model weight files, it is likely the result of differences in PyTorch or CUDA toolkit versions.

NvTK (NvwaToolKit, https://github.com/JiaqiLiZju/NvTK), a more systematic software package, is under active development. It will support modern deep learning architectures in genomics, such as ResNet, Attention Module, and Transformer. I recommend using NvTK for generating your own model.

Citation

Please cite the corresponding protocol published concurrently to this repository:

Jiaqi Li, Jingjing Wang, Peijing Zhang, Renying Wang, Yuqing Mei, Zhongyi Sun, Lijiang Fei, Mengmeng Jiang, Lifeng Ma, Weigao E, Haide Chen, Xinru Wang, Yuting Fu, Hanyu Wu, Daiyuan Liu, Xueyi Wang, Jingyu Li, Qile Guo, Yuan Liao, Chengxuan Yu, Danmei Jia, Jian Wu, Shibo He, Huanju Liu, Jun Ma, Kai Lei, Jiming Chen, Xiaoping Han, Guoji Guo. Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types. Nature Genetics, 2022. DOI: 10.1038/s41588-022-01197-7

nvwa's Issues

Modeling mutations

Hello,

Thank you for sharing all of your code here. The paper was great.

Since your model is able to predict gene expression from DNA sequence, I would be interested to test what the effects of certain mutations are on the predictions, especially at the level of single-cell transcriptomes. Could you please briefly describe how I can go about using your model to accomplish this?

Thanks

Training data

Hi Jiaqi,

Since you obtained the sequence of each gene from the reference genome based on coordinates, the sequences of the same gene in different cells are completely identical. However, their gene expression may vary. How does NVWA handle this situation?

Thanks!

Fig4.b predicted signal

Hi Jiaqi,
It is cool that the predicted signal is highly consistent with other epigenetic signals the model has never seen.
But, as Nvwa is a classifier for gene expression, how do you make the model predict such a signal?
Are these signals saliency scores?

Thanks!

No 1_MAGIC.py file

Hi Jiaqi,

Thank you for such great work!

I tried to run 0_preproc_dataset/main.sh, but there's no file named 1_MAGIC.py. Should it be 1_MAGIC_MCA.py instead?

I also get these errors after changing 1_MAGIC.py to 1_MAGIC_MCA.py:
[Screenshot: Screen Shot 2023-01-08 at 2 50 38 AM]
Can you help me with these problems? Thank you!

software availability

Hi Jiaqi,
Your project is pretty cool
Are you considering developing a usable python package?
I find that some packages are difficult to install...

data shape

Hi Li,

Congrats on the paper, I really enjoyed the read.

I was having a play with the training data you kindly shared and noticed the sequence length for those is 20 kb:

e.g.
train_data
(34464, 4, 20000)

opposed to the 10 kb mentioned in the documentation.
Am I right in assuming that those 20 kb windows are also centered on the TSS?

Cheers
