
deepcpg's Introduction

DeepCpG: Deep neural networks for predicting single-cell DNA methylation


DeepCpG [1] is a deep neural network for predicting the methylation state of CpG dinucleotides in multiple cells. It can accurately impute incomplete DNA methylation profiles, discover predictive sequence motifs, and quantify the effect of sequence mutations (Angermueller et al., 2017).

Please help improve DeepCpG by reporting bugs, typos in notebooks and documentation, or any ideas on how to make things better. You can submit an issue or send me an email.

DeepCpG model architecture and applications.

(a) Sparse single-cell CpG profiles as obtained from scBS-seq or scRRBS-seq. Methylated CpG sites are denoted by ones, unmethylated CpG sites by zeros, and question marks denote CpG sites with unknown methylation state (missing data). (b) DeepCpG model architecture. The DNA model consists of two convolutional and pooling layers to identify predictive motifs from the local sequence context, and one fully connected layer to model motif interactions. The CpG model scans the CpG neighborhood of multiple cells (rows in b), using a bidirectional gated recurrent network (GRU), yielding compressed features in a vector of constant size. The Joint model learns interactions between higher-level features derived from the DNA- and CpG model to predict methylation states in all cells. (c, d) The trained DeepCpG model can be used for different downstream analyses, including genome-wide imputation of missing CpG sites (c) and the discovery of DNA sequence motifs that are associated with DNA methylation levels or cell-to-cell variability (d).

News

  • 181201: DeepCpG 1.0.7 released!
  • 180224: DeepCpG 1.0.6 released!
  • 171112: Keras 2 is now the main Keras version (release 1.0.5).
  • 170414: Added dcpg_eval_perf.py and dcpg_eval_perf.Rmd for evaluating and visualizing prediction performances! Find an example in this notebook!
  • 170412: New notebook on predicting inter-cell statistics!
  • 170410: New notebook on estimating mutation effects!
  • 170406: A short description of all DeepCpG scripts!
  • 170404: New guide on creating and analyzing DeepCpG data released!
  • 170404: Training on continuous data, e.g. from bulk experiments, now supported!

Installation

The easiest way to install DeepCpG is to use PyPI:

pip install deepcpg

Alternatively, you can checkout the repository,

git clone https://github.com/cangermueller/deepcpg.git

and then install DeepCpG using setup.py:

python setup.py install

Getting started

  1. Store the known CpG methylation states of each cell in a tab-delimited file with the following columns:
  • Chromosome (without chr)
  • Position of the CpG site on the chromosome, starting with one
  • Binary methylation state of the CpG site (0=unmethylated, 1=methylated)

Example:

1   3000827   1.0
1   3001007   0.0
1   3001018   1.0
...
Y   90829839  1.0
Y   90829899  1.0
Y   90829918  0.0
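
As an illustration of the file format described in step 1, the following minimal Python sketch writes a profile with three tab-separated columns. It is not part of DeepCpG: the function name `write_cpg_profile` and the validation rules are assumptions made for this example, and it only handles binary states.

```python
import csv
import io


def write_cpg_profile(records, handle):
    """Write (chromosome, position, state) records as a tab-delimited profile.

    Hypothetical helper, not a DeepCpG function. Positions are 1-based and
    states must be 0 or 1.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for chromo, pos, state in records:
        chromo = str(chromo)
        if chromo.lower().startswith("chr"):
            chromo = chromo[3:]  # the format expects names without "chr"
        if int(pos) < 1:
            raise ValueError("positions are 1-based, got {}".format(pos))
        if state not in (0, 1):
            raise ValueError("binary state must be 0 or 1, got {}".format(state))
        writer.writerow([chromo, int(pos), float(state)])


# Write three example sites to an in-memory buffer instead of a file.
buf = io.StringIO()
write_cpg_profile(
    [("chr1", 3000827, 1), ("1", 3001007, 0), ("Y", 90829918, 0)], buf
)
profile_text = buf.getvalue()
print(profile_text)
```

To create a real file, pass an open file handle, e.g. `open("./cpg/cell1.tsv", "w", newline="")`, instead of the in-memory buffer.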
  2. Run dcpg_data.py to create the input data for DeepCpG:
dcpg_data.py \
  --cpg_profiles ./cpg/cell1.tsv ./cpg/cell2.tsv ./cpg/cell3.tsv \
  --dna_files ./dna/mm10 \
  --cpg_wlen 50 \
  --dna_wlen 1001 \
  --out_dir ./data

./cpg/cell[123].tsv store the methylation data from step 1, ./dna contains the DNA database (e.g. mm10 for mouse or hg38 for human), and the output data files will be stored in ./data.

  3. Fine-tune a pre-trained model or train your own model from scratch with dcpg_train.py:
dcpg_train.py \
  ./data/c{1,3,6,7,9}_*.h5 \
  --val_files ./data/c{13,14,15,16,17,18,19}_*.h5 \
  --dna_model CnnL2h128 \
  --cpg_model RnnL1 \
  --joint_model JointL2h512 \
  --nb_epoch 30 \
  --out_dir ./model

This command uses chromosomes 1, 3, 6, 7, and 9 for training and chromosomes 13-19 for validation. --dna_model, --cpg_model, and --joint_model specify the architecture of the DNA, CpG, and Joint model, respectively (see manuscript for details). Training will stop after at most 30 epochs, and model files will be stored in ./model.
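
The shell globs in the training command select data files by chromosome. The same split can be sketched in Python, which may be convenient when the file list is built programmatically. This is only an illustration: the helper `split_by_chromosome` is hypothetical, and the file-name pattern `c<chromosome>_<start>-<end>.h5` is an assumption based on example names such as c1_000000-001000.h5.

```python
import os
import re

# Assumed naming scheme of the data files: c<chromosome>_<start>-<end>.h5
_FILE_RE = re.compile(r"^c(?P<chromo>[0-9A-Za-z]+)_\d+-\d+\.h5$")


def split_by_chromosome(filenames, train_chromos, val_chromos):
    """Group data files into training, validation, and unused sets."""
    train_chromos = {str(c) for c in train_chromos}
    val_chromos = {str(c) for c in val_chromos}
    overlap = train_chromos & val_chromos
    if overlap:
        raise ValueError("chromosomes in both sets: {}".format(sorted(overlap)))

    split = {"train": [], "val": [], "unused": []}
    for name in sorted(filenames):
        match = _FILE_RE.match(os.path.basename(name))
        chromo = match.group("chromo") if match else None
        if chromo in train_chromos:
            split["train"].append(name)
        elif chromo in val_chromos:
            split["val"].append(name)
        else:
            split["unused"].append(name)  # other chromosome or unknown name
    return split


files = [
    "./data/c1_000000-001000.h5",
    "./data/c13_000000-001000.h5",
    "./data/c2_000000-001000.h5",
]
split = split_by_chromosome(files, [1, 3, 6, 7, 9], range(13, 20))
print(split)
```

Keeping training and validation chromosomes disjoint, as the check above enforces, avoids evaluating on sites the model has already seen.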

  4. Use dcpg_eval.py to impute methylation profiles and evaluate model performances.
dcpg_eval.py \
  ./data/*.h5 \
  --model_files ./model/model.json ./model/model_weights_val.h5 \
  --out_data ./eval/data.h5 \
  --out_report ./eval/report.tsv

This command predicts missing methylation states on all chromosomes and evaluates prediction performances using known methylation states. Predicted states will be stored in ./eval/data.h5 and performance metrics in ./eval/report.tsv.

  5. Export imputed methylation profiles to HDF5 or bedGraph files:
dcpg_eval_export.py \
  ./eval/data.h5 \
  -o ./eval/hdf \
  -f hdf

Examples

You can find example notebooks and scripts on how to use DeepCpG in /examples. R scripts and Rmarkdown files for downstream analyses are stored in /R.

Documentation

The DeepCpG documentation provides information on training, hyper-parameter selection, and model architectures.

Model Zoo

You can download pre-trained models from the DeepCpG model zoo.

FAQ

Why am I getting warnings 'No CpG site at position X!' when using `dcpg_data.py`?

This means that some sites in the --cpg_profiles files are not CpG sites, i.e. there is no CG dinucleotide at the given position in the DNA sequence. Make sure that --dna_files points to the correct genome and that CpG sites are correctly aligned. Since DeepCpG currently does not support allele-specific methylation, data from different alleles must be merged (recommended), or only one allele must be used.
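
To see which sites would trigger this warning before running dcpg_data.py, you can test the positions against the reference sequence yourself. The sketch below is not DeepCpG code; the function name `find_non_cpg_positions` is hypothetical, and the chromosome sequence is assumed to be already loaded as a plain string.

```python
def find_non_cpg_positions(seq, positions):
    """Return the 1-based positions at which no CG dinucleotide starts."""
    seq = seq.upper()  # reference files may contain soft-masked lower case
    missing = []
    for pos in positions:
        i = pos - 1  # convert the 1-based position to a 0-based index
        if i < 0 or seq[i:i + 2] != "CG":
            missing.append(pos)
    return missing


# Toy sequence with CG dinucleotides starting at positions 4, 8, and 12.
chromo_seq = "ttACGTTCGAacg"
not_cpg = find_non_cpg_positions(chromo_seq, [4, 8, 5, 11, 12])
print(not_cpg)  # [5, 11]
```

If most positions are reported, the profile was probably aligned to a different genome build or uses a different coordinate convention.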

How can I train models on one or more GPUs?

DeepCpG uses the Keras deep learning library, which supports Theano or Tensorflow as backend. If you are using Tensorflow, DeepCpG will automatically run on all available GPUs. If you are using Theano, you have to set the flag device=gpu in the THEANO_FLAGS environment variable.

THEANO_FLAGS='device=gpu,floatX=float32'
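
When DeepCpG scripts are launched from Python, the same variables can be prepared in an environment mapping. This is a minimal sketch: `configure_theano_gpu` is a hypothetical helper, and selecting the backend through the KERAS_BACKEND variable is an assumption about the multi-backend Keras versions DeepCpG targets.

```python
import os


def configure_theano_gpu(environ, device="gpu", floatx="float32"):
    """Set the Keras backend and Theano flags in an environment mapping.

    Any existing THEANO_FLAGS value in the mapping is overwritten.
    """
    environ["KERAS_BACKEND"] = "theano"
    environ["THEANO_FLAGS"] = "device={},floatX={}".format(device, floatx)
    return environ


# Work on a copy so that the current process environment stays untouched.
env = configure_theano_gpu(dict(os.environ))
print(env["THEANO_FLAGS"])  # device=gpu,floatX=float32
```

The resulting mapping can be passed as the `env` argument of `subprocess.run`. To affect the current process instead, pass `os.environ` itself, and do so before Keras or Theano is imported.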

You can find more information about Keras backends here, and about parallelization here.

Content

  • /deepcpg/: Source code
  • /docs: Documentation
  • /examples/: Examples on how to use DeepCpG
  • /R: R scripts and Rmarkdown files for downstream analyses
  • /scripts/: Executable DeepCpG scripts
  • /tests: Test files

Changelog

1.0.7

  • Add support for Keras >=2.2.0.

1.0.6

  • Add support for Keras 2.1.4 and Tensorflow 1.5.0
  • Minor bug-fixes.

1.0.5

Uses Keras 2 as main Keras version.

1.0.4

Adds evaluation scripts and notebooks, improves documentation, and fixes minor bugs.
  • Adds dcpg_eval_perf.py and R markdown files for computing and visualizing performance metrics genome-wide and in annotated contexts.
  • Adds dcpg_snp.py for computing mutation effects.
  • Adds notebooks on computing mutation effects and predicting inter-cell statistics.
  • Adds documentation of DeepCpG scripts.
  • Adds integration tests.
  • Includes minor bug-fixes.

1.0.3

Extends dcpg_data.py, updates documentation, and fixes minor bugs.
  • Extends dcpg_data.py to support bedGraph and TSV input files.
  • Enables training on continuous methylation states.
  • Adds documentation about creating and analyzing data.
  • Updates documentation of scripts and library.

Contact


  1. Angermueller, Christof, Heather J. Lee, Wolf Reik, and Oliver Stegle. DeepCpG: Accurate Prediction of Single-Cell DNA Methylation States Using Deep Learning. Genome Biology 18 (April 11, 2017): 67. doi:10.1186/s13059-017-1189-z.

deepcpg's People

Contributors

cangermueller, mina7928

deepcpg's Issues

Keras/Tensorflow compatible versions

Hi,
Could you please recommend compatible Keras and Tensorflow versions? I am getting multiple issues with the latest versions, e.g.

AttributeError: module 'keras.regularizers' has no attribute 'WeightRegularizer'

Thanks!

PCA of learnt motifs by DeepCpG

Hi,
Could anybody explain more about Fig. 3? How did you do the PCA of the motifs learnt by DeepCpG? What kind of values did you calculate as the input of the PCA? I also want to try this method in my own research, but I did not find details in the paper. Thank you so much for any reply :D!

ValueError: If printing histograms, validation_data must be provided, and cannot be a generator.

Hi,
Thanks for your software. After creating the test data, there is an issue when I execute
dcpg_train.py ./data/c1_000000-001000.h5 --val_files ./data/c13_000000-001000.h5 --cpg_model RnnL1 --out_dir ./models/cpg --nb_epoch 1 --nb_train_sample 1000 --nb_val_sample 1000
Here is the traceback:
Traceback (most recent call last):
  File "/home/ljc/down/deepcpg-keras2/scripts/dcpg_train.py", line 832, in <module>
    app.run(sys.argv)
  File "/home/ljc/down/deepcpg-keras2/scripts/dcpg_train.py", line 194, in run
    return self.main(name, opts)
  File "/home/ljc/down/deepcpg-keras2/scripts/dcpg_train.py", line 802, in main
    verbose=0)
  File "/home/ljc/anaconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/ljc/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 2082, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/ljc/anaconda3/lib/python3.6/site-packages/keras/callbacks.py", line 77, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/ljc/anaconda3/lib/python3.6/site-packages/keras/callbacks.py", line 751, in on_epoch_end
    raise ValueError('If printing histograms, validation_data must be '
ValueError: If printing histograms, validation_data must be provided, and cannot be a generator.

Could you please help me? Thanks a lot.

Error in dcpg_eval.py: TypeError: len() of unsized object

Traceback (most recent call last):
  File "/home/thui/anaconda3/envs/deepcpg/bin/dcpg_eval.py", line 4, in <module>
    __import__('pkg_resources').run_script('deepcpg==1.0.3', 'dcpg_eval.py')
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/pkg_resources/__init__.py", line 738, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/pkg_resources/__init__.py", line 1506, in run_script
    exec(script_code, namespace, namespace)
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/EGG-INFO/scripts/dcpg_eval.py", line 254, in <module>
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/EGG-INFO/scripts/dcpg_eval.py", line 96, in run
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/EGG-INFO/scripts/dcpg_eval.py", line 221, in main
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/EGG-INFO/scripts/dcpg_eval.py", line 76, in write_dict
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/EGG-INFO/scripts/dcpg_eval.py", line 81, in write_dict
TypeError: len() of unsized object
Exception ignored in: <bound method Session.__del__ of <tensorflow.python.client.session.Session object at 0x7f0eaff6d128>>
Traceback (most recent call last):
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 587, in __del__
AttributeError: 'NoneType' object has no attribute 'TF_NewStatus'

My command: dcpg_eval.py ./data/*.h5 --model_files ./model/cpg/model.json ./model/cpg/model_weights_val.h5 --out_data ./eval/data.h5 --out_report ./eval/report.tsv

My environment:

(deepcpg) thui@devbox01:$ pip freeze
appdirs==1.4.3
cycler==0.10.0
deepcpg==1.0.4
h5py==2.7.0
Keras==1.2.2
matplotlib==2.0.1
numpy==1.12.1
numpydoc==0.6.0
packaging==16.8
pandas==0.20.1
protobuf==3.3.0
py==1.4.33
pyparsing==2.2.0
pytest==3.0.7
python-dateutil==2.6.0
pytz==2017.2
PyYAML==3.12
scikit-learn==0.18.1
scipy==0.19.0
seaborn==0.7.1
six==1.10.0
tensorflow==1.1.0
tensorflow-gpu==1.1.0
Theano==0.9.0
Werkzeug==0.12.1

how to test only one cell?

Hello, suppose I train a model using several cells, such as BS27_1_SER, BS27_3_SER, BS27_5_SER, BS27_6_SER, and BS27_8_SER. When I then test this pretrained model, it seems that I must provide the TSV files of all five cells. If I want to test with only one cell's TSV file, what should I do?
Can you give me some advice?
Thank you very much~
Looking forward to your reply!

keras/tensorflow version incompatibility

Hi,

I have used python==3.6.3, keras==2.2.0, tensorflow==1.10.0, numpy==1.14, pandas==0.23.4, and pip installed deepcpg (v1.0.7) as previously described in a different issue (#37); I've also tried using an earlier version combination of tensorflow==1.6 and keras==2.1.6, but the following error occurs:
(screenshot: 202105101300)

The issue states that "'list' object has no attribute 'get_shape'". My labmate and I believe this comes from a version control situation with tensorflow, but we don't understand how this error came into place.

Using any further older version would result in the 'keras.engine' has no attribute 'input_layer' error as described in the previous issue (#37).

I have exported my current environment as a yml file and would be happy to share that as well.
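
For version conflicts like this one, a compact list of the installed package versions is often easier to compare than a full environment file. The following sketch collects them with the standard library (Python 3.8 or later). The helper names and the package list are chosen for this example only and are not part of DeepCpG.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def format_report(versions):
    """Render the versions in a pip-freeze-like format, sorted by name."""
    lines = []
    for name in sorted(versions):
        if versions[name] is None:
            lines.append("{}: not installed".format(name))
        else:
            lines.append("{}=={}".format(name, versions[name]))
    return "\n".join(lines)


# Packages mentioned in this thread; extend the list as needed.
print(format_report(installed_versions(
    ["deepcpg", "keras", "tensorflow", "numpy", "pandas"]
)))
```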

Error running CpG model

Hi,

When running dcpg_train.py, I encounter the following error:

Traceback (most recent call last):
  File "/home/thui/anaconda3/envs/deepcpg/bin/dcpg_train.py", line 4, in <module>
    __import__('pkg_resources').run_script('deepcpg==1.0.3', 'dcpg_train.py')
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/setuptools-27.2.0-py3.4.egg/pkg_resources/__init__.py", line 744, in run_script
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/setuptools-27.2.0-py3.4.egg/pkg_resources/__init__.py", line 1506, in run_script
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/EGG-INFO/scripts/dcpg_train.py", line 828, in <module>
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/EGG-INFO/scripts/dcpg_train.py", line 192, in run
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/EGG-INFO/scripts/dcpg_train.py", line 699, in main
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/EGG-INFO/scripts/dcpg_train.py", line 569, in build_model
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/EGG-INFO/scripts/dcpg_train.py", line 549, in build_cpg_model
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/deepcpg/models/cpg.py", line 86, in __call__
  File "/home/thui/anaconda3/envs/deepcpg/lib/python3.4/site-packages/deepcpg-1.0.3-py3.4.egg/deepcpg/models/cpg.py", line 76, in _replicate_model
AttributeError: 'module' object has no attribute 'WeightRegularizer'

I also tried calling on the attribute directly (inspired from https://github.com/cangermueller/deepcpg/blob/master/deepcpg/models/cpg.py):

>>> from keras import regularizers as kr
Using Theano backend.
>>> kr.WeightRegularizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'WeightRegularizer'

I'm using python 3.4.5 (Anaconda distribution), with Theano (version 0.9.0) as the backend to Keras (version 2.0.3). I installed deepcpg using python setup.py install rather than pip, because the .py scripts weren't being correctly added to the path by pip.

Fine tuning joint model doesn't seem to work.

Is it possible to use --fine_tune on a joint model?

I tried to follow the IPython example for retraining and replaced the --cpg_model with --joint_model, but when it tries to load the previous model I get a null pointer exception. Passing the model separately with --model_files does not solve this either; in that case it reports that the samples of the model and the new training data do not match.

Error filter

Hello,

I am using deepcpg with continuous methylation values from a library obtained with a single-cell protocol, but with low-input DNA rather than a single cell. Model training, evaluation, and testing run smoothly, as does the calculation of the filter activations. However, at the motif analysis and visualization step, when I run the following:

dcpg_filter_motifs.py activations.h5 --out_dir outDir --plot_heat --plot_dens --plot_pca --out_format pdf --verbose

I get this error:

INFO (2018-02-13 21:33:40,554): Reading data
Traceback (most recent call last):
  File "/home/f/framar/pfs/METH_CSF/deepcpg-1.0.5/scripts/dcpg_filter_motifs.py", line 619, in <module>
    App().run(sys.argv)
  File "/home/f/framar/pfs/METH_CSF/deepcpg-1.0.5/scripts/dcpg_filter_motifs.py", line 312, in run
    return self.main(name, opts)
  File "/home/f/framar/pfs/METH_CSF/deepcpg-1.0.5/scripts/dcpg_filter_motifs.py", line 458, in main
    assert filters_weights.shape[1] == 1
AssertionError

Any possible solution?

Regards

Francesco

Unable to train joint model from pretrained cpg and dna model with tensorflow backend

Hi,

I was able to train the joint model successfully. Then I wanted to train the cpg and dna model separately and the joint model on top. Training cpg and dna model worked fine, but I started having problems when trying to train the joint model from the two pretrained models.

The error I got was:

tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value cpg/bidirectional_1/forward_gru_1/bias
	 [[{{node cpg/bidirectional_1/forward_gru_1/bias/read}}]]

I found that the cause was in dcpg_train.py in get_callbacks:

if K._BACKEND == 'tensorflow' and not opts.no_tensorboard:
    K.set_session(K.tf.Session(config=K.tf.ConfigProto(
        intra_op_parallelism_threads=1,
        inter_op_parallelism_threads=1)))

I don't have too much experience with tensorflow or Keras, but it looks to me that setting a new session causes the model to lose the loaded weights. Since this is only done when tensorboard is used, I was able to workaround the problem by running with --no_tensorboard.

Best,
Rene

Negative strand index input

Hi:

I'm trying to use the package on my sparse DNA methylation data, and I ran into the same issue as #33.

I believe the problem is only with the negative strand, because only about half of the sites trigger the warning, and they are all on the negative strand. I tried to suppress the warning by adding a condition in dcpg_data.py, and it works.

if seq[p:p + 2] != 'CG' and seq[p-1:p+1] != 'CG': # make the change to suppress the warnings

However, for sites on the negative strand, it seems strange to use the opposite (positive) strand sequence as input for training, which confuses me. I wonder if there is a reason behind it.
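
An alternative to relaxing the check is to shift negative-strand calls onto the C of the same CpG on the forward strand and to pool the counts of both strands before creating the profiles. The sketch below illustrates this under stated assumptions: it is not DeepCpG code, both function names are hypothetical, and negative-strand calls are assumed to be reported at the G coordinate, as the condition above implies.

```python
def forward_cpg_position(seq, pos):
    """Map a 1-based position to the C of its CpG on the forward strand.

    Returns None if the position belongs to no CG dinucleotide.
    """
    i = pos - 1  # 0-based index
    if i < 0:
        return None
    if seq[i:i + 2] == "CG":
        return pos  # already points at the C
    if i >= 1 and seq[i - 1:i + 1] == "CG":
        return pos - 1  # points at the G, i.e. a negative-strand call
    return None


def merge_strand_calls(seq, calls):
    """Pool (position, methylated reads, total reads) calls per CpG site."""
    merged = {}
    skipped = []
    for pos, n_meth, n_total in calls:
        fwd = forward_cpg_position(seq, pos)
        if fwd is None:
            skipped.append(pos)  # not a CpG on either strand
            continue
        meth, total = merged.get(fwd, (0, 0))
        merged[fwd] = (meth + n_meth, total + n_total)
    return sorted((p, m, t) for p, (m, t) in merged.items()), skipped


seq = "AACGTTCGA"  # CG dinucleotides start at positions 3 and 7
merged, skipped = merge_strand_calls(
    seq, [(3, 2, 3), (4, 1, 1), (8, 0, 2), (5, 1, 1)]
)
print(merged)   # [(3, 3, 4), (7, 0, 2)]
print(skipped)  # [5]
```

Pooling both strands treats a CpG as one site, which matches the one-position-per-site profile format but discards any strand-specific differences.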

keras / tensorflow version compatibility

Hello,

I am using deepcpg-1.0.7. I am using it through a conda environment with python 3.5, keras 2.0.2, and tensorflow 1.0.1. I have tried later versions of all three as well and I just keep getting error after error running the dcpg_train.py.

I'm attaching the current error with the listed versions of dependencies above.
(screenshot: Screen Shot 2020-06-30 at 1 16 24 PM)

Installation

Could you tell me the proper way to run DeepCpG on a dataset? I am new to this kind of data and am facing a lot of problems running it.

CpG sites in the negative strand

When I run dcpg_data.py, it gives thousands of warnings such as "dcpg_data.py:152: UserWarning: No CpG site at position 47655381!".

I think the reason is that those CpG sites are on the negative strand. Is there a solution for this?

Can't calculate the pair potential

Excuse me, I'm a university student and I am interested in your paper: MODEL-BASED REINFORCEMENT LEARNING FOR BIOLOGICAL SEQUENCE DESIGN. However, I have a question about the protein contact Ising model. The experiment used a pair potential based on co-occurrence probabilities derived from a reference, but I don't know how to calculate the pair potential from that reference. Could you tell me the values of this potential or how to calculate it? I would appreciate it very much if you could reply. Thank you very much.

Problem with setup.sh when downloading cpg data

I ran bash setup.sh in the examples folder. The DNA data was successfully downloaded. However downloading the cpg data encountered the following error:

#################################
wget http://www.ebi.ac.uk/~angermue/deepcpg/alias/b3afd7f831dec739d20843a3ef2dbeff -O ./data/cpg.zip
#################################
--2020-04-21 14:29:02--  http://www.ebi.ac.uk/~angermue/deepcpg/alias/b3afd7f831dec739d20843a3ef2dbeff
Resolving www.ebi.ac.uk (www.ebi.ac.uk)... 193.62.192.80
Connecting to www.ebi.ac.uk (www.ebi.ac.uk)|193.62.192.80|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.ebi.ac.uk/~angermue/deepcpg/alias/b3afd7f831dec739d20843a3ef2dbeff [following]
--2020-04-21 14:29:02--  https://www.ebi.ac.uk/~angermue/deepcpg/alias/b3afd7f831dec739d20843a3ef2dbeff
Connecting to www.ebi.ac.uk (www.ebi.ac.uk)|193.62.192.80|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-04-21 14:29:02 ERROR 404: Not Found.

Any ideas?

GPU utility in training only about 25%

Hi!

When trying to train DeepCpG, on a GPU system, I get very low GPU utility (approximately 25%). Do you perhaps have an idea what could be the cause? I tried increasing the number of workers for the data loader, but it didn't help.

Here are the parameters I have been using:

dcpg_train.py ${mydatadir}/train/* --val_file ${mydatadir}/val/* --out_dir ${mydatadir}/model/ --dna_model CnnL2h128 --cpg_model RnnL1 --joint_model JointL2h512 --nb_epoch 10 --data_nb_worker 8 --data_q_size 20 --batch_size=512

I initially forgot to set data_nb_worker, so it defaulted to 1. I increased it to 8 and found no improvement. In both cases, the GPU utility is constantly at only about 25%.

Thanks,
Rene

errors and warnings

Hi,

Please accept my apologies if these questions are already answered some where in the docs or in previous issues.

  1. I have scBS data from mice and I wanted to impute the matrix. I followed the steps to generate input data with dcpg_data.py and received a ton of "no CpG site" warnings. Is it correct to assume that most of the warnings are due to the non-directional library protocol (aligned to all 4 strands with bismark)? Is there a way to mitigate the issue?

  2. Next, I wanted to train a model, however I am getting the error

dcpg_train.py  train_files \
    --val_files val_files \
    --dna_model CnnL2h128 --out_dir deepCpG/model/dna/ --nb_epoch 30

AttributeError: 'Model' object has no attribute 'input_layers'

train files and val files are generated as in the docs (splitting by chromosomes).

  3. I then tried with the pre-compiled models, and I got the following error
$ dcpg_download.py Smallwood2014_2i_cpg -o deepCpG/model/Smallwood2014_2i_cpg

$ dcpg_train.py deepCpG/data/*.h5 --cpg_model deepCpG/model/Smallwood2014_2i_cpg/ --out_dir deepCpG/model/cpg/ --fine_tune

ValueError: Unknown layer: Merge

Am I missing something here ?

-Anand

AttributeError: 'module' object has no attribute 'WeightRegularizer'

Hello, when I run bash train.sh in examples/scripts, it shows this error:

#################################
dcpg_train.py ./data/c{1,3,5,7,9}*.h5 --val_files ./data/c{1,3,5,7,9}*.h5 --out_dir ./models/dna --dna_model CnnL2h128 --val_files ./data/c{1,3,5,7,9}*.h5 --nb_train_sample 500 --nb_val_sample 500 --nb_epoch 1
#################################
Using TensorFlow backend.
INFO (2017-06-01 11:00:40,810): Building model ...
INFO (2017-06-01 11:00:40,811): Building DNA model ...
Traceback (most recent call last):
  File "/home/ztgong/local/anaconda2/bin/dcpg_train.py", line 4, in <module>
    __import__('pkg_resources').run_script('deepcpg==1.0.4', 'dcpg_train.py')
  File "/home/ztgong/local/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/ztgong/local/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1511, in run_script
    exec(script_code, namespace, namespace)
  File "/home/ztgong/local/anaconda2/lib/python2.7/site-packages/deepcpg-1.0.4-py2.7.egg/EGG-INFO/scripts/dcpg_train.py", line 828, in <module>
    
  File "/home/ztgong/local/anaconda2/lib/python2.7/site-packages/deepcpg-1.0.4-py2.7.egg/EGG-INFO/scripts/dcpg_train.py", line 192, in run
    
  File "/home/ztgong/local/anaconda2/lib/python2.7/site-packages/deepcpg-1.0.4-py2.7.egg/EGG-INFO/scripts/dcpg_train.py", line 699, in main
    
  File "/home/ztgong/local/anaconda2/lib/python2.7/site-packages/deepcpg-1.0.4-py2.7.egg/EGG-INFO/scripts/dcpg_train.py", line 565, in build_model
    
  File "/home/ztgong/local/anaconda2/lib/python2.7/site-packages/deepcpg-1.0.4-py2.7.egg/EGG-INFO/scripts/dcpg_train.py", line 502, in build_dna_model
    
  File "build/bdist.linux-x86_64/egg/deepcpg/models/dna.py", line 90, in __call__
AttributeError: 'module' object has no attribute 'WeightRegularizer'

I referred to the Keras documentation, which lists keras.regularizers.WeightRegularizer, so why does it report an error here?
How can I fix it? Thank you very much~
Looking forward to your reply~

My Python version is 2.7 and my Keras version is 2.0.3/2.0.4.

Error in train.sh

I am trying to run the example data for training (train.sh) but am getting the following error:
WARNING:tensorflow:From /.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_impl.py:180: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING (2019-09-26 15:41:00,377): From /.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_impl.py:180: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO (2019-09-26 15:41:00,617): Loading data ...
INFO (2019-09-26 15:41:00,908): Initializing callbacks ...
Traceback (most recent call last):
  File "/.local/bin/dcpg_train.py", line 836, in <module>
    app.run(sys.argv)
  File "/.local/bin/dcpg_train.py", line 195, in run
    return self.main(name, opts)
  File "/.local/bin/dcpg_train.py", line 790, in main
    callbacks = self.get_callbacks()
  File "/.local/bin/dcpg_train.py", line 459, in get_callbacks
    if K._BACKEND == 'tensorflow' and not opts.no_tensorboard:
AttributeError: 'module' object has no attribute '_BACKEND'

Can you help me understand what this error is about?

can you provide datasets?

Hello, I am new to CpG analysis. I want to run dcpg_data.py to create the input data for DeepCpG, but I have no dataset like ./cpg/cell[123].tsv and ./dna/mm10. Can you provide them or a link?
thank you very much!!

No access to ./data/anno.zip on the server

Hi, I am trying to download the data using "bash setup.sh", but I cannot download anno.zip because I do not have permission to access it. Could you please look into it?

Thanks
jie

Fails CpG module training

Hi

I tried to run, interactively, the cells in example/notebooks/basics/index.ipynb. I have a problem in training the CpG model. Here is the output:

#################################
dcpg_train.py ./data/c1_000000-001000.h5 --val_files ./data/c13_000000-001000.h5 --cpg_model RnnL1 --out_dir ./models/cpg --nb_epoch 1 --nb_train_sample 1000 --nb_val_sample 1000
#################################
Using TensorFlow backend.
INFO (2020-04-22 15:04:14,822): Building model ...
Replicate names:
BS27_1_SER, BS27_3_SER, BS27_5_SER, BS27_6_SER, BS27_8_SER

INFO (2020-04-22 15:04:14,834): Building CpG model ...
WARNING:tensorflow:From /tamir1/yoramzar/Projects/Models/NN/deepcpg/venv37/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING (2020-04-22 15:04:14,846): From /tamir1/yoramzar/Projects/Models/NN/deepcpg/venv37/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "RnnL1"
______________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
cpg/state (InputLayer)          (None, 5, 50)        0                                            
__________________________________________________________________________________________________
cpg/dist (InputLayer)           (None, 5, 50)        0                                            
__________________________________________________________________________________________________
cpg/concatenate_1 (Concatenate) (None, 5, 100)       0           cpg/state[0][0]                  
                                                                 cpg/dist[0][0]                   
__________________________________________________________________________________________________
cpg/time_distributed_1 (TimeDis (None, 5, 256)       25856       cpg/concatenate_1[0][0]          
__________________________________________________________________________________________________
cpg/bidirectional_1 (Bidirectio (None, 512)          787968      cpg/time_distributed_1[0][0]     
__________________________________________________________________________________________________
cpg/dropout_1 (Dropout)         (None, 512)          0           cpg/bidirectional_1[0][0]        
__________________________________________________________________________________________________
cpg/BS27_1_SER (Dense)          (None, 1)            513         cpg/dropout_1[0][0]              
__________________________________________________________________________________________________
cpg/BS27_3_SER (Dense)          (None, 1)            513         cpg/dropout_1[0][0]              
__________________________________________________________________________________________________
cpg/BS27_5_SER (Dense)          (None, 1)            513         cpg/dropout_1[0][0]              
__________________________________________________________________________________________________
cpg/BS27_6_SER (Dense)          (None, 1)            513         cpg/dropout_1[0][0]              
__________________________________________________________________________________________________
cpg/BS27_8_SER (Dense)          (None, 1)            513         cpg/dropout_1[0][0]              
==================================================================================================
Total params: 816,389
Trainable params: 816,389
Non-trainable params: 0
__________________________________________________________________________________________________
INFO (2020-04-22 15:04:15,272): Computing output statistics ...
Output statistics:
          name | nb_tot | nb_obs | frac_obs | mean |  var
---------------------------------------------------------
cpg/BS27_1_SER |   1000 |    193 |     0.19 | 0.84 | 0.13
cpg/BS27_3_SER |   1000 |    209 |     0.21 | 0.77 | 0.18
cpg/BS27_5_SER |   1000 |    196 |     0.20 | 0.75 | 0.19
cpg/BS27_6_SER |   1000 |    203 |     0.20 | 0.62 | 0.24
cpg/BS27_8_SER |   1000 |    200 |     0.20 | 0.81 | 0.15

Class weights:
cpg/BS27_1_SER | cpg/BS27_3_SER | cpg/BS27_5_SER | cpg/BS27_6_SER | cpg/BS27_8_SER
----------------------------------------------------------------------------------
        0=0.84 |         0=0.77 |         0=0.75 |         0=0.62 |         0=0.81
        1=0.16 |         1=0.23 |         1=0.25 |         1=0.38 |         1=0.19
WARNING:tensorflow:From /tamir1/yoramzar/Projects/Models/NN/deepcpg/venv37/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_impl.py:183: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING (2020-04-22 15:04:15,502): From /tamir1/yoramzar/Projects/Models/NN/deepcpg/venv37/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_impl.py:183: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO (2020-04-22 15:04:15,568): Loading data ...
INFO (2020-04-22 15:04:15,586): Initializing callbacks ...
INFO (2020-04-22 15:04:23,654): Training model ...

Training samples: 1000
Validation samples: 1000
WARNING:tensorflow:OMP_NUM_THREADS is no longer used by the default Keras config. To configure the number of threads, use tf.config.threading APIs.
WARNING (2020-04-22 15:04:24,831): OMP_NUM_THREADS is no longer used by the default Keras config. To configure the number of threads, use tf.config.threading APIs.
2020-04-22 15:04:24.831482: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-04-22 15:04:24.831508: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-04-22 15:04:24.831531: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (compute-0-254.power5): /proc/driver/nvidia/version does not exist
2020-04-22 15:04:24.831851: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-04-22 15:04:24.841381: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2600000000 Hz
cannot allocate memory for thread-local data: ABORT
cannot allocate memory for thread-local data: ABORT
: 127

Why is it pointing to libcuda? I am assuming this notebook can be run on a CPU, right?
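As a side check on the model summary in the log above, the reported parameter counts can be reproduced by hand. The sketch below assumes that cpg/time_distributed_1 wraps a dense layer from 100 to 256 features and that cpg/bidirectional_1 wraps a GRU with 256 units per direction and one bias vector per gate; both are inferred from the printed shapes and counts, not taken from the DeepCpG source. The helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def gru_params(n_in, units):
    """Three gates, each with input weights, recurrent weights and one bias."""
    return 3 * (units * (n_in + units) + units)


# Assumed layer sizes, read off the shapes in the model summary.
time_distributed = dense_params(100, 256)   # cpg/time_distributed_1
bidirectional = 2 * gru_params(256, 256)    # forward + backward GRU
outputs = 5 * dense_params(512, 1)          # five per-cell output layers

total = time_distributed + bidirectional + outputs
print(time_distributed, bidirectional, outputs, total)
# prints: 25856 787968 2565 816389
```

These values match the 25856, 787968, 5 x 513 and 816,389 reported in the summary, so the model itself appears to have been built as expected before the abort.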

Question about DeepCpG

Hi, Daniel, author of DanQ here. Thanks for citing my paper in your work. I have a quick question about DeepCpG's inputs. Are you using CpG state labels from neighboring sites as inputs to your recurrent neural network? If so, your method sort of blurs the line between supervised and unsupervised learning in the sense that you are using supervised learning methods to train a model that uses CpG states as both labels and inputs.

Also, in your paper, you said that in the CpG module each xt is a vector of size 100 containing the "methylation state and distance of K = 25 CpG sites to the left and to the right of a target CpG". From what I understand, xt should contain information on 50 CpG sites in total and assigns 2 binary features to each of the 50 sites, bringing the total to 100. If so, is there a reason why you could not have brought this down to just 50 features and assigned a single value to each of the 50 sites? I always assumed that methylation was completely binary (either yes or no), but I might be confused here.

Thanks,
Daniel
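For orientation, a model summary posted in another issue on this page lists an input cpg/dist of shape (None, 5, 50) that is concatenated with cpg/state into a tensor of shape (None, 5, 100). One reading of the quoted sentence, inferred from those layer names and shapes and not confirmed by the authors here, is that each of the 50 sites contributes one state value and one distance value. A minimal sketch of that layout, with made-up numbers:

```python
K = 25               # CpG sites taken on each side of the target
n_sites = 2 * K      # 50 neighbouring sites in total

# Hypothetical values for one cell; real inputs come from the data files.
states = [1, 0] * K                                   # one state per site
distances = [100 * (i + 1) for i in range(n_sites)]   # one distance per site

# Assumed layout: all states first, then all distances.
x_t = states + distances
print(len(states), len(distances), len(x_t))
# prints: 50 50 100
```

Under this reading, only the first 50 entries are methylation states; the remaining 50 are distances rather than a second binary feature.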

pip install deepcpg does not install tensorflow.

Running pip install deepcpg does not install any version of tensorflow, let alone the correct one.

pip install tensorflow results in ModuleNotFoundError: No module named 'keras.layers.merge' on cpg.py's ... import concatenate
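When reporting this kind of problem it can help to list which versions are actually installed. The following is a small diagnostic sketch, not part of DeepCpG; the function name installed_versions is hypothetical, and it relies on importlib.metadata from the standard library (Python 3.8 or later). It does not fix the import error; it only shows what pip resolved.

```python
from importlib import metadata


def installed_versions(packages=("deepcpg", "keras", "tensorflow")):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

A missing keras.layers.merge module may point to a Keras version that differs from the one DeepCpG was written against, so the Keras entry is the first one worth comparing with the project's requirements.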

Error in dcpg_train.py

Hi Christof,
Fantastic looking piece of software, but unfortunately I've encountered an error when starting to train the data.

~/scratch/deepCpG$ dcpg_train.py c1_*.h5 c3_*.h5 c5_*.h5 --val_files c2_*.h5 --dna_model CnnL2h128 --cpg_model RnnL1 --out_dir models --nb_epoch 30
Using TensorFlow backend.
INFO (2018-07-23 12:21:21,556): Building model ...
INFO (2018-07-23 12:21:21,558): Building DNA model ...
Traceback (most recent call last):
  File "/usr/local/bin/dcpg_train.py", line 4, in <module>
    __import__('pkg_resources').run_script('deepcpg==1.0.6', 'dcpg_train.py')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1511, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/deepcpg-1.0.6-py2.7.egg/EGG-INFO/scripts/dcpg_train.py", line 836, in <module>
    
  File "/usr/local/lib/python2.7/dist-packages/deepcpg-1.0.6-py2.7.egg/EGG-INFO/scripts/dcpg_train.py", line 194, in run
    
  File "/usr/local/lib/python2.7/dist-packages/deepcpg-1.0.6-py2.7.egg/EGG-INFO/scripts/dcpg_train.py", line 705, in main
    
  File "/usr/local/lib/python2.7/dist-packages/deepcpg-1.0.6-py2.7.egg/EGG-INFO/scripts/dcpg_train.py", line 571, in build_model
    
  File "/usr/local/lib/python2.7/dist-packages/deepcpg-1.0.6-py2.7.egg/EGG-INFO/scripts/dcpg_train.py", line 508, in build_dna_model
    
  File "build/bdist.linux-x86_64/egg/deepcpg/models/dna.py", line 117, in __call__
  File "build/bdist.linux-x86_64/egg/deepcpg/models/utils.py", line 448, in _build
AttributeError: 'Model' object has no attribute 'input_layers'

Any ideas on how I can fix this? Any help would be much appreciated.

Best,
Tim

Invalid file path or buffer object type: <class 'deepcpg.data.utils.GzipFile'>

Hi,
Thanks for making this software. After downloading the example data files (using the script deepcpg/examples/setup.sh), I ran the example script:
python deepcpg_path/dcpg_data.py --cpg_profiles ../data/cpg/BS27_1_SER.tsv.gz ../data/cpg/BS27_3_SER.tsv.gz ../data/cpg/BS27_5_SER.tsv.gz ../data/cpg/BS27_6_SER.tsv.gz ../data/cpg/BS27_8_SER.tsv.gz --dna_files ../data/dna/mm10 --out_dir ./data --dna_wlen 1001 --cpg_wlen 50 --nb_sample 1000

I got the error below. Something seems to go wrong when reading the file, but I have checked that all of the file paths are correct on my system.

Traceback (most recent call last):
  File "/home/yaping/software/deepcpg/deepcpg/scripts/dcpg_data.py", line 629, in <module>
    app.run(sys.argv)
  File "/home/yaping/software/deepcpg/deepcpg/scripts/dcpg_data.py", line 260, in run
    return self.main(name, opts)
  File "/home/yaping/software/deepcpg/deepcpg/scripts/dcpg_data.py", line 413, in main
    log=log.info)
  File "/home/yaping/software/deepcpg/deepcpg/scripts/dcpg_data.py", line 114, in read_cpg_profiles
    cpg_profile = dat.read_cpg_profile(cpg_file, sort=True, *args, **kwargs)
  File "build/bdist.linux-x86_64/egg/deepcpg/data/utils.py", line 254, in read_cpg_profile
  File "/home/yaping/software/anaconda/anaconda2/envs/ccimpute/lib/python2.7/site-packages/pandas/io/parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/yaping/software/anaconda/anaconda2/envs/ccimpute/lib/python2.7/site-packages/pandas/io/parsers.py", line 392, in _read
    filepath_or_buffer, encoding, compression)
  File "/home/yaping/software/anaconda/anaconda2/envs/ccimpute/lib/python2.7/site-packages/pandas/io/common.py", line 210, in get_filepath_or_buffer
    raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <class 'deepcpg.data.utils.GzipFile'>
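The traceback shows pandas rejecting the deepcpg.data.utils.GzipFile object, not the path itself. To confirm that an input file is readable independently of DeepCpG and pandas, a standard-library sketch such as the one below can be used. It assumes the profile is a gzipped, tab-separated file without a header whose first three columns are chromosome, position and methylation value; that layout is an assumption and should be checked against the actual files. The function name peek_cpg_profile is hypothetical.

```python
import csv
import gzip


def peek_cpg_profile(path, max_rows=None):
    """Read (chromosome, position, value) rows from a gzipped TSV file."""
    rows = []
    with gzip.open(path, mode="rt", newline="") as handle:
        for i, record in enumerate(csv.reader(handle, delimiter="\t")):
            if max_rows is not None and i >= max_rows:
                break
            # Assumed column order: chromosome, position, value.
            chromo, pos, value = record[:3]
            rows.append((chromo, int(pos), float(value)))
    return rows
```

For example, peek_cpg_profile("../data/cpg/BS27_1_SER.tsv.gz", max_rows=5) would return the first five rows if the file matches the assumed layout. If that works, the files are fine and the problem more likely lies in how the installed pandas version handles the wrapper object.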

Recommendations for using in Tumor Normal scenario

Hi!

We are trying to use DeepCpG on a few canine samples: a few normal samples and a few tumor samples. Should we train the models on the normals, on the tumors, or on a mixture of both to predict methylation status? I ask because we have very few normals compared to the tumor samples.

Also, once the model is trained, do we need to fine-tune it for each sample we want to call methylation on?

Any suggestions would be greatly helpful!

Regards,
Harish

impute CpG values for gene by cell matrix

Hi there,

I have a CpG methylation level matrix (gene by cell), and I'm curious whether there is any way to use the DeepCpG tool to impute the methylation levels of the genes.

Cannot open the deep_cpg/notebooks/basics/index.ipynb

Hi, I cannot open the deep_cpg/notebooks/basics/index.ipynb file; Jupyter Notebook returns the error: NotJSONError('Notebook does not appear to be JSON: '{\n "cells": [\n {\n "cell_type": "m...',)

I also tried running sh data.sh from the example folder, but I got the following error:

Traceback (most recent call last):
  File "/Users/jyang32/anaconda3/bin/dcpg_data.py", line 4, in <module>
    __import__('pkg_resources').run_script('deepcpg==1.0.6', 'dcpg_data.py')
  File "/Users/jyang32/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 750, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/jyang32/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1518, in run_script
    .format(**locals()),
pkg_resources.ResolutionError: Script 'scripts/dcpg_data.py' not found in metadata at '/Users/jyang32/anaconda3/lib/python3.6/site-packages/deepcpg-1.0.6.dist-info'

Could you provide some hints? Thank you!

All other scripts work properly.

Jie

a problem

There is a single parameter on line 141 of deepcpg/data/annotations.py. So if we install deepcpg using pip, there seems to be an error when running index.sh, obtained by selecting File -> Download as -> Bash (.sh).
Please check this problem.

Error in execution of File fine_tune/index.ipynb

The error is ValueError: Improper config format: {u'l2': 9.999999747378752e-05, u'name': u'WeightRegularizer', u'l1': 0.0}.
It seems to be a compatibility problem between Keras 1 and Keras 2.

AttributeError: 'list' object has no attribute 'get_shape' when building model

I am trying to use dcpg_train.py, but it fails at the model-building stage like this:

INFO (2017-11-17 11:41:39,196): Building DNA model ...
Traceback (most recent call last):
  File "anaconda/envs/tensorflow/bin/dcpg_train.py", line 828, in <module>
    app.run(sys.argv)
  File "anaconda/envs/tensorflow/bin/dcpg_train.py", line 192, in run
    return self.main(name, opts)
  File "anaconda/envs/tensorflow/bin/dcpg_train.py", line 699, in main
    model = self.build_model()
  File "anaconda/envs/tensorflow/bin/dcpg_train.py", line 591, in build_model
    outputs = mod.add_output_layers(stem.outputs, output_names)
  File "/home/MYUSERNAME/anaconda/envs/deepcpg/lib/python3.6/site-packages/deepcpg-1.0.4-py3.6.egg/deepcpg/models/utils.py", line 260, in add_output_layers
  File "/home/MYUSERNAME/anaconda/envs/deepcpg/lib/python3.6/site-packages/keras/engine/topology.py", line 554, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/MYUSERNAME/anaconda/envs/deepcpg/lib/python3.6/site-packages/keras/layers/core.py", line 840, in call
    output = K.dot(inputs, self.kernel)
  File "/home/MYUSERNAME/anaconda/envs/deepcpg/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 822, in dot
    if ndim(x) is not None and (ndim(x) > 2 or ndim(y) > 2):
  File "/home/MYUSERNAME/anaconda/envs/deepcpg/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 437, in ndim
    dims = x.get_shape()._dims
AttributeError: 'list' object has no attribute 'get_shape'

I am using the latest commit to the master branch here, keras 2.0.2 (and have tried more recent versions) and tensorflow-gpu 1.4.0.

Thanks in advance for any suggestions!
