tanjimin / c.origami Goto Github PK

C.Origami, a prediction and screening framework for cell type-specific 3D chromatin structure.

Python 95.59% Shell 4.41%

chromatin computational-biology convolutional-neural-network deep-learning genomics hi-c transformer

c.origami's Issues

Questions when I process the Hi-C data

Hi, dear author,
I ran into problems when training the model on hg19. Neither processing the data with nf-core/hic (https://nf-co.re/hic) nor converting the .mcool file from 4DN into npz let me reproduce the model, and the prediction result was empty. In addition, the training loss was very low.
Could you please tell me more about how to process the Hi-C data?

Using CTCF chip-exo without input

Hello,

I have CTCF ChIP-exo data and, unlike ChIP-seq, it doesn't have an input sample. I have generated the bigwig file but cannot run ctcf_norm.sh to normalize it, as I don't have an input sample. Can I use the bigwig file directly without normalizing it?

I do have a ChIP-seq input sample from the same cell line, but I am not sure if I should use that for normalizing the ChIP-exo data.

Thanks

Questions about performance comparison with deepC

I am interested in the performance comparison with deepC. Could you please provide more details about training deepC with IMR-90 data? For example:

Did you use hg19 or hg38 as the reference genome?
DeepC has 2 training phases. Did you re-train the model for both phases?
If you perform the training for phase I, what chromatin features were selected?

I'm looking forward to your response.

What method was used in the paper to convert the predictions into valid pairs?

I saw in the paper it states that "we converted the predicted matrix back to valid pairs by merging predictions to chromosomes and counting the discretized intensity value." I also saw that someone raised the issue and it was closed stating it takes a lot of effort. I am wondering if you can specify what method was used in the paper so that it can be replicated by others and FitHiC can be used for loop calling?
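
For readers trying to replicate this step, here is a minimal sketch of one way to read "counting the discretized intensity value". It is not the authors' code. It assumes the predictions are on a log(count + 1) scale, that the window starts at a known genomic coordinate, and that bins are 8192 bp wide; the function name `matrix_to_pair_counts` and the record layout are hypothetical. Merging overlapping windows into whole chromosomes and reformatting for a specific loop caller are left out.

```python
import numpy as np

def matrix_to_pair_counts(pred, chrom, start, bin_size=8192, log_scaled=True):
    """Turn a predicted contact matrix into (chrom, pos1, chrom, pos2, count) records.

    Assumption: `pred` holds log(count + 1) intensities when `log_scaled` is True.
    Positions are bin midpoints; counts are intensities rounded to integers.
    """
    pred = np.asarray(pred, dtype=float)
    intensity = np.expm1(pred) if log_scaled else pred
    # Discretize: negative predictions are clipped to zero before rounding.
    counts = np.rint(np.clip(intensity, 0, None)).astype(int)
    # The matrix is symmetric, so only the upper triangle (with diagonal) is used.
    rows, cols = np.triu_indices(counts.shape[0])
    records = []
    for i, j in zip(rows, cols):
        c = counts[i, j]
        if c > 0:
            mid1 = start + i * bin_size + bin_size // 2
            mid2 = start + j * bin_size + bin_size // 2
            records.append((chrom, int(mid1), chrom, int(mid2), int(c)))
    return records

# Toy 2 x 2 "prediction" built from known counts.
toy_pred = np.log1p(np.array([[2.0, 1.0], [1.0, 0.0]]))
toy_records = matrix_to_pair_counts(toy_pred, "chr1", start=0)
```

Each record carries a count rather than being repeated once per read, which keeps the output small; expanding to one line per pair is a simple loop if a downstream tool needs it.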

Wrong file download link

The files corigami_data.tar.gz and corigami_data_gm12878_add_on.tar.gz have the same download link.

About comparison with Original HiC matrix

Dear all,

I followed your example to predict the GM12878 Hi-C contact map for a 2 Mb region.

To measure the correlation between this prediction and the original Hi-C matrix of the same region, I understand that I have to apply log(ICE-normalized counts + 1) to the original 2 Mb Hi-C matrix, and then measure the correlation directly against the prediction.

Is that true?

Thanks
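
The comparison described in this question can be sketched as below. This is only an illustration of the procedure as the poster states it, not a confirmation that it matches the authors' evaluation. It assumes the experimental matrix has already been brought to the same shape and bin size as the prediction (a 256 x 256 toy matrix is used here), and the helper names `log_transform` and `matrix_correlation` are hypothetical.

```python
import numpy as np

def log_transform(ice_counts):
    """Apply log(x + 1) to an ICE-normalized contact matrix."""
    return np.log1p(np.asarray(ice_counts, dtype=float))

def matrix_correlation(pred, truth):
    """Pearson correlation over all finite pixel pairs of two equal-shape matrices."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape:
        raise ValueError("matrices must have the same shape")
    mask = np.isfinite(pred) & np.isfinite(truth)
    return float(np.corrcoef(pred[mask], truth[mask])[0, 1])

# Toy data standing in for a real experimental window and a model prediction.
rng = np.random.default_rng(0)
ice = rng.gamma(shape=2.0, scale=3.0, size=(256, 256))
ice = (ice + ice.T) / 2                      # contact maps are symmetric
truth = log_transform(ice)
pred = truth + rng.normal(scale=0.1, size=truth.shape)
r = matrix_correlation(pred, truth)
```

Pearson correlation is unchanged by a constant rescaling of either matrix, so a different overall normalization scale does not affect `r`, but the log transform does.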

reproduce

Hello!
I am currently trying to replicate the C.Origami model and have run into some difficulties.
I have repeated the training process described on GitHub several times, using the same datasets and code on two different servers. I've noticed considerable randomness in the training process, and the weights saved automatically by PyTorch Lightning do not seem to predict the Hi-C matrices well in the training, validation, or test datasets. With the checkpoints provided on GitHub, the predictions show no such issues.
Have there been previous reports of such variability? If not, I plan to keep repeating my experiments to understand this phenomenon better.
Thanks!

insulation score

Hello,

I noticed there was evaluation code in issue 29, but I would like to ask for the specific code used to calculate the insulation score, i.e. the module behind `import insulation as insu`.

Additionally, I am curious about the visualization of ATAC-seq and CTCF in Figure 2d. Are the values within each window_size region averaged to generate these plots?

Thanks!
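
Until the `insulation` module referenced above is shared, a generic insulation score can be sketched as follows. This is the common sliding-square definition (mean contact intensity between the `window` bins upstream and the `window` bins downstream of each position), not the authors' implementation; the function name, the default window, and the toy matrix are assumptions.

```python
import numpy as np

def insulation_score(mat, window=10):
    """Mean intensity of the square linking `window` bins upstream and downstream of bin i.

    Positions too close to the matrix edge are left as NaN.
    """
    mat = np.asarray(mat, dtype=float)
    n = mat.shape[0]
    scores = np.full(n, np.nan)
    for i in range(window, n - window + 1):
        # Rows: bins before i. Columns: bins from i onward.
        scores[i] = np.nanmean(mat[i - window:i, i:i + window])
    return scores

# Toy matrix with two domains of 30 bins each; the boundary sits at bin 30.
n_bins = 60
toy = np.full((n_bins, n_bins), 0.1)
toy[:30, :30] = 1.0
toy[30:, 30:] = 1.0
scores = insulation_score(toy, window=5)
boundary = int(np.nanargmin(scores))
```

Low scores mark positions that few contacts cross, so domain boundaries show up as local minima. An insulation score correlation would then compare the score vectors of a predicted and an experimental matrix.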

The pre-trained models of other cell types

Dear Jimin, your research has been very helpful in my work!
I am trying to reproduce the best results of your method to use as a baseline for comparison.
Does C.Origami provide pre-trained models for cell types other than IMR-90?

Thanks!

How to make trans-interaction contact predictions?

Dear developers,

Thanks for your work developing this state-of-the-art tool. I have tried the example and my own data for intra-chromosomal contacts, and the results look OK. However, how can I predict inter-chromosomal contacts as in Fig. 4 of your paper? Is there an example or tutorial for inter-chromosomal contacts?

Thanks !
https://media.springernature.com/lw685/springer-static/image/art%3A10.1038%2Fs41587-022-01612-8/MediaObjects/41587_2022_1612_Fig4_HTML.png?as=webp

Adding different ChIP-seq during the training phase?

Hi,
Thank you for developing C.Origami!
I have encountered some problems when using C.Origami to train on my data.

Can I add different ChIP-seq tracks (such as H3K27ac, H3K4me3, ...) in the training phase?

Can I simply add the bigwig files of the different ChIP-seq tracks to "genomic_features"?

Best wishes,
Kirtio

Adding chromosome-wide prediction

Many requests have been received for chromosome-wide predictions. I will add this feature soon.

Request for evaluation code

Hello,

Thank you for sharing your work. It is very helpful and inspiring.

Would it be possible for you to share the evaluation code (e.g. observed versus expected Hi-C matrices correlation, distance-stratified correlation, insulation score correlation) or provide more detailed information on how to calculate these metrics?

Thank you for your time and help.
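
While the official evaluation code is pending, one of the metrics named in this request, distance-stratified correlation, can be sketched generically. This is an assumption about what the metric means (Pearson correlation computed separately on each diagonal, i.e. at each genomic distance), not the authors' code; the function name and the toy data are hypothetical.

```python
import numpy as np

def distance_stratified_correlation(pred, truth, max_offset=None):
    """Pearson correlation between two square matrices, one value per diagonal offset."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape or pred.shape[0] != pred.shape[1]:
        raise ValueError("expected two square matrices of the same shape")
    n = pred.shape[0]
    if max_offset is None:
        max_offset = n - 3  # keep at least 3 pixels on the last diagonal
    result = np.full(max_offset + 1, np.nan)
    for k in range(max_offset + 1):
        a = np.diagonal(pred, offset=k)
        b = np.diagonal(truth, offset=k)
        mask = np.isfinite(a) & np.isfinite(b)
        # Skip diagonals that are too short or constant (correlation undefined).
        if mask.sum() >= 3 and a[mask].std() > 0 and b[mask].std() > 0:
            result[k] = np.corrcoef(a[mask], b[mask])[0, 1]
    return result

# Toy check: a matrix compared with a noisy copy of itself.
rng = np.random.default_rng(0)
toy_truth = rng.random((64, 64))
toy_pred = toy_truth + rng.normal(scale=0.05, size=toy_truth.shape)
per_distance = distance_stratified_correlation(toy_pred, toy_truth, max_offset=40)
identical = distance_stratified_correlation(toy_truth, toy_truth, max_offset=40)
```

Stratifying by distance avoids the inflated correlation that comes from the strong decay of contact frequency with genomic distance, which both matrices share regardless of prediction quality.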

Can C.Origami predict 3D chromatin structure without CTCF ChIP-seq input data?

I have CUT&Tag data for CTCF, but no corresponding input data. Can I use just the IP data to predict the architecture?

`examples/prediction.sh` generates blank Hi-C image for chromosome 15

Hi @tanjimin,

First of all, thanks for sharing your work in this repo.

I was trying to run the example provided in examples/prediction.sh, but it did not give the expected result. To be more specific, I modified the chromosome to --chr "chr15", and the rest is consistent with the documentation. The output Hi-C image is blank, while the numpy array looks fine. The output image remains blank for other start positions as well.

Can you please look into this problem? Thank you.

Question about downscaling 10-kb matrix to 8192 bp resolution

Hi @tanjimin ,

I have really enjoyed using C.Origami, but I have a naive question that has been bothering me for a while: how do I rescale the 10-kb experimental Hi-C matrix to 8192 bp resolution in order to match the model outputs?

Best,

Jiankun
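
One possible way to approach this rescaling is overlap-weighted rebinning, sketched below. This is not necessarily how the authors prepared their data. It assumes raw or normalized counts (not log-transformed values), contacts spread uniformly within each 10-kb bin, and both binnings starting at the same genomic coordinate; the function names are hypothetical.

```python
import numpy as np

def overlap_weights(n_in, in_bin, out_bin):
    """Weight matrix W where W[i, j] is the fraction of input bin j inside output bin i."""
    total = n_in * in_bin
    n_out = int(np.ceil(total / out_bin))
    w = np.zeros((n_out, n_in))
    for i in range(n_out):
        o_start, o_end = i * out_bin, min((i + 1) * out_bin, total)
        first = o_start // in_bin
        last = min((o_end - 1) // in_bin, n_in - 1)
        for j in range(first, last + 1):
            i_start, i_end = j * in_bin, (j + 1) * in_bin
            overlap = min(o_end, i_end) - max(o_start, i_start)
            w[i, j] = overlap / in_bin
    return w

def rebin_counts(mat, in_bin=10_000, out_bin=8_192):
    """Redistribute a square count matrix from `in_bin` to `out_bin` sized bins."""
    mat = np.asarray(mat, dtype=float)
    w = overlap_weights(mat.shape[0], in_bin, out_bin)
    # Apply the same redistribution to rows and columns.
    return w @ mat @ w.T

# Toy 20 x 20 matrix at 10 kb (200 kb region) rebinned to 8192 bp.
rng = np.random.default_rng(2)
toy = rng.poisson(5.0, size=(20, 20)).astype(float)
toy = toy + toy.T
rebinned = rebin_counts(toy)
```

Because every input bin is fully distributed across the output bins, the total count is preserved. The last output bin is only partially covered when the region length is not a multiple of 8192 bp, so it may be worth trimming it before comparing with a prediction.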

Single-cell ATACseq adaptability

I was curious to know if the C.Origami model can be used on single-cell ATACseq data?

regards
Arch

How to balance mcool files.

Hi Jimin,
How do you balance mcool files? When I use hic-bench's track to generate an mcool file with juicertools, I get an error because the -k parameter of addnorm is not available in the new version. I then used Cooler to balance the mcool file, but the resulting values are extremely small, mostly ranging from 0 to 0.01. However, when I downloaded your data and inspected it, I found that the values ranged from 0 to 40.
Thanks!
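
One possible explanation for the scale difference, offered as a hedged sketch and not as a description of the authors' pipeline: matrix balancing only determines the result up to a constant factor, and tools choose different conventions for that factor. If every row is scaled to sum to 1, individual pixels become very small. The toy iterative correction below (plain NumPy, hypothetical function name `ice_balance`, assumes no empty rows) shows the effect and a rescaling that restores the original total.

```python
import numpy as np

def ice_balance(mat, n_iter=200, tol=1e-8):
    """Toy iterative correction: find biases so that mat / (b_i * b_j) has row sums of 1.

    Assumes a symmetric matrix with no all-zero rows.
    """
    mat = np.asarray(mat, dtype=float)
    bias = np.ones(mat.shape[0])
    for _ in range(n_iter):
        row_sums = (mat / np.outer(bias, bias)).sum(axis=1)
        if np.max(np.abs(row_sums - 1.0)) < tol:
            break
        # Damped update; the fixed point has all row sums equal to 1.
        bias *= np.sqrt(row_sums)
    return mat / np.outer(bias, bias), bias

# Toy symmetric count matrix with values between 100 and 200.
rng = np.random.default_rng(1)
raw = rng.uniform(1.0, 2.0, size=(50, 50)) * 100
raw = (raw + raw.T) / 2
balanced, bias = ice_balance(raw)

# Same balanced matrix, rescaled so the total matches the raw counts.
rescaled = balanced * (raw.sum() / balanced.sum())
```

Here `balanced` has pixels around 1/50 while `rescaled` is back on the scale of the raw counts, yet the two differ only by a constant. Whether the 0 to 40 range in the downloaded data comes from such a rescaling or from a different normalization is something only the authors can confirm.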

How to merge the predicted matrix and convert it back to valid pairs?

Could Hi-C data be replaced by data from other conformation capture experiments such as HiChIP or ChIA-PET?

Hi,

Dear author, C.Origami is a beautiful piece of work!

Could I replace the Hi-C experimental data with HiChIP or ChIA-PET data when training my model? These techniques combine chromatin immunoprecipitation (ChIP)-based enrichment with chromatin proximity ligation, so they capture three-dimensional conformation associated with a specific protein or chromatin modification. I wonder if your model is compatible with this kind of data.

Best wishes
Gemma
