Comments (5)

christacaggiano commented on September 28, 2024

Hi Dhanya and Fayaz, Sorry for the late reply. CelFiE is really designed to fit multiple samples simultaneously. When you fit one sample at a time, which seems to be the case with the data that Fayaz provided (I don't know about Dhanya's data), the EM will tend to learn a "custom" unknown for that sample. This makes sense intuitively: the model assumes that our reference tissues aren't perfect, since both ENCODE and BLUEPRINT are noisy, so if I were blindly trying to describe one sample without knowing much about the reference, the best I could do is to describe what I have in front of me.

I have found that CelFiE performs best with more than 10 samples fit at once (see figure 3 of our preprint).

Let me know if that helps to clear things up.
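
To make the single-sample point concrete, here is a minimal sketch in Python. It is not CelFiE's EM code. It assumes a simplified binomial read model with two reference tissues, and the function names (`binom_loglik`, `simulate`, `compare`) and all simulation settings are hypothetical. It compares the likelihood of explaining the reads with the true reference mixture against explaining them with one unknown profile fitted directly to the reads.

```python
import numpy as np

def binom_loglik(y, depth, p, eps=1e-9):
    """Binomial log-likelihood of methylated counts y (constant term dropped)."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0) at fully (un)methylated sites
    return float(np.sum(y * np.log(p) + (depth - y) * np.log(1 - p)))

def simulate(alphas, ref, depth, rng):
    """Simulate methylated read counts for samples mixing two reference tissues.

    alphas: fraction of tissue 0 in each sample, shape (n_samples,)
    ref:    reference methylation proportions, shape (2, n_sites)
    """
    mix = np.column_stack([alphas, 1 - alphas])  # (n_samples, 2)
    p_true = mix @ ref                           # (n_samples, n_sites)
    y = rng.binomial(depth, p_true)
    return y, p_true

def compare(alphas, seed=0, n_sites=500, depth=50):
    """Return (log-lik using the true references, log-lik using one shared unknown)."""
    rng = np.random.default_rng(seed)
    alphas = np.asarray(alphas, dtype=float)
    ref = rng.uniform(0.05, 0.95, size=(2, n_sites))
    y, p_true = simulate(alphas, ref, depth, rng)

    # Option A: every sample is explained by its true mixture of the references.
    ll_reference = binom_loglik(y, depth, p_true)

    # Option B: ignore the references and give all weight to a single unknown
    # profile shared by all samples, fitted to the pooled reads.
    unknown = y.sum(axis=0) / (depth * len(alphas))
    ll_unknown = binom_loglik(y, depth, unknown)
    return ll_reference, ll_unknown

# One sample: the unknown can copy the sample exactly, so it fits at least as well.
print(compare([0.3]))
# Ten samples with different mixtures: one shared unknown cannot copy them all.
print(compare(np.linspace(0.05, 0.95, 10)))
```

With a single sample, the unknown profile equals that sample's observed methylation fraction at every site, which is the best possible per-site fit, so nothing pushes the model back toward the references. With many samples that differ in composition, a shared unknown has to average over them and the references become the better explanation in this toy setting.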

from celfie.

seifudd commented on September 28, 2024

Hi Dhanya,

I apologize for the late reply. I ended up using meth_atlas for deconvolution: https://github.com/nloyfer/meth_atlas

Hope this helps.

Thanks, Fayaz

dhanya-sudhakaran commented on September 28, 2024

Dear Fayaz

I was facing the same issue when I tried using CelFiE for my dataset. With no unknowns, I get a placenta contribution of >90% in non-pregnant samples.

I was wondering if you were able to solve your issue and if you tried any other tool for your samples. Appreciate your time!

Thank you
Dhanya

seifudd commented on September 28, 2024

Hi Christina,

Thank you for your reply. We have 16 samples (COVID19, cfDNA, WGBS) and I tried CelFiE, but I'm getting the same results as I did with the individual sample, i.e. it predicts that most of the cfDNA originates from the "placenta."

I am attaching the results here.

I am also attaching the input data.

One question I had was: are the reference TIMs you provide on hg38 or hg19? Maybe that's causing the issue?

Any help will be appreciated.

Thanks, fs

covid19_samples_reference_file_tims.txt

covid19_cell_proportions.xlsx

seifudd commented on September 28, 2024

Hi Christina,

For your reference, this is the command that I used. There is a warning about division by zero at some point in the script, but this could be due to no coverage. Compared to the previous run, I changed the number of unknowns from 0 to 20.

Again, any help will be appreciated.

Thanks, fs

python /data/NHLBI_BCB/Sean_MethylSeq/10-tissue_of_origin_methylation_project/celfie/EM/em.py \
/data/NHLBI_BCB/Sean_MethylSeq/14_MKJ5249/02_methylseq_analysis_pipeline/02_tissue_of_origin_prediction/04_deconvolution_with_celfie/covid19_samples_reference_file_tims.txt \
/data/NHLBI_BCB/Sean_MethylSeq/14_MKJ5249/02_methylseq_analysis_pipeline/02_tissue_of_origin_prediction/04_deconvolution_with_celfie \
16 \
1000 \
20 \
1 \
0.001 \
100
writing to /data/NHLBI_BCB/Sean_MethylSeq/14_MKJ5249/02_methylseq_analysis_pipeline/02_tissue_of_origin_prediction/04_deconvolution_with_celfie/
finshed reading /data/NHLBI_BCB/Sean_MethylSeq/14_MKJ5249/02_methylseq_analysis_pipeline/02_tissue_of_origin_prediction/04_deconvolution_with_celfie/covid19_samples_reference_file_tims.txt

beginning generation of /data/NHLBI_BCB/Sean_MethylSeq/14_MKJ5249/02_methylseq_analysis_pipeline/02_tissue_of_origin_prediction/04_deconvolution_with_celfie/1_alpha.pkl

/data/NHLBI_BCB/Sean_MethylSeq/10-tissue_of_origin_methylation_project/celfie/EM/em.py:159: RuntimeWarning: invalid value encountered in true_divide
add_pseduocounts(1, np.nan_to_num(y/y_depths), y, y_depths)
/data/NHLBI_BCB/Sean_MethylSeq/10-tissue_of_origin_methylation_project/celfie/EM/em.py:160: RuntimeWarning: invalid value encountered in true_divide
add_pseduocounts(0, np.nan_to_num(y/y_depths), y, y_depths)
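
For context on the warning above: NumPy reports "invalid value encountered" for a 0/0 division, which is consistent with the guess that some sites have no coverage, and `np.nan_to_num` then turns the resulting NaN into 0. The following is a minimal sketch, not part of CelFiE, of how one might count zero-depth sites and compute the same fractions without the warning. The helper names (`safe_methylation_fraction`, `zero_depth_summary`) and the toy arrays are hypothetical, and the arrays are assumed to be laid out as samples by sites.

```python
import numpy as np

def safe_methylation_fraction(y, y_depths):
    """Return y / y_depths, with 0 where depth is 0, without a RuntimeWarning."""
    y = np.asarray(y, dtype=float)
    y_depths = np.asarray(y_depths, dtype=float)
    out = np.zeros_like(y)
    # Only divide where there is coverage; other entries keep the 0 from `out`.
    np.divide(y, y_depths, out=out, where=y_depths > 0)
    return out

def zero_depth_summary(y_depths):
    """Count, for each sample (row), how many sites have zero read depth."""
    return (np.asarray(y_depths) == 0).sum(axis=1)

# Toy data: 2 samples x 3 sites (methylated counts and total depths).
y = [[3, 0, 5], [0, 0, 0]]
y_depths = [[10, 0, 5], [4, 0, 0]]

print(safe_methylation_fraction(y, y_depths))
print(zero_depth_summary(y_depths))
```

If the per-sample counts of zero-depth sites are large relative to the number of sites in the input, low coverage is a plausible contributor to unstable estimates, although this check alone cannot confirm the cause of the placenta result.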
