dinhdiep / monod2 Goto Github PK
View Code? Open in Web Editor NEWMONOD2 is a toolkit for methylation haplotype analysis of bisulfite sequencing data
MONOD2 is a toolkit for methylation haplotype analysis of bisulfite sequencing data
Hi Dinh,
I'm running your tool to extract methylation haplotypes from bam files, and I'm currently comparing the reads displayed in IGV with the methylation haplotypes generated from your tool. I'm seeing discrepancies, and I'm wondering if you can help me interpret the results.
My bam was aligned by bsmap, and the following command was used to generate the haplotype file:
sh scripts/bam2cghap_v1.sh WGBS ng.3805/MHBS.txt /allcpg/hg19.fa.allcpgs.txt.gz test.bam test
I compared the output with IGV:
The screenshot is from IGV's bisulfite CG mode, in which everything in blue are bases that are not methylated at CG motifs, referring to T's in your haplotype block output. If a base is methylated, it would be colored red, referring to C's in your haplotype block output. I have added the genomic coordinates right below the reads in IGV, and below the screenshot is the output from bam2cghap_v1.sh
Something isn't matching up: we don't see any reads with C's in IGV, and the T's doesn't account for all the reads in the region in IGV. Not sure if you have use IGV as a gold standard to compare your methylation haplotypes. Could the aligner (bsmap) be contributing to this?
A second example, from your data:
./scripts/make-mappable-bins.sh BAMfiles/CCT.bam 10
sh scripts/bam2cghap_v1.sh WGBS CCT.RD10_80up.genomecov.bed allcpg/hg19.fa.allcpgs.txt.gz BAMfiles/CCT.bam CCT
And here's the haplotype file in the region:
chr11:377279-377364 CCCCCCCCCCCCCC 1 377282,377288,377304,377312,377314,377316,377321,377324,377336,377340,377344,377346,377348,377356
chr11:377279-377364 CCCCCCCCCCTCCC 1 377282,377288,377304,377312,377314,377316,377321,377324,377336,377340,377344,377346,377348,377356
chr11:377279-377364 CCCCCCCCTTCCCC 1 377282,377288,377304,377312,377314,377316,377321,377324,377336,377340,377344,377346,377348,377356
Again, not all the reads in IGV account for the haplotype file, and I have checked that none of the reads here are PCR duplicates.
Would be interested to hear what you think about this.
Thanks,
Chris
Hi dinhdiep,
I'm very interested in your work and want to conduct methylation haplotype analysis on my bisulfite sequencing data. I am having trouble generating CpG haplotype files using your code. When I try the test example:
./scripts/bam2cghap.sh allcpg/cpg.small.txt.gz BAMfiles/Colon_primary_tumor_sept9_promoter.bam test
I get this error for many lines:
Use of uninitialized value $ref_seq in concatenation (.) or string at bin/getHaplo_PE_cgOnly.pl line 161, line 28.
Could you help me to get this to work? I would really appreciate it.
Hello,
Is there a quick way to fit the getHaplo_PE_cgOnly.pl into single end read data?
Thanks,
could I do alignment with bismark ?
and use it`s BAM file do methylation haplotype analysis
Hi dinhdiep,
I want to use your method to generate methylation haplotype blocks.
When running your script for Identifying the methylation haplotype blocks:
/bin/cgHap2MHBs_parallel.pl [hapInfo file] [min R2] [snpBed file] [outPrefix] ,what's type file "[snpBed file]" should be given ? It is the files in "data" folder named snp138_part_.gz ? I have tried used those snp138_part_.gz files, but couldn't got the expected results , what's the potential problem ?Can you help me address this problem? Thank you very much!
Best wishes!
Hi dinhdiep,
I am very interested in your great work and want to use your method to generate MHL for liver cancer. When running your script I found there are some missing in the examples. For example, in this command:
./scripts/bam2cghap.sh allcpg/cpg.small.txt.gz BAMfiles/Colon_primary_tumor_sept9_promoter.bam test
I can not find a allcpg directory in your repository. With out the file, I can't figure out the format and generate my own file that is compatible with the command.
And in this command,
./scripts/cghap2mhbs.sh results/HaploInfo/chr22.sub.hapInfo.txt example/N37_10_tissue_pooled.autosomes.RD10_80up.genomecov.bed 0.3 chr22
The results/HaploInfo/chr22.sub.hapInfo.txt is missing. I assume that this file is generated from one step of your script.
Can you help me address this problem? Thanks!
Hi Dinh,
I wonder whether methylation haplotype load mesure still applies to this kind of data set.Many thanks.
Jinsheng
In you data folder, the data/ng.3805/MHBS.txt is a bed file, which is your target region. How to get that region? Thanks.
Hello:
What is the type of SNP file, 1-based ? come from WGS data or BS-seq data ?
And how to do quality control for that SNP data, such as 4x depth and so on.
Hi @dinhdiep
Thanks for sharing this nice software to the community.
I have reviewed the README file and found that there is a wrong instruction at Identifying the methylation haplotype blocks
cghap2mhbs.sh [haplotype file] [target bed] [minimum LD R2 cutoff] [output name prefix]
This should be changed into cghap2mhbs.sh [haplotype file] [target bed] [minimum LD R2 cutoff] [snp files] [output name prefix]
The source code scripts/cghap2mhbs.sh
says
if [ -z "$4" ]
then
echo "usage: $0 [haplotype file] [target bed] [minimum LD R2 cutoff] [snp file] [output name prefix]"
exit 0
fi
In addition to above readme suggestion, there are problems regarding to global or local PATH
settings in some scripts or code.
For example, the perl script bin/cgHap2MHBs_parallel.pl
wants to run bin/hapInfo_maskSNPs2mld_block.pl
as shown below
sub run_process{
# my $hap_info_file =
# my $bin_file =
# my $min_r2 =
# my $cg_snp_bed_file =
my $file = shift;
my $cmd = "bin/hapInfo_maskSNPs2mld_block.pl hapinfo_$file.tmp $minR2 $snpBedFile > $outNamePrefix.$file";
system($cmd);
unlink("hapinfo_$file.tmp");
}
But when I execute bin/cgHap2MHBs_parallel.pl
perl cannot find bin/hapInfo_maskSNPs2mld_block.pl
.
Below is how my system complains about above problem.
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
This happens not only this perl script but some other shell scripts.
It would be great if you make a comment about path settings or fix those issue.
Thanks for reading this long comment.
"require(gplot2)" contain a typo and it should be "require(ggplot2)", which has already been listed. So, "require(gplot2)" should be removed.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.