dinhdiep / monod2 Goto Github PK

View Code? Open in Web Editor NEW

35.0 35.0 31.0 1.49 GB

MONOD2 is a toolkit for methylation haplotype analysis of bisulfite sequencing data

Shell 12.52% Perl 51.08% R 36.40%

methyl-seq

monod2's People

Stargazers

Watchers

monod2's Issues

Interpreting methylation haplotypes

Hi Dinh,

I'm running your tool to extract methylation haplotypes from bam files, and I'm currently comparing the reads displayed in IGV with the methylation haplotypes generated from your tool. I'm seeing discrepancies, and I'm wondering if you can help me interpret the results.

My bam was aligned by bsmap, and the following command was used to generate the haplotype file:
sh scripts/bam2cghap_v1.sh WGBS ng.3805/MHBS.txt /allcpg/hg19.fa.allcpgs.txt.gz test.bam test

I compared the output with IGV:

The screenshot is from IGV's bisulfite CG mode, in which everything in blue are bases that are not methylated at CG motifs, referring to T's in your haplotype block output. If a base is methylated, it would be colored red, referring to C's in your haplotype block output. I have added the genomic coordinates right below the reads in IGV, and below the screenshot is the output from bam2cghap_v1.sh

Something isn't matching up: we don't see any reads with C's in IGV, and the T's doesn't account for all the reads in the region in IGV. Not sure if you have use IGV as a gold standard to compare your methylation haplotypes. Could the aligner (bsmap) be contributing to this?

A second example, from your data:
./scripts/make-mappable-bins.sh BAMfiles/CCT.bam 10
sh scripts/bam2cghap_v1.sh WGBS CCT.RD10_80up.genomecov.bed allcpg/hg19.fa.allcpgs.txt.gz BAMfiles/CCT.bam CCT

And here's the haplotype file in the region:

chr11:377279-377364 CCCCCCCCCCCCCC 1 377282,377288,377304,377312,377314,377316,377321,377324,377336,377340,377344,377346,377348,377356
chr11:377279-377364 CCCCCCCCCCTCCC 1 377282,377288,377304,377312,377314,377316,377321,377324,377336,377340,377344,377346,377348,377356
chr11:377279-377364 CCCCCCCCTTCCCC 1 377282,377288,377304,377312,377314,377316,377321,377324,377336,377340,377344,377346,377348,377356

Again, not all the reads in IGV account for the haplotype file, and I have checked that none of the reads here are PCR duplicates.

Would be interested to hear what you think about this.

Thanks,
Chris

hapInfo2mld_block

Extracting methylation haplotypes from sequence alignment files

Hi dinhdiep,

I'm very interested in your work and want to conduct methylation haplotype analysis on my bisulfite sequencing data. I am having trouble generating CpG haplotype files using your code. When I try the test example:
./scripts/bam2cghap.sh allcpg/cpg.small.txt.gz BAMfiles/Colon_primary_tumor_sept9_promoter.bam test

I get this error for many lines:
Use of uninitialized value $ref_seq in concatenation (.) or string at bin/getHaplo_PE_cgOnly.pl line 161, line 28.

Could you help me to get this to work? I would really appreciate it.

getHaplo_PE_cgOnly.pl

Hello,
Is there a quick way to fit the getHaplo_PE_cgOnly.pl into single end read data?
Thanks,

mapping

could I do alignment with bismark ?
and use it`s BAM file do methylation haplotype analysis

/bin/cgHap2MHBs_parallel.pl

Hi dinhdiep,
I want to use your method to generate methylation haplotype blocks.
When running your script for Identifying the methylation haplotype blocks:
/bin/cgHap2MHBs_parallel.pl [hapInfo file] [min R2] [snpBed file] [outPrefix] ，what's type file "[snpBed file]" should be given ? It is the files in "data" folder named snp138_part_.gz ? I have tried used those snp138_part_.gz files, but couldn't got the expected results , what's the potential problem ?Can you help me address this problem? Thank you very much!
Best wishes!

Missing allcpg directory in the repository

Hi dinhdiep,
I am very interested in your great work and want to use your method to generate MHL for liver cancer. When running your script I found there are some missing in the examples. For example, in this command:

./scripts/bam2cghap.sh allcpg/cpg.small.txt.gz BAMfiles/Colon_primary_tumor_sept9_promoter.bam test

I can not find a allcpg directory in your repository. With out the file, I can't figure out the format and generate my own file that is compatible with the command.

And in this command,

./scripts/cghap2mhbs.sh results/HaploInfo/chr22.sub.hapInfo.txt example/N37_10_tissue_pooled.autosomes.RD10_80up.genomecov.bed 0.3 chr22

The results/HaploInfo/chr22.sub.hapInfo.txt is missing. I assume that this file is generated from one step of your script.

Can you help me address this problem? Thanks!

whether methylation haplotype load mesure applies to sequencing data where each haplotype within a specific region have different start and end

Hi Dinh,

I would like to ask you a question.I have bisulfite sequencing data from cfDNA captrued with target probe.Therefore,for a specific region,the haplotypes may have different start and end as follows:

I wonder whether methylation haplotype load mesure still applies to this kind of data set.Many thanks.

Jinsheng

How to get data/ng.3805/MHBS.txt ?

In you data folder, the data/ng.3805/MHBS.txt is a bed file, which is your target region. How to get that region? Thanks.

requirement for hapInfo_maskSNPs2mld_block ?

Hello:
What is the type of SNP file, 1-based ? come from WGS data or BS-seq data ?
And how to do quality control for that SNP data, such as 4x depth and so on.

Issue with readme and problem of setting working directory

Hi @dinhdiep

Thanks for sharing this nice software to the community.

I have reviewed the README file and found that there is a wrong instruction at Identifying the methylation haplotype blocks

cghap2mhbs.sh [haplotype file] [target bed] [minimum LD R2 cutoff] [output name prefix]

This should be changed into cghap2mhbs.sh [haplotype file] [target bed] [minimum LD R2 cutoff] [snp files] [output name prefix]

The source code scripts/cghap2mhbs.sh says

if [ -z "$4" ]
then
        echo "usage: $0 [haplotype file] [target bed] [minimum LD R2 cutoff] [snp file] [output name prefix]"
        exit 0
fi

In addition to above readme suggestion, there are problems regarding to global or local PATH settings in some scripts or code.

For example, the perl script bin/cgHap2MHBs_parallel.pl wants to run bin/hapInfo_maskSNPs2mld_block.pl as shown below

sub run_process{
#  my $hap_info_file = 
#  my $bin_file = 
#  my $min_r2 = 
#  my $cg_snp_bed_file =
  my $file = shift;
  my $cmd = "bin/hapInfo_maskSNPs2mld_block.pl hapinfo_$file.tmp $minR2 $snpBedFile > $outNamePrefix.$file";
  system($cmd);
  unlink("hapinfo_$file.tmp");
}

But when I execute bin/cgHap2MHBs_parallel.pl perl cannot find bin/hapInfo_maskSNPs2mld_block.pl.

Below is how my system complains about above problem.

sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found
sh: 1: bin/hapInfo_maskSNPs2mld_block.pl: not found

This happens not only this perl script but some other shell scripts.

It would be great if you make a comment about path settings or fix those issue.

Thanks for reading this long comment.