Giter VIP home page Giter VIP logo

asset's Issues

ast_10x fails

Hi,

I got stuck with the 10x steps of the Asset pipeline. All other parts of the pipeline (i.e. identifying gaps and ends, PacBio support, HiC support) work well.

However, when I get to the
$asset/bin/ast_10x -x gaps.bed $prefix.bam > 10x.bed
step, I get the following error message:

Program starts
[M::aa_10x] processing bam file
[M::proc_bam] finish processing 2121039345 bams
[M::proc_bam] read pair counter: 1000860477
[W::aa_10x] none useful information, quit

As I have a (admittingly unnecessary) large number of 10x reads, my $prefix.bam file is 260GB large. Also, what I saw is that in the gaps.bed file, all scaffolds have a gap at the first (0 0) and last position. Is that normal?

Best,
Oliver

10X Chromium pipeline

Hi Dengfeng,

I am trying to evaluate a 10X Chromium assembly done with SUPERNOVA.

I can get the gap.bed file, but the following never finished:
#####################################################

bwa index Andean12nov19_SUPERNOVA_output_pseudohap2.1.fasta
while read -r r1 r2
do
prefix=basename $r1 .fastq.gz
/home/sgod/GenomicsPrograms/asset/bin/10x -c -p $prefix $r1 $r2 # generate trimmed read files $prefix_{1,2}.fq.gz
bwa mem -t12 Andean12nov19_SUPERNOVA_output_pseudohap2.1.fasta $prefix_1.fq.gz $prefix_2.fq.gz | samtools view -b - > $prefix.bam
done < $10xlist

/home/sgod/GenomicsPrograms/asset/bin/ast_10x gaps.bed $bam1 $bam2 $bam3 ... >10x.bed 2>ast_10x.log

#####################################################
My 10Xlist is a file with the directory and 4 fastq files from the original 10X sequencing (R1 and R2 X 2 lanes)

Is there something simple I am missing in the syntax?

Thank you for your input!
Sher

Output of ast_hic, Unexpected Truncation

Hi Dengfeng, @dfguan

I want to evaluate an assembly (not scaffolded yet) with a HiC data. It seems that using ast_hic I should be able to find the reliable blocks. I used the default parameters for ast_hic

Usage: ast_hic [options] <GAP_BED> <BAM_FILEs>
Options:
         -c    INT      minimum coverage [7]
         -C    INT      maximum coverage [inf]
         -q    INT      minimum alignment quality [0]
         -L    INT      maximum insertion length, gap excluded [15000]
         -h             help

So I expected a block to be present in the output BED file if at least 7 paired reads are spanning over it and those reads are having an insertion length less than 15Kb. I tried to investigate the output BED file for some regions manually. I found some regions that I couldn't explain why they are not present in the BED file.

Here is what I did:

detgaps asm.fa > gaps.bed
ast_hic gaps.bed hic.bam > hic.bed

The assembly and the bam file are attached HiC_asset.zip and you can reproduce the BED file. ( I reduced the whole assembly and bam file to just one contig for simplicity )

The first lines of the BED file is as below:

track name="HC" description="hic data"
h1tg000188l	302	205890
h1tg000188l	206041	206104
h1tg000188l	206177	206215
h1tg000188l	206234	215904

As you can see the locus 205900 is not included so there should be a problem with this locus. But It seems that the reads spanning 205880 (included) and 205900 (not included) are exactly the same.

To find the spanning reads I ran the line below once for pos=205900 and once for pos=205880 :
samtools view hic.bam | awk '{if (($4 < pos) && ($8 > pos)) print $0 }'

For either of them I got the same 7 paired reads (all of them are having insertion lengths less than 15Kb):

A00675:75:H5CNNDSXY:1:1166:22761:14544	81	h1tg000188l	192537	19	10S128M13S	=	206017	13354	AATATCAAAGTGTAATCCCAGCACTTTTGGAGGCCAAGGCAAGAGGATCACAAGGTGAGGAGATCAAGACCATCCTGGCCAATACAGTGAAACGCTGTCACTATTAAAAATCCAAAAAATTAGCTAGGCATGGTGGCAAGGTTCGTCCATC	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF::,FFFFFFFFFFFFFFFFF:FFFFFFFFFFFF	NM:i:0	MD:Z:128	MC:Z:39S112M	AS:i:128	XS:i:117	XA:Z:h1tg000132l,-2800693,10S128M13S,1;h1tg000188l,-185735,10S128M13S,1;h1tg000132l,-2807711,10S128M13S,1;h1tg000144l,-2327679,10S128M13S,2;
A00675:75:H5CNNDSXY:2:1308:13982:24298	145	h1tg000188l	197070	0	50M4I1M4D92M3S	=	206108	8893	TACCCAAAATATGTATAATATACTGTACATAAATTATGGAAGTACCCAAAGATTCATAATCAACTGTACATAAAATAACAAAGTACTCAAACTATATATTCTATACAGTACATAAAATATCAAAGTACCCATACTGTATATTATATACTG	FFFFFFFFFF,FFFFFFFFF:FFFFFFFFFFFFF:F,FFFFFFFFFF::FF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	NM:i:12	MD:Z:38A12^TTTT5A17T52A15	MC:Z:48S66M34S	AS:i:79	XS:i:78
A00675:75:H5CNNDSXY:2:1645:30481:33207	2177	h1tg000188l	199232	0	94H57M	=	206104	6873	CAAAATATGTATAATATACTGTACATAAATTATGAAAGTAACCAAACATTTATAATA	,FFF:,FF,,F,FFF:FFFF,FFFFFFFFF:F:FFFF,FF,FFFFF,:F,:FFFF:F	NM:i:1	MD:Z:40C16	MC:Z:5H88M58H	AS:i:46	XS:i:46	SA:Z:h1tg000006l,78306568,+,71M80S,49,1;
A00675:75:H5CNNDSXY:2:2671:9109:8876	2129	h1tg000188l	200829	0	106H32M13H	=	211484	10625	ACTGTACATAATATATTAAAGTCGCCTAAATA	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	NM:i:0	MD:Z:32	MC:Z:151M	AS:i:32	XS:i:26	SA:Z:h1tg000157l,793787,+,95S56M,0,0;h1tg000188l,235190,-,55S62M34S,0,1;
A00675:75:H5CNNDSXY:3:2152:14000:15969	113	h1tg000188l	202107	60	82M69S	=	211218	9167	TATGTATTATATATTGTACATAAAATATCAAACTATGCCAAATATATATTATATACTGTACATAAAATATCAAATTACCCAATATGAAGCTTTTATTAAAAATTCTAGGAAGTGAATACTAATCTATAGTGAAAAAGCAGACTAGGAAGAC	FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,F:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFF	NM:i:0	MD:Z:82	MC:Z:14S137M	AS:i:82	XS:i:54	SA:Z:h1tg000144l,5760537,+,70M81S,0,0;
A00675:75:H5CNNDSXY:4:1214:15637:20964	2161	h1tg000188l	203773	0	30M118H	=	207353	3674	TACATAAAATATCAAAGTACCCAAACTTCA	FFFFFF,FFFFFFFFFFFFFFFFFFFF:FF	NM:i:0MD:Z:30	MC:Z:28H100M1D1M1I21M	AS:i:30	XS:i:28	SA:Z:h1tg000188l,207450,+,89M59S,0,0;
A00675:75:H5CNNDSXY:4:1618:3486:17018	161	h1tg000188l	204379	43	151M	=	208464	4213	TCAAATATATATATTATTCTGTACATAAAATATCACATTACACCATATATATATTATATACTGTACATAAAATATCAAAGTACCCAAAATATGTATAATATACTGTATATAAATTATGAAAGTACCCAAACCTTTATAATAAAGTGTACAT	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFF:FFF:FF:FFFF:F	NM:i:0	MD:Z:151	MC:Z:23S128M	AS:i:151	XS:i:131	XA:Z:h1tg000188l,+199979,131M20S,0;h1tg000132l,+2822475,4S147M,2;

I also found these lines in HC.cov.base indicating a reduction in the coverage (from 7 to 6) at 205890. But I couldn't figure out why? Could you please explain what's happening at 205890 that makes ast_hic truncate the BED file there?

205863	205890	7
205891	205962	6

Thanks in advance,
Mobin

Add descriptions of expected output in .bed files

Hello,

I'm using your tool to evaluate assemblies produced with PacBio HiFi reads, 10x reads, and Omni-C reads.

Can you provide descriptions of the expected output columns in the various bed files?

My final punchlist bed file has only "C"s present in the final column, which I assume means "conflict". I don't have any "S", which I'm assuming means "supported".

If I knew what the upstream file fields are, I can possibly troubleshoot there.

Thanks!
Kim

Having only two sources of data

Hi there,
I generated an assembly using hifi reads along with hi-c data but there were no additional data e.g. 10x or bn. The assembly is of high quality with N50 of 220 Mb and is free of gaps. I tried the following codes to check mis-joins and mis-assemblies but the pchlst_scaf.bed is empty. Accordingly, the output is not reasonable and there are many C in the final results.

./bin/acc gaps.bed pb.bed > pb_acc.bed
./bin/pchlst -c gaps.bed pb_acc.bed > pchlst_ctg.bed

./bin/acc gaps.bed pb.bed hic2.bed > hic_acc.bed
./bin/pchlst -c gaps.bed hic_acc.bed > pchlst_scaf.bed

./bin/union_brks gaps.bed pchlst_{ctg,scaf}.bed > pchlst_final.bed

Can you please see the codes and let me know if there is any mistake.
I have 10x reads from another individual. This individual is related to main sample that was used for assembly. I wonder if it is possible to use sequences from another sample to asses the genome assembly.
Thank you

zlib.h: No such file or directory

I got a error when I have runned make,below this error.

gcc -g -Wall -D VERBOSE -O2 -D PRINT_COVERAGE -c -o paf.o paf.c
paf.c:1:10: fatal error: zlib.h: No such file or directory
#include <zlib.h>
^~~~~~~~
compilation terminated.
make: *** [paf.o] Error 1

my zlib.h path is /usr/include/zlib.h,but i dont know how to fix this problem,i hope you can help me ,thank you!!

ONT and HiC only

Hi Dengfeng,

My output with pchlst_final.bed file looks strange. Could you take a look at my workflow to ensure I have run it correctly? I've also attached the outputs

MY WORKFLOW WITH ONT AND HiC data only:

/nesi/nobackup/uoa02642/Rewa_Assemblies/asset/bin/detgaps gapclosed.fasta > gaps.bed
/nesi/nobackup/uoa02642/Rewa_Assemblies/asset/bin/split_fa gapclosed.fasta > split.fa
samtools faidx split.fa
bwa index split.fa
bwa mem -SP -B10 -t 18 split.fa /nesi/project/landcare02730/HiC_Data/Rewa_trimgalore_output/Rewa_trimgalore_adaptortrim/Rewarewa_HiC_R1_trimmed_val_1.fq /nesi/project/landcare02730/HiC_Data/Rewa_trimgalore_output/Rewa_trimgalore_adaptortrim/Rewarewa_HiC_R2_trimmed_val_2.fq | samtools view -b - > asset_aln.bam
/nesi/nobackup/uoa02642/Rewa_Assemblies/asset/bin/col_conts asset_aln.bam > links.mat
/nesi/nobackup/uoa02642/Rewa_Assemblies/asset/bin/ast_hic2 split.fa.fai links.mat > hic2.bed 2>ast_hic.log

minimap2 -x map-ont -t 18 *.fasta /nesi/nobackup/uoa02642/Rewa_Assemblies/rewarewa_rebasecalled.fasta > ont.paf
/nesi/nobackup/uoa02642/Rewa_Assemblies/purge_dups/src/pbcstat ont.paf
/nesi/nobackup/uoa02642/Rewa_Assemblies/purge_dups/src/calcuts PB.stat > ont.cutoffs 2>calcults.log
/nesi/nobackup/uoa02642/Rewa_Assemblies/asset/bin/ast_pb -m 5 -M 252 ont.paf > ont.bed
/nesi/nobackup/uoa02642/Rewa_Assemblies/asset/bin/acc gaps.bed hic2.bed ont.bed > hic2_ont.bed
/nesi/nobackup/uoa02642/Rewa_Assemblies/asset/bin/pchlst gaps.bed hic2_ont.bed > pchlst_scaf.bed
/nesi/nobackup/uoa02642/Rewa_Assemblies/asset/bin/pchlst -c gaps.bed hic2_ont.bed > pchlst_ctg.bed
/nesi/nobackup/uoa02642/Rewa_Assemblies/asset/bin/union_brks gaps.bed pchlst_scaf.bed pchlst_ctg.bed > pchlst_final.bed

Thanks for all the help and patience with this. Very much appreciated.

A
Re__Additional_query.zip

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.