vgp / vgp-assembly Goto Github PK

VGP repository for the genome assembly working group

License: Other

Languages: Python 29.58%, Shell 44.17%, Java 3.12%, Perl 1.33%, R 11.67%, C 0.19%, Makefile 1.38%, Dockerfile 2.52%, WDL 4.88%, LiveScript 1.16%

vgp-assembly's Issues

Update bionano scaffold outputs to be gzipped

The Bionano scaffold apps don't produce .fasta.gz files; this would be good to include in the next update.
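Until the apps write compressed output themselves, the .fasta.gz files could be produced as a post-processing step. Below is a minimal standard-library Python sketch. The function name and the choice to write the .gz next to the original are assumptions, not part of the Bionano apps:

```python
import gzip
import shutil
from pathlib import Path

def gzip_fasta(src: str) -> Path:
    """Write a gzip-compressed copy of `src` next to it as <src>.gz (hypothetical helper)."""
    src_path = Path(src)
    dst_path = src_path.with_name(src_path.name + ".gz")
    # Stream in chunks so large scaffold files are never loaded fully into memory.
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path
```

For example, `gzip_fasta("scaffolds.fasta")` would leave `scaffolds.fasta` in place and create `scaffolds.fasta.gz` beside it.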

asset Bionano problem

In my project and in the human CHM13 v1.0 assembly, I used asset to evaluate assembly structure, but both show a problem with Bionano low-support regions.

In the CHM13 v1.0 assembly, the Bionano low-support region marked by the red square covers the whole chromosome. I used your recommended parameter, which requires regions to have a molecule coverage of at least 10 or 0.5x the average molecule coverage. I don't know why this happens; I tried changing the factor to 0.3, 0.2 and 0.1, but it didn't help. Can you help me?
The same phenomenon appears in my project, so I think this is a common problem.
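The rule quoted above can be read in more than one way. Below is a minimal sketch, assuming "or" means a region passes if it meets either bound, so the effective cutoff is the lower of 10 and 0.5x the average. The window tuples and function names are hypothetical, and this is not asset's own code:

```python
def min_support(avg_cov: float, floor: float = 10.0, frac: float = 0.5) -> float:
    """Effective cutoff if a region passes by meeting EITHER bound (assumed reading)."""
    return min(floor, frac * avg_cov)

def low_support_windows(windows, avg_cov, floor=10.0, frac=0.5):
    """Return (start, end) of windows whose molecule coverage is below the cutoff.

    `windows` is a hypothetical list of (start, end, molecule_coverage) tuples.
    """
    cutoff = min_support(avg_cov, floor, frac)
    return [(start, end) for start, end, cov in windows if cov < cutoff]

windows = [(0, 100_000, 25), (100_000, 200_000, 4), (200_000, 300_000, 12)]
print(low_support_windows(windows, avg_cov=30))            # [(100000, 200000)]
print(low_support_windows(windows, avg_cov=30, frac=0.1))  # []
```

Under this reading, lowering the factor can only lower the cutoff. A chromosome that remains flagged end to end at 0.1 would therefore point to near-zero molecule coverage being counted there. That is an inference from the sketch, not a diagnosis.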

mitoVGP crashes on start

Hi,
When running mitoVGP for Nanopore data, I get this error:
scripts/mtDNApipe: 3: set: Illegal option -o pipefail
scripts/blastMT: 3: set: Illegal option -o pipefail

This happens even when running the program with the test data:
./mitoVGP -a ONT -s Mastacembelus_armatus -i fMasArm1 -r mtDNA_Mastacembelus_armatus.fasta -t 24 -b variantCaller

Do you know what might be wrong?
Thanks,
Manuel

telomere prediction

Hi,

I wanted to run your telomere search pipeline, but I can't get it to work. I cloned the GitHub repository and tried running telomere_analysis.sh, but I keep getting an error that the FASTA file is not found, even though I am giving the correct path. I edited the script to add the path for VGP_PIPELINE, assuming that was the cause, but it probably isn't. I'm not sure whether what I'm doing is correct, and I would really appreciate guidance on how to run this program and get the results.

Thank you so much,

Regards,
Amit

Problem installing conda environment

Hi,

So I am trying to set up a new environment:
conda env create -f mitoVGP_conda_env_pacbio.yml

And I am getting this error:

ResolvePackageNotFound:
- r-base==3.2.2=0

Has anybody had a similar issue?
Thanks in advance!
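The failing spec pins r-base to an exact version and build string (`=0`). If that build is no longer available on the configured channels (an assumption about the cause), one common workaround is to keep the name and version but drop the build field. Below is a minimal sketch, assuming the environment file uses standard `  - name=version=build` dependency lines. Relaxing pins this way can still lead to solver conflicts:

```python
import re

# A dependency line such as "  - r-base=3.2.2=0" or "  - gcc_impl_linux-64==7.3.0=habb00fd_2":
# group 1 = list prefix, group 2 = name + "=" or "==" + version; the trailing "=build" is dropped.
PINNED_BUILD = re.compile(r"^(\s*-\s*)([A-Za-z0-9_.\-]+={1,2}[^=\s]+)=\S+\s*$")

def strip_build_strings(yml_text: str) -> str:
    """Remove the build-string field from pinned conda dependencies, keeping name and version."""
    lines = []
    for line in yml_text.splitlines():
        m = PINNED_BUILD.match(line)
        lines.append(m.group(1) + m.group(2) if m else line)
    trailing = "\n" if yml_text.endswith("\n") else ""
    return "\n".join(lines) + trailing
```

Lines without a build string, such as `  - python=3.6`, and non-dependency lines pass through unchanged.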

Sex chromosome assembly

In your article you only briefly introduce the method for sex chromosome assembly. Could you provide more detailed information or a workflow for sex chromosome assembly? Thank you for your help!

mitoVGP trimmer failing

Hello!

I'm looking for help with the trimmer module of the mitoVGP pipeline. The first trimmer script does not complete, and as far as I can tell no error is generated. MUMmer appears to run successfully twice, but the expected ${FNAME}_polish2_10x1_trim1.fasta is never generated, so map10x2 cannot proceed. Any suggestions on how to resolve this?

Here's the relevant standard output:

BEGIN1: 5
BEGIN2: 18010
END1: 15213

1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "Pectinophora_gossypiella_2/Pgos/assembly_MT_rockefeller/intermediates/trimmed/Pgos.tig00000001_polish2_10x1.ntref" of length 18007
# construct suffix tree for sequence of length 18007
# (maximum reference length is 536870908)
# (maximum query length is 4294967295)
# CONSTRUCTIONTIME /home/amanda.stahlke/.conda/envs/mitoVGP_pacbio/opt/mummer-3.23/mummer Pectinophora_gossypiella_2/Pgos/assembly_MT_rockefeller/intermediates/trimmed/Pgos.tig00000001_polish2_10x1.ntref 0.00
# reading input file "/90daydata/project/ag100pest/Pgos/RawData/MT_Contig/Pectinophora_gossypiella_2/Pgos/assembly_MT_rockefeller/intermediates/trimmed/Pgos.tig00000001_polish2_10x1_new.fasta" of length 18006
# matching query-file "/90daydata/project/ag100pest/Pgos/RawData/MT_Contig/Pectinophora_gossypiella_2/Pgos/assembly_MT_rockefeller/intermediates/trimmed/Pgos.tig00000001_polish2_10x1_new.fasta"
# against subject-file "Pectinophora_gossypiella_2/Pgos/assembly_MT_rockefeller/intermediates/trimmed/Pgos.tig00000001_polish2_10x1.ntref"
# COMPLETETIME /home/amanda.stahlke/.conda/envs/mitoVGP_pacbio/opt/mummer-3.23/mummer Pectinophora_gossypiella_2/Pgos/assembly_MT_rockefeller/intermediates/trimmed/Pgos.tig00000001_polish2_10x1.ntref 0.01
# SPACE /home/amanda.stahlke/.conda/envs/mitoVGP_pacbio/opt/mummer-3.23/mummer Pectinophora_gossypiella_2/Pgos/assembly_MT_rockefeller/intermediates/trimmed/Pgos.tig00000001_polish2_10x1.ntref 0.03
4: FINISHING DATA


++++ running: map10x2 ++++

Species: -s Pectinophora_gossypiella_2

Species ID: -i Pgos

Contig number: -n tig00000001

Number of threads: -t 30

Working directory: Pectinophora_gossypiella_2/Pgos/assembly_MT_rockefeller/intermediates


--Generate sorted alignment:

Align...
Error: could not open Pectinophora_gossypiella_2/Pgos/assembly_MT_rockefeller/intermediates/trimmed/Pgos.tig00000001_polish2_10x1_trim1.fasta

I've attached a couple of other log files here; I'm not sure which is most helpful. Thanks in advance for any ideas. I've also posted this issue at gf777/mitoVGP#1.

Pgos_mtDNApipe_20210624-174115.log
Pgos_trimmer_20210624-222455.log
R-mtpipe.6056486.err.log
R-mtpipe.6056486.out.log

Amanda
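In the log above, the trimmer reaches "4: FINISHING DATA" without an error, and the missing file only surfaces when map10x2 tries to open it. A small guard after each step would report the failure at the step that should have produced the file. The function name and its placement are hypothetical, and it does not explain why trim1.fasta was not written:

```python
from pathlib import Path

def require_outputs(*paths: str) -> None:
    """Raise if any expected step output is missing or empty (hypothetical guard)."""
    missing = [p for p in paths if not Path(p).is_file() or Path(p).stat().st_size == 0]
    if missing:
        raise FileNotFoundError("step produced no output: " + ", ".join(missing))

# Hypothetical use right after the first trimmer pass, mirroring the file name in the log:
# require_outputs(f"{fname}_polish2_10x1_trim1.fasta")
```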

Chromosome status estimation

Hello, in your article you use Hi-C interaction signals to estimate the proportion of the assembly at chromosome status, but I don't know how to calculate this value. Thank you for your help.

plant mitochondrial genomes?

Hi,

I am wondering how well mitoVGP would work for assembling plant mitochondrial genomes. Plant mitogenomes are very long, non-conserved and often have multiple chromosomal structures. Thanks.

Possible missed documentation -- haplotype merging step

Hello,

Thank you for your fantastic pipeline and resources. I have been following many of the VGP (and mitoVGP) workflows to generate my own trio-binned assembly, and our assembly wouldn't be nearly as high quality without you.

I am hoping to submit my genome in a similar manner to the Zebra Finch (https://www.ncbi.nlm.nih.gov/assembly/GCA_008822105.2), where there is a consensus genome from the haplotypes and a separate submission of each parental genome. That said, I can't find how you completed the haplotype merging step in any of your papers or documentation. I've seen GATK and vcf-consensus as possible options, but I would love to know what you did specifically. Similarly, in your final merged reference, did you ever use each haplotype to fill remaining gaps in the other?

Thanks again,
Dustin

mitoVGP crashes on start

Crashes on start:

$ ./mitoVGP -a pacbio -s Rana_sylvatica -i aRanSyl -r NC_027236.1.fasta -t 124 


++++                        mitoVGP v2.0                         ++++
++++ The Vertebrate Genomes Project Mitogenome Assembly Pipeline ++++
++++     Credit: Giulio Formenti [email protected]       ++++


Started at at: 2021-02-16 08-27-23

With command:

./mitoVGP

Using 'NC_027236.1.fasta' as reference

Genome size not provided, using reference genome size: 17592 bp

Command: sh -e scripts/mtDNApipe -s Rana_sylvatica -i aRanSyl -r NC_027236.1.fasta -g 17592 -t 124 -a pacbio 2>&1 | tee Rana_sylvatica/aRanSyl/assembly_MT_rockefeller/intermediates/log/aRanSyl_mtDNApipe_20210216-082724.out
scripts/mtDNApipe: 3: set: Illegal option -o pipefail
scripts/blastMT: 3: set: Illegal option -o pipefail
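The command line above shows mtDNApipe being launched with `sh -e`. The message `set: Illegal option -o pipefail` is the wording dash uses, and dash is the default /bin/sh on Debian and Ubuntu; bash accepts `set -o pipefail`. The following Python sketch, which assumes only that the shells are on PATH, checks which local shells accept the option:

```python
import shutil
import subprocess

def supports_pipefail(shell: str) -> bool:
    """Return True if `<shell> -c 'set -o pipefail'` exits with status 0."""
    path = shutil.which(shell)
    if path is None:  # shell not installed or not on PATH
        return False
    result = subprocess.run(
        [path, "-c", "set -o pipefail"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

if __name__ == "__main__":
    for name in ("sh", "bash", "dash"):
        print(f"{name}: {supports_pipefail(name)}")
```

If `sh` reports False and `bash` reports True, invoking the scripts with bash instead of sh is one possible workaround. This is an assumption based on the error text, not a fix confirmed by the mitoVGP maintainers.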

Running time of mitoVGP

Hi,

I want to use mitoVGP to assemble the full mitogenome for a group of birds. I have a reference genome from a closely related species and, as mentioned in your paper, I have PacBio data (number of reads: 386,038,492; total read length: 53,852,369,634) and 10x data (R1: 193,019,246 reads, total read length 29,145,906,146; R2 the same) for one sample.
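Reading "length of reads" as the total number of sequenced bases, dividing it by the read count gives the mean read length, a quick sanity check on the inputs. PacBio subreads are normally thousands of bases long, so a mean near 140 bp would suggest that the file being counted holds short reads. This is only arithmetic on the figures quoted above:

```python
def mean_read_length(total_bases: int, n_reads: int) -> float:
    """Average read length = total sequenced bases / number of reads."""
    return total_bases / n_reads

pacbio = mean_read_length(53_852_369_634, 386_038_492)
tenx_r1 = mean_read_length(29_145_906_146, 193_019_246)
print(f"PacBio: {pacbio:.1f} bp per read")   # PacBio: 139.5 bp per read
print(f"10x R1: {tenx_r1:.1f} bp per read")  # 10x R1: 151.0 bp per read
```

The 10x figure matches a standard 151 bp Illumina read, which supports the total-bases reading of the numbers.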
I then ran mitoVGP on my dataset with this command:

./mitoVGP -a pacbio -s Taeniopygia_guttata -i bTaeGut2 -r data_REF.fasta -t ${NSLOTS:-1} -1 $my_PACBIO_data -2 $my_10xdata_R1 $ my_10xdata_R2 -b variantCaller

First, I am not sure whether the way I passed my datasets (10x and PacBio data) is correct.

The job then ran for more than 3 weeks, reached the time limit I had set, and stopped without producing any result.

How much time does this analysis normally need? And do you have any suggestions to help me run it?

I look forward to hearing from you.

Best,
Niloo

10X align

In your pipeline, the Longranger software expects FASTQ files with this fixed naming format:
_S*_L00*_R1_001.fastq.gz
_S*_L00*_R2_001.fastq.gz
Should the barcodes in the 10x FASTQ data be trimmed or not?
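The pattern quoted above follows the Illumina bcl2fastq naming scheme (`<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`). Below is a minimal sketch for checking file names before running Longranger, assuming a three-digit lane field. It checks names only and says nothing about whether barcodes have been trimmed:

```python
import re

# Hypothetical check of bcl2fastq-style names such as "fArcCen1_S1_L001_R1_001.fastq.gz".
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)

def parse_fastq_name(name: str):
    """Return the name's fields as a dict, or None if it does not follow the pattern."""
    m = FASTQ_NAME.match(name)
    return m.groupdict() if m else None

print(parse_fastq_name("fArcCen1_S1_L001_R1_001.fastq.gz"))
print(parse_fastq_name("reads_1.fq.gz"))  # None: would need renaming
```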

I got an average molecule coverage of 234 when I aligned the 10X data with Longranger, but the mean depth is only 89. I don't know what the average molecule coverage means; is this value reasonable?
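Assume that "molecule coverage" counts every inferred long molecule across its whole span, while mean depth counts only the bases actually sequenced. Under that assumption the two can differ a lot, because linked-read molecules are long but only sparsely covered by reads. Below is a toy Python sketch with made-up intervals, applying the same averaging to both:

```python
def mean_coverage(intervals, region_len):
    """Average number of intervals covering each base of a region [0, region_len)."""
    return sum(end - start for start, end in intervals) / region_len

# Toy 10 kb region (hypothetical numbers): three long molecules span most of it,
# but only a few 150 bp reads were sequenced from those molecules.
molecules = [(0, 10_000), (0, 8_000), (2_000, 10_000)]
reads = [(0, 150), (3_000, 3_150), (6_000, 6_150), (1_000, 1_150),
         (5_000, 5_150), (7_000, 7_150), (9_000, 9_150)]
print(mean_coverage(molecules, 10_000))  # 2.6
print(mean_coverage(reads, 10_000))      # 0.105
```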

Error in asset

When I run asset/bin/ast_bion_bnx, I get the error shown below:
Segmentation fault (core dumped)

Here is the command I ran:
/home/Duhuipeng/zm/Biosoft/asset/bin/ast_bion_bnx exp_refineFinal1_r.cmap exp_refineFinal1_q.cmap exp_refineFinal1.xmap BMX_v122_BSPQI_0kb_0labels_key.txt > bionano_"saphyr""BspQI".bed 2>ast_bion_bnx"saphyr"_"BspQI".log

I don't know where the problem is. Thank you for your help!

Pipeline for Polishing Canu assembly with Illumina Reads

Hello, sorry for the inconvenience, but is there a tutorial somewhere on how to polish a Canu assembly of Nanopore reads using Illumina reads? The Canu GitHub linked me here: https://github.com/VGP/vgp-assembly/tree/master/pipeline/freebayes-polish . However, I don't see a tutorial on how exactly to begin polishing.

Should I just run the following command: ./_submit_longranger_freebayes.sh assembly.fasta reference_genome.fasta, and put the Illumina reads in /data/rhiea/<genome>/genomic_data/?

Runtime?

Discussed in #58

Originally posted by ilante, August 24, 2021

A few practical questions:

What is the runtime of vgp-assembly/pipeline/? I couldn't find it in the documentation.
Approximately how long would it take to run the entire vgp-assembly workflow for a species with a ~300 Mb genome?
What would the computing cost roughly be?

Any rough estimates or pointers to the documentation would be highly appreciated!

QUESTION: Does mitoVGP need PacBio raw data (BAM) files as input?

Or can it accept FASTQ files?

conda install ONT environment fails

Hi there

I wanted to try the pipeline on some insect data (Illumina + ONT reads).

I seem to have problems using your conda yml file for ONT. I get this error:

Solving environment: failed
ResolvePackageNotFound:
- gcc_impl_linux-64==7.3.0=habb00fd_2

I have not been able to solve it yet. The PacBio install works.

If I change the problematic package to gcc_impl_linux-64=7.3.0 or just gcc_impl_linux-64, it reports loads of conflicts.

Any ideas?

I also wonder whether I could use the PacBio pipeline with error-corrected ONT reads (FASTA, corrected with Ratatosk from ca. 80X coverage).

Thanks,
Eckart

GenomeScope evaluation

In my project, genome size, heterozygosity and repeat content were estimated with GenomeScope, but the results vary greatly. To make sure I am doing it right, I downloaded human HG002 data to estimate these metrics. I know the human genome size is approximately 3,100,000,000 bp, but I don't know the size of its repeat content. On the VGP website I see that your team assembled the human genome; could you tell me the size or proportion of human repeat content? Thank you so much!
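GenomeScope fits a model to the k-mer count histogram, so its output depends on the shape of that histogram. A simpler point of comparison (not GenomeScope's model) is the classic estimate: divide the number of k-mers, after dropping low-depth error k-mers, by the depth of the main peak. Below is a minimal sketch on a made-up histogram; `min_depth` is a hypothetical error cutoff:

```python
def genome_size_from_histogram(hist: dict[int, int], min_depth: int) -> float:
    """Classic k-mer estimate: total k-mers / depth of the main peak.

    `hist` maps k-mer depth -> number of distinct k-mers seen at that depth.
    """
    # Drop low-depth k-mers, which are dominated by sequencing errors.
    kept = {depth: n for depth, n in hist.items() if depth >= min_depth}
    total_kmers = sum(depth * n for depth, n in kept.items())
    # Take the tallest remaining bin as the main peak.
    peak_depth = max(kept, key=kept.get)
    return total_kmers / peak_depth

toy_hist = {1: 9000, 2: 1200, 15: 100, 30: 1000, 60: 50}
print(genome_size_from_histogram(toy_hist, min_depth=5))  # 1150.0
```

In a heterozygous diploid genome, the histogram usually shows a heterozygous peak at about half the depth of the homozygous peak. Dividing by the heterozygous peak instead roughly doubles the estimate, which is one way results can vary. The sketch simply takes the tallest bin, so it is exposed to exactly that mix-up.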

BLAST options error: File .fasta does not exist

Hello,
I want to use mitoVGP to assemble the full mitogenome for a few bird species. When running mitoVGP, I get the same error with both of the following commands:
(the test command as given in https://github.com/VGP/vgp-assembly/blob/master/mitoVGP/README.md):
$./mitoVGP -a pacbio -s Mastacembelus_armatus -i fMasArm1 -r mtDNA_Mastacembelus_armatus.fasta -t 24 -b variantCaller

(command with BAM data on the local disk):
$./mitoVGP -a pacbio -s Gallus_gallus -i bGalGal1 -r ./bGalGal1.MT.20191002.fasta -t 18 -1 /DATA2/rawdata/pacbio/PAPH_A_m64292e_230113_084435.subreads.bam -b variantCaller
ERROR:
New DB title: Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates/canu/fMasArm1.contigs.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: File Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates/canu/fMasArm1.contigs.fasta does not exist

ERROR:
New DB name: /DATA2/black_francolin_analysis/software/mitoVGP/Gallus_gallus/bGalGal1/assembly_MT_rockefeller/intermediates/blast/bGalGal1.db
New DB title: Gallus_gallus/bGalGal1/assembly_MT_rockefeller/intermediates/canu/bGalGal1.contigs.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: File Gallus_gallus/bGalGal1/assembly_MT_rockefeller/intermediates/canu/bGalGal1.contigs.fasta does not exist

Please help me resolve this and assemble the mitogenome. Thank you so much!

Clarification in documentation: Polishing phased assemblies

Hello!

Thank you for this unparalleled pipeline and resource. I've been following the VGP workflow for my own de novo trio-binned assembly with linked reads and Hi-C data, and I was hoping for some clarification on the genome polishing step.

I'm able to get the code working; however, after reading your documentation and papers from the last three years, I still can't determine which 10X data you use in the Longranger-FreeBayes polishing step.

For trio-binning, I have long reads of the F1, and 10X linked reads for the F1, maternal, and paternal animals.

Do you:

- keep the maternal/paternal trio-binned genomes separate and polish the maternal genome with the maternal 10X reads?
- phase the F1's 10X reads by haplotype and polish the maternal and paternal genomes separately?
- combine the maternal/paternal genomes in the final step and polish H1 and H2 with FreeBayes before getting a consensus?
- use another combination I haven't thought of?

It would be hugely helpful if you could clarify this for me and perhaps add it to the documentation for others thinking of the same issue.

Thank you so much and I'm sorry if I missed it.

Best,
Dustin

How to use the DNAnexus workflow to assemble a genome

I want to use your fArcCen1 assembly tutorial for training, but I can't find your project on the DNAnexus website. Thank you!