bcgsc / arcs
🌈Scaffold genome sequence assemblies using linked or long read sequencing data
License: GNU General Public License v3.0
Hi!
We followed these steps in a pipeline to improve a draft genome of a species of Planaria:
arcs -f ./DjScaff_fn120141213.fa -a ./bamfilename.txt
Warning: Skipped 531501621 unpaired reads. Read pairs should be consecutive in the SAM/BAM file.
{ "All_barcodes_unfiltered":2297724, "All_barcodes_filtered":1502436, "Scaffold_end_barcodes":33849, "Min_barcode_reads_threshold":50, "Max_barcode_reads_threshold":10000 }
=> Pairing scaffolds... Thu Jul 19 07:51:44 2018
=> Creating the graph... Thu Jul 19 07:51:44 2018
=> Writing graph file... Thu Jul 19 07:51:44 2018
Max Degree (-d) set to: 0. Will not delete any vertices from graph.
Writing graph file to ./pre-ref/DjScaff_fnl20141213.fa.scaff_s98_c5_l0_d0_e30000_r0.05_original.gv...
=> Creating the ABySS graph... Thu Jul 19 07:51:44 2018
=> Writing the ABySS graph file... Thu Jul 19 07:51:44 2018
=> Done. Thu Jul 19 07:51:45 2018
and output files:
DjScaff_fnl20141213.fa.scaff_s98_c5_l0_d0_e30000_r0.05_original.gv
which is empty
and
DjScaff_fnl20141213.fa.scaff_s98_c5_l0_d0_e30000_r0.05.dist.gv
which has a content like this:
digraph arcs {
2 "DjScaffold1+" [l=33983]
3 "DjScaffold1-" [l=33983]
4 "DjScaffold2+" [l=33879]
5 "DjScaffold2-" [l=33879]
6 "DjScaffold3+" [l=45345]
7 "DjScaffold3-" [l=45345]
8 "DjScaffold4+" [l=25271]
9 "DjScaffold4-" [l=25271]
10 "DjScaffold5+" [l=39436]
11 "DjScaffold5-" [l=39436]
12 "DjScaffold6+" [l=143049]
13 "DjScaffold6-" [l=143049]
14 "DjScaffold7+" [l=7664]
15 "DjScaffold7-" [l=7664]
16 "DjScaffold8+" [l=45170]
...
and, as expected, running the python script on it failed with the error below:
File "/s/chopin/a/grad/asharifi/e/Planaria_10X/Fastq/For_10x_Denovo_Data/PG2103_03BE5/ref/refdata-DjScaff_fnl20141213/PG2103/outs/pre-ref/DjScaff_fnl20141213.fa.scaff_s98_c5_l0_d0_e30000_r0.05_original.gv", line 1
graph G {
^
SyntaxError: invalid syntax
Do you have any idea?
Thanks
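The traceback above names the .gv file itself as the Python "script", which suggests the .gv path was passed to python directly rather than as an argument to makeTSVfile.py; either way, the underlying symptom is an empty graph. A quick self-contained way to spot an empty graph (the demo file name is illustrative):

```shell
# Recreate the empty-graph symptom with a throwaway file and count edge
# lines; a usable ARCS graph should contain "--" edge records, not just
# the header and closing brace.
cat > demo_original.gv <<'EOF'
graph G {
}
EOF
edges=$(grep -c -- '--' demo_original.gv || true)
echo "edge lines: $edges"
```

An empty graph at this stage usually means no links passed the filters, so the fix lies upstream of the python script rather than in it.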
After I installed ARCS, I tried to run it on the example data:
root@user-lubuntu:/mnt/hgfs/SharedFolders/programas/arcs/Examples/arcs_test-demo# ./runARCSdemo.sh
Downloading sample Chromium read alignment .bam file and human genome assembly draft...
--2018-04-28 23:37:55-- ftp://ftp.bcgsc.ca/supplementary/ARCS/testdata/NA24143_genome_phased_namesorted.bam1.sorted.bam
=> “NA24143_genome_phased_namesorted.bam1.sorted.bam”
Resolving ftp.bcgsc.ca (ftp.bcgsc.ca)... 134.87.4.91
Connecting to ftp.bcgsc.ca (ftp.bcgsc.ca)|134.87.4.91|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /supplementary/ARCS/testdata ... done.
==> SIZE NA24143_genome_phased_namesorted.bam1.sorted.bam ... 23218461706
==> PASV ... done. ==> RETR NA24143_genome_phased_namesorted.bam1.sorted.bam ... done.
Length: 23218461706 (22G) (unauthoritative)
NA24143_genome_phased_name 40%[===============> ] 8.69G --.-KB/s in 4h 33m
2018-04-29 04:11:19 (556 KB/s) - Data connection: Connection timed out; control connection closed.
Retrying.
--2018-04-29 04:26:20-- ftp://ftp.bcgsc.ca/supplementary/ARCS/testdata/NA24143_genome_phased_namesorted.bam1.sorted.bam
(try: 2) => “NA24143_genome_phased_namesorted.bam1.sorted.bam”
Connecting to ftp.bcgsc.ca (ftp.bcgsc.ca)|134.87.4.91|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /supplementary/ARCS/testdata ... done.
==> SIZE NA24143_genome_phased_namesorted.bam1.sorted.bam ... 23218461706
==> PASV ... done. ==> REST 9335154088 ... done.
==> RETR NA24143_genome_phased_namesorted.bam1.sorted.bam ... done.
Length: 23218461706 (22G), 13883307618 (13G) remaining (unauthoritative)
NA24143_genome_phased_name 100%[++++++++++++++++=======================>] 21.62G 752KB/s in 5h 47m
2018-04-29 10:13:43 (651 KB/s) - Control connection closed.
Retrying.
--2018-04-29 10:28:45-- ftp://ftp.bcgsc.ca/supplementary/ARCS/testdata/NA24143_genome_phased_namesorted.bam1.sorted.bam
(try: 3) => “NA24143_genome_phased_namesorted.bam1.sorted.bam”
Connecting to ftp.bcgsc.ca (ftp.bcgsc.ca)|134.87.4.91|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /supplementary/ARCS/testdata ... done.
==> SIZE NA24143_genome_phased_namesorted.bam1.sorted.bam ... 23218461706
The file is already fully retrieved; nothing to do.
2018-04-29 10:28:47 (0.00 B/s) - “NA24143_genome_phased_namesorted.bam1.sorted.bam” saved [23218461706]
--2018-04-29 10:28:47-- ftp://ftp.bcgsc.ca/supplementary/ARCS/testdata/hsapiens-8reformat.fa
=> “hsapiens-8reformat.fa”
Resolving ftp.bcgsc.ca (ftp.bcgsc.ca)... 134.87.4.91
Connecting to ftp.bcgsc.ca (ftp.bcgsc.ca)|134.87.4.91|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /supplementary/ARCS/testdata ... done.
==> SIZE hsapiens-8reformat.fa ... 3074865858
==> PASV ... done. ==> RETR hsapiens-8reformat.fa ... done.
Length: 3074865858 (2.9G) (unauthoritative)
hsapiens-8reformat.fa 100%[=======================================>] 2.86G 459KB/s in 86m 58s
2018-04-29 11:55:48 (575 KB/s) - “hsapiens-8reformat.fa” saved [3074865858]
Running ARCS...
Converting graph for LINKS...
Traceback (most recent call last):
File "./makeTSVfile.py", line 96, in <module>
readGraphFile(infile)
File "./makeTSVfile.py", line 15, in readGraphFile
with open(infile, 'r') as f:
IOError: [Errno 2] No such file or directory: 'hsapiens-8reformat.fa.scaff_s98_c5_l0_d0_e30000_r0.05_original.gv'
Running LINKS...
run complete
I am trying to improve the rat reference genome rn6 with 10x data. I ran longranger basic
on my R1 and R2 files to get a barcoded.fastq.gz file. I checked the file to make sure it has the BX:Z:barcode-1 tag after the read name in the fastq header (with a space in between).
$ cat summary.csv
barcode_diversity,bc_on_whitelist,num_read_pairs
1064967.05402,0.933687055596,394846319
I already ran tigmint
and have a rn6_after_tigmint.fa file. I also have a phased_possorted_bam.bam file from longranger. So far, I have done the analysis a few ways and got the same results: the scaffold.fa is basically the same as the rn6_after_tigmint.fa file, with one contig for one scaffold. I ran assembly-stats
on both files and the stats are the same. The
.assembly_correspondence.tsv file contains NA for all the rows in the last 3 columns.
$ tail *correspondence.tsv
scaffold9553 7080 7080 f NA NA NA
scaffold9554 5570 5570 f NA NA NA
scaffold9555 3375 3375 f NA NA NA
scaffold9556 2162 2162 f NA NA NA
scaffold9557 8135 8135 f NA NA NA
scaffold9558 3363 3363 f NA NA NA
scaffold9559 7157 7157 f NA NA NA
scaffold9560 6728 6728 f NA NA NA
scaffold9561 1136 1136 f NA NA NA
scaffold9562 7468 7468 f NA NA NA
The analysis I've done:
I suspect the barcodes are not being read into the pipeline, but I am not sure how to check this.
Any insights are highly appreciated!
arcs 1.0.5
tigmint 1.1.2
LINKS 1.8.6
longranger 2.2.2
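A quick way to check whether barcodes actually made it into the inputs is to count fastq records whose header carries a BX:Z: tag. The sample reads below are made up; for the BAM the analogous check would be `samtools view phased_possorted_bam.bam | head -1000 | grep -c 'BX:Z:'`.

```shell
# Toy check (sample reads are made up): count fastq records whose
# header line carries a BX:Z: barcode tag.
cat > demo_barcoded.fq <<'EOF'
@read1 BX:Z:ACGTACGTACGTACGT-1
ACGT
+
IIII
@read2
ACGT
+
IIII
EOF
awk 'NR%4==1{n++; if (/BX:Z:/) bx++} END{printf "reads=%d with_BX=%d\n", n, bx}' demo_barcoded.fq
```

If the with_BX count is near zero on the real data, the barcodes never reached the aligner and the all-NA correspondence table would follow.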
Hi,
This is more a question about how to use ARCS than an issue report.
I see ARCS needs BAM files and a FASTA file as input. I understand the FASTA file contains the scaffold sequences, but I don't really understand what the BAM files are.
Are they Chromium linked-read output? From which software? Supernova produces FASTA files; I see that Cell Ranger produces BAM files.
Can you explain how I can generate BAM files from Chromium data?
cheers
Hi,
I found that Chromium sequencing has more bias (GC bias, PCR duplicates) than standard WGS. The manuscript mentions that:
These form a link between the two sequences, provided that there is sufficient number of read pairs aligned (-c, set to 5 by default)
So I wonder whether ARCS considers PCR duplicates during scaffolding. If not, is it better to remove duplicates before scaffolding with ARCS?
Best,
Danshu
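Since ARCS compares the read-pair support for a link against -c, PCR duplicates could inflate that support. The toy sketch below (made-up tuples of contig, position, barcode) only shows the counting effect of collapsing duplicates; in practice deduplication would be done on the BAM itself (e.g. Picard MarkDuplicates, as in a later report in this thread), not on a text table like this.

```shell
# Made-up alignment tuples: contig, position, barcode. Two rows are PCR
# duplicates; collapsing them lowers the per-barcode pair count that
# ARCS checks against -c.
cat > demo_aln.txt <<'EOF'
contig1 100 ACGT-1
contig1 100 ACGT-1
contig1 250 ACGT-1
EOF
before=$(awk 'END{print NR}' demo_aln.txt)
after=$(sort -u demo_aln.txt | awk 'END{print NR}')
echo "pairs before dedup: $before, after: $after"
```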
I working on a plant genome (~220 Mb)
My starting point is an phased assembly from Pacbio reads with Falcon and Falcon Unzip (774 contigs)
Important : I have renamed the contigs names by simple numeric identifier :
>1
>2
...
>774
What I have done so far :
1/ longranger basic on the reads.fastq from 10X genomics
2/ Tigmint correction with the following commands :
samtools faidx draft.fa
bwa index draft.fa
bwa mem -t8 -p -C draft.fa reads.fq.gz | samtools sort -@8 -tBX -o draft.reads.sortbx.bam
tigmint-molecule draft.reads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n > draft.reads.molecule.bed
tigmint-cut -p8 -o draft.tigmint.fa draft.fa draft.reads.molecule.bed
3/ When I try to run ARCS on the output from Tigmint with:
arcs -f draft.tigmint.fa -a file_of_bamfile -c 5 -e 30000 -r 0.05
I get the following error:
=> Reading alignment files... Mon Jan 21 11:07:40 2019
error: unexpected sequence: 1 of size 2459890+ draft.fa.c5_e30000_r0.05.tigpair_checkpoint.tsv draft.fa
/var/spool/slurmd/job108691/slurm_script: line 50: draft.fa.c5_e30000_r0.05.tigpair_checkpoint.tsv: command not found
4/ When I try to run ARCS without Tigmint (directly on my draft assembly) like this:
arcs -f draft.fa -a file_of_bamfile -c 5 -e 30000 -r 0.05
I have this log :
=> Reading alignment files... Mon Jan 21 10:15:51 2019
Warning: Skipping an unpaired read. Read pairs should be consecutive in the SAM/BAM file.
Prev read: H9:1:HV2KYBBXX:6:1209:9090:3952
Curr read: H9:1:HV2KYBBXX:6:2216:4482:43867
Warning: Skipped 1000000 unpaired reads.
Warning: Skipped 2000000 unpaired reads.
Warning: Skipped 3000000 unpaired reads.
...
Warning: Skipped 210000000 unpaired reads.
Warning: Skipped 210477149 unpaired reads. Read pairs should be consecutive in the SAM/BAM file.
{ "All_barcodes_unfiltered":1958917, "All_barcodes_filtered":1085640, "Scaffold_end_barcodes":1008556, "Min_barcode_reads_threshold":50, "Max_barcode_reads_threshold":10000 }
=> Pairing scaffolds... Mon Jan 21 10:39:45 2019
=> Creating the graph... Mon Jan 21 10:39:50 2019
=> Writing graph file... Mon Jan 21 10:39:50 2019
Max Degree (-d) set to: 0. Will not delete any vertices from graph.
Writing graph file to draft.fa.scaff_s98_c5_l0_d0_e30000_r0.05_original.gv...
=> Creating the ABySS graph... Mon Jan 21 10:39:50 2019
=> Writing the ABySS graph file... Mon Jan 21 10:39:50 2019
Thanks for your help
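The "Skipped ... unpaired reads" warnings in the log above are the clue: ARCS expects the two mates of a pair to sit on consecutive records of the BAM, as they do after name-sorting (e.g. `samtools sort -n`). The toy below, on a made-up read-name stream where mates are interleaved rather than adjacent, illustrates the adjacency idea (not ARCS's actual code):

```shell
# Made-up read-name stream: the mates of readA and readB are interleaved,
# so no two consecutive records share a name and every pair is rejected.
cat > demo_names.txt <<'EOF'
readA
readB
readB
readA
EOF
awk 'NR%2{prev=$1; next} {print ($1==prev ? "paired " prev : "mate mismatch: " prev " vs " $1)}' demo_names.txt
```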
Hi, thanks a lot for this package! I found a small change for running BWA in parallel that might help:
The arks-make command on line 138 is:
/usr/bin/time -v sh -c 'bwa mem -t$t -C -p $< $(reads).fq.gz | samtools view -Sb - | samtools sort -@$t -n - -o $@' |& tee $(patsubst %.sorted.bam,%,$@)_bwa_mem.log
should be:
/usr/bin/time -v sh -c 'bwa mem -t$(threads) -C -p $< $(reads).fq.gz | samtools view -Sb - | samtools sort -@$t -n - -o $@' |& tee $(patsubst %.sorted.bam,%,$@)_bwa_mem.log
Hi guys,
I recently mapped the demultiplexed reads from 10X Chromium to a reference genome using bowtie2. As recommended, the barcodes were attached to the read name as READNAME_BARCODE.
After the mapping finished, I converted the SAM file into a BAM file using samtools and then used the BAM for finding the links between contigs in a raw assembly. However, I keep getting the following error:
On line 940000000
On line 950000000
On line 960000000
On line 970000000
On line 980000000
On line 990000000
On line 1000000000
On line 1010000000
On line 1020000000
On line 1030000000
On line 1040000000
error: `/data/Bioinfo/bioinfo-proj-jmontenegro/DENOVO/Dunnart/Results/Mappings/Bowtie/Abyss/10X_abyss.sam': Expected end-of-file and saw `2 127M = 2707 -246 AGAAAATCTATGTGTGGGAATCTTCAGATAATCGTATGATTGAGCTTTACTAGCAACCATGGTATGGAGGTCATGGATTTACCCATTTCCCAAAGGAGAAATTCAGTGTTTATTGCCTTCTTAGAAT IHFFCIHHHDHHCHIIIIHGIIIIIIGIHIHIIIIIIHIIIIIIIIHHFHHIIIIHIHHFIIIIIIIIIIIHGHGIIIIIHGIHIIIIHHHIHHIIIIIIIHHHIGIIIIIIIIIHIIIIIHIIIII AS:i:-5 XN:i:0 XM:i:1 XO:i:0XG:i:0 NM:i:1 MD:Z:58T68 YS:i:-21 YT:Z:CP'
However I cannot see any issues in the bam file at all. At least while converting to bam, samtools did not detect any issues. Have you seen this error before?
Cheers,
Hi,
I am trying to run arcs on an anopheles genome (300Mbp size), but the graph I get is empty:
graph G {
}
I was hoping you could help me understand what I am doing wrong, or how I can tweak the parameters to get some results.
I am trying to scaffold a SOAPdenovo assembly using 10X genomics reads, and I am using the pipeline_example.sh script you provided with the pipeline.
I don't see any error or warning messages in the pipeline printout (in attachment).
The bam file has been generated by bwa with standard parameters and then ordered by read name.
By the way, just to make sure: my initial read names include "/1" or "/2" for read 1 or read 2, but I noticed that after bwa all reads are named with "/1", and the read1/read2 information is in the BAM flag. Can the "/1" cause issues with arcs?
Example of a read name: ST-E00143:242:HW7TJCCXX:2:1101:1407:29349/1_AGTTGGTAGTGCGCCT
An example of an alignment is:
ST-E00143:242:HW7TJCCXX:2:1101:1407:29349/1_AGTTGGTAGTGCGCCT 65 2R 61455622 60 128M = 61455622 0 GCAGTTTGCAATGGAGCGGATTTAACCTACAGTGTGGTGTGCAGCCGTTGCACTCGAACATTTCATCAACAGTGCACTAACGTGGATGACTCAGTGTATGACCTACATTGGTTCTGTGACATGTGCAC JJJJF<JJJFJJFJJFF<AFA7FJJJFJJJJF--FJFJ7FA-F<AJJJJFJFFJFFJJJJJJJJJJJJJ<JJJFJJJJJFJJ<F<JJJJJJJFFF777AFAJ7<-FF-77F--JFJJ-AFJ<AAAJF) NM:i:1 MD:Z:22A105 AS:i:123 XS:i:93
Another question I have: about the bam file your instructions say that:
" index must be included in read name e.g read1_indexA"
can you please specify what this index is? Is it the 10X Genomics barcode?
Thank you very much!
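On the "/1" question, a hedged observation: the errors ARCS prints elsewhere in this thread compare consecutive read names, so what should matter is that both mates carry the identical name, "/1" included. If you want to strip it anyway, a sed sketch on the example name from the question above:

```shell
# Strip a literal "/1" immediately before the "_barcode" suffix; the
# read name is the example quoted in the question.
name='ST-E00143:242:HW7TJCCXX:2:1101:1407:29349/1_AGTTGGTAGTGCGCCT'
echo "$name" | sed 's|/1_|_|'
```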
Hello ARCS team,
I was wondering if I could get a recommendation on benefits of multiple 10X libraries vs. increased depth per library for ARCS / LINKS scaffolding.
Basically, we would like to increase our scaffolding potential by generating two lanes of 10X data for a genome ~2.5 Gbp (Contig N50 ~100kb). I'm wondering, in your experience, if that is better served with 1) two libraries, each on their own sequencing lane, or 2) 1 library sequenced across two lanes? It would seem to me that option 1 would allow for a greater number of links to span the gap (-l in LINKS), but at the expense of the minimum aligned read pairs per barcode mapping (-c in ARCS). In your opinion, which parameter would be more important in terms of scaffolding, and do you have a recommendation on coverage per library or reads per barcode required to achieve a suitable number of aligned reads (-c in ARCS)? It looks like from the Barcode counts output from ARCS on the first lane we've done, we have an average ~280 reads per barcode.
Thanks for your input and help,
Eric
I ran the entire arcs-make (arcs-1.0.5) pipeline on data
from a small marine organism. The makefile was a great help!
Every stage completed successfully according to the output.
However, there is very little change in the draft genome
after running the pipeline. (See Ml.fa, below, the original
draft genome and Ml_c5_m50-10000_s98...scaffolds.fa the final
output from the pipeline)
Can you suggest some stages of the pipeline I can focus on to improve the process?
The reads are 10x linked reads prepared by longranger basic
with this summary:
barcode_diversity, bc_on_whitelist, num_read_pairs
188166.842687, 0.896626906719, 110838678
Also included below is a summary of the Supernova assembly
of these reads (for information only; the assembly data is not
part of the ARCS process).
------------- Ml.fa -------------
number sequences: 5,101
max sequence length: 1,222,598
min sequence length: 987
median sequence length: 1,772
total number residues: 155,875,873
N50: 187314
--Ml_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.3.scaffolds.fa --
number sequences: 5,084
max sequence length: 1,222,598
min sequence length: 987
median sequence length: 1,764
total number residues: 155,877,573
N50: 189805
(Supernova assembly of reads, for information only.)
------------- Mlpseudohap.fasta -------------
number sequences: 25,046
max sequence length: 102,547
min sequence length: 1,000
median sequence length: 3,008
total number residues: 145,149,952
N50: 10665
Hi,
I ran into a problem when using ARCS:
error: `@SQ SN:000000F_pilon LN:62335206': No such file or directory
My command is:
./arcs -f Pilon.fasta -a test.bam -s 98 -c 5 -l 0 -d 0 -r 0.05 -e 30000 -m 20-10000 -v
I get the same error whether the alignments file is SAM or BAM. What's going on?
Thank you very much!
Jing
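A guess worth checking, based on the earlier example in this thread (`arcs -a ./bamfilename.txt`): -a appears to take a text file listing BAM paths, not a BAM itself. Passing the SAM/BAM directly would make ARCS read its header lines as file names, which would explain the `@SQ ...: No such file or directory` message. Sketch (the arcs command is only echoed, since it needs the real data):

```shell
# Write a file-of-filenames listing the BAM, and point -a at it.
echo test.bam > alignments.fof
echo "arcs -f Pilon.fasta -a alignments.fof -s 98 -c 5 ..."
```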
In your demo and in my own genome, all the mean gaps are estimated as 10. Here is an example:
scaffold77,942471,f15191z72983k40a0.07m10_f13167z221844k4a0.5m10_f15280Z481647k5a0.6m10_f530z165997
scaffold78,1050735,r8877z5568k5a0.6m10_r984z132808k2a0.5m10_f127z174327k6a0.5m10_f128Z473863k23a0.17m10_f129z264169
scaffold79,571225,r15084z29962k16a0.12m10_f165z72736k2a0.5m10_f785Z468527
Does this mean LINKS doesn't do the estimation? Under what circumstances will this function work?
g++ -DHAVE_CONFIG_H -I. -I.. -I/home/zhut/src/arcs-1.0.1/Arcs -I/home/zhut/src/arcs-1.0.1/Common -I/home/zhut/src/arcs-1.0.1/DataLayer -I/home/zhut/src/arcs-1.0.1 -I/home/zhut/src/arcs-1.0.1 -isystem/home/zhut/src/arcs-1.0.1/1_58_0 -Wall -Wextra -Werror -std=c++0x -fopenmp -g -O2 -MT arcs-Arcs.o -MD -MP -MF .deps/arcs-Arcs.Tpo -c -o arcs-Arcs.o `test -f 'Arcs.cpp' || echo '/home/zhut/src/arcs-1.0.1/Arcs/'`Arcs.cpp
In file included from /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/backward/hash_set:60,
from /usr/include/boost/graph/adjacency_list.hpp:25,
from /usr/include/boost/graph/undirected_graph.hpp:11,
from Arcs.h:20,
from Arcs.cpp:2:
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/backward/backward_warning.h:28:2: error: #warning This file includes at least one deprecated or antiquated header which may be removed without further notice at a future date. Please use a non-deprecated interface with equivalent functionality instead. For a listing of replacement headers and interfaces, consult the file backward_warning.h. To disable this warning use -Wno-deprecated.
In file included from Arcs.cpp:2:
Arcs.h:86: error: using ‘typename’ outside of template
Arcs.h:100: error: using ‘typename’ outside of template
In file included from Arcs.cpp:3:
../Arcs/DistanceEst.h:49: error: using ‘typename’ outside of template
../Arcs/DistanceEst.h:53: error: using ‘typename’ outside of template
../Arcs/DistanceEst.h:75: error: using ‘typename’ outside of template
../Arcs/DistanceEst.h: In function ‘void buildPairToBarcodeStats(const ARCS::IndexMap&, const std::unordered_map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > >&, const ARCS::ContigToLength&, const ARCS::ArcsParams&, PairToBarcodeStats&)’:
../Arcs/DistanceEst.h:216: error: using ‘typename’ outside of template
../Arcs/DistanceEst.h: In function ‘void addEdgeDistances(const PairToBarcodeStats&, const JaccardToDist&, const ARCS::ArcsParams&, ARCS::Graph&)’:
../Arcs/DistanceEst.h:401: error: expected initializer before ‘:’ token
../Arcs/DistanceEst.h:430: error: expected primary-expression before ‘}’ token
../Arcs/DistanceEst.h:430: error: expected ‘;’ before ‘}’ token
../Arcs/DistanceEst.h:430: error: expected primary-expression before ‘}’ token
../Arcs/DistanceEst.h:430: error: expected ‘)’ before ‘}’ token
../Arcs/DistanceEst.h:430: error: expected primary-expression before ‘}’ token
../Arcs/DistanceEst.h:430: error: expected ‘;’ before ‘}’ token
cc1plus: warnings being treated as errors
../Arcs/DistanceEst.h: At global scope:
../Arcs/DistanceEst.h:393: error: unused parameter ‘pairToStats’
../Arcs/DistanceEst.h:393: error: unused parameter ‘params’
../Arcs/DistanceEst.h:393: error: unused parameter ‘g’
../Arcs/DistanceEst.h: In function ‘void writeDistTSV(const std::string&, const PairToBarcodeStats&, const ARCS::Graph&)’:
../Arcs/DistanceEst.h:457: error: expected initializer before ‘:’ token
Arcs.cpp:1054: error: expected primary-expression at end of input
Arcs.cpp:1054: error: expected ‘;’ at end of input
Arcs.cpp:1054: error: expected primary-expression at end of input
Arcs.cpp:1054: error: expected ‘)’ at end of input
Arcs.cpp:1054: error: expected statement at end of input
Arcs.cpp:1054: error: expected ‘}’ at end of input
../Arcs/DistanceEst.h: At global scope:
../Arcs/DistanceEst.h:433: error: unused parameter ‘pairToStats’
../Arcs/DistanceEst.h:433: error: unused parameter ‘g’
make[2]: *** [arcs-Arcs.o] Error 1
make[2]: Leaving directory `/home/zhut/src/arcs-1.0.1/Arcs'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/zhut/src/arcs-1.0.1'
make: *** [all] Error 2
Lauren wrote earlier:
As a side note we do have a Makefile (Examples/arcs-make) which runs the whole ARCS pipeline, taking the reads and the draft assembly as input. Even if you don't want to use the Makefile, I'd recommend taking a look to get an idea of the required steps.
Where do I find Examples/arcs-make?
I have looked in:
http://www.bcgsc.ca/downloads/supplementary/ARCS/
https://github.com/bcgsc/arcs/releases/download/v1.0.5/arcs-1.0.5.tar.gz
Hello,
I am getting an error when trying to run makeTSVfile.py and I was wondering if you had any ideas on what might be the problem.
Many thanks in advance,
Steve
$ python ~/10X_GENOMICS/arcs/Examples/makeTSVfile.py arcs_test_original.gv arcs_test_original.tigpair_checkpoint.tsv canu_sprai_corrected.consensus.fasta
Output to stdout:
Traceback (most recent call last):
File "/10X_GENOMICS/arcs/Examples/makeTSVfile.py", line 95, in
makeLinksNumbering(args.fasta_file)
File "/10X_GENOMICS/arcs/Examples/makeTSVfile.py", line 26, in makeLinksNumbering
links_numbering[test.group(1)] = str(counter)
AttributeError: 'NoneType' object has no attribute 'group'
Hi,
I'm currently evaluating the scaffolding capacity of Arcs on a genome that I'm working on currently. The genome has been assembled by Canu using RSII reads and is pretty repeat rich. I'm attaching the stats for the assembly (>1kb contigs) here:
num_seqs 26490
sum_len 898447105 (898Mb)
min_len 2002 (2kb)
avg_len 33916.5 (33.91kb)
max_len 1141553 (1.14Mb)
Q1 14867 (IQR1 14.86Kb)
Q2 21309 (IQR2 21.3 Kb)
Q3 32834 (IQR3 32.83Kb)
N50 44537 (44.5Kb)
I'm wondering what an adequate end-length parameter (-e) would be for this assembly and how I can tune it. I went through the paper, especially this part, and I'm confused:
Thus, depending on the level of contiguity of the input assembly, adjusting –e to a lower or higher value would account for shorter contigs or focus on longer contigs. When ARCS encounters shorter sequences (less than twice the specified –e length), the length of the head and tail regions are assigned as half the total sequence length. This is important, as the selection of –e will impact how ambiguity is mitigated when creating an edge between any two sequences.
So, in order to efficiently scaffold my assembly, should I drop the -e parameter to the IQR2 length rather than the default 30 kb? How do I tune the parameters for an extremely fragmented assembly?
Harish
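One way to ground the -e choice above is to check how much of the assembly falls below 2×e, where the quoted half-length rule kicks in. The lengths below are the summary values from the stats table, standing in for the full length list; with a real assembly they could come from the .fai index.

```shell
# Count contigs shorter than 2*e for a candidate -e value.
cat > demo_lengths.txt <<'EOF'
2002
14867
21309
32834
44537
1141553
EOF
awk -v e=30000 '{n++; if ($1 < 2*e) s++} END{printf "%d of %d contigs shorter than 2*e\n", s, n}' demo_lengths.txt
```

If most contigs land under 2×e, the default effectively becomes "half of each contig", which argues for an explicit, smaller -e.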
Hi,
I have the following problem in running Arcs:
ERROR! BAM file should be sorted in order of read name. Exiting...
Prev Read: ST-E00129:523:HC2HTALXX:3:1101:1083:65283_ATGGCCGTCACAGCCG; Curr Read: ST-E00129:523:HC2HTALXX:3:1101:1083:65318_AACCCTCCATGACGGA
I mapped and sorted the BAM file following the instructions. I removed duplicate reads using MarkDuplicates and then got this error when running ARCS. I sorted my BAM file again using "samtools sort -n" or "picard.jar SortSam SORT_ORDER=queryname", but that didn't solve the problem.
Best,
Danshu
Hi ARCS developers,
I have a draft assembly based on PacBio data and I used the ARCS+LINKS pipeline to scaffold it with 10x data. I tuned the ARCS parameters -r and -e and aligned the scaffolds from the different parameter sets to the optical map. The alignments to the optical map seem to show some mis-scaffolding. I read the preprint and there is not much discussion of mis-scaffolding. How might it happen?
Besides tuning -r and -e, what other parameters may help improve scaffolding on my data?
Thanks a lot
Qihua
Hi,
We're trying to improve our extremely fragmented fish genome assembly (number of scaffolds = 175785, scaffold N50 = 8335) using ARCS and 10x data.
We have tested different ARCS parameters and found that the following adjustments seem to work better with our data: -s 98 -c 3 -l 0 -m 25-20000 -d 0 -r 0.05 -e 4000 -z 400. With LINKS we have used quite relaxed parameters: -l 2 -t 2 -a 0.9 -x 1 -z 400.
We observed a substantial improvement after the first run of ARCS+LINKS: the number of scaffolds was reduced to 158971, and the scaffold N50 increased to 12248. We repeated the pipeline 4 times, each time using the newly scaffolded assembly as the reference.
Now, after four rounds, we have 134882 scaffolds with a scaffold N50 of 95581. BUSCO analysis also shows an improvement: completeness increased from 53.1% to ca. 70%.
The question is: is it sensible to run several rounds of ARCS+LINKS pipeline?
P.S. We're also running Supernova, however, without any result for more than 3 weeks...
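Mechanically, running several rounds is just feeding each round's scaffolds back in as the new draft. A skeleton of the loop; `run_arcs_links` is a made-up placeholder standing in for the real ARCS+LINKS commands:

```shell
# Iterative scaffolding skeleton. run_arcs_links is a placeholder that
# here just copies input to output; in real use it would run the
# ARCS+LINKS pipeline on "$1" and write scaffolds to "$2".
run_arcs_links() {
  cp "$1" "$2"
}
printf '>s1\nACGT\n' > round0.fa
draft=round0.fa
for i in 1 2 3 4; do
  run_arcs_links "$draft" "round${i}.fa"
  draft="round${i}.fa"
done
echo "final draft: $draft"
```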
hello,
I am wondering if I could use ARCS to detect misassemblies in my scaffold-level diploid assembly (produced from PE, MP and 10x data by NRGene), in particular with the caveat that I would be using the same 10x data that built the scaffolds.
If it won't work, do you have any idea of what I could use?
Thanks,
Dario
I am attempting to scaffold some output from HGAP, which by default marks sequence as uppercase or lowercase. I mapped my input sequences using BWA-MEM but ran into the error: error: mismatched sequence lengths: sequence 004574F|arrow: 0 != 4531
I realised your definition of sequence length differs from BWA's: you exclude softmasked (lowercase) repeats from your length while BWA does not, and you then check the @SQ
lengths against the FASTA length by your definition, which causes this issue. What is the correct solution? It might be helpful to document it for other lost travellers such as myself.
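To make the mismatch concrete, here is a toy FASTA measured both ways: total bases (BWA's view) versus uppercase-only bases (the reported ARCS view; the diagnosis is the reporter's, reproduced here for illustration).

```shell
# Toy FASTA with softmasked (lowercase) bases; compare total length
# with uppercase-only length.
printf '>ctg\nACGTacgtACGT\n' > demo.fa
awk '!/^>/{total += length($0); s=$0; gsub(/[a-z]/, "", s); upper += length(s)} END{printf "total=%d uppercase_only=%d\n", total, upper}' demo.fa
```

A fully softmasked sequence gives an uppercase-only length of 0, matching the "0 != 4531" in the error.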
I'm running ARCS with Tigmint. It runs for about 16 hours and then crashes. I'm using samtools 1.9. Longranger was used to create the interleaved barcoded file from a Chromium run. Taro.contigs.fa is assembled from Nanopore runs using Canu. I'm running on a single node with 20 cores and 128 GB RAM.
arcs-make arcs-tigmint draft=Taro.contigs reads=barcoded
End part of the error file is below:
[M::mem_process_seqs] Processed 573478 reads in 662.421 CPU sec, 81.837 real sec
[M::process] 0 single-end sequences; 573478 paired-end sequences
[E::bgzf_flush] [E::bgzf_flush] File write failed (wrong size)[E::bgzf_flush] File write failed (wrong size)
[E::bgzf_flush] File write failed (wrong size)
[E::bgzf_flush] File write failed (wrong size)
[E::bgzf_flush] File write failed (wrong size)
File write failed (wrong size)
[E::bgzf_close] File write failed
[E::bgzf_close] File write failed
[E::bgzf_close] File write failed
[E::bgzf_flush] File write failed (wrong size)
[E::bgzf_flush] File write failed (wrong size)
[E::bgzf_close] File write failed
[E::bgzf_close] File write failed
[E::bgzf_close] File write failed
[E::bgzf_close] File write failed
[E::bgzf_close] File write failed
samtools sort: failed to create temporary file "/tmp/Taro.contigs.barcoded.sortbx.bam.FfJGIn.0224.bam": No space left on device
samtools sort: failed to create temporary file "/tmp/Taro.contigs.barcoded.sortbx.bam.FfJGIn.0225.bam": No space left on device
samtools sort: failed to create temporary file "/tmp/Taro.contigs.barcoded.sortbx.bam.FfJGIn.0226.bam": No space left on device
samtools sort: failed to create temporary file "/tmp/Taro.contigs.barcoded.sortbx.bam.FfJGIn.0227.bam": No space left on device
samtools sort: failed to create temporary file "/tmp/Taro.contigs.barcoded.sortbx.bam.FfJGIn.0228.bam": No space left on device
samtools sort: failed to create temporary file "/tmp/Taro.contigs.barcoded.sortbx.bam.FfJGIn.0229.bam": No space left on device
samtools sort: failed to create temporary file "/tmp/Taro.contigs.barcoded.sortbx.bam.FfJGIn.0230.bam": No space left on device
samtools sort: failed to create temporary file "/tmp/Taro.contigs.barcoded.sortbx.bam.FfJGIn.0231.bam": No space left on device
time user=478289.20s system=444.58s elapsed=59684.76s cpu=802% memory=4870 job=bwa mem -t8 -pC Taro.contigs.fa barcoded.fq.gz
time user=952.14s system=103.13s elapsed=59684.72s cpu=1% memory=10 job=samtools view -u -F4
time user=3901.95s system=245.10s elapsed=59684.72s cpu=6% memory=6619 job=
make[1]: *** [Taro.contigs.barcoded.sortbx.bam] Error 1
make: *** [Taro.contigs.tigmint.fa] Error 2
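The root cause in the log above is `samtools sort` filling /tmp. samtools sort accepts `-T PREFIX` for its temporary files, so pointing it at a scratch directory with room (or exporting TMPDIR) avoids the default location. The command below is illustrative and only echoed, since it needs the real BAM stream:

```shell
# Point samtools sort's temporary files at a scratch dir with space.
mkdir -p ./sort_tmp
echo samtools sort -@8 -tBX -T ./sort_tmp/Taro -o Taro.contigs.barcoded.sortbx.bam -
```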
Hi @warrenlr,
I'm trying to generate the right formatting for the fastq files. I've not found any example, so I tried to deduce it from the BAM file NA24143_genome_phased_namesorted.bam1.sorted.bam.
Is this the right way?
@NB501445:179:H2YG7BGX7:2:11312:9490:14604_AAAAAAAAAGCAATTT
AAATTAAGTGATATATCCTCTTACGAAGTTGTTTCGGTTTAAAATTATTGTTATTATTTTTTAATGATGTGCAGAAAGTAAAGCAACATGAAGCTTTATTTACTACCAACTTCATTTCCCTTCAATA
+
EEEEEEEEEEEEEEEEEEEEEEEEEE6EEEEEEEEAE<AEA/A/EE/EEAA</EE/EEEEEEE/EEEEEEE//<EE/EEEEEEEEEEEEEE/EAEAEEEAEE<EAE<</<EAEEAEEE</A/E</EE
@NB501445:179:H2YG7BGX7:3:21602:1860:16340_AAAAAAAAATTATAAA
AAAAAAATGTTACCCACAATCCCATGATTTGCCATGCATGTATATAGGGACATATCCAATGATAGACATCGCGTTGTTACCGCGTCAACGGGGCTCTAGTAAGAATAAATTCGCAGTCACAGCTTTG
+
EEEEEEEEE/EE//E/E/E//6A/EE/A</EEAEEEE/E/<E/EEE<///EE//A/EE/EE/AEEEEEAAE6/EE/EA//EEEEEE<EEA</EE/</EE//AEE//<E/EA<AEA</E/AA/E/E/A
and this is the command to align them with bwa:
bwa mem -t32 $ASSEMBLY.fasta $ARCSR1.gz $ARCSR2.gz | samtools view -Sb - | samtools sort -@ $THREADS -n - > $SPECIES.LinkedReads.bam
Correct?
Best
F
Hello arcs team,
We were able to make a very good genome assembly using Tigmint+ARCS. Now we are looking to create an AGP file for our ARCS assembly. Is it possible to create an AGP file from the ARCS assembly output?
Hello,
I used ARCS+LINKS for genome scaffolding. When I choose different c, a, and l values,
I get different assembly results. What is the meaning of c, a, and l? Does a higher value mean better contiguity, or better accuracy?
Thanks,
Fuyou
Hi,
Thanks for your help; I have successfully finished this scaffolding pipeline. Now I want to understand better how ARCS works and which parameters fit my genome. Are there any instructions for this? And may I know when your paper will be published or available online as a preprint?
Best,
Danshu
In the code and the manual it states that the end length being considered is 30 kb by default:
Line 45 in e75619f
But in the place where it is being set, it is set to zero :
Lines 65 to 80 in ecd98d1
And where it is used, that zero is translated to "use half of the contig length" :
Lines 306 to 314 in e75619f
I don't know which is best; the default (0) might work fine for some projects, although I see that setting it to another number works better in the ones I have tested so far.
But I think it is better to either change the default to 30000, or change the manual and code to show the effect of using 0 as the default.
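A small sketch of the rule as described in the report above (assumption: the effective end region is length/2 when e is 0 or the sequence is shorter than 2×e, and e otherwise; contig lengths are made up):

```shell
# Effective end length under the documented half-length rule, for a few
# made-up contig lengths and e=30000.
for len in 10000 50000 100000; do
  awk -v len="$len" -v e=30000 'BEGIN{
    eff = (e == 0 || len < 2*e) ? len/2 : e
    printf "length=%d effective_end=%d\n", len, eff
  }'
done
```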
Hi,
I'm trying to use the ARCS pipeline and everything seems to run without error, but I get an empty original.gv file. I followed a pipeline aimed at the Acropora millepora genome assembly (Zachary L. Fuller, Yi Liao, Line Bay, Mikhail Matz & Molly Przeworski, August 2, 2018).
(1)
First, I used longranger basic to process the 10X reads:
./longranger basic --id=Ass --fastqs=./10XData/ --sample=10X
It got me output: Ass/outs/barcoded.fastq.gz.
Then I converted the fastq file (the one-liner as originally pasted was missing a brace after `if($ct==1)`; restored here):
gunzip -c Ass/outs/barcoded.fastq.gz | perl -ne 'chomp;$ct++;$ct=1 if($ct>4);if($ct==1){if(/(@\S+)\sBX:Z:(\S{16})/){$flag=1;$head=$1."".$2;print "$head\n";}else{$flag=0;}}else{print "$_\n" if($flag);}' > CHROMIUM_interleaved.fastq
The output fastq is just like this:
###############################################
(The first read_1_1)
@CL100089928L2C009R030_475184AAACACCAGACAATAC
AAAGTGATTGCGGTGCTGATTGTTGTAGTCTGTTGTTTACGGTATTCCC........
+
<CFD8D>DFA3@9=DCFC>@,?9C=3BB;FFE8FF0@5F9DEEBDFB..........
(The paired read_1_2)
@CL100089928L2C009R030_475184AAACACCAGACAATAC
ACTGTAGAGAGCAGTTAAGACACTGGTAGGAGAGGAGAGGATTAAT........
+
FEBDFEEED@DD/FFFACEACEEFE9EFDCEEAB>7C.E6EEFFEFEFFEEFF;.....
################################################
The fastq format is correct, but what concerns me is that the barcode is appended to the end of the read name. The arcs output shows that it cannot identify the barcode. I don't know what kind of fastq arcs needs. Is there a problem here?
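One thing worth inspecting: the one-liner above joins read name and barcode with an empty string (`$1."".$2`), so the barcode is fused directly onto the name with no separator. A small sketch of recovering it, assuming a fixed 16-base barcode at the end of the header (the function is illustrative only, not something arcs does internally):

```python
def split_fused_barcode(header: str, barcode_len: int = 16) -> tuple:
    """Recover (read name, barcode) from a header where the barcode was
    concatenated with no separator, as $1."".$2 in the perl one-liner
    does. With an underscore separator it would be rsplit('_', 1)."""
    return header[:-barcode_len], header[-barcode_len:]

name, bc = split_fused_barcode("@CL100089928L2C009R030_475184AAACACCAGACAATAC")
print(name)  # -> @CL100089928L2C009R030_475184
print(bc)    # -> AAACACCAGACAATAC
```

Whether arcs expects a separator between read name and barcode is exactly the question being asked here; this only shows what the current headers contain.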
(2) Then I mapped CHROMIUM_interleaved.fastq to the draft assembly using bwa:
bwa index --> bwa mem --> samtools sort -n .. > 10x.bam
(3) Then I ran ARCS:
./arcs -f Polished.Scaff10x_R2-renamed.fa -a AlignmentFile -s 98 -c 5 -l 0 -d 0 -r 0.05 -e 30000 -m 20-10000 -v > arcs.log
The AlignmentFile contains the path to 10x.bam. The resulting original.gv file has nothing in it. The log file shows:
###############################################
.....
.....
{ "All_barcodes_unfiltered":0, "All_barcodes_filtered":0, "Scaffold_end_barcodes":0, "Min_barcode_reads_threshold":20, "Max_barcode_reads_threshold":10000 }
.....
.....
Max Degree (-d) set to: 0. Will not delete any vertices from graph.
.....
.....
###############################################
Sorry for such a long message. Could you please help with this problem?
Best~
Jing
Hi,
I'm trying to run ARCS at different depths of 10x reads, and I find the scaffold N50 becomes smaller as the read depth increases. Here are the details:
and here is my command:
java -jar trimmomatic-0.33.jar PE -phred33 -threads 4 ${depth}_R1_interleaved.fastq ${depth}_R2_interleaved.fastq ${depth}_trim_1.fq ${depth}_unpair_1.fastq ${depth}_trim_2.fq ${depth}_unpair_2.fq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:75
bwa mem -t 4 p_ctg.rename.fa ${depth}_trim_1.fq ${depth}_trim_2.fq >${depth}.sam
samtools view -bS ${depth}.sam >${depth}_combine.bam
samtools sort --threads 4 -n ${depth}_combine.bam -o ${depth}_combine.sort.bam
arcs -f p_ctg.rename.fa -a alignments.fof -s 80 -c 2 -l 0 -d 0 -r 0.05 -e 30000 -v 1 -m 20-10000 -b ${depth}_combine
python makeTSVfile.py ${depth}_combine_original.gv ${depth}_combine.tigpair_checkpoint.tsv p_ctg.rename.fa
touch empty.fof
LINKS -f p_ctg.rename.fa -s empty.fof -k 20 -b ${depth}_combine -l 5 -t 2 -a 0.9 -x 1
Why did this happen? Is there anything wrong with my commands? Why do the scaffolds become shorter? I hope to get some advice; thanks in advance.
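For comparing runs like these it helps to pin down the N50 being reported; a minimal sketch of the standard definition (not tied to any particular tool's implementation):

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L together
    cover at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

# total = 100, half = 50; 40 + 30 = 70 >= 50, so N50 = 30
print(n50([40, 30, 20, 10]))  # -> 30
```

Note that N50 is sensitive to both joins and breaks: more depth can mean more conflicting evidence and hence more conservative (shorter) scaffolds, which is one way the trend above can arise.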
Hello,
I always get the following error when running arcs:
=>Getting scaffold sizes... Tue Jan 2 10:52:14 2018
=>Starting to read BAM files... Tue Jan 2 10:52:27 2018
samtools: Not a directory
PID 6012 exited with status 1
Thanks.
Hi,
I'm interested in using arcs to scaffold my genome assembly using 10X genomics data. For the first mapping step, bwa mem with default parameters was used in the ABySS 2.0 preprint (http://biorxiv.org/content/early/2016/08/07/068338).
My question is: is "bwa mem with default parameters" enough for mapping Chromium data, or would a specialized linked-read alignment tool such as Lariat improve the final scaffolding results?
Best,
Quan
Hello,
I am trying to install ARCS and I have this issue with Automake:
[copettid@kp141-242 arcs]$ ./autogen.sh
+ aclocal
threads object version 2.21 does not match bootstrap parameter 2.16 at /usr/lib64/perl5/DynaLoader.pm line 210.
Compilation failed in require at /usr/share/automake-1.15/Automake/ChannelDefs.pm line 23.
BEGIN failed--compilation aborted at /usr/share/automake-1.15/Automake/ChannelDefs.pm line 26.
Compilation failed in require at /usr/share/automake-1.15/Automake/Configure_ac.pm line 27.
BEGIN failed--compilation aborted at /usr/share/automake-1.15/Automake/Configure_ac.pm line 27.
Compilation failed in require at /usr/bin/aclocal line 39.
BEGIN failed--compilation aborted at /usr/bin/aclocal line 39.
[copettid@kp141-242 arcs]$ cd ..
[copettid@kp141-242 bin]$ sudo dnf install autoconf automake
[sudo] password for copettid:
Last metadata expiration check: 2:45:43 ago on Fri 29 Mar 2019 09:19:18 AM CET.
Package autoconf-2.69-27.fc28.noarch is already installed, skipping.
Package automake-1.15.1-5.fc28.noarch is already installed, skipping.
Dependencies resolved.
Nothing to do.
Complete!
It should be there, right? I don't understand what is wrong at this point.
Hi @lcoombel,
I ran my first test; how is it possible that I got zero scaffolding throughout? Is this to be expected? Is there anything I can check to confirm this?
Best
F
If I run configure, I get this error:
./configure: line 4899: test: !=: unary operator expected
That line is this one:
configure:if test $ac_cv_header_boost_unordered_unordered_map_hpp != yes; then
I tried to fix it with:
configure:if test "$ac_cv_header_boost_unordered_unordered_map_hpp" != yes; then
But then the configure script kept complaining about missing boost libraries, even if I downloaded the latest version, and placed in the source directory.
Without the fix it compiles fine, by the way.
Unexpected results from using tigmint with arcs pipeline
Ml_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.3.scaffolds.fa -------------
number sequences/scaffolds: 5,084
max sequence length: 1,222,598
min sequence length: 987
median sequence length: 1,764
total number residues: 155,877,573
Scaffold N50: 189805
Ml.tigmint_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.3.scaffolds.fa -------------
number sequences/scaffolds: 14,977
max sequence length: 256,959
min sequence length: 1
median sequence length: 1,469
total number residues: 155,789,698
Scaffold N50: 45459
Why are the results with tigmint so different from arcs alone?
Why does the max scaffold length decrease from over 1,000,000 to 256,000?
Why are there ~3,000 seqs with length < 500 when the min seq length was 987? (Including five seqs of length 1 and three of length 2!)
Also the N50 decreases significantly.
I see this:
Running: arcs 1.0.1
pid 2515
-f XXXXX.fasta
-a fof.in
-s 98
-c 5
-l 0
-z 500
-b XXXXX.fasta.scaff_s98_c5_l0_d0_e0_r0.05
Min index multiplicity: 50
Max index multiplicity: 10000
-d 0
-e 0
-r 0.05
-v 1
=>Getting scaffold sizes... Tue May 16 16:20:13 2017
Saw 31431 sequences.
=>Starting to read BAM files... Tue May 16 16:20:26 2017
Reading bam XXXXX.faa.sorted.bam
On line 10000000
On line 20000000
On line 30000000
<SNIP>
On line 2130000000
On line 2140000000
On line -2140000000
On line -2130000000
On line -2120000000
On line -2110000000
On line -2100000000
On line -2090000000
On line -2080000000
On line -2070000000
On line -2060000000
On line -2050000000
arcs: Arcs.cpp:506: void writeGraph(const string&, ARCS::Graph&): Assertion `out' failed.
=>Starting pairing of scaffolds... Tue May 16 21:14:03 2017
=>Starting to create graph... Tue May 16 21:22:44 2017
=>Starting to write graph file... Tue May 16 21:23:25 2017
Max Degree (-d) set to: 0. Will not delete any verticies from graph.
Writting graph file to XXXXX.fasta.scaff_s98_c5_l0_d0_e0_r0.05_original.gv...
(I have removed the genome-specific info.)
I guess I'm seeing an integer overflow problem; is there an easy fix?
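The line counter flipping from 2,140,000,000 to negative values is the classic signature of a signed 32-bit counter wrapping past 2^31 - 1 = 2,147,483,647. A quick illustration, simulating 32-bit wraparound in pure Python (Python ints themselves never overflow):

```python
def as_int32(n: int) -> int:
    """Interpret n modulo 2**32 as a signed 32-bit integer, the way a
    C/C++ int on a typical platform wraps."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

print(as_int32(2_147_483_647))  # -> 2147483647 (largest positive int32)
print(as_int32(2_147_483_648))  # -> -2147483648 (one past the max wraps negative)
```

Counting the line number in a 64-bit type (e.g. `long long` or `size_t` in the C++ source) would avoid the wrap, assuming the counter is indeed a 32-bit `int`.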
Hi,
I have two Chromium libraries, and they share some barcodes. To scaffold using both libraries, I have to change the barcodes to distinguish between them. I plan to append A to the end of the barcodes in library 1 and C to the end of the barcodes in library 2. So does ARCS require the barcode length to be 16, or does the barcode length not matter?
Best,
Danshu
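The renaming plan above can be sketched trivially (library keys here are placeholders; whether ARCS accepts the resulting 17-character barcodes is exactly the open question):

```python
def tag_barcode(barcode: str, library: str) -> str:
    """Append a per-library suffix so barcodes shared between the two
    Chromium libraries become distinguishable: 'A' for library 1,
    'C' for library 2 (placeholder library keys)."""
    suffix = {"lib1": "A", "lib2": "C"}[library]
    return barcode + suffix

print(tag_barcode("AAACACCAGACAATAC", "lib1"))  # 16-base barcode -> 17 bases ending in A
```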
Hello,
I am running arcs via arcs-make arcs-tigmint.
I am running the command:
./arcs-master/Examples/arcs-make arcs-tigmint draft=draft reads=reads m=40
which I believe runs tigmint-make and arcs-make. Running these with -n prints the following:
For ./arcs-master/Examples/arcs-make arcs-tigmint -n draft=draft reads=reads m=40:
ln -s draft.fasta draft.fa
./tigmint/bin/tigmint tigmint draft=draft reads=reads minsize=2000 as=0.65 nm=5 dist=50000 mapq=0 trim=0 span=20 window=1000 t=40
touch empty.fof
perl -ne 'chomp; if(/>/){$ct+=1; print ">$ct\n";}else{print "$_\n";} ' < draft.tigmint.fa > draft.tigmint.renamed.fa
bwa index draft.tigmint.renamed.fa
sh -c 'bwa mem -t40 -C -p draft.tigmint.renamed.fa reads.fq.gz | /mid_large_2t/important/samtools-1.6/samtools view -Sb - | /mid_large_2t/important/samtools-1.6/samtools sort -@40 -n - -o draft.tigmint.sorted.bam'
echo draft.tigmint.sorted.bam > draft.tigmint_bamfiles.fof
./arcs-master/Arcs/arcs --bx -v -f draft.tigmint.renamed.fa -a draft.tigmint_bamfiles.fof -c 5 -m 40 -s 98 -r 0.05 -e 30000 -z 500 -d 0 --gap 100 -b draft.tigmint_c5_m40_s98_r0.05_e30000_z500
python /mid_large_2t/running_arcs/arcs-master/Examples/makeTSVfile.py draft.tigmint_c5_m40_s98_r0.05_e30000_z500_original.gv draft.tigmint_c5_m40_s98_r0.05_e30000_z500.tigpair_checkpoint.tsv draft.tigmint.renamed.fa
ln -s draft.tigmint_c5_m40_s98_r0.05_e30000_z500.tigpair_checkpoint.tsv draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3.tigpair_checkpoint.tsv
/opt/links_v1.8.6/LINKS -f draft.tigmint.renamed.fa -s empty.fof -b draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3 -l 5 -a 0.3 -z 500
rm draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3.tigpair_checkpoint.tsv
./tigmint/bin/tigmint tigmint -n draft=draft reads=reads minsize=2000 as=0.65 nm=5 dist=50000 mapq=0 trim=0 span=20 window=1000 t=40
bwa index draft.fa
bwa mem -t40 -pC draft.fa reads.fq.gz | /mid_large_2t/important/samtools-1.6/samtools view -u -F4 | /mid_large_2t/important/samtools-1.6/samtools sort -@40 -tBX -m 2G -T tmp/ -o draft.reads.sortbx.bam
tigmint/bin/tigmint-molecule -a0.65 -n5 -q0 -d50000 -s2000 draft.reads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >draft.reads.as0.65.nm5.molecule.size2000.bed
/mid_large_2t/important/samtools-1.6/samtools faidx draft.fa
tigmint/bin/tigmint-cut -p40 -w1000 -n20 -t0 -o draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.fa draft.reads.as0.65.nm5.molecule.size2000.bed
ln -sf draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.tigmint.fa
The arcs-with-tigmint run only gets as far as tigmint-molecule; the tigmint-cut step then produces only a .fa and a .fa.bed, but the .fa is blank. The files are as follows:
-rw-rw-r-- 1 ubuntu ubuntu 697296771 Jan 7 03:20 draft.reads.as0.65.nm5.molecule.size2000.bed
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 7 04:00 draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
-rw-rw-r-- 1 ubuntu ubuntu 342869 Jan 7 04:00 draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed
DETAIL OF fa.bed file out of tigmint-cut
head draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed
0 0 15517657 0-1
0 15517657 15517670 0-2
0 15517670 17147778 0-3
0 17147778 17147785 0-4
0 17147785 23376074 0-5
0 23376074 23376100 0-6
0 23376100 29696924 0-7
0 29696924 29697326 0-8
0 29697326 29697665 0-9
0 29697665 29697943 0-10
tail draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed
204375 0 33106 204375
204415 0 57665 204415
204459 0 25333 204459
204499 0 64113 204499
204541 0 58322 204541
204581 0 3400 204581
204621 0 53185 204621
204661 0 20990 204661
204701 0 60883 204701
204741 0 113995 204741
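To make sense of those BED rows: the columns are contig, start, end, name, and each row corresponds to one output sequence of length end - start; adjacent breakpoints a few bases apart (e.g. 15517657 to 15517670 in the head) would therefore yield tiny sequences. A sketch of that slicing, based on my reading of the BED semantics rather than tigmint's actual code:

```python
def slice_by_bed(sequences: dict, bed_rows: list) -> dict:
    """Cut each contig at the half-open [start, end) intervals of a
    4-column BED (contig, start, end, name), producing one output
    record per row, as tigmint-cut's breaktigs output appears to do."""
    out = {}
    for contig, start, end, name in bed_rows:
        out[name] = sequences[contig][start:end]
    return out

pieces = slice_by_bed(
    {"0": "ACGTACGTACGT"},
    [("0", 0, 8, "0-1"), ("0", 8, 10, "0-2"), ("0", 10, 12, "0-3")],
)
print({k: len(v) for k, v in pieces.items()})  # lengths 8, 2, 2
```

Since the .fa.bed was produced but the .fa is empty, the cut coordinates were computed but the sequence-writing step apparently produced nothing, which is consistent with the downstream files being empty too.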
If you don't mind, could you please let me know what I may be doing wrong? I am unable to get a .fa file after tigmint-cut, only the .fa.bed file. Am I missing anything in this command? And since draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa is the input for arcs-tigmint, all downstream outputs are empty:
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 5 19:17 draft.tigmint.renamed.fa
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 5 19:17 draft.tigmint.renamed.fa.pac
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 5 19:17 empty.fof
Thanks,
Dharm
Hi,
I'm trying to use the arcs pipeline and everything seems to run without error. I'm not getting any improvement in overall genome contiguity, but I'm not necessarily attributing this to the parameters themselves. What I don't quite understand yet is the runLINKS.sh step: given the script itself, how are the .gv and .tsv files being used by LINKS? Is there an example log file available to determine whether the program is functioning as expected?
Hi there :).
I work on a highly heterozygous diploid organism. I have performed a Supernova assembly, and would like to use arcs+links to improve it. However, there are several fasta output options from supernova to choose from. I imagine I could either run arcs+links separately on both haplotype fasta files or once on the raw fasta file. Is there a "correct" way to do this?
Hello,
I am trying to run arcs with a BAM file (generated by 10x longranger align), but I get the following error. Below is the command I used to run arcs.
command:
arcs -f genome.fa -a possorted_bam.bam -b arcs_out
=>Getting scaffold sizes... Sun Feb 26 17:16:31 2017
=>Starting to read BAM files... Sun Feb 26 17:16:36 2017
Could not open @hd VN:1.3 SO:coordinate. --fatal.
Hi,
There are a few lines in the script mentioned above that have leading tabs. It would be a good idea to replace the tabs with spaces so the Python interpreter works correctly.
Hello,
I preprocessed my 10x paired reads with the https://github.com/tiramisutes/proc10xG scripts and then ran ARCS as follows:
arcs -f genome.fa -a BAM.list -s 98 -c 5 -m 50-1000 -d 0 -r 0.05 -e 30000 -b XXXX
The run completes normally without any error, but the final XXXX_original.gv file is empty. What happened?
Running: arcs 1.0.1
pid 83323
-c 5
-d 0
-e 30000
-l 0
-m 50-1000
-r 0.05
-s 98
-v 0
-z 500
--gap=100
-b AS_2BESST
-g NA
--barcode-counts=NA
--tsv=NA
-a BAM.list
-f genome.fa
=>Getting scaffold sizes... Tue Jan 2 10:59:29 2018
=>Starting to read BAM files... Tue Jan 2 10:59:36 2018
{ "All_barcodes_unfiltered":0, "All_barcodes_filtered":0, "Scaffold_end_barcodes":0, "Min_barcode_reads_threshold":50, "Max_barcode_reads_threshold":1000 }
=>Starting to write reads per barcode TSV file... Tue Jan 2 13:17:14 2018
=>Starting pairing of scaffolds... Tue Jan 2 13:17:14 2018
=>Starting to create graph... Tue Jan 2 13:17:14 2018
=>Starting to write graph file... Tue Jan 2 13:17:14 2018
Max Degree (-d) set to: 0. Will not delete any vertices from graph.
Writing graph file to XXXX_original.gv...
=>Starting to create ABySS graph... Tue Jan 2 13:17:14 2018
=>Starting to write ABySS graph file... Tue Jan 2 13:17:14 2018
=>Starting to write TSV file... Tue Jan 2 13:17:14 2018
Hello!
I am trying to use arcs to scaffold PacBio draft genome sequences (draft.fa) using 10X Genomics reads (reads.fq.gz).
So far I have used the tigmint protocol with default options and it was OK:
samtools faidx draft.fa
bwa index draft.fa
bwa mem -t20 -p -C draft.fa reads.fq.gz | samtools sort -@20 -tBX -T /tmp_brew -o draft.reads.sortbx.bam
tigmint-molecule draft.reads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >draft.reads.molecule.bed
tigmint-cut -p20 -o draft.tigmint.fa draft.fa draft.reads.molecule.bed
However, at the arcs step I got this error message: "error: `@HD VN:1.5 SO:unknown': No such file or directory"
arcs -f draft.tigmint.fa -a draft.reads.sortbx.bam
Running: arcs 1.0.3
pid 13640
-c 5
-d 0
-e 30000
-l 0
-m 50-10000
-r 0.05
-s 98
-v 0
-z 500
--gap=100
-b draft.tigmint.fa.scaff_s98_c5_l0_d0_e30000_r0.05
-g draft.tigmint.fa.scaff_s98_c5_l0_d0_e30000_r0.05.dist.gv
--barcode-counts=NA
--tsv=NA
-a draft.reads.sortbx.bam
-f draft.tigmint.fa
=> Getting scaffold sizes... Mon Jul 2 14:37:54 2018
=> Reading alignment files... Mon Jul 2 14:38:02 2018
error: `@HD VN:1.5 SO:unknown': No such file or directory
Could you help me, or recommend how to handle these errors?
I am currently working on scaffolding a plant genome with ARCS, but I got blocked because there is no content in the output graph .gv file from ARCS (graph{}).
I attached the barcode index to the read names and sorted the alignment file (bwa 0.7.15) by read name. I've tried both 15- and 85-fold coverage of 10X reads, but neither works. There is no error reported, and the log file ends with "Done!"
The reference data set came from a Supernova assembly; it is noteworthy that only the scaffolds from one specific chromosome were used as input (we have done genetic mapping). I wonder if this could trigger the problem?
The human genome test dataset worked quite well, so I bet there is no problem with the installation.
Here is my command line for running ARCS:
arcs -f ./10X_asm_megabubbles_2_2.fasta -a bam.list -s 98 -c 5 -l 0 -z 1000 -m 5-10000 -d 0 -e 30000 -r 0.1 -v 1
I've tried -s 90; it didn't work either.
I have a problem installing Arcs caused by the boost library. When I run ./configure I get this error:
[...]
Making install in Arcs
make[1]: Entering directory `/pica/h1/vpeona/arcs/Arcs'
g++ -DHAVE_CONFIG_H -I. -I.. -I/home/vpeona/arcs/Arcs -I/home/vpeona/arcs/Common -I/home/vpeona/arcs/DataLayer -I/home/vpeona/arcs -I/home/vpeona/arcs -isystem../boost_1_61_0/ -isystem/home/vpeona/arcs/1_58_0 -Wall -Wextra -Werror -std=c++0x -fopenmp -g -O2 -MT arcs-Arcs.o -MD -MP -MF .deps/arcs-Arcs.Tpo -c -o arcs-Arcs.o `test -f 'Arcs.cpp' || echo '/home/vpeona/arcs/Arcs/'`Arcs.cpp
In file included from /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/backward/hash_set:60,
from /usr/include/boost/graph/adjacency_list.hpp:25,
from /usr/include/boost/graph/undirected_graph.hpp:11,
from Arcs.h:20,
from Arcs.cpp:1:
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/backward/backward_warning.h:28:2: error: #warning This file includes at least one deprecated or antiquated header which may be removed without further notice at a future date. Please use a non-deprecated interface with equivalent functionality instead. For a listing of replacement headers and interfaces, consult the file backward_warning.h. To disable this warning use -Wno-deprecated.
Arcs.cpp: In function ‘void writeGraph(const std::string&, ARCS::Graph&)’:
Arcs.cpp:505: error: ‘write_graphviz_dp’ is not a member of ‘boost’
make[1]: *** [arcs-Arcs.o] Error 1
make[1]: Leaving directory `/pica/h1/vpeona/arcs/Arcs'
make: *** [install-recursive] Error 1
I also downloaded the boost library and specified the path to configure:
./configure --with-boost=../boost_1_61_0/ && make install
It's still not working. I'm using gcc 4.4.7. Do you have any suggestions?
Hello
I ran supernova using:
supernova run --id=Yuc27M --fastqs=analyzing_fastq --maxreads=1080624478 --localcores=64
and got the phased fasta output using the following command:
supernova mkoutput --style=pseudohap2 --asmdir=EG9-30/outs/assembly --outprefix=/mid_large_2t/second_10xdenova_assembly_EG9_30/EG9-30/newoutput_FASTA/EG9-30_10xgenomics_consensus --index
The output is:
Yuc27M_10xgenomics_consensus.1.fasta
Yuc27M_10xgenomics_consensus.2.fasta
The contig lengths weren't satisfactory, so we would like to extend them using Tigmint with ARCS and LINKS.
For Tigmint with ARCS and LINKS, the following command was first used to generate the linked-read fastq files:
longranger basic --id=sample_Yuc27M_1_arcs --fastqs=samples_for_10x --localcores=64
The reads from longranger basic were renamed reads.fq.gz, and Yuc27M_10xgenomics_consensus.1.fasta from the supernova output was renamed draft.fa. Then arcs-make arcs-tigmint was run using the following command:
./arcs-master/Examples/arcs-make arcs-tigmint draft=draft reads=reads
The first run used the LINKS default a=0.3, and I also ran two additional trials with a=0.6 and a=0.9 for the LINKS -a flag. N50 increases going from a=0.3 to a=0.9, but compared with the supernova phased fasta output the result is the opposite: N50 decreased relative to the original supernova contigs. The abyss-fac results for the original supernova assembly Yuc27M_10xgenomics_consensus.1.fasta and the three LINKS outputs (a = 0.3, 0.6, 0.9) are:
draft.tigmint_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.3.scaffolds.fa
draft.tigmint_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.6.scaffolds.fa
draft.tigmint_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.9.scaffolds.fa
abyss-fac Yuc27M_10xgenomics_consensus.1.fasta
n n:500 L50 min N80 N50 N20 E-size max sum name
15157 15121 40 815 5973469 18.56e6 34.71e6 22.1e6 59.6e6 2.476e9 Yuc27M_10xgenomics_consensus.1.fasta
abyss-fac draft.tigmint_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.3.scaffolds.fa
n n:500 L50 min N80 N50 N20 E-size max sum name
15600 15057 92 518 2187088 7011644 16.15e6 9827444 41.75e6 2.476e9 draft.tigmint_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.3.scaffolds.fa
abyss-fac draft.tigmint_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.6.scaffolds.fa
n n:500 L50 min N80 N50 N20 E-size max sum name
15018 14475 84 518 2785433 8790483 17.73e6 10.68e6 41.76e6 2.476e9 draft.tigmint_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.6.scaffolds.fa
abyss-fac draft.tigmint_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.9.scaffolds.fa
n n:500 L50 min N80 N50 N20 E-size max sum name
14619 14076 72 518 3029632 9622543 21.32e6 12.74e6 52.08e6 2.476e9 draft.tigmint_c5_m50-10000_s98_r0.05_e30000_z500_l5_a0.9.scaffolds.fa
My question is: am I missing anything while running arcs-make arcs-tigmint that may be shortening the contigs?
I ran arcs-make with all default parameters, except for the LINKS -a values mentioned above (0.3/0.6/0.9):
# tigmint Parameters
minsize=2000
as=0.65
nm=5
dist=50000
mapq=0
trim=0
span=20
window=1000
# bwa Parameters
t=40
# ARCS Parameters
c=5
m=50-10000
z=500
s=98
r=0.05
e=30000
D=false
dist_upper=false
d=0
gap=100
B=20
# LINKS Parameters
l=5
a=0.3
Are there any other parameters I can change that would increase the contig lengths and also decrease misassemblies?
Your input is greatly appreciated.
Thanks
Hi,
Right now, there is no link to the paper or the LINKS software, which makes it a bit harder to use your software.
Dag,
Jan
For other people:
Paper:
https://doi.org/10.1101/100750
LINKS:
http://www.bcgsc.ca/platform/bioinfo/software/links
https://github.com/warrenlr/LINKS
Hi!
Thanks for releasing arcs, I've started trying it out and it looks promising. I'm wondering if you would consider picking up the Chromium barcode from the BX tag instead of from the read name? According to the 10x Genomics documentation, this is where the verified barcode information should be placed.
With kind regards,
Johan Dahlberg
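For reference, pulling a BX tag out of a SAM record is straightforward; a stdlib-only sketch that parses SAM text directly (rather than using a BAM library, which arcs itself would do):

```python
def get_bx_tag(sam_line: str):
    """Return the value of the BX:Z: tag from a tab-separated SAM
    alignment line, or None if the read carries no BX tag.
    Optional tags start at field 12 (index 11)."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("BX:Z:"):
            return field[5:]
    return None

rec = "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\tACGT\tFFFF\tNM:i:0\tBX:Z:AAACACCAGACAATAC-1"
print(get_bx_tag(rec))  # -> AAACACCAGACAATAC-1
```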