Giter VIP home page Giter VIP logo

Comments (9)

francicco avatar francicco commented on August 19, 2024

After the alignment ARCS returns me this error

###### Fri 30 Nov 15:35:27 GMT 2018: Running ARCS
Running: arcs 1.0.5
 pid 161016
 -c 5
 -d 0
 -e 30000
 -l 0
 -m 50-10000
 -r 0.05
 -s 98
 -v 0
 -z 500
 --gap=100
 -b ID1090_1_renamed_bcfrac066_pseudohap_1kb.fasta.scaff_s98_c5_l0_d0_e30000_r0.05
 -g ID1090_1_renamed_bcfrac066_pseudohap_1kb.fasta.scaff_s98_c5_l0_d0_e30000_r0.05.dist.gv
 --barcode-counts=NA
 --tsv=NA
 -a Pdid.LinkedReads.Aligned.sortedByCoord.out.bam
 -f ID1090_1_renamed_bcfrac066_pseudohap_1kb.fasta

=> Getting scaffold sizes... Fri Nov 30 15:35:27 2018

=> Reading alignment files... Fri Nov 30 15:35:28 2018
error: `@SQ	SN:bcfrac066.1	LN:1327': No such file or directory

from arcs.

lcoombe avatar lcoombe commented on August 19, 2024

Hi @francicco,

The input fastq files can be either in the format you show above, or the barcode can also be in a comment of the read header (@Name BX:Z:barcode). So, you can just use the reads generated by longranger basic. If you keep the barcode in the BX comment, you just need to add the -C option to the bwa mem command so that the barcode will be placed in the BX tag of the SAM entries.

@NB501445:179:H2YG7BGX7:2:11312:9490:14604 BX:Z:AAAAAAAAAGCAATTT-1
AAATTAAGTGATATATCCTCTTACGAAGTTGTTTCGGTTTAAAATTATTGTTATTATTTTTTAATGATGTGCAGAAAGTAAAGCAACATGAAGCTTTATTTACTACCAACTTCATTTCCCTTCAATA
+
EEEEEEEEEEEEEEEEEEEEEEEEEE6EEEEEEEEAE<AEA/A/EE/EEAA</EE/EEEEEEE/EEEEEEE//<EE/EEEEEEEEEEEEEE/EAEAEEEAEE<EAE<</<EAEEAEEE</A/E</EE
@NB501445:179:H2YG7BGX7:3:21602:1860:16340 BX:Z:AAAAAAAAATTATAAA-1
AAAAAAATGTTACCCACAATCCCATGATTTGCCATGCATGTATATAGGGACATATCCAATGATAGACATCGCGTTGTTACCGCGTCAACGGGGCTCTAGTAAGAATAAATTCGCAGTCACAGCTTTG
+
EEEEEEEEE/EE//E/E/E//6A/EE/A</EEAEEEE/E/<E/EEE<///EE//A/EE/EE/AEEEEEAAE6/EE/EA//EEEEEE<EEA</EE/</EE//AEE//<E/EA<AEA</E/AA/E/E/A

You can also leave the input read files interleaved if you wish (default of longranger basic), and would just need to add -p to the bwa mem command to make sure the reads are paired properly.

As for the error, the input file to -a is expected to be a text file listing the path to all the alignment files you wish to use. It looks like you supplied the BAM file itself.

-a, --fofName=FILE    text file listing input SAM/BAM filenames

As a side note we do have a Makefile (Examples/arcs-make) which runs the whole ARCS pipeline, taking the reads and the draft assembly as input. Even if you don't want to use the Makefile, I'd recommend taking a look to get an idea of the required steps.

Hope that helps!
Lauren

from arcs.

francicco avatar francicco commented on August 19, 2024

Hi @lcoombe,

Sorry, I was a bit lat with the test. I'm still playing around with some tests to check the pipeline.
Now I'm mapping the interlaced reads with BWA.

[M::process] 2819384 single-end sequences; 0 paired-end sequences

It looks like BWA doesn't recognise the paired ends with are formatted like this

@D00352:465:CD1CHANXX:2:2209:5496:86527/1_AAAAAAAAAAATGAAC
TAAGTCAACTAATCAACTAGATTCTCAGGTTCCCTCTGCCTACCCTGCTATGTGCGGTATACAGCGTGAAGCTAAAAAAAGCCAACTTATCAACTATGTCGT
+
BB<BF//</<B/<FFF/BB<FF/<F/B</B/FFFB///<<F/B<B/</<7/7F<FF7/B7BBFBF7FFFF<<<///7<F///////////BB77/////7/7
@D00352:465:CD1CHANXX:2:2209:5496:86527/2_AAAAAAAAAAATGAAC
CTAAATATGTTTTTTGTTTCCTAAGTGTGTGATTAGCACTTTAAAACTTCAAAAAAGATTTTTTTATGCGTAAAAAAGCACGATGTTAAATTCTGGGATTCAACACGTACCTAATATCTTAAATT
+
/BBB//FFBBBFFF/F<<FFBB/<FBFB/F/<FFFF</<FF<F<<<FFBF<FFF//7F<<FFFFF////<//</<BFB<//F//F/<BFFFFFF/7/<FB/77/7/7<B/BBF<FFFF<<FBBFF

It that normal?
F

from arcs.

lcoombe avatar lcoombe commented on August 19, 2024

Hi @francicco - Could you post your most recent bwa mem command? Are you using the -p parameter?

from arcs.

francicco avatar francicco commented on August 19, 2024

That's the command

bwa mem -t$THREADS $SPECIES.fa -p $ARCSR.fastq.gz | samtools view -Sb - | samtools sort -@ $THREADS -n - > $SPECIES.LinkedReads.Aligned.sortedByCoord.out.bam

from arcs.

francicco avatar francicco commented on August 19, 2024

It is actually sorted by read. I just named it wrong.
F

from arcs.

lcoombe avatar lcoombe commented on August 19, 2024

@francicco - I believe it is due to the \1 and \2 suffixes that you added to the read header before appending the barcode sequence. I did a small test case, and this interfered with bwa mem's smart pairing (-p).
I suggest either going back to your original post-longranger fastq file (formatted as I showed in my initial comment) or removing those characters from the read headers so that the 2 read headers of a read pair are identical.

$ head -n8 lr.arcsformat.fq 
@gi|453231901|ref|NC_003280.10|_12670490_12670606_0_1_0_0_0:0:0_0:0:0_5db95_AAACACCTCAGGCTGC
GAAAAGTTTCGTGTACTCCACACGGACACATACATTTAGTTTTAAAACTAAAATCAAGCCGCGACGCGACACGCAACGCGCCGTAAATCTACCCCAGATATGGCCGAGCAAAAATGGCCTAGTTCGGC
+
GFIEHGEFEDEGCEDBCCDB@CE=HBABBCABBADABBBB@@B>BACAA@<@AACCB?D@A@@@A@@??><BC??>?<A<B??A?><>==A;=??;B@==AB>>?=?>>=?><>;;@=<;=?>=>=?=
@gi|453231901|ref|NC_003280.10|_12670490_12670606_0_1_0_0_0:0:0_0:0:0_5db95_AAACACCTCAGGCTGC
CACTGTGCTTCACCACTTCATCACCCGCCGATTCTTTTGAAAAAAAAAATTTCTCACATACACATATATTGAGAGTAAAAGTCACAAGCCACGGATTTCTGGCTTCTCTCATAAATTGAAATGGAAGAGTTTGCCGAACTAGGCCATTTTT
+
GIHIIHGICFIHFEIIDCEDDDCEEEDDDCCEEACFBBBBECDBECCBAAAA>@ABAA@ABAAA@A?BB<B@@@D@@E>A??=?@?@B>@?>B>@>??>CB=A>A@?>=<A<>>?>>>B>>BA>>>>>@<?=>====?<==>?<====;?=
$ bwa mem -p /projects/btl_scratch/lcoombe/barcode-assemblies/unitig-graph/celegans/LRSIM-noVariantData/data/celegansN2_referencegenome.fna lr.arcsformat.fq  > test.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 25 sequences (3476 bp)...
[M::process] 1 single-end sequences; 24 paired-end sequences
...
$ head -8 lr.arcsformat.subscr.fq 
@gi|453231901|ref|NC_003280.10|_12670490_12670606_0_1_0_0_0:0:0_0:0:0_5db95/1_AAACACCTCAGGCTGC
GAAAAGTTTCGTGTACTCCACACGGACACATACATTTAGTTTTAAAACTAAAATCAAGCCGCGACGCGACACGCAACGCGCCGTAAATCTACCCCAGATATGGCCGAGCAAAAATGGCCTAGTTCGGC
+
GFIEHGEFEDEGCEDBCCDB@CE=HBABBCABBADABBBB@@B>BACAA@<@AACCB?D@A@@@A@@??><BC??>?<A<B??A?><>==A;=??;B@==AB>>?=?>>=?><>;;@=<;=?>=>=?=
@gi|453231901|ref|NC_003280.10|_12670490_12670606_0_1_0_0_0:0:0_0:0:0_5db95/2_AAACACCTCAGGCTGC
CACTGTGCTTCACCACTTCATCACCCGCCGATTCTTTTGAAAAAAAAAATTTCTCACATACACATATATTGAGAGTAAAAGTCACAAGCCACGGATTTCTGGCTTCTCTCATAAATTGAAATGGAAGAGTTTGCCGAACTAGGCCATTTTT
+
GIHIIHGICFIHFEIIDCEDDDCEEEDDDCCEEACFBBBBECDBECCBAAAA>@ABAA@ABAAA@A?BB<B@@@D@@E>A??=?@?@B>@?>B>@>??>CB=A>A@?>=<A<>>?>>>B>>BA>>>>>@<?=>====?<==>?<====;?=
$ bwa mem -p /projects/btl_scratch/lcoombe/barcode-assemblies/unitig-graph/celegans/LRSIM-noVariantData/data/celegansN2_referencegenome.fna lr.arcsformat.subscr.fq > test.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 25 sequences (3476 bp)...
[M::process] 25 single-end sequences; 0 paired-end sequences
...

Also, you are right that if bwa mem doesn't do proper paired-end alignments that this will interfere with ARCS -- one of the filters that ARCS uses when reading the input BAM(s) is checking that read pairs are mapping in a proper pair.

from arcs.

francicco avatar francicco commented on August 19, 2024

Ok, thanks! I'll try to remove them from the header!
Thanks.
F

from arcs.

lcoombe avatar lcoombe commented on August 19, 2024

Sounds good - I'll close this, but feel free to re-open or open another issue if you have any other questions.

Thank you for your interest in ARCS!
Lauren

from arcs.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.