
rails's Introduction


RAILS v1.5.1 and Cobbler v0.6.1

Rene L. Warren, 2014-present

Contents


  1. Name
  2. Description
  3. What's new
  4. Implementation and requirements
  5. Community guidelines
  6. Installation
  7. Dependencies
  8. Test data
  9. Citing RAILS/Cobbler
  10. Usage
  11. Algorithm
  12. Runs on human
  13. License preamble

Name


RAILS: Radial Assembly Improvement by Long Sequence Scaffolding

Cobbler: Gap-filling with long sequences

Description


RAILS and Cobbler are genomics applications for scaffolding and automated finishing of genome assemblies with long DNA sequences. They can be used to scaffold and finish high-quality draft genome assemblies with any long, preferably high-quality, sequences such as scaftigs/contigs from another genome draft.

They both rely on accurate, long DNA sequences to patch gaps in existing genome assembly drafts.

Cobbler is a utility to automatically patch gaps (ambiguous regions in a draft assembly, represented by N's). It does so by first aligning the long sequences to the assembly, tallying the alignments, and replacing the N's with bases from these long DNA sequences.

RAILS is an all-in-one scaffolder and gap-filler. Its process is similar to that of Cobbler. It scaffolds your genome draft with the help of long DNA sequences (contig sequences are ordered/oriented using alignment information). The newly created gaps are automatically filled with the DNA sequence of the provided long DNA sequence.

You can test the software by executing "runme.sh" in the test folder. A simulated SARS genome assembly is provided to test the software.

What's new in v1.5.1

Remove requirement on samtools when running in "stream" mode

What's new in v1.5.0

Ability to stream the .sam output of your favorite aligner directly into cobbler/RAILS (tested with minimap2/human data -- see runRAILSminimapSTREAM.sh)

What's new in v1.4.2

Improved documentation, minor fixes, support for minimap2 (see runRAILSminimap.sh in the test folder)

What's new in v1.4.1

  1. Save in memory the gap sequence from the highest-matching read, for both cobbler and RAILS
  2. Track the number of supporting reads in cobbler (-l) and RAILS, and allow a cutoff when scaffolding (-l and -a) in the latter (RAILS)
  3. Remove the hardcoded two-hit requirement for a read in RAILS. Instead, process the two best hits for each read aligning to different sequences
  4. Implement the grace (-g) option, which effectively simulates read trimming (valuable for Nanopore read mapping; suggested -g 250 to -g 500)
  5. Bug fixes: the cobbler -list.tsv file reported some gap-fill regions that were not fixed in the assembly. The cobbler gap-fill table now lists the number of supporting reads for each gap filled

Implementation and requirements


RAILS and Cobbler are implemented in PERL and run on any OS where PERL is installed. Both tools require samtools (tested with v1.8) to read sequence alignment BAM files; as of v1.5.1, samtools is not required when running in "stream" mode. The runRAILS.sh pipeline requires bwa (see Dependencies below for the tested version). The runRAILSminimap.sh and runRAILSminimapSTREAM.sh pipelines require minimap2. Please make sure these tools are in your PATH before running the above pipelines.

Community guidelines


I encourage the community to contribute to the development of this software, by providing suggestions for improving the code and/or directly contributing to the open source code for these tools. Users and developers may report software issues, bug fix requests, comments, etc, at https://github.com/warrenlr/RAILS

Installation


Download the tarball, then decompress and extract the files on your system using:

gunzip rails_v1-5-1.tar.gz
tar -xvf rails_v1-5-1.tar

Please ensure that both cobbler.pl and RAILS are in your PATH.

Alternatively, individual tools are available for download/cloning within the GitHub repository.

Dependencies


Make sure you have installed bwa (Version: 0.7.15-r1140) or minimap2 (2.15-r905) and that they are in your PATH. Make sure you have installed samtools (Version: 1.8) and that it is in your PATH.

Other versions of bwa, minimap2 and samtools may or may not be compatible; they have not been tested. Users may choose to use versions other than the ones specified here, as they see fit, but are expected to thoroughly test the behavior on their own.

Compatible tools may be used, but have not been tested fully (e.g. sambamba).
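
A quick way to confirm that the tools above are reachable before launching a pipeline is sketched below. The helper name missing_tools is hypothetical (it is not shipped with RAILS/Cobbler); it relies only on the POSIX `command -v` builtin.

```shell
# missing_tools: print the name of every listed tool that is NOT found in PATH.
# Hypothetical helper for illustration; not part of the RAILS distribution.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools bwa minimap2 samtools cobbler.pl RAILS` should print nothing when everything is installed; `samtools --version` then reports the installed version for comparison with the tested one.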

Test data


Go to ./test
(cd test)

You may need to change both runme.sh and runmeHuman.sh to specify the path of samtools on your system

1. SARS:
execute runme.sh
(./runme.sh)

2. Human:
execute runmeHuman.sh (will take a while to run with bwa mem (~12h). With minimap2, this test will take ~1h.)
(./runmeHuman.sh)

Citing RAILS/Cobbler


Thank you for your Stars and for using, developing and promoting this free software!

If you use RAILS or Cobbler for your research, please cite:

Warren RL. 2016. RAILS and Cobbler: Scaffolding and automated finishing
of draft genomes using long DNA sequences. The Journal of Open Source
Software. doi: 10.21105/joss.00116


Usage


./runRAILS.sh
Usage: runRAILS.sh     

this pipeline will:
1. reformat the assembly file $1
2. rename the long sequence file $2
3. Build a database index with bwa
4. Align the reformatted long sequences to your re-formatted baseline assembly
5. Run Cobbler to gap-fill regions of ambiguity
6. Reformat Cobbler's .fa file
7. Build a database index of it with bwa
8. Align the reformatted long sequences to your re-formatted cobbler assembly
9. Run RAILS to generate a newly scaffolded assembly draft

Usage: ./cobbler.pl [v0.6.1]
-f  Assembled Sequences to further scaffold (Multi-FASTA format NO LINE BREAKS, required)
-q  File of filenames containing long Sequences queried (Multi-FASTA format NO LINE BREAKS, required)
-s  File of filenames containing full path to BAM file(s) (use v0.2 for reading SAM files) or simply type: stream for streaming the .sam output of minimap2 or favorite aligner
-p  Full path to samtools (known to work/tested with v1.8, required if reading BAM files)
-d  Anchoring bases on contig edges (ie. minimum required alignment size on contigs, default -d 1000, optional)
-i  Minimum sequence identity fraction (0 to 1), default -i 0.9, optional
-l  Minimum number of long sequence support per gap, default -l 1, optional
-g  Grace length (bp), default -g 1, optional
-t  LIST of names/header, long sequences to avoid using for merging/gap-filling scaffolds (optional)
-b  Base name for your output files (optional)
-v  Runs in verbose mode (-v 1 = yes, default = no, optional)
IMPORTANT: the order of files in -q and -s MUST match!


Usage: ./RAILS [v1.5.1]
-f  Assembled Sequences to further scaffold (Multi-Fasta format, required)
-q  File of filenames containing long Sequences queried (Multi-Fasta format, required)
-s  File of filenames containing full path to BAM file(s) or simply type: stream for streaming the .sam output of minimap2 or favorite aligner
-p  Full path to samtools (known to work/tested with v1.8, required if reading BAM files)
-d  Anchoring bases on contig edges (ie. minimum required alignment size on contigs, default -d 1000, optional)
-i  Minimum sequence identity fraction (0 to 1), default -i 0.9, optional
-t  LIST of names/header, long sequences to avoid using for merging/gap-filling scaffolds (optional)
-l  Minimum number of links to compute scaffold (default -l 1, optional)
-a  Maximum link ratio between two best contig pairs *higher values lead to least accurate scaffolding* (default -a 0.99, optional)
-g  Grace length (bp), default -g 1, optional
-b  Base name for your output files (optional)
-v  Runs in verbose mode (-v 1 = yes, default = no, optional)
IMPORTANT: the order of files in -q and -s MUST match!
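
The -q and -s options take a file of filenames rather than the data files themselves. A minimal sketch for building such files follows. It assumes one absolute path per line; the helper name make_fof and all file names are illustrative and not part of the distribution.

```shell
# make_fof OUT FILE...: write one absolute path per line to OUT.
# Hypothetical helper; the one-path-per-line layout is an assumption.
make_fof() {
    out=$1
    shift
    : > "$out"                      # create or truncate the output file
    for f in "$@"; do
        case $f in
            /*) printf '%s\n' "$f" ;;            # already absolute
            *)  printf '%s/%s\n' "$PWD" "$f" ;;  # make relative paths absolute
        esac >> "$out"
    done
}

# Example (not run here): list the files in the same order for -q and -s.
#   make_fof reads.fof lib1.fa  lib2.fa
#   make_fof bams.fof  lib1.bam lib2.bam
#   cobbler.pl -f assembly.fa -q reads.fof -s bams.fof -p /path/to/samtools -d 1000 -i 0.9 -b myrun
```

Building both files from the same ordered list is one way to honor the requirement that the order of files in -q and -s match.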


Algorithm


The pipeline is detailed in the provided script runRAILS.sh. PLEASE ensure the draft assembly is FASTA-formatted with each sequence on a single line (NO LINE BREAKS within a sequence).
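
If your assembly has wrapped sequence lines, it can be converted to the required layout with a short awk filter. This is a sketch rather than the reformatting step used inside runRAILS.sh; the function name linearize_fasta is an assumption.

```shell
# linearize_fasta [FILE...]: emit each FASTA record as one header line
# followed by one sequence line. Reads stdin when no file is given.
linearize_fasta() {
    awk '
        /^>/ { if (seq != "") print seq; print; seq = ""; next }  # flush previous record
        { sub(/\r$/, ""); seq = seq $0 }   # drop a trailing CR, then append the line
        END { if (seq != "") print seq }   # flush the last record
    ' "$@"
}
```

Usage: `linearize_fasta draft.fa > draft.oneline.fa`, where both file names are placeholders.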

Cobbler's process:

The assembly draft sequence supplied to Cobbler is first broken up at the ambiguous regions of the assembly (Ns) to create scaftigs. In the runRAILS.sh, these scaftigs are renamed, tracking their scaffold of origin (renumbered incrementally) and their position within it (also numbered incrementally). A bwa index is created and the long sequence file, also re-numbered, is aligned to the scaftigs. Cobbler is supplied with the alignment file (-s sam file) and the long reads files (-q option), specifying the minimum length of anchoring bases (-d) aligning at the edge of scaftigs and the minimum sequence identity of the alignment (-i). When 1 or more long sequences align unambiguously to the 3'end of a scaftig and the 5'end of its neighbour, the gap is patched with the sequence of that long sequence. If no long sequences are suitable, or the -d and -i conditions are not met, the original Ns are placed back between those scaftigs.
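
The first step, breaking scaffolds into scaftigs at runs of N, can be sketched as follows. This is an illustration of the idea only, not Cobbler's own code: the function name split_at_gaps and the scaffoldX.Y naming are assumptions, and the input is expected to have one sequence per line.

```shell
# split_at_gaps [FILE...]: split every sequence at runs of N/n and print
# the pieces as scaftigs named scaffold<scaffold index>.<piece index>.
split_at_gaps() {
    awk '
        /^>/ { scaffold++; next }               # count scaffolds, drop original header
        {
            n = split($0, part, "[Nn]+")        # pieces between gap runs
            idx = 0
            for (i = 1; i <= n; i++) {
                if (part[i] == "") continue     # skip empties from leading/trailing Ns
                idx++
                printf ">scaffold%d.%d\n%s\n", scaffold, idx, part[i]
            }
        }
    ' "$@"
}
```

Numbering both the scaffold and the position within it keeps enough information to put the pieces back in order, with either patched sequence or the original Ns between them.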

RAILS process:

In RAILS, the process is similar to Cobbler's, except that the draft assembly is not broken up at Ns, since the goal is to merge distinct sequences into larger ones. Long sequences are aligned to the draft assembly sequences, orienting and ordering sequences and simultaneously filling the gaps between them, using DNA bases from the long sequences.

Scaffolding in RAILS is done using the LINKS scaffolder code (Warren et al. 2015), the unpublished scaffolding engine in the widely-used SSAKE assembler (Warren et al. 2007), and the foundation of the SSPACE-LongRead scaffolder (Boetzer and Pirovano, 2014).

The grace (-g) parameter may be used to set the MAXIMUM length of unaligned bases allowed at the end of each (long) sequencing read alignment to the draft genome assembly. For example, setting -g 250 tells cobbler/RAILS to consider a sequencing read with a soft-clip of up to 250 bp at its 5' or 3' end.
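
One way to picture the grace check is as a test on the soft-clipped ends of a SAM CIGAR string. The sketch below is an interpretation of the description above, not the code used by cobbler/RAILS; the function name within_grace is hypothetical.

```shell
# within_grace CIGAR GRACE: succeed (exit 0) when the soft-clips at both
# ends of the alignment are no longer than GRACE bp.
within_grace() {
    awk -v cigar="$1" -v grace="$2" 'BEGIN {
        lead = 0; trail = 0
        # leading soft-clip, e.g. the "100S" in 100S5000M200S
        if (match(cigar, /^[0-9]+S/)) lead = substr(cigar, 1, RLENGTH - 1) + 0
        # trailing soft-clip, e.g. the "200S" in 100S5000M200S
        if (match(cigar, /[0-9]+S$/)) trail = substr(cigar, RSTART, RLENGTH - 1) + 0
        exit !(lead <= grace && trail <= grace)
    }'
}
```

With -g 250, `within_grace 100S5000M200S 250` succeeds, while `within_grace 300S5000M 250` fails because the 5' soft-clip exceeds the grace length.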

Output: For both Cobbler and RAILS, a summary of the gaps closed and their lengths is provided (.tsv) as a text file. A fasta file (.fa) of the finished and/or scaffolded draft is generated for both along with a log file reporting basic success statistics.

Boetzer M, Pirovano W. 2014. SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information. BMC Bioinformatics.15:211. DOI: 10.1186/1471-2105-15-211

Warren RL, Yang C, Vandervalk BP, Behsaz B, Lagman A, Jones SJ, Birol I. 2015. LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads. GigaScience 4:35. DOI: 10.1186/s13742-015-0076-3

Warren RL, Sutton GG, Jones SJM, Holt RA.  2007.  Assembling millions of short DNA sequences using SSAKE.  Bioinformatics. 23(4):500-501. DOI: 10.1093/bioinformatics/btl629

Runs on human


On a human HG004 ABySS draft assembly, cobbler filled over 65% of the gaps using 1, 2.5, 5 and 15 kbp long DNA sequences simulated from the human genome reference. The Pearson correlation between the predicted gap sizes and the sizes of the patched gaps is R=0.8150.

Table 1. Patching gaps with Cobbler (v0.2) using 1, 2.5, 5 and 15 kbp long sequences simulated from human genome reference GRCh38.

Metric                        Value
Total gaps                    148,091
Number of gaps patched        95,523
Proportion of gaps patched    65.1%
Average length (bp)           343.39
Length st.dev +/-             931.12
Total bases added             32,801,755
Largest gap resolved (bp)     13,662
Shortest gap resolved (bp)    1

RAILS (v1.1) was used to further contiguate the human baseline assembly draft and automatically close gaps within it:

Table 2. RAILS scaffolding and gap-filling summary on a human assembly baseline, using 1, 2.5, 5 and 15 kbp long sequences simulated from human genome reference GRCh38.

Metric                            Value
Number of merges induced          6,029
Average closed gap length (bp)    1,136.71
Closed gap length st.dev +/-      2,511.69
Total bases added                 6,853,222
Largest gap resolved (bp)         14,471
Shortest gap resolved (bp)        1

RAILS scaffolding of the baseline human assembly draft resulted in 6,029 merges (1,695 >= 500 bp). The scaffold N50 length increased from 5.6 to 7.3 Mbp, a 30% increase.

Table 3. Assembly statistics on human genome scaffolding and finishing post Cobbler and RAILS (reporting sequences 500 bp and larger).

Stage     n:500   n:N50  n:NG50  NG50 (bp)  N50 (bp)   max (bp)  sum (bp)
Baseline  65,905  145    164     5,144,025  5,597,244  26.41e6   2.794e9
Cobbler   65,905  145    161     5,312,196  5,658,133  26.66e6   2.827e9
RAILS     64,210  113    125     6,935,685  7,266,542  32.14e6   2.836e9
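
For readers who want to compute an N50 on their own output, a minimal sketch is given below. It uses the usual definition (the length of the sequence at which the running sum of lengths, sorted from longest to shortest, first reaches half of the total); the function name n50 is an assumption, and this is not necessarily the tool used to produce Table 3.

```shell
# n50: read one sequence length per line on stdin and print the N50.
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }        # lengths arrive longest first
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += len[i]
                if (run >= half) { print len[i]; exit }
            }
        }'
}
```

For the lengths 2, 4, 6, 8 and 10 the total is 30, and the running sum reaches 15 at the second-longest sequence, so the N50 is 8. Applied to Table 3, the N50 change from 5,597,244 to 7,266,542 bp is an increase of about 29.8%.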

License preamble


RAILS and Cobbler Copyright (c) 2014-present British Columbia Cancer Agency Branch. All rights reserved.

RAILS and Cobbler are released under the GNU General Public License v3

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

rails's Issues

How to run RAILS

Dear Developer

Could you please provide a better example how to run RAILS? I installed the tool but using the following command I get error:

RAILS> runRAILSminimap.sh MJ_hifiasm_assembly.fa mj.combined.fa 90 0.95 pacbio /sw/Modules/QFAB 24

Usage: runRAILSminimap.sh <FASTA assembly .fa> <FASTA long sequences .fa> <anchoring sequence length eg. 250> <min sequence identity 0.95> <max. softclip eg. 250bp> <min. number of read support eg. 2> <long read type eg.: ont, pacbio, nil>

What could the problem be? Installation? I also installed minimap2 in the RAILS main folder and added its path both in the runRAILSminimap.sh script and in my $PATH. I also added the path to samtools.

Could you please clarify how to run, e.g. an example of command?

Regards
Ardy

test run not finishing correctly?

Hi,

I'm planning to give RAILS and/or cobbler a try (through a recommendation of @benvvalk). I was able to download, unzip and untar the tarball. I have bwa & samtools in my path (module load system), and then I tried the test run. It ran, but I'm not sure if it was OK, as I got this message at the end of it:

Scaffolds fasta in: SARSreads.fa_vs_SARSassembly.fa_90_0.95_rails.scaffolds.fa
Empty arrayn -- maybe the scaffold merging step did not necessitate gap filling. at ../RAILS line 215.
RAILS process terminated.

thanks in advance

EDIT: ok, I just noticed there is another issue on this #8 , I was using the v1.4-1 though

Conda

Hi

Can RAILS be run in Conda?

Regards
Ardy

Improving documentation and RAILS itself

Hi Rene,
would you please include in the Readme.md in more detail what scenarios of RF, FR RAILS supports? By reading the code it seems both are supported but provided I do not see any discussion how conflicts between pairs are resolved ... I am better asking.

Also, I wonder if ALT locations from within SAM/BAM could be considered if the PRIMARY match is rejected. Do other tools do something to this extent?

I lack explicit note on Nanopore 1D reads in your documentation and how to use them. By reading commit diffs I learned about "-g Grace length (bp), default -g $grace, optional\n"; but would you mind explaining what is this?

Can I throw into RAILS multiple evidence, say Nextera Long Mate-Pair mappings and Nanopore 1D mappings in one go? How does RAILS decide which of the say conflicting pairing informations to use? Sounds I should better use Nextera Long Mate-Pair mapping in one attempt and later use Nanopore 1D-based SAM/BAM input. Note: the Nextera LMP datasets (containing RF reads with 4-8kbp inserts) are typically contaminated by FR reads with short insert sizes (practicaly "normal" paired-end reads).

What if there are two similarly scoring matches. The one with the highest score has wider alignment by a few nucleotides but with an extra mismatch. The ALT match however could be the "true" match, having slightly lower score. Is there any logic available to use this?

Provided RAILS supported ALT matches what I would be after at first attempt would be to use only read pairs which had NO ALT matches recorded in SAM/BAM. That means, I would go only after haploid loci (assemblies as you know are always to some extent redundant, say diploid or even multiploid instead of haploid).

Thank you for your comments.

no alignments around gaps

Hello,
I am trying to scaffold and close gaps in a plant mitochondrial assembly.
I used Canu to assemble mitochondrial reads, and got 13 contigs, then I used LINKS to scaffold them:
perl /path/to/LINKS.pl -f mtCMS281_canupol_1.4M.ctg_sl.fa -s ONT_trim_input.fofn -b trim_19 -k 19 -d 5000,7500,10000,15000,20000,25000,30000
(I got some scaffolding only when using the Canu error corrected reads)
I got two scaffolds, with 2 and 3 contigs, respectively

LINKS_scaffold_ID       LINKS_contig_ID original_name   orientation(f=forward/r=reverse)        number_of_links links_ratio     gap_or_overlap(-)
scaffold3       3       tig00000003 len=106557 reads=263 class=contig suggestRepeat=no suggestBubble=yes suggestCircular=no     f       5       0       11175
scaffold3       12      tig00000013 len=56132 reads=103 class=contig suggestRepeat=no suggestBubble=yes suggestCircular=no      r       NA      NA      NA

scaffold10      11      tig00000012 len=53141 reads=51 class=contig suggestRepeat=no suggestBubble=yes suggestCircular=no       r       7       0       457
scaffold10      10      tig00000011 len=46269 reads=161 class=contig suggestRepeat=no suggestBubble=yes suggestCircular=no      f       8       0       13283
scaffold10      5       tig00000005 len=57849 reads=179 class=contig suggestRepeat=no suggestBubble=yes suggestCircular=no      f       NA      NA      NA

(I removed the ctgs that were not scaffolded)
Then I used Cobbler (with raw and EC reads):
perl /path/to/cobbler.pl -f trim_19.scaffolds.fa -q ONT_trim_input.fofn -s trim_bwas.bam.fofn -p /home/copettid/bin/samtools-1.8/samtools -v 1
but none of the gaps were filled, not even partially:

scaffold        scaftig gapLength       gapFilledLength readSupportCount
1       1
2       1
3       1       11175
3       2
4       1
5       1
6       1
7       1
8       1
9       1
10      1       457
10      2       13283
10      3

I tried using bam files from alignments made with minimap2 or bwa mem, but as you see in the image there are very few reads whose alignments end next to the gap.
I wonder if this is an issue at the alignment or at the gapclosing step.
[Image: Capture]
Also, is it possible to know which read(s) was(were) used for scaffolding with LINKS at first? That would help to close the gap by selecting them by hand, at worst.
thanks,

Dario

Are the versions listed for bwa, minimap2, and samtools *exact* or minimum requirements?

Your documentation lists the following programs as being prerequisites for RAILS to work (with version numbers given in parentheses):

bwa (version 0.7.15-r1140) or minimap2 (version 2.15-r905)
samtools (version 1.8)

Are these requirements exact, or merely minimum version numbers? In other words, is there any reason to assume that RAILS could not work equally well with (say) minimap2 version 2.17-r941 and samtools version 1.9?

Scaffold ordering, and meta on merges?

Hi Rene,

Two quick clarifications needed. LINKS reorders/renames the scaffold IDs based on length. I'm using RAILS after LINKS. If RAILS merges two scaffolds, does it keep one of these IDs, or does it reorder/rename them too for the final scaffolds? Also, is there a file that describes which scaffolds are merged with RAILS? Is it *scaffolds_GAPseqList.txt? What is the format of this output?
Thanks!
Chris

no scaffolding - samtools issue?

Hello,
I am trying to scaffold some Illumina scaffolds with ONT reads (>40 kb) with RAILS, but I get an error message that seems to point to the samtools version:

$ ./RAILS -f Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa_vs_Rabiosa_genome_v2.2.fa_2000_0.90_gapsFill-formatted.fa -s Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa_vs_Rabiosa_genome_v2.2.fa_scaffolding.fof -l 5 -g 250 -d 2000 -i 0.90 -b out_rails2 -q Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa-formatted.fof -p /home/copettid/bin/samtools-1.8/samtools

Running: ./RAILS [v1.4.1]
-f Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa_vs_Rabiosa_genome_v2.2.fa_2000_0.90_gapsFill-formatted.fa
-q Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa-formatted.fof
-s Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa_vs_Rabiosa_genome_v2.2.fa_scaffolding.fof
-d 2000
-i 0.90
-e 1
-l 5
-a 0.99
-g 250
-t

=>Reading bam: Wed Aug 14 18:38:19 CEST 2019
Parsing alignment file Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa_vs_Rabiosa_genome_v2.2.fa_scaffolding.bam...
Redundant same contig combo linking:0
Same gap sequence fill:0
done.
Contigs processed:
=>Scaffolding initiated: Wed Aug 14 18:49:54 CEST 2019
=>Scaffolding ended: Wed Aug 14 18:49:55 CEST 2019
Scaffolds layout in: out_rails2.scaffolds
=>Making fasta file: Wed Aug 14 18:49:56 CEST 2019
Scaffolds fasta in: out_rails2.scaffolds.fa
Empty arrayn -- maybe the scaffold merging step did not necessitate gap filling. It is also possible that your version of samtools is not supported. This script was tested with samtools v1.8. at ./RAILS line 220.

I got this when using the default samtools installation I had (1.9), and also when pointing RAILS to 1.8, the one you tested it with.
I am now assuming the problem is somewhere else: could you help me spot the issue?
The output files are empty:

-rw-r--r-- 1 copettid mpb    0 Aug 14 18:42 Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa_vs_Rabiosa_genome_v2.2.fa_scaffolding.bam.bampreprocessor.err.log68011565800937
-rw-r--r-- 1 copettid mpb    0 Aug 14 18:49 out_rails2.scaffolds
-rw-r--r-- 1 copettid mpb   78 Aug 14 18:49 out_rails2.pairing_issues
-rw-r--r-- 1 copettid mpb    0 Aug 14 18:49 out_rails2.pairing_distribution.csv
-rw-r--r-- 1 copettid mpb    0 Aug 14 18:49 out_rails2.scaffolds_GAPseqList.txt
-rw-r--r-- 1 copettid mpb    0 Aug 14 18:49 out_rails2.scaffolds.fa
-rw-r--r-- 1 copettid mpb 1.3K Aug 14 18:49 out_rails2.log

the bam file has a header that looks like this (when open with zcat):

BAM�8�@SQ       SN:wga1.1,18314 LN:18314
@SQ     SN:wga1.2,43129 LN:43129
@SQ     SN:wga1.3,59019 LN:59019
@SQ     SN:wga1.4,387   LN:387
@SQ     SN:wga1.5,22927 LN:22927
@SQ     SN:wga1.6,45475 LN:45475
@SQ     SN:wga1.7,85776 LN:85776
@SQ     SN:wga1.8,960   LN:960
@SQ     SN:wga1.9,17994 LN:17994
@SQ     SN:wga1.10,3060 LN:3060

can this be an issue?
Thanks,

Dario

Issue with samtools view

Hi!

I'm trying to convert my sam files to bam files using "samtools view", but keep getting this error: "samtools view: failed to open "" for reading: No such file or directory"! sam files are the output of aligning an assembly with RNA-seq paired end data using hisat2.

I'd appreciate it if you could help me solve this issue.

Best,
Maral

bam file creation interrupted

Hi,

I am trying to gap fill my assembly using nanopore long reads with runRAILS but at each run the bam generation step bugs and is frozen when the file reaches around 17Gb. my assembly is around 28Gb and the long reads file is 80Gb. the job script is as follows:

module load bwa/0.7.17 samtools/1.9 bioperl/1.7.5 perl/5.22.4
./runRAILS.sh contigs.fa reads.fa 250 0.95 /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx512/Compiler/intel2018.3/samtools/1.9/bin/samtools

I repeated the process several times but I am having the same issue. Also before this error happened, in my first run I got the error : samtools view: failed to open "" for reading: No such file or directory

Hoping to hear from you soon.

Best,

Untar the code into the repository

Instead of having a tar.gz file in the repository, unzip it, so that the code changes can be seen. Then you can tag a release on github and it will automatically provide a tar.gz file for each tag.

Add Community guidelines

One of the reviewer checks is for this:

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Could you add something to the README to cover this?

samtools view: failed to open "" for reading: No such file or directory

Hi,
I have two assemblies, one produced from PacBio data by canu and one from 10X data by supernova. Based on Rene's recent suggestion, I used the supernova assembly to scaffold and gap-fill the canu assembly. I made overlapping sequence tiles of 2.5, 5, and 10 kbp from the supernova assembly. Then, the tiles were aligned to the canu assembly with minimap2, and the sam file was converted to a bam file. However, when I used the bam file to run cobbler/RAILS, it showed:

` =>Reading bam: Wed Jul 31 04:11:07 EDT 2019
Parsing alignment file ▒BCo}▒{▒▒Eu▒▒KBHB ...
[E::hts_open_format] Failed to open file ▒
samtools view: failed to open "" for reading: No such file or directory

Redundant same contig combo linking:0
Same gap sequence fill:0 `

How can I solve the problem?

Thanks,
Chi-Fa

Some suggestions

Hi @andrewjpage,

After running the code, I think it might be good if :

  • explain the samtools version in the README (I can't run with the latest samtools version.)
  • explain the -g parameter in detail
  • if I set the RAILS PATH, then in RAILS/versions/RAILS_v1.5.1/runRAILSminimapSTREAM.sh lines 10, 28, 32, 36, 52, 56 and 60 should call cobbler.pl or RAILS, not ./*

samtools hard-coded to a gsc-specific path

Both cobbler.pl & RAILS define SAMPATH to be "/gsc/btl/linuxbrew/bin/samtools"

The readme doesn't mention that it needs to be changed or that RAILS depends on samtools (any particular version?)

Without a valid samtools, the tests fail with:
Empty arrayn -- maybe the scaffold merging step did not necessitate
gap filling. at /group/bioinfo/apps/apps/RAILS_v1.4/RAILS line 215.
RAILS process terminated.

Consider checking that a valid samtools is present and print clear error message if its missing.

This bug was opened and then closed as fixed, but nothing seems to have changed.

I was using version 1.4

No script for assembly

Hi,
I have plant mitochondrial contigs from a SPAdes assembly of Illumina reads. I would like to scaffold and gap-fill them with ONT reads. How do I convert ONT reads (.fastq) into fof format, and what are the necessary commands to execute both cobbler and RAILS?
Please help me in this regard.

Polyploid analysis

Hi René,

I'm currently using some ONT data to scaffold an allopolyploid plant genome assembly. The documentation for RAILS, LINKS, and Cobbler doesn't explicitly mention ploidy limitations, but I've found that it's often wise to check :)

(Sorry to spam you about polyploids both here and on the ARCS repo, but I thought there might be separate considerations)

Best,
Ted

Scaffolding and patching with Supernova contigs

Hi,

I'm reading the description of RAILS (RAILS and Cobbler: Scaffolding and automated finishing of draft genomes using long DNA sequences). From what I understood "Cobbler and RAILS [...] can be used to scaffold & finish high-quality draft genome assemblies with any long, preferably high-quality, sequences such as scaftigs/contigs from another genome draft."

For a bunch of genomes, I have data produced with PacBio and 10X, for both, I have assembled a draft genome, and so far I'm using 10X data to error correct the PacBio draft and subsequently to further scaffold it using ARCS. Giving that both drafts have different completeness (BUSCO profiles), can I use RAILS to patch the PacBio genomes with the Supernova assembly?

Thank you,
F

bam file not produced

Hello,
I was running RAILS to scaffold an assembly with ONT reads and I don't see a .bam file being produced.
I run it with this command:
sh runRAILSminimap_190731.sh ../../Rabiosa_genome_v2.2.fa /home/copettid/public/Dario/Lolium/Oxford_Nanopore_data/batch_4_sequencig_190717/Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa 2000 0.90 250 5 ont /usr/local/bin/samtools &>stdout
inside the shell script I edited the number of cores in minimap, but no bam file was generated. The beginning of the stdout says this:

Resolving ambiguous bases -Ns- in ../../Rabiosa_genome_v2.2.fa assembly using long sequences /home/copettid/public/Dario/Lolium/Oxford_Nanopore_data/batch_4_sequencig_190717/Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa
reformatting file ../../Rabiosa_genome_v2.2.fa
WARNING: MAKE SURE YOUR INPUT FASTA IS ONE SEQUENCE PER LINE WITH NO LINE BREAKS!
reformatting file /home/copettid/public/Dario/Lolium/Oxford_Nanopore_data/batch_4_sequencig_190717/Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa
Aligning long sequences /home/copettid/public/Dario/Lolium/Oxford_Nanopore_data/batch_4_sequencig_190717/Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa-formatted.fa to your contigs..
Running minimap2 with preset map-ont
runRAILSminimap_190731.sh: line 24: /home/copettid/public/Dario/Lolium/Oxford_Nanopore_data/batch_4_sequencig_190717/Rabiosa_b4_Guppy3.0.3_190718_40kb_q9.fa_vs_../../Rabiosa_genome_v2.2.fa_gapfilling.bam: No such file or directory
[M::mm_idx_gen::81.339*1.59] collected minimizers
[M::mm_idx_gen::88.199*2.38] sorted minimizers

can it be because I have files with absolute paths in them?
Thanks,

Dario

samtools hard-coded to a gsc-specific path

Hi Rene,

I notice that the location of samtools in your shell and perl scripts are hard-coded to a gsc-specific path.

I hope everything is going well with you. I've been enjoying looking into all these recent genomes you guys have been publishing using 10x data etc. I hope to make use of these tools and pipelines in the near future.
Cheers,

Chris
