Giter VIP home page Giter VIP logo

sam2consensus's Introduction

sam2consensus

Get the consensus sequences for short sequencing reads mapped to a reference. Version 2.0

Brief description

The program takes as input a SAM file (.sam or .sam.gz) resulting from mapping short reads to a reference (the reference sequences can correspond to separate genes for example), then it calculates the consensus sequence from the aligned reads alone. If you have BAM files you will need to convert them first with samtools. A single or multiple consensus thresholds can be specified, the program also adds insertions, if many long long insertions are expected, we recommend to perform indel ralignment before for optimal results. The consensus method is the one used by Geneious and described in detail in http://assets.geneious.com/manual/8.1/GeneiousManualse41.html

Regions with no coverage are filled with -s (or a different character if specified). Input SAM files don't need to be sorted. Original reference FASTAs are not necessary since the consensus is reference-free.

It will produce a FASTA file per reference containing as many sequences as thresholds were specified.

Usage

Just type python sam2consensus.py -h to show the help of the program:

usage: sam2consensus.py [-h] -i FILENAME [-c THRESHOLDS] [-n N] [-o OUTFOLDER]
                        [-p PREFIX] [-m MIN_DEPTH] [-f FILL] [-d MAXDEL]

+------------------------------------------------------------------+
| sam2consensus.py: extract the consensus sequence from a SAM file |
+------------------------------------------------------------------+

The program takes as input a SAM file (.sam or .sam.gz) resulting from mapping
short reads to a reference (the reference sequences can correspond to separate
genes for example), then it calculates the consensus sequence from the aligned
reads alone. A single or multiple consensus thresholds can be specified, the
program also adds insertions, if many long insertions are expected, we recommend
to perform indel ralignment before for optimal results. The consensus method is
the one used by Geneious and described in detail in
http://assets.geneious.com/manual/8.1/GeneiousManualse41.html

Regions with no coverage are filled with -s (or a different character if
specified). Input SAM files don't need to be sorted. Original reference FASTAs
are not necessary since the consensus is reference-free.

It will produce a FASTA file per reference containing as many sequences as
thresholds were specified.

optional arguments:
  -h, --help            show this help message and exit
  -i FILENAME, --input FILENAME
                        Name of the SAM file, SAM does not need to be sorted
                        and can be compressed with gzip
  -c THRESHOLDS, --consensus-thresholds THRESHOLDS
                        List of consensus thresold(s) separated by commas, no
                        spaces, example: -c 0.25,0.75,0.50, default=0.25
  -n N                  Split FASTA output sequences every n nucleotides,
                        default=do not split sequence
  -o OUTFOLDER, --outfolder OUTFOLDER
                        Name of output folder, default=same folder as input
  -p PREFIX, --prefix PREFIX
                        Prefix for output file name, default=input filename
                        without .sam extension
  -m MIN_DEPTH, --min-depth MIN_DEPTH
                        Minimum read depth at each site to report the
                        nucleotide in the consensus, default=1
  -f FILL, --fill FILL  Character for padding regions not covered in the
                        reference, default= - (gap)
  -d MAXDEL, --maxdel MAXDEL
                        Ignore deletions longer than this value, default=150

Examples

In the following examples, if you added the script to the PATH you can omit python from the commands.

Example 1: Using the default consensus threshold of 0.25:

python sam2consensus.py -i myfile.sam

Example 2: Using multiple consensus thresholds of 0.25, 0.50, and 0.75 and specifying an output folder:

python sam2consensus.py -i myfile.sam -c 0.25,0.50,0.75 -o results

Example 3: Using a custom consensus threshold of 0.75 and changing padding character for uncovered regions to "N":

python sam2consensus.py -i myfile.sam -c 0.75 -f N

Example 4: Specifying a prefix "sample1" and increasing minimum mapping depth to 10:

python sam2consensus.py -i myfile.sam -pre sample1 -m 10

Credits

sam2consensus's People

Contributors

edgardomortiz avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

sam2consensus's Issues

KeyError

Hi,

I tried to pull an consensus from an Minimap2 Alignment. sam2consensus stops with the following error message:
lars@helmut:~/Desktop/VZV/Carina_09_2021/VZV_ORF14_cmv/assembly-test/Scaffold$ python /home/lars/Desktop/NGS/sam2consensus-master/sam2consensus.py -f N -p final -i scaffold_mapping.sam

Processing file scaffold_mapping.sam:

SAM header processed, 1 references found.

0 reads processed.
Traceback (most recent call last):
File "/home/lars/Desktop/NGS/sam2consensus-master/sam2consensus.py", line 430, in
main()
File "/home/lars/Desktop/NGS/sam2consensus-master/sam2consensus.py", line 212, in main
sequences[refname][pos_ref][nuc] += 1
KeyError: '*'

The .sam file seemed to be correct. There are no issues while importing and visualize it in CLC or Geneious.

Best,

Lars

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.