Giter VIP home page Giter VIP logo

diamond-1's Introduction

Introduction

DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data. The key features are:

  • Pairwise alignment of proteins and translated DNA at 500x-20,000x speed of BLAST.
  • Frameshift alignments for long read analysis.
  • Low resource requirements and suitable for running on standard desktops or laptops.
  • Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.

Follow me on Twitter to get notified of updates.

image

Quick start guide

Please read the manual for detailed installation and usage instructions. This demonstrates a quick example for setting up and using the program on Linux.

Installing the software on your system may be done by downloading it in binary format for immediate use:

wget http://github.com/bbuchfink/diamond/releases/download/v0.9.14/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz

The extracted diamond binary file should be moved to a directory contained in your executable search path (PATH environment variable).

To now run an alignment task, we assume to have a protein database file in FASTA format named nr.faa and a file of DNA reads that we want to align named reads.fna.

In order to set up a reference database for DIAMOND, the makedb command needs to be executed with the following command line:

$ diamond makedb --in nr.faa -d nr

This will create a binary DIAMOND database file with the specified name (nr.dmnd). The alignment task may then be initiated using the blastx command like this:

$ diamond blastx -d nr -q reads.fna -o matches.m8

The output file here is specified with the โ€“o option and named matches.m8. By default, it is generated in BLAST tabular format.

Note:
  • The program may use quite a lot of memory and also temporary disk space. Should the program fail due to running out of either one, you need to set a lower value for the block size parameter -b (see the manual).
  • The default (fast) mode was mainly designed for short reads. For longer sequences, the sensitive modes (options --sensitive or --more-sensitive) are recommended.
  • The runtime of the program is not linear in the size of the query file and it is much more efficient for large query files (> 1 million sequences) than for smaller ones.
  • Low complexity masking is applied to the query and reference sequences by default. Masked residues appear in the output as X.
  • The default e-value cutoff of DIAMOND is 0.001 while that of BLAST is 10, so by default the program will search a lot more stringently than BLAST and not report weak hits.

About

DIAMOND is developed by Benjamin Buchfink. Feel free to contact me for support (Email Twitter).

If you use DIAMOND in published research, please cite B. Buchfink, Xie C., D. Huson, "Fast and sensitive protein alignment using DIAMOND", Nature Methods 12, 59-60 (2015).

image

diamond-1's People

Contributors

bbuchfink avatar benfulton avatar colinbrislawn avatar gaberoo avatar kblin avatar leebergstrand avatar wasade avatar yesimon avatar yunesj avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.