
umi-tools's Introduction


UMI-tools was published in Genome Research on 18 Jan '17 (open access)

For full documentation see https://umi-tools.readthedocs.io/en/latest/

Tools for dealing with Unique Molecular Identifiers

This repository contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes. Currently there are 6 commands.

The extract and whitelist commands are used to prepare a fastq containing UMIs, with or without cell barcodes, for alignment.

  • whitelist:
    Builds a whitelist of the 'real' cell barcodes
    This is useful for droplet-based single cell RNA-Seq where the identity of the true cell barcodes is unknown. The whitelist can then be used to filter reads with extract (see below).
  • extract:
    Flexible removal of UMI sequences from fastq reads.
    UMIs are removed and appended to the read name. Any other barcode, for example a library barcode, is left on the read. extract can also filter reads by quality or against a whitelist (see above).
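To make the extract step concrete, here is a minimal Python sketch of moving a UMI from the read sequence to the read name. It is an illustration only, not the UMI-tools implementation: the function name `extract_umi`, the assumption that the UMI occupies the first bases of the read, and the `_` separator are all choices made for this example.

```python
def extract_umi(read_name, sequence, quality, umi_length):
    """Move the first `umi_length` bases of a read into its name.

    Hypothetical helper for illustration; assumes the UMI sits at the
    5' end of the read and that `_` separates identifier and UMI.
    """
    if len(sequence) < umi_length:
        raise ValueError("read is shorter than the UMI")
    umi = sequence[:umi_length]
    # Split at the first space so the UMI is appended to the read
    # identifier rather than to any trailing description field.
    identifier, _, description = read_name.partition(" ")
    new_name = f"{identifier}_{umi}"
    if description:
        new_name = f"{new_name} {description}"
    # Trim the UMI bases and their quality scores from the read.
    return new_name, sequence[umi_length:], quality[umi_length:]


name, seq, qual = extract_umi("@read1 1:N:0", "ACGTTTGGCCAA", "IIIIFFFFHHHH", 4)
print(name)  # @read1_ACGT 1:N:0
print(seq)   # TTGGCCAA
print(qual)  # FFFFHHHH
```

Keeping the UMI in the read name means it survives alignment, so the later commands can recover it from the aligned reads.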

The remaining commands, group, dedup and count/count_tab, are used to identify PCR duplicates using the UMIs and perform different levels of analysis depending on the needs of the user. A number of different UMI deduplication schemes are available. The recommended method is directional.

  • dedup:
    Groups PCR duplicates and deduplicates reads to yield one read per group
    Use this when you want to remove the PCR duplicates prior to any downstream analysis
  • group:
    Groups PCR duplicates using the same methods available through `dedup`.
    This is useful when you want to manually interrogate the PCR duplicates
  • count:
    Groups and deduplicates PCR duplicates and counts the unique molecules per gene
    Use this when you want to obtain a matrix with unique molecules per gene, per cell, for scRNA-Seq.
  • count_tab:
    As per count, except the input is a flat file
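The idea behind counting unique molecules per gene, per cell, can be sketched in a few lines of Python. This is a simplified illustration, not how count is implemented: the function `count_molecules` and the `(cell, gene, umi)` record layout are assumptions made for this example, and UMIs are matched exactly, whereas UMI-tools resolves similar UMIs with the network-based methods.

```python
from collections import defaultdict


def count_molecules(records):
    """Count distinct UMIs for each (cell, gene) pair.

    `records` is an iterable of (cell_barcode, gene, umi) tuples; this
    layout is hypothetical and chosen only for the example.
    """
    umis_seen = defaultdict(set)
    for cell, gene, umi in records:
        # Reads sharing cell, gene and UMI are treated as PCR duplicates
        # of one molecule, so each UMI is counted once.
        umis_seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


records = [
    ("cellA", "gene1", "ACGT"),
    ("cellA", "gene1", "ACGT"),  # duplicate of the read above
    ("cellA", "gene1", "TTGA"),
    ("cellB", "gene1", "ACGT"),
]
print(count_molecules(records))
# {('cellA', 'gene1'): 2, ('cellB', 'gene1'): 1}
```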

See QUICK_START.md for a quick tutorial on the most common usage pattern.

If you want to use UMI-tools in single-cell RNA-Seq data processing, see Single_cell_tutorial.md

Important update: We now recommend the use of alevin for droplet-based scRNA-Seq (e.g. 10X, inDrop, etc.). alevin is an accurate, fast and convenient end-to-end tool to go from fastq to count matrix. It extends the UMI error correction in UMI-tools within a framework that also enables quantification of droplet scRNA-Seq without discarding multi-mapped reads. See the alevin documentation and alevin pre-print for more information.

The dedup, group, and count / count_tab commands make use of network-based methods to resolve similar UMIs with the same alignment coordinates. For background on these methods, see:

Genome Research Publication

Blog post discussing network-based methods.
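As a rough illustration of the network-based idea, the sketch below groups the UMIs observed at a single alignment position in the spirit of the directional method. It is a simplified reading of the published method, not the UMI-tools code: the function names are invented for this example, UMIs are assumed to be of equal length, and the edge rule (one mismatch apart, with the source UMI's count at least twice the target's count minus one) is stated here as an assumption about that method.

```python
from collections import deque


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(umi_counts):
    """Group UMIs at one alignment position (illustrative sketch only).

    `umi_counts` maps each UMI to its read count. A directed edge a -> b
    is added when b is one mismatch from a and
    count(a) >= 2 * count(b) - 1 (assumed threshold).
    """
    # Visit UMIs from most to least abundant; ties broken alphabetically.
    umis = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    edges = {u: [] for u in umis}
    for a in umis:
        for b in umis:
            if (a != b and hamming(a, b) == 1
                    and umi_counts[a] >= 2 * umi_counts[b] - 1):
                edges[a].append(b)

    assigned = set()
    groups = []
    for root in umis:
        if root in assigned:
            continue
        # Breadth-first walk along directed edges from the most abundant
        # unassigned UMI; everything reached joins its group.
        group = [root]
        assigned.add(root)
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for neighbour in edges[node]:
                if neighbour not in assigned:
                    assigned.add(neighbour)
                    group.append(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups


counts = {"ACGT": 100, "ACGA": 3, "ACTA": 1, "TTTT": 50, "TTTA": 40}
print(directional_groups(counts))
# [['ACGT', 'ACGA', 'ACTA'], ['TTTT'], ['TTTA']]
```

In this example the low-count UMIs ACGA and ACTA are absorbed into the ACGT group as likely errors, while TTTT and TTTA stay separate because their counts are too similar for either to be explained as an error of the other.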

Installation

If you're using Conda, you can use:

$ conda install -c bioconda -c conda-forge umi_tools

Or pip:

$ pip install umi_tools

Or if you'd like to work directly from the git repository:

$ git clone https://github.com/CGATOxford/UMI-tools.git

Enter the repository and run:

$ python setup.py install

For more detail see INSTALL.rst

Help

For full documentation see https://umi-tools.readthedocs.io/en/latest/

See QUICK_START.md and Single_cell_tutorial.md for tutorials on the most common usage patterns.

To get help on umi_tools run

$ umi_tools --help

To get help on the options for a specific [COMMAND], run

$ umi_tools [COMMAND] --help

Dependencies

umi_tools depends on python>=3.5, numpy, pandas, scipy, cython, pysam, future, regex and matplotlib.
