RNA2HLA

HLA-based quality control of RNA-seq datasets

Synopsis

The tool extracts HLA class I and II types from every file in a folder containing raw RNA-seq data (paired- or single-end). The alleles are then cross-compared between the RNA-seq samples to identify samples that share a common source, based on their HLA types (4-digit resolution).
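To make "4-digit resolution" concrete: a full HLA allele name such as A*02:01:01:01 is reduced to its first two fields, A*02:01. The sketch below truncates allele names and scores the overlap between two samples with a Jaccard index. It is an illustration only. The function names and the scoring rule are assumptions and are not taken from RNA2HLA's own comparison code.

```python
def to_four_digit(allele):
    """Reduce an HLA allele name to its first two fields, e.g. A*02:01:01:01 -> A*02:01."""
    gene, _, fields = allele.partition("*")
    return gene + "*" + ":".join(fields.split(":")[:2])


def shared_fraction(sample_a, sample_b):
    """Jaccard index of the 4-digit allele sets of two samples (hypothetical scoring rule)."""
    a = {to_four_digit(x) for x in sample_a}
    b = {to_four_digit(x) for x in sample_b}
    if not a or not b:
        return 0.0
    return len(a & b) / float(len(a | b))
```

A score close to 1 would suggest that two samples carry the same alleles, and a score close to 0 that they come from different individuals.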

Releases: v1.0 - original tool; v1.1 - added a script that creates heatmap output

Author

Dr. Irina Chelysheva, 2019-2023 (c)
Oxford Vaccine Group, Department of Paediatrics, University of Oxford
Contact

Usage

$ python RNA2HLA.py -f /raw_RNAseq_data_folder [-r /global_name_of_run] [-p <int>] [-3 <int>] [-c <float>] [-g <int>]

-f is required for running RNA2HLA. The folder should contain raw RNA-seq samples (single-end, paired-end, or both), either compressed or uncompressed.

Optional parameters:

  • -r name of the run, used as a prefix for all output files
  • -p number of parallel search threads for bowtie (default: 6)
  • -3 number of bases to trim from the low-quality end of each read
  • -c confidence level for HLA typing (default: 0.05)
  • -g number of HLA genes to be included for typing (default: 5; may be increased to 6, which adds DQB1)
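When RNA2HLA is launched from another script, the command line can be assembled programmatically. The sketch below is a minimal, hypothetical helper (build_command is not part of RNA2HLA). It uses only the flags listed above and leaves out any option that is not set, so the tool's own defaults apply. The folder path and run name in the example are placeholders.

```python
def build_command(folder, run_name=None, threads=None, trim3=None,
                  confidence=None, genes=None, script="RNA2HLA.py"):
    """Assemble the RNA2HLA argument list from the documented flags."""
    cmd = ["python", script, "-f", folder]
    optional = [("-r", run_name), ("-p", threads), ("-3", trim3),
                ("-c", confidence), ("-g", genes)]
    for flag, value in optional:
        if value is not None:  # unset options fall back to RNA2HLA's defaults
            cmd += [flag, str(value)]
    return cmd


# Placeholder folder and run name; adjust to your own data.
print(" ".join(build_command("/raw_RNAseq_data_folder", run_name="my_run", genes=6)))
```

The returned list can be passed to subprocess.run(cmd, check=True) to start the run.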

Dependencies

  1. RNA2HLA is a Python script (available in two versions: for Python 2 and for Python 3, the latter coming soon).
  2. All the dependencies provided within the RNA2HLA repository (Python scripts single_end.py and paired_end.py, function scripts in R and Python, HLA class I and II databases) must be downloaded and placed in the same folder.
  3. Index files must be downloaded and placed in the subfolder /references.
  4. The easiest way to run RNA2HLA is to create a conda environment from the provided RNA2HLA_env.yml file:
    $ conda env create -f RNA2HLA_env.yml
    Then activate it:
    $ source activate RNA2HLA_env or $ conda activate RNA2HLA_env (depending on the conda version)

Update from 2.04.2021: One user reported an error when creating an environment from the original yml file (this error does not appear in most cases). If you experience an error, please use the alternative environment file RNA2HLA_env_alt.yml instead.

Otherwise:
4a) bowtie must be reachable by the command bowtie (developed with version 1.1.2)
4b) R must be installed.
4c) Packages: biopython (developed with 1.76), numpy (developed with 1.16.6; this version caused an error for one user, so 1.15 is preferable), pandas (developed with 0.24.2)
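For a manual installation, a quick pre-flight check can confirm that the external tools and Python packages are visible before a long run starts. The sketch below is an assumption-laden helper and not part of RNA2HLA. It looks for bowtie and Rscript on PATH (Rscript is assumed here as the indicator that R is installed) and reports the versions of the listed packages. It avoids newer standard-library features so that it should also run under Python 2.7.

```python
import importlib
import os


def find_executable(name):
    """Return the full path of `name` if it is an executable on PATH, else None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def installed_version(module_name):
    """Return the module's __version__, 'unknown' if it has none, or None if not importable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# "Bio" is the import name of the biopython package.
for tool in ("bowtie", "Rscript"):
    print(tool, find_executable(tool) or "NOT FOUND")
for module_name in ("Bio", "numpy", "pandas"):
    print(module_name, installed_version(module_name) or "NOT INSTALLED")
```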

Output

The final output is an overall comparison matrix in csv format, which cross-compares all RNA-seq samples in the given folder.

Individual outputs in txt format are produced for each RNA-seq sample in the folder (classes I and II are written to one file):

  1. .bowtielog.txt - file with statistics of HLA mapping;
  2. .ambiguity.txt - reports typing ambiguities (if more than one solution for an allele is possible based on the expression and HLA databases);
  3. .expression.txt - RPKM expression of HLA;
  4. .HLAgenotype4digits.txt - 4-digit HLA type.

Update from 9.03.2023 (v1.1): a heatmap can be created from the overall comparison matrix csv file using the R script heatmap_HLA_identity_comparison.R
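The comparison matrix can also be screened programmatically for sample pairs that look like they share a source. The sketch below assumes a layout that is not confirmed by this README: a square matrix with sample names in the header row and first column, and numeric identity values in the cells. The function name, the threshold, and the example data are made up for illustration, so check them against your actual csv file before relying on this.

```python
import csv
import io


def matching_pairs(csv_text, threshold):
    """Return (sample_a, sample_b, value) for pairs whose value is >= threshold."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    names = rows[0][1:]  # header row: empty corner cell, then sample names
    pairs = []
    for i, row in enumerate(rows[1:]):
        for j in range(i + 1, len(names)):  # upper triangle only, skipping the diagonal
            value = float(row[j + 1])
            if value >= threshold:
                pairs.append((row[0], names[j], value))
    return pairs


# Made-up example matrix, not real RNA2HLA output.
example = """,s1,s2,s3
s1,100,90,20
s2,90,100,30
s3,20,30,100
"""
print(matching_pairs(example, threshold=80))
```

For a real file, read its contents with open(path).read() and pass the text to matching_pairs.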

Limitation

When studying a particular population that is known in advance to have low HLA allele diversity, RNA2HLA should not be used for QC, but only as a convenient study-wide HLA-typing method. The Allele Frequency Net Database provides an interactive map for exploring the HLA diversity of a particular population. Populations with fewer than 50 known alleles in total should be considered to have low diversity.

Version history

1.0: initial tool
1.1: script creating heatmap output added

Citations - RNA2HLA

Please cite the following publication if you use RNA2HLA in your research: Irina Chelysheva, Andrew J Pollard, Daniel O’Connor, RNA2HLA: HLA-based quality control of RNA-seq datasets, Briefings in Bioinformatics, 2021

License

MIT
