UKB_extract_variants

DNAnexus users can extract UKB variants with the swiss-army-knife app, either through the DNAnexus portal or from the command line as below:

dx run app-swiss-army-knife \
-iin=project-GFk7Z88JbjP3P0Gz6q09vXFX:file-G97bV6jJykJqFjKv34JYVZ9Y \
-iin=project-GFk7Z88JbjP3P0Gz6q09vXFX:file-G97bj8jJykJbVkp97jxjQ3zJ \
-icmd="bcftools view --regions chr14:95086244-95158010 ukb23157_c14_b24_v1.vcf.gz -O z > UKBB_DICER1_500K.vcf.gz" --destination "/ws" -y

Either option works well for a one-off extraction, but both become laborious when the variants of interest are spread across many different pVCF blocks.

extract_genes_ukb_pvcf.pl is a Perl script for extracting variants of interest (e.g., those in a list of genes) from the UK Biobank data hosted at DNAnexus.

Briefly, the script takes a list of genes and looks up their locations in the reference genome hg38. It then finds the pVCF blocks that overlap those locations, queries the DNAnexus project space for the object names of the corresponding VCF files (and their index files), and finally writes the essential information in JSON format for use by the swiss-army-knife app.
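The block lookup in the second step is an interval-overlap test. Below is a minimal sketch of that idea, not the script's actual code. It assumes a hypothetical whitespace-separated block table with the columns chromosome, block index, start and end; the real pvcf_blocks.txt layout may differ, and the function name overlapping_blocks is made up for illustration.

```shell
# Print the block index of every pVCF block that overlaps a gene interval.
# Assumed (hypothetical) block-table columns: chrom  block  start  end
overlapping_blocks() {
    # $1 = block table, $2 = chromosome, $3 = gene start, $4 = gene end
    awk -v chr="$2" -v gs="$3" -v ge="$4" '
        # Two closed intervals overlap when each starts before the other ends.
        $1 == chr && $3 <= ge && $4 >= gs { print $2 }
    ' "$1"
}
```

For example, `overlapping_blocks blocks.txt 14 95086244 95158010` would list the blocks covering the region used in the command above. A gene that straddles a block boundary yields more than one block.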

Dependencies

  • JSON Perl module. Perl modules can be installed in many ways, for example via conda (or mamba if available):
conda install -c bioconda perl-json
  • dx-toolkit (DNAnexus SDK). dx-toolkit is available for download from DNAnexus. NIH/NCI users do not need to install it, because it is provided on Biowulf as the module DNAnexus.
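Before running anything, it can help to confirm that both dependencies are visible in the current shell. The helper below is a small sketch; the function name check_deps is made up for illustration. On Biowulf it should be run after module load DNAnexus.

```shell
# Report whether the two dependencies listed above are usable.
check_deps() {
    status=0
    # The JSON module loads cleanly only if it is installed for this perl.
    if ! perl -MJSON -e 1 2>/dev/null; then
        echo "missing: Perl JSON module" >&2
        status=1
    fi
    # dx is the command-line client shipped with dx-toolkit.
    if ! command -v dx >/dev/null 2>&1; then
        echo "missing: dx (dx-toolkit)" >&2
        status=1
    fi
    return "$status"
}
```

Calling `check_deps` returns 0 when both are found and prints what is missing otherwise.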

Get started

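The Perl script takes five inputs, given in this order in the example command below:

  • the DNAnexus path (project:folder) that holds the pVCF files
  • hg38_genes.tsv, the gene locations in hg38
  • pvcf_blocks.txt, the pVCF block coordinates
  • g28.lst, the list of genes of interest
  • JSON, the output folder for the JSON files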

Get the JSON files

### For Biowulf users
sinteractive # to start an interactive session 

module load DNAnexus

### log in to DNAnexus using dx and select the UKB project
dx login

### run the Perl script
perl extract_genes_ukb_pvcf.pl "UKB - Aug 09, 2022:Bulk/Exome sequences/Population level exome OQFE variants, pVCF format - final release/" hg38_genes.tsv pvcf_blocks.txt g28.lst JSON

### The Perl script creates the JSON folder and writes the JSON files into it.
  • The expected output from the Perl script
ls JSON/
UKBB_500K.ACAD11.0.json   UKBB_500K.LGR5.0.json
UKBB_500K.ARID1B.0.json   UKBB_500K.MAP2.0.json
UKBB_500K.ATR.0.json      UKBB_500K.MITF.0.json
UKBB_500K.BLM.0.json      UKBB_500K.MKX.0.json
UKBB_500K.BRIP1.0.json    UKBB_500K.NFX1.0.json
UKBB_500K.DGKD.0.json     UKBB_500K.NID2.0.json
UKBB_500K.DNAJC21.0.json  UKBB_500K.NNT.0.json
UKBB_500K.ERBB4.0.json    UKBB_500K.POLH.0.json
UKBB_500K.ERCC2.0.json    UKBB_500K.POLQ.0.json
UKBB_500K.ERCC5.0.json    UKBB_500K.POLR1A.0.json
UKBB_500K.ERCC8.0.json    UKBB_500K.UBA7.0.json
UKBB_500K.FLCN.0.json     UKBB_500K.UBA7.1.json
UKBB_500K.GABPB1.0.json   UKBB_500K.WNT4.0.json
UKBB_500K.IDE.0.json      UKBB_500K.WNT4.1.json
UKBB_500K.KIF20A.0.json   UKBB_500K.WRN.0.json

Run the swiss-army-knife app

The swiss-army-knife app takes a JSON file as input and writes the extracted variants to the target location.
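The -iin and -icmd options of the one-off command shown earlier correspond to the app inputs named in and cmd, so an input file passed with -f plausibly has the shape below. This is a sketch built from that command, not verified output of the Perl script; the real files may carry different fields or file names. The function name write_example_json is made up for illustration.

```shell
# Write an illustrative swiss-army-knife input file
# (hypothetical; the JSON produced by the Perl script may differ).
write_example_json() {
    # $1 = output path
    cat > "$1" <<'EOF'
{
  "in": [
    {"$dnanexus_link": {"project": "project-GFk7Z88JbjP3P0Gz6q09vXFX", "id": "file-G97bV6jJykJqFjKv34JYVZ9Y"}},
    {"$dnanexus_link": {"project": "project-GFk7Z88JbjP3P0Gz6q09vXFX", "id": "file-G97bj8jJykJbVkp97jxjQ3zJ"}}
  ],
  "cmd": "bcftools view --regions chr14:95086244-95158010 ukb23157_c14_b24_v1.vcf.gz -O z > UKBB_DICER1_500K.vcf.gz"
}
EOF
}
```

Each entry under in points at one input file by project and file ID, and cmd is the command the app runs on them.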

### make a directory "ws" in your DNAnexus project space (any other name works as well)
dx mkdir /ws


### Test run with a single JSON file (optional)
dx run app-swiss-army-knife -f JSON/UKBB_500K.WNT4.0.json --destination "/ws" -y

### Process all JSON files as a batch using the Unix command parallel
ls JSON/*.json | parallel -j 3 'dx run app-swiss-army-knife -f {} --destination "/ws" -y '

### dx run starts the jobs on the DNAnexus servers, and you may monitor them via the DNAnexus portal (see the snapshot below).

### Wait until all the jobs are completed remotely.
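If GNU parallel is not installed, a plain loop can submit the same jobs one after another. The sketch below adds a dry-run switch so the commands can be inspected before anything is launched; the function name submit_all and the DRY_RUN variable are made up for illustration.

```shell
# Submit every JSON file in a folder, one job per file.
# Set DRY_RUN=1 to print the commands instead of running them.
submit_all() {
    # $1 = folder with JSON files, $2 = DNAnexus destination folder
    for json in "$1"/*.json; do
        [ -e "$json" ] || continue   # folder had no JSON files
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "dx run app-swiss-army-knife -f $json --destination $2 -y"
        else
            dx run app-swiss-army-knife -f "$json" --destination "$2" -y
        fi
    done
}
```

For example, `DRY_RUN=1 submit_all JSON /ws` lists the commands, and `submit_all JSON /ws` submits them. The jobs still run remotely, so the loop only controls how they are submitted, not how long they take.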

Download the pVCF files from the remote server to a local folder.

mkdir pvcf

dx download /ws/UKBB_500K.*.vcf.gz -f -o pvcf/
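A quick integrity check after the download can catch truncated files. The sketch below relies only on the fact that .vcf.gz files are gzip-compatible, so gzip -t can test them; the function name corrupt_vcfs is made up for illustration.

```shell
# List downloaded files that are not valid gzip streams (e.g. truncated downloads).
corrupt_vcfs() {
    # $1 = folder holding the downloaded *.vcf.gz files
    for f in "$1"/*.vcf.gz; do
        [ -e "$f" ] || continue          # nothing downloaded yet
        gzip -t "$f" 2>/dev/null || echo "$f"
    done
}
```

Running `corrupt_vcfs pvcf` prints nothing when every file passes; any file it lists is worth downloading again.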

Contributors

weizhu365
