indextools's Introduction

DNAnexus

DNAnexus Apps and Scripts

applets

  • binning_step0: BioBin Pipeline
  • biobin_pipeline
  • binning_step1: BioBin Pipeline
  • biobin_pipeline
  • binning_step2: BioBin Pipeline
  • biobin_pipeline
  • binning_step3: BioBin Pipeline
  • biobin_pipeline
  • impute2_group_join: Impute2_group_join
  • This app can be used to merge multiple imputed impute2 files
  • plato_biobin: PLATO BioBin Regression Analysis
  • PLATO_BioBin
  • vcf_batch: VCF Batch effect tester
  • vcf_batch

apps

  • association_result_annotation: Annotate GWAS, PheWAS Associations
  • association_result_annotation
  • biobin:
  • This app runs the latest development build of the rare variant binning tool BioBin.
  • generate_phenotype_matrix: Generate Phenotype Matrix
  • generate_phenotype_matrix
  • genotype_case_control: Generate Case/Control by Genotype
  • Reports the number of cases and controls for each genotype
  • impute2: imputation
  • This app performs imputation using Impute2
  • impute2_to_plink: Impute2 To PLINK
  • Convert Impute2 file to PLINK files
  • plato_single_variant: PLATO - Single Variant Analysis
  • This app runs single-variant association tests against a single phenotype (GWAS) or multiple phenotypes (PheWAS)
  • rl_sleeper_app: sleeper
  • This app provides some useful tools for working with data in DNAnexus. It is designed to be run from the command line with "dx run --ssh RL_Sleeper_App" in the project that contains the data you want to explore (use "dx select" to switch projects as needed).
  • shapeit2: SHAPEIT2
  • This app performs phasing using SHAPEIT2
  • strand_align: Strand Align
  • Strand Align prior to phasing
  • vcf_annotation_formatter:
  • Extracts and reformats VCF annotations (CLINVAR, dbNSFP, SIFT, SNPEff)
  • QC_apps subfolder:
    • drop_marker_sample: Drop Markers and/or Samples (PLINK)
      • drop_marker_sample
    • drop_relateds: Relatedness Filter (IBD)
      • drop_relateds
    • extract_marker_sample: Drop Markers and/or Samples (PLINK)
      • extract_marker_sample
    • maf_filter: Marker MAF Rate Filter (PLINK)
      • maf_filter
    • marker_call_filter: Marker Call Rate Filter (PLINK)
      • marker_call_filter
    • missing_summary: Missingness Summary (PLINK)
      • Returns missingness rate by sample
    • pca: Principal Component Analysis using SMARTPCA
      • pca
    • sample_call_filter: Sample Call Rate Filter (PLINK)
      • sample_call_filter

scripts

  • cat_vcf.py *
  • download_intervals.py *
  • download_part.py *
  • estimate_size.py *
  • interval_pad.py
    • Reads a BED file from standard input, pads the intervals, sorts them, and outputs intervals that are guaranteed to be non-overlapping
  • update_applet.sh *
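
The pad, sort, and merge behavior described for interval_pad.py can be sketched roughly as follows. This is an illustration only, not the script's actual code: the function names `parse_bed` and `pad_and_merge` and the `pad` parameter are hypothetical, and the sketch assumes padded intervals that overlap or abut should be merged into one.

```python
from typing import Iterable, Iterator, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def parse_bed(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield (chrom, start, end) from BED lines, skipping headers and blanks."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        yield fields[0], int(fields[1]), int(fields[2])


def pad_and_merge(intervals: Iterable[Interval], pad: int) -> List[Interval]:
    """Pad each interval by `pad` bases, sort, and merge any that touch or overlap."""
    # Clamp the padded start at 0 so coordinates never go negative.
    padded = sorted(
        (chrom, max(0, start - pad), end + pad) for chrom, start, end in intervals
    )
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same contig and overlapping/abutting: extend the previous interval.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

To mimic reading from standard input, one could pass `parse_bed(sys.stdin)` to `pad_and_merge` and print each result as tab-separated fields. Note that contigs are sorted lexicographically here (so "chr10" sorts before "chr2"), and the end coordinate is not clamped to the contig length, which would require contig sizes.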

sequencing

  • bcftools_view:
    • Calls "bcftools view". Still in experimental stages.
  • calc_ibd:
    • Calculates a pairwise IBD estimate from either VCF or PLINK files using PLINK 1.9.
  • call_bqsr: Base Quality Score Recalibration
  • call_genotypes:
    • Obsolete, do not use; use geno_p instead. Calls GATK GenotypeGVCFs.
  • call_hc:
  • call_vqsr:
  • cat_variants: combine_variants
    • Combines non-overlapping VCF files with the same subjects. A reimplementation of GATK CatVariants (GATK CatVariants available upon request)
  • combine_variants: combine_variants
  • gen_ancestry:
    • Determine Ancestry from PCA. Uses an eigenvector file and training dataset listing known ancestries. Runs QDA to determine posterior ancestries for all samples, even those in the training set.
  • gen_related_todrop:
    • Uses a PLINK IBD file to determine the minimal set of samples to drop in order to generate an unrelated sample set. Uses a minimum vertex cut algorithm of the related samples to get
  • geno_p:
  • merge_gvcfs:
  • plink_merge:
    • Merge PLINK bed/bim/fam files using PLINK 1.9
  • select_variants: VCF QC
  • variant_annotator: VCF QC
  • vcf_annotate: Annotate VCF File
    • Use a variety of tools to annotate a sites-only VCF.
  • vcf_concordance: VCF Concordance
  • vcf_gen_lof:
    • Subset a VCF from vcf_annotate based on the given annotations to get a sites-only VCF of loss-of-function variants.
  • vcf_pca:
    • Uses PLINK 1.9 and eigenstrat 6.0 to calculate principal components from VCF or PLINK bed/bim/fam files.
  • vcf_qc:
  • vcf_query:
    • Calls "bcftools query" to extract annotations from the VCF file. Used in the stripping of files for MEGAbase
  • vcf_sitesonly: VCF QC
    • Generates a sites-only file from full VCF files.
  • vcf_slice: Slice VCF File(s)
    • Return a small section of a VCF file (similar to tabix). For large output, many small regions, or subsetting samples, use subset_vcf instead.
  • vcf_summary: VCF Summary Statistics
    • Generate summary statistics for a VCF file (by sample and by variant)
  • vcf_to_plink:
    • Uses PLINK 1.9 to convert VCF files to PLINK bed/bim/fam files

indextools's People

Contributors

commandlinegirl, damien-black, jdidion, knafissi, nainathangaraj

indextools's Issues

Use RefGet to fetch contig information

When the primary file is not available or does not contain contig information, use RefGet to fetch the contig names and sizes rather than requiring a .genome file. If the primary file is supplied and it specifies the ID or hash of the reference genome, use that to look up the metadata; otherwise, require the genome ID or hash to be passed as a command line option.

RefGet spec: http://samtools.github.io/hts-specs/refget.html

Server list:

Python client: https://github.com/ga4gh/refget-client

IndexTools broken on python 3.7

There appears to be a difference between 3.6 and 3.7 with the attributes available on types:

root@39d3db5b72a7:/# which indextools 
/usr/local/bin/indextools
root@39d3db5b72a7:/# indextools
Traceback (most recent call last):
  File "/usr/local/bin/indextools", line 6, in <module>
    from indextools.console import indextools
  File "/usr/local/lib/python3.7/site-packages/indextools/console/__init__.py", line 37, in <module>
    ac.conversion(decorated=parse_region)
  File "/usr/local/lib/python3.7/site-packages/autoclick/types/__init__.py", line 68, in conversion
    return decorator(decorated)
  File "/usr/local/lib/python3.7/site-packages/autoclick/types/__init__.py", line 63, in decorator
    click_type = ParamTypeAdapter(_dest_type.__name__, target)
  File "/usr/local/lib/python3.7/typing.py", line 702, in __getattr__
    raise AttributeError(attr)
AttributeError: __name__
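
The traceback shows `__name__` being read from a `typing` object, which raises `AttributeError` on Python 3.7. A defensive lookup along the following lines would avoid the crash. This is only a sketch of one possible workaround, not the fix adopted by autoclick or IndexTools: the helper name `type_display_name` is hypothetical, and `_name` is a private attribute of `typing` aliases that may change between Python versions.

```python
import typing


def type_display_name(tp: object) -> str:
    """Return a readable name for a class or typing construct without raising."""
    # Plain classes (int, str, user classes) always have __name__.
    name = getattr(tp, "__name__", None)
    if name is None:
        # Assumption: typing aliases on Python 3.7+ keep their short name in the
        # private `_name` attribute (for example "Tuple" for typing.Tuple[int, int]).
        name = getattr(tp, "_name", None)
    # Last resort: the repr-style string, e.g. "typing.Union[int, str]".
    return name if name is not None else str(tp)
```

For example, `type_display_name(int)` gives "int", while a subscripted alias such as `typing.Tuple[int, int]` falls back to its alias name instead of raising.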

Aggregate data across multiple samples

It may be useful to generate a single BAM file for parallelization across many samples, rather than one per sample. To do that, we can simply sum the volumes across samples for each interval, and then split on the aggregate data.
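
The summing step could look something like the sketch below. It assumes each sample's volumes are available as a mapping from an interval to a number and that all samples share the same interval boundaries. Both the data layout and the name `aggregate_volumes` are hypothetical, not taken from the IndexTools code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end)


def aggregate_volumes(
    per_sample: Iterable[Dict[Interval, int]]
) -> Dict[Interval, int]:
    """Sum per-interval volumes across samples into one combined mapping."""
    totals: Dict[Interval, int] = defaultdict(int)
    for volumes in per_sample:
        for interval, volume in volumes.items():
            # Intervals missing from a sample simply contribute nothing.
            totals[interval] += volume
    return dict(totals)
```

The aggregated mapping could then be passed to the same splitting logic that is used for a single sample.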

Add integration tests for partition command

To start, this should just be a single test that runs the partition command on an index file and ensures it produces a BED file with the expected number of partitions.

In the long-term, I'd like this to grow into a parameterized test run across a bunch of index files from different sources.

Add unit tests for index.py

Motivation

Adding tests allows developers to refactor or add functionality and confirm that the module still works correctly. They're lightweight but extremely helpful.

Issue

index.py needs unit test coverage

Merge multiple test_data.json files in directory hierarchy

Support the following use case:

tests
|_test_data.json
|_module1
  |_test_module1.py
  |_test_data.json

Merge data from the two test_data.json files, with the one at lower level of nesting overriding the higher one if any keys collide.
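
One way this merge could work is sketched below, under some assumptions: each test_data.json holds a JSON object, the merge is shallow (top-level keys only), and the caller knows the root test directory. The function name `load_merged_test_data` and its parameters are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_merged_test_data(
    test_dir: Path, root: Path, filename: str = "test_data.json"
) -> Dict[str, Any]:
    """Merge `filename` from `root` down to `test_dir`; deeper files win on key collisions."""
    test_dir = test_dir.resolve()
    root = root.resolve()
    # Walk upward from test_dir, collecting directories until root is reached.
    chain = [test_dir]
    while chain[-1] != root:
        parent = chain[-1].parent
        if parent == chain[-1]:  # hit the filesystem root without finding `root`
            raise ValueError(f"{test_dir} is not inside {root}")
        chain.append(parent)
    merged: Dict[str, Any] = {}
    # Apply files from the shallowest directory to the deepest, so that
    # more deeply nested files override higher-level ones.
    for directory in reversed(chain):
        path = directory / filename
        if path.is_file():
            with path.open() as handle:
                merged.update(json.load(handle))
    return merged
```

With the layout above, calling `load_merged_test_data(Path("tests/module1"), Path("tests"))` would return the keys of tests/test_data.json overridden by any colliding keys in tests/module1/test_data.json. A recursive merge of nested objects is a separate design choice that this sketch does not attempt.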
