nanopore's People

Contributors

marivascruz, yk-tanigawa

nanopore's Issues

Time stamp from fast5 files

objective

  • would like to have a plot
    • x-axis time
    • y-axis total length of nucleotides (on chromosome 22)

method

  • try to use poretools, porekit, or some other software
  • if all of them fail, need to write a script to read fast5 files directly
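
The plot described above needs, for each read, a start time and a length. The following is a minimal sketch of the accumulation step only, assuming those two values have already been extracted (for example with poretools or a custom fast5 reader) and restricted to reads mapped to chromosome 22. The function name `cumulative_yield` and the example values are hypothetical, not taken from the project.

```python
from itertools import accumulate


def cumulative_yield(reads):
    """Return (time, cumulative_bases) points ordered by read start time.

    `reads` is an iterable of (start_time_seconds, read_length) pairs.
    """
    ordered = sorted(reads)  # tuples sort by start time first
    times = [t for t, _ in ordered]
    totals = list(accumulate(length for _, length in ordered))
    return list(zip(times, totals))


# Hypothetical example values, not real sequencing data.
example_reads = [(30.0, 1200), (5.0, 800), (12.5, 2500)]
points = cumulative_yield(example_reads)
print(points)
```

The resulting points can be passed to any plotting library, with time on the x-axis and the cumulative number of nucleotides on the y-axis.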

Collaboration with James's team

Decided to keep track of the progress of the collaboration project with James's team on GitHub as well

meetings

2017/1/11

2017/1/23

2017/1/30

  • discussion
    • Haplotype inference (& phasing) with deep net ??

2017/2/6

  • discussion
  • we will initially focus on chromosome 20
  • Yosuke has pointed out the GIAB data set and provided a download script for
    James's team so that they can prepare 'label' for their variant caller
  • They also have a bam file of selected long reads from the Nanopore
    consortium data, which will be used to construct pile-up images (input)
    for their deep network
  • Yosuke will continue to work on my project to produce posterior
    probability of haplotype given reads

WTCHG data

Oxford Nanopore MinION data from Wellcome Trust Centre for Human Genetics

logistics

data description from README

  • DNA was prepared according to the Damaged library improvement protocol
    with size selection using BluePippin 0.75% DF Marker S1 high-pass
    6-10kb v3 protocol with BPstart of 6kb.
  • Libraries were generated with the SQK-LSK108 Ligation Sequencing Kit
    1D (R9.4) and sequenced using SpotON R9.4 flowcells on a MinION Mk 1B.
  • In our analyses we declare PASS reads to be at least 1 kb in length and to have an average quality of 14 or more.
  • Reads were mapped to hs37d5lam.fasta including lambda phage control sequences using "bwa mem" with the following options: -x ont2d -t 12 -R "@RG\tID:1\tSM:NA12878" -M.
  • Reads were mapped per flow-cell and then merged using samtools merge to produce the final output, comprising bam and bam index (.bai) files. PASS and FAIL reads are provided in separate files.
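
The PASS criterion quoted from the README can be written as a small predicate. This is a sketch under one assumption that the README does not spell out: "average quality" is taken here as the arithmetic mean of the per-base Phred scores. The function name `is_pass_read` and its parameters are hypothetical.

```python
def is_pass_read(sequence, phred_scores, min_length=1000, min_mean_quality=14.0):
    """Return True if a read meets the PASS thresholds from the README.

    Assumption: average quality is the arithmetic mean of per-base
    Phred scores; the README does not say how the average is computed.
    """
    if len(sequence) < min_length or not phred_scores:
        return False
    mean_quality = sum(phred_scores) / len(phred_scores)
    return mean_quality >= min_mean_quality


# Hypothetical reads for illustration only.
print(is_pass_read("A" * 1000, [14] * 1000))  # long enough, quality at threshold
print(is_pass_read("A" * 999, [20] * 999))    # too short
```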

Google Cloud

  • Understand how to use Google Cloud on Sherlock Cluster

[TODO] 1KG population reference

  • The 1000 Genomes reference panel has the advantage of including phased haplotype information
  • [TODO]
    • Prepare 1KG population reference (pgen)
    • Compute prior probability of haplotype (population frequency)
    • Compute posterior probability given data
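
The prior and posterior steps in the list above amount to an application of Bayes' rule. The sketch below is one possible illustration, not the project's actual model. It assumes that the prior is the population frequency of each candidate haplotype, that all reads passed in originate from the same haplotype, and that each observed base is independently wrong with a fixed probability `error_rate`, with errors spread evenly over the three other nucleotides. All names and values are hypothetical.

```python
import math


def haplotype_posterior(haplotypes, priors, reads, error_rate=0.1):
    """Posterior probability of each candidate haplotype given reads.

    haplotypes: dict mapping name -> string of alleles, one per site
    priors:     dict mapping name -> population frequency (must be > 0)
    reads:      list of dicts mapping site index -> observed allele
    """
    log_post = {}
    for name, alleles in haplotypes.items():
        log_p = math.log(priors[name])  # prior term
        for read in reads:
            for site, observed in read.items():
                match = alleles[site] == observed
                # Assumed error model: uniform substitution errors.
                log_p += math.log(1 - error_rate if match else error_rate / 3)
        log_post[name] = log_p
    # Normalize in log space to avoid underflow with many observations.
    peak = max(log_post.values())
    weights = {n: math.exp(lp - peak) for n, lp in log_post.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Hypothetical two-site example.
haps = {"h1": "AC", "h2": "GT"}
freqs = {"h1": 0.5, "h2": 0.5}
posterior = haplotype_posterior(haps, freqs, [{0: "A", 1: "C"}])
print(posterior)
```

With no reads the function returns the normalized priors, which is a quick sanity check on the prior step.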

Follow Helio's pipeline

objective

  • Become familiar with working with nanopore data

procedure

  • Follow Helio's pipeline

software installation

  • poretools
  • samtools
  • LAST
  • Picard

data

  • reference genome (transcriptome)
  • nanopore data (of course)

count match/mismatch

  1. How many FAST5 files do we have?
  2. Take several sample files, and convert them into FASTQ files
  3. Map to the reference genome (hg19)
  4. Bring in prior data (UKBB 500k individuals)
  5. Take platinum whole genome VCF file to compute $\theta$
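
Once reads are mapped, counting matches and mismatches can be sketched as below. The sketch assumes that an alignment is already available as two gapped strings of equal length (read and reference), which is a simplification of what a mapper reports. The function name `count_match_mismatch` and the example strings are hypothetical, and how these counts feed into $\theta$ is not shown.

```python
def count_match_mismatch(aligned_read, aligned_ref, gap="-"):
    """Count alignment columns by type for one read-to-reference alignment."""
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    counts = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0}
    for read_base, ref_base in zip(aligned_read, aligned_ref):
        if read_base == gap and ref_base == gap:
            continue  # column with no information
        if ref_base == gap:
            counts["insertion"] += 1  # base present only in the read
        elif read_base == gap:
            counts["deletion"] += 1   # base present only in the reference
        elif read_base.upper() == ref_base.upper():
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# Hypothetical alignment for illustration only.
counts = count_match_mismatch("ACG-TA", "ACCTT-")
print(counts)
```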

dbSNP

$ wget -O - ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz | zcat - > /home/ytanigaw/PI_HOME/data/dbsnp/dbsnp_all_20160527.vcf
$ awk '(! /^#/){ print "chr"$0 }(/^#/){ print $0 }' dbsnp_all_20160527.vcf | bgzip -c > dbsnp_all_20160527.vcf.gz
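
The awk step prepends "chr" to every record line and leaves header lines (those starting with "#") unchanged. The same transformation is sketched below as a line-based generator, in case it needs to be applied inside a script. The function name `add_chr_prefix` and the example lines are hypothetical.

```python
def add_chr_prefix(lines):
    """Yield VCF lines with 'chr' prepended to every non-header line."""
    for line in lines:
        # Header and meta lines start with '#' and are passed through as-is.
        yield line if line.startswith("#") else "chr" + line


# Hypothetical VCF fragment for illustration only.
example = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\n",
    "20\t100\trs1\n",
]
converted = list(add_chr_prefix(example))
print("".join(converted), end="")
```

Because the function only iterates over lines, it can also be given a file handle, for instance one opened in text mode with gzip.open.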

Broken file: /share/PI/mrivas/ukbb/download/chr20impv1.bgen

plink1.9 failed to convert /share/PI/mrivas/ukbb/download/chr20impv1.bgen with the following error message:

Error: File read failure.

The same script works for chromosome 11, so it is likely that this file is broken.

nanopolish

To run nanopolish, you need a recent C++ runtime library (libstdc++), which you can make available by loading the gcc module on the Sherlock cluster.

[ytanigaw@sherlock-ln01 login_node ~/nanopolish]$ ml purge
[ytanigaw@sherlock-ln01 login_node ~/nanopolish]$ ./nanopolish_test 
./nanopolish_test: /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.7' not found (required by ./nanopolish_test)
./nanopolish_test: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ./nanopolish_test)
./nanopolish_test: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by ./nanopolish_test)
./nanopolish_test: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ./nanopolish_test)
[ytanigaw@sherlock-ln01 login_node ~/nanopolish]$ ml load gcc
[ytanigaw@sherlock-ln01 login_node ~/nanopolish]$ ./nanopolish_test 
===============================================================================
All tests passed (427 assertions in 5 test cases)

GIAB data

  • Pacbio data
    • Download
    • select chr 11 and 20 with samtools
  • chr11 >20kb
  • chr20 >10kb
  • start with chr11

BGEN file

  • Understand the BGEN file format
  • Be able to manipulate haplotype reference data stored in BGEN file.
    • extract the region of interest, etc..
