nanopore_utils's Introduction

Nanopore data utilities

Source

  • read.py: functions for reading .fast5, .fasta, .fastq and .fna files.
  • normalization.py: several normalization functions for raw signal data.
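To illustrate the kind of normalization meant here, the following is a minimal sketch of median/MAD scaling, a common way to normalize raw nanopore signal. The names `med_mad_sketch` and `normalize_sketch` are hypothetical and are not the functions shipped in normalization.py; the default factor of 1.4826 is the usual constant that makes the MAD comparable to a standard deviation for normally distributed data, and is an assumption here.

```python
import numpy as np

def med_mad_sketch(signal, factor=1.4826):
    """Return the median and the scaled median absolute deviation (MAD).

    Hypothetical helper for illustration only.
    """
    signal = np.asarray(signal, dtype=float)
    med = np.median(signal)
    mad = np.median(np.abs(signal - med)) * factor
    return med, mad

def normalize_sketch(signal, med, mad):
    """Center the signal on the median and scale it by the MAD."""
    return (np.asarray(signal, dtype=float) - med) / mad

# Toy signal with a single outlier (900); values are made up.
raw = np.array([480, 500, 510, 495, 505, 900, 490])
med, mad = med_mad_sketch(raw, factor=1.0)
norm = normalize_sketch(raw, med, mad)
```

Because both the center and the scale are medians, the single outlier barely affects them, which is why this scheme is often preferred over mean and standard deviation for noisy signal.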

Dependencies

  • numpy
  • scipy
  • ont_fast5_api

Demo

from read import read_fast5
from normalization import med_mad, normalize_signal
import numpy as np

read_data = read_fast5('data/0a0bdc5c-8f8f-41ea-a4d1-4ff6344fac3e.fast5')
read_data = read_data[list(read_data.keys())[0]]

# check if it has a segmentation table
if not read_data.segmentation:
    print('This file does not have a reference')

# normalize the raw signal
med, mad = med_mad(read_data.raw, factor = 1.0)
norm_signal = normalize_signal(read_data.raw, med, mad)

# read_data.segmentation contains the mapping between the DNA sequence and 
# the raw signal, but this is relative to the start of the segmentation
dna_bases = read_data.segmentation['base']
dna_bases_start_in_raw = read_data.segmentation['start'] + read_data.start_rel_to_raw
dna_bases_end_in_raw = dna_bases_start_in_raw + read_data.segmentation['length']

# create an array with the same length as the raw signal and mark the
# position in the raw signal where each DNA base starts
dna_in_raw = np.full(norm_signal.shape, '', dtype='U1')
dna_in_raw[dna_bases_start_in_raw] = dna_bases
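The offset arithmetic above can be tried without a .fast5 file. The sketch below uses a made-up segmentation table, assumed here to be a NumPy structured array with 'base', 'start' and 'length' fields as in the demo; the offset and signal length are also invented for illustration. It builds the same start-only annotation as the demo, plus a variant that labels every sample covered by a base.

```python
import numpy as np

# Hypothetical segmentation table: one row per base, with positions
# relative to the start of the segmented region.
segmentation = np.array(
    [('A', 0, 3), ('C', 3, 2), ('G', 5, 4)],
    dtype=[('base', 'U1'), ('start', int), ('length', int)],
)
start_rel_to_raw = 2   # assumed offset of the segmentation within the raw signal
signal_length = 12     # assumed number of samples in the raw signal

# Convert relative positions to positions in the raw signal.
starts = segmentation['start'] + start_rel_to_raw
ends = starts + segmentation['length']

# Mark only the first sample of each base, as in the demo.
first_sample = np.full(signal_length, '', dtype='U1')
first_sample[starts] = segmentation['base']

# Variant: label every sample that belongs to each base.
per_sample = np.full(signal_length, '', dtype='U1')
for base, start, end in zip(segmentation['base'], starts, ends):
    per_sample[start:end] = base
```

Samples outside the segmented region (here the first two and the last one) keep the empty string, which makes it easy to mask them out later.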

Resquiggle

from resquiggle import resquiggle_read_normalized

# modelled_signal and conditioning_dna_seq must be defined by the caller
mapping = resquiggle_read_normalized(
    read_id='0a0bdc5c-8f8f-41ea-a4d1-4ff6344fac3e',  # any identifier works here
    raw_signal=modelled_signal,  # pass the same array as norm_signal
    genome_seq=conditioning_dna_seq,  # DNA sequence as a string that should match the raw signal, e.g. 'ACGTACAGATC'
    norm_signal=modelled_signal,  # numpy array
)

nanopore_utils's People

Contributors

marcpaga
