Giter VIP home page Giter VIP logo

dpclustering.jl's Introduction

DPClustering

Build Status Build Status Coverage Status codecov.io

Perform Dirichlet clustering on Varaint Allele Frequncies (VAFs) from sequencing data of cancers a la Nik-Zainal et al.

Getting started

Package is written in the Julia programming language.

To download this package use the Pkg.add function as below, which will download the package and install all the dependencies.

Pkg.add("DPClustering")

Clustering

Clustering is invoked using the dpclustering function which takes 2 vectors of equal size: y - the number of reads reporting the mutation and N - the depth of coverage at that locus. With these, clustering can be performed and the function will return a DPresults type. There are a number of optional arguments which are all set to reasonable defaults, you may want to change the number of iterations or set verbose to false. the default is to show a log of the time taken in the gibbs sampler. To see the optional arguments and their defaults use ?dpclustering in a julia session.

dp = dpclustering(y, N, iterations = 10000, verbose = false);

You can then plot the output using plotresults(dp).

At the moment, clustering will only work with single samples and mutations in copy neutral regions. So the input mutations should either be filtered for copy number alterations or corrected for copy number before inputting.

Example

There is some example data provided originally in Nik-Zainal et al in the examples folder. So an analysis would proceed as follows. There is a built in function to plot the data and associated clustering.

using DPClustering
data = readcsv("example/data.csv", header = true)
y = data[1][:, 1]
N = data[1][:, 2]

out = dpclustering(y, N)
myplot = plotresults(out, save = true)

plot

Speed

Due to Julia's just in time compilation the sampling is relatively fast. For example, the analysis above (600 mutations) the time taken to generate 10,000 iterations/samples should be on the order of 2-3 minutes on a reasonably specced laptop.

Acknowledgments

The model used in the Gibbs sampler is as described in Nik-Zainal et al. Bugs code provided in the supplementary information of this publication and code available from David Wedge's group (https://github.com/Wedge-Oxford/dpclust_docker) were used in the development of the package.

dpclustering.jl's People

Watchers

hkim avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.