Giter VIP home page Giter VIP logo

goontools's Introduction

NOTE: this is work in progess. Here be dragons bugs. Potentially.

goontools

goontools processes GOON files.

GOON files?!

GOON (GenOme jsON) is a simple, flexible format that is easy to process, stream and extend. Its specification is the following:

  • a GOON file is a line-delimited JSON (LDJSON) file
  • every line encodes a datum of type Object
  • every such object exposes a sequence name key with an associated String value and one or two position keys with associated integer values (strictly speaking of type Number with no fractional part)

...and that's it. Here's one valid GOON record, where CHR is the sequence name key and POS the (only) position key:

{"CHR": "chrX", "POS": 500017, "ALLELES": ["C", "T"]}

And in a different file you could have records with following structure: sequence name key chrom and position keys chromStart and chromEnd, whose values define an interval on the sequence.

{"chrom": "chr20", "chromStart": 0, "chromEnd": 500, "strand": "+"}

Back to goontools.

goontools provides a set of commands to manipulate GOON files. In particular, following commands are currently implemented:

  • index
  • sort
  • view
  • idxstat
  • extract (experimental)
  • pprint

Let's take a look at bed.ldj from the example directory:

# pretty-print first record
$ goontools pprint -n 1 bed.ldj
{
  "chromEnd": 5,
  "name": "name_0",
  "score": 426,
  "chromStart": 0,
  "chrom": "10",
  "strand": "-"
}
# sort and bgzip file:
$ goontools sort -s chrom -b chromStart bed.ldj > bed.srt.ldj
$ bgzip bed.srt.ldj
# positions are zero-based and "right-open" (non-inclusive end)
$ goontools index -0 -r -s chrom -b chromStart -e chromEnd bed.srt.ldj.gz
# check what we've indexed
$ goontools idxstat bed.srt.ldj.gz
SEQUENCE_NAME  1
SEQUENCE_NAME  10
SEQUENCE_NAME  11
SEQUENCE_NAME  3
SEQUENCE_NAME  X
SEQUENCE_KEY   chrom
START_KEY      chromStart
END_KEY        chromEnd
ZERO_BASED     true
RIGHT_OPEN     true
# get chromosome 1 records
$ goontools view bed.srt.ldj.gz 1
{"chromEnd": 11, "name": "name_6", "score": 711, "chromStart": 1, "chrom": "1", "strand": "+"}
{"chromEnd": 7, "name": "name_33", "score": 290, "chromStart": 2, "chrom": "1", "strand": "+"}
{"chromEnd": 4, "name": "name_49", "score": 923, "chromStart": 2, "chrom": "1", "strand": "-"}
{"chromEnd": 8, "name": "name_11", "score": 375, "chromStart": 4, "chrom": "1", "strand": "-"}
{"chromEnd": 9, "name": "name_37", "score": 669, "chromStart": 4, "chrom": "1", "strand": "-"}
{"chromEnd": 12, "name": "name_39", "score": 620, "chromStart": 4, "chrom": "1", "strand": "-"}
{"chromEnd": 14, "name": "name_21", "score": 402, "chromStart": 5, "chrom": "1", "strand": "-"}
{"chromEnd": 7, "name": "name_44", "score": 725, "chromStart": 6, "chrom": "1", "strand": "-"}
{"chromEnd": 14, "name": "name_19", "score": 593, "chromStart": 7, "chrom": "1", "strand": "-"}
{"chromEnd": 17, "name": "name_5", "score": 392, "chromStart": 10, "chrom": "1", "strand": "-"}
{"chromEnd": 14, "name": "name_16", "score": 364, "chromStart": 10, "chrom": "1", "strand": "+"}
{"chromEnd": 19, "name": "name_32", "score": 371, "chromStart": 10, "chrom": "1", "strand": "-"}
{"chromEnd": 18, "name": "name_34", "score": 628, "chromStart": 10, "chrom": "1", "strand": "-"}
# get intersecting intervals
# by default `view` assumes position encoding from original file, i.e. 0-based, half-open here
$ goontools view bed.srt.ldj.gz 1:7-10
{"chromEnd": 11, "name": "name_6", "score": 711, "chromStart": 1, "chrom": "1", "strand": "+"}
{"chromEnd": 8, "name": "name_11", "score": 375, "chromStart": 4, "chrom": "1", "strand": "-"}
{"chromEnd": 9, "name": "name_37", "score": 669, "chromStart": 4, "chrom": "1", "strand": "-"}
{"chromEnd": 12, "name": "name_39", "score": 620, "chromStart": 4, "chrom": "1", "strand": "-"}
{"chromEnd": 14, "name": "name_21", "score": 402, "chromStart": 5, "chrom": "1", "strand": "-"}
{"chromEnd": 14, "name": "name_19", "score": 593, "chromStart": 7, "chrom": "1", "strand": "-"}
# but we can also use a different encoding, e.g. pass a closed interval
$ goontools view -c bed.srt.ldj.gz 1:7-10
{"chromEnd": 11, "name": "name_6", "score": 711, "chromStart": 1, "chrom": "1", "strand": "+"}
{"chromEnd": 8, "name": "name_11", "score": 375, "chromStart": 4, "chrom": "1", "strand": "-"}
{"chromEnd": 9, "name": "name_37", "score": 669, "chromStart": 4, "chrom": "1", "strand": "-"}
{"chromEnd": 12, "name": "name_39", "score": 620, "chromStart": 4, "chrom": "1", "strand": "-"}
{"chromEnd": 14, "name": "name_21", "score": 402, "chromStart": 5, "chrom": "1", "strand": "-"}
{"chromEnd": 14, "name": "name_19", "score": 593, "chromStart": 7, "chrom": "1", "strand": "-"}
{"chromEnd": 17, "name": "name_5", "score": 392, "chromStart": 10, "chrom": "1", "strand": "-"}
{"chromEnd": 14, "name": "name_16", "score": 364, "chromStart": 10, "chrom": "1", "strand": "+"}
{"chromEnd": 19, "name": "name_32", "score": 371, "chromStart": 10, "chrom": "1", "strand": "-"}
{"chromEnd": 18, "name": "name_34", "score": 628, "chromStart": 10, "chrom": "1", "strand": "-"}

goontools takes inspiration and quite a bit of actual code from tabix[1] and samtools[2].

[1] http://samtools.sourceforge.net/

[2] http://bioinformatics.oxfordjournals.org/content/27/5/718.full

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.