
bbi-js


A parser for BigWig and BigBed file formats.

Usage

If using a local file:

const { BigWig } = require('@gmod/bbi')
const file = new BigWig({
  path: 'volvox.bw',
})
;(async () => {
  await file.getHeader()
  const feats = await file.getFeatures('chr1', 0, 100, { scale: 1 })
})()

If using a remote file, you can combine it with generic-filehandle (https://github.com/GMOD/generic-filehandle/) or with your own implementation of its Filehandle interface:

const { BigWig } = require('@gmod/bbi')
const { RemoteFile } = require('generic-filehandle')

// in the browser, or in newer versions of node.js, RemoteFile will use
// the global fetch
const file = new BigWig({
  filehandle: new RemoteFile('volvox.bw'),
})

// in old versions of node.js without a global fetch, supply a custom fetch function
const fetch = require('node-fetch')
const file = new BigWig({
  filehandle: new RemoteFile('volvox.bw', { fetch }),
})

;(async () => {
  await file.getHeader()
  const feats = await file.getFeatures('chr1', 0, 100, { scale: 1 })
})()

Documentation

BigWig/BigBed constructors

Accepts an object containing either

  • path - path to a local file
  • url - path to a url
  • filehandle - a filehandle instance. path and url are handled via https://www.npmjs.com/package/generic-filehandle, but you can also implement the Filehandle interface specified there in a custom class and pass an instance of it to this module

BigWig

getFeatures(refName, start, end, opts)

  • refName - a name of a chromosome in the file
  • start - a 0-based half open start coordinate
  • end - a 0-based half open end coordinate
  • opts.scale - indicates the zoom level to use, specified as pxPerBp. For example, when zoomed out to 100 bp per pixel, opts.scale would be 1/100. The zoom level returned is the one with reductionLevel <= 2/opts.scale (reductionLevel is a property of the zoom level structure in the BigWig file data)
  • opts.basesPerScale - optional, the inverse of opts.scale, i.e. bpPerPx
  • opts.signal - optional, an AbortSignal to halt processing

Returns a promise for an array of features. If the refName does not exist in the file, or no features are found, the result is an empty array.

Example:

const feats = await bigwig.getFeatures('chr1', 0, 100)
// returns an array of features with start, end, and score
// coordinates on the returned data are 0-based half-open
// (no conversion to 1-based, as in wig, is done)
// note: the refName is not returned on the feature objects; it is implied by the query

Understanding scale and reductionLevel

Here is what the reductionLevel structure looks like in a file. The zoom level chosen is the first one with reductionLevel < 2*opts.basesPerScale (equivalently, reductionLevel < 2/opts.scale) when scanning backwards through this list:

  [ { reductionLevel: 40, ... },
    { reductionLevel: 160, ... },
    { reductionLevel: 640, ... },
    { reductionLevel: 2560, ... },
    { reductionLevel: 10240, ... },
    { reductionLevel: 40960, ... },
    { reductionLevel: 163840, ... } ]
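The selection rule above can be sketched as follows (a simplified illustration, not the library's actual implementation; the function name is made up):

```javascript
// Simplified sketch of the zoom-level selection rule described above.
// Not the library's actual code; pickZoomLevel is an invented name.
function pickZoomLevel(zoomLevels, basesPerScale) {
  // scan backwards through the list, i.e. from coarsest to finest
  for (let i = zoomLevels.length - 1; i >= 0; i--) {
    if (zoomLevels[i].reductionLevel <= 2 * basesPerScale) {
      return zoomLevels[i]
    }
  }
  // nothing matched: fall back to the unzoomed data
  return undefined
}

const levels = [40, 160, 640, 2560, 10240, 40960, 163840].map(r => ({
  reductionLevel: r,
}))
pickZoomLevel(levels, 100) // → { reductionLevel: 160 }
```

With 100 bp per pixel, 2 * 100 = 200, so the first level at or below that threshold when scanning from the coarse end is reductionLevel 160; at 10 bp per pixel no zoom level qualifies and the unzoomed data would be used.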

getFeatureStream(refName, start, end, opts)

Same as getFeatures, but returns an RxJS observable stream, which is useful for very large queries.

const observable = await bigwig.getFeatureStream('chr1', 0, 100)
observable.subscribe(
  chunk => {
    /* chunk contains array of features with start, end, score */
  },
  error => {
    /* process error */
  },
  () => {
    /* completed */
  },
)

BigBed

getFeatures(refName, start, end, opts)

  • refName - a name of a chromosome in the file
  • start - a 0-based half open start coordinate
  • end - a 0-based half open end coordinate
  • opts.signal - optional, an AbortSignal to halt processing

Returns a promise for an array of features. No concept of zoom levels is used with BigBed data.

getFeatureStream(refName, start, end, opts)

Same as getFeatures, but returns an RxJS observable stream, as with BigWig.

searchExtraIndex(name, opts)

Specific to BigBed files, this method searches the bigBed "extra indexes". There can be multiple indexes, e.g. for the gene ID and gene name columns. See the usage of -extraIndex in bedToBigBed at https://genome.ucsc.edu/goldenpath/help/bigBed.html

This function accepts two arguments

  • name: a string to search for in the BigBed extra indices
  • opts: an object that can optionally contain opts.signal, an abort signal

Returns a promise for an array of features, each with an extra field indicating the index field that was matched.
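Conceptually, an extra index maps a string key (such as a name column value) to the features containing it. A toy model of that lookup (not the library's on-disk B+ tree; all names here are invented for illustration):

```javascript
// Toy model of an extra-index lookup: a Map from key to matching
// features. The real file stores a B+ tree over file offsets; this
// only illustrates the shape of the result, including an extra
// `field` property recording which column matched.
function buildToyIndex(features, fieldIndex) {
  const index = new Map()
  for (const feat of features) {
    const key = feat.rest.split('\t')[fieldIndex]
    if (!index.has(key)) index.set(key, [])
    index.get(key).push({ ...feat, field: fieldIndex })
  }
  return index
}

const feats = [
  { start: 0, end: 100, rest: 'ENST00000456328.2\t1000\t+' },
  { start: 200, end: 300, rest: 'ENST00000450305.2\t900\t-' },
]
const nameIndex = buildToyIndex(feats, 0)
nameIndex.get('ENST00000456328.2') // one matching feature, with field: 0
```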

How to parse BigBed results

The BigBed line contents are returned as a raw text line, e.g. {start: 0, end: 100, rest: "ENST00000456328.2\t1000\t..."}, where rest contains tab-delimited text for fields 4 and onward of the BED format. BED files in BigBed format often come with autoSql (a description of all the columns), so it can be useful to parse them with a BED parser that understands autoSql. The rest line can be parsed by the @gmod/bed module, which is not integrated with this module by default, but can be combined with it as follows:

import { BigBed } from '@gmod/bbi'
import BED from '@gmod/bed'
import { LocalFile } from 'generic-filehandle'

const ti = new BigBed({
  filehandle: new LocalFile(require.resolve('./data/hg18.bb')),
})
const { autoSql } = await ti.getHeader()
const feats = await ti.getFeatures('chr7', 0, 100000)
const parser = new BED({ autoSql })
const lines = feats.map(f => {
  const { start, end, rest, uniqueId } = f
  return parser.parseLine(`chr7\t${start}\t${end}\t${rest}`, { uniqueId })
})
// @gmod/bbi returns features with {uniqueId, start, end, rest}
// we reconstitute this as a line for @gmod/bed with a template string
// note: the uniqueId is based on file offsets and helps to deduplicate exact feature copies if they exist

Features before parsing with @gmod/bed:

{
  "chromId": 0,
  "start": 64068,
  "end": 64107,
  "rest": "uc003sil.1\t0\t-\t64068\t64068\t255,0,0\t.\tDQ584609",
  "uniqueId": "bb-171"
}

Features after parsing with @gmod/bed:

{
  "uniqueId": "bb-0",
  "chrom": "chr7",
  "chromStart": 54028,
  "chromEnd": 73584,
  "name": "uc003sii.2",
  "score": 0,
  "strand": -1,
  "thickStart": 54028,
  "thickEnd": 54028,
  "reserved": "255,0,0",
  "spID": "AL137655"
}

Academic Use

This package was written with funding from the NHGRI as part of the JBrowse project. If you use it in an academic project that you publish, please cite the most recent JBrowse paper, which will be linked from jbrowse.org.

License

MIT © Colin Diesh

bbi-js's People

Contributors

cmdcolin, dependabot[bot], garrettjstevens, rbuels, sehilyi, skinner, tuner


bbi-js's Issues

pre-compile parser objects for performance

In RequestWorker, at least, I noticed that there are a lot of de novo constructions of Parser objects. These are expensive, because they compile JavaScript every time they are created.

We could probably get a significant performance boost if these were compiled globally and reused.

Error while parsing a custom-format bigbed file

Hi,

I am getting an error when trying to parse a custom BigBed file that contains 21 columns (bed4+17). I am running bbi-js like this:

import { BigBed } from '@gmod/bbi';
import { LocalFile } from 'generic-filehandle';

const test = async () => {
  try {
    const file = new BigBed({
      filehandle: new LocalFile(require.resolve('<path-to-the-bigbed-file>')),
    });
    const { autoSql } = await file.getHeader(); // <-- this errors
  } catch (error) {
    console.log('error', error);
  }
};

The error message is: RangeError: Offset is outside the bounds of the DataView

I could trace the code execution down to this line in bbi.ts

const sum = ret.totalSummaryParser.parse(tail)


but then it calls the binary parser, at which point I get lost.

I am providing the problematic bigbed, as well as the source files from which it was generated. I am not sure whether the problem is with the bigbed file itself or with the bbi-js library, but I would like to point out that the bigbed file can be converted back to bed by the bigBedToBed tool, and can also be read by the pyBigWig python library.

Attachments:

  • bigbed.zip — the bigbed that causes the error
  • input.zip — an archived directory with source files from which the bigbed was generated by running the command bedToBigBed -as=transcripts.as -type=bed4+17 transcripts.bed chrom.sizes transcripts.bb

TypeError: this.ranges.map is not a function

Hi,

I'm loading a BigWig file and when calling this method as such:

bbi.getFeatures('chr7', 43110697, 43295697, {scale: 0.25, signal: {aborted: false, reason: undefined, onabort: null}})

It throws the following error:

Unhandled Runtime Error
TypeError: this.ranges.map is not a function

Call Stack
Range.getRanges
node_modules/@gmod/bbi/esm/range.js (40:0)
Range.union
node_modules/@gmod/bbi/esm/range.js (46:0)
cirFobRecur
node_modules/@gmod/bbi/esm/block-view.js (215:0)
cirFobRecur2
node_modules/@gmod/bbi/esm/block-view.js (175:0)
cirFobStartFetch
node_modules/@gmod/bbi/esm/block-view.js (195:0)

I believe it's because the union method creates a new Range with an array as its only parameter:

[
    {
        "ranges": [
            {
                "min": 44784262,
                "max": 44792458
            }
        ]
    },
    {
        "ranges": [
            {
                "min": 44808850,
                "max": 44817046
            }
        ]
    }
]

And then in the constructor, `0 in arg1` is true, so an object is created instead of an array:

{
    "0": {
        "ranges": [
            {
                "min": 44784262,
                "max": 44792458
            }
        ]
    },
    "1": {
        "ranges": [
            {
                "min": 44808850,
                "max": 44817046
            }
        ]
    }
}

Finally, when getRanges is called, it tries to call map on that object, which fails.

I'll keep investigating.

Please let me know if you need more information to reproduce this issue.
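A minimal reproduction of that failure mode (a toy stand-in, not the library's actual Range class):

```javascript
// Toy reproduction: when an array of range objects is spread into a
// plain object instead of being kept as an array, `.map` disappears.
const ranges = [
  { min: 44784262, max: 44792458 },
  { min: 44808850, max: 44817046 },
]
const asObject = { ...ranges } // { "0": {...}, "1": {...} }

Array.isArray(ranges) // → true
Array.isArray(asObject) // → false
typeof asObject.map // → 'undefined', so asObject.map(...) would throw
```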

need more doc in readme

What is returned from getHeaders?

What options are available in the object where scale is being passed?

How would a person that does not use webpack use this module?

Guaranteed order of returned results

I haven't seen this happen yet, but it seems like it could hypothetically break tests if a test is not written carefully. If guaranteed ordering is a desired feature, please provide feedback. Otherwise it should just be understood that results are returned as they come in.
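In the meantime, a caller that needs a deterministic order can sort the returned features itself, e.g. by start coordinate (feature shape taken from the getFeatures docs above):

```javascript
// Sort features by start coordinate for a deterministic order,
// regardless of the order chunks arrived in.
const feats = [
  { start: 500, end: 600, score: 1 },
  { start: 0, end: 100, score: 2 },
]
feats.sort((a, b) => a.start - b.start)
// feats now begins with the feature starting at 0
```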

Unhandled Rejection (TypeError): fsOpen is not a function

Hi dear bbi-js maintainer,

I am trying this package with a new React project, and I am getting an Unhandled Rejection (TypeError): fsOpen is not a function error. My code is just copied from the example code:

import {BigWig} from '@gmod/bbi';

class App extends Component {

  getData = async () => {
    const ti = new BigWig({
      path: 'TW463_20-5-bonemarrow_MeDIP.bigWig'
    })
    await ti.getHeader()
    const feats = await ti.getFeatures('chr6', 50000, 50100, { scale: 1 });
    return feats;
  }
  render() {
    const data  = this.getData();
    console.log(data);
    return (
      <div className="App">
      </div>
    );
  }
}

export default App;


Could you please help me fix this? Thanks.

RangeError: Trying to access beyond buffer length

Hi Colin,

Thanks for the update with generic-filehandle.
I see the url in the constructor works nicely.
I am now getting a new error, RangeError: Trying to access beyond buffer length, while accessing a bigwig file by url.

My bw file: I was trying the region 'chr7',27053398,27373766.

The code is just the example code.
Thanks.

Fix in browser access of files

Currently produces errors

Could be due to zlib, or async race conditions, or maybe something different

See test_in_browser branch for karma test

Example doc?

Hi guys, maybe I'm just dense, but I can't figure out how to get an example running. The snippet in the readme uses ES6 imports, but I don't see any ES6 modules after doing npm install, so it fails right away.

After running "npm install @gmod/bbi" I put the code below in a file "test.js" and run

node test.js

I get this error, complaining about the import:

(function (exports, require, module, __filename, __dirname) { import {BigWig} from '@gmod/bbi'
                                                                     ^

SyntaxError: Unexpected token {
    at new Script (vm.js:79:7)
    at createScript (vm.js:251:10)

What am I missing here?

import {BigWig} from '@gmod/bbi'
const ti = new BigWig({
  path: 'volvox.bw'
})
await ti.getHeader()
const feats = await ti.getFeatures('chr1', 0, 100, { scale: 1 })

Improve caching behavior

Hi,

I'm loading BigWig data for the whole genome, i.e., calling getFeatures for each chromosome. This results in many fetch requests, as expected.


However, the number of requests seems to be excessive (76), and many of them hit exactly the same range. For example, in the above case (https://genomespy.app/docs/grammar/data/lazy/#example_1), there are 25 requests hitting the same 49-byte range and another 25 requests hitting the same 8197-byte range. Because web browsers seem to be very bad at caching partial content, this results in quite a bit of latency.

There appears to be a caching mechanism in BlockView, but a new BlockView (and cache) is created for each getFeatures call.

private featureCache = new AbortablePromiseCache<ReadData, Buffer>({

Instead of having a new cache for each BlockView, could there be a single shared cache in the BBI class that could be used by all BlockViews? In my example case, the number of requests would drop from 76 to 28.

I could make a PR at some point if this change is feasible and would not cause any undesired adverse effects.
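A minimal sketch of that idea: one promise cache shared across views, keyed by byte range, so identical in-flight fetches are deduplicated (the class and method names below are invented for illustration; the library itself uses an abortable promise cache):

```javascript
// Toy shared cache: identical (offset, length) requests share one
// in-flight promise instead of issuing duplicate fetches.
class SharedRangeCache {
  constructor(fetchRange) {
    this.fetchRange = fetchRange
    this.cache = new Map()
  }

  get(offset, length) {
    const key = `${offset}:${length}`
    if (!this.cache.has(key)) {
      this.cache.set(key, this.fetchRange(offset, length))
    }
    return this.cache.get(key)
  }
}

// usage: 25 identical requests result in a single underlying fetch
let fetches = 0
const cache = new SharedRangeCache(async (offset, length) => {
  fetches += 1
  return { offset, length }
})
const requests = []
for (let i = 0; i < 25; i++) requests.push(cache.get(0, 49))
```

A real implementation would also need eviction and abort-signal handling, which is why something like abortable-promise-cache is used rather than a bare Map.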

Proposal: Use async iterator instead of observable

This would potentially simplify the API; at the least, some people find the RxJS terminology somewhat foreign, and async iterators are a native JS concept. Note that async iteration is still somewhat bleeding edge, and it needs to be done carefully.

I tried doing this a little naively but just made a mess of trying to thread generator functions all the way through the code. It is probably not necessary for every function to be a generator; the code just needs to generate the necessary promises and stream them in at one location.
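A rough sketch of what such a bridge could look like, converting an observable-style subscribe API into an async iterator (written against a minimal stand-in observable, not the real RxJS type):

```javascript
// Bridge a subscribe(next, error, complete)-style observable to an
// async iterator. Emitted chunks are buffered until the consumer
// pulls them with `for await`.
async function* toAsyncIterable(observable) {
  const buffer = []
  let done = false
  let error
  let wake
  observable.subscribe(
    chunk => { buffer.push(chunk); wake?.() },
    err => { error = err; wake?.() },
    () => { done = true; wake?.() },
  )
  while (true) {
    if (buffer.length > 0) { yield buffer.shift(); continue }
    if (error) throw error
    if (done) return
    await new Promise(resolve => { wake = resolve })
  }
}

// minimal stand-in for an RxJS observable that emits synchronously
const mockStream = {
  subscribe(next, onError, complete) {
    next([{ start: 0, end: 100, score: 1 }])
    next([{ start: 100, end: 200, score: 2 }])
    complete()
  },
}
```

Usage would then be `for await (const chunk of toAsyncIterable(stream)) { ... }`, with no RxJS concepts visible to the caller.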

Request: iteration function without allocation

Hey,

So, for performance reasons, would it be possible to implement an API that doesn't allocate an object for each entry? It could look something like this:

async function fillBuffer() {
  const ti = new BigWig({ path: 'volvox.bw' })
  const header = await ti.getHeader()
  const length = header.refsByNumber[0].length
  const buffer = Buffer.alloc(length)
  await ti.iterate('chr1', 0, length, { scale: 1 }, (start, end, score) => {
    for (let position = start; position < end; position++)
      buffer.writeFloatLE(score, position * 4)
  })
  return buffer
}

Version parsers

Most of the bigwig/bigbed file format spec is in the supplementary info of https://www.ncbi.nlm.nih.gov/pubmed/20639541, plus existing parser implementations.

The supplementary info doc describes version 3 of the file format

The stats field only exists in version 3 and not version 2, but I have not seen any version 2 files that have caused this issue.

The bigwig version that now seems widespread is version 4, and I am not sure what changes were made in that version.

ESM export is commonjs

The only difference between the esm and cjs exports seems to be the TS target (es2018 and es5 respectively), which determines which JS features can be used (or need to be polyfilled). As a library, I'd recommend shipping es2018 for both ESM and CJS, since most features are supported out of the box by browsers and Node, and then differentiating which module system is used via the module TS option.

❯ cat node_modules/@gmod/bbi/package.json | jq '.version'
"1.0.34"
❯ cat node_modules/@gmod/bbi/esm/index.js
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.BigBed = exports.BigWig = void 0;
var bigwig_1 = require("./bigwig");
Object.defineProperty(exports, "BigWig", { enumerable: true, get: function () { return bigwig_1.BigWig; } });
var bigbed_1 = require("./bigbed");
Object.defineProperty(exports, "BigBed", { enumerable: true, get: function () { return bigbed_1.BigBed; } });
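One common way to express that split is a conditional exports map in package.json, sketched below (field layout from the Node.js packages documentation; the actual file paths in the published package are assumptions):

```json
{
  "main": "./cjs/index.js",
  "module": "./esm/index.js",
  "exports": {
    ".": {
      "import": "./esm/index.js",
      "require": "./cjs/index.js"
    }
  }
}
```

Note that for the "import" condition to resolve as real ESM, the files under esm/ would also need to be .mjs or covered by a nested package.json with "type": "module".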
