biojs-io-biom's People

Contributors

dependabot[bot], iimog, nterhoeven

Forkers

nterhoeven

biojs-io-biom's Issues

Enhance the API with more functionality

As referees of our f1000 article note:

The API provided by BioJS is minimal. Notably, methods for partitioning, collapsing, transforming, filtering and subsampling are not present. While developers will be able to access sample or observation profiles as a whole, the current release of BioJS pushes much of the common manipulation logic onto the consumer of the library.

McDonald D and Bolyen E. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 1 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16546)

This can be changed by implementing the missing functions in this module:

  • partitioning
  • collapsing
  • transforming
  • filtering
  • subsampling
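As a rough illustration of one of these operations, filtering could look like the sketch below. It assumes dense data and a plain object with rows, columns and data; filterColumns is a hypothetical helper name, not an existing function of this module.

```javascript
// Hypothetical helper (not part of biojs-io-biom): keep only the columns
// (samples) for which predicate(column) returns true. Assumes dense data.
function filterColumns(table, predicate) {
  const keep = [];
  table.columns.forEach(function (col, idx) {
    if (predicate(col)) {
      keep.push(idx);
    }
  });
  return {
    rows: table.rows,
    columns: keep.map(function (idx) { return table.columns[idx]; }),
    // Pick the kept column indices out of every row
    data: table.data.map(function (row) {
      return keep.map(function (idx) { return row[idx]; });
    })
  };
}

const exampleTable = {
  rows: [{ id: 'OTU_1', metadata: {} }, { id: 'OTU_2', metadata: {} }],
  columns: [
    { id: 'S1', metadata: { site: 'gut' } },
    { id: 'S2', metadata: { site: 'skin' } },
    { id: 'S3', metadata: { site: 'gut' } }
  ],
  data: [[1, 2, 3], [4, 5, 6]]
};

const gutOnly = filterColumns(exampleTable, function (col) {
  return col.metadata.site === 'gut';
});
console.log(gutOnly.data); // [[1, 3], [4, 6]]
```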

Improve introduction (better cover historical context)

As a referee to our f1000 article notes:

There is a historical context that Ankenbrand et al. miss in discussing biom-format and subsequently imply that the biom-format is more widely adopted than being field specific format. If the authors leave the introduction more general, then I would suggest they include more background on the history of high-throughput data storage and reproducibility in programmatic languages, perhaps starting with the Minimum Information About a Microarray Experiment - MIAME format [1] and exprSet classes developed in R about 15 years ago before the genomics standards consortium (formed in 2005), for which biom-format is a member.

Paulson J. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 3 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16545)

[1] Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M: Minimum information about a microarray experiment (MIAME)-toward standards for microarray data.Nat Genet. 2001; 29 (4): 365-71

Test different nodejs versions

While reproducing issue #30, I ran into problems with an old nodejs version. We should run the automated tests against different nodejs versions and probably require a recent one.

As seen below, the default nodejs version in Ubuntu 14.04 is 0.10.25. The current version 6.9.1 can be obtained via a third-party repository.

$ apt-cache policy nodejs
nodejs:
  Installed: 6.9.1-1nodesource1~trusty1
  Candidate: 6.9.1-1nodesource1~trusty1
  Version table:
 *** 6.9.1-1nodesource1~trusty1 0
        500 https://deb.nodesource.com/node_6.x/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     0.10.25~dfsg2-2ubuntu1 0
        500 http://de.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages

Update data when rows or columns are changed

When the user sets rows or columns, data has to be updated. Use the id to determine which rows/columns were present before and preserve their data. Previously missing ids are added as empty rows/columns (all 0). Ids that existed before but are now missing are deleted. This means the setters can also be used to reorder the data matrix. All actions have to work regardless of whether the data representation is sparse or dense.
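A minimal sketch of the row case, assuming dense data (sparse data would have to be converted or handled separately). The function name reindexRows and its parameters are hypothetical and only serve as illustration:

```javascript
// Hypothetical helper: build the new dense data matrix after rows were set.
// Rows with known ids keep their data, new ids get an all-zero row,
// ids that are no longer present are dropped.
function reindexRows(oldRows, newRows, denseData, numCols) {
  const oldIndex = new Map();
  oldRows.forEach(function (row, i) {
    oldIndex.set(row.id, i);
  });
  return newRows.map(function (row) {
    if (oldIndex.has(row.id)) {
      return denseData[oldIndex.get(row.id)].slice();
    }
    return new Array(numCols).fill(0);
  });
}

const oldRows = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
const oldData = [[1, 2], [3, 4], [5, 6]];
// 'b' is removed, 'd' is new, 'c' and 'a' are reordered
const newRows = [{ id: 'c' }, { id: 'd' }, { id: 'a' }];

const newData = reindexRows(oldRows, newRows, oldData, 2);
console.log(newData); // [[5, 6], [0, 0], [1, 2]]
```

Columns could be handled the same way by mapping over the entries of every row instead of over the rows.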

Discuss other biom conversion servers (galaxy)

As a referee to our f1000 article notes:

There are other BIOM conversion servers that exist, e.g. implementations within the Galaxy framework - see https://toolshed.g2.bx.psu.edu/repository/display_tool?repository_id=b3ae8ca9317b000e&render_repository_actions_for=tool_shed&tool_config=%2Fsrv%2Ftoolshed%2Fmain%2Fvar%2Fdata%2Frepos%2F002%2Frepo_2436%2Fbiom_convert.xml&changeset_revision=501c21cce614 - these alternate tools should be mentioned in the text. How does the biom-conversion-server compare with (and potentially improve on) such Galaxy based tools?

Bik H. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 2 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16436)

Add representation agnostic getter/setter for data

There should be ways to access data points, columns and rows that are agnostic to the data representation. That means they should return the same results regardless of whether the matrix_type is sparse or dense. Suggestion:

  • getDataAt(rowID, colID) returns a single value e.g. 17
  • getDataRow(rowID) returns a row e.g. [0,0,17,0]
  • getDataCol(colID) returns a col e.g. [0,17,3,2]
  • getDataMatrix() returns a dense matrix e.g. [[0,1,0],[1,1,0],[0,0,1]]
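The suggested getters could be sketched as follows. This is only an illustration under assumptions: the functions take a plain table object as first argument instead of being methods, sparse data is assumed to be a list of [rowIndex, colIndex, value] triples as in BIOM 1.0, and every getter goes through a full dense conversion, which is simple but not efficient.

```javascript
// Find the position of an id in rows or columns
function indexOfId(elements, id) {
  for (let i = 0; i < elements.length; i++) {
    if (elements[i].id === id) {
      return i;
    }
  }
  throw new Error('Unknown id: ' + id);
}

// Always returns a dense copy, no matter how data is stored
function getDataMatrix(table) {
  if (table.matrix_type === 'dense') {
    return table.data.map(function (row) { return row.slice(); });
  }
  const dense = table.rows.map(function () {
    return new Array(table.columns.length).fill(0);
  });
  table.data.forEach(function (triple) {
    dense[triple[0]][triple[1]] = triple[2];
  });
  return dense;
}

function getDataRow(table, rowID) {
  return getDataMatrix(table)[indexOfId(table.rows, rowID)];
}

function getDataCol(table, colID) {
  const c = indexOfId(table.columns, colID);
  return getDataMatrix(table).map(function (row) { return row[c]; });
}

function getDataAt(table, rowID, colID) {
  return getDataRow(table, rowID)[indexOfId(table.columns, colID)];
}

const rows = [{ id: 'r1' }, { id: 'r2' }];
const columns = [{ id: 'c1' }, { id: 'c2' }, { id: 'c3' }];
const sparseTable = { matrix_type: 'sparse', rows: rows, columns: columns,
  data: [[0, 2, 17], [1, 0, 3]] };
const denseTable = { matrix_type: 'dense', rows: rows, columns: columns,
  data: [[0, 0, 17], [3, 0, 0]] };

console.log(getDataAt(sparseTable, 'r1', 'c3')); // 17
console.log(getDataAt(denseTable, 'r1', 'c3'));  // 17
console.log(getDataCol(sparseTable, 'c1'));      // [0, 3]
```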

Add links to public landing pages

As a referee to our f1000 article notes:

Please list the public landing page for the applications mentioned in the text (in case users want to access these tools directly) - e.g. https://biomcs.iimog.org

Bik H. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 2 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16436)

datatables pseudo-serverside getters

A function that simulates a server-side datatables request. It takes an object of parameters as generated by datatables and returns the requested slice of data while adhering to sorting, pagination, etc.
This is necessary because data cannot be used directly in this way:

var biom = new Biom({/*biom object containing data*/});
var dt_data = biom.data;

One problem is that data can be in sparse or dense format. Datatables needs dense format. Converting might not be convenient for the user. The second problem is that there can be huge amounts of data and datatables will be very slow if confronted with the full dataset. The way around this is using server side data where only the required data (and information about the total amount) is available and displayed. The required slice of data can be calculated quickly and returned by this function.
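A minimal sketch of such a function, assuming the data has already been converted to a dense array of rows. The parameter names (draw, start, length, order) and the response keys (recordsTotal, recordsFiltered, data) follow the datatables server-side protocol; the function name serverSideSlice is hypothetical, and search/filtering is left out.

```javascript
// Hypothetical helper: answer a datatables server-side request from
// in-memory dense data. Handles ordering and pagination only.
function serverSideSlice(denseData, params) {
  const order = params.order || [];
  const sorted = denseData.slice().sort(function (a, b) {
    for (let i = 0; i < order.length; i++) {
      const col = order[i].column;
      const diff = a[col] - b[col];
      if (diff !== 0) {
        return order[i].dir === 'desc' ? -diff : diff;
      }
    }
    return 0;
  });
  // datatables sends length -1 to request all records
  const length = params.length < 0 ? sorted.length : params.length;
  return {
    draw: params.draw,
    recordsTotal: denseData.length,
    recordsFiltered: denseData.length, // no search applied in this sketch
    data: sorted.slice(params.start, params.start + length)
  };
}

const allData = [[5, 1], [2, 9], [8, 4], [1, 7]];
const response = serverSideSlice(allData, {
  draw: 1,
  start: 1,
  length: 2,
  order: [{ column: 0, dir: 'asc' }]
});
console.log(response.data); // [[2, 9], [5, 1]]
```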

Check columns and rows

When the user sets columns or rows, they have to be arrays of objects with the keys id and metadata. The id has to be unique in its array. Check this and throw an Error if it is not the case.

Related to #1
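One possible shape for this check, as a sketch only (checkElements is a hypothetical name; the actual setters may organize this differently):

```javascript
// Hypothetical helper: validate a rows or columns array.
// name is only used to build readable error messages.
function checkElements(elements, name) {
  if (!Array.isArray(elements)) {
    throw new TypeError(name + ' must be an array');
  }
  const seen = new Set();
  elements.forEach(function (el, i) {
    if (el === null || typeof el !== 'object') {
      throw new TypeError(name + '[' + i + '] must be an object');
    }
    if (!('id' in el) || !('metadata' in el)) {
      throw new Error(name + '[' + i + '] needs the keys id and metadata');
    }
    if (seen.has(el.id)) {
      throw new Error('duplicate id in ' + name + ': ' + el.id);
    }
    seen.add(el.id);
  });
}

// Passes silently
checkElements([{ id: 's1', metadata: {} }, { id: 's2', metadata: {} }], 'columns');

// Throws because the id s1 appears twice
try {
  checkElements([{ id: 's1', metadata: {} }, { id: 's1', metadata: {} }], 'columns');
} catch (e) {
  console.log(e.message); // duplicate id in columns: s1
}
```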

Discuss HDF5 capabilities of JavaScript

As a referee to our f1000 article notes:

The authors posit that the BIOM format version 2 / 2.1 that moved to HDF5 made it impossible for javascript libraries to manipulate it natively. We found a javascript library that “takes advantage of the compatibility of V8 and HDF5”. Were the authors unable to build from this library to take advantage of the version 2 BIOM format? The BIOM version 2 / 2.1 formats were designed specifically to handle many of the shortcomings of the version 1 in terms of memory and design. It would be advantageous of the users to build from this if possible to at least read in the BIOM v2.1 HDF5 files.

Paulson J. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 3 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16545)

Provide details of Blackbird biom-conversion-server hosting

As a referee to our f1000 article notes:

Can you please provide details on how and where the "Blackbird" instance and biom-conversion-server are currently hosted (e.g. Amazon AWS)?

Bik H. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 2 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16436)

Clarify second sentence

As a referee to our f1000 article notes:

The second sentence needs clarification. “Despite this increase, for many of these studies the general basic layout of the data is similar to traditional assessment after bioinformatical processing, yet complications arise due to the increased size of the data tables.”

Paulson J. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 3 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16545)

Validate data on set

Data has to be concordant with shape (given via rows and columns) as well as matrix_type.
If the user tries to set data that has the wrong dimensions or matrix_type (i.e. sparse instead of dense or vice versa), an Error should be thrown.
This check should also be performed on construction.

This is related to #1
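A sketch of what such a check might look like. It assumes dense data is an array of rows and sparse data is a list of [rowIndex, colIndex, value] triples as in BIOM 1.0; checkData is a hypothetical name.

```javascript
// Hypothetical helper: throw if data does not fit matrix_type and shape
function checkData(data, matrixType, nRows, nCols) {
  if (!Array.isArray(data)) {
    throw new TypeError('data must be an array');
  }
  if (matrixType === 'dense') {
    if (data.length !== nRows) {
      throw new Error('dense data needs ' + nRows + ' rows, got ' + data.length);
    }
    data.forEach(function (row, i) {
      if (!Array.isArray(row) || row.length !== nCols) {
        throw new Error('row ' + i + ' needs ' + nCols + ' entries');
      }
    });
  } else if (matrixType === 'sparse') {
    data.forEach(function (triple, i) {
      if (!Array.isArray(triple) || triple.length !== 3) {
        throw new Error('sparse entry ' + i + ' must be [row, col, value]');
      }
      const r = triple[0];
      const c = triple[1];
      if (!Number.isInteger(r) || !Number.isInteger(c) ||
          r < 0 || r >= nRows || c < 0 || c >= nCols) {
        throw new Error('sparse entry ' + i + ' is outside of shape');
      }
    });
  } else {
    throw new Error('unknown matrix_type: ' + matrixType);
  }
}

// Both calls pass for a 2 x 3 table
checkData([[0, 0, 17], [3, 0, 0]], 'dense', 2, 3);
checkData([[0, 2, 17], [1, 0, 3]], 'sparse', 2, 3);

// A dense matrix with a missing row is rejected
try {
  checkData([[0, 0, 17]], 'dense', 2, 3);
} catch (e) {
  console.log(e.message); // dense data needs 2 rows, got 1
}
```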

Implement more validation checks

The types of biom attributes are checked in the setter functions. TypeErrors are thrown if there is a mismatch. As the constructor uses the setters, types are also checked in the constructor. But there are other possible inconsistencies that are not yet checked:

  • keep shape correct by calculating dynamically (read only)
  • keep matrix_type correct by updating data on set
  • all elements of rows and columns are objects containing (unique) id and metadata
  • keep data correct by updating when rows or columns are set
  • keep data correct by throwing an Error if it is set incorrectly
  • format_url is really a URL and not only a string
  • date is really date time in ISO 8601 format
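The last two checks could be sketched as below. This assumes an environment with a global URL constructor (current browsers and recent nodejs versions) and only accepts the combined date and time form of ISO 8601; the function names isUrl and isIsoDateTime are hypothetical.

```javascript
// Hypothetical helper: true if value parses as an absolute URL
function isUrl(value) {
  if (typeof value !== 'string') {
    return false;
  }
  try {
    new URL(value);
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: true for strings like 2011-12-19T19:00:00,
// optionally with fractional seconds and a time zone (Z or +01:00)
function isIsoDateTime(value) {
  const pattern = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?$/;
  return typeof value === 'string' &&
    pattern.test(value) &&
    !isNaN(Date.parse(value));
}

console.log(isUrl('http://biom-format.org'));      // true
console.log(isUrl('not a url'));                   // false
console.log(isIsoDateTime('2011-12-19T19:00:00')); // true
console.log(isIsoDateTime('yesterday'));           // false
```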

Revert Blackbird branding

As a referee to our f1000 article notes:

Since this project is based on the Phinch framework, I find the "Blackbird" rebranding of the fork to be very problematic. The "Blackbird" instance is really just an updated release of the Phinch framework, with some bug fixes, added features, and implementation of the new BIOM conversion server. The rebranding/renaming is confusing for the end user (see comment by other peer reviewer below), and mistakenly implies a number of scenarios that are not accurate: 1) that the authors were involved in the original development of data visualization tools, 2) that the Blackbird rebranding and design changes were approved from by the original developers, and 3) the "Blackbird" project represents a significant expansion or retooling of the current Phinch framework. I’m fully aware that this is open source software and the authors are free to reuse and share the Phinch codebase, but I don't really see the utility of the "Blackbird" rebranding, and creating an additional web instance that mostly replicates the functionality of http://phinch.org will confuse end users.

Since the authors here are really community contributors to the original Phinch project, I would recommend eliminating the "Blackbird" rebranding of the project, and reverting back to Phinch branding (citing the framework release as Phinch v2.0). We will then initiate a pull request to update the bug fixes and integrate the new biojs-io-biom source code to be live on http://phinch.org The visual layout for Phinch (name, logo and visualization layout) was thoughtfully constructed, and the new Blackbird logo and visual modifications will likely interfere with “brand recognition” that should be attributed to the original Phinch framework.

Once this pull request is initiated and completed, the “Application” manuscript text should be updated to reflect the live implementation of the conversion library on a v2.0 Phinch framework at phinch.org.

Bik H. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 2 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16436)

As discussed here: PitchInteractiveInc/Phinch#63
the re-naming to Blackbird will be reverted and changes will be integrated into the official Phinch instance.

Clarify position with Blackbird

As referees of our f1000 article note:

The highlight with Blackbird is great to see but we were confused by the intention of the Github fork. The codebase suggests that it is more than just a proof of concept to highlight BioJS as there is project-specific branding. Would the authors consider clarifying their position with Blackbird?

McDonald D and Bolyen E. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 1 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16546)

Fix citation for BIOM R package

As a referee to our f1000 article notes:

The citation for the BIOM interface R package has been deprecated. The appropriate citation is: Paul J. McMurdie and Joseph N Paulson (2015). biomformat: An interface package for the BIOM file format. R/Bioconductor package version 1.0.0.

Paulson J. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 3 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16545)

Store sparse data more efficiently

As outlined in #17, there are more efficient ways to store sparse data than the dict of keys currently used by this module, for example specialized data structures such as compressed sparse row or column.
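For illustration, a compressed sparse row (CSR) structure could be built from the [rowIndex, colIndex, value] triples roughly like this. The names toCSR and csrRow are hypothetical and this is a sketch, not a proposal for the final internal layout:

```javascript
// Hypothetical helper: convert [row, col, value] triples to CSR.
// values and colIndices hold the non-zero entries in row order,
// rowPointers[r]..rowPointers[r + 1] marks the entries of row r.
function toCSR(triples, nRows) {
  const sorted = triples.slice().sort(function (a, b) {
    return a[0] - b[0] || a[1] - b[1];
  });
  const values = [];
  const colIndices = [];
  const rowPointers = new Array(nRows + 1).fill(0);
  sorted.forEach(function (t) {
    values.push(t[2]);
    colIndices.push(t[1]);
    rowPointers[t[0] + 1] += 1; // count entries per row first
  });
  for (let r = 0; r < nRows; r++) {
    rowPointers[r + 1] += rowPointers[r]; // then accumulate the counts
  }
  return { values: values, colIndices: colIndices, rowPointers: rowPointers };
}

// Read one row back as a dense array
function csrRow(csr, r, nCols) {
  const row = new Array(nCols).fill(0);
  for (let k = csr.rowPointers[r]; k < csr.rowPointers[r + 1]; k++) {
    row[csr.colIndices[k]] = csr.values[k];
  }
  return row;
}

const csr = toCSR([[2, 1, 5], [0, 2, 17], [2, 0, 3]], 3);
console.log(csr.rowPointers);   // [0, 1, 1, 3]
console.log(csrRow(csr, 2, 3)); // [3, 5, 0]
```

Row access only touches the entries of that row, which is the main advantage over scanning a list of triples.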

JSON is not suited for huge biom files

As referees of our f1000 article note:

The primary motivator for the development of BIOM-format 2.1.0 were scaling limitations inherent with the JSON-based representation of 1.0.0. Specifically, the “data” key of the JSON string must be parsed in full in order to random access to individual sample or observation data. This removes the possibility of algorithms which depend on efficient random access patterns for data too large for main memory. Additionally, the overhead associated with representing a large JSON object in memory is high. While we acknowledge HDF5 possesses challenges for web-based interaction with these data, it is important to note that the 1.0.0 JSON-based format is not recommended for modern sized studies using hundreds to thousands to tens of thousands of samples.

McDonald D and Bolyen E. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 1 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16546)

Add empty metadata object to rows/columns if missing

Some tools do not create an empty metadata object for a column/row if there is no metadata. However, this is not concordant with the specification, which requires a metadata object for every row/column.
This can easily be fixed in the constructor/setter of rows/columns.
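A minimal sketch of that fix, assuming it runs before any other checks on rows/columns (withMetadata is a hypothetical name). It only fills in a missing metadata key and leaves existing values untouched:

```javascript
// Hypothetical helper: return a copy of rows/columns in which every
// element without a metadata key gets an empty metadata object
function withMetadata(elements) {
  return elements.map(function (el) {
    if (el.metadata === undefined) {
      return Object.assign({}, el, { metadata: {} });
    }
    return el;
  });
}

const fixedColumns = withMetadata([
  { id: 'S1' },
  { id: 'S2', metadata: { site: 'gut' } }
]);
console.log(fixedColumns[0]); // { id: 'S1', metadata: {} }
```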

Fix references to BIOM (add minor version number)

As referees of our f1000 article note:

When the authors refer to BIOM v2, we believe they are actually referring to BIOM v2.1.0. There are important distinctions between the format versions. Would the authors consider clarifying the minor version number in discussion?

McDonald D and Bolyen E. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 1 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16546)

Extend biom-conversion-server

As referees of our f1000 article note:

The use of the conversion server is very cool and could be taken a step further by layering a light communication API on top to allow a client to request arbitrary samples. This separation would remove the burden of the client needing to read HDF5 formatted files, greatly lower the memory footprint of the client, and likely be more performant than a pure client-side model as the client would only need to know about what it had requested. This expansion of biojs-io-biom, in our opinion, would have the greatest impact for expanding the use of BIOM formatted data within a web application.

McDonald D and Bolyen E. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 1 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16546)

Improve installation guide

As a referee to our f1000 article notes:

In my own installation of the software, I keep getting error messages when I attempt to create a biom object, see here: http://tinyurl.com/f1000-review. If the reviewers could please clarify the installation guide on the github repo.

Paulson J. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 3 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16545)

Add getter for nnz

nnz is a property introduced in biom version 2 that denotes the number of non-zero elements. It can be calculated from data. A setter does not make much sense, so only a getter is required; attempts to set nnz should fail.
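This could be sketched with a getter/setter pair as below. The class SketchTable is a stand-in for illustration and not the Biom class of this module; sparse data is assumed to be a list of [rowIndex, colIndex, value] triples.

```javascript
// Count non-zero values for either representation
function countNonZero(data, matrixType) {
  if (matrixType === 'sparse') {
    return data.filter(function (triple) { return triple[2] !== 0; }).length;
  }
  return data.reduce(function (sum, row) {
    return sum + row.filter(function (v) { return v !== 0; }).length;
  }, 0);
}

// Stand-in class, only used to show the getter/setter pair
class SketchTable {
  constructor(matrixType, data) {
    this.matrix_type = matrixType;
    this.data = data;
  }

  // Calculated on every access, so it cannot get out of sync with data
  get nnz() {
    return countNonZero(this.data, this.matrix_type);
  }

  // Setting nnz fails explicitly
  set nnz(value) {
    throw new Error('nnz is read-only and calculated from data');
  }
}

const denseExample = new SketchTable('dense', [[0, 1], [2, 0]]);
const sparseExample = new SketchTable('sparse', [[0, 1, 5], [1, 0, 0]]);
console.log(denseExample.nnz);  // 2
console.log(sparseExample.nnz); // 1
```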

Fix "accession functions" -> "accessor functions"

As referees of our f1000 article note:

The two uses of “accession functions” reads awkwardly as these types of methods are generally described as “accessor functions.” Would the authors consider revising the phrasing?

McDonald D and Bolyen E. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 1 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16546)

Fix biom-conversion-server for conversion from 1.x to 2.x

As a referee to our f1000 article notes:

The biom-conversion-server does not appear to be backwards compatible (I could not upload and convert a BIOM 1.x file to 2.x format) - this one-way conversion functionality is should be clearly indicated in the first paragraph of the “Application” section. In addition, if users try to upload a BIOM 1.0 file they should be presented with an appropriate error message (I didn’t see one - the tool just froze when I attempted to upload a BIOM 1.0 file).

Bik H. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 2 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16436)

Efficient representation of sparse data

As referees of our f1000 article note:

The in memory representation of the data following parse by BioJS are either in a dense matrix, or in a dict of keys style sparse representation. As the authors note, specialized methods will need to be created to handle large data efficiently, however the authors may wish to consider placing emphasis instead on specialized data structures such as compressed sparse row or column.

McDonald D and Bolyen E. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 1 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16546)

This is a very good point. Right now we only use the original sparse or dense representation as it is defined for the biom version 1.0 JSON. But depending on the input data, a lot of memory can be saved by using specialized data structures to store the biom object internally on parse. It can then be transformed back to the JSON representation when write is called.
