signalstore's Introduction

SignalStore is a Python package for storing and retrieving large labeled data arrays using MongoDB and xarray-compatible formats such as netCDF4 and Zarr. It also provides a framework for accommodating data adapters for arbitrary formats such as PyTorch model weights, NumPy arrays, and pandas DataFrames. It is designed to be used as a cloud-agnostic common library for dockerized data analysis microservices. To use the package, you need a MongoDB client and an fsspec FileSystem object (compatible with local filesystems, Google Cloud, Amazon Web Services, etc.).

Installation

Use pip to install the package:

pip install signalstore

Usage

The main class in SignalStore is the UnitOfWork. You use a UnitOfWorkProvider to create UnitOfWork instances for a project. The UnitOfWork stores and retrieves data safely: if an exception is raised before commit() is called, all pending changes are rolled back.

from signalstore import UnitOfWorkProvider
import fsspec
from fsspec.implementations.dirfs import DirFileSystem
import xarray as xr
from pymongo import MongoClient

# local filesystem, wrapped so that all paths are relative to root
og_filesystem = fsspec.filesystem('file')
root = '.'
filesystem = DirFileSystem(root, og_filesystem)
client = MongoClient('localhost', 27017)

# in-memory store passed to the provider
memory_store = dict()

uow_provider = UnitOfWorkProvider(
    mongo_client=client,
    filesystem=filesystem,
    memory_store=memory_store
)

input_dir = 'path/to/my/input/data'

with uow_provider('myproject') as uow:
    data_glob = f'{input_dir}/*.nc'
    for data_file in filesystem.glob(data_glob):
        dataarray = xr.open_dataarray(data_file)
        uow.data.add(dataarray)
    # before the commit, everything will be rolled back if an exception is raised
    uow.commit()

with uow_provider('myproject') as uow:
    session_ref = {
        'schema_ref': 'session',
        'data_name': '2024-04-28-AM-Animal1'
    }
    query = {'session_ref': session_ref}
    metadata = uow.data.find(query) # find all data from specified recording session
    # load the data-arrays from found meta-data
    for record in metadata:
        dataarray = uow.data.get(
            schema_ref=record['schema_ref'],
            data_name=record['data_name'],
        )
        print(dataarray)

Caveats

1D data (e.g. spike labels or spike times) must be saved as 2D with an extra dimension, e.g. shape (index, 1). This is because the xarray function "is_list_of_strings" requires the extra dimension. 1D data will be encoded as 2D with the extra dimension named "1".
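
The reshaping described above can be sketched as follows. This is a minimal illustration, not SignalStore's own code: the spike-time values and the array name are made up, and any attributes SignalStore expects on the array before uow.data.add is called are omitted.

```python
import numpy as np
import xarray as xr

# Hypothetical 1D spike times in seconds (illustrative values only)
spike_times = np.array([0.012, 0.250, 0.731, 1.004, 1.560])

# Add a trailing length-1 axis so the shape becomes (index, 1)
spike_times_2d = spike_times.reshape(-1, 1)

# Name the extra dimension "1", matching the caveat above
dataarray = xr.DataArray(
    spike_times_2d,
    dims=("index", "1"),
    name="spike_times",  # hypothetical name
)

# After loading, drop the extra axis to recover the original 1D values
recovered = dataarray.values.squeeze(axis=1)
```

Squeezing along axis 1 after retrieval gives back an array with the original 1D shape.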

MongoDB stores datetime objects as UTC, so when you query for a datetime object, you need to convert it to UTC first.
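
One way to do the conversion with the standard library is sketched below. The helper name to_utc and the use of version_timestamp as a query key are assumptions made for illustration; check your own schema for the actual field names.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC (hypothetical helper)."""
    if dt.tzinfo is None:
        # A naive datetime carries no offset, so refuse to guess one
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc)


# A timestamp recorded at UTC-05:00 (illustrative value)
local_time = datetime(2024, 4, 28, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
utc_time = to_utc(local_time)

# Build the query with the UTC value; the field name is an assumption
query = {"version_timestamp": utc_time}
```

Rejecting naive datetimes is a design choice in this sketch: it avoids silently assuming the machine's local timezone.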

MongoDB stores datetime objects at millisecond precision. You can use a more precise datetime object for queries, but it will be truncated to millisecond precision. You will not get an exact match if you assert that the original datetime object is equal to the one stored in the database. For speed, the filesystem stores time_of_removal and version_timestamp as microsecond precision integers in filenames, but the metadata in MongoDB is stored as millisecond precision datetime objects. Use full precision for queries to get the right results, but beware that adding to MongoDB and getting back from MongoDB will truncate the precision to milliseconds.
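
The precision loss can be reproduced without a database. The sketch below assumes truncation (not rounding), as described above; truncate_to_milliseconds is a hypothetical helper, not part of SignalStore or PyMongo.

```python
from datetime import datetime, timezone


def truncate_to_milliseconds(dt: datetime) -> datetime:
    """Drop sub-millisecond digits, mimicking a millisecond-precision store."""
    return dt.replace(microsecond=(dt.microsecond // 1000) * 1000)


# A timestamp with microsecond precision (illustrative value)
original = datetime(2024, 4, 28, 14, 30, 0, 123456, tzinfo=timezone.utc)

# What a millisecond-precision round trip would be expected to return
stored = truncate_to_milliseconds(original)

# A direct equality check against the original fails...
exact_match = stored == original

# ...but comparing at millisecond precision succeeds
millisecond_match = stored == truncate_to_milliseconds(original)
```

When asserting on values read back from MongoDB, compare at millisecond precision as in the last line instead of testing exact equality.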

Contributors

aaoun00, olivershetler
