
neuro's Issues

[GOAL] Subcoordinates stacked along Y-axis with 'group' support

Summary and Links

  • channel-type-grouping Support channel-type grouping with different sampling and amplitude range (normalization).

  • grouped-y-scaling: Manual scaling/zooming of Y axis per channel group

    • includes hover-based scroll-zooming per group
  • channel-group-yticks Switch y-ticks to group values when zoomed out enough that channel-based y-ticks are cluttered. So a zoomed-out view might be something like the following y-ticks (instead of having a tick per row; a rough sketch of the tick-switching idea follows this list):
    (attached image: example of group-level y-ticks)

    • TODO: create issue for this
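A rough sketch of the tick-switching idea, for illustration only (the group layout, zoom threshold, and use of plain value offsets rather than subcoordinate_y are all assumptions here):

import numpy as np
import holoviews as hv; hv.extension('bokeh')

# Hypothetical layout: 3 groups of 8 channels stacked at integer y positions
groups = {'LFP': range(0, 8), 'EEG': range(8, 16), 'EMG': range(16, 24)}
channel_ticks = [(i, f'ch{i}') for idx in groups.values() for i in idx]
group_ticks = [(float(np.mean(list(idx))), name) for name, idx in groups.items()]

xs = np.linspace(0, 1, 500)

def stacked(y_range):
    lo, hi = y_range if y_range else (0, 24)
    # Switch to group-level ticks once more than ~10 rows are visible (arbitrary threshold)
    ticks = channel_ticks if (hi - lo) <= 10 else group_ticks
    overlay = hv.Overlay([
        hv.Curve((xs, i + 0.4 * np.sin(2 * np.pi * (i + 1) * xs)))
        for idx in groups.values() for i in idx
    ])
    return overlay.opts(yticks=ticks, show_legend=False, height=400, responsive=True)

hv.DynamicMap(stacked, streams=[hv.streams.RangeY()])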

Multi-scale Large Image Volume workflow

⚠️ This issue is related to #87. There, the focus was on handling multi-scale data in the time dimension. In contrast, this issue is focused on multi-scaling volumetric images (x,y,z).

Problem:

See #87

Description/Solution/Goals:

See #87 for general motivation. In contrast, the goal of this current issue is to focus on multi-scale large image volumes, rather than downscaling in the time dimension.

Potential Methods and Tools to Leverage:

See #87
Also:

  • ipyvolume
  • VTK.js, also see Panel's VTK components: VTK and VTKVolume
  • Neuroglancer + Cloudvolume + Igneous stack
    • neuroglancer: WebGL-based viewer for volumetric data
      • works with several data sources. See their info about working with zarr and in-memory Python stuff.
    • cloudvolume - Python interface to the Neuroglancer precomputed data format.
    • Igneous - Python pipeline for scalable meshing, skeletonizing, downsampling, and management of large 3D images, focused on the Neuroglancer Precomputed format (a small CloudVolume access sketch follows this list).
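For orientation, a minimal sketch of pulling a cutout from a Neuroglancer precomputed volume with CloudVolume; the cloud path and mip level are placeholders, not a real dataset:

from cloudvolume import CloudVolume

# Hypothetical precomputed EM layer; mip=2 selects a downsampled level of the pyramid
vol = CloudVolume('precomputed://gs://example-bucket/em/image', mip=2, progress=True)
print(vol.shape, vol.resolution)                 # volume extent and voxel size at this mip
cutout = vol[2048:2304, 2048:2304, 100:110]      # fetches only the chunks covering this box
print(cutout.shape)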

Tasks:

  1. Evaluate and determine whether to adopt/adapt any aspects of the Neuroglancer + Cloudvolume + Igneous stack.
  2. Build a POC example visualizing a medium (multi-GB) multi-scale image volume from local storage
  3. Build a POC example visualizing a multi-scale image volume from cloud storage

Use-Cases, Starter Viz Code, and Datasets:

Electron Microscopy (EM):

[GOAL] Minimap

  • works with subcoordinate_y in target plot
  • works with dynamic map for target plot
  • works with a rasterized image in source plot
  • box select
  • inversion of highlight (darken unselected region)
  • handle glyphs on edges
  • scroll zoom to expand/contract when cursor is in range

GOAL: Large-data-handling

UPDATE: This initiative has been superseded by other, more targeted efforts

Summary and Links

  • large-data-handling (lead: ): Develop a first pass solution for the various workflow types
  • Large data handling meeting notes
  • Below, the phases are prioritized into domain sections, going ephys > imaging > eeg. This is because:
    • Ephys data has a very high sampling rate (30KHz) and (in the last few years) many channels (>100), so large datasets are ubiquitous.
    • Imaging data also gets large pretty quickly and dealing with larger datasets has been communicated as the primary pain point for miniscope users in the Minian pipeline.
    • There are already feature-rich browser-based EEG viewers, but they cannot handle large datasets very well, so although EEG datasets are typically not as large as in ephys or imaging, it's still relevant.

Note: each domain section below starts with some important 'Context'.

Task Planning:

Electrophysiology (Ephys)

**Context:**

While the continuous, raw data is streamed and viewed during data acquisition, it's not that critical to look at the full-band 30KHz version during processing/analysis. Instead, the raw-ish displays are of the low-pass filtered (<1000 Hz) continuous data (like a filtered version of the ephys viewer workflow), stacked 'spike raster' of action potential events (see spike raster workflow), and a view of the spike waveforms (see waveform workflow). These three workflows represent different challenges to large data handling and may require specific approaches.

Additionally, although there is a lot of heterogeneity in technique and equipment in electrophysiology, below we focus on the Allen Institute data. This is advantageous because they have a well-funded group maintaining their SDK, they utilize Neuropixels probes which have a relatively high channel count (and therefore represent a more difficult use case), and their data are available in the NWB 2.0 file format (fancy HDF5), which is becoming increasingly common in neuroscience. Demetris has some contacts at the Allen Institute but we haven't yet engaged with them for feedback/collaboration; this will happen once we have something to show them that is demonstrably better than their current approach. Additionally, we are collaborating with one of Jim's former colleagues, who works primarily with relatively smaller spike-time datasets (some real, some synthetic) and is primarily interested in spike-raster-type workflows, so the work below will benefit his group as well even though we will focus on Allen Institute data.

Ephys Phase 1: Understanding the ecosystem, problems, and foundations for the solution

  • Ecosystem Review: Read about the existing ecosystem on ephys data/viz, challenges, and commonly used tools. (Living notes here)
  • Research Common Workflows: Identify and establish the most common workflow(s) for accessing and visualizing Neuropixels data. This will be crucial for later benchmarking. (Relevant materials here)
  • Allen Institute Data Familiarization: Download and understand the structure of an NWB formatted Neuropixels dataset from the Allen Institute.
  • Review Pangeo Community Tools: Familiarize with the Pangeo community approach, emphasizing scalable and modular workflows.
  • Review Pandata SOSA Whitepaper: Read Jim and Martin's whitepaper to potentially align the project with their principles and suggestions when applicable.

Ephys Phase 2: Building an MVP

  • Data Conversion to Zarr: Convert a subset of the Allen Institute data from NWB (HDF5) format to Zarr for efficient chunking.
  • Integration with Dask and Xarray: Create a notebook and/or script demonstrating an optimized workflow for data access, emphasizing the use of Dask for scalable computations and Xarray for labeled data structures, aligning with the Pangeo/Pandata stack principles when applicable (a rough sketch follows this list).
  • Basic Visualization: Incorporate basic visualization workflows into this pipeline using HoloViz with Bokeh backend (see spike raster and ephys viewer workflows - Demetris can continue to lead on this task).
  • User Feedback on MVP: Present the MVPs (spike raster, continuous traces) to a subset of potential users and gather feedback for refinement.
  • Refinement based on Feedback: Make necessary adjustments to the MVPs based on user feedback.
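A rough sketch of the conversion and Dask/Xarray integration steps above; the filename, HDF5 path, chunk sizes, and 30 kHz sampling rate are assumptions rather than the project's actual pipeline:

import h5py
import numpy as np
import dask.array as da
import xarray as xr

f = h5py.File("allen_session.nwb", "r")                        # hypothetical NWB/HDF5 file
dset = f["acquisition/ElectricalSeries/data"]                  # assumed (time, channel) layout
lazy = da.from_array(dset, chunks=(300_000, 64))               # ~10 s x 64 channels per chunk at 30 kHz

ephys = xr.DataArray(
    lazy,
    dims=("time", "channel"),
    coords={"time": np.arange(dset.shape[0]) / 30_000.0,       # assuming 30 kHz sampling
            "channel": np.arange(dset.shape[1])},
    name="ephys",
)
ephys.to_dataset().to_zarr("allen_session_ephys.zarr", mode="w")   # chunked, analysis-ready copy
f.close()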

Ephys Phase 3: Benchmarking the MVP

  • Define Benchmark Metrics: Establish concrete metrics to evaluate performance, usability, and user experience. (See benchmarking notes)
  • Benchmark Against Common Workflows: Compare the MVPs against the previously identified common workflows for accessing and visualizing Neuropixels data.
  • Document Benchmark Results: Document the benchmark results, emphasizing areas where the MVPs clearly differ from existing solutions.

Ephys Phase 4: Advanced Visualization Techniques

  • Integrate Datashader: Add variant using Datashader for efficient rendering of large datasets.
  • Explore Decimation: Add variant using data decimation techniques like HoloViews' LTTB (a sketch of all three variants follows this list).
  • Bokeh with WebGL: Add variant using Bokeh's WebGL backend.
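A minimal sketch of the three variants on synthetic data; it assumes a HoloViews version that provides the datashade and downsample1d operations, and uses a plot hook for Bokeh's WebGL output backend:

import numpy as np
import holoviews as hv; hv.extension('bokeh')
from holoviews.operation.datashader import datashade
from holoviews.operation.downsample import downsample1d

xs = np.linspace(0, 10, 1_000_000)
curve = hv.Curve((xs, np.sin(xs) + 0.1 * np.random.randn(xs.size)))

# 1) Datashader: rasterize server-side, ship an image instead of a million points
shaded = datashade(curve)

# 2) Decimation: ship at most ~4000 points chosen by the LTTB algorithm
decimated = downsample1d(curve, width=4000, algorithm='lttb')

# 3) WebGL: keep the glyphs but ask Bokeh to render them with its WebGL output backend
def webgl_hook(plot, element):
    plot.state.output_backend = 'webgl'
webgl = curve.opts(hooks=[webgl_hook])

(shaded + decimated + webgl).cols(1)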

Ephys Phase 5: Minimap/multi-scale

  • Evaluate xarray DataTree: Explore the xarray datatree for multi-scale rendering. (some exploratory notes here)
  • Leverage Minimap/RangeTool: Leverage our work on the minimap feature (Maybe the minimap could use a precomputed lower-res data rendering stored in the datatree to speed up initial display, or the minimap could be used to help ensure that up to a certain optimized chunk/area/channels/time is displayed at first).

PROBABLY SKIP THIS: Ephys Phase 6: Exploring Direct HDF5 Access with Kerchunk

  • Kerchunk Integration: Integrate Kerchunk to provide direct, chunked access to the original HDF5 datasets (a rough sketch follows this list).
  • Benchmark Kerchunk Performance: Compare the performance of accessing data via Kerchunk versus the Zarr data copy approach.
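A rough sketch of the Kerchunk idea; the filenames are hypothetical and the exact open_dataset incantation varies across kerchunk/fsspec/zarr versions:

import json
import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr

url = "allen_session.nwb"                                      # hypothetical local HDF5/NWB file
with fsspec.open(url, "rb") as f:
    refs = SingleHdf5ToZarr(f, url, inline_threshold=100).translate()

with open("allen_session_refs.json", "w") as out:              # reusable reference index
    json.dump(refs, out)

# Open the original HDF5 lazily through the reference filesystem, as if it were Zarr
ds = xr.open_dataset(
    "reference://", engine="zarr",
    backend_kwargs={"consolidated": False,
                    "storage_options": {"fo": "allen_session_refs.json"}},
)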

Ephys Phase 7: Adapt Progress to Waveform Workflow

  • Waveform Workflow: Although the initial focus should be on optimizing the ephys viewer and spike raster, the waveform workflow is also an important component and should be updated accordingly. As it displays thousands of overlapping and grouped lines, it may require an adapted approach compared to the other workflows.

1-Photon Calcium Imaging (1P-Imaging)

Primarily regarding the Miniscope device and associated Minian software

**Context:** The Minian work so far uses many SOSA tools (zarr, dask, xarray, holoviews, panel, bokeh, etc.), which is great, and we want to help improve their pipeline, especially since there are parts (like the CNMF app) that are reportedly unusable with large data. If we could make their pipeline streamlined, that would be a massive win for everyone. However, Demetris is trying to engage with the primary developer of Minian to see if they would consider accepting PRs (the project hasn't been updated since June 2022, old versions of most packages are pinned, and it doesn't have a build for osx_arm64); otherwise, we'd need to find a solution that has visibility in the community, which gets more complicated. The developer is now working with a company called Metacell, which is facilitating imaging analysis platforms, so this could either be an opportunity for accelerated adoption or something less good if we can't improve things and show that a Bokeh-based workflow is the best approach. There are also some potentially competing/complementary solutions in the works from the fastplotlib folks, and they already have a collaboration going with the popular 2-Photon analysis suite 'CaImAn', which could potentially absorb 1-Photon workflows in the future (unless our solution and community support is demonstrably better).

1P-Imaging Phase 1: Understanding the ecosystem, problems, and foundations for the solution

1P-Imaging Phase 2: Building from the existing MVP

  • Explanation: Minian, as it currently exists, is the MVP that we want to benchmark and build from. Demetris has been building a more generalized version of a video viewer reminiscent of Minian's use case, for easier benchmarking and development, but eventually improvements should feed back into the Minian ecosystem.
  • User Feedback on MVP: We have already met with Cai lab members about pain points in the existing Minian pipeline, and have identified the initial targets for improvement: the "CNMF" app and the "Temporal update" app in the Minian pipeline. While planning how to address these pipeline steps, and if useful, experiment with the generalized video viewer workflow, which should be independent of Minian-specific machinery.
    • Here is a note about CNMF struggles from the meeting notes: "[user] doesn't use the CNMF viewer because it takes a long time to load, although it would be extremely helpful if it did work. It's the least used but could potentially be the most helpful. Right now they are setting parameters based on the first video chunk, then apply those parameters to the rest of the videos and inspect the max projection to see if those parameters worked. If they need to adjust the parameters, they would need to run the whole pipeline again."
    • Here is a note about the Temporal update step: "Temporal update plot in the Minian pipeline would also be really useful to improve. It's just a timeseries plot, but after some duration (~1 hr) it becomes unusable."
  • Refinement: Fork the Minian repo and make preliminary targeted adjustments to address the initial targets for improvement.

1P-Imaging Phase 3: Benchmarking the improvements

  • Define Benchmark Metrics: Establish concrete metrics to evaluate performance, usability, and user experience. (See benchmarking notes)
  • Benchmark original Minian Approach vs Adjusted Approach: Develop benchmark tests and compare the adjustments with the original.
  • Document Benchmark Results: Document the benchmark results, emphasizing areas where the adjustments clearly differ from the original solution.
  • Benchmark and/or Document Adjusted Approach vs Competitors: Ideally, find a way to quantitatively compare the Adjusted Approach with something comparable being done with fastplotlib and napari. These are all very different tools and it will be difficult to directly compare apples to apples, but we want some indicator (while documenting caveats) of user experience between the tools/approaches.

EEG

Primarily regarding the MNE software

**Context:** The MNE software is well-maintained, documented, and widespread. We have established a friendly collaboration with one of their developers, and a successful end result is a HoloViz/Bokeh approach to EEG visualization that they advertise to their users. The extent of actual integration into the MNE software is yet to be determined, but one ('best') possible situation is that the HoloViz/Bokeh backend is shipped with their package so users can easily switch to it with an argument. The next best possible situation is that they advertise the HoloViz/Bokeh approach in some way, but it remains outside of their package. Either way, we want to fashion our solution such that it would be possible to integrate and complement their tooling. This has implications for the data-access approach, as we want to try to utilize their existing data readers and formats as much as possible. In the future, a possible grant extension could work with MNE developers to adopt a data-access approach that uses zarr, dask, xarray, etc., if there were hints that this approach would be more promising.

EEG Phase 1: Understanding the ecosystem, problems, and foundations for the solution

  • Ecosystem Review: Read about the existing ecosystem on EEG data/viz, challenges, and commonly used tools. (Living notes here)
  • Research Common Workflows: Identify and establish the most common workflow(s) for accessing and visualizing EEG data. This will be crucial for later benchmarking. (Relevant materials here)

EEG Phase 2: Benchmark the eeg viewer workflow version that uses MNE I/O

  • Define Benchmark Metrics: Establish concrete metrics to evaluate performance, usability, and user experience. (See benchmarking notes)
  • Benchmark Against Common Workflows: Compare the MVP against the previously identified common workflows for accessing and visualizing EEG data. This would involve benchmarking something about one or both of their current backends (although I doubt we'll be able to benchmark the qt-based backend).
  • Document Benchmark Results: Document the benchmark results, emphasizing areas where the MVP clearly outperforms or underperforms existing solutions.

EEG Phase 3: Advanced Visualization Techniques (common to Ephys)

  • Integrate Datashader: Add variant using Datashader for efficient rendering of large datasets
  • Explore Decimation: Add variant using data decimation techniques like HoloViews' LTTB.
  • Bokeh with WebGL: Add variant using Bokeh's WebGL backend.

EEG Phase 4: Minimap/multi-scale (common to Ephys)

  • Evaluate xarray DataTree: Explore the xarray datatree for multi-scale rendering. (some exploratory notes here)
  • Leverage Minimap/RangeTool: Leverage our work on the minimap feature (Maybe the minimap could use a precomputed lower-res data rendering stored in the datatree to speed up initial display, or the minimap could be used to help ensure that up to a certain optimized chunk/area/channels/time is displayed at first).

[GOAL] Support Range Annotations

Summary and Links

  • annotation (lead: @hoxbro)
  • Use holonote for roi/annotation management
  • Potential extension: UI integration into the Bokeh toolbar.

Relevant Workflows

  • ephys viewer
  • eeg viewer
  • spike raster

Inspiration:

  • MNE timespan annotations: our 1D annotations should be able to cover the functionality shown in that tutorial.
  • CaImAn spatial annotation (contours) renderer
  • SpatialData package polygon annotations

Requirements

  • Add/remove individual annotations
    • Add
      • Programmatically
      • UI:
        • Drag inside the data view to create annotations with the currently selected description, or, if none is selected, add a new description for the as-yet-uncategorized annotations.
        • editing the table
    • Remove
      • Programmatically
      • UI:
        • right-clicking on an individual annotation, or all selected, for an option to remove
        • editing the table
  • Edit individual annotation range
    • Programmatically
    • UI:
      • dragging them or their boundaries
      • editing the table (ideally with optional dial widgets in each value cell)
      • Auto-merge overlapping annotation ranges of the same description
  • Add/remove/edit description (aka tag / label)
    • Edit the description of one single annotation or all annotations of the currently selected kind
    • Remove description removes all annotations of the input or selected description
    • ability to assign multiple descriptions to the same annotated range

Styling

  • Auto-color-code any annotations based on description
  • Labels on annotations OR some way to easily derive a legend/color-key from the annotation table
    • (see attached image)

Behavior

  • Synchronized annotations across a HoloViews container (overlay, layout)
    • example: annotations need to be linked across eeg-viewer and its minimap
      • (see attached image)
  • Synchronized annotations across plots in different notebook cells
  • Shift-select of multiple annotations (for batch actions)
  • Live-updating table UI of annotation - (ID, dimN-start, dimN-end, description(s))
    • Ideally as a click & type-editable tabulator-like pane
  • Often, the duration of annotations for a given description will be consistent (e.g. 1 second), so we need some way to create annotations just based on start/onset and a defined duration
  • Here is an example of how annotations are stored for MNE: the file contains an orig_time header, followed by onset, duration, and description fields (a sketch of loading such rows as plot spans follows the example):
# MNE-Annotations
# orig_time : 2009-08-12 16:15:00
# onset, duration, description
0.0,4.2,T0
4.2,4.1,T2
8.3,4.2,T0
12.5,4.1,T1
16.6,4.2,T0
20.8,4.1,T1
24.9,4.2,T0
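A minimal sketch (plain HoloViews spans rather than holonote) of loading such rows and overlaying them as color-coded ranges:

import io
import pandas as pd
import holoviews as hv; hv.extension('bokeh')

csv = io.StringIO("""onset,duration,description
0.0,4.2,T0
4.2,4.1,T2
8.3,4.2,T0
12.5,4.1,T1
""")
ann = pd.read_csv(csv)
ann["end"] = ann["onset"] + ann["duration"]

colors = {"T0": "#1f77b4", "T1": "#ff7f0e", "T2": "#2ca02c"}   # auto-color per description
spans = hv.Overlay([
    hv.VSpan(row.onset, row.end).opts(color=colors[row.description], alpha=0.2)
    for row in ann.itertuples()
])
signal = hv.Curve(([0, 25], [0, 0]), "Time", "Amplitude")      # placeholder for a real trace
signal * spans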

[EPIC] Multi-channel timeseries viewer

Summary and Links

  • subcoordinates (lead: ): Stacked traces on Y axis sub-coordinates
  • The intention is to avoid having to add an offset to the data values in order to plot them stacked on the same canvas
  • Terminology

Relevant workflows

  • ephys viewer
  • eeg-viewer

Requirements

  • Stacked channels: Users should be able to vertically offset the start (left-most data point in viewport) of many timeseries at discretely spaced positions on the y-axis without having to change/copy the data.
  • Hover information: The hover tooltip should be able to display the channel name, time, and actual data value (not an offset-adjusted data value).
Hover information vid:
Screen.Recording.2023-08-15.at.3.46.40.PM.mov
  • Group-wise scaling of Y: When y-wheel-zoom is active and a user's cursor is above a trace ('current trace'), all the traces in the data group to which the current trace belongs should adjust their zoom/scale accordingly while actively zoom-scrolling. Notably, the extent of the data should by default not be limited to a certain y-range; instead it should be allowed to overlap other channels to any extent (see the sketch after this list).
Group-wise scaling of Y vid (with sound):
Screen.Recording.2023-08-15.at.3.51.52.PM.mov
  • Group-wise initial scale: The initial scale of each data group should be settable so that it is in a reasonable range for data visibility
  • (OPTIONAL/SKIP FOR NOW) Group-wise clipping: above a given clipping threshold, the data should not be visible. This is sometimes used to hide large fluctuations while resolving small details without having extensive trace overlap.
Group-wise clipping vid (with sound):
Screen.Recording.2023-08-15.at.5.28.00.PM.mov
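A minimal sketch of the stacked/grouped requirements above; it assumes a recent HoloViews/Bokeh where subcoordinate_y is available and curves sharing an element group get group-wise wheel zoom, with synthetic data standing in for real recordings:

import numpy as np
import holoviews as hv; hv.extension('bokeh')

time = np.linspace(0, 10, 2000)
curves = []
for group, n_chans, scale in [("EEG", 8, 50e-6), ("MEG", 8, 1e-12)]:
    for i in range(n_chans):
        trace = scale * np.random.randn(time.size)          # group-specific amplitude range
        curves.append(
            hv.Curve((time, trace), "Time", "Amplitude",
                     group=group, label=f"{group}-{i}")
              .opts(subcoordinate_y=True, tools=["hover"], line_width=1, color="black")
        )
hv.Overlay(curves).opts(show_legend=False, responsive=True, height=500)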

Important implementation accommodations

  • Scale bar: The stacked traces implementation should be amenable to having a scale bar per data group
  • minimap/rangetool: The stacked traces implementation should be amenable to be controlled by a minimap/rangetool plot/widget
  • Annotation: The stacked traces implementation should be amenable to having timeseries annotations (start, end, description) as colored/labeled vertical spans across the entire viewport
  • large data: The stacked traces implementation should be amenable to handling large amounts of data and interaction with little latency.

[GOAL] Scale bar

Summary and Links

scale-bar (lead: @mattpap for Bokeh, @hoxbro for HoloViews)

Resulting Issues and PRs:

Relevant Workflows

  • Waveform
  • Stacked-timeseries
  • Video Viewer

Resources

Requirements

General

  • Compatibility: Ensure compatibility with different data types like lines, images, image stacks
  • Responsiveness: Must adapt to various screen sizes and resolutions.
  • Performance: Efficient rendering

Scale Bar Styling

  • Style Control: color, transparency, line width, line style, orientation, font styling, text alignment, padding, etc. Ideally the options should be similar in scope to matplotlib-scalebar.

Scale Reference

  • Size Control: Ability to specify a reference line size, e.g., "1 um" or "10 uV". If separate scale bars are needed for the x and y axes, they should be independently settable (see the sketch after this list).
  • 2D Scale Bars: Support for separate scale bars for the x and y axes, with the ability to independently configure each axis, including size, unit, precision, and notation.
  • Interaction and Intersection of Scale Bars: Options to define how the x and y scale bars intersect and interact with each other. For example, users might want them to intersect at a right angle at a specific corner of the plot, or have them without intersection.
  • Precision and Notation: The decimal precision and notation should be configurable. This allows users to control how many decimal places are shown and whether to use standard or scientific (e.g. 1 × 10^10) notation, depending on the specific data and audience.
  • Unit Conversion: Support for various units and conversion between them, such as converting between millimeters and inches.
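Recent Bokeh versions (3.4+) ship a ScaleBar annotation covering part of this; a minimal sketch, with the caveat that property names beyond range/unit/location may differ across versions:

import numpy as np
from bokeh.models import Metric, ScaleBar
from bokeh.plotting import figure, show

t = np.linspace(0, 2, 2000)
p = figure(height=300, sizing_mode="stretch_width", tools="xpan,ywheel_zoom,reset")
p.line(t, 40e-6 * np.sin(2 * np.pi * 10 * t), line_color="black")

scale_bar = ScaleBar(
    range=p.y_range,                      # track the y range so the bar rescales on zoom
    unit="V",
    dimensional=Metric(base_unit="V"),    # lets the bar display µV/mV as appropriate
    orientation="vertical",
    location="top_right",
)
p.add_layout(scale_bar)
show(p)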

Interaction with Data

  • Zoom Integration: Scale bar should adjust dynamically based on the zoom level for both x and y axes, maintaining an accurate representation of scale. For example, zooming in on the x-axis might require the x-axis scale bar to adjust, while the y-axis scale bar remains the same.
  • (Low priority) Tooltip Information: Hovering over the scale bar could provide additional information

Position

  • Configurable Position: Scale bar position can be set and changed dynamically. For example, a user could move the scale bar to the top-right corner if it's obstructing important data.
  • Draggable: Option to allow users to drag and reposition the scale bar within the plot, offering flexibility in positioning.

Multiple Subplots and Coordinates

  • Multiple Scale Bars: Ability to place multiple scale bars on the same plot, with individual control over each.
  • Linked Control: Control over which data a particular scale bar is linked to when multiple subplots/subcoordinates are on a canvas. For example, in a plot with multiple types of timeseries, one scale bar may correspond only to the EEG data and another to the MEG data (see attached image).

  • Independent Styling: Each scale bar can have independent styling properties, allowing for different colors or fonts within the same plot.
  • Dynamic Visibility: Visibility can be made dynamic in response to interaction. For example, the scale bar for one group of subplots might disappear when panning beyond their visible range. See video:
Screen.Recording.2023-08-11.at.3.32.13.PM.mov

GOAL: Benchmark workflows

Summary and Links

  • benchmark: Benchmark speed of initial display and interaction (zoom, pan)

Key Benchmarking Metrics:

  1. Latency to initial display (of anything useful)
  2. Latency for interaction updates (primarily pan and zoom)
    • How much zoom/pan should we test?
      • I think this depends on the modality/workflow, and should mimic a reasonable action by the user. For instance, let's say we start by rendering a typical display frame for EEG of 20 seconds (of a total 1 hour recording)... We could test zooming out to half the total duration (let's say 30 minutes), and then test panning from the first to the second half of the dataset (the second 30 minutes).
  3. Memory
  4. CPU

Test Scenarios:

  • Test scenarios are the workflow notebooks (or dev versions of them)

Benchmarking Dimensions/Parameters:

  • Tweaks to the workflow code
    • This includes any code that lives outside the workflow notebooks, but is specifically created for them, such as anything that would go into the hvneuro module.
    • Each benchmarking run would ideally be labeled with a commit hash of the code it tested
  • Backend/approach employed
    • For example, WebGL, Datashader, LTTB, caching
  • HoloViz/Bokeh versions
    • Let's just start with a single (latest) version of each package and then when things are further along, we can look at expanding the test matrix. I'm mostly thinking about the situation where we would want to note when a new Bokeh release happened which may impact benchmarking results.
  • Dataset size
    • For each modality, we should have at least a lower, mid, and upper dataset size tested.
  • Use of CPU and/or GPU for computing
    • This is highly dependent on the approach, but probably impactful on the metrics enough that they should be distinguished.
    • Ideally, we would only rely on the CPU for computing, but it's not a hard requirement at this time.
  • Environment
    • Jupyter Lab vs VS Code

Other thoughts:

  • We want benchmarking set up in such a way that we can trigger it somewhat automatically in the future.
    • Maybe this means incorporating it into the CI?
  • We want to specify some threshold of diminishing returns; for example, at what point is the interaction latency good enough that any further improvements provide little additional value to the user
  • We want to generate a report with these benchmarking results such that we can note certain approaches or improvements made over time and how they impacted the benchmarking scores
    • benchmark diagram
  • If we can't achieve something reasonable for latency to initial display of something useful for the largest datasets, I could imagine that we take an approach where the initial render is slow (and we provide info, apologies, and a loading indicator) but subsequent loading and interaction are very fast.

Software for Benchmarking:

  • Bokeh custom JavaScript callbacks that capture the time before and after user interaction events (see the sketch after this list)
  • Playwright
  • airspeed velocity (asv)
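A minimal sketch of the CustomJS timing idea from the list above, using a toy figure; in practice the callback would be attached to the workflow plots and paired with Bokeh's paint timings:

from bokeh.models import CustomJS
from bokeh.plotting import figure, show

fig = figure(height=300, tools="xpan,xwheel_zoom,reset")
fig.line([0, 1, 2, 3], [1, 3, 2, 4])

timing_js = CustomJS(code="""
    // performance.now() is ms since page load; pair with paint timings to estimate latency
    console.log('x_range changed at', performance.now());
""")
fig.x_range.js_on_change('start', timing_js)   # fires on every pan/zoom that moves the range
fig.x_range.js_on_change('end', timing_js)
show(fig)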

Benchmark comparisons

  • fastplotlib
  • napari

GOAL: Intensity histogram for image stack

Summary and Links

  • intensity-hist:
  • By examining and manipulating the histogram of an image, users can inspect the distribution of pixel intensities and identify patterns or characteristics in different tonal ranges (a sketch follows the list below).

Relevant Workflows

  • video viewer
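A rough sketch of adjoining a per-frame intensity histogram to an image-stack view; synthetic frames stand in for the real video data, and the .hist adjoint is one possible mechanism:

import numpy as np
import xarray as xr
import holoviews as hv; hv.extension('bokeh')

frames = xr.DataArray(np.random.rand(100, 128, 128),
                      dims=("frame", "height", "width"), name="intensity")

def frame_image(frame):
    return hv.Image(frames.isel(frame=int(frame)).values, bounds=(0, 0, 128, 128)).opts(
        cmap="Viridis", frame_width=300, frame_height=300)

dmap = hv.DynamicMap(frame_image, kdims=["frame"]).redim.range(frame=(0, 99))
dmap.hist(num_bins=64)   # adjoin an intensity histogram recomputed for the displayed frame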

Large gap between components of EEG viewer

The gap (vertical distance) between the two components of the EEG viewer (signal browser and overview bar) is too large on my machine:

Screenshot 2023-07-20 at 12 33 54

I started the viewer with panel serve workflow_eeg-viewer.ipynb --show. Here's the output of import holoviews as hv; hv.extension("bokeh"); hv.show_versions():

Python              :  3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 21:38:11) 
[Clang 14.0.6 ]
Operating system    :  macOS-13.4.1-arm64-arm-64bit
holoviews           :  1.16.2

IPython             :  8.14.0
bokeh               :  3.2.0
colorcet            :  3.0.1
cudf                :  -
dask                :  2023.7.0
datashader          :  0.15.1
geoviews            :  -
hvplot              :  0.8.4
ibis                :  -
jupyter_bokeh       :  -
jupyterlab          :  4.0.3
matplotlib          :  3.7.2
networkx            :  3.1
notebook            :  7.0.0
numba               :  0.57.1
numpy               :  1.24.4
pandas              :  2.0.3
panel               :  1.2.0
param               :  1.13.0
pillow              :  -
plotly              :  -
pyarrow             :  -
scipy               :  1.11.1
skimage             :  0.21.0
spatialpandas       :  -
streamz             :  -
xarray              :  2023.7.0

GOAL: create synthetic data generators

Summary and Links

  • datagen (lead: @droumis)
  • Create simple neuro data generators for the main workflows (a minimal sketch follows)
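A minimal sketch of what such a generator might look like for the eeg/ephys viewer workflows (per-channel sinusoids plus noise, returned as an xarray DataArray):

import numpy as np
import xarray as xr

def synthetic_timeseries(n_channels=32, duration_s=10.0, fs=1000.0, seed=0):
    """Per-channel sinusoids plus Gaussian noise, shaped (channel, time)."""
    rng = np.random.default_rng(seed)
    t = np.arange(0, duration_s, 1.0 / fs)
    freqs = rng.uniform(1, 40, n_channels)                 # one oscillation frequency per channel
    data = (np.sin(2 * np.pi * freqs[:, None] * t[None, :])
            + 0.5 * rng.standard_normal((n_channels, t.size)))
    return xr.DataArray(
        data, dims=("channel", "time"),
        coords={"channel": [f"ch{i}" for i in range(n_channels)], "time": t},
        name="amplitude",
    )

da = synthetic_timeseries()
print(da)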

Relevant Workflows

  • All are relevant, but initially focusing on
  • eeg viewer
  • ephys viewer
  • video viewer

Inspiration:

Requirements

Create baseline large image workflow

The current idea is to demonstrate inspecting a large electron microscopy image.

Features include utilizing max_interval + a datashaded minimap + viewport-specific rendering to limit the data being sent to the browser while inspecting a region at full resolution.
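A rough sketch of that combination, with random pixels standing in for an EM tile; rasterize handles viewport-specific re-rendering and RangeToolLink provides the minimap control:

import numpy as np
import holoviews as hv; hv.extension('bokeh')
from holoviews.operation.datashader import rasterize
from holoviews.plotting.links import RangeToolLink

pixels = np.random.rand(2000, 2000)                        # stand-in for a large EM image
bounds = (0, 0, 2000, 2000)

main = rasterize(hv.Image(pixels, bounds=bounds)).opts(
    cmap="gray", frame_width=500, frame_height=500)
minimap = rasterize(hv.Image(pixels, bounds=bounds), width=200, height=200).opts(
    cmap="gray", frame_width=200, frame_height=200, toolbar='disable')

RangeToolLink(minimap, main, axes=["x", "y"])              # box on the minimap drives the main view
main + minimap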

[GOAL] Demo viewing of large multi-chan timeseries data with multi-time-resolution generator and dynamic accessor

Problem:

On their own, our current methods like Datashader and downsampling are insufficient for data that cannot be fully loaded into memory.

Description/Solution/Goals:

This project aims to enable effective processing and visualization of biological datasets that exceed available memory limits. The task is to develop a proof of concept for an xarray-datatree-based multi-resolution generator and dynamic accessor. This involves generating and storing incrementally downsampled versions of a large dataset, and then accessing the appropriate resolution copy based on viewport and screen parameters. We want to leverage existing work and standards as much as possible, aligning with the geo and bio communities.

Potential Methods and Tools to Leverage:

Tasks:

  1. Research xarray-datatree storage conventions in zarr, summarize the options here and then write a notebook that takes the ephys data and uses the downsample1d operation (or directly use tsdownsampler) to generate a hierarchical tree of downsampled versions of the data.
  2. Write a notebook that uses xarray-datatree and a DynamicMap to load the appropriate resolution based on zoom level (a rough sketch follows this list)
  3. Implement a data interface in HoloViews that wraps xarray-datatree and loads the appropriate subset of data automatically given a configurable max data size.
  4. Repeat the relevant steps above for the microscopy data use case and dataset.
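A rough sketch of tasks 1 and 2, with simple stride-based decimation standing in for downsample1d/tsdownsample; datatree's API details and the pixel-budget heuristic are assumptions:

import numpy as np
import xarray as xr
from datatree import DataTree
import holoviews as hv; hv.extension('bokeh')

fs = 25_000
t = np.arange(0, 60, 1 / fs)                               # 60 s of synthetic signal
sig = xr.DataArray(np.sin(2 * np.pi * 3 * t) + 0.2 * np.random.randn(t.size),
                   dims="time", coords={"time": t}, name="amplitude")

# Store power-of-two stride-decimated copies as levels of a DataTree
levels = {f"level_{k}": sig.isel(time=slice(None, None, 2**k)).to_dataset()
          for k in range(6)}
tree = DataTree.from_dict(levels)

def view(x_range):
    lo, hi = x_range if x_range else (float(t[0]), float(t[-1]))
    width_px = 1200                                        # assumed plot width in pixels
    # Pick the coarsest level that keeps roughly a few samples per pixel in the viewport
    for k in reversed(range(6)):
        if (hi - lo) * fs / 2**k <= 4 * width_px or k == 0:
            break
    da = tree[f"level_{k}"].ds["amplitude"].sel(time=slice(lo, hi))
    return hv.Curve((da["time"].values, da.values), "Time", "Amplitude")

hv.DynamicMap(view, streams=[hv.streams.RangeX()]).opts(responsive=True, height=300)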

Use-Cases, Starter Viz Code, and Datasets:

Stacked timeseries:

Summary

  • A stacked timeseries plot is commonly used for synchronized examination of time-aligned, amplitude/unit-diverse timeseries. A useful stacked timeseries plot might look like this:
Code
from scipy.stats import zscore
import h5py

import holoviews as hv; hv.extension('bokeh')
from holoviews.plotting.links import RangeToolLink
from holoviews.operation.datashader import rasterize
from bokeh.models import HoverTool

filename = 'recording_neuropixels_10s_384ch.h5'
f = h5py.File(filename, "r")

n_sample_chans = 40
n_sample_times = 25000 # sampling frequency is 25 kHz
clim_mul = 2

# main plot
hover = HoverTool(tooltips=[
    ("Channel", "@channel"),
    ("Time", "$x s"),
    ("Amplitude", "$y µV")])

time = f['timestamps'][:n_sample_times]
data = f['recordings'][:n_sample_times,:n_sample_chans].T

f.close()

channels = [f'ch{i}' for i in range(n_sample_chans)]

channel_curves = []
for i, channel in enumerate(channels):
    ds = hv.Dataset((time, data[i,:], channel), ["Time", "Amplitude", "channel"])
    curve = hv.Curve(ds, "Time", ["Amplitude", "channel"], label=f'{channel}')
    curve.opts(color="black", line_width=1, subcoordinate_y=True, subcoordinate_scale=3, tools=[hover])
    channel_curves.append(curve)

curves = hv.Overlay(channel_curves, kdims="Channel")

curves = curves.opts(
    xlabel="Time (s)", ylabel="Channel", show_legend=False,
    padding=0, aspect=1.5, responsive=True, shared_axes=False, framewise=False)

# minimap
y_positions = range(len(channels))
yticks = [(i, ich) for i, ich in enumerate(channels)]
z_data = zscore(data, axis=1)

minimap = rasterize(hv.Image((time, y_positions, z_data), ["Time (s)", "Channel"], "Amplitude (uV)"))
minimap = minimap.opts(
    cmap="RdBu_r", colorbar=False, xlabel='', yticks=[yticks[0], yticks[-1]], toolbar='disable',
    height=120, responsive=True, clim=(-z_data.std()*clim_mul, z_data.std()*clim_mul))

RangeToolLink(minimap, curves, axes=["x", "y"],
              boundsx=(.1, .3),
              boundsy=(10, 30))

(curves + minimap).cols(1)
  • The main interaction is zooming and panning through time (x) or channels (y). A primary goal of this multi-res initiative is to make this interaction responsive, regardless of the dataset size.
  • In addition to a main plot of the actual timeseries (dims: time, source/channel), it is beneficial to utilize a minimap/RangeToolLink image to be able to get an impression of the whole dataset and navigate the viewport of the main plot to various regions of interest. For example, see the minimap plot on the bottom in the image above - the image is a rasterized version of the entire dataset (at least the chunk that I chose to work with for this demo) with the same x and y dims as the main plot. This particular minimap image is zscored per channel/source to normalize the amplitude across channels and facilitate pattern detection (although it's not necessary for this demo dataset that has been bandpass filtered).

Data

  • The datasets below are simulated multielectrode electrophysiological (ephys) data, saved to .h5 (a common underlying format for ephys data). They were created with this nb.
    • Larger Simulated Ephys data (5,000,000 time samples (200s), 384 channels) - 15 GB:
      • datasets.holoviz.org/ephys_sim/v1/ephys_sim_neuropixels_200s_384ch.h5
    • Smaller Simulated Ephys data (250,000 time samples (10s), 384 channels) - 3 GB:
      • datasets.holoviz.org/ephys_sim/v1/ephys_sim_neuropixels_10s_384ch.h5

Note: I recommend working through this notebook on accessing ephys HDF5 datasets in xarray via Kerchunk and Zarr, which Ian created. I can imagine a situation in which the approach to multiresolution access just utilizes kerchunk references instead of downsampled data copies; although I'm not sure how that would work with xarray-datatree - maybe it would have to be either kerchunk or xarray-datatree, but not both. Maybe we could consult Martin.

Miniscope Image Stack: UPDATE: solved without needing multi-res handling

Summary

  • A miniscope image stack typically has a modest height and width resolution but a deep time/frame dimension. A useful miniscope image stack viewer might look like this:
Code
import xarray as xr
import panel as pn; pn.extension()
import holoviews as hv; hv.extension('bokeh')
import hvplot.xarray

DATA_ARRAY = '1000frames'

DATA_PATH = f"<miniscope_sim_{DATA_ARRAY}.zarr>"

ldataset = xr.open_dataset(DATA_PATH, engine='zarr', chunks='auto')

data = ldataset[DATA_ARRAY]

# data.hvplot.image(groupby="frame", cmap="Viridis", height=400, width=400, colorbar=False)

FRAMES_PER_SECOND = 30
FRAMES = data.coords["frame"].values

# Create a video player widget
video_player = pn.widgets.Player(
    length=len(data.coords["frame"]),
    interval=1000 // FRAMES_PER_SECOND,  # ms
    value=int(FRAMES.min()),
    max_width=400,
    max_height=90,
    loop_policy="loop",
    sizing_mode="stretch_width",
)

# Create the main plot
main_plot = data.hvplot.image(
    groupby="frame",
    cmap="Viridis",
    frame_height=400,
    frame_width=400,
    colorbar=False,
    widgets={"frame": video_player},
)

# frame indicator lines on side plots
line_opts = dict(color='red', alpha=.6, line_width=3)
dmap_hline = hv.DynamicMap(pn.bind(lambda value: hv.HLine(value), video_player)).opts(**line_opts)
dmap_vline = hv.DynamicMap(pn.bind(lambda value: hv.VLine(value), video_player)).opts(**line_opts)

# height side view
right_plot = data.mean(['width']).hvplot.image(x='frame',
    cmap="Viridis",
    frame_height=400,
    frame_width=200,
    colorbar=False,
    rasterize=True,
    title='_', # TODO: Fix this. See https://github.com/bokeh/bokeh/issues/13225#issuecomment-1611172355
) * dmap_vline

# width side view
bottom_plot = data.mean(['height']).hvplot.image(y='frame',
    cmap="Viridis",
    frame_height=200,
    frame_width=400,
    colorbar=False,
    rasterize=True,
) * dmap_hline

video_player.margin = (20, 20, 20, 70) # center widget over main

sim_app = pn.Column(
    video_player,
    pn.Row(main_plot[0], right_plot),
    bottom_plot)

sim_app
  • The main interaction is scrubbing or playing through the time/frames. A primary goal of this multi-res initiative is to make this scrubbing/playing responsive, regardless of the dataset size.
  • In addition to a main plot of a single time/frame (dims: height, width), it is beneficial to see 2D side-views of the image stack cube where either the width or height dimension is aggregated. For example, see the plot on the right in the image above - the width dimension is aggregated away (a mean projection in the code above), and it shows the progression of height values over times/frames. Another primary goal of this multi-res initiative is to be able to render and display these side plots, regardless of the dataset size.

Data

  • The datasets below are simulated miniscope data, chunked in the time/frame dimension, saved to zarr via xarray (the xarray-augmented zarr format). They were created with this script which runs code from here.
    • Larger Simulated Miniscope data (512 height, 512 width, 10,000 frames) - 24 GB:
      • datasets.holoviz.org/sim_miniscope/v1/miniscope_sim_10000frames.zarr
    • Smaller Simulated Miniscope data (512 height, 512 width, 1,000 frames) - 2.4 GB:
      • datasets.holoviz.org/sim_miniscope/v1/miniscope_sim_1000frames.zarr

Additional Notes and Resources:

[GOAL] channel type grouping

Summary and Links

  • channel-type-grouping Support channel-type grouping with different sampling and amplitude range.

  • channel-group-yticks Switch y-ticks to group values when zoomed out enough that channel-based y-ticks are cluttered. So a zoomed-out view might be something like the following y-ticks (instead of having a tick per row)

(attached image: example of group-level y-ticks)

Relevant Workflows

  • eeg viewer
  • ephys viewer

[GOAL] Benchmarking Workflows

Summary and Links

  • benchmark: Benchmark speed of initial display and interaction

Key Benchmarking Metrics:

  1. Latency to initial display (of anything useful)
  2. Latency for interaction updates (pan, zoom, or scroll/scrub). The type of interaction that we want to prioritize depends on the workflow. For a stacked timeseries viewer, either a zoom out or pan is ideal. For an imagestack viewer, scrolling/scrubbing through the frames is ideal.

Benchmarking Dimensions/Parameters:

  1. Dataset size (e.g. for stacked timeseries, number of channels or samples; for an imagestack, number of frames or frame size)
  2. Underlying software changes, including a specific commit/version of any relevant package. (e.g. Param 1.X vs 2.X, or before and after a bug fix to HoloViews). This is the closest thing to what ASV usually would test for over time, but now for a whole set of relevant packages and for specific cherry-picked comparisons. This would require a log/env of commit/versions used per benchmarking run.
  3. Workflow approach employed (e.g. stacked timeseries using HoloViews downsample options: LTTB vs MinMaxLTTB vs viewport. A second example is utilizing a numpy array vs pandas df vs xarray da as the data source for a holoviews element. A third example is using hvplot vs holoviews to produce a similar output. A fourth example is using Bokeh Canvas vs WebGL rendering). This would require a manual description about the benchmarking run.

Results handling

  • The results of the benchmarking need to be reproducible - storing a copy of the benchmarks, info about the environment used, manual notes about the approach, info about the machine that it was run on.
  • The timing results need to be in a format amenable to comparison (e.g., show the latency to display as a function of the number of samples for the stacked timeseries workflow when employing no downsampling vs MinMaxLTTB downsampling).

Out of scope, future stuff:

  • Incorporate into the CI
  • Put benchmarking stuff in a separate repo (it's totally fine to do that now if you want, but not expected)

Tools for Benchmarking:

  • Bokeh 'FigureView actual paint' messages that capture figure_id, state (start, end), and render count (see the sketch after this list)
  • Playwright
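A minimal sketch of capturing those console messages with Playwright's sync API; the served URL and the exact message text to filter on are assumptions:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    events = []
    page.on("console", lambda msg: events.append(msg.text))   # collect all console output
    page.goto("http://localhost:5006/workflow_eeg-viewer")     # assumed `panel serve` URL
    page.wait_for_timeout(10_000)                              # crude wait for the app to render
    paint_msgs = [m for m in events if "actual paint" in m]    # filter for the paint messages
    print(paint_msgs)
    browser.close()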

Etc

Previous version of Benchmarking Goal

GOAL: Minimap/RangeTool

Summary and Links

  • minimap (lead: @droumis and @hoxbro )
  • demonstrate a zscored image minimap for timeseries data
  • potentially build this into the large data handling approach

Relevant Workflows

  • eeg viewer
  • ephys viewer

Inspiration:

  • MNE raw.plot

Requirements

  • RangeTool-like functionality to control the x/y range display of a timeseries plot
  • Downscaled/datashaded z-scored image of the timeseries as the default minimap, since it reflects all of a potentially massive dataset
  • extension to toggle between different image representations, like [empty, channels (HSpans for different channel group types), zscore]
  • Work within a Holoviews or Panel layout (there's a workaround)
  • Work with VSpans (coordinated annotations with the main plot)
  • 'reset' bokeh toolbar button resets the view to the potentially restricted initial xy range of the rangetool, not the entire plot (HoloViews #5848)
  • minimap should be responsive in the X direction to the size of the window, but the Y-size of the minimap should stay constant.
  • ensure that all the requirements above work in both standalone-server and notebook contexts
