
neuroglia's Introduction

neuroglia: more than just brain glue

Neuroglia is a Python machine learning library for neurophysiology data. It features scikit-learn compatible transformers for extracting features from extracellular electrophysiology & optical physiology data for machine learning pipelines.


Installation

pip install git+https://github.com/AllenInstitute/neuroglia.git

Level of Support

We plan to update this tool occasionally, with no fixed schedule. Community involvement is encouraged through both issues and pull requests.

License

BSD-3-Clause

Authors

Development Lead

  • Justin Kiggins

Contributors

  • Nicholas Cain
  • Michael Oliver
  • Sahar Manavi
  • Johannes Friedrich
  • Christopher Mochizuki


neuroglia's Issues

refactor spike inference

currently, spike inference is implemented with a transformer called OASISInferer, which takes the OASIS arguments as parameters.

this transformer should be replaced with an algorithm-agnostic transformer that accepts more intuitive arguments.

e.g.:

inferer = ng.calcium.EventInferer(
    penalty='l0',
    method='oasis',
)

deployment plan

i think we need a deployment plan for pushing to pypi:

  • checkout master (will use it as dev since we've already started doing that?)
  • update CHANGELOG.rst
  • bumpversion
  • (wait for ci to pass)
  • push tag
  • merge to production
  • (wait for ci to pass and publish to pypi)

implement epoch reducers

implement EpochSpikeReducer and EpochTraceReducer transformers, which perform "reduce" operations by applying a user-defined function to all of the data within each epoch's time range.

e.g.

epoch_reducer = ng.epoch.EpochTraceReducer(
    traces=TRACES,
    agg_func=np.mean,
)

mean_responses = epoch_reducer.fit_transform(EPOCHS)

open question: what should the expected format of EPOCHS be?

option 1: a 'time' column and a 'duration' column

this maintains consistency with the EVENTS dataframes expected by other transformers in this package.

option 2: a 'start' column and an 'end' column

this is likely closer to the native representation of this kind of data

option 3: both
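
For concreteness, the two candidate EPOCHS layouts might look like this (column names as proposed above; the values are made up for illustration):

import pandas as pd

# option 1: matches the EVENTS convention used by other transformers
epochs_opt1 = pd.DataFrame({
    'time': [0.0, 2.5, 5.0],       # epoch start times (s)
    'duration': [1.0, 1.0, 2.0],   # epoch lengths (s)
})

# option 2: closer to the native representation of epoch data
epochs_opt2 = pd.DataFrame({
    'start': [0.0, 2.5, 5.0],
    'end': [1.0, 3.5, 7.0],
})

# converting between the two is cheap, so option 3 (accept both) mostly
# amounts to normalizing to whichever form the reducer uses internally
epochs_opt1['end'] = epochs_opt1['time'] + epochs_opt1['duration']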

should Smoother accept an explicit list of neurons?

currently, Smoother automatically discovers the IDs of the neurons through a groupby operation on the X dataframe of spike times that it accepts.

these then become the columns of the output dataframe

however, if a given neuron is unobserved in the data passed into X, it will not get a column.

this could be supported by accepting a "neurons" kwarg when initializing the object & replacing the groupby operation with an explicit loop over the `neurons` values, building a mask for each one.

this approach would also let the user use the kwarg to ignore any neurons in X that they don't want to consider (that is, which neurons get smoothed could become a hyperparameter to optimize)
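
A minimal sketch of the proposed behavior (function and column names are assumptions for illustration, not the current Smoother API; a causal exponential kernel stands in for whichever kernel is configured):

import numpy as np
import pandas as pd

def smooth_spikes(X, sample_times, neurons=None, tau=0.05):
    # X: dataframe of spike times with 'time' and 'neuron' columns
    # neurons: optional explicit list of neuron IDs; unobserved neurons get an
    #          all-zero column, and neurons not in the list are ignored
    if neurons is None:
        neurons = X['neuron'].unique()  # current groupby-style discovery

    sample_times = np.asarray(sample_times)
    out = {}
    for neuron in neurons:
        # replace the groupby with an explicit mask per requested neuron
        spike_times = X.loc[X['neuron'] == neuron, 'time'].values
        lags = sample_times[:, None] - spike_times[None, :]
        kernel = np.where(lags >= 0, np.exp(-lags / tau), 0.0)
        out[neuron] = kernel.sum(axis=1)

    return pd.DataFrame(out, index=sample_times)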

EventTraceTensorizer fails if `bins` is an integer

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-90797e9adccb> in <module>()
      2     traces,
      3     bins=30,
----> 4     range=(0,1)
      5 )

c:\users\justink\code\neuroglia\neuroglia\event.py in __init__(self, traces, bins, range)
     13         super(EventTraceTensorizer, self).__init__()
     14         self.traces = traces
---> 15         self.bins = bins[:-1]
     16         self.range = range
     17 

TypeError: 'int' object is not subscriptable
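
A possible fix, assuming the intent is to mirror np.histogram's bins/range semantics (helper name is hypothetical):

import numpy as np

def _resolve_bin_edges(bins, range):
    # accept either an integer bin count (with `range` giving the window)
    # or an explicit array of bin edges
    if np.isscalar(bins):
        edges = np.linspace(range[0], range[1], int(bins) + 1)
    else:
        edges = np.asarray(bins)
    return edges  # __init__ can then keep edges[:-1] as the left bin edges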

add example: spike inference on spikefinder datasets

add a script to examples/ that does the following:

  1. loads data from one spikefinder dataset (see http://spikefinder.codeneuro.org) & fits it
  2. tests the prediction against the spikefinder metrics (import spikefinder; spikefinder.score(y,y_pred) or something like that should work. see https://github.com/codeneuro/spikefinder-python)
  3. compactly repeats 1 & 2 on all spikefinder datasets, generating a colorized table of results as in http://spikefinder.codeneuro.org

If the results are any good, consider submitting them to spikefinder, if it is still accepting new submissions :D

[demo] reliability

implement Dan Denman's "reliability" analysis in a neuroglia pipeline

Bin at 0.5 ms, smooth with a 5 ms boxcar, extract trials, then for each trial (e.g. stimulus presentation) calculate reliability. Need to double-check with Dan what the reliability metric was.
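
A rough numpy-only sketch of that pipeline, using mean pairwise trial correlation as a placeholder until the actual reliability metric is confirmed:

import numpy as np

def reliability(spike_times, trial_starts, trial_duration,
                bin_size=0.0005, boxcar=0.005):
    spike_times = np.asarray(spike_times)
    edges = np.arange(0, trial_duration + bin_size, bin_size)
    width = int(boxcar / bin_size)
    kernel = np.ones(width) / width

    trials = []
    for start in trial_starts:
        counts, _ = np.histogram(spike_times - start, bins=edges)  # 0.5 ms bins
        trials.append(np.convolve(counts, kernel, mode='same'))    # 5 ms boxcar
    trials = np.asarray(trials)

    # placeholder metric: mean pairwise correlation across trials
    corr = np.corrcoef(trials)
    return np.nanmean(corr[np.triu_indices_from(corr, k=1)])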

Bug: tox.ini

  • neuroglia version: b20ee5c
  • Python version: N/A
  • Operating System: N/A

Description

Looks like a typo in tox.ini:

deps = -rrequirements.txt

rrequirements.txt --> requirements.txt

@neuromusic maybe triage this to a new user as a learning example?

PeriEventSpikeTensorizer should accept multiple data structures for spikes

pandas.DataFrame

  • rows: observed spikes
  • columns: time, source, *spike_features

np.ndarray

  • can also accept each column as an array
  • need to pass additional args to indicate time and neuron column indices

dict

  • keys: cluster ids
  • values: timestamps

or should the dict representation be a separate transformation step, as in nwb.SpikeTablizer?
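
A sketch of how the three representations could be normalized to a single dataframe layout before tensorizing (the 'time'/'neuron' column names are assumptions; the dict branch is essentially what nwb.SpikeTablizer already does):

import numpy as np
import pandas as pd

def _coerce_spikes(spikes, time_col=0, neuron_col=1):
    if isinstance(spikes, pd.DataFrame):
        return spikes
    if isinstance(spikes, np.ndarray):
        # additional args indicate which columns hold times and neuron IDs
        return pd.DataFrame({
            'time': spikes[:, time_col],
            'neuron': spikes[:, neuron_col],
        })
    if isinstance(spikes, dict):
        # keys are cluster IDs, values are arrays of timestamps
        frames = [
            pd.DataFrame({'time': np.asarray(times), 'neuron': cluster})
            for cluster, times in spikes.items()
        ]
        return pd.concat(frames, ignore_index=True)
    raise TypeError('unsupported spike representation: %r' % type(spikes))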

fix readthedocs build

readthedocs is currently failing due to trouble building the documentation (in particular, sphinx gallery examples)

implement TraceTensorizer

the TraceTensorizer should be initialized with a dataframe of events and a time axis, relative to event times, over which the traces will be sampled.

a key challenge is that if event times are "in between" times on the trace time axis, then a decision needs to be made:

  • do we align to the nearest time bin?
  • do we resample the trace?
  • if we resample, what method should we use?

my first thoughts:

  • if the trace is integers, then it is likely spike counts and we should NOT interpolate. grab nearest?
  • if the trace is continuous, then we should interpolate. cubic spline is an obvious default. might want to look into others. kriging? https://en.m.wikipedia.org/wiki/Kriging
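
A minimal sketch of that decision as a standalone helper (not the real TraceTensorizer):

import numpy as np
from scipy.interpolate import CubicSpline

def sample_trace(trace_times, trace_values, query_times):
    trace_times = np.asarray(trace_times)
    trace_values = np.asarray(trace_values)
    query_times = np.asarray(query_times)

    if np.issubdtype(trace_values.dtype, np.integer):
        # likely spike counts: do NOT interpolate, align to the nearest bin
        nearest = np.abs(trace_times[:, None] - query_times[None, :]).argmin(axis=0)
        return trace_values[nearest]

    # continuous trace: resample, with a cubic spline as the obvious default
    return CubicSpline(trace_times, trace_values)(query_times)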

ResponseExtractor needs numpy

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-16-e2b82dfff346> in <module>()
      1 from neuroglia.tensor import ResponseExtractor
----> 2 extractor = ResponseExtractor()
      3 X = extractor.fit_transform(X)

c:\users\justink\code\neuroglia\neuroglia\tensor.py in __init__(self, method, dim)
      7 
      8         if method == 'mean':
----> 9             self.method = np.mean
     10         elif method == 'max':
     11             self.method = np.max

NameError: name 'np' is not defined
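
The fix is presumably just the missing import at the top of neuroglia/tensor.py:

import numpy as np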

documentation

  • Introduction
  • Installation
  • Examples
    • PSTH
    • Allen Brain Observatory - Natural Images Decoding
    • Spike Inference from Calcium
    • Canonical Polyadic Tensor Decomposition
  • Tutorial
    • Traces
    • Spikes
    • Events
    • Tensors
  • API reference

allow for neuron-specific smoothing kernels

Currently, Smoother takes a single tau parameter, which sets the time constant of whichever kernel is selected.

In some instances, it may be preferable to give different neurons different kernel parameters and/or kernels.
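
One possible interface (a sketch only; the parameter handling is an assumption): let tau and/or the kernel be either a single value applied to every neuron, or a dict keyed by neuron ID.

def _resolve_tau(tau, neurons, default=0.05):
    # a scalar applies to every neuron; a dict gives per-neuron overrides
    if isinstance(tau, dict):
        return {neuron: tau.get(neuron, default) for neuron in neurons}
    return {neuron: tau for neuron in neurons}

# hypothetical usage: Smoother(tau={'neuron_a': 0.01, 'neuron_b': 0.1})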

make test fails

  • neuroglia version: b20ee5c
  • Python version: 2.7
  • Operating System: Ubuntu 16.04.2

Description

Looks like the test rule needs to be fixed.

What I Did

Tried to run make test, and received:

make test
test.sh
make: test.sh: Command not found
Makefile:2: recipe for target 'test' failed
make: *** [test] Error 127

MNT: Stop using ci-helpers in appveyor.yml

To whom it may concern,

If you are using https://github.com/astropy/ci-helpers in your appveyor.yml, please know that the Astropy project has dropped active development/support for Appveyor CI. If it still works, good for you, because we did not remove the relevant files (yet). But if it ever stops working, we have no plans to fix anything for Appveyor CI. Please consider using the native Windows support of other CI providers, e.g., Travis CI (see https://docs.travis-ci.com/user/reference/windows/). We apologize for any inconvenience caused.

If this issue is opened in error or irrelevant to you, feel free to close. Thank you.

xref astropy/ci-helpers#464

add "datasets" module

this module should implement a similar api as sklearn.datasets and nilearn.datasets

  • use sklearn.datasets.base (including sklearn datasets cache folder) to store downloaded data
  • use logic as in crcnsget to download data (optionally using environment variables for passwords)
  • functions should return data ready to analyze with neuroglia (e.g. event dataframes, spike dataframes, or trace dataframes/xarrays)

first candidate datasets, depending on needs for examples:

  • cai-1, for the calcium inference example

  • Allen Institute Brain Observatory experiment, for the decoding example
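
A rough sketch of what such a fetcher could look like (the function name, URL, and file layout are placeholders, not an existing endpoint):

import os
from urllib.request import urlretrieve

import pandas as pd
from sklearn.datasets import get_data_home
from sklearn.utils import Bunch

def fetch_example_events(data_home=None, download_if_missing=True):
    data_home = get_data_home(data_home)  # reuse the sklearn cache folder
    path = os.path.join(data_home, 'neuroglia', 'example_events.csv')
    if not os.path.exists(path):
        if not download_if_missing:
            raise IOError('dataset not found and download_if_missing is False')
        os.makedirs(os.path.dirname(path), exist_ok=True)
        urlretrieve('https://example.org/example_events.csv', path)  # placeholder URL
    events = pd.read_csv(path)  # ready-to-analyze event dataframe
    return Bunch(events=events, DESCR='placeholder example dataset')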

improve test coverage

Description

Test coverage is low (~67%), so getting it above 90% would be nice and not too much effort.

@j-friedrich I'm not very familiar with your contributions; could you potentially review the test code I submit?

implement more interpolation methods for PeriEventTraceSampler

with keyword arguments

  • cubic spline (default)
  • sinc
  • kriging

further, the user should be able to pass any of the univariate functions from scipy.interpolate that take x & y as arguments and return a function that can be applied to new x values to return interpolated y values
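
A sketch of how such a kwarg could dispatch on either a named method or a user-supplied scipy.interpolate-style factory (names here are illustrative, not the existing PeriEventTraceSampler signature):

from scipy import interpolate

NAMED_INTERPOLATORS = {
    'cubic': interpolate.CubicSpline,  # default
    'linear': lambda x, y: interpolate.interp1d(x, y, kind='linear'),
    # 'sinc' and 'kriging' would need dedicated implementations / extra deps
}

def resample(trace_times, trace_values, new_times, interpolator='cubic'):
    if callable(interpolator):
        # any factory taking (x, y) and returning a callable on new x values
        factory = interpolator
    else:
        factory = NAMED_INTERPOLATORS[interpolator]
    return factory(trace_times, trace_values)(new_times)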

identify open datasets for demos

look at crcns.org for spiking data (since neuropixels is unreleased)

and brain observatory is an obvious candidate for calcium traces

Feature request: source extraction for calcium images

Currently, neuroglia's calcium module works with already-extracted fluorescence traces. It would be useful to integrate the ability to extract fluorescence traces from raw calcium imaging movies for downstream processing.
This could look something like:

from dask.array.image import imread
from neuroglia.calcium import SourceExtraction

image = imread('image.tif')
se = SourceExtraction(method='some_method')  # plus algorithm-specific args/kwargs

fluorescence_traces = se.transform(image)

Some libraries already exist for this (SIMA, CaImAn, Thunder), but an integrated solution with a consistent API would allow for more efficient processing. Which algorithms to use and how to implement or wrap them are up for discussion. Dask is used in the example above because it would support both in-memory processing of small images and out-of-memory processing of large images, and because it integrates naturally with xarray for downstream analysis.

merge junk in API reference

  • neuroglia version: 0.2.9
  • Python version: 3.6
  • Operating System: ubuntu 17.10

Description

The API reference docs have a bit of merge junk:

 40 <<<<<<< HEAD
 41 
 42 =======
 43 
 44 >>>>>>> 14a9cab... :memo: typo in API docs

What I Did

cd docs
python -m sphinx . _build
