nel-lab / mesmerize-core
High level pandas-based API for batch analysis of Calcium Imaging data using CaImAn
License: Other
When removing a batch item, there should be an option (on by default, with an option to opt out) to delete all files with the same uuid.
Should convert the input tiff to a memmap.
Component evaluation should be done here within the backend, not in mesmerize-napari. Write a cnmf extension that wraps cnmf_obj.estimates.filter_components.
Should it store this as a new batch item in the dataframe, or update an existing item and just update the eval_kwargs of the params? @ArjunPutcha thoughts?
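A minimal sketch of what such an extension method could look like, assuming only that the loaded cnmf object exposes estimates.filter_components; the class and method names below are hypothetical, not the final API:

```python
class CNMFEvalSketch:
    """Hypothetical extension wrapping caiman's component evaluation."""

    def __init__(self, cnmf_obj):
        # cnmf_obj: any loaded CNMF results object exposing
        # estimates.filter_components(**kwargs)
        self.cnmf_obj = cnmf_obj

    def run_eval(self, **eval_kwargs):
        # delegate evaluation to caiman within the backend, so the
        # frontend (mesmerize-napari) never touches the estimates directly
        self.cnmf_obj.estimates.filter_components(**eval_kwargs)
        return self.cnmf_obj.estimates
```

Whether run_eval should also persist the modified estimates back to the hdf5 file depends on the new-item-vs-update question above.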
While trying to update the environment via mamba:
mamba env update -n mesmerize-core --file environment.yml
ERROR:
Encountered problems while solving:
Checking the installed python version with python --version gives 3.10.4. I then tried conda install python=3.10.0, but rerunning the env update yielded the same error. Next, conda install python=3.9 made the env update work.
Should I just continue with 3.9, or use something else?
args can be specific ranges of indices to use for F0; everything in between is dfof w.r.t. these ranges.
list of (fo_start, fo_end, (F_start, F_end))
get_chunked_dfof(temporal: np.ndarray, f0_ranges: List[np.ndarray], f_ranges: List[np.ndarray]) -> np.ndarray
temporal: 2d array of traces
f0_ranges: vstack of [f0_start, f0_end] for each temporal trace; each trace has 1 array, a list of arrays for all traces
f_ranges: vstack of [f_start, f_end] for each temporal trace; each trace has 1 array, a list of arrays for all traces
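A rough sketch of how this signature could be implemented, assuming each per-trace range array holds half-open [start, end) frame indices and frames outside every F range are left as NaN:

```python
import numpy as np
from typing import List

def get_chunked_dfof(
    temporal: np.ndarray,
    f0_ranges: List[np.ndarray],
    f_ranges: List[np.ndarray],
) -> np.ndarray:
    # temporal: 2d array of traces, shape (n_traces, n_frames)
    # f0_ranges[i] / f_ranges[i]: (n_chunks, 2) arrays of [start, end)
    # frame indices for trace i
    out = np.full(temporal.shape, np.nan)
    for i, trace in enumerate(temporal):
        for (f0_start, f0_end), (f_start, f_end) in zip(f0_ranges[i], f_ranges[i]):
            # baseline for this chunk, then dF/F0 over the paired F range
            f0 = trace[f0_start:f0_end].mean()
            out[i, f_start:f_end] = (trace[f_start:f_end] - f0) / f0
    return out
```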
label with experimental warning
datajoint within mesmerize-core as pandas extensions, or datajoint should make calls to mesmerize-core pandas extensions within their table make() methods. https://github.com/datajoint/element-calcium-imaging
Remove use_open_file_dialog, use_save_file_dialog, use_open_dir_dialog, and present_exceptions. All Qt code should be removed from mesmerize-core except for QProcess (maybe remove that too?).
@ArjunPutcha can you please confirm that none of these decorator functions are used in mesmerize-napari?
pandas.Series instances use the name attribute to store the index of the Series in the parent DataFrame.
@clewis7 posting this because it just popped into my head, don't worry about it until you're back
The return_copy kwarg is irrelevant for determining a cache hit/miss; therefore it should be dropped from the comparison.
something like:
raise UnsuccessfulItem(df.iloc[0].outputs["traceback"])
Running Windows 11 x64, running caiman gives a couple of errors:
I can confirm the file is there: os.path.isfile('C:/Users/gjb326/caiman_data/mesmerize-core-batch/f1f56d3d-53b9-4e80-a92c-a34a635b28da.runfile.ps1') returns True, but only when the '.ps1' is appended.
But os.path.isfile(r'C:\Users\..\..a.runfile.ps1') and os.path.isfile('C:\\Users\\..\\..a.runfile.ps1') return False.
---------------------------------Begin error------------------------------------------
OSError Traceback (most recent call last)
File c:\users\gjb326\mesmerize-core\mesmerize_core\caiman_extensions\common.py:248, in CaimanSeriesExtensions.run(self, backend, callbacks_finished, callback_std_out)
247 try:
--> 248 self.process = getattr(self, f"run{backend}")(
249 runfile, callbacks_finished, callback_std_out
250 )
251 except:
File c:\users\gjb326\mesmerize-core\mesmerize_core\caiman_extensions\common.py:184, in CaimanSeriesExtensions._run_subprocess(self, runfile_path, callbacks_finished, callback_std_out)
182 parent_path = self._series.paths.resolve(self._series.input_movie_path).parent
--> 184 self.process = Popen(runfile_path, cwd=parent_path)
185 return self.process
File ~\Anaconda3\envs\mesmerize-core\lib\subprocess.py:951, in Popen.init(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask)
948 self.stderr = io.TextIOWrapper(self.stderr,
949 encoding=encoding, errors=errors)
--> 951 self._execute_child(args, executable, preexec_fn, close_fds,
952 pass_fds, cwd, env,
953 startupinfo, creationflags, shell,
954 p2cread, p2cwrite,
955 c2pread, c2pwrite,
956 errread, errwrite,
957 restore_signals,
958 gid, gids, uid, umask,
959 start_new_session)
960 except:
961 # Cleanup if the child failed starting.
File ~\Anaconda3\envs\mesmerize-core\lib\subprocess.py:1420, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_gid, unused_gids, unused_uid, unused_umask, unused_start_new_session)
1419 try:
-> 1420 hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
1421 # no special security
1422 None, None,
1423 int(not close_fds),
1424 creationflags,
1425 env,
1426 cwd,
1427 startupinfo)
1428 finally:
1429 # Child is launched. Close the parent's copy of those pipe
1430 # handles that only the child should have open. You need
(...)
1433 # pipe will not close when the child process exits and the
1434 # ReadFile will hang.
OSError: [WinError 193] %1 is not a valid Win32 application
During handling of the above exception, another exception occurred:
FileNotFoundError Traceback (most recent call last)
Input In [5], in <cell line: 3>()
1 # run the first "batch item"
2 # this will run in a subprocess by default
----> 3 process = df.iloc[0].caiman.run()
4 process.wait()
File c:\users\gjb326\mesmerize-core\mesmerize_core\caiman_extensions\common.py:252, in CaimanSeriesExtensions.run(self, backend, callbacks_finished, callback_std_out)
248 self.process = getattr(self, f"run{backend}")(
249 runfile, callbacks_finished, callback_std_out
250 )
251 except:
--> 252 with open(runfile_path, "r") as f:
253 raise ValueError(f.read())
255 return self.process
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\gjb326\caiman_data\mesmerize-core-batch\f1f56d3d-53b9-4e80-a92c-a34a635b28da.runfile'
This is extremely bizarre and has only started occurring on Windows recently; nothing has changed in the cache implementation since this test started failing.
https://github.com/nel-lab/mesmerize-core/runs/8042869907?check_suite_focus=true#step:5:2145
I might just disable the cache on windows and fix it in a later release
All data that relates to an item should be in a dir named using the uuid for that item. - Done in #51
add_item: params could be chunked by step. - Done in #50; algo params get put in the "main" key, allowing room to add more keys for other things like the Ain matrix, etc.
caiman.run() needs to take only backend as an arg; the rest are optional kwargs. _run_subprocess only takes runfile_path as an arg. Remove the callbacks args from everything; they should be passed as kwargs only within the napari stuff for QProcess and therefore removed from mesmerize-core.
add comments as default column
cnmf: ixs_frames for getting a single frame @clewis7
general: It should be possible to subclass CaimanSeriesExtensions in mesmerize-napari and append the _run_qprocess() method there; this way mesmerize-core gets rid of Qt completely!
callbacks_finished should be an optional argument in common.py caiman.run() extension
the output paths for cnmf max, mean, std projection are stored as absolute paths, not relative (like for the hdf5 and corr img paths)
If only a single frame index is requested for get_reconstructed_movie() or get_residuals(), the returned array should be of shape [x_pixels, y_pixels] and not [1, x_pixels, y_pixels].
Something like this at the end should do it:
if np.diff(ixs_frames).item() == 1:
    return residuals[0]
return residuals
@clewis7 wait until your current stuff is done and merged into master; this can be done afterwards. See mesmerize-viz for ready-to-use widgets.
There should be an extension for getting the dfof vals. It can take kwargs to pass to the caiman dfof function. If dfof has already been calculated with the given kwargs, it just returns those (perhaps just using the cache). If new kwargs are provided, it will use the caiman function to calculate them and then return the dfof vals.
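A hedged sketch of the kwargs-keyed reuse described above; the store, the function name, and the trivial baseline computation are all stand-ins (the real extension would call the caiman dfof function and probably use the existing cache):

```python
import numpy as np

# hypothetical in-memory store keyed by (item uuid, kwargs)
_dfof_store = {}

def get_dfof(item_uuid: str, temporal: np.ndarray, **dfof_kwargs) -> np.ndarray:
    # kwargs are sorted so the same arguments always produce the same key
    key = (item_uuid, tuple(sorted(dfof_kwargs.items())))
    if key in _dfof_store:
        # already calculated with these kwargs: return the stored values
        return _dfof_store[key]
    # stand-in computation; a real extension would call caiman here
    f0 = temporal.min(axis=1, keepdims=True)
    _dfof_store[key] = (temporal - f0) / f0
    return _dfof_store[key]
```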
Annotate CaimanDataFrameExtension.process with Any and a comment annotation stating it is one of QtCore.QProcess or subprocess.Popen.
QApplication: requirements.txt and environment.yml
Not all keys within kwargs are checked:
https://github.com/nel-lab/mesmerize-core/blob/master/mesmerize_core/caiman_extensions/cache.py#L125
This should instead call _check_args_equality():
https://github.com/nel-lab/mesmerize-core/blob/master/mesmerize_core/caiman_extensions/cache.py#L23
@_component_indices_parser
@cache.invalidate
run_eval()
get_good_components() before and after eval
get_bad_components() before and after eval
get_detrend_dfof() with diff args, such as dfof and just detrend
get_chunked_dfof()
get_rcm() with different temporal_components arg
get_rcm(), get_rcb() and get_residuals() w.r.t. the new #140
Add these to an existing test function that produces a lot of rows:
get_children()
get_parent()
save_to_disk() with safety checks
remove_item()
Makes it easier for downstream random-access handling of large movies
change behavior of the component_indices argument for CNMF extensions:
None: uses cnmf.estimates.idx_components, i.e. good components
"good": same as None, uses cnmf.estimates.idx_components
"bad": uses cnmf.estimates.idx_components_bad
"all": uses np.arange(cnmf.estimates.A.shape[1])
Can you increase the difference that's checked here? The tests sometimes fail because the GitHub CI pipeline computers aren't fast enough. Maybe something like 0.05? https://github.com/nel-lab/mesmerize-core/runs/7752247571?check_suite_focus=true#step:5:1652
Ain matrix, binary or sparse, masks to seed components
get_input_movie(), see #52
list of (fo_start, fo_end, (F_start, F_end))
class so that a downsampled average movie can be utilized like other arrays by making a class implementing __getitem__. Maybe subclass np.ndarray?
roughly:
class DSAvgMovie:
    def __init__(self, mcorr_memmap: np.ndarray, window_size: int):
        self.mcorr_memmap = mcorr_memmap
        self.window_size = window_size

    def __getitem__(self, ix: int):
        w = self.window_size
        # clamp the window start so small ix doesn't wrap around to the end
        return np.nanmean(self.mcorr_memmap[max(ix - w, 0):ix + w], axis=0)
OS - Windows 11
Hey guys, while installing mesmerize-core for development I got an error while updating the env with the environment file:
(mesmerize-core) C:\Users\gjb326\mesmerize-core>mamba env update -n mesmerize-core --file environment.yml
Traceback (most recent call last):
  File "C:\Users\gjb326\Anaconda3\Scripts\mamba-script.py", line 10, in <module>
    sys.exit(main())
  File "C:\Users\gjb326\Anaconda3\lib\site-packages\mamba\mamba.py", line 848, in main
    from conda.common.compat import ensure_text_type, init_std_stream_encoding
ImportError: cannot import name 'init_std_stream_encoding' from 'conda.common.compat' (C:\Users\gjb326\Anaconda3\lib\site-packages\conda\common\compat.py)
args:
idx_components -> component_indices
ixs_frames -> frame_indices
methods:
get_spatial_masks() -> get_masks()
get_spatial_contours() -> get_contours()
get_temporal_components() -> get_temporal()
get_reconstructed_movie() -> get_rcm()
get_reconstructed_background() -> get_rcb()
get_correlation_image() -> get_corr_image()
Stop using the kushalkolar branch for saving the mcorr memmaps with a specific filename; use this once the PR is merged into the caiman master branch.
Otherwise old outputs remain in RAM and new outputs can't be reloaded.
get_good_components() - returns indices of good components
get_bad_components() - returns indices of bad components
evaluate_components(<eval kwargs>) - performs eval, modifies df.iloc[i].params["eval"] in place, and modifies the hdf5 file on disk. Replaces the data in the hdf5 file. @clewis7 this would require invalidation of all cache entries with the uuid of this batch item.
make an example script where a user can add items to a batch dataframe from existing caiman-processed data.
algo: specified by the user
input_movie_path: the path to the movie used by the algo, if not a memmap convert it to an appropriate memmap
params: user either passes a dict manually or use the hdf5 file if algo is cnmf
output: dict containing paths to the output files; if the user does not provide the cn_image, projections, etc., calculate them
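A hedged sketch of constructing such a row; the exact column schema and output-dict keys below are assumptions made for illustration, not the final format:

```python
import uuid
import pandas as pd

def make_row_from_existing(algo: str, input_movie_path: str, params: dict, outputs: dict) -> dict:
    # build one batch-dataframe row from existing caiman-processed data;
    # a fresh uuid ties the row to its files on disk
    return {
        "algo": algo,
        "input_movie_path": input_movie_path,
        "params": params,
        "outputs": outputs,
        "uuid": str(uuid.uuid4()),
    }

df = pd.DataFrame([make_row_from_existing(
    "cnmf", "movie.tif", {"main": {"K": 10}}, {"cnmf-hdf5-path": "out.hdf5"})])
```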
Currently the CNMF extension get_reconstructed_movie() doesn't take any ixs_components arg. This should be implemented so that the reconstructed movie is, for example, made only using the good components.
Because:
CAIMAN_TEMP to be implemented for mcorr memmaps
h5py version issues; wait for a newer release of caiman with the latest h5py and the CAIMAN_TEMP
See cnmf.py, CNMFExtensions._get_spatial_contours
Passing np.ndarray for ixs_components yields the following error: "TypeError: unhashable type: 'numpy.ndarray'"
Allow passing custom temporal traces to get_reconstructed_movie() as a kwarg to use instead of estimates.C. Useful for making the reconstructed movie using dF/F0, detrended traces, z-scored traces, etc.
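For reference, the reconstruction is the spatial footprints multiplied by the temporal components; a minimal numpy sketch of swapping in a custom temporal array (function name and argument layout are assumptions):

```python
import numpy as np

def reconstruct(A: np.ndarray, temporal: np.ndarray, dims: tuple) -> np.ndarray:
    # A: (n_pixels, n_components) spatial footprints
    # temporal: (n_components, n_frames), e.g. estimates.C, or a custom
    # array such as dF/F0 or z-scored traces
    movie = (A @ temporal).T  # (n_frames, n_pixels)
    return movie.reshape((temporal.shape[1], *dims))
```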
cnmf extensions should not rely on any arguments that require accessing the hdf5 output file, since this defeats the whole purpose. For example, get_spatial_contour_coors() should not require idx_components; it should be a kwarg with a default value of None. If None, it uses cnmf.estimates.idx_components after loading the output within the function itself. If the user provides idx_components, it uses that.
ideas:
pandas API can be used to read from a SQL db
df = pd.read_sql("SELECT * FROM my_table", connector)
df_cached = df.query()
df.commit() # write back to db
Daniel suggested looking into MongoDB since large files will need to be lazy-loaded
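The read/write cycle above can be sketched with sqlite3 as a stand-in backend; note that df.commit() is not an existing pandas API, so to_sql plays that role here:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for the eventual db backend
df = pd.DataFrame({"algo": ["mcorr"], "uuid": ["abc"]})
df.to_sql("batch", conn, index=False)               # "commit" to the db
df_back = pd.read_sql("SELECT * FROM batch", conn)  # read it back
```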
Create a single cache, probably using a class decorator, for CNMF outputs (contours, the hdf5 file etc.) because they can take a few seconds to load sometimes.
Set up auto-publish to PyPI and conda-forge; we're ready for v0.1 once pandas v1.5 is out.