
mapca's Introduction

mapca

A Python implementation of the moving average principal components analysis methods from GIFT


About

mapca is a Python package that performs dimensionality reduction with principal component analysis (PCA) on functional magnetic resonance imaging (fMRI) data. It is a translation to Python of the dimensionality reduction technique used in the MATLAB-based GIFT package and introduced by Li et al. (2007)¹.

Footnotes

  1. Li, Y. O., Adali, T., & Calhoun, V. D. (2007). Estimating the number of independent components for functional magnetic resonance imaging data. Human Brain Mapping, 28(11), 1251–1266. https://doi.org/10.1002/hbm.20359

mapca's People

Contributors

eurunuela, handwerkerd, notzaki, tsalo


mapca's Issues

maPCA in tedana does not currently z-score data before performing PCA

As discussed in tedana #636, the maPCA in tedana currently does not normalize data before performing PCA.

@tsalo already added the normalize option when he restructured the toolbox to be sklearn compatible. However, as I mentioned in the PR, I think we should add an if statement that checks whether the data is already z-scored and normalizes it when it isn't. I'd remove the normalize option and do the normalization by default.
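A minimal sketch of that check, assuming plain NumPy arrays (the function name and tolerance are illustrative, not mapca's API):

```python
import numpy as np

def zscore_if_needed(data, atol=1e-6):
    """Normalize only if the data is not already z-scored along axis 0."""
    means = data.mean(axis=0)
    stds = data.std(axis=0)
    # Already z-scored: every feature has mean ~0 and std ~1.
    if np.allclose(means, 0, atol=atol) and np.allclose(stds, 1, atol=atol):
        return data
    return (data - means) / stds

raw = np.random.rand(100, 5) * 10 + 3  # clearly not z-scored
z = zscore_if_needed(raw)              # normalized
z_again = zscore_if_needed(z)          # unchanged: already z-scored
```

Doing this by default would make the behavior consistent regardless of how the caller preprocessed the data, while a second pass over already-normalized data becomes a no-op.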

Scaler parameters overridden within MovingAveragePCA

@eurunuela in #26, it looks like you commented out code that generated a Scaler within the subsampling procedure, and instead reused the one for the general scaling:

mapca/mapca/mapca.py

Lines 174 to 177 in 6c1d58c

# Perform Variance Normalization
# scaler = StandardScaler(with_mean=True, with_std=True)
dat = self.scaler_.fit_transform(dat.T).T
# data = ((dat.T - dat.T.mean(axis=0)) / dat.T.std(axis=0)).T

My concern is that re-fitting self.scaler_ based on new data will make it invalid for rescaling data back to the original scale, as here:

mapca/mapca/mapca.py

Lines 331 to 332 in 6c1d58c

if self.normalize:
    X_orig = self.scaler_.inverse_transform(X_orig.T).T

Does that make sense, or was there a specific reason you got rid of the original approach? Apologies for not noticing this in #26 and bringing it up while that PR was open.
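A standalone illustration of the concern (not mapca code): re-fitting a single StandardScaler on new data overwrites its stored mean and standard deviation, so a later inverse_transform no longer recovers the original scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

np.random.seed(0)
X = np.random.rand(50, 4) * 5 + 2

scaler = StandardScaler()
X_norm = scaler.fit_transform(X)

# Re-fitting on different data (as the subsampling step does) overwrites the
# mean_/scale_ parameters learned from X...
scaler.fit_transform(np.random.rand(50, 4) * 100)

# ...so inverting X_norm now uses the wrong parameters and does not round-trip.
X_back = scaler.inverse_transform(X_norm)

# A scaler fit only once on X does round-trip correctly.
fresh = StandardScaler()
X_roundtrip = fresh.inverse_transform(fresh.fit_transform(X))
```

Keeping a separate, throwaway scaler for the subsampling step (as the commented-out code did) would avoid clobbering `self.scaler_`.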

Make class accept images instead of arrays

Originally, we wanted to have the function operate on images and the class operate on arrays, in case we could get rid of the spatial elements of the algorithm (see #5). Since that's not possible anymore, the way we pass around a 3D shape tuple and a 1D mask vector is more than a little awkward. We can just move the image-related code from the function to the class and pass in a data image and a mask image to clean up the interface.

Documentation missing

I believe we should add a documentation page that clearly explains what the algorithm does.

I think it would make it easier to understand for users of tedana especially, as we receive some questions about it in Neurostars (see this post for example).

Deploy to PyPI

We'll want to deploy to PyPI to make it easier to use this package in tedana and other tools.

Add license

We need to incorporate information for the appropriate license from GIFT.

Identified part of the issue with component misestimation

I have a dataset where the estimated number of components, under any criterion, was sometimes far too high or far too low. This is an issue that has been reported by others. I've identified one reason this happens.

The MAPCA algorithm depends on estimating the spatial dependence of voxels and then running a cost function on a sparse sampling of the voxels that are far enough apart to be independent & identically distributed (i.i.d.). That is, if neighboring voxels have dependence, then apply the cost function to every other voxel in 3D space (i.e. use 1/(2^3) of the total voxels).
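That subsampling step can be sketched as follows (illustrative NumPy, not mapca's implementation): keeping every k-th voxel along each spatial axis leaves 1/(k^3) of the voxels for the cost function.

```python
import numpy as np

def subsample_iid(data_3d, k):
    """Keep every k-th voxel along each spatial axis of a 3D volume."""
    return data_3d[::k, ::k, ::k]

vol = np.zeros((8, 8, 8))
sub = subsample_iid(vol, 2)  # 4 x 4 x 4 -> 1/8 of the voxels remain
```

With k=2 this is exactly the "every other voxel in 3D space" case above; k=3 keeps 1/27 of the voxels.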

I made a branch of mapca that outputs much more logging, including this subsampling factor: sub_iid_sp_median. The attached figure shows this subsampling factor on the x axis and the estimated number of components on the y axis. The different markers are different run types, but all have 300-350 volumes. When the subsampling factor is 1 (i.e., every voxel is used), the component estimation always fails by giving far too many components. When the subsampling factor is 3 (1/27 of the total voxels used for estimation), it fails by giving far too few. Given the very large range of estimated components when the subsampling factor is 2, I suspect there are other issues, but this discrete parameter should be constant for data collected with the same acquisition parameters, and bad things happen when it's not.

I'm starting to think about completely different approaches for estimating the number of components, but I wanted to document this here. As an intermediate step, we might want to add a few more values, such as this subsampling factor, to the logging.
[Figure: estimated number of components vs. subsampling factor (sub_iid_sp_median)]

As a tangent, I realized mapca used an LGR (logger) but wasn't writing out the log. In my branch above, I changed the LGR declaration to "general" and changed a few other things. mapca still didn't output its own log, but its LGR output was then included in the tedana log.
https://github.com/handwerkerd/mapca/blob/a03727b3346c0f92380b5a1699b7c001d43c6956/mapca/mapca.py#L31

Document typical results for different datasets/acquisition parameters

This stems from ME-ICA/tedana#849 and is related to #42. Basically, I think it would be awesome if we had some idea of how many components are "typical" for the different criteria, depending on a few factors, such as (1) number of volumes, (2) temporal resolution, and (3) spatial resolution. We could then plot and share those results in the MAPCA documentation, much like how the tedana documentation includes distributions of typical ME-EPI parameters in the literature.

`n_features_` from sklearn's PCA is deprecated

n_features_ from sklearn's PCA is deprecated and has to be updated to n_features_in_.

/export/home/mflores/.conda/envs/tedana/lib/python3.9/site-packages/sklearn/utils/deprecation.py:101: FutureWarning: Attribute `n_features_` was deprecated in version 1.2 and will be removed in 1.4. Use `n_features_in_` instead.
  warnings.warn(msg, category=FutureWarning)
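The fix should be a straight attribute rename; a quick check of the replacement attribute:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(20, 5)
pca = PCA(n_components=3).fit(X)

# n_features_in_ reports the number of input features, as the
# deprecated n_features_ did before its removal in scikit-learn 1.4.
print(pca.n_features_in_)  # 5
```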

[BUG] Dimensions of masked data not considered for fit_transform

While working on the integration test, I saw there is a little bug in the code that probably comes from losing the context we had in tedana.

Basically, we are not considering that the data has been masked when giving shape3d to fit_transform(). We're currently giving the original dimensions when it should be the masked ones.

Edit: The error comes from the fact that apply_mask from nilearn only keeps the values inside the mask, while mapca in tedana keeps everything.
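A NumPy stand-in illustrating the difference (hypothetical shapes): boolean masking keeps only in-mask voxels, so the masked array's spatial dimension is the number of voxels inside the mask, not the product of the original 3D shape.

```python
import numpy as np

data = np.random.rand(4, 4, 4, 10)   # x, y, z, time
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True           # 8 voxels inside the mask

# Masking flattens the spatial axes and drops out-of-mask voxels:
masked = data[mask]                  # shape (8, 10), not (64, 10)
```

Passing the original (4, 4, 4) shape to fit_transform() after masking would therefore mismatch the 8-voxel masked array.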

Drop spatial elements from sklearn-style class

Currently we pass around shape_3d (a three-element tuple with the shape of the image) and mask_vec (a 1D array indicating whether a given voxel falls within the mask) throughout the MovingAveragePCA class, and I would love to support 2D data, if that's possible.

Update package installation and minimum Python version

Working on #59 has made me realize we probably want to increase the minimum Python version we support if we want to stay compatible with new versions of scikit-learn.

I would also follow the installation changes we did in tedana and move everything to the pyproject.toml file.

Codecov not working

I thought #11 would fix the Codecov issue but apparently, it didn't.

The CircleCI config looks good to me. I'll probably look into it tomorrow.

Replacing utils._kurtn with scipy.stats.kurtosis

It looks like utils._kurtn(arr) can be replaced with scipy.stats.kurtosis(arr, axis=0, fisher=True).
Any cons with making this replacement?

Few other notes:

  • The test for _kurtn converts an input with shape (2, 3, 4) into (3, 1). I was expecting the output shape to be (3, 4). Is this intentional?

    def test_kurtn():
        """Unit test for _kurtn function."""
        test_data = np.random.rand(2, 3, 4)
        kurt = _kurtn(test_data)
        assert kurt.shape == (3, 1)
  • The docstring for _kurtn says that the kurtosis is calculated along the second dimension, but the code is calculating kurtosis along every dimension except the second dimension.

    mapca/mapca/utils.py

    Lines 317 to 329 in 0d01a2b

    Returns
    -------
    kurt : (1:N) array-like
        The kurtosis of each vector in x along the second dimension. For
        tedana, this will be the kurtosis of each PCA component.
    """
    kurt = np.zeros((data.shape[1], 1))
    for i in range(data.shape[1]):
        data_norm = detrend(data[:, i], type="constant")
        data_norm /= np.std(data_norm)
        kurt[i] = np.mean(data_norm ** 4) - 3
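If _kurtn matches the quoted code, the replacement should be numerically equivalent: demeaning, dividing by the population std, and computing mean(d^4) - 3 is exactly scipy's Fisher (excess) kurtosis with its default biased moment estimates. A standalone check under that assumption:

```python
import numpy as np
from scipy.signal import detrend
from scipy.stats import kurtosis

np.random.seed(0)
data = np.random.rand(100, 3)

# _kurtn-style computation, per column, as quoted above.
manual = np.zeros(data.shape[1])
for i in range(data.shape[1]):
    d = detrend(data[:, i], type="constant")  # remove the mean
    d /= np.std(d)                            # population std (ddof=0)
    manual[i] = np.mean(d ** 4) - 3

# scipy equivalent: excess kurtosis along axis 0 (bias=True by default,
# matching the ddof=0 normalization above).
scipy_kurt = kurtosis(data, axis=0, fisher=True)
```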

Tests fail due to make command not being installed

As of today, tests fail due to the following error:

Package make is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'make' has no installation candidate

Exited with code exit status 100

Operate on images in subsampling function

If the subsampling function operates on images, it will be easier to use nilearn's functions for masking/unmasking the data. See #32.

Original post:

How do you all feel about using nilearn.image.resample_img() for utils.subsampling()? I am working on #29, and I think it would be easier to have a function that downsamples and returns images instead of arrays.

Set up continuous integration

I copied the relevant tests from tedana over but we'll need to set up CI with workflows for the unit tests, coverage, and linting.
