Giter VIP home page Giter VIP logo

combatharmonization's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

combatharmonization's Issues

train/test matlab implementation

Hi Jean-Philippe,

I'm trying to workout how to use combat in a train/test situation because it's super important to us and we want to use your implementation in our studies.

It seems that this could be pretty easily done by:

  1. for training sample, output the stand_mean, var_pooled, gamma_star, delta_star.
  2. for test sample, get the s_data using the training stand_mean/var_pooled and forward to the adjustment using the gamma_star/delta_star
  3. for n_batch, use the training data n_batch and establish a dummy matrix for the test data with zeros if they do not contain a specific n_batch (i.e., if the test sample does not have any cases from a specific MRI site.

Is this on the right track? Is it possible to review the code if I do this? I talked with Russel Shinohara about it before a few times but haven't implemented it. My name is Dom Dwyer and I'm from LMU in Germany.

Thanks!

Best, Dom

[matlab] need to raise error if there is only 1 batch, otherwise it produces incorrect output

For data that only includes a single batch (eg: you have a multi-site dataset but you mask some data out during a loop and are unexpectedly left with only a single site), this code ends up removing the batchmod column from the design matrix (in combat.m line 33: design(:,bad)=[];, and then later when it extracts design(:,1:n_batch) or design(:,(n_batch+1):nn, it ends up changing the data in unexpected ways.

One solution would be to simply add a check in the beginning after generating n_batch:

    if n_batch < 2
        error('Error. Input must have at least 2 batches to proceed.');
    end

Harmonization for multi-study project where each study has multiple sites

Greetings,

I am hoping to combine the MRI data from 2 independent studies, each of which had 4 unique sites. Thus, there are in total 8 sites where data was collected that I am interested in aggregating.

I had two questions:

  1. In this scenario, because the actual study site (8 in total) is the smallest unit, would the batch id vector be made up of the site information, and not study?

  2. In any post-harmonization statistical analyses, for example linear regression, is it suggested/would there be any value in including study (2 in total) as a covariate? On that note, I assume there would be no need for site (which would be already addressed in the initial harmonization procedure) to be included as a covariate as well?

Sorry for what are naive questions I'm sure. Any response would be greatly appreciated!
Thank you.

Possible to use parallelization to speed up non-parametric estimation in R?

Hi
I am testing out neuroCombat to harmonize fMRI data across two sites. The parametric version works great even on a larger number of input features. But it takes an exceedingly long time to run the non-parametric version with thousands of features.
I was wondering if it would be possible to use parallel processing to speed this up? Particularly the getEbEstimators helper function for "Finding non-parametric adjustments"
Thanks!

aprior formular

Hello,
I have a question regarding the aprior function,
From the equation that is shown in Johnson 2007
image

y=(2s2+m^2)/s2 in the aprior function confused me,
if I understand it correctly, should it be written as
y=(2
s2+m)/s2?

zhipeng

mod.mean in neuroCombatFromTraining()

Hi Fortin,

Thanks for your update, I found the R version of ComBat begins to support using estimation from training data on unseen data.

For the function neuroCombatFromTraining(), I am curious about why you use rowMeans(mod.mean) instead of using something like beta.hat * X_{new_data}. beta_hat is estimated from training data, X_{new_data} is the covariates from new dataset.

It seems quite straightforward to me, is there is any reason why you do not do this?

Thanks!

Is it possible to use ComBat in native space?

Hello,
We know that ComBat requires the images to be registered to a common template space. For some of our analyses, we want to get the value of subjects in native space. Is it possible to use ComBat in native space? Or can we firstly use ComBat for all in a template space, then project the subjects back to native space? Any risk for this procedure (i.e. introduce the bias back again)?

Thanks!

[Python] Dividing by zeros in standardize_accros_features function

(neuroCombat migrated here for maintainability)

From @RocaVincent:

Hi,

Thanks for this code,

Line 170 of neuroCombat.py file, I got a problem when it divides using the following term : np.dot(np.sqrt(var_pooled), np.ones((1, n_sample))). In this term, almost half of the coefficients are zeros so I got a RuntimeWarning and Nan values create problems later. Have you any idea why I got zeros and how I can fix this problem ?

Thanks

Nonparametric implementation of ComBat in Matlab

From email: I would like to use your Combat Harmonization routine to harmonise imaging data coming from different acquisition systems. I'm working in Matlab. Is the non parametric routine also available for Matlab or it is just implemented in R?

Unable to install R package

I keep getting an error code when I try to install this package. I tried in RStudio on mac and on linux, I tried using devtools and remotes, I tried forking the package and installing from that, and I installed BiocParallel just in case that was the issue. The install always fails. Here is the output:

> remotes::install_github("Jfortin1/neuroCombat_Rpackage")
Downloading GitHub repo Jfortin1/neuroCombat_Rpackage@HEAD
Error: Failed to install 'neuroCombat' from GitHub:
  Command failed (69)
In addition: Warning message:
In system(full, intern = TRUE, ignore.stderr = quiet) :
  running command ''/usr/bin/git' ls-remote https://git.bioconductor.org/packages/BiocParallel RELEASE_3_17 2>/dev/null' had status 69

Help would be greatly appreciated.

Detection of single batch in one site so var() calculation is not needed

Hi Jfortin1,

I've been using combat to harmonize multisite MRI data and it has been very helpful so far! However, I recently ran into a bit of trouble after limiting my cohort for further analyses. I kept getting an error that points me to line 80 in the combat.m function:

delta_hat = [];
	for i=1:n_batch
		indices = batches{i};
		delta_hat = [delta_hat; var(s_data(:,indices)')];
	end

After some digging, I realize that for single column entries in a batch, var() would calculate the column variance as opposed to row variance which I assume would happen if there were more column entries in the batch. Would you be able to quickly amend this part to include situations illustrated above?

Thanks so much for the great work! Hope to hear back from you soon!!

Sincerely,
Judy

Error using combat in Matlab, error about vertcat although dimensions of input are right

Hello Experts,

When I run combat using Matlab, I get the following error about dimensions of the the arrays although my batch is a vector of 125, and my data is 9 features by 125 (9x125).

I am using mode = [];
DataHarmonized = combat(DataVol, batch, mod, 0);

Error using vertcat
Dimensions of arrays being concatenated are not consistent.

Error in combat (line 80)
delta_hat = [delta_hat; var(s_data(:,indices)')];

Applying ComBat on single feature

I would like to use ComBat on a single feature. From the formulation and the fact that this is done in the cortical thickness paper referred to in the readme, I would think it should be possible. However, the parametric formulation gives an error, while the non-parametric version results in only NaNs:

data = randn(1,10);
batch = [1 1 1 1 1 2 2 2 2 2];
mod = [1 2 1 2 1 2 1 2 1 2]';
combat(data, batch, mod, 0)

Results in:

[combat] Found 2 batches
[combat] Adjusting for 1 covariate(s) of covariate level(s)
[combat] Standardizing Data across features
[combat] Fitting L/S model and finding priors
[combat] Finding non-parametric adjustments
[combat] Adjusting the Data
ans =

NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

combat(data, batch, mod, 1)

Results in:

[combat] Found 2 batches
[combat] Adjusting for 1 covariate(s) of covariate level(s)
[combat] Standardizing Data across features
[combat] Fitting L/S model and finding priors
[combat] Finding parametric adjustments
Index exceeds matrix dimensions.

Error in combat (line 93)
temp =
itSol(s_data(:,indices),gamma_hat(i,:),delta_hat(i,:),gamma_bar(i),t2(i),a_prior(i),b_prior(i),
0.001);

The error seems to be in the second iteration of the for loop line 93 is in. The index exceeding is in gamma_bar and t2.

Is this a bug in the code, or a limitation of the method I'm unaware of? Thanks in advance.

FYI: I'm using the MATLAB version in MATLAB 2015b.

Working with voxel-wise data

Hey combat users,
I have a question regarding the use of voxel-wise data for combat (Python). I have implemented neurocombat for ROI-wise data, but I don't know how to input voxel-wise data which would require the shape of features (= participants) x samples (= 3D voxels). Can someone help on how to include 3D voxel information per participant into a numpy array?
Best, Melissa

Neurocombat on 3D volumes

Hello,

I am working on 3D CT images and trying to apply neurocombat on 3D volumes where my features are voxel intensities. I transformed for each image the 3d array of voxel intensities into a long 1d vector by concatenating the slides. This allows me to create a matrix where each column is one image. Then I applied neuroCombat on the matrix, and transformed back each harmonized column into a 3d array with the original dimensions. However, the background intensities interfere with the results and poses problems for the NeuroCombat model (harmonized results are worse than original images). Could you please tell what is the strategy to follow when features are individual voxels? I would appreciate it if you could direct me to a code example where neurocombat is applied on voxel intensities.

Error in solve.default(t(design) %*% design)?

Hello,

I previously got ComBat to work with some test data, but recently ran into an error when working with "real data". The data I'm working with is located here.

My basic R syntax was--

source("scripts/utils.R");
source("scripts/combat.R")

dat<-t(as.matrix(read.csv("Freesurfer_HBN_w_Site_For_Combat.csv")))
batch<-read.csv("HBN_site.csv")
batch<-t(batch$Site)
data.harmonized <- combat(dat=dat, batch=batch)

After I run that command, I get the following error--

untitled

Any thoughts re: what I might be doing incorrectly?
Thanks much!

All the best,
Jamie.

How to deal with .nii files directly?

Hi, sorry for bothering. I am WangRong.
The codes you provided are very useful! :) But I have some questions.
I want to do combat harmonization for .nii files directly, because I want to do Independent Components Analysis, moreover, the parameter needed to group-level analysis is too much.

I tried following codes:
a = load_nifti('I:\combatdemo\411.nii');
b = load_nifti('I:\combatdemo\CDP.nii');
p = {a,b};
n = 2;
dat = repmat(p,n,1);

But results give me lots of error:
var (第 164 行)
y = sum(abs(x - sum(x,dim)./n).^2, dim) ./ denom;
std (第 59 行)
y = sqrt(var(varargin{:}));
combat (第 9 行)
[sds] = std(dat')';
Because sum should be used for a numeric or logical value.

I was wondering if you could tell me how to solve my problem. Thank you very much!

ComBat parametric adjustment returning NANs in Matlab

Dear Experts,

I am running ComBat in Matlab for Mac version 9.7.0.1190202 (R2019b).
When I run the batch correction using parametric adjustments, I get all NANs. This has been a recurring issue for me, even in other datasets.
There are no NANs in my data and, in the past, the non-parametric adjustment has worked.
Any idea what might be causing this?
Thank you,

Leonardo Tozzi

Typo in matlab implementation tutorial

Where the documentation reads:

We create a p x 2 model matrix with age as the first column

Shouldn't it read?

We create a n x 2 model matrix with age as the first column

ComBat in large sample size

Hello, I am trying to run ComBat in Matlab using a relatively large sample size (n= 300) which contains DTI data from two different sites. Although I have been able to run ComBat, it returns a matrix with only NA in it. Any suggestions to clear this issue?
Thanks

How can I detect overcorrection?

Hi guys, I'm using neuroCombat (python version) to assess the impact of different CT machines (and other stuff) on a radiomic dataset.
The only thing that I'm still trying to figure out is how to determine if neuroCombat leads me to overcorrection.
Can someone help?
Sorry, but I usually analyze other types of data and this is my first time working on features from CT images.

Thanks in advance :)

Correct batch effect of test data

Hello, thank you for your work on this wonderful tool.
I'm using this tool for a machine learning related project.
To prevent information leakage when correct the batch effect of them together, would it be possible to correct the batch effect of test data with information generated from train data?

Non-parametric form by Matlab

Thanks for sharing the code!

I am using your Matlab code, and parametric format works perfectly, but non-paramteric format can't work, most values are"NaN" after ComBat.

Can you help me with this problem? Thank you very much.

harmonize DTI files

Hi,

I am using combat to harmonize DTI nifti file, and I am getting the error:
"Error. There are rows with constant values across samples. Remove
these rows and rerun ComBat."

I know there are many 0s in my 3D nifti image data, I am wondering if I how can I using combat in for those data?

Thanks.

Call neuroCombat without ref_batch

Hi,

I want to use the Python version of neuroCombat and I see that in the neuroCombat function there is an optional argument ref_batch. I understand the role of the ref batch thanks to one of the ComBat papers, but I don't understand how it can be optional since all theoretical explanations use a reference site. Can you tell me where I can find explanations about ComBat used without a reference site ?

Thanks in advance

Nonparametric version of ComBat in R

From email: Hello,
I am applying the ComBat software to a multisite harmonization question. I have seen mentions of ComBat being available in both parametric and non-parametric versions. Is the version available on GitHub parametric or non-parametric?

train/test in R

Hi Jfortin1,

Thank you for this ComBat code. Just like someone pointed out earlier, I would like to be able to use ComBat in new test data as well but using the parameter of the training set, I would like to ask how can I do that in R since I am primarily using R for the purpose of radiomics study.

I really appreciate your time and effort.

Error in apply(data, 1, sd) : dim(X) must have a positive length

Hello ComBatHarmonization users,

I was trying to test-drive ComBat Harmonization, but ran into some issues with the apply function. I'm unsure if there's an issue with another R library or my R version (?).

I'm using a mac (10.13.5) with R-studio 1.1.453 and R 3.5.0

source("scripts/utils.R"); source("scripts/combat.R")
p=10000
n=10
batch = c(1,1,1,1,1,2,2,2,2,2) #Batch variable for the scanner id
dat = matrix(runif(p*n), p, n) #Random Data matrix
nrow(dat)
[1] 10000
ncol(dat)
[1] 10

data.harmonized <- combat(dat=dat, batch=batch)
Show Traceback
Rerun with Debug
Error in apply(data, 1, sd) : dim(X) must have a positive length `

I wondered if others had run into this issues?
Any suggestions for troubleshooting this?

Thanks much!
Jamie.

batch needed to be ordered

I had an issue in MATLAB when the batch was unordered. I didn't have data from sites 1 and 2 but had the other 19 sites (from 21 sites in ABCD). But n_batch showed 21 and had some error. I tried using "batchmode(:,~any(batchmod,1))=[];" inline 15 to remove zero columns.

Use of ComBat when I have travelling subjects

Hi,

Can I use this code if I have traveling subject data, means suppose, I have 100 subjects and all of them are acquired in 4 different sites. If yes, may I know how?

Thanks and regards,
Jagruti Patel

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.