jfortin1 / combatharmonization Goto Github PK
View Code? Open in Web Editor NEWHarmonization of multi-site imaging data with ComBat
Harmonization of multi-site imaging data with ComBat
Hi Jean-Philippe,
I'm trying to workout how to use combat in a train/test situation because it's super important to us and we want to use your implementation in our studies.
It seems that this could be pretty easily done by:
Is this on the right track? Is it possible to review the code if I do this? I talked with Russel Shinohara about it before a few times but haven't implemented it. My name is Dom Dwyer and I'm from LMU in Germany.
Thanks!
Best, Dom
For data that only includes a single batch (eg: you have a multi-site dataset but you mask some data out during a loop and are unexpectedly left with only a single site), this code ends up removing the batchmod
column from the design matrix (in combat.m line 33: design(:,bad)=[];
, and then later when it extracts design(:,1:n_batch)
or design(:,(n_batch+1):nn
, it ends up changing the data in unexpected ways.
One solution would be to simply add a check in the beginning after generating n_batch:
if n_batch < 2
error('Error. Input must have at least 2 batches to proceed.');
end
Greetings,
I am hoping to combine the MRI data from 2 independent studies, each of which had 4 unique sites. Thus, there are in total 8 sites where data was collected that I am interested in aggregating.
I had two questions:
In this scenario, because the actual study site (8 in total) is the smallest unit, would the batch id vector be made up of the site information, and not study?
In any post-harmonization statistical analyses, for example linear regression, is it suggested/would there be any value in including study (2 in total) as a covariate? On that note, I assume there would be no need for site (which would be already addressed in the initial harmonization procedure) to be included as a covariate as well?
Sorry for what are naive questions I'm sure. Any response would be greatly appreciated!
Thank you.
@Shotgunosine Migrating neuroCombat here since this does not seem to be supported my previous co-author.
Hi
I am testing out neuroCombat to harmonize fMRI data across two sites. The parametric version works great even on a larger number of input features. But it takes an exceedingly long time to run the non-parametric version with thousands of features.
I was wondering if it would be possible to use parallel processing to speed this up? Particularly the getEbEstimators helper function for "Finding non-parametric adjustments"
Thanks!
Hi Fortin,
Thanks for your update, I found the R version of ComBat begins to support using estimation from training data on unseen data.
For the function neuroCombatFromTraining(), I am curious about why you use rowMeans(mod.mean)
instead of using something like beta.hat * X_{new_data}
. beta_hat
is estimated from training data, X_{new_data}
is the covariates from new dataset.
It seems quite straightforward to me, is there is any reason why you do not do this?
Thanks!
Problematic row: [23 23 23 5 5 5]
where batch = [1 1 1 2 2 2]
Hello,
We know that ComBat requires the images to be registered to a common template space. For some of our analyses, we want to get the value of subjects in native space. Is it possible to use ComBat in native space? Or can we firstly use ComBat for all in a template space, then project the subjects back to native space? Any risk for this procedure (i.e. introduce the bias back again)?
Thanks!
(neuroCombat migrated here for maintainability)
From @RocaVincent:
Hi,
Thanks for this code,
Line 170 of neuroCombat.py file, I got a problem when it divides using the following term : np.dot(np.sqrt(var_pooled), np.ones((1, n_sample))). In this term, almost half of the coefficients are zeros so I got a RuntimeWarning and Nan values create problems later. Have you any idea why I got zeros and how I can fix this problem ?
Thanks
From email: I would like to use your Combat Harmonization routine to harmonise imaging data coming from different acquisition systems. I'm working in Matlab. Is the non parametric routine also available for Matlab or it is just implemented in R?
I keep getting an error code when I try to install this package. I tried in RStudio on mac and on linux, I tried using devtools and remotes, I tried forking the package and installing from that, and I installed BiocParallel just in case that was the issue. The install always fails. Here is the output:
> remotes::install_github("Jfortin1/neuroCombat_Rpackage")
Downloading GitHub repo Jfortin1/neuroCombat_Rpackage@HEAD
Error: Failed to install 'neuroCombat' from GitHub:
Command failed (69)
In addition: Warning message:
In system(full, intern = TRUE, ignore.stderr = quiet) :
running command ''/usr/bin/git' ls-remote https://git.bioconductor.org/packages/BiocParallel RELEASE_3_17 2>/dev/null' had status 69
Help would be greatly appreciated.
Hi
How to save a combat model,then use a pre-trained model in a new dataset
Hi Jfortin1,
I've been using combat to harmonize multisite MRI data and it has been very helpful so far! However, I recently ran into a bit of trouble after limiting my cohort for further analyses. I kept getting an error that points me to line 80 in the combat.m function:
delta_hat = [];
for i=1:n_batch
indices = batches{i};
delta_hat = [delta_hat; var(s_data(:,indices)')];
end
After some digging, I realize that for single column entries in a batch, var() would calculate the column variance as opposed to row variance which I assume would happen if there were more column entries in the batch. Would you be able to quickly amend this part to include situations illustrated above?
Thanks so much for the great work! Hope to hear back from you soon!!
Sincerely,
Judy
Hello Experts,
When I run combat using Matlab, I get the following error about dimensions of the the arrays although my batch is a vector of 125, and my data is 9 features by 125 (9x125).
I am using mode = [];
DataHarmonized = combat(DataVol, batch, mod, 0);
Error using vertcat
Dimensions of arrays being concatenated are not consistent.
Error in combat (line 80)
delta_hat = [delta_hat; var(s_data(:,indices)')];
The same question has been answered by authors. So I closed this question and thanks authors very much!
I would like to use ComBat on a single feature. From the formulation and the fact that this is done in the cortical thickness paper referred to in the readme, I would think it should be possible. However, the parametric formulation gives an error, while the non-parametric version results in only NaNs:
data = randn(1,10);
batch = [1 1 1 1 1 2 2 2 2 2];
mod = [1 2 1 2 1 2 1 2 1 2]';
combat(data, batch, mod, 0)
Results in:
[combat] Found 2 batches
[combat] Adjusting for 1 covariate(s) of covariate level(s)
[combat] Standardizing Data across features
[combat] Fitting L/S model and finding priors
[combat] Finding non-parametric adjustments
[combat] Adjusting the Data
ans =
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
combat(data, batch, mod, 1)
Results in:
[combat] Found 2 batches
[combat] Adjusting for 1 covariate(s) of covariate level(s)
[combat] Standardizing Data across features
[combat] Fitting L/S model and finding priors
[combat] Finding parametric adjustments
Index exceeds matrix dimensions.
Error in combat (line 93)
temp =
itSol(s_data(:,indices),gamma_hat(i,:),delta_hat(i,:),gamma_bar(i),t2(i),a_prior(i),b_prior(i),
0.001);
The error seems to be in the second iteration of the for loop line 93 is in. The index exceeding is in gamma_bar and t2.
Is this a bug in the code, or a limitation of the method I'm unaware of? Thanks in advance.
FYI: I'm using the MATLAB version in MATLAB 2015b.
Hi, I am wondering whether we could Fit ComBat on training set and apply fitted ComBat on test set?
If code supports, how could I save fitted parameters of ComBat
Thanks for your help!
when I run the code in matlab, I got a error "there are rows with constant values across samples. Remove these rows and rerun ComBat".
could you tell me why this error happen?
Can you share use of the "plot.R" script? I'd like to make density plots, but have struggled to make progress--just starting out in R!
Hey combat users,
I have a question regarding the use of voxel-wise data for combat (Python). I have implemented neurocombat for ROI-wise data, but I don't know how to input voxel-wise data which would require the shape of features (= participants) x samples (= 3D voxels). Can someone help on how to include 3D voxel information per participant into a numpy array?
Best, Melissa
Hello,
I am working on 3D CT images and trying to apply neurocombat on 3D volumes where my features are voxel intensities. I transformed for each image the 3d array of voxel intensities into a long 1d vector by concatenating the slides. This allows me to create a matrix where each column is one image. Then I applied neuroCombat on the matrix, and transformed back each harmonized column into a 3d array with the original dimensions. However, the background intensities interfere with the results and poses problems for the NeuroCombat model (harmonized results are worse than original images). Could you please tell what is the strategy to follow when features are individual voxels? I would appreciate it if you could direct me to a code example where neurocombat is applied on voxel intensities.
Hello,
I previously got ComBat to work with some test data, but recently ran into an error when working with "real data". The data I'm working with is located here.
My basic R syntax was--
source("scripts/utils.R");
source("scripts/combat.R")
dat<-t(as.matrix(read.csv("Freesurfer_HBN_w_Site_For_Combat.csv")))
batch<-read.csv("HBN_site.csv")
batch<-t(batch$Site)
data.harmonized <- combat(dat=dat, batch=batch)
After I run that command, I get the following error--
Any thoughts re: what I might be doing incorrectly?
Thanks much!
All the best,
Jamie.
Hi, sorry for bothering. I am WangRong.
The codes you provided are very useful! :) But I have some questions.
I want to do combat harmonization for .nii files directly, because I want to do Independent Components Analysis, moreover, the parameter needed to group-level analysis is too much.
I tried following codes:
a = load_nifti('I:\combatdemo\411.nii');
b = load_nifti('I:\combatdemo\CDP.nii');
p = {a,b};
n = 2;
dat = repmat(p,n,1);
But results give me lots of error:
var (第 164 行)
y = sum(abs(x - sum(x,dim)./n).^2, dim) ./ denom;
std (第 59 行)
y = sqrt(var(varargin{:}));
combat (第 9 行)
[sds] = std(dat')';
Because sum should be used for a numeric or logical value.
I was wondering if you could tell me how to solve my problem. Thank you very much!
Dear Experts,
I am running ComBat in Matlab for Mac version 9.7.0.1190202 (R2019b).
When I run the batch correction using parametric adjustments, I get all NANs. This has been a recurring issue for me, even in other datasets.
There are no NANs in my data and, in the past, the non-parametric adjustment has worked.
Any idea what might be causing this?
Thank you,
Leonardo Tozzi
Where the documentation reads:
We create a p x 2 model matrix with age as the first column
Shouldn't it read?
We create a n x 2 model matrix with age as the first column
Hello, I am trying to run ComBat in Matlab using a relatively large sample size (n= 300) which contains DTI data from two different sites. Although I have been able to run ComBat, it returns a matrix with only NA in it. Any suggestions to clear this issue?
Thanks
Hi guys, I'm using neuroCombat (python version) to assess the impact of different CT machines (and other stuff) on a radiomic dataset.
The only thing that I'm still trying to figure out is how to determine if neuroCombat leads me to overcorrection.
Can someone help?
Sorry, but I usually analyze other types of data and this is my first time working on features from CT images.
Thanks in advance :)
Hello, thank you for your work on this wonderful tool.
I'm using this tool for a machine learning related project.
To prevent information leakage when correct the batch effect of them together, would it be possible to correct the batch effect of test data with information generated from train data?
remotes::install_github('Jfortin1/ComBatHarmonization/R/neuroCombat')
"ERROR: dependency ‘BiocParallel’ is not available for package ‘neuroCombat’
Thanks for sharing the code!
I am using your Matlab code, and parametric format works perfectly, but non-paramteric format can't work, most values are"NaN" after ComBat.
Can you help me with this problem? Thank you very much.
Hi,
I am using combat to harmonize DTI nifti file, and I am getting the error:
"Error. There are rows with constant values across samples. Remove
these rows and rerun ComBat."
I know there are many 0s in my 3D nifti image data, I am wondering if I how can I using combat in for those data?
Thanks.
Hi,
I want to use the Python version of neuroCombat and I see that in the neuroCombat function there is an optional argument ref_batch. I understand the role of the ref batch thanks to one of the ComBat papers, but I don't understand how it can be optional since all theoretical explanations use a reference site. Can you tell me where I can find explanations about ComBat used without a reference site ?
Thanks in advance
From email: Hello,
I am applying the ComBat software to a multisite harmonization question. I have seen mentions of ComBat being available in both parametric and non-parametric versions. Is the version available on GitHub parametric or non-parametric?
Hi Jfortin1,
Thank you for this ComBat code. Just like someone pointed out earlier, I would like to be able to use ComBat in new test data as well but using the parameter of the training set, I would like to ask how can I do that in R since I am primarily using R for the purpose of radiomics study.
I really appreciate your time and effort.
Hello ComBatHarmonization users,
I was trying to test-drive ComBat Harmonization, but ran into some issues with the apply function. I'm unsure if there's an issue with another R library or my R version (?).
I'm using a mac (10.13.5) with R-studio 1.1.453 and R 3.5.0
source("scripts/utils.R"); source("scripts/combat.R")
p=10000
n=10
batch = c(1,1,1,1,1,2,2,2,2,2) #Batch variable for the scanner id
dat = matrix(runif(p*n), p, n) #Random Data matrix
nrow(dat)
[1] 10000
ncol(dat)
[1] 10data.harmonized <- combat(dat=dat, batch=batch)
Show Traceback
Rerun with Debug
Error in apply(data, 1, sd) : dim(X) must have a positive length `
I wondered if others had run into this issues?
Any suggestions for troubleshooting this?
Thanks much!
Jamie.
I had an issue in MATLAB when the batch was unordered. I didn't have data from sites 1 and 2 but had the other 19 sites (from 21 sites in ABCD). But n_batch showed 21 and had some error. I tried using "batchmode(:,~any(batchmod,1))=[];" inline 15 to remove zero columns.
Hi,
Can I use this code if I have traveling subject data, means suppose, I have 100 subjects and all of them are acquired in 4 different sites. If yes, may I know how?
Thanks and regards,
Jagruti Patel
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.