lmweber / diffcyt Goto Github PK

View Code? Open in Web Editor NEW

18.0 18.0 12.0 1.56 MB

R package for differential discovery analyses in high-dimensional cytometry data

License: MIT License

R 100.00%

diffcyt's People

Contributors

Stargazers

Watchers

Forkers

stanleyn helenalc fionarhuang gjhanchem retogerber terskikh lucaslelann igordot crfoley97 samgg eweine

diffcyt's Issues

Possible bugfix for LMM fit outputting NAs for all comparisons

See https://github.com/SamGG/diffcyt commits and email discussion

diffcyt() contrasts , when comparing two out of 4 conditions in the data

I have the following doubt over diffcyt() contrasts.

I have a daframe object from CATALYST; which I now want to input for differential abundance analysis. My "condition" field of daframe object has 4 conditions; out of which 3 conditions are several kinds of tumors and the last condition is control. I built a contrast (0,0,0,1); for differential analysis between all tumor combined and the control.
However now I want a contrast between a particular tumor type and all the controls.
I am aware each coefficient in the contrast should correspond to one of the condition; but I am not sure how to mention the particular contrast i want.

Since, diffcyt internally includes edgeR and limma steps I could individually for each cell population do the usual limma steps:

**design <- model.matrix(~condition)
fit <- lmFit(eset, design)
fit2 <- eBayes(fit)
de_genes <- topTable(fit2)

edgeR set of commands

exprDesign <- model.matrix(~condition)
propData <- DGEList(count_table,
lib.size = colSums(count_table)) # BE VERY CAREFUL ABOUT YOUR lib.size parameter, this is the number of cells you collected. If you perform a hierarchical clustering (i.e. Citrus), your lib.size is not the sum of your columns.

fit <- estimateDisp(propData, design) # this is the first deviation from limma, estimating dispersion based on your design help accounts for the variability in cell counting
fit <- glmQLFit(fit, design, robust=TRUE) # this is analogous to the lmFit step above...**

But is there a way I could do this with diffcyt?

Thanks for your time.
I will be grateful if you could comment on this.

topTable function: show_logFC argument doesn't work

Hello,

Could you help me with the show_logFC argument in the topTable function? When I set it equal to TRUE, I receive an error:

Error: subscript contains invalid names

I did a little digging, and it doesn't look like there are any arguments in the diffcyt function that outputs a logFC value associated with the object, unless I'm doing something wrong. I recreated the steps seen here, but no matter what I do, there is never a logFC column in the result object, so the matrix subsetting that happens in line 186 or 188 (depending on whether DA or DE) always fails.

For reproducibility, see code snippet below.

rslt <- diffcyt(daf, # daf is a SingleCellExperiment
                formula=fmla, # fmla was created using createFormula
                contrast=ctst, # ctst is a basic contrast matrix
                analysis_type='DA',
                method_DA='diffcyt-DA-GLMM',
                clustering_to_use='meta20',
                verbose=F)
tt   <- topTable(rslt, all=T, show_props=T, format_vals=F, show_logFC=T)

Let me know if there is any further info you need from me. I am running the latest version of this package (and others, like CATALYST) on R 4.0.2. Thanks for any and all help!

J.D.

Diffcyt for two different columns

Hi there,

I am trying to do an analysis for two different columns of data.

Column 1 is Reticulocyte count at T0 and column 2 is Reticulocyte count at T1.

I want to compare patients with "Normal" in T0 to patients with "High" in T1, where the factors for both columns are "Normal", "Low", "High" and "N/A".

In the past, I have just filtered SCE objects to give me patients with desired factors (e.g. "Normal" and "High") and used contrast 0,1 to compare baseline with condition. But I am now interested in comparing across multiple columns.

How can I set up an analysis for this? Is this possible?

Thank you!

Issue running DA analysis

Hi there I am trying to run a DA analysis (method DA-GLMM). However when I run the analysis I receive the error as seen below. Not too sure whats going wrong.

Session Info
**R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] lme4_1.1-25 Matrix_1.2-18 diffcyt_1.10.0 cowplot_1.1.0 CATALYST_1.14.0 SingleCellExperiment_1.12.0
[7] SummarizedExperiment_1.20.0 Biobase_2.50.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.0 IRanges_2.24.0 S4Vectors_0.28.0
[13] BiocGenerics_0.36.0 MatrixGenerics_1.2.0 matrixStats_0.57.0 readxl_1.3.1 forcats_0.5.0 stringr_1.4.0
[19] dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.4 ggplot2_3.3.2
[25] tidyverse_1.3.0 CytoML_2.2.1 flowWorkspace_4.2.0 flowCore_2.2.0

loaded via a namespace (and not attached):
[1] backports_1.2.0 circlize_0.4.11 drc_3.0-1 plyr_1.8.6 igraph_1.2.6 ConsensusClusterPlus_1.54.0
[7] splines_4.0.3 BiocParallel_1.24.1 scater_1.18.3 TH.data_1.0-10 digest_0.6.27 viridis_0.5.1
[13] fansi_0.4.1 magrittr_1.5 cluster_2.1.0 limma_3.46.0 aws.signature_0.6.0 openxlsx_4.2.3
[19] ComplexHeatmap_2.6.2 modelr_0.1.8 RcppParallel_5.0.2 sandwich_3.0-0 cytolib_2.2.0 jpeg_0.1-8.1
[25] colorspace_2.0-0 rvest_0.3.6 ggrepel_0.8.2 haven_2.3.1 xfun_0.19 crayon_1.3.4
[31] RCurl_1.98-1.2 jsonlite_1.7.1 hexbin_1.28.1 graph_1.68.0 survival_3.2-7 zoo_1.8-8
[37] glue_1.4.2 gtable_0.3.0 nnls_1.4 zlibbioc_1.36.0 XVector_0.30.0 GetoptLong_1.0.4
[43] DelayedArray_0.16.0 ggcyto_1.18.0 BiocSingular_1.6.0 car_3.0-10 Rgraphviz_2.34.0 shape_1.4.5
[49] abind_1.4-5 scales_1.1.1 mvtnorm_1.1-1 edgeR_3.32.0 DBI_1.1.0 Rcpp_1.0.5
[55] plotrix_3.7-8 viridisLite_0.3.0 clue_0.3-57 rsvd_1.0.3 foreign_0.8-80 FlowSOM_1.22.0
[61] tsne_0.1-3 httr_1.4.2 RColorBrewer_1.1-2 ellipsis_0.3.1 farver_2.0.3 pkgconfig_2.0.3
[67] XML_3.99-0.5 scuttle_1.0.0 uwot_0.1.9 dbplyr_2.0.0 locfit_1.5-9.4 labeling_0.4.2
[73] tidyselect_1.1.0 rlang_0.4.8 reshape2_1.4.4 munsell_0.5.0 cellranger_1.1.0 tools_4.0.3
[79] cli_2.1.0 generics_0.1.0 broom_0.7.2 ggridges_0.5.2 aws.s3_0.3.21 yaml_2.2.1
[85] knitr_1.30 fs_1.5.0 zip_2.1.1 nlme_3.1-150 RBGL_1.66.0 sparseMatrixStats_1.2.0
[91] xml2_1.3.2 compiler_4.0.3 rstudioapi_0.13 beeswarm_0.2.3 curl_4.3 png_0.1-7
[97] reprex_0.3.0 statmod_1.4.35 stringi_1.5.3 RSpectra_0.16-0 lattice_0.20-41 nloptr_1.2.2.2
[103] vctrs_0.3.4 pillar_1.4.6 lifecycle_0.2.0 BiocManager_1.30.10 GlobalOptions_0.1.2 RcppAnnoy_0.0.17
[109] BiocNeighbors_1.8.1 irlba_2.3.3 data.table_1.13.2 bitops_1.0-6 R6_2.5.0 latticeExtra_0.6-29
[115] gridExtra_2.3 RProtoBufLib_2.2.0 rio_0.5.16 vipor_0.4.5 codetools_0.2-18 boot_1.3-25
[121] MASS_7.3-53 gtools_3.8.2 assertthat_0.2.1 rjson_0.2.20 withr_2.3.0 multcomp_1.4-15
[127] GenomeInfoDbData_1.2.4 hms_0.5.3 ncdfFlow_2.36.0 grid_4.0.3 beachmat_2.6.1 minqa_1.2.4
[133] DelayedMatrixStats_1.12.0 carData_3.0-4 Cairo_1.5-12.2 Rtsne_0.15 lubridate_1.7.9.2 base64enc_0.1-3
[139] ggbeeswarm_0.6.0**

Error plotting heatmap

Hi Dr. Weber,

Thanks for the awesome package! My data file is too big to use the built-in plotHeatmap function so I tried the plotDiffHeatMap from the Catalyst package, but got this error message "plotDiffHeatmap(out_DA$d_se, out_DA$res)
Error in .check_sce(u$x, TRUE) :
is(x, "SingleCellExperiment") is not TRUE"

The same error occurs when I apply the exact example codes from your diffcyt workflow on the exact sample dataset.

Could you help me with the problem? Is it because the package got updated? Thanks so much!

Regards,

Tammy

design & constrast matrix

Hi,

I'm running diffcyt via catalyst and I have some problems in setting up the design and contrast matrix. I have 144 samples divided into 4 conditions. I set up the code like this:

design <- createDesignMatrix(ei(sce), cols_design = "cond1")
contrast <- createContrast(c(0,0,0,1 ))

since I would like to compare to the last condition/group. However, even after reading the tutorials, I'm not sure I understood it correctly:

What determines the length of the contrast matrix?
Does it make sense to highlight the condition I want to compare?

Thank you very much in advance!!

How does diffcyt calculate logFC?

Dear Lukas,

looking at the logFC values in my toptable results, I am confused as to how diffcyt calculates logFC values. If I use the median values, either in my DA or DS results, to calculate log fold change by hand, I end up with completely different values to those in the toptables, even trying out base 2 and 10 for the logarithm or calculating it on untransformed and arcSinh(x/5)-transformed values.

Regards.
Luka

Error running differential analysis on diffcyt

I am currently following a CyTOF workflow in order to analyse CyTOF datas with CATALYST package and diffcyt package; however I keep getting the same error and I don't understand where it comes from.
Everything works just fine until I try to create the object "ds_res1".

> ds_res1 <- diffcyt(sce, 
+                    formula = ds_formula1, contrast = contrast,
+                    analysis_type = "DS", method_DS = "diffcyt-DS-LMM",
+                    clustering_to_use = "merging1", verbose = FALSE)

I always get this error:
Error in rep(NA, nrow(meds)) : invalid 'times' argument.

I am new to R and I don't know what "times" means in this case.

Thanks for your time.
I will be grateful if you could comment on this.

#N/A in expression markers values

Hello

I get some #N/A values when I run

p <- topTable(ds_res2, show_props = TRUE, format_vals = TRUE, digits = 2)
write.xlsx(p@listData, "ds_res2.xlsx") (File attached)

Is this a problem of my dataset or is it something I am doing wrong? Any idea of how to solve this?

I have another issue opened which I think it could be related to this one.

Thank you for your help.

Regards
Juan

NEWds_res2.xlsx

CONTRAST DOUBT

Hi @lmweber ,
thanks for the pakage and the support.
I have a doubt regarding the contrast. I have two groups (CTRL and TREATED), using contrast (0,1) all p value show no significance but changing contrast (1,0) five population are different. I also find this last result using the contrast (-1,1). This make me confused.

Thanks,
Domenico

NA's after DA-GLMM

Hello,

Thank you for the package. It is well written. I have a paired-design experiment with different times of visits. I am trying to get DA-GLMM to work on an experiment with a design formula. The code I am using is the following:

ei <- metadata(sce3)$experiment_info
da_formula <- createFormula(ei, cols_fixed = c("Clinical_Benefit","Visit", "Dose"), cols_random = c("sample_id", "Donor"))
contrst <- createContrast(c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0))
da_res <- diffcyt(sce3, formula = da_formula, contrast = contrst, analysis_type = "DA", method_DA = "diffcyt-DA-GLMM", clustering_to_use = "meta20", verbose = TRUE)

A sample of the experiment metadata

sample_id Donor Visit Dose Clinical_Benefit Stain_Batch n_cells
58101-002_PBMC_C1_300 58101-002 C1 300 No 200229 5e+05
58101-002_PBMC_C2_300 58101-002 C2 300 No 200229 5e+05
58105-001_PBMC_C1_300 58105-001 C1 300 Yes 200229 5e+05
58105-001_PBMC_C2_300 58105-001 C2 300 Yes 200229 483611
58105-001_PBMC_C4_300 58105-001 C4 300 Yes 200229 451636
58105-003_PBMC_C1_600 58105-003 C1 600 No 200229 5e+05
58105-003_PBMC_C2_600 58105-003 C2 600 No 200229 5e+05
58106-001_PBMC_C1_300 58106-001 C1 300 Yes 200229 5e+05

The levels for each of the fixed effects are:
Clinical_Benefit: No | Yes
Visit: C1 | C2 | C4
Dose: 45 | 135 | 300 | 600 | 1200

The command runs fine but I am just getting NAs in the output.

Thank you for your help.

Best regards,

Yanal

Association with subject-level continuous phenotypes

Hi,

Thanks for the great package!

Does diffcyt allows for settings where estimating associations with a continuous phenotype at the subject level (e.g., BMI of subjects) is of interest, or does it only consider proper differential analysis settings (comparing groups of subjects, e.g., overweight vs normal weight) ? In the latter case, is there any reason why running an LMM for a continuous phenotype wouldn't be appropriate in a flow/mass cytometry context (I'm new to the field!)? Would this be because of statistical power concerns if the number of subjects is low? If so, I guess your functions could be easily adapted to handle this scenario (for applications where the number of subjects is reasonably large)?

Thanks !

customized weight in testDS_LMM

Hi Lukas,
I made some edits to testDS_LMM & testDS_limma. The change is summarized as below.

add droplevels to testDS_LMM & testDS_limma
here
here
This is to remove levels that are actually not in cluster_id.
allow customized weights:
here
You would see that it allows the version to choose TRUE or FALSE. In this way, it would not affect users who are using diffcyt.
fix the bug when using weights = NULL
here
I add a second cbind to fix the error that occurs in combining a data.frame and NULL.

Please let me know whether it works

error with wrapper diffcyt() function

Hey,
I am encountering this error when using the wrapper function:

Error in diffcyt(da_input = "d_flowSet", experiment_info = read.csv("21_8_6_metadata_mouseid_MTJ.csv"), :
unused argument (da_input = "d_flowSet")

Here is my input code: diffcyt(da_input = "d_flowSet", experiment_info = read.csv("21_8_6_metadata_mouseid_MTJ.csv"), marker_info = read.csv("21_8_6_tet2ko_panel_metadata_MTJ"))

createContrast output for comparing multiple levels

Hi @lmweber,

Thanks for all your work. I'm currently following the CATALYST workflow and ran into a question when I was trying to get the createContrast() for the diffcyt-DA-GLMM to work. In my 'condition' parameter I have 3 levels: Healthy, Non-Hospitalized and Hospitalized. As far as I understand createContrast(c(0,1,0)) compares Non-Hospitalized to the Healthy group and createContrast(c(0,-1,1)) compares the Non-Hospitalized to the Hospitalized group. But what output do I get when I use createContrast(c(0,1,1))? I'm trying to get p-values for comparisons between each of the three groups (Healthy vs. (Non-)Hospitalized and Non-hospitalized vs. Hospitalized). The only way I can currently see to make that happen is to run the analysis multiple times, each time with a different contrast matrix. I'm wondering if that is statistically allowed though (I'm not a statistician) and if not, if there is a way to do this properly.

Thanks!

Martijn

diffcyt differential expression model with patient_id as random effect only returns NA p values

Hi,

I have followed the tutorial and so far all differential expression analyses worked if I did not specify a random variable.
This is how the ds_formula2 looks for me.

diffcyt(sce, formula = ds_formula2, contrast = contrast, analysis_type = "DS", method_DS = "diffcyt-DS-LMM", clustering_to_use = "merging_all", verbose = FALSE)

Has this occured before? I am not sure what the problem might be, as the differential expression works fine without specifying a random effect.

Thanks!

low p values when using testDA_GLMM().. lme4 too sensitive to weights?

Hi Lukas,

first off, thanks for your work on diffcyt, it's been enormously helpful for my research.

I've been mostly using the edgeR option for my differential abundance analysis, but was working on an experimental design that would benefit from random effects, so I gave testDA_GLMM() a try. When comparing 50 clusters from 9 individuals between two timepoints each, the results looked a bit confusing, as all the p values were extremely close to 0. The issue looks extremely similar to the one described in [https://github.com//issues/17]. I had a look in the source code and I think I found an issue with lme4: when supplying the weights to glmer, my cell counts are apparently too high (between 10,000 and 90,000 per sample) so that very small differences in abundance are found significant.

I decided to divide all counts in n_cells_smp by 10 (or even 5 or 2) just to see what would happen, and the p values shot up a lot, so glmer(family = "binomial") seems extremely sensitive to large weights. I tried your testDA_GLMM() code and swapped out glmer with MASS::glmmPQL() while keeping the weights unchanged. While this approach wasn't as sensitive as testDA_edgeR(), it didn't look to me as though it supplied many false positives, which testDA_GLMM() certainly did. That's why I thought I'd let you know, because to me this all sounds like an issue with how lme4 deals with weights.

Let me know if it would be helpful and I can put together a reprex using my now public data.

Cheers

Florian

Error when comparing merged clusters

I am using this package to compare clusters following analysis with CATALYST package. It works fine until I subset out certain clusters post-CATALYST cluster merging. Once any clusters are removed diffcyt returns the error: "In order(as.numeric(rownames(n_cells))) : NAs introduced by coercion" and the cluster_ids do not line up with the appropriate p_values. This only occurs when "clustering_to_use" is set to search my merged clusters and not the original flowSOM clusters even though both have been removed. Any solution? I am using diffcyt V1.8.6 and CATALYST V1.12.2.
Thank you.

My diffcyt call:
"da_res1 <- diffcyt(daf,
formula = da_formula1, contrast = contrast,
analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
clustering_to_use = "merging1", verbose = FALSE) "

Call to subset merged clusters in CATALYST:
"daf <- filterSCE(daf, cluster_id != "Debris", k = "merging1")"

sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] diffcyt_1.8.6 CATALYST_1.12.2 SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1
[5] DelayedArray_0.14.0 matrixStats_0.56.0 Biobase_2.48.0 GenomicRanges_1.40.0
[9] GenomeInfoDb_1.24.2 IRanges_2.22.2 S4Vectors_0.26.1 BiocGenerics_0.34.0
[13] readxl_1.3.1 flowCore_2.0.1 forcats_0.5.0 stringr_1.4.0
[17] dplyr_1.0.0 purrr_0.3.4 readr_1.3.1 tidyr_1.1.0
[21] tibble_3.0.1 ggplot2_3.3.2 tidyverse_1.3.0

loaded via a namespace (and not attached):
[1] backports_1.1.8 circlize_0.4.10 drc_3.0-1 plyr_1.8.6
[5] igraph_1.2.5 ConsensusClusterPlus_1.52.0 splines_4.0.1 BiocParallel_1.22.0
[9] scater_1.16.2 TH.data_1.0-10 digest_0.6.25 viridis_0.5.1
[13] fansi_0.4.1 magrittr_1.5 CytoML_2.0.5 cluster_2.1.0
[17] limma_3.44.3 openxlsx_4.1.5 ComplexHeatmap_2.4.2 modelr_0.1.8
[21] RcppParallel_5.0.2 sandwich_2.5-1 flowWorkspace_4.0.6 cytolib_2.0.3
[25] jpeg_0.1-8.1 colorspace_1.4-1 blob_1.2.1 rvest_0.3.5
[29] ggrepel_0.8.2 haven_2.3.1 xfun_0.15 crayon_1.3.4
[33] RCurl_1.98-1.2 jsonlite_1.7.0 hexbin_1.28.1 graph_1.66.0
[37] lme4_1.1-23 survival_3.2-3 zoo_1.8-8 glue_1.4.1
[41] gtable_0.3.0 nnls_1.4 zlibbioc_1.34.0 XVector_0.28.0
[45] GetoptLong_1.0.0 ggcyto_1.16.0 car_3.0-8 BiocSingular_1.4.0
[49] Rgraphviz_2.32.0 shape_1.4.4 abind_1.4-5 scales_1.1.1
[53] mvtnorm_1.1-1 edgeR_3.30.3 DBI_1.1.0 Rcpp_1.0.4.6
[57] plotrix_3.7-8 viridisLite_0.3.0 clue_0.3-57 foreign_0.8-80
[61] rsvd_1.0.3 FlowSOM_1.20.0 tsne_0.1-3 httr_1.4.1
[65] RColorBrewer_1.1-2 ellipsis_0.3.1 farver_2.0.3 pkgconfig_2.0.3
[69] XML_3.99-0.3 dbplyr_1.4.4 locfit_1.5-9.4 labeling_0.3
[73] tidyselect_1.1.0 rlang_0.4.6 reshape2_1.4.4 munsell_0.5.0
[77] cellranger_1.1.0 tools_4.0.1 cli_2.0.2 generics_0.0.2
[81] broom_0.5.6 ggridges_0.5.2 yaml_2.2.1 fs_1.4.2
[85] zip_2.0.4 RBGL_1.64.0 nlme_3.1-148 xml2_1.3.2
[89] compiler_4.0.1 rstudioapi_0.11 beeswarm_0.2.3 curl_4.3
[93] png_0.1-7 reprex_0.3.0 statmod_1.4.34 stringi_1.4.6
[97] lattice_0.20-41 Matrix_1.2-18 nloptr_1.2.2.1 vctrs_0.3.1
[101] pillar_1.4.4 lifecycle_0.2.0 GlobalOptions_0.1.2 BiocNeighbors_1.6.0
[105] data.table_1.12.8 cowplot_1.0.0 bitops_1.0-6 irlba_2.3.3
[109] R6_2.4.1 latticeExtra_0.6-29 gridExtra_2.3 RProtoBufLib_2.0.0
[113] rio_0.5.16 vipor_0.4.5 codetools_0.2-16 boot_1.3-25
[117] MASS_7.3-51.6 gtools_3.8.2 assertthat_0.2.1 rjson_0.2.20
[121] withr_2.2.0 multcomp_1.4-13 GenomeInfoDbData_1.2.3 hms_0.5.3
[125] ncdfFlow_2.34.0 grid_4.0.1 minqa_1.2.4 DelayedMatrixStats_1.10.0
[129] carData_3.0-4 Rtsne_0.15 lubridate_1.7.9 base64enc_0.1-3
[133] tinytex_0.24 ggbeeswarm_0.6.0

testDA_edgeR returns NA for some clusters

Hi ,
I am using the testDA_edgeR function to do differential abundance analysis. For a cluster where count is close to zero in one group, I am getting NAs as a result.

 cluster_id     logFC    logCPM        LR     p_val     p_adj

 <factor> <numeric> <numeric> <numeric> <numeric> <numeric>

  52         52        NA        NA        NA        NA        NA

The count for this cluster are as follows:
`

    SPF_BalbC_1   SPF_BalbC_2   SPF_BalbC_3 SPF_dblGata_5 SPF_dblGata_6 SPF_dblGata_7

      391           578           252             1             1             1

`
This cluster is clearly differentially abundant. Can you please explain why this is happening and if there is a way to fix this.
Thanks!
Hena

Create design & contrast matrix for differential state analysis

I have a mass cytometry dataset of samples collected from patients with AML (n=30). There is one sample from each patient, and the patients can be divided into two groups based on initial response to experimental therapy (21 responders vs 9 non-responders). I have been using the CATALYST pipeline and the diffcyt() function to do differential state analysis. Using this I find lot if very significant differences between the two groups. But when I use a different script that just do a lot of two sided T-tests (and then FDR adjustment), I find nothing significant.

When I use diffcyt() and my t-test script on a different and paired dataset, they find similar significant differences. So, I surmise that my design and/or contrast is not correctly set up? Can anyone help me? I hope my R output and code snippet below is informative! 😊

design <- createDesign(ei(sce), cols_design = “condition”)
(Intercept) conditionRD
1 1 1
2 1 1
3 1 0
4 1 1
5 1 0
6 1 0
7 1 0
8 1 0
9 1 0
10 1 0
11 1 0
12 1 0
13 1 0
14 1 0
15 1 0
16 1 1
17 1 1
18 1 1
19 1 0
20 1 1
21 1 0
22 1 1
23 1 1
24 1 0
25 1 0
26 1 0
27 1 0
28 1 0
29 1 0
30 1 0
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$condition
[1] "contr.treatment"

contrast <- createContrast(c(0,1))
[,1]
[1,] 0
[2,] 1

I am using the diffcyt() (https://rdrr.io/bioc/diffcyt/man/diffcyt.html) function from the diffcyt pipeline, with mostly default conditions. Ive pasted the code im using below:

design <- createDesignMatrix(ei(sce), cols_design = cond)
contrast <- createContrast(c(0, 1))

res_DS <- diffcyt(sce,
clustering_to_use = clust,
analysis_type = "DS",
method_DS = "diffcyt-DS-limma",
design = design,
contrast = contrast,
verbose = FALSE)
tbl_DS <- rowData(res_DS$res)

plotDiffHeatmap(sce, tbl_DS, all = TRUE, top_n = 30, fdr = 0.1,
sort_by = "padj", col_anno = cond, normalize =TRUE)

I assume that “diffcyt-DS-limma” uses limma::lmFit. It looks to me that diffcyt() (https://rdrr.io/bioc/diffcyt/src/R/diffcyt_wrapper.R) uses testDS_limma.R with my input as:

res <- testDS_limma(d_counts, d_medians, design, contrast, block_id, trend, weights, markers_to_test, min_cells, min_samples, plot, path)

Looking further into testDS_limma() it seems like it only usable for paired data:

estimate correlation between paired samples

(note: paired designs only; >2 measures per sample not allowed)

But somehow runs on my data without error? I must be overlooking something…

calcCounts is Set Up Incorrectly

I was trying to get diffcyt to show me the number of cells in each metacluster for each file in the SCE.

For example, I have 37 samples of granulocytes, which I have clustered and manually assigned metacluster IDs to and made the "metacluster_id_Granulocytes" column in colData(SCE_Granulocytes). I wanted to know how many of each: neutrophils, eosinophils, basophils, which are the aforementioned metaclusters, are present in each of the 37 samples, so I tried using calcCounts(SCE_Granulocytes). The desired result was 3 numbers for each of the 37 samples - the number of neutrophils, the number of eosinophils, and the number of basophils in each of the 37 samples.

However calcCounts didn't work, giving an error that said that the "Data object does not contain cluster labels.", by which it meant that sample_id was missing from the rowData, which is strange, since rowData shows the channels and not the factors that define the data, which "sample_id" is one of, and belongs to colData. So I looked at the calcCounts code and made some changes:

I replaced every rowData with colData, except the rowData in the next-to-last line - that would be the rowData in the rowData = row_data, colData = col_data piece of code.
I replaced every cluster_id with metacluster_id_Granulocytes.
I replaced every instance of d_se with my own SCE - SCE_Granulocytes.
I had to manually load some packages:
- tidyr - because the code could not find function "complete".
- dplyr - because the code could not find function "tally".
- reshape2 - because the code could not find function "acast".

The calcCounts code BEFORE:

function (d_se) 
{
    if (!is(d_se, "SummarizedExperiment")) {
        stop("Data object must be a 'SummarizedExperiment'")
    }
    if (!("cluster_id" %in% (colnames(rowData(d_se))))) {
        stop("Data object does not contain cluster labels. Run 'generateClusters' to generate cluster labels.")
    }
    rowdata_df <- as.data.frame(rowData(d_se))
    n_cells <- rowdata_df %>% group_by(cluster_id, sample_id, 
        .drop = FALSE) %>% tally %>% complete(sample_id)
    n_cells <- acast(n_cells, cluster_id ~ sample_id, value.var = "n", 
        fill = 0)
    if (nrow(n_cells) < nlevels(rowData(d_se)$cluster_id)) {
        ix_missing <- which(!(levels(rowData(d_se)$cluster_id) %in% 
            rownames(n_cells)))
        n_cells_tmp <- matrix(0, nrow = length(ix_missing), 
            ncol = ncol(n_cells))
        rownames(n_cells_tmp) <- ix_missing
        n_cells <- rbind(n_cells, n_cells_tmp)
        n_cells <- n_cells[order(as.numeric(rownames(n_cells))), 
            , drop = FALSE]
    }
    n_cells_total <- rowSums(n_cells)
    row_data <- data.frame(cluster_id = factor(rownames(n_cells), 
        levels = levels(rowData(d_se)$cluster_id)), n_cells = n_cells_total, 
        stringsAsFactors = FALSE)
    col_data <- metadata(d_se)$experiment_info
    n_cells <- n_cells[, match(col_data$sample_id, colnames(n_cells)), 
        drop = FALSE]
    stopifnot(all(col_data$sample_id == colnames(n_cells)))
    d_counts <- SummarizedExperiment(assays = list(counts = n_cells), 
        rowData = row_data, colData = col_data)
    d_counts
}

And the calcCounts code AFTER:

function (SCE_Granulocytes) 
{
    if (!is(SCE_Granulocytes, "SummarizedExperiment")) {
        stop("Data object must be a 'SummarizedExperiment'")
    }
    if (!("metacluster_id_Granulocytes" %in% (colnames(colData(SCE_Granulocytes))))) {
        stop("Data object does not contain cluster labels. Run 'generateClusters' to generate cluster labels.")
    }
    coldata_df <- as.data.frame(colData(SCE_Granulocytes))
    n_cells <- coldata_df %>% group_by(metacluster_id_Granulocytes, sample_id, 
                                       .drop = FALSE) %>% tally %>% complete(sample_id)
    n_cells <- acast(n_cells, metacluster_id_Granulocytes ~ sample_id, value.var = "n", 
                     fill = 0)
    if (nrow(n_cells) < nlevels(colData(SCE_Granulocytes)$metacluster_id_Granulocytes)) {
        ix_missing <- which(!(levels(colData(SCE_Granulocytes)$metacluster_id_Granulocytes) %in% 
                                  rownames(n_cells)))
        n_cells_tmp <- matrix(0, nrow = length(ix_missing), 
                              ncol = ncol(n_cells))
        rownames(n_cells_tmp) <- ix_missing
        n_cells <- rbind(n_cells, n_cells_tmp)
        n_cells <- n_cells[order(as.numeric(rownames(n_cells))), 
                           , drop = FALSE]
    }
    n_cells_total <- rowSums(n_cells)
    row_data <- data.frame(metacluster_id_Granulocytes = factor(rownames(n_cells), 
                                                                        levels = levels(colData(SCE_Granulocytes)$metacluster_id_Granulocytes)), n_cells = n_cells_total, 
                           stringsAsFactors = FALSE)
    col_data <- metadata(SCE_Granulocytes)$experiment_info
    n_cells <- n_cells[, match(col_data$sample_id, colnames(n_cells)), 
                       drop = FALSE]
    stopifnot(all(col_data$sample_id == colnames(n_cells)))
    d_counts <- SummarizedExperiment(assays = list(counts = n_cells), 
                                     rowData = row_data, colData = col_data)
    d_counts
}

And it worked. I executed the block of code (I didn't run calcCounts) and it returned the n_cells variable, which contained exactly the information I needed - the number of cells in each sample belonging to each metacluster. It also returned the following objects:

coldata_df - colData(SCE_Granulocytes) in object form.
col_data - The metadata of SCE_Granulocytes, but with n_cells for each sample as the last column.
row_data - How many cells are in each metacluster total across all samples.
d_counts - Seems to be an SCE version of col_data.

I checked that the numbers of cells were correct by exporting the fcs files made from splitting the SCE_Granulocytes by sample_id and manually gating them to check, and all seems correct.

Could you please fix the calcCounts function to work like this? I also need expression median data, which I will get to checking diffcyt functionality for tomorrow.

I think should be made possible to input something other than the default cluster_id when calling the calcCounts function. In this case, it was metacluster_id_Granulocytes.

Did i have to manually load the three packages mentioned above? Just a curiosity for improving my coding efficiency.

EDIT: To have a "blank slate" SCE object that contained none of my editing and manipulation, I also tried generating an SCE from scratch and running calcCounts on that. The same issue was present and the same fix resolved it.

Kind regards.
Luka Tandaric

Automatically add group IDs to rowData of 'd_se'

Currently adding group IDs to rowData of 'd_se' manually prior to plotting. Find a way to do this automatically, e.g. within the differential testing function when 'group_IDs' argument is first introduced. Use 'group' for column name in rowData.

DA for multiple treatments

Hi!

thank you for this package and for the useful tutorials!

I am working on CyTOF generated data, where I have four groups of treatments, and I want to make three pairwise comparisons
treatmentA to treatmentAB
treatmentC to treatmentCB
treatmentCB to treatmentAB

How do I create a design matrix and contrast?

this is what I did, but I am not sure it was the correct thing to do

my results don't show the different comparisons and without any statistical value

thanks!

condition = ei$condition
condition

[1] treatmentA treatmentA treatmentA treatmentA treatmentA treatmentA treatmentAB treatmentAB treatmentAB treatmentC treatmentC treatmentC treatmentC treatmentC treatmentC treatmentC treatmentCB treatmentCB treatmentCB treatmentCB
[21] treatmentCB
Levels: treatmentA treatmentAB treatmentC treatmentCB

design <- model.matrix(~0+condition)

contrasts <- makeContrasts(
  treatmentAB-treatmentA,treatmentCB-treatmentC , treatmentCB -treatmentAB ,
  levels=colnames(design))

da_res1 <- diffcyt(sce, 
                   design = design, contrast = contrasts,
                   analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
                   clustering_to_use = "merging")


rowData(da_res1$res)

DataFrame with 11 rows and 3 columns
cluster_id p_val p_adj

B cells B cells NA NA
CD4 T cells CD4 T cells NA NA
CD45- CD45- NA NA
CD45- CD11b+ CD11c+ CD45- CD11b+ CD11c+ NA NA
CD45+ CD45+ NA NA
CD8 T cells CD8 T cells NA NA
DCs DCs NA NA
Granulocytes Granulocytes NA NA
Macrophages Macrophages NA NA
Monocytes Monocytes NA NA
TCRgd T cells TCRgd T cells NA NA

createContrast help

Hi there,

I am an R and CyTOF beginner looking at some AML and MDS patient data.

My conditions are 5 different timepoints: T1, T2, T3, T4 and T5

I have 8 patients who have received DLI and 6 patients who have not received DLI, but all are AML/MDS patients. The DLI and non-DLI patients have not been specified, as I thought I could just subset them downstream.

If I want to compare T1 to T2, T2 to T3, T3 to T4 etc. for each patient, how would I set up the contrast to do this? How can I subset the patients, instead of looking at every single patient for every single time point?

I would have just separated on DLI and no DLI instead of timepoint, but the timepoint is important as it specifies pre-transplant and post-transplant samples for the patients. It would have been too simple to just state DLI and no DLI.

Thank you kindly!

Shan

markers_to_test = what?

Can you please provide a list of the options that the argument markers_to_test can take?
I would like to do differential states analysis using the type markers.

markers_to_test = id_type_markers is not working.
markers_to_test = type_markers(sce) is also not working.
Using specifics marker names such as markers_to_test = "CD11b" also did not work.

I cannot find any examples in the package vignette. Any help would be appreciated. Thank you.

PS. This comment has been edited.

Manual gating input for DA clusters

Hi All,

I have Cytof data, instead of doing unsupervised clustering, I found that manual gating made more sense and allowed me to find more subsets. I know have the cell subsets for each individual in an excel sheet, row(Individual) col(Cell subsets). is it possible that I use this as an input to the DA test?

Thank you a lot,

Compare only 2 out of many levels

Hello,
I have a feature which contains 8 levels:
"aTNF_LPMC" "aINT_LPMC" "aIL_LPMC" "aJAK_LPMC"
"5ASA_LPMC" "Steroids_LPMC" "6MP_LPMC" "HC_LPMC".

Using contrast <- createContrast(c(0, 0, 0, 0, 0, 0, 0, 1)) it would compare my healthy controls to all other groups combined. How can I compare level number 8 to each of the other levels seperately?

Thanks so much for your help!

diffCyt and edgeR return different results

Hi,
thank you for this helpful package.
I raised an issue in the cytofWorkflow Github repo, because I am getting different results with diffcyt and edgeR. While I get reasonable results with edgeR, diffcyt returns NA p-values and zero abundance for some clusters across all samples (which is not true).

For now, I will just use edgeR directly. However, if you have time to look into this issue, it would be nice to know where the difference is coming from or whether the way I input my data to diffcyt is wrong.

Thank you!

Integration w/ FCS Express

Hello,

I am new to flow cytometry analysis and supporting a group remotely. The team prefers to use FCS Express for HD clustering and has little command line experience. I understand FCS Express utilizes FlowSOM, as does diffcyt, and also allows for integration of R scripts. I was wondering if you could advise on how to perform DA/DS within this FCS express environment (Windows) provided metadata/design-contrast matrices would be imported by diffcyt R script? I suppose the alternative would be to have the group export the annotated fcs files (assigned clusters) and execute the script (non-wrapper) in a Command Prompt.Thanks!

-Todd

error on diffcyt

I encounter this error occurs repeatedly. And I have not clue on the limited error message.
Can you advice some inputs?

res_DS_2vs1 = diffcyt(CD154_sce_SCIT, clustering_to_use = "meta20",

                   analysis_type = "DS", method_DS = "diffcyt-DS-limma",

                   design = design, contrast = contrast2vs1)

using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta20' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
Error in calcMedians(d_se) :
all(sapply(medians, function(m) { .... is not TRUE

Is clustering consistent over comparison of multiple levels?

diffcyt-DA-edgeR Design and Contrast

Hi, I am running any analysis with 59 samples. I have one outcome variable so for my design, I created it as follows:

design <- createDesignMatrix(experiment_info,cols_design = c("Outcome"))

This outcome is set as a factor of 0s and 1s.

For the contrast,

contrast <- createContrast(c(0, 1, rep(0, 58)))

However when I run the check I get an error that the number of columns in the design and rows in the contrast aren't the same. Since my design only contrasts contrast, it will be two columns (intercept and outcome).

This is the final command, I am running (59 samples 64 markers where 32 of these are the ones I am interested in),

out_DA <- diffcyt(
d_input = d_flowSet,
experiment_info = experiment_info,
marker_info = marker_info,
design = design,
contrast = contrast,
analysis_type = "DA",
seed_clustering = 123
)

This is the error from diffcyt:

preparing data...
transforming data...
generating clusters...
FlowSOM clustering completed in 258.2 seconds
calculating features...
calculating DA tests using method 'diffcyt-DA-edgeR'...
Error in glmLRT(fit, contrast = contrast) :
contrast vector of wrong length, should be equal to number of coefficients in the linear model.

Can you please guide me on where I am going wrong?

Many thanks,
Cherise.

Error in SummarizedExperiment: the rownames and colnames of the supplied assay(s) must be NULL or identical to those of the SummarizedExperiment object (or derivative) to construct

I was running an analysis based on the CATALYST workflow ran and got an error with diffcyt():

Error in SummarizedExperiment(assays = list(counts = n_cells), rowData = row_data,  : 
  the rownames and colnames of the supplied assay(s) must be NULL or identical to those of
  the SummarizedExperiment object (or derivative) to construct

It turns out the problem was due to my metadata data frame having row names. Having row names makes me a little bit more comfortable in case something gets out of order somewhere. I thought this was reasonable since everything else was working until I got to diffcyt().

To summarize, here is the default workflow (from the CATALYST vignette) that works:

library(CATALYST)
library(diffcyt)

sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, features = "type",  xdim = 10, ydim = 10, maxK = 20, verbose = FALSE, seed = 1)
design <- createDesignMatrix(ei(sce), cols_design = "condition")
contrast <- createContrast(c(0, 1))
res_DA <- diffcyt(sce, clustering_to_use = "meta10", analysis_type = "DA", method_DA = "diffcyt-DA-edgeR", design = design, contrast = contrast, verbose = FALSE)

This is the problematic version (where PBMC_md has row names):

rownames(PBMC_md) <- PBMC_md$file_name
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, features = "type",  xdim = 10, ydim = 10, maxK = 20, verbose = FALSE, seed = 1)
design <- createDesignMatrix(ei(sce), cols_design = "condition")
contrast <- createContrast(c(0, 1))
res_DA <- diffcyt(sce, clustering_to_use = "meta10", analysis_type = "DA", method_DA = "diffcyt-DA-edgeR", design = design, contrast = contrast, verbose = FALSE)

Model is nearly unidentifiable: very large eigenvalue

Hi Lukas

I'm running the Cytof Workflow pipeline. I get a very long message (attached below) after running

ds_res2 <- diffcyt(sce,
formula = ds_formula2, contrast = contrast,
analysis_type = "DS", method_DS = "diffcyt-DS-LMM",
clustering_to_use = "merging1b", verbose = FALSE)

I have read in other forums that I could solve this by changing some parameters. My two questions are:

How can I change in diffcyt() any of the recommended parameters if needed?
Do I need to change it or does this message mean that my data doesn't have any significant result and therefore I shouldn't change anything?

My design, contrast and formula are:

md
file_name sample_id condition patient_id batch intervention week_of_life trial_arm
1 P_31B_1.fcs P_31B Neonate P_31 B_1 post week_6 Intervention
2 P_32A_1.fcs P_32A Neonate P_32 B_1 pre week_3 Control
3 P_32B_1.fcs P_32B Neonate P_32 B_1 post week_7 Control
4 P_34A_1.fcs P_34A Neonate P_34 B_1 pre week_3 Control
5 P_35A_1.fcs P_35A Neonate P_35 B_1 pre week_3 Intervention
6 P_36A_2.fcs P_36A Neonate P_36 B_2 pre week_2 Control

ei <- metadata(sce)$experiment_info
(da_formula1 <- createFormula(ei,
cols_fixed = "trial_arm",
cols_random = "sample_id"))
(da_formula2 <- createFormula(ei,
cols_fixed = "trial_arm",
cols_random = c("sample_id", "patient_id")))

contrast <- createContrast(c(0, 1))

da_res1 <- diffcyt(sce,
formula = da_formula1, contrast = contrast,
analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
clustering_to_use = "merging1b", verbose = FALSE)
da_res2 <- diffcyt(sce,
formula = da_formula2, contrast = contrast,
analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
clustering_to_use = "merging1b", verbose = FALSE)

table(rowData(da_res1$res)$p_adj < FDR_cutoff)
table(rowData(da_res2$res)$p_adj < FDR_cutoff)
FALSE TRUE
15 2

FALSE TRUE
16 1

ds_formula1 <- createFormula(ei, cols_fixed = "trial_arm")
ds_formula2 <- createFormula(ei,
cols_fixed = "trial_arm", cols_random = "patient_id")

ds_res1 <- diffcyt(sce,
formula = ds_formula1, contrast = contrast,
analysis_type = "DS", method_DS = "diffcyt-DS-LMM",
clustering_to_use = "merging1b", verbose = FALSE)
table(rowData(ds_res1$res)$p_adj < FDR_cutoff)
ds_res2 <- diffcyt(sce,
formula = ds_formula2, contrast = contrast,
analysis_type = "DS", method_DS = "diffcyt-DS-LMM",
clustering_to_use = "merging1b", verbose = FALSE)
table(rowData(ds_res2$res)$p_adj < FDR_cutoff)

Thank you for your help.

Regards
Juan

Model failed to converge with max|grad| = 0.015349 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue

Rescale variables?Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.00233629 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.00555007 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00742892 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00240135 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.0223681 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.0051913 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00238886 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
Model failed to converge with max|grad| = 0.0199976 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.0389122 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.0047067 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00297547 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.0152538 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00844551 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.0313113 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.022935 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.0240415 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.00217955 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.0241478 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00663443 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00649518 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
Model failed to converge with max|grad| = 0.055191 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.104247 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.0157847 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model is nearly unidentifiable: very large eigenvalue
Rescale variables?boundary (singular) fit: see ?isSingular
Model failed to converge with max|grad| = 0.0102596 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?boundary (singular) fit: see ?isSingular
[.....I have cut most of it....]

Rescale variables?Model failed to converge with max|grad| = 0.00365807 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.00612798 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00824278 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00280907 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
Model failed to converge with max|grad| = 0.00516716 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00521023 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.00921459 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
Model failed to converge with max|grad| = 0.00678294 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular

issue with setting up design/contrast matrices for DA/DS testing with catalyst SCE

Hi,
I've been working through the analysis pipelines for catalyst and diffcyt, and had some questions with setting up the design and contrast matrices. For my particular code, I am feeding an sce from catalyst in to the diffcyt wrapper and am using the code below to generate my design and contrast. Here, patient_id is equal to the genotype of my mice (wt vs KO).

design <- createDesignMatrix(ei(sce), cols_design = c("patient_id"))
contrast <- createContrast(c(1,0))```
The pic below shows the design matrix this produced:
![image](https://user-images.githubusercontent.com/77988339/129968898-e6497e4a-a740-47a2-b103-c2ebe368d540.png).
The first three 1's correspond to the rows in my metadata for the mice where the 1s are for WT mice and 0s are for KO mice. Then, would I not set up my contrast to be c(1,0) since that is how the numbers are ordered in my design matrix? Such that this would compare wt vs KO? or would I order the contrast as c(0,1)? 

Thanks for your help!
Matt Jenkins

How to use diffcyt to identify differentially-expressed proteins/biomarkers of a cluster in flow cytometry data?

Hi there,

The tutorial (https://www.bioconductor.org/packages/release/bioc/vignettes/diffcyt/inst/doc/diffcyt_workflow.html) is very useful, but it only shows how to perform the analysis between conditions.

I had performed clustering using CATALYST.

Would you mind advising me how to adjust the codes, especially diffcyt::createDesignMatrix and contrast to identify differentially-expressed proteins (biomarkers) of a cluster compared to every other cluster, done for each cluster as in DEGs for scRNAseq analyses? Or is it not possible?

Thank you for your help!

Diffcyt on SCE - Error in glmFit.default()

Hello, I have been using CATALYST for my SCE experiment, which are discussed in these issues (please see for experimental details).
HelenaLC/CATALYST#119
HelenaLC/CATALYST#121

Dimensionality reduction with 4000 cells per sample. NOTE: I have a lot more than this. However, my decently powerful Mac works overnight if I use all cells.

set.seed(123)
sce <- runDR(sce, dr = "UMAP",cells = 4000, features = "type")

Checking reduced dimensions

reducedDimNames(sce)

 ##  [1] "UMAP"

head(reducedDim(sce, "UMAP"))

 ##       [,1] [,2]
##  [1,]   NA   NA
##  [2,]   NA   NA
##  [3,]   NA   NA

tail(reducedDim(sce, "UMAP"))

##                [,1]      [,2]
##  [578650,] 1.430724 0.2447781
##  [578651,]       NA        NA
##  [578652,]       NA        NA

## Differential testing with diffcyt

### create design & contrast matrix
```r
design <- createDesignMatrix(ei(sce), cols_design = "condition")
contrast <- createContrast(c(0, 1))

test for

- differential abundance (DA) of clusters

- differential states (DS) within clusters

res_DA <- diffcyt(sce, clustering_to_use = "meta10",
    analysis_type = "DA", method_DA = "diffcyt-DA-edgeR",
    design = design, contrast = contrast, verbose = FALSE)
res_DS <- diffcyt(sce, clustering_to_use = "meta10",
    analysis_type = "DS", method_DS = "diffcyt-DS-limma",
    design = design, contrast = contrast, verbose = FALSE)

Error in glmFit.default(y = y$counts, design = design, dispersion = dispersion, : NA dispersions not allowed

I am guessing this is due to all the NAs. If yes then why does this happen and how to get rid of NAs here ?

very early error reading .fcs files in

Hey,
I'm using the following code from the workflow vignette to read my .fcs files into diffcyt pipeline.
files <- list.files(
path = "path/to/files", pattern = "\.fcs$", full.names = TRUE
)
d_flowSet <- read.flowSet(
files, transformation = FALSE, truncate_max_range = FALSE
).
It is returning the following error message:
Error in frameList[isFrame] : invalid subscript type 'list'

I was wondering what the way around this is?

Input for createContrast()

In my dataset, I have 2 conditions I would like to analyse:

Treatment: Levels A, B and C
Stimulation: Baseline, After Stimulation

All of the patients have a baseline measurement and a measurement that was done after stimulation. I want to compare the stimulation condition within each treatment group and between the treatment groups.
For this, I would like to compare the results of the implemented methods for differential abundance and differential states but I am unsure if I understand correctly, how the contrast matrix has to be specified.

Case 1: Let's say I wanted to run edgeR on this. I would first create a design matrix with my two conditions like this:
createDesignMatrix(experiment_info, cols_design = c("treatment", "stimulation", "patient_id")) which yields the columns:
"(Intercept)", "treatmentB", "treatmentC", "stimulationAfter", "patient_idP2", ..., "patient_idP10". Should I create my contrast matrix then with
createContrast( c( 0, 1, 1, 1, rep(0, 9) ) ) ?

Case 2: Let's say I now wanted run a GLMM for comparison. For the GLMM, I would specify my formula with:
createFormula(experiment_info, cols_fixed = c("treatment", "stimulation"), cols_random = "patient_id"). Now, my contrast has to correspond to the levels of my fixed effect terms. Should I therefore specify the contrast matrix with:
createContrast( c(1, 1, 1, 1, 1) ) or does the first entry have to be a zero: createContrast( c(0, 1, 1, 0, 1) ) to get a meaningful analysis?

Would this contrast matrix lead to an all-against-all comparison, i.e. every treatment group in every stimulation condition? If I then wanted to compare the different treatment groups only in the baseline stimulation group, what contrast do I have to specify?

Another thing that confuses me is that Wikipedia says that the coefficients of contrast add up to zero. However, in your examples in diffcyt workflow and in ?createContrast this is not the case. Is this done internally?

Thank you for your time!

Build failure on BioC-devel due to Vignette/ExperimentHub

Hi, there is an issue with vignette rebuilding:

Error: processing vignette 'diffcyt_workflow.Rmd' failed with diagnostics:
package or namespace load failed for 'HDCytoData':
 .onLoad failed in loadNamespace() for 'HDCytoData', details:
  call: FUN(X[[i]], ...)
  error: 'Levine_32dim_SE' not found in ExperimentHub

Yours,
Steffen

differential state analysis fails

Hi Lukas,
Congratulations on getting the paper published!

I was trying my data with the new SCE object in CATALYST and diffcyt package. For some reason, when I do the differential state analysis either with limma or LMM, they both fail in slightly different ways.

In case of LMM I get all p-values as NA and in case of limma I get the following message:

Error in contrasts.fit(fit, contrast) : Number of rows of contrast matrix must match number of coefficients in fit

Contrast that I used is :

contrast <- createContrast(c(0, 1,0,0))

Formula is:
ds_formula1 <- createFormula(ei,
cols_fixed = "condition")

The command used is:

ds_res3 <- diffcyt(sce, formula = ds_formula1, contrast = contrast, analysis_type = "DS", method_DS = "diffcyt-DS-limma", clustering_to_use = "meta14", verbose = TRUE,subsampling = 1000,transform = F)

Output:
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta14' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DS tests using method 'diffcyt-DS-limma'...
Error in contrasts.fit(fit, contrast) : Number of rows of contrast matrix must match number of coefficients in fit

I have four groups in this dataset with 16 files in total.

Here is the link to the SCE data object in dropbox:

https://www.dropbox.com/s/ceqc28hcq0j8wz5/sce_dbglgata.rda?dl=0

I am sure I am making a silly mistake somewhere but I cannot figure it out even after looking at the source code.

Thanks!
Hena

` sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.1

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] DT_0.11 diffcyt_1.6.1 ggplot2_3.2.1
[4] flowCore_1.52.1 SingleCellExperiment_1.8.0 SummarizedExperiment_1.16.1
[7] DelayedArray_0.12.2 BiocParallel_1.20.1 matrixStats_0.55.0
[10] Biobase_2.46.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.0
[13] IRanges_2.20.2 S4Vectors_0.24.3 BiocGenerics_0.32.0
[16] CATALYST_1.10.1 limma_3.42.0

loaded via a namespace (and not attached):
[1] shinydashboard_0.7.1 R.utils_2.9.2 ks_1.11.6
[4] lme4_1.1-21 tidyselect_0.2.5 htmlwidgets_1.5.1
[7] grid_3.6.2 Rtsne_0.15 munsell_0.5.0
[10] codetools_0.2-16 withr_2.1.2 colorspace_1.4-1
[13] flowViz_1.50.0 knitr_1.27 rstudioapi_0.10
[16] flowClust_3.24.0 robustbase_0.93-5 openCyto_1.24.0
[19] GenomeInfoDbData_1.2.2 mnormt_1.5-5 flowWorkspace_3.34.1
[22] vctrs_0.2.1 TH.data_1.0-10 xfun_0.12
[25] R6_2.4.1 ggbeeswarm_0.6.0 clue_0.3-57
[28] rsvd_1.0.2 locfit_1.5-9.1 bitops_1.0-6
[31] assertthat_0.2.1 promises_1.1.0 scales_1.1.0
[34] multcomp_1.4-12 beeswarm_0.2.3 gtable_0.3.0
[37] sandwich_2.5-1 rlang_0.4.2 zeallot_0.1.0
[40] GlobalOptions_0.1.1 splines_3.6.2 lazyeval_0.2.2
[43] hexbin_1.28.0 shinyBS_0.61 yaml_2.2.0
[46] reshape2_1.4.3 abind_1.4-5 backports_1.1.5
[49] httpuv_1.5.2 IDPmisc_1.1.20 RBGL_1.62.1
[52] tools_3.6.2 ellipsis_0.3.0 RColorBrewer_1.1-2
[55] ggridges_0.5.2 Rcpp_1.0.3 plyr_1.8.5
[58] base64enc_0.1-3 zlibbioc_1.32.0 purrr_0.3.3
[61] RCurl_1.98-1.1 FlowSOM_1.18.0 GetoptLong_0.1.8
[64] viridis_0.5.1 cowplot_1.0.0 zoo_1.8-7
[67] haven_2.2.0 ggrepel_0.8.1 cluster_2.1.0
[70] fda_2.4.8 magrittr_1.5 ncdfFlow_2.32.0
[73] data.table_1.12.8 openxlsx_4.1.4 circlize_0.4.8
[76] mvtnorm_1.0-12 hms_0.5.3 shinyjs_1.1
[79] mime_0.8 xtable_1.8-4 XML_3.99-0.3
[82] rio_0.5.16 jpeg_0.1-8.1 mclust_5.4.5
[85] readxl_1.3.1 gridExtra_2.3 shape_1.4.4
[88] ggcyto_1.14.0 compiler_3.6.2 scater_1.14.6
[91] ellipse_0.4.1 tibble_2.1.3 flowStats_3.44.0
[94] KernSmooth_2.23-16 crayon_1.3.4 minqa_1.2.4
[97] R.oo_1.23.0 htmltools_0.4.0 corpcor_1.6.9
[100] pcaPP_1.9-73 later_1.0.0 tidyr_1.0.0
[103] rrcov_1.5-2 RcppParallel_4.4.4 ComplexHeatmap_2.2.0
[106] MASS_7.3-51.5 boot_1.3-24 Matrix_1.2-18
[109] car_3.0-6 R.methodsS3_1.7.1 igraph_1.2.4.2
[112] forcats_0.4.0 pkgconfig_2.0.3 foreign_0.8-75
[115] plotly_4.9.1 vipor_0.4.5 XVector_0.26.0
[118] drc_3.0-1 stringr_1.4.0 digest_0.6.23
[121] tsne_0.1-3 ConsensusClusterPlus_1.50.0 graph_1.64.0
[124] cellranger_1.1.0 edgeR_3.28.0 DelayedMatrixStats_1.8.0
[127] curl_4.3 shiny_1.4.0 gtools_3.8.1
[130] nloptr_1.2.1 rjson_0.2.20 nlme_3.1-143
[133] lifecycle_0.1.0 jsonlite_1.6 carData_3.0-3
[136] BiocNeighbors_1.4.1 viridisLite_0.3.0 pillar_1.4.3
[139] lattice_0.20-38 fastmap_1.0.1 httr_1.4.1
[142] plotrix_3.7-7 DEoptimR_1.0-8 survival_3.1-8
[145] glue_1.3.1 zip_2.0.4 png_0.1-7
[148] Rgraphviz_2.30.0 stringi_1.4.5 nnls_1.4
[151] BiocSingular_1.2.1 CytoML_1.12.0 latticeExtra_0.6-29
[154] dplyr_0.8.3 irlba_2.3.3 `

Fixed the calcCounts and calcMedians Functions and Added Option to Choose Which Clustering Is Used

Dear Lukas,

I have fixed the calcCounts and calcMedians functions so that the rows and columns of SCEs are read properly.
I have also added an option to pass an argument to the function that sets the (meta)clustering (from colnames(colData(SCE))) to be used when making the calculations. The argument has to be passed as a string (in "double quotes").

This function outputs counts per (meta)cluster per sample:

# Define the function that replaces calcCounts:
# SCE = The summarized experiment object the counts calculation will be done on.
# clustering = A (meta)clustering from colnames(colData(SCE)) (in quotes ""). The cell counts will be calculated for each (meta)cluster.
calcCountsFixed <- function (SCE, clustering = "cluster_id"){
    if (!is(SCE, "SummarizedExperiment")) {
        stop("Data object must be a 'SummarizedExperiment'.")
    }
    if (!(clustering %in% (colnames(colData(SCE))))) {
        stop("Data object does not contain the given clustering. Check colnames(colData(SCE)).")
    }
    coldata_df <- as.data.frame(colData(SCE))
    n_cells <- coldata_df %>% group_by(!!rlang::sym(clustering), sample_id, .drop = FALSE) %>% tally %>% complete(sample_id)
    n_cells <- acast(n_cells, paste0(clustering, " ~ sample_id"), value.var = "n", fill = 0)
    if (nrow(n_cells) < nlevels(colData(SCE)[[clustering]])) {
        missing_markers <- which(!(levels(colData(SCE)[[clustering]]) %in% rownames(n_cells)))
        n_cells_tmp <- matrix(0, nrow = length(missing_markers), ncol = ncol(n_cells))
        rownames(n_cells_tmp) <- missing_markers
        n_cells <- rbind(n_cells, n_cells_tmp)
        n_cells <- n_cells[order(as.numeric(rownames(n_cells))), , drop = FALSE]
    }
    n_cells_total <- rowSums(n_cells)
    row_data <- setNames(data.frame(factor(rownames(n_cells), levels = levels(colData(SCE)[[clustering]])), n_cells_total, stringsAsFactors = FALSE),
                         c(clustering, "n_cells"))
    col_data <- metadata(SCE)$experiment_info
    n_cells <- n_cells[, match(col_data$sample_id, colnames(n_cells)), drop = FALSE]
    stopifnot(all(col_data$sample_id == colnames(n_cells)))
    d_counts <- SummarizedExperiment(assays = list(counts = n_cells), 
                                     rowData = row_data,
                                     colData = col_data)
    d_counts
}

This function outputs the median signal values from the arcSinh-transformed signal values:

# Define the function that replaces calcMedians:
# SCE = The summarized experiment object the counts calculation will be done on.
# clustering = A (meta)clustering (in quotes "") from colnames(colData(SCE)). The cell counts will be calculated for each (meta)cluster.
calcMediansFixed <- function (SCE, clustering = "cluster_id"){
    if (!is(SCE, "SummarizedExperiment")) {
        stop("Data object must be a 'SummarizedExperiment'.")
    }
    if (!(clustering %in% (colnames(colData(SCE))))) {
        stop("Data object does not contain the given clustering. Check colnames(colData(SCE)).")
    }
    is_marker <- rowData(SCE)$marker_class != "none"
    id_type_markers <- (rowData(SCE)$marker_class == "type")[is_marker]
    id_state_markers <- (rowData(SCE)$marker_class == "state")[is_marker]
    assaydata_mx <- as.matrix(assays(SCE)[["exprs"]])
    medians <- vector("list", sum(is_marker))
    marker_names_sub <- as.character(rowData(SCE)$marker_name[is_marker])
    names(medians) <- marker_names_sub
    clus <- colData(SCE)[[clustering]]
    smp <- colData(SCE)$sample_id
    for (i in seq_along(medians)) {
        assaydata_i <- t(assaydata_mx[marker_names_sub[i], , drop = FALSE])
        assaydata_i <- as.data.frame(assaydata_i)
        temp_clus <- data.frame(clus)
        names(temp_clus) <- clustering
        assaydata_i <- cbind(assaydata_i, sample_id = smp, temp_clus)
        colnames(assaydata_i)[1] <- "value"
        med <- assaydata_i %>% group_by(!!rlang::sym(clustering), sample_id, .drop = FALSE) %>% summarize(median = median(value))
        med <- acast(med, paste0(clustering, " ~ sample_id"), value.var = "median", fill = NA)
        if (nrow(med) < nlevels(colData(SCE)[[clustering]])) {
            missing_markers <- which(!(levels(colData(SCE)[[clustering]]) %in% rownames(med)))
            med_tmp <- matrix(NA, nrow = length(missing_markers), ncol = ncol(med))
            rownames(med_tmp) <- missing_markers
            med <- rbind(med, med_tmp)
            med <- med[order(as.numeric(rownames(med))), , drop = FALSE]
        }
        medians[[i]] <- med
    }
    for (i in seq_along(medians)) {
        if (!all(rownames(medians[[i]]) == rownames(medians[[1]]))) {
            stop("Cluster IDs do not match")
        }
        if (!all(colnames(medians[[i]]) == colnames(medians[[1]]))) {
            stop("Sample IDs do not match")
        }
    }
    row_data <- setNames(data.frame(factor(rownames(medians[[1]]), levels = levels(colData(SCE)[[clustering]])), stringsAsFactors = FALSE),
                         c(clustering))
    col_data <- metadata(SCE)$experiment_info
    medians <- lapply(medians, function(m) {
        m[, match(col_data$sample_id, colnames(m)), drop = FALSE]
    })
    stopifnot(all(sapply(medians, function(m) {
        col_data$sample_id == colnames(m)
    })))
    metadata <- list(id_type_markers = id_type_markers, id_state_markers = id_state_markers)
    d_medians <- SummarizedExperiment(assays = medians, rowData = row_data, 
                                      colData = col_data, metadata = metadata)
    d_medians
}

This function outputs the median data made from raw signal values:

# The previous function outputs the median expression levels as arcSinh-transformed values with an argument of 5 (or which ever number
# you set as the value for the "cofactor" in "prepData"). In the case of CyTOF data, "cofactor = 5", which gives the formula "asinh(x/5)".
# In the following function, "[["exprs"]]" has been replaced by "[["counts"]]". This makes the function output raw, untransformed medians
# of marker expression. Be advised, arcSinh-transformed values should be used for statistical analyses due to being more comparable
# between markers and/or samples, unlike raw values, which can, for instance, vary from 5 to 5000.

# SCE = The summarized experiment object the counts calculation will be done on.
# clustering = A (meta)clustering (in quotes "") from colnames(colData(SCE)). The cell counts will be calculated for each (meta)cluster.
calcMediansFixed_raw <- function (SCE, clustering = "cluster_id"){
    if (!is(SCE, "SummarizedExperiment")) {
        stop("Data object must be a 'SummarizedExperiment'.")
    }
    if (!(clustering %in% (colnames(colData(SCE))))) {
        stop("Data object does not contain the given clustering. Check colnames(colData(SCE)).")
    }
    is_marker <- rowData(SCE)$marker_class != "none"
    id_type_markers <- (rowData(SCE)$marker_class == "type")[is_marker]
    id_state_markers <- (rowData(SCE)$marker_class == "state")[is_marker]
    assaydata_mx <- as.matrix(assays(SCE)[["counts"]])
    medians <- vector("list", sum(is_marker))
    marker_names_sub <- as.character(rowData(SCE)$marker_name[is_marker])
    names(medians) <- marker_names_sub
    clus <- colData(SCE)[[clustering]]
    smp <- colData(SCE)$sample_id
    for (i in seq_along(medians)) {
        assaydata_i <- t(assaydata_mx[marker_names_sub[i], , drop = FALSE])
        assaydata_i <- as.data.frame(assaydata_i)
        temp_clus <- data.frame(clus)
        names(temp_clus) <- clustering
        assaydata_i <- cbind(assaydata_i, sample_id = smp, temp_clus)
        colnames(assaydata_i)[1] <- "value"
        med <- assaydata_i %>% group_by(!!rlang::sym(clustering), sample_id, .drop = FALSE) %>% summarize(median = median(value))
        med <- acast(med, paste0(clustering, " ~ sample_id"), value.var = "median", fill = NA)
        if (nrow(med) < nlevels(colData(SCE)[[clustering]])) {
            missing_markers <- which(!(levels(colData(SCE)[[clustering]]) %in% rownames(med)))
            med_tmp <- matrix(NA, nrow = length(missing_markers), ncol = ncol(med))
            rownames(med_tmp) <- missing_markers
            med <- rbind(med, med_tmp)
            med <- med[order(as.numeric(rownames(med))), , drop = FALSE]
        }
        medians[[i]] <- med
    }
    for (i in seq_along(medians)) {
        if (!all(rownames(medians[[i]]) == rownames(medians[[1]]))) {
            stop("Cluster IDs do not match")
        }
        if (!all(colnames(medians[[i]]) == colnames(medians[[1]]))) {
            stop("Sample IDs do not match")
        }
    }
    row_data <- setNames(data.frame(factor(rownames(medians[[1]]), levels = levels(colData(SCE)[[clustering]])), stringsAsFactors = FALSE),
                         c(clustering))
    col_data <- metadata(SCE)$experiment_info
    medians <- lapply(medians, function(m) {
        m[, match(col_data$sample_id, colnames(m)), drop = FALSE]
    })
    stopifnot(all(sapply(medians, function(m) {
        col_data$sample_id == colnames(m)
    })))
    metadata <- list(id_type_markers = id_type_markers, id_state_markers = id_state_markers)
    d_medians <- SummarizedExperiment(assays = medians, rowData = row_data, 
                                      colData = col_data, metadata = metadata)
    d_medians
}

Please let me know if you decide to incorporate them into diffcyt.

Kind regards.
Luka Tandaric

plotHeatmap function

Dear Lukas,

I am using diffcyt packege following option number 1 from your tutorial. In the end of analysis I get output which can be displayed using heatmap and plotHeatmap function.
I was wondering if it is possible to custamize the option to display p_val instead of p_adj when I plot a hetmap with results. I am sorry if you mentioned it in your tutorial and I missed it.

Thank you in advance!
Best,
Ekaterina

extracting medians

Hi Lukas,

This isn't so much an issue but more of a query.

Is there a way to extract the median marker value for each cluster? I'd like to use these data to help identify the cell type of each cluster.

The calcMedians() function does something close but for each sample and cluster times by the number of markers.

Many thanks,
Lucas

Direction of comparison between the disease group and the reference group

Hi Dr. Weber @lmweber ,

Is it possible to know the direction of the differential results from diffcyt? I can see which clusters are significant but I don't know the direction of the difference. I have two thoughts to achieve this. Can you please comment on which one would work: a) Run the diffcyt analysis with reference group set to 0 and disease group set to 1 for the contrast matrix and then repeat the analysis with reference group set to 1 and disease group set to 0. b) Simply set the reference group to -1 and disease group to 1 for the contrast matrix.

Thank you!

CATALYST older versions

Hi,
I would like to reinstall an older version of CATALYST, as a daframe object built on CATALYST 1.9.7 gives error when I try to load it on CATALYST 1.10.0. Is there a way around this? I want to avoid doing a re-analysis with new version as it might change some things.

Thanks,

Mixed Model and Contrast

Hello Lukas,
Thank-you for Diffcyt.
I had a quick question about setting up contrast matrix for mixed model. I hypothesize that the cell cluster proportions depend on the condition (case-status, binary) when corrected for age and sex (fixed effect), and the date of batch processing (categorical, 4 values, random effect)

ei <- metadata(sce)$experiment_info
> head(ei)
  sample_id condition    sex age        date n_cells
1       E23   Control Female  46 16-AUG-2019  150000
2       E25   Control   Male  42 28-AUG-2019  150000
3       E26   Control Female  45 16-AUG-2019  150000
4       E28   Control Female  52 16-AUG-2019  150000
5       E29      Case Female  40 28-AUG-2019  150000
6       E30      Case Female  42 28-AUG-2019  150000
> str(ei)
'data.frame':	102 obs. of  6 variables:
 $ sample_id: Factor w/ 102 levels "E23","E25","E26",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ condition: Factor w/ 2 levels "Control","Case": 1 1 1 1 2 2 2 1 1 2 ...
 $ sex      : Factor w/ 2 levels "Female","Male": 1 2 1 1 1 1 1 1 1 1 ...
 $ age      : num  46 42 45 52 40 42 52 52 46 62 ...
 $ date     : Factor w/ 4 levels "08-OCT-2019",..: 2 4 2 2 4 4 1 2 2 4 ...
 $ n_cells  : num  150000 150000 150000 150000 150000 150000 150000 150000 150000 150000 ...
da_formula1 <- createFormula(ei,
                             cols_fixed = c("condition", "age", "sex"),
                             cols_random = "date"
)
> da_formula1[["formula"]]
y ~ condition + age + sex + (1 | date)
contrast <- createContrast(c(0, 1, 0, 0))
> contrast
     [,1]
[1,]    0
[2,]    1
[3,]    0
[4,]    0

da_res1 <- diffcyt(sce,
                   formula = da_formula1, contrast = contrast,
                   analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
                   clustering_to_use = "meta10", verbose = T
)
topTable(da_res1, format_vals = T)
> topTable(da_res1, format_vals = T)
DataFrame with 10 rows and 3 columns
   cluster_id     p_val     p_adj
     <factor> <numeric> <numeric>
1          1   0.00e+00  0.00e+00
2          2   0.00e+00  0.00e+00
3          3   0.00e+00  0.00e+00
4          4   0.00e+00  0.00e+00
5          5   0.00e+00  0.00e+00
6          6   0.00e+00  0.00e+00
7          7   0.00e+00  0.00e+00
8          8   0.00e+00  0.00e+00
10         10  1.78e-15  1.97e-15
9          9   7.95e-01  7.95e-01

Either I am about to publish a Science / Nature paper or I messed up (the probability of latter is close to 1).
Could you please let me know where am I messing up?
Also, like Toptable, is there a quick & easy way to check the effect size?
Thanks a lot!
Vivek

diffcyt for longitudinal data

Hello
I am using diffcyt and CATALYST to analyse flow data. Everything is working very well so far.
My dataset is made of samples from individuals who differ by gender but I have several time points for each individual.
Iis there a way to run a DA analyze not only by gender but also accounting for the longitudinal design . The underlying question would be something like : does the clustering evolve differently over time (i.e. aging ) between women and men?
For now, I tried lmer model outside of diffcyt but it would be nice to use your package. if not available, do you have any advices?

Thank you in advance,

Sorting x-axis in Heatmap

Hi!

Thank you for sharing the code. I just have a quick question. I have a list of numbered patients, from ID_001 to ID_120, and two classes. However, these are intermingled. My issue is that when I get the results and plot the heatmap (PlotDiffHeatmap) I would like to see all patients of one class on the left and the rest on the right, so that I can compare expression and abundance more clearly, without renaming every patient (It's a standardized nomenclature). Basically, in the image attached all the green patients and purple grouped by condition.

I have tried sorting the md object but it doesn't work. Is there any easy workaround?

Thanks!!
DA_edgeR.pdf

lmweber / diffcyt Goto Github PK

diffcyt's People

Contributors

Stargazers

Watchers

Forkers

diffcyt's Issues

estimate correlation between paired samples

(note: paired designs only; >2 measures per sample not allowed)

Dimensionality reduction with 4000 cells per sample. NOTE: I have a lot more than this. However, my decently powerful Mac works overnight if I use all cells.

Checking reduced dimensions

test for

- differential abundance (DA) of clusters

- differential states (DS) within clusters

Recommend Projects

Recommend Topics

Recommend Org