lmweber / diffcyt Goto Github PK
View Code? Open in Web Editor NEWR package for differential discovery analyses in high-dimensional cytometry data
License: MIT License
R package for differential discovery analyses in high-dimensional cytometry data
License: MIT License
See https://github.com/SamGG/diffcyt commits and email discussion
I have the following doubt over diffcyt() contrasts.
I have a daframe object from CATALYST; which I now want to input for differential abundance analysis. My "condition" field of daframe object has 4 conditions; out of which 3 conditions are several kinds of tumors and the last condition is control. I built a contrast (0,0,0,1); for differential analysis between all tumor combined and the control.
However now I want a contrast between a particular tumor type and all the controls.
I am aware each coefficient in the contrast should correspond to one of the condition; but I am not sure how to mention the particular contrast i want.
Since, diffcyt internally includes edgeR and limma steps I could individually for each cell population do the usual limma steps:
**design <- model.matrix(~condition)
fit <- lmFit(eset, design)
fit2 <- eBayes(fit)
de_genes <- topTable(fit2)
edgeR set of commands
exprDesign <- model.matrix(~condition)
propData <- DGEList(count_table,
lib.size = colSums(count_table)) # BE VERY CAREFUL ABOUT YOUR lib.size parameter, this is the number of cells you collected. If you perform a hierarchical clustering (i.e. Citrus), your lib.size is not the sum of your columns.
fit <- estimateDisp(propData, design) # this is the first deviation from limma, estimating dispersion based on your design help accounts for the variability in cell counting
fit <- glmQLFit(fit, design, robust=TRUE) # this is analogous to the lmFit step above...**
But is there a way I could do this with diffcyt?
Thanks for your time.
I will be grateful if you could comment on this.
Hello,
Could you help me with the show_logFC
argument in the topTable
function? When I set it equal to TRUE
, I receive an error:
Error: subscript contains invalid names
I did a little digging, and it doesn't look like there are any arguments in the diffcyt
function that outputs a logFC value associated with the object, unless I'm doing something wrong. I recreated the steps seen here, but no matter what I do, there is never a logFC
column in the result object, so the matrix subsetting that happens in line 186 or 188 (depending on whether DA or DE) always fails.
For reproducibility, see code snippet below.
rslt <- diffcyt(daf, # daf is a SingleCellExperiment
formula=fmla, # fmla was created using createFormula
contrast=ctst, # ctst is a basic contrast matrix
analysis_type='DA',
method_DA='diffcyt-DA-GLMM',
clustering_to_use='meta20',
verbose=F)
tt <- topTable(rslt, all=T, show_props=T, format_vals=F, show_logFC=T)
Let me know if there is any further info you need from me. I am running the latest version of this package (and others, like CATALYST) on R 4.0.2. Thanks for any and all help!
J.D.
Hi there,
I am trying to do an analysis for two different columns of data.
Column 1 is Reticulocyte count at T0 and column 2 is Reticulocyte count at T1.
I want to compare patients with "Normal" in T0 to patients with "High" in T1, where the factors for both columns are "Normal", "Low", "High" and "N/A".
In the past, I have just filtered SCE objects to give me patients with desired factors (e.g. "Normal" and "High") and used contrast 0,1 to compare baseline with condition. But I am now interested in comparing across multiple columns.
How can I set up an analysis for this? Is this possible?
Thank you!
Hi there I am trying to run a DA analysis (method DA-GLMM). However when I run the analysis I receive the error as seen below. Not too sure whats going wrong.
Session Info
**R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] lme4_1.1-25 Matrix_1.2-18 diffcyt_1.10.0 cowplot_1.1.0 CATALYST_1.14.0 SingleCellExperiment_1.12.0
[7] SummarizedExperiment_1.20.0 Biobase_2.50.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.0 IRanges_2.24.0 S4Vectors_0.28.0
[13] BiocGenerics_0.36.0 MatrixGenerics_1.2.0 matrixStats_0.57.0 readxl_1.3.1 forcats_0.5.0 stringr_1.4.0
[19] dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.4 ggplot2_3.3.2
[25] tidyverse_1.3.0 CytoML_2.2.1 flowWorkspace_4.2.0 flowCore_2.2.0
loaded via a namespace (and not attached):
[1] backports_1.2.0 circlize_0.4.11 drc_3.0-1 plyr_1.8.6 igraph_1.2.6 ConsensusClusterPlus_1.54.0
[7] splines_4.0.3 BiocParallel_1.24.1 scater_1.18.3 TH.data_1.0-10 digest_0.6.27 viridis_0.5.1
[13] fansi_0.4.1 magrittr_1.5 cluster_2.1.0 limma_3.46.0 aws.signature_0.6.0 openxlsx_4.2.3
[19] ComplexHeatmap_2.6.2 modelr_0.1.8 RcppParallel_5.0.2 sandwich_3.0-0 cytolib_2.2.0 jpeg_0.1-8.1
[25] colorspace_2.0-0 rvest_0.3.6 ggrepel_0.8.2 haven_2.3.1 xfun_0.19 crayon_1.3.4
[31] RCurl_1.98-1.2 jsonlite_1.7.1 hexbin_1.28.1 graph_1.68.0 survival_3.2-7 zoo_1.8-8
[37] glue_1.4.2 gtable_0.3.0 nnls_1.4 zlibbioc_1.36.0 XVector_0.30.0 GetoptLong_1.0.4
[43] DelayedArray_0.16.0 ggcyto_1.18.0 BiocSingular_1.6.0 car_3.0-10 Rgraphviz_2.34.0 shape_1.4.5
[49] abind_1.4-5 scales_1.1.1 mvtnorm_1.1-1 edgeR_3.32.0 DBI_1.1.0 Rcpp_1.0.5
[55] plotrix_3.7-8 viridisLite_0.3.0 clue_0.3-57 rsvd_1.0.3 foreign_0.8-80 FlowSOM_1.22.0
[61] tsne_0.1-3 httr_1.4.2 RColorBrewer_1.1-2 ellipsis_0.3.1 farver_2.0.3 pkgconfig_2.0.3
[67] XML_3.99-0.5 scuttle_1.0.0 uwot_0.1.9 dbplyr_2.0.0 locfit_1.5-9.4 labeling_0.4.2
[73] tidyselect_1.1.0 rlang_0.4.8 reshape2_1.4.4 munsell_0.5.0 cellranger_1.1.0 tools_4.0.3
[79] cli_2.1.0 generics_0.1.0 broom_0.7.2 ggridges_0.5.2 aws.s3_0.3.21 yaml_2.2.1
[85] knitr_1.30 fs_1.5.0 zip_2.1.1 nlme_3.1-150 RBGL_1.66.0 sparseMatrixStats_1.2.0
[91] xml2_1.3.2 compiler_4.0.3 rstudioapi_0.13 beeswarm_0.2.3 curl_4.3 png_0.1-7
[97] reprex_0.3.0 statmod_1.4.35 stringi_1.5.3 RSpectra_0.16-0 lattice_0.20-41 nloptr_1.2.2.2
[103] vctrs_0.3.4 pillar_1.4.6 lifecycle_0.2.0 BiocManager_1.30.10 GlobalOptions_0.1.2 RcppAnnoy_0.0.17
[109] BiocNeighbors_1.8.1 irlba_2.3.3 data.table_1.13.2 bitops_1.0-6 R6_2.5.0 latticeExtra_0.6-29
[115] gridExtra_2.3 RProtoBufLib_2.2.0 rio_0.5.16 vipor_0.4.5 codetools_0.2-18 boot_1.3-25
[121] MASS_7.3-53 gtools_3.8.2 assertthat_0.2.1 rjson_0.2.20 withr_2.3.0 multcomp_1.4-15
[127] GenomeInfoDbData_1.2.4 hms_0.5.3 ncdfFlow_2.36.0 grid_4.0.3 beachmat_2.6.1 minqa_1.2.4
[133] DelayedMatrixStats_1.12.0 carData_3.0-4 Cairo_1.5-12.2 Rtsne_0.15 lubridate_1.7.9.2 base64enc_0.1-3
[139] ggbeeswarm_0.6.0**
Hi Dr. Weber,
Thanks for the awesome package! My data file is too big to use the built-in plotHeatmap function so I tried the plotDiffHeatMap from the Catalyst package, but got this error message "plotDiffHeatmap(out_DA$d_se, out_DA$res)
Error in .check_sce(u$x, TRUE) :
is(x, "SingleCellExperiment") is not TRUE"
The same error occurs when I apply the exact example codes from your diffcyt workflow on the exact sample dataset.
Could you help me with the problem? Is it because the package got updated? Thanks so much!
Regards,
Tammy
Hi,
I'm running diffcyt via catalyst and I have some problems in setting up the design and contrast matrix. I have 144 samples divided into 4 conditions. I set up the code like this:
design <- createDesignMatrix(ei(sce), cols_design = "cond1")
contrast <- createContrast(c(0,0,0,1 ))
since I would like to compare to the last condition/group. However, even after reading the tutorials, I'm not sure I understood it correctly:
Thank you very much in advance!!
Dear Lukas,
looking at the logFC values in my toptable results, I am confused as to how diffcyt calculates logFC values. If I use the median values, either in my DA or DS results, to calculate log fold change by hand, I end up with completely different values to those in the toptables, even trying out base 2 and 10 for the logarithm or calculating it on untransformed and arcSinh(x/5)-transformed values.
Regards.
Luka
I am currently following a CyTOF workflow in order to analyse CyTOF datas with CATALYST package and diffcyt package; however I keep getting the same error and I don't understand where it comes from.
Everything works just fine until I try to create the object "ds_res1".
> ds_res1 <- diffcyt(sce,
+ formula = ds_formula1, contrast = contrast,
+ analysis_type = "DS", method_DS = "diffcyt-DS-LMM",
+ clustering_to_use = "merging1", verbose = FALSE)
I always get this error:
Error in rep(NA, nrow(meds)) : invalid 'times' argument
.
I am new to R and I don't know what "times" means in this case.
Thanks for your time.
I will be grateful if you could comment on this.
Hello
I get some #N/A values when I run
p <- topTable(ds_res2, show_props = TRUE, format_vals = TRUE, digits = 2)
write.xlsx(p@listData, "ds_res2.xlsx") (File attached)
Is this a problem of my dataset or is it something I am doing wrong? Any idea of how to solve this?
I have another issue opened which I think it could be related to this one.
Thank you for your help.
Regards
Juan
Hi @lmweber ,
thanks for the pakage and the support.
I have a doubt regarding the contrast. I have two groups (CTRL and TREATED), using contrast (0,1) all p value show no significance but changing contrast (1,0) five population are different. I also find this last result using the contrast (-1,1). This make me confused.
Thanks,
Domenico
Hello,
Thank you for the package. It is well written. I have a paired-design experiment with different times of visits. I am trying to get DA-GLMM to work on an experiment with a design formula. The code I am using is the following:
ei <- metadata(sce3)$experiment_info
da_formula <- createFormula(ei, cols_fixed = c("Clinical_Benefit","Visit", "Dose"), cols_random = c("sample_id", "Donor"))
contrst <- createContrast(c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0))
da_res <- diffcyt(sce3, formula = da_formula, contrast = contrst, analysis_type = "DA", method_DA = "diffcyt-DA-GLMM", clustering_to_use = "meta20", verbose = TRUE)
A sample of the experiment metadata
sample_id Donor Visit Dose Clinical_Benefit Stain_Batch n_cells
58101-002_PBMC_C1_300 58101-002 C1 300 No 200229 5e+05
58101-002_PBMC_C2_300 58101-002 C2 300 No 200229 5e+05
58105-001_PBMC_C1_300 58105-001 C1 300 Yes 200229 5e+05
58105-001_PBMC_C2_300 58105-001 C2 300 Yes 200229 483611
58105-001_PBMC_C4_300 58105-001 C4 300 Yes 200229 451636
58105-003_PBMC_C1_600 58105-003 C1 600 No 200229 5e+05
58105-003_PBMC_C2_600 58105-003 C2 600 No 200229 5e+05
58106-001_PBMC_C1_300 58106-001 C1 300 Yes 200229 5e+05
The levels for each of the fixed effects are:
Clinical_Benefit: No | Yes
Visit: C1 | C2 | C4
Dose: 45 | 135 | 300 | 600 | 1200
The command runs fine but I am just getting NAs in the output.
Thank you for your help.
Best regards,
Yanal
Hi,
Thanks for the great package!
Does diffcyt allows for settings where estimating associations with a continuous phenotype at the subject level (e.g., BMI of subjects) is of interest, or does it only consider proper differential analysis settings (comparing groups of subjects, e.g., overweight vs normal weight) ? In the latter case, is there any reason why running an LMM for a continuous phenotype wouldn't be appropriate in a flow/mass cytometry context (I'm new to the field!)? Would this be because of statistical power concerns if the number of subjects is low? If so, I guess your functions could be easily adapted to handle this scenario (for applications where the number of subjects is reasonably large)?
Thanks !
Hi Lukas,
I made some edits to testDS_LMM & testDS_limma. The change is summarized as below.
add droplevels to testDS_LMM & testDS_limma
here
here
This is to remove levels that are actually not in cluster_id
.
allow customized weights:
here
You would see that it allows the version to choose TRUE
or FALSE
. In this way, it would not affect users who are using diffcyt
.
fix the bug when using weights = NULL
here
I add a second cbind
to fix the error that occurs in combining a data.frame
and NULL
.
Please let me know whether it works
Hey,
I am encountering this error when using the wrapper function:
Error in diffcyt(da_input = "d_flowSet", experiment_info = read.csv("21_8_6_metadata_mouseid_MTJ.csv"), :
unused argument (da_input = "d_flowSet")
Here is my input code: diffcyt(da_input = "d_flowSet", experiment_info = read.csv("21_8_6_metadata_mouseid_MTJ.csv"), marker_info = read.csv("21_8_6_tet2ko_panel_metadata_MTJ"))
Hi @lmweber,
Thanks for all your work. I'm currently following the CATALYST workflow and ran into a question when I was trying to get the createContrast()
for the diffcyt-DA-GLMM to work. In my 'condition' parameter I have 3 levels: Healthy, Non-Hospitalized and Hospitalized. As far as I understand createContrast(c(0,1,0))
compares Non-Hospitalized to the Healthy group and createContrast(c(0,-1,1))
compares the Non-Hospitalized to the Hospitalized group. But what output do I get when I use createContrast(c(0,1,1))
? I'm trying to get p-values for comparisons between each of the three groups (Healthy vs. (Non-)Hospitalized and Non-hospitalized vs. Hospitalized). The only way I can currently see to make that happen is to run the analysis multiple times, each time with a different contrast matrix. I'm wondering if that is statistically allowed though (I'm not a statistician) and if not, if there is a way to do this properly.
Thanks!
Martijn
Hi,
I have followed the tutorial and so far all differential expression analyses worked if I did not specify a random variable.
This is how the ds_formula2 looks for me.
diffcyt(sce, formula = ds_formula2, contrast = contrast, analysis_type = "DS", method_DS = "diffcyt-DS-LMM", clustering_to_use = "merging_all", verbose = FALSE)
Has this occured before? I am not sure what the problem might be, as the differential expression works fine without specifying a random effect.
Thanks!
Hi Lukas,
first off, thanks for your work on diffcyt, it's been enormously helpful for my research.
I've been mostly using the edgeR option for my differential abundance analysis, but was working on an experimental design that would benefit from random effects, so I gave testDA_GLMM()
a try. When comparing 50 clusters from 9 individuals between two timepoints each, the results looked a bit confusing, as all the p values were extremely close to 0. The issue looks extremely similar to the one described in [https://github.com//issues/17]. I had a look in the source code and I think I found an issue with lme4: when supplying the weights to glmer
, my cell counts are apparently too high (between 10,000 and 90,000 per sample) so that very small differences in abundance are found significant.
I decided to divide all counts in n_cells_smp
by 10 (or even 5 or 2) just to see what would happen, and the p values shot up a lot, so glmer(family = "binomial")
seems extremely sensitive to large weights. I tried your testDA_GLMM()
code and swapped out glmer
with MASS::glmmPQL()
while keeping the weights unchanged. While this approach wasn't as sensitive as testDA_edgeR()
, it didn't look to me as though it supplied many false positives, which testDA_GLMM()
certainly did. That's why I thought I'd let you know, because to me this all sounds like an issue with how lme4 deals with weights.
Let me know if it would be helpful and I can put together a reprex using my now public data.
Cheers
Florian
I am using this package to compare clusters following analysis with CATALYST package. It works fine until I subset out certain clusters post-CATALYST cluster merging. Once any clusters are removed diffcyt returns the error: "In order(as.numeric(rownames(n_cells))) : NAs introduced by coercion" and the cluster_ids do not line up with the appropriate p_values. This only occurs when "clustering_to_use" is set to search my merged clusters and not the original flowSOM clusters even though both have been removed. Any solution? I am using diffcyt V1.8.6 and CATALYST V1.12.2.
Thank you.
My diffcyt call:
"da_res1 <- diffcyt(daf,
formula = da_formula1, contrast = contrast,
analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
clustering_to_use = "merging1", verbose = FALSE) "
Call to subset merged clusters in CATALYST:
"daf <- filterSCE(daf, cluster_id != "Debris", k = "merging1")"
sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] diffcyt_1.8.6 CATALYST_1.12.2 SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1
[5] DelayedArray_0.14.0 matrixStats_0.56.0 Biobase_2.48.0 GenomicRanges_1.40.0
[9] GenomeInfoDb_1.24.2 IRanges_2.22.2 S4Vectors_0.26.1 BiocGenerics_0.34.0
[13] readxl_1.3.1 flowCore_2.0.1 forcats_0.5.0 stringr_1.4.0
[17] dplyr_1.0.0 purrr_0.3.4 readr_1.3.1 tidyr_1.1.0
[21] tibble_3.0.1 ggplot2_3.3.2 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] backports_1.1.8 circlize_0.4.10 drc_3.0-1 plyr_1.8.6
[5] igraph_1.2.5 ConsensusClusterPlus_1.52.0 splines_4.0.1 BiocParallel_1.22.0
[9] scater_1.16.2 TH.data_1.0-10 digest_0.6.25 viridis_0.5.1
[13] fansi_0.4.1 magrittr_1.5 CytoML_2.0.5 cluster_2.1.0
[17] limma_3.44.3 openxlsx_4.1.5 ComplexHeatmap_2.4.2 modelr_0.1.8
[21] RcppParallel_5.0.2 sandwich_2.5-1 flowWorkspace_4.0.6 cytolib_2.0.3
[25] jpeg_0.1-8.1 colorspace_1.4-1 blob_1.2.1 rvest_0.3.5
[29] ggrepel_0.8.2 haven_2.3.1 xfun_0.15 crayon_1.3.4
[33] RCurl_1.98-1.2 jsonlite_1.7.0 hexbin_1.28.1 graph_1.66.0
[37] lme4_1.1-23 survival_3.2-3 zoo_1.8-8 glue_1.4.1
[41] gtable_0.3.0 nnls_1.4 zlibbioc_1.34.0 XVector_0.28.0
[45] GetoptLong_1.0.0 ggcyto_1.16.0 car_3.0-8 BiocSingular_1.4.0
[49] Rgraphviz_2.32.0 shape_1.4.4 abind_1.4-5 scales_1.1.1
[53] mvtnorm_1.1-1 edgeR_3.30.3 DBI_1.1.0 Rcpp_1.0.4.6
[57] plotrix_3.7-8 viridisLite_0.3.0 clue_0.3-57 foreign_0.8-80
[61] rsvd_1.0.3 FlowSOM_1.20.0 tsne_0.1-3 httr_1.4.1
[65] RColorBrewer_1.1-2 ellipsis_0.3.1 farver_2.0.3 pkgconfig_2.0.3
[69] XML_3.99-0.3 dbplyr_1.4.4 locfit_1.5-9.4 labeling_0.3
[73] tidyselect_1.1.0 rlang_0.4.6 reshape2_1.4.4 munsell_0.5.0
[77] cellranger_1.1.0 tools_4.0.1 cli_2.0.2 generics_0.0.2
[81] broom_0.5.6 ggridges_0.5.2 yaml_2.2.1 fs_1.4.2
[85] zip_2.0.4 RBGL_1.64.0 nlme_3.1-148 xml2_1.3.2
[89] compiler_4.0.1 rstudioapi_0.11 beeswarm_0.2.3 curl_4.3
[93] png_0.1-7 reprex_0.3.0 statmod_1.4.34 stringi_1.4.6
[97] lattice_0.20-41 Matrix_1.2-18 nloptr_1.2.2.1 vctrs_0.3.1
[101] pillar_1.4.4 lifecycle_0.2.0 GlobalOptions_0.1.2 BiocNeighbors_1.6.0
[105] data.table_1.12.8 cowplot_1.0.0 bitops_1.0-6 irlba_2.3.3
[109] R6_2.4.1 latticeExtra_0.6-29 gridExtra_2.3 RProtoBufLib_2.0.0
[113] rio_0.5.16 vipor_0.4.5 codetools_0.2-16 boot_1.3-25
[117] MASS_7.3-51.6 gtools_3.8.2 assertthat_0.2.1 rjson_0.2.20
[121] withr_2.2.0 multcomp_1.4-13 GenomeInfoDbData_1.2.3 hms_0.5.3
[125] ncdfFlow_2.34.0 grid_4.0.1 minqa_1.2.4 DelayedMatrixStats_1.10.0
[129] carData_3.0-4 Rtsne_0.15 lubridate_1.7.9 base64enc_0.1-3
[133] tinytex_0.24 ggbeeswarm_0.6.0
Hi ,
I am using the testDA_edgeR function to do differential abundance analysis. For a cluster where count is close to zero in one group, I am getting NAs as a result.
`
cluster_id logFC logCPM LR p_val p_adj
<factor> <numeric> <numeric> <numeric> <numeric> <numeric>
52 52 NA NA NA NA NA
`
The count for this cluster are as follows:
`
SPF_BalbC_1 SPF_BalbC_2 SPF_BalbC_3 SPF_dblGata_5 SPF_dblGata_6 SPF_dblGata_7
391 578 252 1 1 1
`
This cluster is clearly differentially abundant. Can you please explain why this is happening and if there is a way to fix this.
Thanks!
Hena
I have a mass cytometry dataset of samples collected from patients with AML (n=30). There is one sample from each patient, and the patients can be divided into two groups based on initial response to experimental therapy (21 responders vs 9 non-responders). I have been using the CATALYST pipeline and the diffcyt() function to do differential state analysis. Using this I find lot if very significant differences between the two groups. But when I use a different script that just do a lot of two sided T-tests (and then FDR adjustment), I find nothing significant.
When I use diffcyt() and my t-test script on a different and paired dataset, they find similar significant differences. So, I surmise that my design and/or contrast is not correctly set up? Can anyone help me? I hope my R output and code snippet below is informative! 😊
design <- createDesign(ei(sce), cols_design = “condition”)
(Intercept) conditionRD
1 1 1
2 1 1
3 1 0
4 1 1
5 1 0
6 1 0
7 1 0
8 1 0
9 1 0
10 1 0
11 1 0
12 1 0
13 1 0
14 1 0
15 1 0
16 1 1
17 1 1
18 1 1
19 1 0
20 1 1
21 1 0
22 1 1
23 1 1
24 1 0
25 1 0
26 1 0
27 1 0
28 1 0
29 1 0
30 1 0
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$condition
[1] "contr.treatment"
contrast <- createContrast(c(0,1))
[,1]
[1,] 0
[2,] 1
I am using the diffcyt() (https://rdrr.io/bioc/diffcyt/man/diffcyt.html) function from the diffcyt pipeline, with mostly default conditions. Ive pasted the code im using below:
design <- createDesignMatrix(ei(sce), cols_design = cond)
contrast <- createContrast(c(0, 1))
res_DS <- diffcyt(sce,
clustering_to_use = clust,
analysis_type = "DS",
method_DS = "diffcyt-DS-limma",
design = design,
contrast = contrast,
verbose = FALSE)
tbl_DS <- rowData(res_DS$res)
plotDiffHeatmap(sce, tbl_DS, all = TRUE, top_n = 30, fdr = 0.1,
sort_by = "padj", col_anno = cond, normalize =TRUE)
I assume that “diffcyt-DS-limma” uses limma::lmFit. It looks to me that diffcyt() (https://rdrr.io/bioc/diffcyt/src/R/diffcyt_wrapper.R) uses testDS_limma.R with my input as:
res <- testDS_limma(d_counts, d_medians, design, contrast, block_id, trend, weights, markers_to_test, min_cells, min_samples, plot, path)
Looking further into testDS_limma() it seems like it only usable for paired data:
But somehow runs on my data without error? I must be overlooking something…
I was trying to get diffcyt to show me the number of cells in each metacluster for each file in the SCE.
For example, I have 37 samples of granulocytes, which I have clustered and manually assigned metacluster IDs to and made the "metacluster_id_Granulocytes
" column in colData(SCE_Granulocytes)
. I wanted to know how many of each: neutrophils, eosinophils, basophils, which are the aforementioned metaclusters, are present in each of the 37 samples, so I tried using calcCounts(SCE_Granulocytes)
. The desired result was 3 numbers for each of the 37 samples - the number of neutrophils, the number of eosinophils, and the number of basophils in each of the 37 samples.
However calcCounts
didn't work, giving an error that said that the "Data object does not contain cluster labels.
", by which it meant that sample_id
was missing from the rowData
, which is strange, since rowData
shows the channels and not the factors that define the data, which "sample_id
" is one of, and belongs to colData
. So I looked at the calcCounts
code and made some changes:
I replaced every rowData
with colData
, except the rowData
in the next-to-last line - that would be the rowData
in the rowData = row_data, colData = col_data
piece of code.
I replaced every cluster_id
with metacluster_id_Granulocytes
.
I replaced every instance of d_se
with my own SCE - SCE_Granulocytes
.
I had to manually load some packages:
- tidyr
- because the code could not find function "complete"
.
- dplyr
- because the code could not find function "tally"
.
- reshape2
- because the code could not find function "acast"
.
The calcCounts code BEFORE:
function (d_se)
{
if (!is(d_se, "SummarizedExperiment")) {
stop("Data object must be a 'SummarizedExperiment'")
}
if (!("cluster_id" %in% (colnames(rowData(d_se))))) {
stop("Data object does not contain cluster labels. Run 'generateClusters' to generate cluster labels.")
}
rowdata_df <- as.data.frame(rowData(d_se))
n_cells <- rowdata_df %>% group_by(cluster_id, sample_id,
.drop = FALSE) %>% tally %>% complete(sample_id)
n_cells <- acast(n_cells, cluster_id ~ sample_id, value.var = "n",
fill = 0)
if (nrow(n_cells) < nlevels(rowData(d_se)$cluster_id)) {
ix_missing <- which(!(levels(rowData(d_se)$cluster_id) %in%
rownames(n_cells)))
n_cells_tmp <- matrix(0, nrow = length(ix_missing),
ncol = ncol(n_cells))
rownames(n_cells_tmp) <- ix_missing
n_cells <- rbind(n_cells, n_cells_tmp)
n_cells <- n_cells[order(as.numeric(rownames(n_cells))),
, drop = FALSE]
}
n_cells_total <- rowSums(n_cells)
row_data <- data.frame(cluster_id = factor(rownames(n_cells),
levels = levels(rowData(d_se)$cluster_id)), n_cells = n_cells_total,
stringsAsFactors = FALSE)
col_data <- metadata(d_se)$experiment_info
n_cells <- n_cells[, match(col_data$sample_id, colnames(n_cells)),
drop = FALSE]
stopifnot(all(col_data$sample_id == colnames(n_cells)))
d_counts <- SummarizedExperiment(assays = list(counts = n_cells),
rowData = row_data, colData = col_data)
d_counts
}
And the calcCounts code AFTER:
function (SCE_Granulocytes)
{
if (!is(SCE_Granulocytes, "SummarizedExperiment")) {
stop("Data object must be a 'SummarizedExperiment'")
}
if (!("metacluster_id_Granulocytes" %in% (colnames(colData(SCE_Granulocytes))))) {
stop("Data object does not contain cluster labels. Run 'generateClusters' to generate cluster labels.")
}
coldata_df <- as.data.frame(colData(SCE_Granulocytes))
n_cells <- coldata_df %>% group_by(metacluster_id_Granulocytes, sample_id,
.drop = FALSE) %>% tally %>% complete(sample_id)
n_cells <- acast(n_cells, metacluster_id_Granulocytes ~ sample_id, value.var = "n",
fill = 0)
if (nrow(n_cells) < nlevels(colData(SCE_Granulocytes)$metacluster_id_Granulocytes)) {
ix_missing <- which(!(levels(colData(SCE_Granulocytes)$metacluster_id_Granulocytes) %in%
rownames(n_cells)))
n_cells_tmp <- matrix(0, nrow = length(ix_missing),
ncol = ncol(n_cells))
rownames(n_cells_tmp) <- ix_missing
n_cells <- rbind(n_cells, n_cells_tmp)
n_cells <- n_cells[order(as.numeric(rownames(n_cells))),
, drop = FALSE]
}
n_cells_total <- rowSums(n_cells)
row_data <- data.frame(metacluster_id_Granulocytes = factor(rownames(n_cells),
levels = levels(colData(SCE_Granulocytes)$metacluster_id_Granulocytes)), n_cells = n_cells_total,
stringsAsFactors = FALSE)
col_data <- metadata(SCE_Granulocytes)$experiment_info
n_cells <- n_cells[, match(col_data$sample_id, colnames(n_cells)),
drop = FALSE]
stopifnot(all(col_data$sample_id == colnames(n_cells)))
d_counts <- SummarizedExperiment(assays = list(counts = n_cells),
rowData = row_data, colData = col_data)
d_counts
}
And it worked. I executed the block of code (I didn't run calcCounts
) and it returned the n_cells
variable, which contained exactly the information I needed - the number of cells in each sample belonging to each metacluster. It also returned the following objects:
coldata_df
- colData(SCE_Granulocytes)
in object form.col_data
- The metadata of SCE_Granulocytes
, but with n_cells for each sample as the last column.row_data
- How many cells are in each metacluster total across all samples.d_counts
- Seems to be an SCE version of col_data
.I checked that the numbers of cells were correct by exporting the fcs files made from splitting the SCE_Granulocytes by sample_id
and manually gating them to check, and all seems correct.
Could you please fix the calcCounts
function to work like this? I also need expression median data, which I will get to checking diffcyt functionality for tomorrow.
I think should be made possible to input something other than the default cluster_id
when calling the calcCounts
function. In this case, it was metacluster_id_Granulocytes
.
Did i have to manually load the three packages mentioned above? Just a curiosity for improving my coding efficiency.
EDIT: To have a "blank slate" SCE object that contained none of my editing and manipulation, I also tried generating an SCE from scratch and running calcCounts on that. The same issue was present and the same fix resolved it.
Kind regards.
Luka Tandaric
Currently adding group IDs to rowData of 'd_se' manually prior to plotting. Find a way to do this automatically, e.g. within the differential testing function when 'group_IDs' argument is first introduced. Use 'group' for column name in rowData.
Hi!
thank you for this package and for the useful tutorials!
I am working on CyTOF generated data, where I have four groups of treatments, and I want to make three pairwise comparisons
treatmentA to treatmentAB
treatmentC to treatmentCB
treatmentCB to treatmentAB
How do I create a design matrix and contrast?
this is what I did, but I am not sure it was the correct thing to do
my results don't show the different comparisons and without any statistical value
thanks!
condition = ei$condition
condition
[1] treatmentA treatmentA treatmentA treatmentA treatmentA treatmentA treatmentAB treatmentAB treatmentAB treatmentC treatmentC treatmentC treatmentC treatmentC treatmentC treatmentC treatmentCB treatmentCB treatmentCB treatmentCB
[21] treatmentCB
Levels: treatmentA treatmentAB treatmentC treatmentCB
design <- model.matrix(~0+condition)
contrasts <- makeContrasts(
treatmentAB-treatmentA,treatmentCB-treatmentC , treatmentCB -treatmentAB ,
levels=colnames(design))
da_res1 <- diffcyt(sce,
design = design, contrast = contrasts,
analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
clustering_to_use = "merging")
rowData(da_res1$res)
DataFrame with 11 rows and 3 columns
cluster_id p_val p_adj
B cells B cells NA NA
CD4 T cells CD4 T cells NA NA
CD45- CD45- NA NA
CD45- CD11b+ CD11c+ CD45- CD11b+ CD11c+ NA NA
CD45+ CD45+ NA NA
CD8 T cells CD8 T cells NA NA
DCs DCs NA NA
Granulocytes Granulocytes NA NA
Macrophages Macrophages NA NA
Monocytes Monocytes NA NA
TCRgd T cells TCRgd T cells NA NA
Hi there,
I am an R and CyTOF beginner looking at some AML and MDS patient data.
My conditions are 5 different timepoints: T1, T2, T3, T4 and T5
I have 8 patients who have received DLI and 6 patients who have not received DLI, but all are AML/MDS patients. The DLI and non-DLI patients have not been specified, as I thought I could just subset them downstream.
If I want to compare T1 to T2, T2 to T3, T3 to T4 etc. for each patient, how would I set up the contrast to do this? How can I subset the patients, instead of looking at every single patient for every single time point?
I would have just separated on DLI and no DLI instead of timepoint, but the timepoint is important as it specifies pre-transplant and post-transplant samples for the patients. It would have been too simple to just state DLI and no DLI.
Thank you kindly!
Shan
Can you please provide a list of the options that the argument markers_to_test can take?
I would like to do differential states analysis using the type markers.
markers_to_test = id_type_markers is not working.
markers_to_test = type_markers(sce) is also not working.
Using specifics marker names such as markers_to_test = "CD11b" also did not work.
I cannot find any examples in the package vignette. Any help would be appreciated. Thank you.
PS. This comment has been edited.
Hi All,
I have Cytof data, instead of doing unsupervised clustering, I found that manual gating made more sense and allowed me to find more subsets. I know have the cell subsets for each individual in an excel sheet, row(Individual) col(Cell subsets). is it possible that I use this as an input to the DA test?
Thank you a lot,
Hello,
I have a feature which contains 8 levels:
"aTNF_LPMC" "aINT_LPMC" "aIL_LPMC" "aJAK_LPMC"
"5ASA_LPMC" "Steroids_LPMC" "6MP_LPMC" "HC_LPMC".
Using contrast <- createContrast(c(0, 0, 0, 0, 0, 0, 0, 1)) it would compare my healthy controls to all other groups combined. How can I compare level number 8 to each of the other levels seperately?
Thanks so much for your help!
Hi,
thank you for this helpful package.
I raised an issue in the cytofWorkflow Github repo, because I am getting different results with diffcyt and edgeR. While I get reasonable results with edgeR, diffcyt returns NA p-values and zero abundance for some clusters across all samples (which is not true).
For now, I will just use edgeR directly. However, if you have time to look into this issue, it would be nice to know where the difference is coming from or whether the way I input my data to diffcyt is wrong.
Thank you!
Hello,
I am new to flow cytometry analysis and supporting a group remotely. The team prefers to use FCS Express for HD clustering and has little command line experience. I understand FCS Express utilizes FlowSOM, as does diffcyt, and also allows for integration of R scripts. I was wondering if you could advise on how to perform DA/DS within this FCS express environment (Windows) provided metadata/design-contrast matrices would be imported by diffcyt R script? I suppose the alternative would be to have the group export the annotated fcs files (assigned clusters) and execute the script (non-wrapper) in a Command Prompt.Thanks!
-Todd
I encounter this error occurs repeatedly. And I have not clue on the limited error message.
Can you advice some inputs?
res_DS_2vs1 = diffcyt(CD154_sce_SCIT, clustering_to_use = "meta20",
analysis_type = "DS", method_DS = "diffcyt-DS-limma",
design = design, contrast = contrast2vs1)
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta20' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
Error in calcMedians(d_se) :
all(sapply(medians, function(m) { .... is not TRUE
Hi, I am running any analysis with 59 samples. I have one outcome variable so for my design, I created it as follows:
design <- createDesignMatrix(experiment_info,cols_design = c("Outcome"))
This outcome is set as a factor of 0s and 1s.
For the contrast,
contrast <- createContrast(c(0, 1, rep(0, 58)))
However when I run the check I get an error that the number of columns in the design and rows in the contrast aren't the same. Since my design only contrasts contrast, it will be two columns (intercept and outcome).
This is the final command, I am running (59 samples 64 markers where 32 of these are the ones I am interested in),
out_DA <- diffcyt(
d_input = d_flowSet,
experiment_info = experiment_info,
marker_info = marker_info,
design = design,
contrast = contrast,
analysis_type = "DA",
seed_clustering = 123
)
This is the error from diffcyt:
preparing data...
transforming data...
generating clusters...
FlowSOM clustering completed in 258.2 seconds
calculating features...
calculating DA tests using method 'diffcyt-DA-edgeR'...
Error in glmLRT(fit, contrast = contrast) :
contrast vector of wrong length, should be equal to number of coefficients in the linear model.
Can you please guide me on where I am going wrong?
Many thanks,
Cherise.
I was running an analysis based on the CATALYST workflow ran and got an error with diffcyt()
:
Error in SummarizedExperiment(assays = list(counts = n_cells), rowData = row_data, :
the rownames and colnames of the supplied assay(s) must be NULL or identical to those of
the SummarizedExperiment object (or derivative) to construct
It turns out the problem was due to my metadata data frame having row names. Having row names makes me a little bit more comfortable in case something gets out of order somewhere. I thought this was reasonable since everything else was working until I got to diffcyt()
.
To summarize, here is the default workflow (from the CATALYST vignette) that works:
library(CATALYST)
library(diffcyt)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, features = "type", xdim = 10, ydim = 10, maxK = 20, verbose = FALSE, seed = 1)
design <- createDesignMatrix(ei(sce), cols_design = "condition")
contrast <- createContrast(c(0, 1))
res_DA <- diffcyt(sce, clustering_to_use = "meta10", analysis_type = "DA", method_DA = "diffcyt-DA-edgeR", design = design, contrast = contrast, verbose = FALSE)
This is the problematic version (where PBMC_md
has row names):
rownames(PBMC_md) <- PBMC_md$file_name
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, features = "type", xdim = 10, ydim = 10, maxK = 20, verbose = FALSE, seed = 1)
design <- createDesignMatrix(ei(sce), cols_design = "condition")
contrast <- createContrast(c(0, 1))
res_DA <- diffcyt(sce, clustering_to_use = "meta10", analysis_type = "DA", method_DA = "diffcyt-DA-edgeR", design = design, contrast = contrast, verbose = FALSE)
Hi Lukas
I'm running the Cytof Workflow pipeline. I get a very long message (attached below) after running
ds_res2 <- diffcyt(sce,
formula = ds_formula2, contrast = contrast,
analysis_type = "DS", method_DS = "diffcyt-DS-LMM",
clustering_to_use = "merging1b", verbose = FALSE)
I have read in other forums that I could solve this by changing some parameters. My two questions are:
How can I change in diffcyt() any of the recommended parameters if needed?
Do I need to change it or does this message mean that my data doesn't have any significant result and therefore I shouldn't change anything?
My design, contrast and formula are:
md
file_name sample_id condition patient_id batch intervention week_of_life trial_arm
1 P_31B_1.fcs P_31B Neonate P_31 B_1 post week_6 Intervention
2 P_32A_1.fcs P_32A Neonate P_32 B_1 pre week_3 Control
3 P_32B_1.fcs P_32B Neonate P_32 B_1 post week_7 Control
4 P_34A_1.fcs P_34A Neonate P_34 B_1 pre week_3 Control
5 P_35A_1.fcs P_35A Neonate P_35 B_1 pre week_3 Intervention
6 P_36A_2.fcs P_36A Neonate P_36 B_2 pre week_2 Control
ei <- metadata(sce)$experiment_info
(da_formula1 <- createFormula(ei,
cols_fixed = "trial_arm",
cols_random = "sample_id"))
(da_formula2 <- createFormula(ei,
cols_fixed = "trial_arm",
cols_random = c("sample_id", "patient_id")))
contrast <- createContrast(c(0, 1))
da_res1 <- diffcyt(sce,
formula = da_formula1, contrast = contrast,
analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
clustering_to_use = "merging1b", verbose = FALSE)
da_res2 <- diffcyt(sce,
formula = da_formula2, contrast = contrast,
analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
clustering_to_use = "merging1b", verbose = FALSE)
table(rowData(da_res1$res)$p_adj < FDR_cutoff)
table(rowData(da_res2$res)$p_adj < FDR_cutoff)
FALSE TRUE
15 2
FALSE TRUE
16 1
ds_formula1 <- createFormula(ei, cols_fixed = "trial_arm")
ds_formula2 <- createFormula(ei,
cols_fixed = "trial_arm", cols_random = "patient_id")
ds_res1 <- diffcyt(sce,
formula = ds_formula1, contrast = contrast,
analysis_type = "DS", method_DS = "diffcyt-DS-LMM",
clustering_to_use = "merging1b", verbose = FALSE)
table(rowData(ds_res1$res)$p_adj < FDR_cutoff)
ds_res2 <- diffcyt(sce,
formula = ds_formula2, contrast = contrast,
analysis_type = "DS", method_DS = "diffcyt-DS-LMM",
clustering_to_use = "merging1b", verbose = FALSE)
table(rowData(ds_res2$res)$p_adj < FDR_cutoff)
Thank you for your help.
Regards
Juan
Model failed to converge with max|grad| = 0.015349 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.00233629 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.00555007 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00742892 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00240135 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.0223681 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.0051913 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00238886 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
Model failed to converge with max|grad| = 0.0199976 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.0389122 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.0047067 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00297547 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.0152538 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00844551 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.0313113 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.022935 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.0240415 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.00217955 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.0241478 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00663443 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00649518 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
Model failed to converge with max|grad| = 0.055191 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.104247 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.0157847 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model is nearly unidentifiable: very large eigenvalue
Rescale variables?boundary (singular) fit: see ?isSingular
Model failed to converge with max|grad| = 0.0102596 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?boundary (singular) fit: see ?isSingular
[.....I have cut most of it....]
Rescale variables?Model failed to converge with max|grad| = 0.00365807 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.00612798 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00824278 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00280907 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
Model failed to converge with max|grad| = 0.00516716 (tol = 0.002, component 1)Model failed to converge with max|grad| = 0.00521023 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
Model is nearly unidentifiable: very large eigenvalue
Rescale variables?Model failed to converge with max|grad| = 0.00921459 (tol = 0.002, component 1)boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
Model failed to converge with max|grad| = 0.00678294 (tol = 0.002, component 1)Model is nearly unidentifiable: very large eigenvalue
Rescale variables?boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
Hi,
I've been working through the analysis pipelines for catalyst and diffcyt, and had some questions with setting up the design and contrast matrices. For my particular code, I am feeding an sce from catalyst in to the diffcyt wrapper and am using the code below to generate my design and contrast. Here, patient_id is equal to the genotype of my mice (wt vs KO).
design <- createDesignMatrix(ei(sce), cols_design = c("patient_id"))
contrast <- createContrast(c(1,0))```
The pic below shows the design matrix this produced:
![image](https://user-images.githubusercontent.com/77988339/129968898-e6497e4a-a740-47a2-b103-c2ebe368d540.png).
The first three 1's correspond to the rows in my metadata for the mice where the 1s are for WT mice and 0s are for KO mice. Then, would I not set up my contrast to be c(1,0) since that is how the numbers are ordered in my design matrix? Such that this would compare wt vs KO? or would I order the contrast as c(0,1)?
Thanks for your help!
Matt Jenkins
Hi there,
The tutorial (https://www.bioconductor.org/packages/release/bioc/vignettes/diffcyt/inst/doc/diffcyt_workflow.html) is very useful, but it only shows how to perform the analysis between conditions.
I had performed clustering using CATALYST
.
Would you mind advising me how to adjust the codes, especially diffcyt::createDesignMatrix
and contrast
to identify differentially-expressed proteins (biomarkers) of a cluster compared to every other cluster, done for each cluster as in DEGs for scRNAseq analyses? Or is it not possible?
Thank you for your help!
Hello, I have been using CATALYST for my SCE experiment, which are discussed in these issues (please see for experimental details).
HelenaLC/CATALYST#119
HelenaLC/CATALYST#121
set.seed(123)
sce <- runDR(sce, dr = "UMAP",cells = 4000, features = "type")
reducedDimNames(sce)
## [1] "UMAP"
head(reducedDim(sce, "UMAP"))
## [,1] [,2]
## [1,] NA NA
## [2,] NA NA
## [3,] NA NA
tail(reducedDim(sce, "UMAP"))
## [,1] [,2]
## [578650,] 1.430724 0.2447781
## [578651,] NA NA
## [578652,] NA NA
## Differential testing with diffcyt
### create design & contrast matrix
```r
design <- createDesignMatrix(ei(sce), cols_design = "condition")
contrast <- createContrast(c(0, 1))
res_DA <- diffcyt(sce, clustering_to_use = "meta10",
analysis_type = "DA", method_DA = "diffcyt-DA-edgeR",
design = design, contrast = contrast, verbose = FALSE)
res_DS <- diffcyt(sce, clustering_to_use = "meta10",
analysis_type = "DS", method_DS = "diffcyt-DS-limma",
design = design, contrast = contrast, verbose = FALSE)
Error in glmFit.default(y = y$counts, design = design, dispersion = dispersion, : NA dispersions not allowed
I am guessing this is due to all the NAs. If yes then why does this happen and how to get rid of NAs here ?
Hey,
I'm using the following code from the workflow vignette to read my .fcs files into diffcyt pipeline.
files <- list.files(
path = "path/to/files", pattern = "\.fcs$", full.names = TRUE
)
d_flowSet <- read.flowSet(
files, transformation = FALSE, truncate_max_range = FALSE
).
It is returning the following error message:
Error in frameList[isFrame] : invalid subscript type 'list'
I was wondering what the way around this is?
In my dataset, I have 2 conditions I would like to analyse:
All of the patients have a baseline measurement and a measurement that was done after stimulation. I want to compare the stimulation condition within each treatment group and between the treatment groups.
For this, I would like to compare the results of the implemented methods for differential abundance and differential states but I am unsure if I understand correctly, how the contrast matrix has to be specified.
Case 1: Let's say I wanted to run edgeR on this. I would first create a design matrix with my two conditions like this:
createDesignMatrix(experiment_info, cols_design = c("treatment", "stimulation", "patient_id"))
which yields the columns:
"(Intercept)", "treatmentB", "treatmentC", "stimulationAfter", "patient_idP2", ..., "patient_idP10". Should I create my contrast matrix then with
createContrast( c( 0, 1, 1, 1, rep(0, 9) ) )
?
Case 2: Let's say I now wanted run a GLMM for comparison. For the GLMM, I would specify my formula with:
createFormula(experiment_info, cols_fixed = c("treatment", "stimulation"), cols_random = "patient_id")
. Now, my contrast has to correspond to the levels of my fixed effect terms. Should I therefore specify the contrast matrix with:
createContrast( c(1, 1, 1, 1, 1) )
or does the first entry have to be a zero: createContrast( c(0, 1, 1, 0, 1) )
to get a meaningful analysis?
Would this contrast matrix lead to an all-against-all comparison, i.e. every treatment group in every stimulation condition? If I then wanted to compare the different treatment groups only in the baseline stimulation group, what contrast do I have to specify?
Another thing that confuses me is that Wikipedia says that the coefficients of contrast add up to zero. However, in your examples in diffcyt workflow
and in ?createContrast
this is not the case. Is this done internally?
Thank you for your time!
Hi, there is an issue with vignette rebuilding:
Error: processing vignette 'diffcyt_workflow.Rmd' failed with diagnostics:
package or namespace load failed for 'HDCytoData':
.onLoad failed in loadNamespace() for 'HDCytoData', details:
call: FUN(X[[i]], ...)
error: 'Levine_32dim_SE' not found in ExperimentHub
Yours,
Steffen
Hi Lukas,
Congratulations on getting the paper published!
I was trying my data with the new SCE object in CATALYST and diffcyt package. For some reason, when I do the differential state analysis either with limma or LMM, they both fail in slightly different ways.
In case of LMM I get all p-values as NA and in case of limma I get the following message:
Error in contrasts.fit(fit, contrast) : Number of rows of contrast matrix must match number of coefficients in fit
Contrast that I used is :
contrast <- createContrast(c(0, 1,0,0))
Formula is:
ds_formula1 <- createFormula(ei,
cols_fixed = "condition")
The command used is:
ds_res3 <- diffcyt(sce, formula = ds_formula1, contrast = contrast, analysis_type = "DS", method_DS = "diffcyt-DS-limma", clustering_to_use = "meta14", verbose = TRUE,subsampling = 1000,transform = F)
Output:
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta14' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DS tests using method 'diffcyt-DS-limma'...
Error in contrasts.fit(fit, contrast) : Number of rows of contrast matrix must match number of coefficients in fit
I have four groups in this dataset with 16 files in total.
Here is the link to the SCE data object in dropbox:
https://www.dropbox.com/s/ceqc28hcq0j8wz5/sce_dbglgata.rda?dl=0
I am sure I am making a silly mistake somewhere but I cannot figure it out even after looking at the source code.
Thanks!
Hena
` sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DT_0.11 diffcyt_1.6.1 ggplot2_3.2.1
[4] flowCore_1.52.1 SingleCellExperiment_1.8.0 SummarizedExperiment_1.16.1
[7] DelayedArray_0.12.2 BiocParallel_1.20.1 matrixStats_0.55.0
[10] Biobase_2.46.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.0
[13] IRanges_2.20.2 S4Vectors_0.24.3 BiocGenerics_0.32.0
[16] CATALYST_1.10.1 limma_3.42.0
loaded via a namespace (and not attached):
[1] shinydashboard_0.7.1 R.utils_2.9.2 ks_1.11.6
[4] lme4_1.1-21 tidyselect_0.2.5 htmlwidgets_1.5.1
[7] grid_3.6.2 Rtsne_0.15 munsell_0.5.0
[10] codetools_0.2-16 withr_2.1.2 colorspace_1.4-1
[13] flowViz_1.50.0 knitr_1.27 rstudioapi_0.10
[16] flowClust_3.24.0 robustbase_0.93-5 openCyto_1.24.0
[19] GenomeInfoDbData_1.2.2 mnormt_1.5-5 flowWorkspace_3.34.1
[22] vctrs_0.2.1 TH.data_1.0-10 xfun_0.12
[25] R6_2.4.1 ggbeeswarm_0.6.0 clue_0.3-57
[28] rsvd_1.0.2 locfit_1.5-9.1 bitops_1.0-6
[31] assertthat_0.2.1 promises_1.1.0 scales_1.1.0
[34] multcomp_1.4-12 beeswarm_0.2.3 gtable_0.3.0
[37] sandwich_2.5-1 rlang_0.4.2 zeallot_0.1.0
[40] GlobalOptions_0.1.1 splines_3.6.2 lazyeval_0.2.2
[43] hexbin_1.28.0 shinyBS_0.61 yaml_2.2.0
[46] reshape2_1.4.3 abind_1.4-5 backports_1.1.5
[49] httpuv_1.5.2 IDPmisc_1.1.20 RBGL_1.62.1
[52] tools_3.6.2 ellipsis_0.3.0 RColorBrewer_1.1-2
[55] ggridges_0.5.2 Rcpp_1.0.3 plyr_1.8.5
[58] base64enc_0.1-3 zlibbioc_1.32.0 purrr_0.3.3
[61] RCurl_1.98-1.1 FlowSOM_1.18.0 GetoptLong_0.1.8
[64] viridis_0.5.1 cowplot_1.0.0 zoo_1.8-7
[67] haven_2.2.0 ggrepel_0.8.1 cluster_2.1.0
[70] fda_2.4.8 magrittr_1.5 ncdfFlow_2.32.0
[73] data.table_1.12.8 openxlsx_4.1.4 circlize_0.4.8
[76] mvtnorm_1.0-12 hms_0.5.3 shinyjs_1.1
[79] mime_0.8 xtable_1.8-4 XML_3.99-0.3
[82] rio_0.5.16 jpeg_0.1-8.1 mclust_5.4.5
[85] readxl_1.3.1 gridExtra_2.3 shape_1.4.4
[88] ggcyto_1.14.0 compiler_3.6.2 scater_1.14.6
[91] ellipse_0.4.1 tibble_2.1.3 flowStats_3.44.0
[94] KernSmooth_2.23-16 crayon_1.3.4 minqa_1.2.4
[97] R.oo_1.23.0 htmltools_0.4.0 corpcor_1.6.9
[100] pcaPP_1.9-73 later_1.0.0 tidyr_1.0.0
[103] rrcov_1.5-2 RcppParallel_4.4.4 ComplexHeatmap_2.2.0
[106] MASS_7.3-51.5 boot_1.3-24 Matrix_1.2-18
[109] car_3.0-6 R.methodsS3_1.7.1 igraph_1.2.4.2
[112] forcats_0.4.0 pkgconfig_2.0.3 foreign_0.8-75
[115] plotly_4.9.1 vipor_0.4.5 XVector_0.26.0
[118] drc_3.0-1 stringr_1.4.0 digest_0.6.23
[121] tsne_0.1-3 ConsensusClusterPlus_1.50.0 graph_1.64.0
[124] cellranger_1.1.0 edgeR_3.28.0 DelayedMatrixStats_1.8.0
[127] curl_4.3 shiny_1.4.0 gtools_3.8.1
[130] nloptr_1.2.1 rjson_0.2.20 nlme_3.1-143
[133] lifecycle_0.1.0 jsonlite_1.6 carData_3.0-3
[136] BiocNeighbors_1.4.1 viridisLite_0.3.0 pillar_1.4.3
[139] lattice_0.20-38 fastmap_1.0.1 httr_1.4.1
[142] plotrix_3.7-7 DEoptimR_1.0-8 survival_3.1-8
[145] glue_1.3.1 zip_2.0.4 png_0.1-7
[148] Rgraphviz_2.30.0 stringi_1.4.5 nnls_1.4
[151] BiocSingular_1.2.1 CytoML_1.12.0 latticeExtra_0.6-29
[154] dplyr_0.8.3 irlba_2.3.3 `
Dear Lukas,
I have fixed the calcCounts
and calcMedians
functions so that the rows and columns of SCEs are read properly.
I have also added an option to pass an argument to the function that sets the (meta)clustering (from colnames(colData(SCE))
) to be used when making the calculations. The argument has to be passed as a string (in "double quotes").
This function outputs counts per (meta)cluster per sample:
# Define the function that replaces calcCounts:
# SCE = The summarized experiment object the counts calculation will be done on.
# clustering = A (meta)clustering from colnames(colData(SCE)) (in quotes ""). The cell counts will be calculated for each (meta)cluster.
calcCountsFixed <- function (SCE, clustering = "cluster_id"){
if (!is(SCE, "SummarizedExperiment")) {
stop("Data object must be a 'SummarizedExperiment'.")
}
if (!(clustering %in% (colnames(colData(SCE))))) {
stop("Data object does not contain the given clustering. Check colnames(colData(SCE)).")
}
coldata_df <- as.data.frame(colData(SCE))
n_cells <- coldata_df %>% group_by(!!rlang::sym(clustering), sample_id, .drop = FALSE) %>% tally %>% complete(sample_id)
n_cells <- acast(n_cells, paste0(clustering, " ~ sample_id"), value.var = "n", fill = 0)
if (nrow(n_cells) < nlevels(colData(SCE)[[clustering]])) {
missing_markers <- which(!(levels(colData(SCE)[[clustering]]) %in% rownames(n_cells)))
n_cells_tmp <- matrix(0, nrow = length(missing_markers), ncol = ncol(n_cells))
rownames(n_cells_tmp) <- missing_markers
n_cells <- rbind(n_cells, n_cells_tmp)
n_cells <- n_cells[order(as.numeric(rownames(n_cells))), , drop = FALSE]
}
n_cells_total <- rowSums(n_cells)
row_data <- setNames(data.frame(factor(rownames(n_cells), levels = levels(colData(SCE)[[clustering]])), n_cells_total, stringsAsFactors = FALSE),
c(clustering, "n_cells"))
col_data <- metadata(SCE)$experiment_info
n_cells <- n_cells[, match(col_data$sample_id, colnames(n_cells)), drop = FALSE]
stopifnot(all(col_data$sample_id == colnames(n_cells)))
d_counts <- SummarizedExperiment(assays = list(counts = n_cells),
rowData = row_data,
colData = col_data)
d_counts
}
This function outputs the median signal values from the arcSinh-transformed signal values:
# Define the function that replaces calcMedians:
# SCE = The summarized experiment object the counts calculation will be done on.
# clustering = A (meta)clustering (in quotes "") from colnames(colData(SCE)). The cell counts will be calculated for each (meta)cluster.
calcMediansFixed <- function (SCE, clustering = "cluster_id"){
if (!is(SCE, "SummarizedExperiment")) {
stop("Data object must be a 'SummarizedExperiment'.")
}
if (!(clustering %in% (colnames(colData(SCE))))) {
stop("Data object does not contain the given clustering. Check colnames(colData(SCE)).")
}
is_marker <- rowData(SCE)$marker_class != "none"
id_type_markers <- (rowData(SCE)$marker_class == "type")[is_marker]
id_state_markers <- (rowData(SCE)$marker_class == "state")[is_marker]
assaydata_mx <- as.matrix(assays(SCE)[["exprs"]])
medians <- vector("list", sum(is_marker))
marker_names_sub <- as.character(rowData(SCE)$marker_name[is_marker])
names(medians) <- marker_names_sub
clus <- colData(SCE)[[clustering]]
smp <- colData(SCE)$sample_id
for (i in seq_along(medians)) {
assaydata_i <- t(assaydata_mx[marker_names_sub[i], , drop = FALSE])
assaydata_i <- as.data.frame(assaydata_i)
temp_clus <- data.frame(clus)
names(temp_clus) <- clustering
assaydata_i <- cbind(assaydata_i, sample_id = smp, temp_clus)
colnames(assaydata_i)[1] <- "value"
med <- assaydata_i %>% group_by(!!rlang::sym(clustering), sample_id, .drop = FALSE) %>% summarize(median = median(value))
med <- acast(med, paste0(clustering, " ~ sample_id"), value.var = "median", fill = NA)
if (nrow(med) < nlevels(colData(SCE)[[clustering]])) {
missing_markers <- which(!(levels(colData(SCE)[[clustering]]) %in% rownames(med)))
med_tmp <- matrix(NA, nrow = length(missing_markers), ncol = ncol(med))
rownames(med_tmp) <- missing_markers
med <- rbind(med, med_tmp)
med <- med[order(as.numeric(rownames(med))), , drop = FALSE]
}
medians[[i]] <- med
}
for (i in seq_along(medians)) {
if (!all(rownames(medians[[i]]) == rownames(medians[[1]]))) {
stop("Cluster IDs do not match")
}
if (!all(colnames(medians[[i]]) == colnames(medians[[1]]))) {
stop("Sample IDs do not match")
}
}
row_data <- setNames(data.frame(factor(rownames(medians[[1]]), levels = levels(colData(SCE)[[clustering]])), stringsAsFactors = FALSE),
c(clustering))
col_data <- metadata(SCE)$experiment_info
medians <- lapply(medians, function(m) {
m[, match(col_data$sample_id, colnames(m)), drop = FALSE]
})
stopifnot(all(sapply(medians, function(m) {
col_data$sample_id == colnames(m)
})))
metadata <- list(id_type_markers = id_type_markers, id_state_markers = id_state_markers)
d_medians <- SummarizedExperiment(assays = medians, rowData = row_data,
colData = col_data, metadata = metadata)
d_medians
}
This function outputs the median data made from raw signal values:
# The previous function outputs the median expression levels as arcSinh-transformed values with an argument of 5 (or which ever number
# you set as the value for the "cofactor" in "prepData"). In the case of CyTOF data, "cofactor = 5", which gives the formula "asinh(x/5)".
# In the following function, "[["exprs"]]" has been replaced by "[["counts"]]". This makes the function output raw, untransformed medians
# of marker expression. Be advised, arcSinh-transformed values should be used for statistical analyses due to being more comparable
# between markers and/or samples, unlike raw values, which can, for instance, vary from 5 to 5000.
# SCE = The summarized experiment object the counts calculation will be done on.
# clustering = A (meta)clustering (in quotes "") from colnames(colData(SCE)). The cell counts will be calculated for each (meta)cluster.
calcMediansFixed_raw <- function (SCE, clustering = "cluster_id"){
if (!is(SCE, "SummarizedExperiment")) {
stop("Data object must be a 'SummarizedExperiment'.")
}
if (!(clustering %in% (colnames(colData(SCE))))) {
stop("Data object does not contain the given clustering. Check colnames(colData(SCE)).")
}
is_marker <- rowData(SCE)$marker_class != "none"
id_type_markers <- (rowData(SCE)$marker_class == "type")[is_marker]
id_state_markers <- (rowData(SCE)$marker_class == "state")[is_marker]
assaydata_mx <- as.matrix(assays(SCE)[["counts"]])
medians <- vector("list", sum(is_marker))
marker_names_sub <- as.character(rowData(SCE)$marker_name[is_marker])
names(medians) <- marker_names_sub
clus <- colData(SCE)[[clustering]]
smp <- colData(SCE)$sample_id
for (i in seq_along(medians)) {
assaydata_i <- t(assaydata_mx[marker_names_sub[i], , drop = FALSE])
assaydata_i <- as.data.frame(assaydata_i)
temp_clus <- data.frame(clus)
names(temp_clus) <- clustering
assaydata_i <- cbind(assaydata_i, sample_id = smp, temp_clus)
colnames(assaydata_i)[1] <- "value"
med <- assaydata_i %>% group_by(!!rlang::sym(clustering), sample_id, .drop = FALSE) %>% summarize(median = median(value))
med <- acast(med, paste0(clustering, " ~ sample_id"), value.var = "median", fill = NA)
if (nrow(med) < nlevels(colData(SCE)[[clustering]])) {
missing_markers <- which(!(levels(colData(SCE)[[clustering]]) %in% rownames(med)))
med_tmp <- matrix(NA, nrow = length(missing_markers), ncol = ncol(med))
rownames(med_tmp) <- missing_markers
med <- rbind(med, med_tmp)
med <- med[order(as.numeric(rownames(med))), , drop = FALSE]
}
medians[[i]] <- med
}
for (i in seq_along(medians)) {
if (!all(rownames(medians[[i]]) == rownames(medians[[1]]))) {
stop("Cluster IDs do not match")
}
if (!all(colnames(medians[[i]]) == colnames(medians[[1]]))) {
stop("Sample IDs do not match")
}
}
row_data <- setNames(data.frame(factor(rownames(medians[[1]]), levels = levels(colData(SCE)[[clustering]])), stringsAsFactors = FALSE),
c(clustering))
col_data <- metadata(SCE)$experiment_info
medians <- lapply(medians, function(m) {
m[, match(col_data$sample_id, colnames(m)), drop = FALSE]
})
stopifnot(all(sapply(medians, function(m) {
col_data$sample_id == colnames(m)
})))
metadata <- list(id_type_markers = id_type_markers, id_state_markers = id_state_markers)
d_medians <- SummarizedExperiment(assays = medians, rowData = row_data,
colData = col_data, metadata = metadata)
d_medians
}
Please let me know if you decide to incorporate them into diffcyt.
Kind regards.
Luka Tandaric
Dear Lukas,
I am using diffcyt packege following option number 1 from your tutorial. In the end of analysis I get output which can be displayed using heatmap and plotHeatmap function.
I was wondering if it is possible to custamize the option to display p_val instead of p_adj when I plot a hetmap with results. I am sorry if you mentioned it in your tutorial and I missed it.
Thank you in advance!
Best,
Ekaterina
Hi Lukas,
This isn't so much an issue but more of a query.
Is there a way to extract the median marker value for each cluster? I'd like to use these data to help identify the cell type of each cluster.
The calcMedians()
function does something close but for each sample and cluster times by the number of markers.
Many thanks,
Lucas
Hi Dr. Weber @lmweber ,
Is it possible to know the direction of the differential results from diffcyt? I can see which clusters are significant but I don't know the direction of the difference. I have two thoughts to achieve this. Can you please comment on which one would work: a) Run the diffcyt analysis with reference group set to 0 and disease group set to 1 for the contrast matrix and then repeat the analysis with reference group set to 1 and disease group set to 0. b) Simply set the reference group to -1 and disease group to 1 for the contrast matrix.
Thank you!
Hi,
I would like to reinstall an older version of CATALYST, as a daframe object built on CATALYST 1.9.7 gives error when I try to load it on CATALYST 1.10.0. Is there a way around this? I want to avoid doing a re-analysis with new version as it might change some things.
Thanks,
Hello Lukas,
Thank-you for Diffcyt.
I had a quick question about setting up contrast matrix for mixed model. I hypothesize that the cell cluster proportions depend on the condition (case-status, binary) when corrected for age and sex (fixed effect), and the date of batch processing (categorical, 4 values, random effect)
ei <- metadata(sce)$experiment_info
> head(ei)
sample_id condition sex age date n_cells
1 E23 Control Female 46 16-AUG-2019 150000
2 E25 Control Male 42 28-AUG-2019 150000
3 E26 Control Female 45 16-AUG-2019 150000
4 E28 Control Female 52 16-AUG-2019 150000
5 E29 Case Female 40 28-AUG-2019 150000
6 E30 Case Female 42 28-AUG-2019 150000
> str(ei)
'data.frame': 102 obs. of 6 variables:
$ sample_id: Factor w/ 102 levels "E23","E25","E26",..: 1 2 3 4 5 6 7 8 9 10 ...
$ condition: Factor w/ 2 levels "Control","Case": 1 1 1 1 2 2 2 1 1 2 ...
$ sex : Factor w/ 2 levels "Female","Male": 1 2 1 1 1 1 1 1 1 1 ...
$ age : num 46 42 45 52 40 42 52 52 46 62 ...
$ date : Factor w/ 4 levels "08-OCT-2019",..: 2 4 2 2 4 4 1 2 2 4 ...
$ n_cells : num 150000 150000 150000 150000 150000 150000 150000 150000 150000 150000 ...
da_formula1 <- createFormula(ei,
cols_fixed = c("condition", "age", "sex"),
cols_random = "date"
)
> da_formula1[["formula"]]
y ~ condition + age + sex + (1 | date)
contrast <- createContrast(c(0, 1, 0, 0))
> contrast
[,1]
[1,] 0
[2,] 1
[3,] 0
[4,] 0
da_res1 <- diffcyt(sce,
formula = da_formula1, contrast = contrast,
analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
clustering_to_use = "meta10", verbose = T
)
topTable(da_res1, format_vals = T)
> topTable(da_res1, format_vals = T)
DataFrame with 10 rows and 3 columns
cluster_id p_val p_adj
<factor> <numeric> <numeric>
1 1 0.00e+00 0.00e+00
2 2 0.00e+00 0.00e+00
3 3 0.00e+00 0.00e+00
4 4 0.00e+00 0.00e+00
5 5 0.00e+00 0.00e+00
6 6 0.00e+00 0.00e+00
7 7 0.00e+00 0.00e+00
8 8 0.00e+00 0.00e+00
10 10 1.78e-15 1.97e-15
9 9 7.95e-01 7.95e-01
Either I am about to publish a Science / Nature paper or I messed up (the probability of latter is close to 1).
Could you please let me know where am I messing up?
Also, like Toptable, is there a quick & easy way to check the effect size?
Thanks a lot!
Vivek
Hello
I am using diffcyt and CATALYST to analyse flow data. Everything is working very well so far.
My dataset is made of samples from individuals who differ by gender but I have several time points for each individual.
Iis there a way to run a DA analyze not only by gender but also accounting for the longitudinal design . The underlying question would be something like : does the clustering evolve differently over time (i.e. aging ) between women and men?
For now, I tried lmer model outside of diffcyt but it would be nice to use your package. if not available, do you have any advices?
Thank you in advance,
Hi!
Thank you for sharing the code. I just have a quick question. I have a list of numbered patients, from ID_001 to ID_120, and two classes. However, these are intermingled. My issue is that when I get the results and plot the heatmap (PlotDiffHeatmap) I would like to see all patients of one class on the left and the rest on the right, so that I can compare expression and abundance more clearly, without renaming every patient (It's a standardized nomenclature). Basically, in the image attached all the green patients and purple grouped by condition.
I have tried sorting the md object but it doesn't work. Is there any easy workaround?
Thanks!!
DA_edgeR.pdf
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.