alasca's People

Contributors

andjar

alasca's Issues

ID error in cross-sectional analysis

Hi!
I want to apply ALASCA for a design that does not include repeated measurements similar to the first model in example 3. To test it, I first downloaded the files of example 3 (https://figshare.com/articles/software/ALASCA_An_R_package_for_longitudinal_and_cross-sectional_analysis_of_multivariate_data_by_ASCA-based_methods/21362979/1) and when running the model (02.ex3.part1) it generated an error with the IDs, as follows:

INFO [2023-01-15 11:17:47] Initializing ALASCA (v1.0.7, 2022-12-10)
WARN [2023-01-15 11:17:47] Guessing effects: disease
INFO [2023-01-15 11:17:47] Will use linear models!
INFO [2023-01-15 11:17:47] Will use Rfast!
WARN [2023-01-15 11:17:47] Converting IDs to integer values
WARN [2023-01-15 11:17:47] The disease column is used for stratification
fstcore package v0.9.14
(OpenMP detected, using 20 threads)
INFO [2023-01-15 11:17:48] Scaling data with sdref ...
WARN [2023-01-15 11:17:48] The scaling sdref has been replaced by sdt1 as there is only one effect term. This corresponds to the column disease
INFO [2023-01-15 11:17:50] Calculating LM coefficients
INFO [2023-01-15 11:17:50] Reducing the number of dimensions with PCA
INFO [2023-01-15 11:17:54] Keeping 111 components from initial PCA, explaining 95.11 % of variation. The limit can be changed with reduce_dimensions.limit
INFO [2023-01-15 11:17:54] -> Finished the reduction of dimensions!
Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult, :
Incompatible join types: x.ID (integer) and i.V1 (character)

The same error appears when I apply it to my own data. However, the script 02.ex3.part2 and examples 1 and 2, which all involve repeated measurements, ran without any problem.

How could I solve this problem?
Thank you so much for your time and help.

Sincerely,
Cynthia

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Mexico.utf8 LC_CTYPE=Spanish_Mexico.utf8
[3] LC_MONETARY=Spanish_Mexico.utf8 LC_NUMERIC=C
[5] LC_TIME=Spanish_Mexico.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] fstcore_0.9.14 ggrepel_0.9.2 ggplot2_3.4.0 ALASCA_1.0.7
[5] data.table_1.14.6 here_1.0.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 log4r_0.4.3 pillar_1.8.1 compiler_4.2.2
[5] tools_4.2.2 bit_4.0.5 lifecycle_1.0.3 tibble_3.1.8
[9] gtable_0.3.1 pkgconfig_2.0.3 rlang_1.0.6 cli_3.4.1
[13] DBI_1.1.3 rstudioapi_0.14 parallel_4.2.2 duckdb_0.6.1
[17] withr_2.5.0 dplyr_1.0.10 generics_0.1.3 vctrs_0.5.1
[21] hms_1.1.2 RcppZiggurat_0.1.6 Rfast_2.0.6 rprojroot_2.0.3
[25] bit64_4.0.5 grid_4.2.2 tidyselect_1.2.0 glue_1.6.2
[29] R6_2.5.1 fansi_1.0.3 vroom_1.6.0 tzdb_0.3.0
[33] readr_2.1.3 magrittr_2.0.3 scales_1.2.1 ellipsis_0.3.2
[37] fst_0.9.8 assertthat_0.2.1 colorspace_2.0-3 utf8_1.2.2
[41] munsell_0.5.0 crayon_1.5.2
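
Note on the join error: the message `Incompatible join types: x.ID (integer) and i.V1 (character)` comes from data.table, which refuses to join an integer key against a character key. The sketch below reproduces that class of error on toy data and shows that the join proceeds once both keys share a type. The tables `x` and `i` are made up for illustration; whether converting the ID column before calling ALASCA avoids the error in v1.0.7 is an assumption, not something the report confirms.

```r
library(data.table)

# Toy tables (hypothetical): integer IDs on one side, character IDs on the other
x <- data.table(ID = c(1L, 2L, 3L), score = c(0.1, 0.2, 0.3))
i <- data.table(V1 = c("1", "3"))

# Joining integer x.ID against character i.V1 raises an error in data.table
res <- tryCatch(
  x[i, on = c(ID = "V1")],
  error = function(e) conditionMessage(e)
)

# Coercing the key to the same type on both sides lets the join proceed
i[, V1 := as.integer(V1)]
joined <- x[i, on = c(ID = "V1")]
```

Here `res` holds the error message as a string, and `joined` contains the two matching rows.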

Plot function prints output as showing x number of variables, but does not actually show them all

Hello,

This was an issue I discovered by chance.

I have my data in the long format, and I have 13 dependent variables. All NAs are omitted from the dataset.

The model itself runs fine, with no error.

When I plot the effect plot, it prints "Showing 13 of 13 variables. Adjust the number with n_limit"

However, in the loadings plot, there is a variable missing. This is irrespective of what type of plot I'm using to look at the loadings. The variable does not show up on the histogram plot, nor the loadings.

And it is just one variable, per principal component, and not even the same every time.

And yes, I have the latest version of the package.

Happy for any lead!

plot(mod1, effect = c(1), component = c(1,2), type = 'effect')
INFO [2024-05-15 15:46:16] Effect plot. Selected effect (nr 1): Age. Component: 1 and 2.
WARN [2024-05-15 15:46:16] Showing 13 of 13 variables. Adjust the number with n_limit
WARN [2024-05-15 15:46:16] Showing 13 of 13 variables. Adjust the number with n_limit

Some errors applying ALASCA to longitudinal (repeat measures) microbial counts

Hi Anders!

This is an excellent package! And one of the few to accept repeat measures intelligently! (talking to you vegan)
I commend you on making it so user friendly. I have a special use case (or hopefully not so special) where I want to model microbial community change in a time series. This data seem to fit with the general requirements:
values = microbial abundance (~500 spp.),
time = days (x5, including day 0),
group = individual or biological replicate (x4).
sub_group = technical replicates (3/replicate/day)

My experimental setup is 4 biological replicates (individuals) sampled across Days (0 baseline, 1, 3, 7, and 14). Each biological replicate has 3 technical replicates at each time point (pseudo-replication).

Question 1. I may need a slightly different model structure than the one your examples define, since I have technical replicates: (4 individuals) * (5 days) * (3 tech. replicates) = 60 observations. I'm only interested in the change over time (fixed effect), not in each individual's contribution (individuals are my blocks), and I'd like to explicitly model the technical replicates (i.e., sub_group) nested in each individual (i.e., group) instead of averaging them outside the model. Should I set the random effect to be (group|sub_group)?

model.formula2 <- value ~ time*group + (group|sub_group)
Some results:
[Figure: 16s_time_series_ALASCA_PC1, output from validate = F]

Question 2. I ran my model and I get a usable object (awesome!), but when I validate it I get interesting differences between the methods. With bootstrap validation it runs fine. With permutations it fails (see error below). With "loo", R crashes. In general, are these bootstraps (or optionally permutations) aware of the model formula? I know some analysis packages use the permute package, which requires a call to how() to set blocks, plots, etc., so that permutations are not "free" but constrained to the independent blocks of your study design. How is this handled in ALASCA?

Some issues I found with my unorthodox dataset:

summary(mod$regr.model[[1]]) #returns NULL in every case
Length Class Mode
0 NULL NULL

Also this warning message when I use permutations instead of bootstrap:

  	PE.mod <- ALASCA(df_long, model.formula2, separateTimeAndGroup = T,  useRfast = T, forceEqualBaseline = T, validate = T, validateRegression = F, validationMethod = "permutation", nValRuns = 100)

====== ALASCA ======

0.0.0.106 (2022-01-22)

Will use linear mixed models!
Using group for stratification.
Scaling data...
Calculating LMM coefficients...
Finished calculating regression coefficients!
Calculating predictions from regression models...
Finished calculating predictions from regression models!
Calculating effect matrix
Finished calculating effect matrix!
Running validation...

- Run 1 of 1000
Error in prepareValidationRun(object) : object 'temp_object' not found

Thanks for your time Anders,

Sam
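
On Question 1: in lme4 formula syntax, which ALASCA's linear mixed models appear to build on, `(group|sub_group)` requests a random slope for `group` within each `sub_group`; it does not express nesting. Nesting is usually written `(1 | group/sub_group)`, shorthand for `(1 | group) + (1 | group:sub_group)`. The following is a minimal sketch on simulated data with the same 4 x 5 x 3 layout, fitted directly with lme4 rather than through ALASCA. Whether ALASCA accepts this term is not confirmed here, and all values are simulated.

```r
library(lme4)

set.seed(1)

# Simulated layout: 4 individuals x 5 days x 3 technical replicates = 60 rows
d <- expand.grid(
  sub_group = factor(1:3),
  group     = factor(1:4),
  time      = factor(c(0, 1, 3, 7, 14))
)

# Made-up effects: one offset per individual, one per replicate within individual
ind_effect <- rnorm(4)[as.integer(d$group)]
rep_effect <- rnorm(12, sd = 0.5)[as.integer(interaction(d$group, d$sub_group))]
d$value <- ind_effect + rep_effect + 0.5 * as.integer(d$time) +
  rnorm(nrow(d), sd = 0.3)

# Random intercepts for individuals and for replicates nested in individuals
fit <- lmer(value ~ time + (1 | group/sub_group), data = d)

# Number of levels per grouping factor: 4 individuals, 12 nested replicates
n_levels <- ngrps(fit)
```

If the technical replicates are not the same physical unit from day to day, the replicate label carries no identity across time and a different grouping term may describe the design better.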

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] ALASCA_0.0.0.105 data.table_1.14.2 ggvegan_0.1-0 pairwiseAdonis_0.4
[5] cluster_2.1.2 patchwork_1.1.1 MicrobiotaProcess_1.6.3 weathermetrics_1.2.2
[9] tidyquant_1.0.3 quantmod_0.4.18 TTR_0.24.3 PerformanceAnalytics_2.0.4
[13] xts_0.12.1 zoo_1.8-9 ggtext_0.1.1 lubridate_1.8.0
[17] wesanderson_0.3.6 viridis_0.6.2 viridisLite_0.4.0 Cairo_1.5-14
[21] cowplot_1.1.1 ggthemes_4.2.4 magrittr_2.0.1 reshape_0.8.8
[25] reshape2_1.4.4 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
[29] purrr_0.3.4 readr_2.1.1 tidyr_1.1.4 tibble_3.1.6
[33] tidyverse_1.3.1 DivNet_0.4.0 breakaway_4.7.6 DESeq2_1.34.0
[37] SummarizedExperiment_1.24.0 Biobase_2.54.0 MatrixGenerics_1.6.0 matrixStats_0.61.0
[41] GenomicRanges_1.46.1 GenomeInfoDb_1.30.0 IRanges_2.28.0 S4Vectors_0.32.3
[45] BiocGenerics_0.40.0 metagMisc_0.0.4 microbiome_1.16.0 ggplot2_3.3.5
[49] phyloseq_1.38.0 vegan_2.5-7 lattice_0.20-45 permute_0.9-5
[53] ANCOMBC_1.4.0 corrplot_0.92 pvclust_2.2-0 dendextend_1.15.2
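
On Question 2: how ALASCA constrains its permutations is not stated in this thread. For reference, a restricted permutation of the kind that blocks in the permute package provide can be sketched in base R by shuffling labels only within each individual. The data frame and column names below are illustrative, not taken from ALASCA.

```r
set.seed(42)

days <- c(0, 1, 3, 7, 14)

# Illustrative design: 4 individuals (blocks), each sampled on 5 days
d <- expand.grid(time = days, group = paste0("ind", 1:4))

# Shuffle the time labels separately inside each individual,
# so labels never move between blocks
d$time_perm <- ave(
  d$time, d$group,
  FUN = function(x) x[sample.int(length(x))]
)

# Every individual still carries exactly the same set of days
same_sets <- all(tapply(d$time_perm, d$group, function(x) setequal(x, days)))
```

A "free" permutation would instead call `sample()` on the whole column, which can break the pairing between an individual and its own time series.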

Error: Mat::init()

Hey there,
I tried to perform an ALASCA model using a longitudinal dataset of 1260 rows and 5 columns (colnames: ID, Timepoint, Group, Variable, value).

Call for the model:
mod <- ALASCA(df = longitudinal_data, formula = value ~ Timepoint*Group + (1|"Patient code"), scale_function = "sdt1", validate = TRUE, ignore_missing_covars = T)

Unfortunately this error comes out:

INFO  [2023-10-18 11:11:14] Initializing ALASCA (v1.0.11, 2023-06-19)
WARN  [2023-10-18 11:11:14] Guessing effects: `Timepoint+Timepoint:Group+Group`
INFO  [2023-10-18 11:11:14] Will use linear mixed models!
INFO  [2023-10-18 11:11:14] Will use Rfast!
WARN  [2023-10-18 11:11:14] Converting IDs to integer values
WARN  [2023-10-18 11:11:14] The `Timepoint` column is used for stratification
WARN  [2023-10-18 11:11:14] Converting `character` columns to factors
WARN  [2023-10-18 11:11:14] Predictor variables missing for some samples! Continue with caution!
INFO  [2023-10-18 11:11:14] Scaling data with sdt1 ...
INFO  [2023-10-18 11:11:14] Calculating LMM coefficients

Error: Mat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD

Of course, I've delved into various Stack Overflow threads, without any success.
Do you know how to deal with this error?
Best,

DB

P.S. I'm working on macOS Sonoma 14.0 on Macbook Pro 16" M1 Max
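
One thing worth checking in the call above, independent of the memory error: in an R formula, `"Patient code"` in double quotes is a string constant, not a variable, and the listed columns (ID, Timepoint, Group, Variable, value) do not include a column by that name. Non-syntactic column names need backticks. The sketch below uses only base R to show the difference; whether this is related to the `Mat::init()` error is an assumption and is not established by the report.

```r
# The formula as written in the report, and a backticked variant
f_quoted   <- value ~ Timepoint * Group + (1 | "Patient code")
f_backtick <- value ~ Timepoint * Group + (1 | `Patient code`)

# all.vars() lists the variables a formula refers to
vars_quoted   <- all.vars(f_quoted)    # the quoted string is not a variable
vars_backtick <- all.vars(f_backtick)  # the backticked name is

# Columns named in the report; find formula variables missing from the data
cols <- c("ID", "Timepoint", "Group", "Variable", "value")
missing_cols <- setdiff(vars_backtick, cols)
```

With the columns as described, `missing_cols` is `"Patient code"`, which suggests the grouping term may need to point at `ID` or at whichever column actually identifies the patient.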

Error in ALASCA with scale_function = "none"

Hi Anders!
I continue testing the fantastic ALASCA package. This time I am trying a previously normalized and standardized database, so when using the ALASCA function, I used the condition scale_function = "none". The function starts running without a problem, but when it reaches the scaling, it generates an error. I tried with the database of Example 2, and the same thing happened:

if (!file.exists(here("output/ex2.part1A/validation_IDs.csv"))) {
  mod <- ALASCA(
    df,
    formula = value ~ time + time:group + (1|ID),
    wide = TRUE,
    separate_effects = TRUE,
    equal_baseline = TRUE,
    scale_function = "none",
    filepath = here("output","ex2.part1A"),
    filename = "model.ex2.part1",
    n_validation_runs = 1000,
    validate = TRUE,
    save = TRUE,
    validation_method = "bootstrap",
    save_validation_ids = TRUE
  )
  flip(mod, effect = 1)
  plot(mod, effect = c(1,2), component = 1)
  plot(mod, effect = c(1,2), component = 2)
  plot(mod, effect = 1, component = c(1,2), type = "2D")
  plot(mod, effect = 2, component = c(1,2), type = "2D")
  plot(mod, effect = 1, component = 1, type = "validation")
  plot(mod, effect = 1, component = 2, type = "validation")
  plot(mod, effect = 2, component = 1, type = "validation")
  plot(mod, effect = 2, component = 2, type = "validation")
  plot(mod, effect = 1, component = 1, type = "histogram")
  plot(mod, effect = 2, component = 1, type = "histogram")
}
INFO [2023-02-01 12:35:03] Initializing ALASCA (v1.0.8, 2023-01-15)
WARN [2023-02-01 12:35:03] Guessing effects: time and time:group
INFO [2023-02-01 12:35:03] Will use linear mixed models!
INFO [2023-02-01 12:35:03] Will use Rfast!
INFO [2023-02-01 12:35:03] Converting from wide to long!
INFO [2023-02-01 12:35:03] Found 16 variables
WARN [2023-02-01 12:35:03] The group column is used for stratification
WARN [2023-02-01 12:35:03] Not scaling data...
Error in identity() : argument "x" is missing, with no default

According to the package, scale_function accepts the following options: none, sdall, sdref, sdreft1, sdt1. I would like to confirm whether ALASCA can indeed be run without scaling the data inside the function.
Again thank you very much for your time and excellent work.

Regards,
Cynthia

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Mexico.utf8 LC_CTYPE=Spanish_Mexico.utf8
[3] LC_MONETARY=Spanish_Mexico.utf8 LC_NUMERIC=C
[5] LC_TIME=Spanish_Mexico.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_3.4.0 ALASCA_1.0.7 data.table_1.14.6 here_1.0.1

loaded via a namespace (and not attached):
[1] log4r_0.4.3 pillar_1.8.1 compiler_4.2.2 ggpubr_0.5.0 tools_4.2.2
[6] lifecycle_1.0.3 tibble_3.1.8 gtable_0.3.1 pkgconfig_2.0.3 rlang_1.0.6
[11] DBI_1.1.3 cli_3.4.1 rstudioapi_0.14 withr_2.5.0 dplyr_1.0.10
[16] generics_0.1.3 vctrs_0.5.1 gtools_3.9.4 rprojroot_2.0.3 grid_4.2.2
[21] tidyselect_1.2.0 glue_1.6.2 R6_2.5.1 rstatix_0.7.1 fansi_1.0.3
[26] carData_3.0-5 purrr_1.0.0 tidyr_1.2.1 car_3.1-1 magrittr_2.0.3
[31] scales_1.2.1 backports_1.4.1 assertthat_0.2.1 abind_1.4-5 colorspace_2.0-3
[36] ggsignif_0.6.4 utf8_1.2.2 munsell_0.5.0 broom_1.0.2
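
The message `argument "x" is missing, with no default` is what base R raises when `identity()` is called with no argument. That suggests the "none" option resolves to `identity` but is invoked without the data, though this reading is an inference from the log and not a confirmed diagnosis. The base R sketch below shows the error and what a no-op scaler would normally return; the matrix `m` is made up.

```r
# Calling identity() with no argument reproduces the reported message
msg <- tryCatch(identity(), error = function(e) conditionMessage(e))

# Called with data, identity() hands the input back unchanged,
# which is what "no scaling" would be expected to do
m <- matrix(c(1.5, 2.0, 3.5, 4.0), nrow = 2)
unscaled <- identity(m)
```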

Problems with adding random slopes to the model.

Hi Anders,

To familiarize myself with the ALASCA package, I tried to retrieve random slopes from simulated data. Running the script:

library(lme4)
library(data.table)
library(ggplot2)
library(ALASCA)

df <- fread("[...]/data_long.csv")

res <- ALASCA(
  df,
  value ~ time + time:group + (time | sub_id),
  use_Rfast = FALSE,
  equal_baseline = FALSE,
  validate = TRUE,
  n_validate = 1000,
  effects = c("time", "time:group", "time+time:group"),
  scale_function = "sdall"
)

returns the output:

INFO [2024-06-17 16:47:01] Initializing ALASCA (v1.0.15, 2024-02-07)
INFO [2024-06-17 16:47:01] Will use linear mixed models!
ERROR [2024-06-17 16:47:01] Cannot use Rfast in this case. Use lme4 with use_Rfast = FALSE instead!
Error in private$set_method() :

It works well when I try random intercepts only (1 | sub_id).

Thanks already for the great package! Any further help, working example, or reference is much appreciated!

Cheers
Martin

Error in eigen(w1) : infinite or missing values in 'x' error when validation method is chosen

Dear Anders,

I wanted to open another issue that may help you to answer better so here you go :)

I converted my continuous age variable to a six-level categorical variable, and my ALASCA model looks like this:

my_model <- ALASCA(input,
                   value ~ time*age_factor + (1|id),
                   separate_effects = T,
                   scale_function = "sdt1",
                   equal_baseline = T,
                   plot.loading_group_column = "type",
                   plot.loading_group_label = "Amino acid class",
                   max_PC = 22, equal_baseline = T,
                   pca_function = "princomp",
                   validate = T, n_validation_runs = 90,
                   validation_method = "jack-knife")

This command works after a couple of attempts; otherwise, it gives this error:

Error in eigen(w1) : infinite or missing values in 'x'

When I replace the validation method with "bootstrap", it fails even more quickly. I tried other pca_function arguments too, but jack-knife seems to work with fewer issues.

According to your benchmark in the ALASCA publication, jack-knife seems to give smaller CIs, but the two methods do not differ significantly from each other. For example, here they preferred "bootstrapping" but here they used jack-knifing. What causes this error, and why does jack-knife seem to run more reliably than bootstrap?

Thanks!
Best regards,
Nilay
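
`eigen()` raises `infinite or missing values in 'x'` whenever the matrix it receives contains NA, NaN or Inf. One way such values can arise during resampling, offered here only as a possibility and not as a diagnosis of ALASCA's internals, is that a resample happens to contain identical baseline values for some variable, so a standard deviation used for scaling becomes 0. The numbers below are made up to illustrate that mechanism in base R.

```r
# Hypothetical variable in which the three baseline values coincide
x <- c(2, 2, 2, 5, 7)
baseline <- x[1:3]

# sd(baseline) is 0, so dividing by it produces Inf
scaled <- x / sd(baseline)

# A covariance matrix built from the scaled column is no longer finite
m <- cov(cbind(scaled = scaled, other = c(1, 2, 3, 4, 5)))

# eigen() refuses non-finite input with the error quoted in the report
msg <- tryCatch(eigen(m), error = function(e) conditionMessage(e))
```

If this mechanism were at play, bootstrap resamples (drawn with replacement) would be more likely than jack-knife subsets to produce such degenerate baselines, which would fit the pattern described above.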

object of type 'closure' is not subsettable

Hi!
I am using an untargeted metabolomics dataset with around 5000 features (variables), two time points, 44 subjects, and 4 treatment groups.
I am getting the following error:
mod <- ALASCA(
  df = final_result,
  formula = Value ~ Time + (1 | Subject), scale_function = "sdt1")
INFO [2024-03-11 13:22:11] Initializing ALASCA (v1.0.15, 2024-02-07)
WARN [2024-03-11 13:22:11] Guessing effects: Time
INFO [2024-03-11 13:22:11] Will use linear mixed models!
INFO [2024-03-11 13:22:11] Will use Rfast!
WARN [2024-03-11 13:22:11] The Time column is used for stratification
INFO [2024-03-11 13:22:11] Scaling data with sdt1 ...
Error in value[get(self$effect_terms[[1]]) == self$get_ref(self$effect_terms[[1]])] :
object of type 'closure' is not subsettable

Some of my values are 0, but they are not missing values. Otherwise, I cannot see why I would get this error.
Is it related to having many variable names?
Thank you!
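
A possible reading of the error: the failing expression refers to a column called `value` (lowercase), while the formula in the call uses `Value`. If the data has no `value` column, R looks the name up outside the table and may find a function of that name, and subsetting a function gives `object of type 'closure' is not subsettable`. The data.table sketch below imitates this with toy data; the function `value` defined here is a hypothetical stand-in, and the assumption that ALASCA expects a lowercase `value` column is inferred from the log and from the other reports, not confirmed.

```r
library(data.table)

# Toy long-format data with a capitalised response column
dt <- data.table(
  Subject = c("a", "a", "b", "b"),
  Time    = c("T1", "T2", "T1", "T2"),
  Value   = c(1, 2, 3, 4)
)

# Hypothetical stand-in for any function named `value` on the search path
value <- function(...) NULL

# No column `value` exists, so the name resolves to the function above
msg <- tryCatch(
  dt[, sd(value[Time == "T1"])],
  error = function(e) conditionMessage(e)
)

# After renaming the column, the same expression finds the data first
setnames(dt, "Value", "value")
sd_t1 <- dt[, sd(value[Time == "T1"])]
```

Renaming the response column to `value` and using `value ~ Time + (1 | Subject)` would be a cheap thing to try before looking into the number of variables.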

type="covars" gives error while plotting the ALASCA object

Hi Anders,

Thanks so much for implementing such a cool package, it's quite useful and versatile!

I have two questions. The first: when I try to plot the ALASCA object with type = "covars" in the plot function, I always get this error, with different models:

Error in factor(covars, levels = data_to_plot[order(loading), covars]) : 
  object 'covars' not found

I installed the package with devtools::install_github("andjar/ALASCA", ref = "main") command so it should get the latest updates, right?

Thanks in advance!
Best regards,
Nilay
