lieberinstitute / qsvar Goto Github PK

Quality Surrogate Variable Analysis for Degradation Correction

Home Page: http://research.libd.org/qsvaR/

R 96.18% CSS 3.67% Rez 0.14%

rstats qsva degradation human brain bioconductor

qsvar's Issues

k_qsvs todos

Similar to #9
Import num.sv https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/k_qsvs.R#L11.
Use the assayname argument like at #10.
Check that the mod_tx object is a matrix (or model.matrix output).
If it has to be a full rank model matrix, then consider adding something like https://github.com/LieberInstitute/jaffelab/blob/master/R/cleaningY.R#L83. This code comes from https://github.com/lcolladotor/derfinder/blob/e9a7ef2c79fd22167ff2bb176a95b75582341163/R/makeModels.R#L110. Double check if we need to import the qr() function.
Are we using the +1 here like at #10? Maybe they should be consistent.
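
A minimal sketch of the matrix and full-rank checks mentioned above. The helper name check_full_rank() and the toy data are hypothetical, not code from the package. Note that qr() lives in base R, so a check like this one would not need an #' @importFrom tag.

```r
# Hypothetical helper (not part of the package): stop early when the
# model matrix is not a matrix or is not full rank.
check_full_rank <- function(mod) {
    if (!is.matrix(mod)) {
        stop("'mod' must be a matrix, such as the output of model.matrix().",
            call. = FALSE)
    }
    # qr() is in base R; a full-rank matrix has rank equal to its column count.
    if (qr(mod)$rank != ncol(mod)) {
        stop("'mod' is not full rank.", call. = FALSE)
    }
    invisible(TRUE)
}

# Toy phenotype data, made up for illustration.
pd <- data.frame(
    dx = factor(c("case", "control", "case", "control")),
    age = c(30, 45, 52, 61)
)
mod_ok <- model.matrix(~ dx + age, data = pd)
check_full_rank(mod_ok)

# Duplicating a column makes the matrix rank deficient.
mod_bad <- cbind(mod_ok, age_copy = pd$age)
res <- tryCatch(check_full_rank(mod_bad), error = function(e) conditionMessage(e))
res
```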

make wrapper function

create wrapper function for generating qsvs

getBonfTx todos

Similar ones to #9
Note that you expect to have a tpm assay at https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/getBonfTx.R#L10. Instead, you could add an argument called assayname or something like it. See how they use dimred at https://github.com/Alanocallaghan/scater/blob/master/R/plotReducedDim.R#L90, which means that at https://github.com/Alanocallaghan/scater/blob/master/R/plotReducedDim.R#L79 people have to specify the name of the assay (in this case a reducedDim) that they want to use. I'd recommend something like assayname = "tpm" such that the default is what you'd expect. You'll have to change assays(covComb_tx)$tpm to something that works with an object that contains the name, that is, something like https://github.com/LieberInstitute/recount3/blob/master/R/transform_counts.R#L82. You'll also want to check that the assayname exists, like at https://github.com/LieberInstitute/recount3/blob/master/R/transform_counts.R#L77.
Properly import prcomp https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/getBonfTx.R#L10. So add the #' @importFrom package prcomp syntax where package is the actual package where this function lives. Then use usethis::use_package("package") to edit the DESCRIPTION file #1.
I forget if we need to do the same with log2(), likely not. But if we do, we'll know thanks to #4.
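
A rough sketch of the assayname idea, using a plain named list as a stand-in for the assays of an RSE so that it runs with base R only. The function name pca_from_assay() and the log2(x + 1) transform are assumptions for illustration. With a real RangedSummarizedExperiment, SummarizedExperiment::assayNames() and SummarizedExperiment::assay() would play the roles of names() and [[ here.

```r
#' @importFrom stats prcomp
# Hypothetical function: 'assays' is a named list of matrices
# (features x samples) standing in for the assays of an RSE object.
pca_from_assay <- function(assays, assayname = "tpm") {
    # Fail with an informative error if the requested assay is missing.
    if (!assayname %in% names(assays)) {
        stop(
            "'", assayname, "' is not one of the available assays: ",
            paste(names(assays), collapse = ", "),
            call. = FALSE
        )
    }
    # Assumed transform: log2(x + 1), then PCA with samples as rows.
    prcomp(t(log2(assays[[assayname]] + 1)))
}

set.seed(20220315)
toy_assays <- list(tpm = matrix(rexp(40), nrow = 10, ncol = 4))
pca <- pca_from_assay(toy_assays)
dim(pca$x)
```

Because the default is assayname = "tpm", existing calls keep working while users with differently named assays can override it.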

Re-check David Zhang's LIBD rstats club video and documentation on testthat. We might want to use edition 3 like you have at https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/DESCRIPTION#L18 but I forget if that's what we've used elsewhere. Anyway, could be good to learn the new edition.

Update NAMESPACE

Updating https://github.com/LieberInstitute/DeDHed/blob/main/NAMESPACE is done automatically through https://github.com/LieberInstitute/DeDHed/blob/main/dev/04_update.R#L24-L25. Remember to keep it updated, or you'll run into issues with #4 .

Add unit tests

The video by David Zhang on the latest version of unit tests might be useful https://youtu.be/ClAin7vTwq0.

The R Packages book has a great chapter on unit tests https://r-pkgs.org/tests.html.

You'll want to use usethis::use_test().

get_qsvs todos

Similar to #9
Change 1:k to seq_len(k) at https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/get_qsvs.R#L11, since 1:k counts backwards when k is 0 or negative:

> seq_len(0)
integer(0)
> 1:0
[1] 1 0
> seq_len(-3)
Error in seq_len(-3) : argument must be coercible to non-negative integer
> 1:(-3)
[1]  1  0 -1 -2 -3

We could add internally (without exporting) the getPcaVars() code from jaffelab since we don't have plans to make this package available on CRAN/GitHub anytime soon. Aka, don't use an #' @export tag on the getPcaVars() code unlike https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/get_qsvs.R#L7.
Don't use print() https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/get_qsvs.R#L12. Use message() (although check how it looks), since messages can be suppressed with suppressMessages( foo() ). You could also consider adding a verbose = TRUE argument like at https://github.com/LieberInstitute/recount3/blob/master/R/create_rse_manual.R#L131-L137.
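
The seq_len(), message() and verbose suggestions above could be combined roughly as follows. The function report_pcs() is a hypothetical toy, not the actual get_qsvs() code.

```r
# Hypothetical toy function showing seq_len() plus an optional message().
report_pcs <- function(k, verbose = TRUE) {
    # seq_len(0) is empty, so the loop body is skipped when k is 0.
    for (i in seq_len(k)) {
        if (verbose) message("Processing PC ", i)
    }
    invisible(k)
}

report_pcs(2)
report_pcs(0)

# Users can silence the output either way.
suppressMessages(report_pcs(2))
report_pcs(2, verbose = FALSE)
```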

Figure 1.

add abcd
move correlation value inside the plot

[BUG] Document when users need to specify set.seed() to ensure reproducibility of their results

Document that users need to use set.seed() before qSVA() and other functions.

Document qSVA() examples
Document get_qsvs() examples
Update documentation on the vignette
Check whether other functions are also non-deterministic and thus need to have their examples updated
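
As a generic illustration of why the documentation needs set.seed(), here is a base R sketch with a made-up stochastic function, noisy_estimate(). It does not call any qsvaR function.

```r
# Made-up function whose result depends on the random number generator.
noisy_estimate <- function(n = 5) {
    mean(rnorm(n))
}

# Setting the same seed before each call gives identical results.
set.seed(20230621)
a <- noisy_estimate()
set.seed(20230621)
b <- noisy_estimate()
identical(a, b)

# Without resetting the seed, the next call uses a different RNG state.
d <- noisy_estimate()
identical(a, d)
```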

Provide access to all statistical model results

We will likely deposit to Bioconductor's ExperimentHub the full output from limma and the full RangedSummarizedExperiment object for the 119 degradation samples used in this study. Doing so will allow users to re-use the data from this study in different ways.

We will also add an equivalent function to spatialLIBD::fetch_data() in order to download this data.

Design a 2 day workshop / short course

Develop a long format workshop to teach in 2 days the basics of differential expression analysis using RNA-seq data from postmortem human brain, which is affected by degradation. To make this workshop more general, it will also cover how you can select transcripts associated with degradation if you were to generate the relevant data.

We might want to learn about the format used by https://carpentries.org/index.html for short courses prior to designing this short course / long workshop. The target audience would be users who have some basic familiarity with Bioconductor, SummarizedExperiment, limma and/or similar tools.

Update DESCRIPTION

Title
Authors info: check other packages like megadepth for example.
Description: about 3 sentences. Should end with a period. Note the spacing (see other packages).
Add other biocViews terms. Aim for at least 5 terms.

Figure 4

facet grid
labels
tile plot
variable names need to be cleaned up

Add DEqual plots to package

Use code for creating DEqual plot in /dcl01/lieber/ajaffe/lab/degradation_experiments/Joint/all/SCRIPTS/qsva_purr.R to create a helper function for assessing the extent of degradation.

Guide for using helper functions to select transcripts associated with degradation in new subsets or new datasets

We will write a new user guide (vignette) documenting how to use the functions from #36 and verify that they do reproduce the statistical modeling results provided in #32. This guide will help users apply qsvaR to other tissues / organisms if they generate the required experimental data.

Update NEWS.md

You could do this much later, once you have a close to finalized initial version of the package that you are ready to submit to Bioconductor

https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/NEWS.md

qsvaR workflow overview image

We plan to improve the qsvaR workflow image and further expand the related background documentation for understanding the different steps of the process, what data you need, what are the outputs of qsvaR, and how you can use them for downstream analyses.

Helper functions for reproducing statistical modeling results that can be applied to subsets of the available data or new degradation datasets

We will write helper functions for reproducing the "main" and "interaction" model results based on the data provided in #32. These functions can then be applied to subsets of the data to compute new statistical results that users might want to use as input (related to #33). Alternatively, these functions could be applied to new degradation data from either new experiments carried out at LIBD or data generated elsewhere for other tissues / organisms.

Video guides for qsvaR

We want to create a collection of short videos (likely shared on YouTube) demonstrating how to use the different features of qsvaR. We will collate these videos into a new user guide (a new vignette). These videos will be helpful to explain the different components related to qsvaR such as the experimental design for RNA degradation experiments, the selection of transcripts associated with degradation (related to #36 #37), computing the qSVs, and using the qSVs in downstream analyses (related to #35).

Some of these videos will be inspired by actual use cases we hear from our users using publicly available data.

This will be an evolving process as new videos will have to be made to reflect new features added to qsvaR.

Interactive tool for assessing confounding of DE results with RNA degradation

We will build an interactive website, likely powered by shiny, such that users can upload their differential expression results at the gene-level and make DEqual plots to assess if their results appear to be confounded by RNA degradation.

This tool (website) will enable users to check publicly available differential expression results and identify genes which could potentially be false positives. The website will generate an automated report that can then be shared with collaborators. The function for making this report will also be available as an R function that can be used in a non-interactive way.

If possible (due to memory constraints), this tool will also support exon-level, transcript-level, and/or exon-exon junction level DE results.

Consider adding a GHA workflow

Document how to use Salmon and SPEAQeasy output in the vignette

Explore whether we can use the example data from tximport to create an RSE from Salmon transcript count output files and run the qSVA functions on it.

See:

@Nick-Eagles might be able to help compare the code from tximport and/or rnaseqDTU with what we have in SPEAQeasy.

You could also use the SPEAQeasy-example data (see https://github.com/LieberInstitute/SPEAQeasy-example/blob/master/pipeline_outputs/count_objects/rse_tx_Jlab_experiment_n42.Rdata) to document how to import those files and use them for this package.

Switch to using rlang and cli

We want to use rlang and cli to improve the user experience. See https://lcolladotor.github.io/jhustatcomputing2023/posts/18-debugging-r-code/#errors-%C3%A0-la-tidyverse for a quick intro into how these functions improve error messages.

Update Variable names to be user friendly

for example covComb_tx

Check name with others

getDegTx todos

Document the rse_tx argument (parameter) at https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/getDegTx.R#L3. Since this takes as input a RangedSummarizedExperiment object, you'll likely want to use something like https://github.com/LieberInstitute/recount3/blob/master/R/create_rse_manual.R#L14 so you can link to the documentation of RSE objects.

See https://github.com/LieberInstitute/recount3/blob/master/R/create_rse_manual.R#L3-L6 for another example of how I use it in a sentence.

Explain what the function returns at https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/getDegTx.R#L5. So that is, what is covComb_tx at https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/getDegTx.R#L11. Is it a matrix? data.frame? an RSE? What does it contain?
Add a title at https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/getDegTx.R#L1. It cannot end with a period or #4 will throw an error. See https://github.com/LieberInstitute/recount3/blob/master/R/create_rse_manual.R#L1 for an example.
Add a description paragraph like at https://github.com/LieberInstitute/recount3/blob/master/R/create_rse_manual.R#L2-L6 (starts with an empty line to differentiate it from the title). This one should end with a period, or you'll get an error at #2. Note that if you add a 3rd paragraph, it'll become the details section, like at https://github.com/LieberInstitute/recount3/blob/master/R/create_hub.R#L8-L11.
Make sure that we tell R to import SummarizedExperiment. Since users will interact a lot with RSE objects, you might want to depend instead of import the SummarizedExperiment package. So that involves editing #1 with https://github.com/LieberInstitute/recount3/blob/master/DESCRIPTION#L36-L37 which you can do with usethis::use_package("SummarizedExperiment", "depends") (or something like that). If you'd prefer to import, then you can use some examples from https://github.com/LieberInstitute/recount3/blob/master/R/create_rse_manual.R#L17-L23. For https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/getDegTx.R#L10, that would be #' @importFrom SummarizedExperiment "["
Currently your function uses sig_transcripts which is not specified by the user. Is that an object that lives inside the package? That is, data contained in the package? If we are choosing between multiple models, you might want to make a helper function that chooses between a few models. See https://github.com/LieberInstitute/recount3/blob/master/R/annotation_options.R for an example. Note the use of the match.arg() function at https://github.com/LieberInstitute/recount3/blob/master/R/annotation_options.R#L17 which helps give useful errors to users if the input is not among the allowed options at https://github.com/LieberInstitute/recount3/blob/master/R/annotation_options.R#L16. See how I use annotation_options() in other functions like at https://github.com/LieberInstitute/recount3/blob/master/R/create_rse_manual.R#L120
Add examples to showcase how this function is used. So add code below https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/R/getDegTx.R#L8. Like https://github.com/LieberInstitute/recount3/blob/master/R/annotation_options.R#L13-L15 or https://github.com/LieberInstitute/recount3/blob/master/R/create_rse_manual.R#L30-L112
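
A small sketch of the match.arg() pattern described above for choosing between models. The helper name model_options() and the option names other than "cell_component" are illustrative assumptions.

```r
# Hypothetical helper: the allowed options are the default value of 'type'.
# Only "cell_component" appears in the package code quoted in these issues;
# "main" and "interaction" are placeholders for illustration.
model_options <- function(type = c("cell_component", "main", "interaction")) {
    # match.arg() returns the first option by default and throws an
    # informative error if 'type' is not among the allowed values.
    match.arg(type)
}

model_options()
model_options("main")

# An invalid option produces an error listing the valid choices.
err <- tryCatch(model_options("not_a_model"), error = function(e) conditionMessage(e))
err
```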

Expand functionality for downstream analyses

We plan to add helper functions for visualizing the relationship between qSVs and other covariates, as is typically done in several analyses. This will likely involve making heatmaps with ComplexHeatmap. We will study what qsvaR users have done for other analyses when designing these helper functions.

[Feature Request] Support ENSEMBL IDs

We should add an is_gencode = TRUE default argument such that when it's set to FALSE, it matches using ENSEMBL IDs instead of Gencode IDs. That is, it removes the trailing version suffix (.[0-9]) from the IDs.

This should include a unit test that checks the results using the same data with Gencode IDs, then manually makes them ENSEMBL IDs, and checks that with is_gencode = FALSE we get exactly the same results (might have to use set.seed() on this unit test).
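
One possible way to strip the version suffix, as a sketch only. The function name normalize_tx_ids() is hypothetical, and the regular expression assumes the version is a dot followed by digits at the end of the ID.

```r
# Hypothetical helper: drop the trailing ".<digits>" version from Gencode IDs.
normalize_tx_ids <- function(ids, is_gencode = TRUE) {
    if (is_gencode) {
        return(ids)
    }
    # Remove a dot followed by one or more digits at the end of each ID.
    sub("\\.[0-9]+$", "", ids)
}

gencode_ids <- c("ENST00000456328.2", "ENST00000450305.2")
normalize_tx_ids(gencode_ids)
normalize_tx_ids(gencode_ids, is_gencode = FALSE)
```

A unit test along the lines described above could compare results obtained from the original IDs against results obtained from the stripped IDs with is_gencode = FALSE.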

Code Chunk comments

add comments to code in vignette

Add a guide highlighting differences vs the qSVA 2017 method published in PNAS

We plan to add a guide (aka vignette) describing the differences between qsvaR and the original qSVA method published in PNAS in 2017. This guide will include diagrams to help understand the differences between the two approaches.

Figure 2

telegraph subsetting
Main and interaction should be horizontal
label arrows
pair line plot with matrices

Document concepts and link to external resources for more detail

We need to add more background such that all terms are defined. We don't need to do all the writing ourselves since we can link to external resources such as http://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html from http://bioconductor.org/packages/release/workflows/html/RNAseq123.html or sections from the vignette of http://bioconductor.org/packages/release/workflows/html/rnaseqGene.html. There's also http://bioconductor.org/packages/release/bioc/html/ExploreModelMatrix.html and http://research.libd.org/SPEAQeasy/.

Package data

will need help making the data in data/ available to the package so that the examples work.

Translate course and intro documentation to Spanish

Translate to Spanish the short workshop #38 and other intro level materials to help increase access to these technologies to Spanish-speaking individuals. This will help increase diversity of our user base and stimulate use of these analytical technologies in other parts of the world.

K_qsvs coverage

a test that shows full rank error
a test that shows low expression error (sva doesn't work)

Update citation

Once you have a title for the package (from #1) and the updated authors list (also from #1), then update the CITATION file. See the spatialLIBD citation file as an example (https://github.com/LieberInstitute/spatialLIBD/blob/master/inst/CITATION).

https://github.com/LieberInstitute/DeDHed/blob/53103928435e2fdade736de338d52bb08cc3ec36/inst/CITATION

Note that you'll likely want to decide on the package name (#2) first since it'll likely be part of the title.

Enhance flexibility of transcript selection

We want to make the functions in qsvaR flexible enough for users who want to use the top 10 transcripts from the "main model" or some other subset of the data provided in #32. We already did something like this for the analysis described at https://doi.org/10.1101/2023.11.08.23298172.

Guide for evaluating confounding of DE results with RNA degradation

We will write a user guide (vignette) describing how to use the functions generated for #41 as well as videos showcasing how to use the interactive website. Thus users will be able to either use the interactive website or the R functions provided in qsvaR for this type of analysis.

Update README files

Once you have a main function you want to showcase, update the Example section at https://github.com/LieberInstitute/DeDHed/blob/main/README.Rmd#L42-L65.

Once you have updated the main description of the package (the paragraph at #1), then you can update the goal at https://github.com/LieberInstitute/DeDHed/blob/main/README.Rmd#L23.

To update the README.md file, use https://github.com/LieberInstitute/DeDHed/blob/main/dev/04_update.R#L29.

Figure 3

coord equal
facet wrap
black line is hard to see (maybe red)
correlation value moved on to plot
density graphs need a scale
MAIN VS INT
old version of plot is clearer.
A bar plot is clearer to show region distribution

[BUG] Complete qSVA() wrapper

It seems like a bug that sig_transcripts at

qsvaR/R/qSVA.R

Line 23 in 498143f

 qSVA <- function(rse_tx, type = "cell_component", sig_transcripts = select_transcripts(type), mod, assayname) { 

is not used later in the qSVA() function.
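
A toy sketch of how a default argument that depends on another argument can be passed through a wrapper. All names here (toy_select_transcripts(), toy_wrapper(), the transcript labels) are made up and this is not the actual qSVA() implementation.

```r
# Made-up selector standing in for select_transcripts().
toy_select_transcripts <- function(type) {
    switch(type,
        cell_component = c("tx1", "tx2", "tx3"),
        stop("Unknown type: ", type, call. = FALSE)
    )
}

# The default for 'sig_transcripts' is evaluated lazily from 'type'.
# The body uses 'sig_transcripts' itself, so a user-supplied vector is honored.
toy_wrapper <- function(expr, type = "cell_component",
    sig_transcripts = toy_select_transcripts(type)) {
    expr[rownames(expr) %in% sig_transcripts, , drop = FALSE]
}

expr <- matrix(
    1:8,
    nrow = 4,
    dimnames = list(c("tx1", "tx2", "tx3", "tx4"), c("s1", "s2"))
)
rownames(toy_wrapper(expr))
rownames(toy_wrapper(expr, sig_transcripts = "tx4"))
```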

Figure 5

add dot size to legend
move to ggplot

Guide for downstream analyses

We will write a user guide (vignette) for downstream analyses using qSVs generated with qsvaR. This guide will showcase the helper functions from #34. Potentially not all the code will be evaluated given testing restrictions on Bioconductor, though we will explore the use of "long tests". The goal of this user guide is to help orient users on what analyses they should likely run once they have estimated qSVs. This will also help users determine whether they should use qSVs in their analyses or not, for example, if they are analyzing data from a brain region or another tissue for which no degradation data exists.

Complete select_transcripts()

You might want to use code like this to create the vector of transcript IDs.

x <- letters
x
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"
cat(paste0('c("', paste(x, collapse = '", "'), '")'))
#> c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z")

Created on 2022-03-15 by the reprex package (v2.0.1)

Then you can copy paste it into

qsvaR/R/select_transcripts.R

Line 24 in 13f67ee

return("TODO")
