
dge_workshop_salmon_online's People

Contributors

amelie-tghn, eberdan, gammerdinger, hackdna, hwick, jihe-liu, kant, marypiper, mistrm82, rkhetani

dge_workshop_salmon_online's Issues

design formula order

Look at the language in the lesson and in 5b (Wald test results).

The order of terms in the design formula doesn't matter, but as a best practice keep the main factor of interest last, because certain defaults use the last term.
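
A minimal sketch of one such default, assuming DESeq2 is installed and using its simulated `makeExampleDESeqDataSet()` data rather than the workshop dataset. The `batch` column is a hypothetical covariate added only for illustration. When `results()` is called without `contrast` or `name`, it reports the comparison for the last variable in the design, which is one reason to keep the main factor last.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 12 samples with a two-level 'condition' factor (A, B)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Hypothetical covariate, alternating so it is not confounded with condition
dds$batch <- factor(rep(c("b1", "b2"), times = 6))

# Main factor of interest (condition) placed last in the design
design(dds) <- ~ batch + condition
dds <- DESeq(dds, quiet = TRUE)

# Coefficients available for extraction
resultsNames(dds)

# No contrast given: results() falls back to the last design variable
res_default <- results(dds)
head(mcols(res_default)$description, 2)
```

If the covariate were placed last instead, the same bare `results(dds)` call would describe the batch comparison, so an explicit `contrast` or `name` argument is the safer habit either way.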

cnet plot size warning/legend, lesson text does not match plot output

https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/10_FA_over-representation_analysis.html

When working with my own data set, when running the cnetplot code I get this warning:

Scale for size is already present.
Adding another scale for size, which will replace the existing scale.

I noticed that the legend for size is clearly not pvalue (integer numbers > 1), which is what the instructions say. Similarly, the example plot in the lesson has integers > 1, so they clearly aren't p values either. When looking at the help page for cnet plot, categorySize isn't a listed argument. If I plot without this argument, the plot looks exactly the same (and oddly, I still get the warning). The "size" appears to represent the number of significant genes in the GO term.

Additionally there is this warning:

Warning message:
In cnetplot.enrichResult(x, ...) :
  Use 'color.params = list(foldChange = your_value)' instead of 'foldChange'.
 The foldChange parameter will be removed in the next version.

It sounds like aspects of this function have changed since these lessons were written and need updating. Regardless, the lesson text is inaccurate even for the plot image currently shown:

"Finally, the category netplot shows the relationships between the genes associated with the top five most significant GO terms and the fold changes of the significant genes associated with these terms (color). The size of the GO terms reflects the pvalues of the terms, with the more significant terms being larger. This plot is particularly useful for hypothesis generation in identifying genes that may be important to several of the most affected processes."

This should be changed to reflect that node size actually represents the number of significant genes in the GO terms.

apeglm stat removed

Hey all :)

Maybe you have seen this already, but I just noticed that they have removed the Stat column from the DESeq2 results output after shrinkage using apeglm. The reason for this is well-described by Mike here: https://support.bioconductor.org/p/129277/. Not sure if this is necessary to add to the lesson, but at least just a heads-up.
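
A small sketch for checking this locally, assuming DESeq2 and apeglm are installed. It uses DESeq2's simulated example data, not the workshop dataset, and the object names are illustrative only.

```r
library(DESeq2)

set.seed(1)
# Simulated dataset with a two-level 'condition' factor (A, B)
dds <- makeExampleDESeqDataSet(n = 500, m = 6)
dds <- DESeq(dds, quiet = TRUE)

# Unshrunken results for comparison
res_unshrunken <- results(dds, name = "condition_B_vs_A")

# Shrunken log2 fold changes using apeglm
res_shrunken <- lfcShrink(dds, coef = "condition_B_vs_A",
                          type = "apeglm", quiet = TRUE)

# Columns present before shrinkage but missing afterwards
setdiff(colnames(res_unshrunken), colnames(res_shrunken))
```

Any lesson code that selects or sorts on the `stat` column would need to use the unshrunken table for that step.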

gseKEGG no longer recommends nperm, and other issues

gseaKEGG <- gseKEGG(geneList = foldchanges, # ordered named vector of fold changes (Entrez IDs are the associated names)
                    organism = "hsa", # supported organisms listed below
                    nPerm = 1000, # default number permutations
                    minGSSize = 20, # minimum gene set size (# genes in set) - change to test more sets or recover sets with fewer # genes
                    pvalueCutoff = 0.05, # padj cutoff value
                    verbose = FALSE)
no term enriched under specific pvalueCutoff...
Warning messages:
1: In .GSEA(geneList = geneList, exponent = exponent, minGSSize = minGSSize,  :
  We do not recommend using nPerm parameter incurrent and future releases
2: In fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize = minGSSize,  :
  You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument in the fgsea function call.

This also complains that there are no terms enriched under my pvalueCutoff, but there are definitely terms with < 0.05

clusterProfiler_4.8.2
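
Following the second warning, one possible update is simply to drop the `nPerm` argument. The sketch below assumes clusterProfiler and DOSE are installed and that KEGG can be reached over the network. Because the reporter's `foldchanges` vector is not available here, DOSE's bundled `geneList` (a sorted, Entrez-named vector) is used as a stand-in.

```r
library(clusterProfiler)

# Stand-in for the 'foldchanges' vector from the issue:
# a decreasingly sorted numeric vector named by Entrez ID
data(geneList, package = "DOSE")
foldchanges <- geneList

# Same call as in the issue, with the nPerm argument removed
gseaKEGG <- gseKEGG(geneList = foldchanges,
                    organism = "hsa",
                    minGSSize = 20,
                    pvalueCutoff = 0.05,
                    verbose = FALSE)

# Inspect the top of the result table
head(as.data.frame(gseaKEGG)[, c("ID", "Description", "NES", "p.adjust")])
```

Whether this also resolves the "no term enriched" message for the reporter's data is not confirmed by the issue and would need to be checked with the workshop dataset.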

update FA lessons

newer approaches, methods?
Remove some of the older methods we don't really use?

Creating annotation file tx2gene for NCBI human transcriptome

Hi,

thanks a lot for the fantastic workshop for DGE analyses, I really enjoy it and learned a lot. :)

I am now trying to run analyses with my own data. For previous SNP analyses and now the Salmon quantification I used the NCBI RefSeq Transcripts FASTA (https://www.ncbi.nlm.nih.gov/genome/guide/human/). Thus, I am trying to build my tx2gene annotation file from the NCBI annotation. Would you recommend a particular ah$dataprovider to query? Is there anything else I should adapt or watch out for, compared to the presented workflow using ensembldb?

Thanks a lot in advance for your help. :)

Best wishes, Ella

a note for the warning in dotplot

Many people got this warning when running the dotplot command:

wrong orderBy parameter; set to default orderBy = "x"

Maybe this will help?

dotplot(myResults, showCategory=15, orderBy="GeneRatio")

paring down the dispersion lesson

The lesson is a bit text heavy and could use some paring down with the use of bullet points, sub-sectioning etc

@mistrm82 has some ideas that she will implement in a branched version and do a pull request for review from the team

Bug Fixes

  1. Add apeglm install from BiocManager to the instructions

  2. ensembldb now has a filter function that can mask dplyr's, so be sure to put dplyr:: in front of filter commands, as on the "Summarizing results and extracting significant gene lists" page:
    Change:

sigOE <- res_tableOE_tb %>%
        filter(padj < padj.cutoff)

To:

sigOE <- res_tableOE_tb %>%
        dplyr::filter(padj < padj.cutoff)

And on the Visualization page:
Change:

norm_OEsig <- normalized_counts[,c(1:4,7:9)] %>% 
  filter(gene %in% sigOE$gene) 

To:

norm_OEsig <- normalized_counts[,c(1:4,7:9)] %>% 
  dplyr::filter(gene %in% sigOE$gene) 

Orgdb note didn't work in Gene annotation

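This is a draft of how to fix it:
library(AnnotationHub)
# Connect to AnnotationHub (creates the 'ah' object used below)
ah <- AnnotationHub()
# Check which updated database records are available
query(ah, 'org.Hs.eg.db.sqlite')
human_orgdb <- query(ah, c("Homo sapiens", "OrgDb"))
# Retrieve one record by its ID (ID taken from this draft; it may change between hub snapshots)
test <- human_orgdb[["AH111575"]]
test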

pathview doesn't work

No error. Just does not produce any graph

> pathview(gene.data = foldchanges,
+          pathway.id = "hsa03008",
+          species = "hsa",
+          limit = list(gene = 2, # value gives the max/min limit for foldchanges
+                       cpd = 1))
'select()' returned 1:1 mapping between keys and columns
Info: Working in directory /Users/hew416/Library/CloudStorage/OneDrive-HarvardUniversity/Desktop/DEanalysis
Info: Writing image file hsa03008.pathview.png

pathview_1.40.0
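
The log above reports that an image file was written to the working directory, which suggests the figure is saved to disk rather than drawn in the plot pane. A minimal base R sketch for confirming this, assuming the working directory has not changed since the `pathview()` call (the file name is taken from the log line):

```r
# File name taken from the log line "Writing image file hsa03008.pathview.png"
out_file <- file.path(getwd(), "hsa03008.pathview.png")

if (file.exists(out_file)) {
  # Report where the figure was saved and how large it is
  message("pathview output found: ", out_file,
          " (", file.size(out_file), " bytes)")
} else {
  message("No pathview output found in ", getwd())
}
```

If the file is there, the lesson may only need a sentence telling students to open the PNG from their project folder.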

gseGO error

Error about no term enriched.

Should we look into this?

update the demo code at bottom of viz lesson

The code needs either a note saying that it is demo code only and should not be run

OR

See if the code below works with the workshop dataset and update it to this:

DEGreport::degPlot(dds = dds, res = res_tableOE, n = 20, xs = "sampletype", group = "sampletype")
DEGreport::degVolcano(
  data.frame(res_tableOE_tb[,c("log2FoldChange","padj")]), # table - 2 columns
  plot_text = data.frame(res_tableOE_tb[1:10,c("log2FoldChange","padj", "symbol")]))
# Available in the newer version for R 3.4

Day 3 Exercises

There are no exercises for Day 3, but since we no longer have PollEverywhere, we need to create a Day 3 Google poll where questions can be posted.

WGCNA link missing from lesson about network analysis

Apparently the link is broken because Horvath left UCLA and is no longer paying for the website.

Our best bet is probably to replace it with an Internet Archive link to the original, here: https://web.archive.org/web/20230323144343/horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/

More info about the disappearance and some other links here:
https://www.reddit.com/r/bioinformatics/comments/1cr7m9j/what_happened_to_the_wgcna_tutorial/

count normalization figure

The figure with genes X, Y, and Z is misleading, as Z has the fewest reads but is the longest. Since this figure is only designed to visualize differences caused by library size, it is better to remove Z and show only X and Y. The length issue is discussed in the next figure.

Formulas in self-learning broken

"I am self-learning bioinformatics using the awesome material prepared by your training program.

For the topic:

  • "Differential Gene Expression Analysis (bulk RNA-seq Part II)";
  • Part III (DESeq2);
  • Section 1. Description of steps for DESeq2,

The formulas for "How is the dispersion value derived?" are not displayed.
Just thought I would let you know."

LRT lesson refers to objects created in previous lessons which students may not have loaded up

dds_lrt is originally created in Multiple test corrections with:
dds_lrt <- DESeq(dds, test="LRT", reduced = ~ 1)
It is referred to later in the LRT lesson.

rld_mat is originally created in Visualizations self learning:

rld <- rlog(dds, blind=TRUE)
rld_mat <- assay(rld)

It is also referred to later in the LRT lesson.

We should either add a note for how to recreate the objects, or instruct students to save (and load) their environment for each lesson. I think it is easier to just add a note for how to recreate the objects since environments can get confusing and might be conceptually new for students (we don't introduce the concept)
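
One way such a note could look, as a sketch only. It assumes DESeq2 is installed and recreates each object only if it is missing from the environment. The simulated `dds` at the top is a stand-in so the snippet runs on its own; in the workshop, `dds` would already exist from the earlier lessons.

```r
library(DESeq2)

# Stand-in only: in the workshop, 'dds' already exists from earlier lessons
if (!exists("dds")) {
  set.seed(1)
  dds <- makeExampleDESeqDataSet(n = 500, m = 6)
}

# Recreate the LRT object (from "Multiple test corrections") if missing
if (!exists("dds_lrt")) {
  dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ 1)
}

# Recreate the rlog matrix (from the Visualizations lesson) if missing
if (!exists("rld_mat")) {
  rld <- rlog(dds, blind = TRUE)
  rld_mat <- assay(rld)
}
```

The `exists()` guards keep the note safe to paste at the top of the LRT lesson without overwriting objects students have already built.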
