hbctraining / dge_workshop_salmon_online

Home Page: https://hbctraining.github.io/DGE_workshop_salmon_online/

R 1.36% HTML 98.61% SCSS 0.03%

dge_workshop_salmon_online's Issues

Add note about knockout genes continuing to be expressed

Broken Link

https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/pathway_topology.html

On this page, the link in the sentence below does not point to the correct blog post. The correct link has no "www." in it.

"The blog post from Getting Genetics Done provides a step-by-step procedure for using and understanding SPIA."

Correct Link

create pull down descriptions for self-learning

the schedule page needs to be updated with these

paring down the dispersion lesson

The lesson is a bit text heavy and could use some paring down with bullet points, sub-sectioning, etc.

@mistrm82 has some ideas that she will implement in a branched version, followed by a pull request for review from the team.

update heatmap image

Change data visualization heatmap image in lesson with a more recent one

design formula order

look at language in lesson and 5b wald test results

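The order of terms in the design formula doesn't matter for the model itself, but as best practice keep the main factor of interest last, because certain defaults depend on which term comes last.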
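A possible illustration for the lesson text, using only base R and a made-up metadata table (the `sex` and `sampletype` columns and their levels are assumptions for the example). It shows that reordering terms keeps the same coefficients but changes which one comes last; as far as I understand the DESeq2 documentation, `results()` falls back to the last variable in the design when no `contrast` or `name` is given.

```r
# Hypothetical metadata: sampletype is the factor of interest, sex is a covariate
meta <- data.frame(
  sex = factor(c("F", "M", "F", "M", "F", "M")),
  sampletype = factor(c("control", "control", "control",
                        "treated", "treated", "treated"))
)

# Same terms, different order
mm_main_last  <- model.matrix(~ sex + sampletype, data = meta)
mm_main_first <- model.matrix(~ sampletype + sex, data = meta)

# Both designs contain the same coefficients, so the fitted model is equivalent
same_columns <- setequal(colnames(mm_main_last), colnames(mm_main_first))

# Only the position of the coefficients differs
last_coef_main_last  <- tail(colnames(mm_main_last), 1)
last_coef_main_first <- tail(colnames(mm_main_first), 1)

print(same_columns)
print(last_coef_main_last)
print(last_coef_main_first)
```

With the main factor last, the final coefficient is the sampletype effect; with it first, the final coefficient is the covariate, which is the situation the best-practice note is meant to avoid.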

Formulas in self-learning broken

"I am self-learning bioinformatics using the awesome material prepared by your training program.

For the topic:

"Differential Gene Expression Analysis (bulk RNA-seq Part II)";
Part III (DESeq2);
Section 1. Description of steps for DESeq2,

The formulas for "How is the dispersion value derived?" are not displayed.
Just thought I would let you know."

updates to FA lessons

Remove things we don't use anymore (SPIA?).

Also add new visualizations (suggestions from @hwick's code)

Figure with X, Y, and Z genes is misleading as Z has the fewest reads but is the longest. As this figure is only designed to visualize differences caused by library size it is better to remove this and only show X and Y. The length issue is discussed in the next figure.
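If it helps when redrawing the figure, here is a small base-R sketch of the library-size point with only genes X and Y. The counts are invented, and the scaling is plain counts-per-million, not the median-of-ratios method DESeq2 uses; it is only meant to show that a twofold difference in depth disappears after scaling.

```r
# Hypothetical raw counts: sampleB was sequenced at twice the depth of sampleA
counts <- matrix(c(10, 20,
                   30, 60),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("X", "Y"), c("sampleA", "sampleB")))

# Library size = total counts per sample
lib_size <- colSums(counts)

# Divide each column by its library size and rescale to counts per million
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

print(lib_size)
print(cpm)
```

Before scaling, both genes look twice as high in sampleB; after scaling, the two samples agree for X and for Y.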

Fix ggplot2 link

The ggplot2 link on the schedule page is not working.

Create project with data and folders

In the setup lesson (https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/01b_DGE_setup_and_overview.html), provide the project that already contains data and folders, so that there is no need to download data.

Creating annotation file tx2gene for NCBI human transcriptome

Hi,

thanks a lot for the fantastic workshop for DGE analyses, I really enjoy it and learned a lot. :)

I am now trying to run analyses with my own data. For previous SNP analyses and now the Salmon quantification I used the NCBI RefSeq Transcripts FASTA (https://www.ncbi.nlm.nih.gov/genome/guide/human/). Thus, I am trying to build my tx2gene annotation file from the NCBI annotation. Would you have a recommendation, which ah$dataprovider to query? Is there anything else I should adapt/ keep my eyes on, compared to the presented workflow using ensembldb?

Thanks a lot in advance for your help. :)

Best wishes, Ella
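One possible route, sketched here under assumptions rather than as a recommendation from the lesson authors: instead of querying AnnotationHub, build tx2gene directly from the attribute column of the GTF that accompanies the RefSeq transcripts. The sketch assumes the attributes carry `gene_id "..."` and `transcript_id "..."` pairs; the three lines and the helper `extract_attr` are made up for illustration.

```r
# Hypothetical attribute strings in the style of a GTF ninth column
attrs <- c(
  'gene_id "GENE_A"; transcript_id "NM_EXAMPLE1.1";',
  'gene_id "GENE_A"; transcript_id "NM_EXAMPLE2.1";',
  'gene_id "GENE_B"; transcript_id "NM_EXAMPLE3.2";'
)

# Hypothetical helper: pull the quoted value that follows a given key
extract_attr <- function(x, key) {
  pattern <- paste0(key, ' "([^"]+)"')
  hits <- regmatches(x, regexec(pattern, x))
  vapply(hits,
         function(hit) if (length(hit) == 2) hit[2] else NA_character_,
         character(1))
}

# Two-column table: transcript ID first, gene ID second
tx2gene <- unique(data.frame(
  tx_id   = extract_attr(attrs, "transcript_id"),
  gene_id = extract_attr(attrs, "gene_id"),
  stringsAsFactors = FALSE
))

print(tx2gene)
```

Whatever the source, the transcript IDs in tx2gene need to match the names in the Salmon quant.sf files exactly, including any version suffix (tximport has an `ignoreTxVersion` argument for cases where they differ only by version).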

for clusterProfiler: new msigdbr package

Updates in clusterProfiler use the new msigdbr package to query these datasets; however, the older method should still work: https://yulab-smu.top/biomedical-knowledge-mining-book/universal-api.html?q=msig#msigdb-analysis

deseq2 figure for wald test results

https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/05b_wald_test_results.html

The green and purple gene figure on the right is not fully explained. The dashed vs. solid lines are crucial and not mentioned. This is another figure from the DESeq2 paper.

My mistake, misread!

Adding gene names to KEGG output

Hey all :)

clusterProfiler has added a utility (setReadable) to translate Entrez ID outputs (e.g. from gseKEGG) to gene symbols: https://yulab-smu.top/biomedical-knowledge-mining-book/useful-utilities.html#setReadable

gseKEGG no longer recommends nperm, and other issues

gseaKEGG <- gseKEGG(geneList = foldchanges, # ordered named vector of fold changes (Entrez IDs are the associated names)
                    organism = "hsa", # supported organisms listed below
                    nPerm = 1000, # default number permutations
                    minGSSize = 20, # minimum gene set size (# genes in set) - change to test more sets or recover sets with fewer # genes
                    pvalueCutoff = 0.05, # padj cutoff value
                    verbose = FALSE)
no term enriched under specific pvalueCutoff...
Warning messages:
1: In .GSEA(geneList = geneList, exponent = exponent, minGSSize = minGSSize,  :
  We do not recommend using nPerm parameter incurrent and future releases
2: In fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize = minGSSize,  :
  You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument in the fgsea function call.

This also complains that there are no terms enriched under my pvalueCutoff, but there are definitely terms with < 0.05

clusterProfiler_4.8.2
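A possible explanation for the "no term enriched" message, assuming (as the "padj cutoff value" comment in the lesson code suggests) that the cutoff is applied to adjusted p-values: terms can have nominal p < 0.05 and still fail after multiple-testing correction. The base-R sketch below uses invented p-values and Benjamini-Hochberg adjustment as an assumed correction method.

```r
# Hypothetical nominal p-values for 20 gene sets: four are below 0.05
pvals <- c(0.01, 0.02, 0.03, 0.04, seq(0.1, 0.85, by = 0.05))

# Benjamini-Hochberg adjustment (assumed here as the correction method)
padj <- p.adjust(pvals, method = "BH")

n_nominal  <- sum(pvals < 0.05)  # sets passing on the raw p-value
n_adjusted <- sum(padj < 0.05)   # sets passing after adjustment

print(n_nominal)
print(n_adjusted)
print(min(padj))
```

If that is what is happening, dropping `nPerm` (as the warnings suggest) and inspecting the unfiltered result with a looser `pvalueCutoff` would show whether the terms in question only pass on the nominal p-value.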

featureCounts link broken

https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/01b_DGE_setup_and_overview.html

The featureCounts link on this page is broken.

clustering coloring

the correlation coloring is the reverse of what the mind (my mind) expects. Should be redone here with the reverse color scheme.

https://github.com/hbctraining/DGE_workshop_salmon_online/blob/master/lessons/03_DGE_QC_analysis.md

Orgdb note didn't work in Gene annotation

This is a draft of how to fix it:
#check available updated database
query(ah,'org.Hs.eg.db.sqlite')
human_orgdb <- query(ah, c("Homo sapiens", "OrgDb"))
test <- human_orgdb[["AH111575"]]
test

cnet plot size warning/legend, lesson text does not match plot output

https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/10_FA_over-representation_analysis.html

When working with my own data set, when running the cnetplot code I get this warning:

Scale for size is already present.
Adding another scale for size, which will replace the existing scale.

I noticed that the legend for size is clearly not pvalue (integer numbers > 1), which is what the instructions say. Similarly, the example plot in the lesson has integers > 1, so they clearly aren't p values either. When looking at the help page for cnet plot, categorySize isn't a listed argument. If I plot without this argument, the plot looks exactly the same (and oddly, I still get the warning). The "size" appears to represent the number of significant genes in the GO term.

Additionally there is this warning:

Warning message:
In cnetplot.enrichResult(x, ...) :
  Use 'color.params = list(foldChange = your_value)' instead of 'foldChange'.
 The foldChange parameter will be removed in the next version.

It sounds like aspects of this function have changed since these lessons were written and need updating. Regardless, the text as written is inaccurate even for the current plot image:

"Finally, the category netplot shows the relationships between the genes associated with the top five most significant GO terms and the fold changes of the significant genes associated with these terms (color). The size of the GO terms reflects the pvalues of the terms, with the more significant terms being larger. This plot is particularly useful for hypothesis generation in identifying genes that may be important to several of the most affected processes."

This should be changed to reflect that node size actually represents the number of significant genes in the GO terms.

WGCNA link missing from lesson about network analysis

Apparently the link is broken because Horvath left UCLA and is no longer paying for the website.

Our best bet is probably to replace it with an Internet Archive link to the original, here: https://web.archive.org/web/20230323144343/horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/

More info about the disappearance and some other links here:
https://www.reddit.com/r/bioinformatics/comments/1cr7m9j/what_happened_to_the_wgcna_tutorial/

pathview doesn't work

No error. Just does not produce any graph

> pathview(gene.data = foldchanges,
+          pathway.id = "hsa03008",
+          species = "hsa",
+          limit = list(gene = 2, # value gives the max/min limit for foldchanges
+                       cpd = 1))
'select()' returned 1:1 mapping between keys and columns
Info: Working in directory /Users/hew416/Library/CloudStorage/OneDrive-HarvardUniversity/Desktop/DEanalysis
Info: Writing image file hsa03008.pathview.png

pathview_1.40.0

LRT lesson refers to objects created in previous lessons which students may not have loaded up

dds_lrt is originally created in Multiple test corrections with:
dds_lrt <- DESeq(dds, test="LRT", reduced = ~ 1)
It is referred to later in the LRT lesson.

rld_mat is originally created in Visualizations self learning:

rld <- rlog(dds, blind=TRUE)
rld_mat <- assay(rld)

It is also referred to later in the LRT lesson.

We should either add a note for how to recreate the objects, or instruct students to save (and load) their environment for each lesson. I think it is easier to just add a note for how to recreate the objects since environments can get confusing and might be conceptually new for students (we don't introduce the concept)
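For the "note for how to recreate the objects" option, one pattern that might work is a guard that reuses an object if the student already has it and rebuilds it otherwise. The helper `get_or_build` below is hypothetical (not part of the lessons or of DESeq2), and the demo uses a toy matrix in a throwaway environment so it runs without the workshop data.

```r
# Hypothetical helper: reuse an object if it exists, otherwise build and store it
get_or_build <- function(name, build, envir = globalenv()) {
  if (exists(name, envir = envir, inherits = FALSE)) {
    message("Reusing existing object: ", name)
  } else {
    message("Recreating object: ", name)
    assign(name, build(), envir = envir)
  }
  invisible(get(name, envir = envir))
}

# Toy demonstration: the builder should run only once
demo_env <- new.env()
build_calls <- 0
builder <- function() {
  build_calls <<- build_calls + 1
  matrix(1:4, nrow = 2)   # stand-in for an expensive object such as rld_mat
}

first  <- get_or_build("rld_mat", builder, envir = demo_env)
second <- get_or_build("rld_mat", builder, envir = demo_env)
print(build_calls)

# In the LRT lesson this could look like (not run here, needs DESeq2 and dds):
# rld_mat <- get_or_build("rld_mat", function() assay(rlog(dds, blind = TRUE)))
# dds_lrt <- get_or_build("dds_lrt", function() DESeq(dds, test = "LRT", reduced = ~ 1))
```

A plain `if (!exists("rld_mat")) { ... }` block in the lesson would do the same job with less machinery, which may be preferable given that environments are not introduced to students.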

apeglm stat removed

Hey all :)

Maybe you have seen this already, but I just noticed that they have removed the Stat column from the DESeq2 results output after shrinkage using apeglm. The reason for this is well-described by Mike here: https://support.bioconductor.org/p/129277/. Not sure if this is necessary to add to the lesson, but at least just a heads-up.

update the demo code at bottom of viz lesson

The code needs either a note saying that it is demo code and should not be run, or an update so that it works.

See if the code below works with the workshop dataset and, if so, update the lesson to this:

DEGreport::degPlot(dds = dds, res = res_tableOE, n = 20, xs = "sampletype", group = "sampletype")
DEGreport::degVolcano(
  data.frame(res_tableOE_tb[,c("log2FoldChange","padj")]), # table - 2 columns
  plot_text = data.frame(res_tableOE_tb[1:10,c("log2FoldChange","padj", "symbol")]))
# Available in the newer version for R 3.4

a note for the warning in dotplot

Many people had the following warning when running the dotplot command:

wrong orderBy parameter; set to default orderBy = "x"

Maybe this will help?

dotplot(myResults, showCategory=15, orderBy="GeneRatio")

Dispersion figure from DESeq2 paper

https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/04b_DGE_DESeq2_analysis.html

The close-up of the dispersions figure is taken directly from the DESeq2 paper, but the figure is not really explained and none of the acronyms are defined.

Bug Fixes

Add apeglm install from BiocManager to instructions.
ensembldb now has a filter function that can mask dplyr's filter, so be sure to put dplyr:: in front of filter commands, like on the "Summarizing results and extracting significant gene lists" page:
Change:

sigOE <- res_tableOE_tb %>%
        filter(padj < padj.cutoff)

To:

sigOE <- res_tableOE_tb %>%
        dplyr::filter(padj < padj.cutoff)

And on Visualization
Change:

norm_OEsig <- normalized_counts[,c(1:4,7:9)] %>% 
  filter(gene %in% sigOE$gene)

To:

norm_OEsig <- normalized_counts[,c(1:4,7:9)] %>% 
  dplyr::filter(gene %in% sigOE$gene)
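To show students why the `dplyr::` prefix helps, a base-R analogy might be enough. The sketch below does not load dplyr or ensembldb; it masks `stats::filter` with a deliberately broken stand-in to mimic what happens when a package loaded later defines its own `filter`.

```r
# A later definition of `filter` hides stats::filter, just as a
# package loaded later can hide dplyr::filter
filter <- function(x, ...) stop("called the wrong filter()")

# The unqualified call reaches the masking definition and fails
unqualified <- tryCatch(
  filter(1:5, rep(1/3, 3)),
  error = function(e) conditionMessage(e)
)

# The namespace-qualified call still reaches the intended function
qualified <- stats::filter(1:5, rep(1/3, 3))

print(unqualified)
print(qualified)

# Remove the stand-in so it does not linger in the session
rm(filter)
```

The same reasoning applies in the lessons: writing `dplyr::filter()` makes the call independent of the order in which packages were loaded.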
