genomicsclass / labs

Rmd source files for the HarvardX series PH525x

Home Page: http://genomicsclass.github.io/book

License: MIT License

R 96.71% Makefile 2.65% Shell 0.64%

labs's Issues

404 not found error in dataman_2019.Rmd

Lines 875-877 of dataman_2019.Rmd generate a 404 not found error after authenticating with Google BigQuery and returning to R as directed by the browser:

tcgaCon %>% tbl("Somatic_Mutation") %>% dplyr::filter(project_short_name=="TCGA-GBM") %>% 
       dplyr::select(Variant_Classification, Hugo_Symbol) %>% group_by(Variant_Classification) %>%
       summarise(n=n())
Error: HTTP error [404] Not Found

Is this the appropriate workflow? If so, what do learners need to know or do in order to not encounter this 404 error?

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] magrittr_1.5                                       dplyr_0.8.3                                       
 [3] bigrquery_1.2.0                                    RaggedExperiment_1.8.0                            
 [5] curatedTCGAData_1.6.0                              MultiAssayExperiment_1.10.4                       
 [7] VariantTools_1.26.0                                VariantAnnotation_1.30.1                          
 [9] ph525x_0.0.48                                      png_0.1-7                                         
[11] ldblock_1.14.2                                     erma_1.0.0                                        
[13] Homo.sapiens_1.3.1                                 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2           
[15] OrganismDbi_1.26.0                                 GenomicFeatures_1.36.4                            
[17] GenomicAlignments_1.20.1                           GenomicFiles_1.20.0                               
[19] rtracklayer_1.44.2                                 Rsamtools_2.0.0                                   
[21] RNAseqData.HNRNPC.bam.chr14_0.22.0                 IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[23] IlluminaHumanMethylation450kmanifest_0.4.0         minfi_1.30.0                                      
[25] bumphunter_1.26.0                                  locfit_1.5-9.1                                    
[27] iterators_1.0.12                                   foreach_1.4.7                                     
[29] Biostrings_2.52.0                                  XVector_0.24.0                                    
[31] data.table_1.12.2                                  GO.db_3.8.2                                       
[33] org.Hs.eg.db_3.8.2                                 airway_1.4.0                                      
[35] SummarizedExperiment_1.14.1                        DelayedArray_0.10.0                               
[37] BiocParallel_1.18.1                                matrixStats_0.54.0                                
[39] GenomicRanges_1.36.0                               GenomeInfoDb_1.20.0                               
[41] ArrayExpress_1.44.0                                GEOquery_2.52.0                                   
[43] annotate_1.62.0                                    XML_3.98-1.20                                     
[45] AnnotationDbi_1.46.1                               IRanges_2.18.1                                    
[47] S4Vectors_0.22.0                                   Biobase_2.44.0                                    
[49] BiocGenerics_0.30.0                                GSE5859Subset_1.0                                 

loaded via a namespace (and not attached):
  [1] tidyselect_0.2.5              RSQLite_2.1.2                 munsell_0.5.0                 codetools_0.2-16             
  [5] preprocessCore_1.46.0         withr_2.1.2                   colorspace_1.4-1              knitr_1.24                   
  [9] rstudioapi_0.10               labeling_0.3                  GenomeInfoDbData_1.2.1        bit64_0.9-7                  
 [13] rhdf5_2.28.0                  vctrs_0.2.0                   xfun_0.9                      BiocFileCache_1.8.0          
 [17] affxparser_1.56.0             R6_2.4.0                      illuminaio_0.26.0             AnnotationFilter_1.8.0       
 [21] bitops_1.0-6                  reshape_0.8.8                 assertthat_0.2.1              promises_1.0.1               
 [25] scales_1.0.0                  gtable_0.3.0                  ensembldb_2.8.0               rlang_0.4.0                  
 [29] zeallot_0.1.0                 genefilter_1.66.0             splines_3.6.1                 lazyeval_0.2.2               
 [33] gargle_0.3.1                  BiocManager_1.30.4            yaml_2.2.0                    reshape2_1.4.3               
 [37] snpStats_1.34.0               backports_1.1.4               httpuv_1.5.1                  RBGL_1.60.0                  
 [41] tools_3.6.1                   nor1mix_1.3-0                 ggplot2_3.2.1                 affyio_1.54.0                
 [45] ff_2.2-14                     RColorBrewer_1.1-2            siggenes_1.58.0               Rcpp_1.0.1                   
 [49] plyr_1.8.4                    progress_1.2.2                zlibbioc_1.30.0               purrr_0.3.2                  
 [53] RCurl_1.95-4.12               prettyunits_1.0.2             openssl_1.4.1                 fs_1.3.1                     
 [57] ProtGenerics_1.16.0           hms_0.5.1                     mime_0.7                      xtable_1.8-4                 
 [61] mclust_5.4.5                  gridExtra_2.3                 compiler_3.6.1                biomaRt_2.40.4               
 [65] tibble_2.1.3                  crayon_1.3.4                  htmltools_0.3.6               later_0.8.0                  
 [69] snow_0.4-3                    tidyr_0.8.3                   oligo_1.48.0                  DBI_1.0.0                    
 [73] ExperimentHub_1.10.0          dbplyr_1.4.2                  MASS_7.3-51.4                 rappdirs_0.3.1               
 [77] EnsDb.Hsapiens.v75_2.99.0     Matrix_1.2-17                 readr_1.3.1                   quadprog_1.5-7               
 [81] pkgconfig_2.0.2               registry_0.5-1                xml2_1.2.2                    rngtools_1.4                 
 [85] pkgmaker_0.27                 multtest_2.40.0               beanplot_1.2                  bibtex_0.4.2                 
 [89] doRNG_1.7.1                   scrime_1.3.5                  stringr_1.4.0                 digest_0.6.20                
 [93] graph_1.62.0                  base64_2.0                    DelayedMatrixStats_1.6.0      curl_4.0                     
 [97] shiny_1.3.2                   jsonlite_1.6                  nlme_3.1-141                  Rhdf5lib_1.6.0               
[101] askpass_1.1                   limma_3.40.6                  BSgenome_1.52.0               pillar_1.4.2                 
[105] lattice_0.20-38               httr_1.4.1                    survival_2.44-1.1             interactiveDisplayBase_1.22.0
[109] glue_1.3.1                    UpSetR_1.4.0                  bit_1.1-14                    stringi_1.4.3                
[113] HDF5Array_1.12.2              blob_1.2.0                    oligoClasses_1.46.0           AnnotationHub_2.16.1         
[117] memoise_1.1.0

Thanks!

robust_summaries.Rmd requires a library (rafalib) that is not publicly available

GRanges objects don't support lapply, unlist at the moment

Line 181 of bioc2_integExamps.Rmd generates the following error:

> phset = lapply( ovrngs, function(x)
+   unique( gwrngs19[ which(gwrngs19 %over% x) ]$Disease.Trait ) )
Error in getListElement(x, i, ...) : 
  GRanges objects don't support [[, as.list(), lapply(), or
  unlist() at the moment

Thanks! Here is the sessionInfo:

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
 [1] tools     grid      stats4    parallel  stats     graphics  grDevices utils    
 [9] datasets  methods   base     

other attached packages:
 [1] curatedTCGAData_1.6.0                             
 [2] harbChIP_1.22.0                                   
 [3] yeastCC_1.24.0                                    
 [4] gwascat_2.16.0                                    
 [5] ERBS_1.0                                          
 [6] magrittr_1.5                                      
 [7] dplyr_0.8.3                                       
 [8] bigrquery_1.2.0                                   
 [9] VariantTools_1.26.0                               
[10] VariantAnnotation_1.30.1                          
[11] RaggedExperiment_1.8.0                            
[12] MultiAssayExperiment_1.10.4                       
[13] GenomicAlignments_1.20.1                          
[14] BiocStyle_2.12.0                                  
[15] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[16] IlluminaHumanMethylation450kmanifest_0.4.0        
[17] minfi_1.30.0                                      
[18] bumphunter_1.26.0                                 
[19] locfit_1.5-9.1                                    
[20] iterators_1.0.12                                  
[21] foreach_1.4.7                                     
[22] annotate_1.62.0                                   
[23] XML_3.98-1.20                                     
[24] GSE5859Subset_1.0                                 
[25] airway_1.4.0                                      
[26] ph525x_0.0.48                                     
[27] png_0.1-7                                         
[28] RNAseqData.HNRNPC.bam.chr14_0.22.0                
[29] erma_1.0.0                                        
[30] GenomicFiles_1.20.0                               
[31] rtracklayer_1.44.2                                
[32] Rsamtools_2.0.0                                   
[33] Biostrings_2.52.0                                 
[34] XVector_0.24.0                                    
[35] SummarizedExperiment_1.14.1                       
[36] DelayedArray_0.10.0                               
[37] BiocParallel_1.18.1                               
[38] matrixStats_0.54.0                                
[39] Homo.sapiens_1.3.1                                
[40] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2           
[41] org.Hs.eg.db_3.8.2                                
[42] GO.db_3.8.2                                       
[43] OrganismDbi_1.26.0                                
[44] GenomicFeatures_1.36.4                            
[45] GenomicRanges_1.36.0                              
[46] GenomeInfoDb_1.20.0                               
[47] AnnotationDbi_1.46.1                              
[48] IRanges_2.18.1                                    
[49] S4Vectors_0.22.0                                  
[50] GEOquery_2.52.0                                   
[51] data.table_1.12.2                                 
[52] Biobase_2.44.0                                    
[53] BiocGenerics_0.30.0                               

loaded via a namespace (and not attached):
  [1] tidyselect_0.2.5              RSQLite_2.1.2                
  [3] devtools_2.1.0                munsell_0.5.0                
  [5] codetools_0.2-16              preprocessCore_1.46.0        
  [7] withr_2.1.2                   colorspace_1.4-1             
  [9] knitr_1.24                    rstudioapi_0.10              
 [11] labeling_0.3                  GenomeInfoDbData_1.2.1       
 [13] bit64_0.9-7                   rhdf5_2.28.0                 
 [15] rprojroot_1.3-2               vctrs_0.2.0                  
 [17] xfun_0.9                      BiocFileCache_1.8.0          
 [19] R6_2.4.0                      illuminaio_0.26.0            
 [21] AnnotationFilter_1.8.0        bitops_1.0-6                 
 [23] reshape_0.8.8                 assertthat_0.2.1             
 [25] promises_1.0.1                scales_1.0.0                 
 [27] gtable_0.3.0                  processx_3.4.1               
 [29] ensembldb_2.8.0               rlang_0.4.0                  
 [31] zeallot_0.1.0                 genefilter_1.66.0            
 [33] splines_3.6.1                 lazyeval_0.2.2               
 [35] gargle_0.3.1                  BiocManager_1.30.4           
 [37] yaml_2.2.0                    snpStats_1.34.0              
 [39] backports_1.1.4               httpuv_1.5.1                 
 [41] RBGL_1.60.0                   usethis_1.5.1                
 [43] nor1mix_1.3-0                 ggplot2_3.2.1                
 [45] RColorBrewer_1.1-2            siggenes_1.58.0              
 [47] sessioninfo_1.1.1             Rcpp_1.0.1                   
 [49] plyr_1.8.4                    progress_1.2.2               
 [51] zlibbioc_1.30.0               purrr_0.3.2                  
 [53] RCurl_1.95-4.12               ps_1.3.0                     
 [55] prettyunits_1.0.2             openssl_1.4.1                
 [57] fs_1.3.1                      ProtGenerics_1.16.0          
 [59] pkgload_1.0.2                 hms_0.5.1                    
 [61] mime_0.7                      evaluate_0.14                
 [63] xtable_1.8-4                  mclust_5.4.5                 
 [65] gridExtra_2.3                 testthat_2.2.1               
 [67] compiler_3.6.1                biomaRt_2.40.4               
 [69] tibble_2.1.3                  crayon_1.3.4                 
 [71] htmltools_0.3.6               later_0.8.0                  
 [73] tidyr_0.8.3                   ldblock_1.14.2               
 [75] DBI_1.0.0                     ExperimentHub_1.10.0         
 [77] dbplyr_1.4.2                  rappdirs_0.3.1               
 [79] MASS_7.3-51.4                 EnsDb.Hsapiens.v75_2.99.0    
 [81] Matrix_1.2-17                 readr_1.3.1                  
 [83] cli_1.1.0                     quadprog_1.5-7               
 [85] pkgconfig_2.0.2               registry_0.5-1               
 [87] xml2_1.2.2                    rngtools_1.4                 
 [89] pkgmaker_0.27                 multtest_2.40.0              
 [91] beanplot_1.2                  bibtex_0.4.2                 
 [93] doRNG_1.7.1                   scrime_1.3.5                 
 [95] stringr_1.4.0                 callr_3.3.1                  
 [97] digest_0.6.20                 graph_1.62.0                 
 [99] rmarkdown_1.15                base64_2.0                   
[101] DelayedMatrixStats_1.6.0      curl_4.0                     
[103] shiny_1.3.2                   nlme_3.1-141                 
[105] jsonlite_1.6                  Rhdf5lib_1.6.0               
[107] desc_1.2.0                    askpass_1.1                  
[109] limma_3.40.6                  BSgenome_1.52.0              
[111] pillar_1.4.2                  lattice_0.20-38              
[113] httr_1.4.1                    pkgbuild_1.0.4               
[115] survival_2.44-1.1             interactiveDisplayBase_1.22.0
[117] glue_1.3.1                    remotes_2.1.0                
[119] UpSetR_1.4.0                  bit_1.1-14                   
[121] stringi_1.4.3                 HDF5Array_1.12.2             
[123] blob_1.2.0                    AnnotationHub_2.16.0         
[125] memoise_1.1.0
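
A possible workaround for the `lapply()` error above, offered as a sketch rather than a confirmed fix for the course file: iterate over indices and subset with `[`, which GRanges objects do support. The toy ranges and trait labels below are invented for illustration; `ovrngs` and `gwrngs19` only stand in for the objects of the same name in bioc2_integExamps.Rmd.

```r
library(GenomicRanges)

# Hypothetical stand-ins for the objects used in bioc2_integExamps.Rmd
ovrngs <- GRanges("chr1", IRanges(start = c(1, 100), end = c(50, 150)))
gwrngs19 <- GRanges("chr1", IRanges(start = c(10, 120, 130), width = 1),
                    Disease.Trait = c("trait A", "trait B", "trait B"))

# Loop over positions instead of over the GRanges object itself;
# ovrngs[i] is a length-one GRanges, so %over% works as before.
phset <- lapply(seq_along(ovrngs), function(i) {
  x <- ovrngs[i]
  unique(gwrngs19[which(gwrngs19 %over% x)]$Disease.Trait)
})
```

Each element of `phset` then holds the unique traits overlapping one range, which is what the original `lapply( ovrngs, ... )` call appears to have been after.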

make an ePub, please

Could you release an ePub version? When I compiled it myself, there were lots of errors, maybe because some files are deprecated. Thank you.

Inquiry about solutions to the book's exercises

Hello! Thanks so much for the fantastic materials here. I am wondering if there are solutions for the exercises in the book Data Analysis for the Life Sciences? I think it will be helpful for the readers to verify the answers.

GSE5859 available for R version 3.3.1

Is there a GSE5859 available for R version 3.3.1?
When running the following command to install the package

biocLite('GSE5859')
I get the following error
Warning message:
package ‘GSE5859’ is not available (for R version 3.3.1)
thanks!

Extra text and code in week3/montecarlo.Rmd

The text and code from line 106 (https://github.com/genomicsclass/labs/blob/master/week3/montecarlo.Rmd#L106) to the end of the file do not seem to belong there (at least they are not used or mentioned in the lecture videos). They might be remnants of an alternative example for the "Inference" and "Permutation" lectures.

Different t.test result in type II error simulation

Hi, Prof. Rafa!
I'm using R version 3.6.3 and working through the false negative demonstration from the edX PH525x course.
I'm using exactly the same code as the lecture video and the book. Here it is:
```
dat <- read.csv("mice_pheno.csv")

controlPopulation <- filter(dat,Sex == "F" & Diet == "chow") %>%
select(Bodyweight) %>% unlist

hfPopulation <- filter(dat,Sex == "F" & Diet == "hf") %>%
select(Bodyweight) %>% unlist

mu_hf <- mean(hfPopulation)
mu_control <- mean(controlPopulation)

mu_hf - mu_control
[1] 2.375517
(mu_hf - mu_control)/mu_control * 100 # percent increase
[1] 9.942157
```
So far the results are the same as in the video.
After that:
```
set.seed(1)
N <- 5
hf <- sample(hfPopulation,N)
control <- sample(controlPopulation,N)
t.test(hf,control)$p.value
```
The result is supposed to be 0.1410204, but my result is 0.5806661. I retried several times and tried several ways of generating the values, but the result hasn't changed.

Since this material was last edited 4 years ago, I think there may be an algorithmic difference in the `set.seed()` function.

I would be glad if you could help me.
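
One likely cause of the mismatch reported above, stated as an assumption rather than a confirmed diagnosis: R 3.6.0 changed the default algorithm used by `sample()` from "Rounding" to "Rejection", so the same seed yields different draws than in older R versions. The sketch below compares the two settings with base R only; `draw_with()` is a hypothetical helper name, not part of the course code.

```r
# Draw a sample under a chosen sample.kind, then restore the previous setting.
# Assumes R >= 3.6.0, where RNGkind() accepts the sample.kind argument.
draw_with <- function(kind, seed = 1, n = 10, size = 3) {
  old <- RNGkind()  # c(kind, normal.kind, sample.kind)
  on.exit(suppressWarnings(RNGkind(sample.kind = old[3])), add = TRUE)
  # "Rounding" triggers a warning about the non-uniform sampler; silence it here.
  suppressWarnings(RNGkind(sample.kind = kind))
  set.seed(seed)
  sample(n, size)
}

pre_3_6 <- draw_with("Rounding")   # behaviour of R < 3.6.0
current <- draw_with("Rejection")  # default since R 3.6.0
```

If this is indeed the cause, running `RNGkind(sample.kind = "Rounding")` before `set.seed(1)` in the course code should reproduce the draws from the video.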

Issue with devtools and genomicsclass/GSE5859Subset

Hello,
I have been stuck for the past two days. I need to use genomicsclass/GSE5859Subset, and
for that I have installed "devtools". I have also installed Rtools (Rtools34) from CRAN. I am running version 3.6.0 of RStudio.

I get the warning and error messages below and cannot use the GSE5859Subset dataset. Any help would be greatly appreciated.

library(devtools)
Loading required package: usethis
Warning messages:
1: package ‘devtools’ was built under R version 3.6.3
2: package ‘usethis’ was built under R version 3.6.3
install_github("genomicsclass/GSE5859Subset")
Error: Failed to install 'unknown package' from GitHub:
HTTP error 403.
API rate limit exceeded for 157.32.239.55. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)

Rate limit remaining: 0/60
Rate limit reset at: 2020-05-26 15:42:00 UTC

To increase your GitHub API rate limit

Use usethis::browse_github_pat() to create a Personal Access Token.
Use usethis::edit_r_environ() and add the token as GITHUB_PAT.

library(GSE5859Subset)
Error in library(GSE5859Subset) :
there is no package called ‘GSE5859Subset’
data(GSE5859Subset)
Warning message:
In data(GSE5859Subset) : data set ‘GSE5859Subset’ not found
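
The 403 in the log above is GitHub's anonymous rate limit, and the message itself points to a personal access token stored as `GITHUB_PAT`. A minimal sketch for checking whether the token is visible to the current R session before retrying the install follows; `has_github_pat()` is a hypothetical helper name.

```r
# TRUE when a GitHub personal access token is visible to this R session.
has_github_pat <- function() {
  nzchar(Sys.getenv("GITHUB_PAT"))
}

if (has_github_pat()) {
  message("GITHUB_PAT is set; install_github() requests will be authenticated.")
} else {
  message("GITHUB_PAT is not set; add it via usethis::edit_r_environ() and restart R.")
}
```

Waiting until the reset time shown in the error is the other option, since the anonymous limit applies per IP address.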

How can I get the answers to the exercises in the PH525x series?

Sorry to trouble you. I'm a beginner in bioinformatics, and I have recently been reading your book "PH525x series - Biomedical Data Science". I followed the chapters and did the exercises, but I can't find the answers, so I came here for help. Could you give me a link to the answers? Thank you.

Can't make progress in swirl package due to error

When I encounter this error I cannot make progress. I also cannot go back to the menu to select another topic or skip the question. Can somebody help me?

Error in TRUE && c(TRUE, FALSE, FALSE) :
'length = 3' in coercion to 'logical(1)'
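
For context, this is the message R 4.3.0 and later raise when `&&` receives an operand longer than one; earlier releases silently used the first element. Whether the swirl lesson's checking code is the culprit is an assumption here. The sketch below shows the two usual ways to rewrite such an expression in base R, with illustrative variable names.

```r
flag <- TRUE
checks <- c(TRUE, FALSE, FALSE)

# Element-wise AND: returns a logical vector the same length as `checks`.
elementwise <- flag & checks

# Scalar AND: collapse the vector first so && sees two length-one values.
all_pass <- flag && all(checks)
any_pass <- flag && any(checks)
```

Which rewrite is appropriate depends on whether the original code meant "every check passes" or "at least one passes".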

Missing RData File for the quiz

Hi! It is impossible to download the RData file for the QQ Plot Exercise. Could you share it here or post the link? I need it to complete the quiz.
Thank you for your help!

getFirehoseData fails in tcga.Rmd: cannot open the connection

The TCGA firehose data download on tcga.Rmd line 49 throws an error stating the connection cannot be opened:

> library(ph525x)
> firehose()
> library(RTCGAToolbox)
> readData = getFirehoseData (dataset="READ", runDate="20150402",forceDownload = TRUE,
+     Clinic=TRUE, Mutation=TRUE, Methylation=TRUE, RNASeq2GeneNorm=TRUE)
gdac.broadinstitute.org_READ.Clinical_Pick_Tier1.Level_4.2015040200.0.0.tar.gz
trying URL 'http://gdac.broadinstitute.org/runs/stddata__2015_04_02/data/READ/20150402/gdac.broadinstitute.org_READ.Clinical_Pick_Tier1.Level_4.2015040200.0.0.tar.gz'
Content type 'application/x-gzip' length 4754 bytes
downloaded 4754 bytes

gdac.broadinstitute.org_READ.Clinical_Pick_Tier1.Level_4.2015040200.0.0
gdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0.tar.gz
gdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0.tar.gz
trying URL 'http://gdac.broadinstitute.org/runs/stddata__2015_04_02/data/READ/20150402/gdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0.tar.gz'
Content type 'application/x-gzip' length 5917492 bytes (5.6 MB)
downloaded 5.6 MB

gdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0
cannot open file './20150402-READ-RNAseq2GeneNorm.txt': No such file or directory
Error in file(file, "rt") : cannot open the connection

Much of the following code in the section, and the related course videos, depends on the output of this command.

In addition, the following code block on line 53 appears to read a local path on your machine.

Here is the sessionInfo:

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      tools     parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RTCGAToolbox_2.14.0                     ph525x_0.0.48                           png_0.1-7                              
 [4] yeastCC_1.24.0                          harbChIP_1.22.0                         Biostrings_2.52.0                      
 [7] XVector_0.24.0                          ERBS_1.0                                gwascat_2.16.0                         
[10] Homo.sapiens_1.3.1                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 org.Hs.eg.db_3.8.2                     
[13] GO.db_3.8.2                             OrganismDbi_1.26.0                      GenomicFeatures_1.36.4                 
[16] GenomicRanges_1.36.0                    GenomeInfoDb_1.20.0                     ggbio_1.32.0                           
[19] ggplot2_3.2.1                           AnnotationDbi_1.46.1                    IRanges_2.18.1                         
[22] S4Vectors_0.22.0                        Biobase_2.44.0                          BiocGenerics_0.30.0                    

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.16.0         bitops_1.0-6                matrixStats_0.54.0          bit64_0.9-7                
 [5] RColorBrewer_1.1-2          progress_1.2.2              httr_1.4.1                  backports_1.1.4            
 [9] R6_2.4.0                    rpart_4.1-15                Hmisc_4.2-0                 DBI_1.0.0                  
[13] lazyeval_0.2.2              colorspace_1.4-1            nnet_7.3-12                 withr_2.1.2                
[17] tidyselect_0.2.5            gridExtra_2.3               prettyunits_1.0.2           GGally_1.4.0               
[21] bit_1.1-14                  curl_4.0                    compiler_3.6.1              graph_1.62.0               
[25] htmlTable_1.13.1            DelayedArray_0.10.0         rtracklayer_1.44.2          scales_1.0.0               
[29] checkmate_1.9.4             RBGL_1.60.0                 RCircos_1.2.1               stringr_1.4.0              
[33] digest_0.6.20               Rsamtools_2.0.0             foreign_0.8-71              base64enc_0.1-3            
[37] dichromat_2.0-0             pkgconfig_2.0.2             htmltools_0.3.6             limma_3.40.6               
[41] ensembldb_2.8.0             BSgenome_1.52.0             htmlwidgets_1.3             rlang_0.4.0                
[45] rstudioapi_0.10             RSQLite_2.1.2               BiocParallel_1.18.1         acepack_1.4.1              
[49] dplyr_0.8.3                 VariantAnnotation_1.30.1    RCurl_1.95-4.12             magrittr_1.5               
[53] GenomeInfoDbData_1.2.1      Formula_1.2-3               Matrix_1.2-17               Rcpp_1.0.1                 
[57] munsell_0.5.0               stringi_1.4.3               RaggedExperiment_1.8.0      RJSONIO_1.3-1.2            
[61] SummarizedExperiment_1.14.1 zlibbioc_1.30.0             plyr_1.8.4                  blob_1.2.0                 
[65] crayon_1.3.4                lattice_0.20-38             splines_3.6.1               hms_0.5.1                  
[69] zeallot_0.1.0               knitr_1.24                  pillar_1.4.2                reshape2_1.4.3             
[73] biomaRt_2.40.4              XML_3.98-1.20               glue_1.3.1                  biovizBase_1.32.0          
[77] latticeExtra_0.6-28         BiocManager_1.30.4          data.table_1.12.2           vctrs_0.2.0                
[81] gtable_0.3.0                purrr_0.3.2                 reshape_0.8.8               assertthat_0.2.1           
[85] xfun_0.9                    AnnotationFilter_1.8.0      survival_2.44-1.1           tibble_2.1.3               
[89] GenomicAlignments_1.20.1    memoise_1.1.0               cluster_2.1.0

Thanks!

add limma link in modeling lecture

http://www.ncbi.nlm.nih.gov/pubmed/16646809
http://www.statsci.org/smyth/pubs/ebayes.pdf

Can't install the packages from github

Hi, I am using R 3.3.3 on Windows. I was trying to install "genomicsclass/GSE5859Subset", but it always fails with this error:

installing source package 'GSE5859Subset' ...
** data
** help
No man pages found in package 'GSE5859Subset'
*** installing help indices
** building package indices
** testing if installed package can be loaded
Warning in library(pkg_name, lib.loc = lib, character.only = TRUE, logical.return = TRUE) :
no library trees found in 'lib.loc'
Error: loading failed
Execution halted
ERROR: loading failed
removing '\dtu-storage/lumye/Documents/R/win-library/3.3/GSE5859Subset'
Installation failed: Command failed (1)
Can anyone help me solve this problem? Many thanks!
Lumeng

error in Exercise 13, "Inference for High Dimensional Data"

Two small errors:

Create a Monte Carlo Simulation in which you simulate measurements from 8,793 genes for 24 samples, 12 cases and 12 controls. The for 100 genes create a difference of 1 between cases and

Change "The" to "Then"

n <- 24
m <- 8793
mat <- matrix(rnorm(n*m),m,n)
delta <- 1
positives <- 500   ###SHOULD BE 100
mat[1:positives,1:(n/2)] <- mat[1:positives,1:(n/2)]+delta

positives should be 100, or number of genes above should be 500.
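
A sketch of the snippet with the first of the two suggested corrections applied (100 shifted genes, matching the exercise text), plus a quick sanity check. The seed is arbitrary, and treating the first 12 columns as cases is an assumption carried over from the indexing in the original code.

```r
set.seed(1)       # arbitrary seed, only to make this sketch reproducible
n <- 24           # samples: 12 cases and 12 controls
m <- 8793         # genes
delta <- 1
positives <- 100  # matches the "100 genes" in the exercise text

mat <- matrix(rnorm(n * m), m, n)
cases <- 1:(n / 2)            # assumed: first half of the columns are cases
controls <- (n / 2 + 1):n
mat[1:positives, cases] <- mat[1:positives, cases] + delta

# Sanity check: the mean case-minus-control difference should be close to
# delta for the shifted genes and close to 0 for the remaining genes.
diffs <- rowMeans(mat[, cases]) - rowMeans(mat[, controls])
shifted_mean <- mean(diffs[1:positives])
null_mean <- mean(diffs[-(1:positives)])
```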

Undefined objects in week5/prediction.Rmd (lines 96 and 97)

Lines 96 and 97 of the file https://github.com/genomicsclass/labs/blob/master/week5/prediction.Rmd refer to two undefined objects, colshat and bayesrule (see the code fragment below):

points(newx,col=colshat,pch=16,cex=0.35)
contour(tmpx,tmpy,matrix(round(bayesrule),GS,GS),levels=c(1,2),add=TRUE,drawlabels=FALSE)

The rendered page for that file (http://genomicsclass.github.io/book/pages/prediction.html) also shows this error.

Permission to port over dplyr tutorial to DataFramesMeta.jl

Hello,

I am the maintainer of the Julia package DataFramesMeta.jl. It is a data manipulation package for the Julia language and it is very similar to dplyr. I would like permission to port your dplyr tutorial to DataFramesMeta.jl and host it on our documentation website.

We are getting very close to releasing version 1.0 of DataFramesMeta and as a result I'm working on tutorials to help new users get on board.

Because so many of our users will be coming from dplyr, it makes sense to not try and re-invent the wheel when it comes to tutorials and instead port over existing tutorials. Your dplyr tutorial ranks pretty high on Google search and is a nice introduction.

Can I modify your tutorial to be a tutorial for DataFramesMeta.jl and host it on our website? This pretty much just involves surface-level syntax changes, but most of the text will remain intact.

Thank you!

R.4.3.3 package installation

Greetings,

Is there any known problem with installing the package under R version 4.3.3?
I am having trouble doing this.

My R version is 4.3.3 and my RStudio version is 1.1.419.

Regards,

Trouble building biocintro_5x / bioc1_summex.Rmd

@vjcitn

Building the biocintro_5x / bioc1_summex.Rmd throws an error:

Quitting from lines 109-115 (bioc1_summex.Rmd)
Error in .local(x, ...) :
unused argument (vals = list(tx_chrom = "chr14"))
Calls: ... withCallingHandlers -> withVisible -> eval -> eval -> genes -> genes

Error using minfi::plotSex in methyl/minfi.Rmd

The second to last line of methyl/minfi.Rmd generates an error:

> plotSex(sex)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'colData' for signature '"DataFrame"'

I implemented some small fixes to this document in a PR to fix some deprecated functions which you may wish to apply first. I do not know where this DataFrame error comes from.

The session info is below. Thanks!

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 IlluminaHumanMethylation450kmanifest_0.4.0        
 [3] minfi_1.30.0                                       HistData_0.8-4                                    
 [5] broom_0.5.2                                        Lahman_7.0-1                                      
 [7] tidytext_0.2.2                                     gutenbergr_0.1.4                                  
 [9] rvest_0.3.4                                        xml2_1.2.2                                        
[11] bumphunter_1.26.0                                  locfit_1.5-9.1                                    
[13] iterators_1.0.12                                   foreach_1.4.7                                     
[15] limma_3.40.6                                       coloncancermeth_1.0                               
[17] cummeRbund_2.26.0                                  Gviz_1.28.1                                       
[19] rtracklayer_1.44.2                                 fastcluster_1.1.25                                
[21] reshape2_1.4.3                                     RSQLite_2.1.2                                     
[23] DEXSeq_1.30.0                                      RColorBrewer_1.1-2                                
[25] pasilla_1.12.0                                     sva_3.32.1                                        
[27] genefilter_1.66.0                                  mgcv_1.8-28                                       
[29] nlme_3.1-141                                       org.Hs.eg.db_3.8.2                                
[31] pheatmap_1.0.12                                    vsn_3.52.0                                        
[33] DESeq2_1.24.0                                      rafalib_1.0.0                                     
[35] GenomicAlignments_1.20.1                           Rsamtools_2.0.0                                   
[37] Biostrings_2.52.0                                  XVector_0.24.0                                    
[39] airway_1.4.0                                       SummarizedExperiment_1.14.1                       
[41] DelayedArray_0.10.0                                BiocParallel_1.18.1                               
[43] matrixStats_0.54.0                                 forcats_0.4.0                                     
[45] stringr_1.4.0                                      dplyr_0.8.3                                       
[47] purrr_0.3.2                                        readr_1.3.1                                       
[49] tidyr_0.8.3                                        tibble_2.1.3                                      
[51] tidyverse_1.2.1                                    dslabs_0.7.1                                      
[53] Cen.ele6_1.0.0                                     TxDb.Celegans.UCSC.ce6.ensGene_3.2.2              
[55] org.Ce.eg.db_3.8.2                                 GO.db_3.8.2                                       
[57] OrganismDbi_1.26.0                                 GenomicFeatures_1.36.4                            
[59] AnnotationDbi_1.46.1                               Biobase_2.44.0                                    
[61] GenomicRanges_1.36.0                               GenomeInfoDb_1.20.0                               
[63] IRanges_2.18.1                                     S4Vectors_0.22.0                                  
[65] ERBS_1.0                                           erbsViz_0.0.0.9000                                
[67] juxtaPack_0.0.0.9000                               ggbio_1.32.0                                      
[69] ggplot2_3.2.1                                      BiocGenerics_0.30.0                               
[71] usethis_1.5.1                                     

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.1           SnowballC_0.6.0          GGally_1.4.0             pkgmaker_0.27            acepack_1.4.1           
  [6] bit64_0.9-7              knitr_1.24               data.table_1.12.2        rpart_4.1-15             hwriter_1.3.2           
 [11] GEOquery_2.52.0          RCurl_1.95-4.12          AnnotationFilter_1.8.0   generics_0.0.2           snow_0.4-3              
 [16] preprocessCore_1.46.0    callr_3.3.1              commonmark_1.7           bit_1.1-14               tokenizers_0.2.1        
 [21] lubridate_1.7.4          assertthat_0.2.1         xfun_0.9                 hms_0.5.1                scrime_1.3.5            
 [26] fansi_0.4.0              progress_1.2.2           readxl_1.3.1             DBI_1.0.0                geneplotter_1.62.0      
 [31] htmlwidgets_1.3          reshape_0.8.8            selectr_0.4-1            backports_1.1.4          annotate_1.62.0         
 [36] textdata_0.3.0           biomaRt_2.40.4           vctrs_0.2.0              remotes_2.1.0            ensembldb_2.8.0         
 [41] withr_2.1.2              triebeard_0.3.0          BSgenome_1.52.0          checkmate_1.9.4          prettyunits_1.0.2       
 [46] mclust_5.4.5             cluster_2.1.0            lazyeval_0.2.2           crayon_1.3.4             pkgconfig_2.0.2         
 [51] labeling_0.3             pkgload_1.0.2            ProtGenerics_1.16.0      nnet_7.3-12              devtools_2.1.0          
 [56] rlang_0.4.0              registry_0.5-1           affyio_1.54.0            modelr_0.1.5             dichromat_2.0-0         
 [61] cellranger_1.1.0         rprojroot_1.3-2          graph_1.62.0             rngtools_1.4             base64_2.0              
 [66] Matrix_1.2-17            urltools_1.7.3           Rhdf5lib_1.6.0           base64enc_0.1-3          whisker_0.4             
 [71] processx_3.4.1           clisymbols_1.2.0         bitops_1.0-6             DelayedMatrixStats_1.6.0 blob_1.2.0              
 [76] doRNG_1.7.1              nor1mix_1.3-0            scales_1.0.0             memoise_1.1.0            magrittr_1.5            
 [81] plyr_1.8.4               hexbin_1.27.3            bibtex_0.4.2             zlibbioc_1.30.0          compiler_3.6.1          
 [86] illuminaio_0.26.0        cli_1.1.0                affy_1.62.0              janeaustenr_0.1.5        ps_1.3.0                
 [91] htmlTable_1.13.1         Formula_1.2-3            MASS_7.3-51.4            tidyselect_0.2.5         stringi_1.4.3           
 [96] askpass_1.1              latticeExtra_0.6-28      VariantAnnotation_1.30.1 tools_3.6.1              rstudioapi_0.10         
[101] foreign_0.8-71           git2r_0.26.1             gridExtra_2.3            digest_0.6.20            BiocManager_1.30.4      
[106] quadprog_1.5-7           Rcpp_1.0.1               siggenes_1.58.0          httr_1.4.1               biovizBase_1.32.0       
[111] colorspace_1.4-1         XML_3.98-1.20            fs_1.3.1                 splines_3.6.1            RBGL_1.60.0             
[116] statmod_1.4.32           multtest_2.40.0          sessioninfo_1.1.1        xtable_1.8-4             jsonlite_1.6            
[121] zeallot_0.1.0            testthat_2.2.1           R6_2.4.0                 Hmisc_4.2-0              pillar_1.4.2            
[126] htmltools_0.3.6          glue_1.3.1               beanplot_1.2             codetools_0.2-16         pkgbuild_1.0.5          
[131] utf8_1.1.4               lattice_0.20-38          curl_4.0                 openssl_1.4.1            survival_2.44-1.1       
[136] roxygen2_6.1.1           desc_1.2.0               munsell_0.5.0            rhdf5_2.28.0             GenomeInfoDbData_1.2.1  
[141] HDF5Array_1.12.2         haven_2.1.1              gtable_0.3.0

Trouble building biocintro_5x / bioc1_LiftOver.Rmd

@vjcitn
Building the biocintro_5x / bioc1_LiftOver.Rmd throws an error:

Quitting from lines 66-70 (bioc1_liftOver.Rmd)
Error in `seqlevels<-`(`*tmp*`, force = TRUE, value = "chr1") :
unused argument (force = TRUE)
Calls: ... handle -> withCallingHandlers -> withVisible -> eval -> eval

Dilution data set missing for week4

different result of t.test in type II error simulation

controlPopulation <- filter(dat,Sex == "F" & Diet == "chow") %>%
select(Bodyweight) %>% unlist

hfPopulation <- filter(dat,Sex == "F" & Diet == "hf") %>%
select(Bodyweight) %>% unlist

mu_hf <- mean(hfPopulation)
mu_control <- mean(controlPopulation)

Since this material was last edited 4 years ago, I think there is an algorithmic difference in the 'set.seed()' function.

I would be glad if you could help me.
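
One documented change that fits this symptom: starting with R 3.6.0, the default algorithm behind sample() changed from "Rounding" to "Rejection", so the same set.seed() value can yield different samples than in older material. Whether that is the cause here is not confirmed in this thread. The sketch below assumes R 3.6.0 or later and uses a simulated stand-in population (not the course data) to show how the old behaviour can be requested.

```r
# Simulated stand-in for a population of body weights (not the course data).
population <- seq(20, 40, by = 0.1)

# Default sampler in R >= 3.6.0 ("Rejection").
set.seed(1)
new_draw <- sample(population, 12)

# Pre-3.6.0 sampler ("Rounding"); R warns that it is non-uniform, so the
# warning is suppressed here.
suppressWarnings(set.seed(1, sample.kind = "Rounding"))
old_draw <- sample(population, 12)

# Restore the current default so later code is unaffected.
RNGkind(sample.kind = "Rejection")

mean(new_draw)
mean(old_draw)
```

If the two means differ, downstream t.test() results built on those samples will differ as well, even though the seed is the same.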

Use of `download()` without `library(downloader)` or `downloader`

At: https://genomicsclass.github.io/book/pages/permutation_tests_exercises.html

The line:

download(url, destfile=filename)

gives the error: could not find function "download". It should be either downloader::download(url, destfile=filename) or download.file(url, destfile=filename).
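
A minimal sketch of a call that works with or without the downloader package. The helper name fetch_file() is hypothetical and not part of the book; the demonstration uses a local file:// URL as a stand-in so that it runs offline.

```r
# Hypothetical helper (not part of the book): use downloader::download() when
# the package is installed, otherwise fall back to base R's download.file().
fetch_file <- function(url, destfile) {
  if (requireNamespace("downloader", quietly = TRUE)) {
    downloader::download(url, destfile = destfile, mode = "wb", quiet = TRUE)
  } else {
    download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  }
  invisible(destfile)
}

# Offline demonstration: a local file:// URL stands in for the real one.
src <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), src)
dest <- tempfile(fileext = ".csv")
fetch_file(paste0("file://", normalizePath(src, winslash = "/")), dest)
read.csv(dest)
```

In the exercise itself, the url and filename objects defined earlier on that page would take the place of the temporary files.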

Installing Bioconductor

installing_Bioconductor_finding_help.Rmd:77 with -> without
installing_Bioconductor_finding_help.Rmd:79-80 +library(geneplotter)
this time evaluated so that plotMA works

getGEO commands in multiple Rmds give HTTP 404 error

I'm from HarvardX and assigned to test and update these courses for rerelease. I'm having trouble running several of the Rmd files and the associated code in the videos due to getGEO issues. Downloading files gives HTTP 404 issues:

For example, this code from "biocintro_5x/dataman2017.Rmd" gives such an error:

library(GEOquery)
glioMA <- getGEO("GSE78703")[[1]]
> Error in open.connection(x, "rb") : HTTP error 404.

Here's my session info if needed:

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] AnnotationDbi_1.46.0 IRanges_2.18.1       S4Vectors_0.22.0     GEOquery_2.52.0      data.table_1.12.2   
[6] Biobase_2.44.0       BiocGenerics_0.30.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1         pillar_1.4.2       compiler_3.6.1     BiocManager_1.30.4 bitops_1.0-6      
 [6] tools_3.6.1        digest_0.6.20      zeallot_0.1.0      bit_1.1-14         memoise_1.1.0     
[11] RSQLite_2.1.2      tibble_2.1.3       pkgconfig_2.0.2    rlang_0.4.0        DBI_1.0.0         
[16] rstudioapi_0.10    yaml_2.2.0         curl_4.0           xfun_0.8           dplyr_0.8.3       
[21] knitr_1.23         xml2_1.2.1         vctrs_0.2.0        hms_0.5.0          bit64_0.9-7       
[26] tidyselect_0.2.5   glue_1.3.1         R6_2.4.0           limma_3.40.6       tidyr_0.8.3       
[31] readr_1.3.1        purrr_0.3.2        blob_1.2.0         magrittr_1.5       backports_1.1.4   
[36] assertthat_0.2.1   RCurl_1.95-4.12    crayon_1.3.4

too many stripcharts plotted in ranktest

gbm, assayNames errors in dataman2017.Rmd

Two errors are present that break the code in the "Working with TCGA mutation data" section.

When defining the gbm object in dataman2017.Rmd, there are errors. The gbm object is still defined, but I am not sure it is successfully updated.

>gbm = updateObject(gbm)
>gbm
A MultiAssayExperiment object of 12 listed
 experiments with user-defined names and respective classes. 
 Containing an Error in vapply(object, FUN = function(obj) { : values must be length 1,
 but FUN(X[[3]]) result is length 0
Error during wrapup: cannot get a slot ("slots") from an object of type "NULL"

This may be related to a downstream error in assayNames:

> mut = experiments(gbm)[["Mutations"]]
> head(assayNames(mut))
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'assayNames' for signature '"RangedRaggedAssay"'

Later components of the TCGA code rely on mut and cannot be performed due to current errors.

Error in knitr of all RMD files

I'm trying to use the RMD files associated with the Statistics & R course, and I keep getting an error on the first chunk of code:

opts_chunk$set(fig.path=paste0("figure/", sub("(.*).Rmd","\\1",basename(knitr:::knit_concord$get('infile'))), "-"))

Error says:
Error in basename(knitr:::knit_concord$get("infile")) : a character vector argument expected

Any suggestions?
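
The message is what basename() reports when it receives NULL instead of a file name. That would happen if knitr:::knit_concord$get('infile') returns NULL, for example when the chunk is run interactively rather than through knit(). This cause is an assumption, not something confirmed in this thread. A minimal sketch of a guarded version, using a hypothetical helper fig_prefix():

```r
# Hypothetical helper: build the figure prefix, tolerating a missing file name.
# `infile` stands in for the value of knitr:::knit_concord$get("infile").
fig_prefix <- function(infile, fallback = "interactive") {
  stem <- if (is.character(infile) && length(infile) == 1) {
    sub("\\.Rmd$", "", basename(infile))
  } else {
    fallback
  }
  paste0("figure/", stem, "-")
}

fig_prefix("example.Rmd")  # file name known
fig_prefix(NULL)           # file name missing: falls back instead of failing
```

Inside a document, the result could be passed to opts_chunk$set(fig.path = ...). knitr::current_input() is the exported way to ask for the input file name and also returns NULL outside of knit().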

Trouble building biocintro_5x / bioc1_iranges.Rmd

@vjcitn
Building the biocintro_5x / bioc1_iranges.Rmd throws an error:

Error in elementLengths(grl) : could not find function "elementLengths"
Calls: ... handle -> withCallingHandlers -> withVisible -> eval -> eval
Execution halted

elementLengths() has been replaced with elementNROWS() in the latest IRanges package.

Errors in packages.R

On line 36 (https://github.com/genomicsclass/labs/blob/master/packages.R#L36), it should read:

devtools::install_github("stephaniehicks/BackgroundExperimentYeast")

On line 48 (https://github.com/genomicsclass/labs/blob/master/packages.R#L48), it should read:

devtools::install_github("stephaniehicks/mycAffyData")

prediction: object 'colshat' not found

present contrast vector as C^T, transpose of a column vector

Typos in labs/course3/machine_learning.Rmd

Update for the new version (3/15/2015):
line 131: It should have len=80 -- fixed
line 295: "should" should be "shade" -- now line 314
line 313: "specif" should be "specific" -- now line 332
line 317: There's something missing. Right now it looks like this:
"In the code above you will notice that we created two sets data" -- now line 336
line 426: \mobx should be \mbox -- now line 445

line 82: Not a typo, but there's a huge output from loading the SpikeIn library. You can suppress it by adding the chunk option message=FALSE. -- fixed

(There are some others, but I have to stop here.)

New run of the courses?

Hi. I am not sure whether this is a good place to ask, but I am wondering whether there will be a new run of the PH525 courses next year. I finished 3 courses in the Data Analysis for Genomics Certificate this year and I am interested in taking the other 4 courses in the coming year if possible. Thanks.

Ambiguous sentence in advinference/multiple_testing.Rmd

(Sorry if this is not the appropriate channel for reporting issues regarding potential typos etc.)

I got confused by the wording "This implies that with a 0.05 p-value cut-off, out of the 100 tests we incorrectly call between 4 and 5 significant on average. " in this line:

labs/advinference/multiple_testing.Rmd

Line 269 in c15a1a7

 The FDR is relatively high here. This is because for 90% of the tests, the null hypotheses is true. This implies that with a 0.05 p-value cut-off, out of the 100 tests we incorrectly call between 4 and 5 significant on average. This combined with the fact that we don't "catch" all the cases where the alternative is true, gives us a relatively high FDR. So how can we control this? What if we want lower FDR, say 5%? 

Stating "the 100 tests" made me initially think that the number 100 referred to the number of experiments in the Monte Carlo simulation, which is obviously wrong since there are 10,000 tests in each replication. Did the authors mean something to this effect?

Since there are 9000 tests where the null hypothesis is true and the chosen significance level is 0.05, it follows that we incorrectly call between 400 and 500 tests significant on average (5% of 9000 equals 450).
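
The arithmetic in the proposed wording can be checked with a short simulation. This is a sketch under the numbers quoted above (10,000 tests, 90% true nulls, cut-off 0.05); it does not reproduce the book's own Monte Carlo code, and the 200 replications are an arbitrary choice.

```r
m0    <- 9000   # tests where the null hypothesis is true (90% of 10,000)
alpha <- 0.05   # p-value cut-off

expected_fp <- alpha * m0   # 450

set.seed(1)
# Under the null, p-values are uniform on [0, 1]; count those below alpha
# in each of 200 simulated experiments.
false_positives <- replicate(200, sum(runif(m0) < alpha))
mean(false_positives)
range(false_positives)
```

The simulated average should sit close to 450, matching the "between 400 and 500" wording rather than "between 4 and 5".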

Issues in section "NGS experiments and the Poisson distribution"

I see several issues in this section:

The section (in the printed book p. 285) states "Assuming most genes are differentially expressed across individuals, then, if the Poisson model is appropriate, there should be a linear relationship in this plot." It is not explained why this should be so. Is it referring to the mean and standard deviation in the Poisson distribution, both being lambda? But that wasn't covered in the course.
The plot which is then generated (in chunk "var_vs_mean") displays variance against means. I think this should be standard deviation? Indeed, plotting sd against means shows a pretty linear picture, with the diagonal cutting through the middle.
In the paragraph following the figure it is unclear what the "this" refers to: "The reason for this is that the variability plotted here includes biological variability [...]." Does the "this" refer to linearity or the absence of linearity?
That paragraph introduces the concepts of "biological variability" and "sampling variability" as if they had been discussed previously. I don't think they are defined at any earlier point in the book. Also, given the seemingly quite linear relation, does this point still hold water?
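
For reference on the first two points: a Poisson distribution with rate lambda has mean lambda and variance lambda, so its standard deviation is sqrt(lambda). Under a pure Poisson model it is therefore the variance, not the standard deviation, that is linear in the mean. A minimal simulation sketch with arbitrary, made-up rates (not the book's data):

```r
set.seed(1)
lambdas <- c(1, 5, 10, 50, 100, 500)  # arbitrary rates for illustration

# Draw 10,000 Poisson counts per rate and summarise each sample.
sims <- sapply(lambdas, function(lambda) {
  x <- rpois(10000, lambda)
  c(mean = mean(x), var = var(x), sd = sd(x))
})
colnames(sims) <- lambdas

round(sims, 2)
# Variance tracks the mean; the standard deviation tracks sqrt(mean).
sims["var", ] / sims["mean", ]
sims["sd", ] / sqrt(sims["mean", ])
```

Both ratios should be close to 1 for every rate. Real count data with biological variability on top of sampling variability would be expected to show variance above the mean.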

PS: thanks for the great course and the book!

reduce not necessary for summarizeOverlaps

It says here to reduce() the exonsBy() object in order to avoid duplicate counting, but this is not necessary:

https://github.com/genomicsclass/labs/blob/master/course4/HPCami.Rmd

one probeID to multi gene symbols

In mapping_features.Rmd
idx <- match(rownames(e), res$PROBEID)

match() automatically picks the first row when a probe ID maps to more than one gene symbol, so the other symbols are silently dropped.
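
A minimal sketch of the behaviour with a made-up annotation table. The probe IDs and symbols are hypothetical, and probe_ids stands in for rownames(e).

```r
# Made-up annotation table: probe "p2" maps to two gene symbols.
res <- data.frame(
  PROBEID = c("p1", "p2", "p2", "p3"),
  SYMBOL  = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)
probe_ids <- c("p1", "p2", "p3")  # stand-in for rownames(e)

# match() returns the position of the FIRST match only.
idx <- match(probe_ids, res$PROBEID)
first_symbols <- res$SYMBOL[idx]
first_symbols

# Probes with more than one symbol.
multi <- unique(res$PROBEID[duplicated(res$PROBEID)])
multi

# One alternative: keep every symbol, collapsed into a single string per probe.
collapsed <- tapply(res$SYMBOL, res$PROBEID, paste, collapse = ";")
collapsed[probe_ids]
```

Whether to keep the first symbol, collapse them, or drop ambiguous probes is an analysis decision. The point is only that match() makes that choice implicitly.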

stack1kg error in dataman2017.Rmd

The stack1kg function does not run successfully:

>library(ldblock)
>sta = stack1kg()
Error in validObject(.Object) : 
  invalid class “VcfStack” object: all rownames(object) must be in seqlevels(object)

The content in the textbook section "1000 Genomes VCF in the cloud" depends on the sta object produced by running this function with no arguments.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ldblock_1.14.0                                     HDF5Array_1.12.2                                  
 [3] rhdf5_2.28.0                                       ArrayExpress_1.44.0                               
 [5] magrittr_1.5                                       dplyr_0.8.3                                       
 [7] bigrquery_1.2.0                                    VariantTools_1.26.0                               
 [9] VariantAnnotation_1.30.1                           RaggedExperiment_1.8.0                            
[11] MultiAssayExperiment_1.10.4                        GenomicAlignments_1.20.1                          
[13] BiocStyle_2.12.0                                   IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[15] IlluminaHumanMethylation450kmanifest_0.4.0         minfi_1.30.0                                      
[17] bumphunter_1.26.0                                  locfit_1.5-9.1                                    
[19] iterators_1.0.12                                   foreach_1.4.7                                     
[21] GSE5859Subset_1.0                                  airway_1.4.0                                      
[23] ph525x_0.0.48                                      png_0.1-7                                         
[25] RNAseqData.HNRNPC.bam.chr14_0.22.0                 erma_1.0.0                                        
[27] GenomicFiles_1.20.0                                rtracklayer_1.44.2                                
[29] Rsamtools_2.0.0                                    Biostrings_2.52.0                                 
[31] XVector_0.24.0                                     SummarizedExperiment_1.14.1                       
[33] DelayedArray_0.10.0                                BiocParallel_1.18.0                               
[35] matrixStats_0.54.0                                 Homo.sapiens_1.3.1                                
[37] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2            org.Hs.eg.db_3.8.2                                
[39] GO.db_3.8.2                                        OrganismDbi_1.26.0                                
[41] GenomicFeatures_1.36.4                             GenomicRanges_1.36.0                              
[43] GenomeInfoDb_1.20.0                                GEOquery_2.52.0                                   
[45] data.table_1.12.2                                  knitr_1.24                                        
[47] geneplotter_1.62.0                                 annotate_1.62.0                                   
[49] XML_3.98-1.20                                      AnnotationDbi_1.46.0                              
[51] IRanges_2.18.1                                     S4Vectors_0.22.0                                  
[53] lattice_0.20-38                                    Biobase_2.44.0                                    
[55] BiocGenerics_0.30.0                               

loaded via a namespace (and not attached):
  [1] snow_0.4-3               backports_1.1.4          plyr_1.8.4               lazyeval_0.2.2          
  [5] oligo_1.48.0             splines_3.6.1            ggplot2_3.2.1            digest_0.6.20           
  [9] htmltools_0.3.6          memoise_1.1.0            BSgenome_1.52.0          limma_3.40.6            
 [13] readr_1.3.1              askpass_1.1              siggenes_1.58.0          prettyunits_1.0.2       
 [17] colorspace_1.4-1         blob_1.2.0               xfun_0.8                 jsonlite_1.6            
 [21] crayon_1.3.4             RCurl_1.95-4.12          graph_1.62.0             genefilter_1.66.0       
 [25] zeallot_0.1.0            survival_2.44-1.1        glue_1.3.1               registry_0.5-1          
 [29] gtable_0.3.0             zlibbioc_1.30.0          Rhdf5lib_1.6.0           scales_1.0.0            
 [33] DBI_1.0.0                rngtools_1.4             bibtex_0.4.2             Rcpp_1.0.1              
 [37] xtable_1.8-4             progress_1.2.2           bit_1.1-14               mclust_5.4.5            
 [41] preprocessCore_1.46.0    httr_1.4.1               RColorBrewer_1.1-2       ff_2.2-14               
 [45] pkgconfig_2.0.2          reshape_0.8.8            labeling_0.3             reshape2_1.4.3          
 [49] tidyselect_0.2.5         rlang_0.4.0              later_0.8.0              munsell_0.5.0           
 [53] tools_3.6.1              RSQLite_2.1.2            evaluate_0.14            stringr_1.4.0           
 [57] yaml_2.2.0               bit64_0.9-7              oligoClasses_1.46.0      beanplot_1.2            
 [61] scrime_1.3.5             purrr_0.3.2              RBGL_1.60.0              nlme_3.1-141            
 [65] doRNG_1.7.1              mime_0.7                 nor1mix_1.3-0            xml2_1.2.2              
 [69] biomaRt_2.40.3           compiler_3.6.1           rstudioapi_0.10          curl_4.0                
 [73] affyio_1.54.0            tibble_2.1.3             stringi_1.4.3            Matrix_1.2-17           
 [77] multtest_2.40.0          vctrs_0.2.0              pillar_1.4.2             BiocManager_1.30.4      
 [81] snpStats_1.34.0          bitops_1.0-6             httpuv_1.5.1             R6_2.4.0                
 [85] promises_1.0.1           affxparser_1.56.0        codetools_0.2-16         MASS_7.3-51.4           
 [89] assertthat_0.2.1         openssl_1.4.1            pkgmaker_0.27            withr_2.1.2             
 [93] GenomeInfoDbData_1.2.1   hms_0.5.0                quadprog_1.5-7           tidyr_0.8.3             
 [97] base64_2.0               rmarkdown_1.14           DelayedMatrixStats_1.6.0 illuminaio_0.26.0       
[101] shiny_1.3.2

Getting Started Exercises question #1 sentence malformed

From http://genomicsclass.github.io/book/pages/getting_started_exercises.html - question 1 doesn't make sense:

"Read in the file femaleMiceWeights.csv and report the body weight of the mouse in the exact name of the column containing the weights."

Perhaps it meant:

"Read in the file femaleMiceWeights.csv and report a) the body weights of all the mice, and b) the exact name of the column containing the weights."

Also, is the source for the exercises in this repo? I could not find the getting started exercises.

Duplicated questions in section: Direct Approach to FDR and q-values (Advanced)

15 == 16, and 17 == 18:

What are the false negative rates for p.adjust?

What are the false negative rates for p.adjust?

What are the false negative rates for qvalues?

What are the false negative rates for qvalues?

add MDS references

File for association test exercises missing

File http://genomicsclass.github.io/book/pages/assoctest.csv does not exist.

(Linked to from http://genomicsclass.github.io/book/pages/association_tests_exercises.html)

The file is found at https://studio.edx.org/c4x/HarvardX/PH525.1x/asset/assoctest.csv

Incomplete sentence in Factor Analysis chapter

In the first paragraph:

labs/batch/factor_analysis.Rmd

Line 13 in 544a5dc

 Before we introduce the next type of statistical method for batch effect correction, we introduce the statistical idea that motivates the main idea: Factor Analysis. Factor Analysis was first developed over a century ago. Karl Pearson noted that correlation between different subjects when the correlation was computed across students. To explain this, he posed a model having one factor that was common across subjects for each student that explained this correlation: 

The incomplete sentence is: "Karl Pearson noted that correlation between different subjects when the correlation was computed across students."

I don't know what this sentence is supposed to say, so I will not attempt to fix it.

Incidentally there is also a typo in the following equation,

labs/batch/factor_analysis.Rmd

Line 16 in 544a5dc

Y_ij = \alpha_i W_1 + \varepsilon_{ij}

, where Y_ij should be Y_{ij}

Dplyr select() returning error

Hi all,

Running the below code causes this error:
Error in select(., Bodyweight) : unused argument (Bodyweight)

Here is the full code:

library(rafalib)
library(downloader)
library(devtools)
library(dplyr)

install_github("genomicsclass/dagdata")
dir <- system.file(package="dagdata")
filename <- file.path(dir,"extdata/mice_pheno.csv")
dat <- read.csv(filename)

controlPopulation <- filter(dat,Sex == "F" & Diet == "chow") %>% select(Bodyweight) %>% unlist

I am running the following R installation through RStudio:

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

I attach a screenshot from RStudio too.
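
An "unused argument" error from select() is typical of the call reaching a select() from another attached package instead of dplyr's. That is an assumption here, since the thread does not show which packages were attached. The sketch below imitates the clash with a stand-in function and shows the namespace-qualified call that avoids it.

```r
# Stand-in for a select() from another package that masks dplyr::select().
select <- function(obj) obj

# The masked call fails in the same way as in the report.
msg <- tryCatch(select(mtcars, mpg), error = function(e) conditionMessage(e))
msg

# Which environment does the visible select() come from?
environmentName(environment(select))

# Calling the function through its namespace avoids the clash.
if (requireNamespace("dplyr", quietly = TRUE)) {
  head(dplyr::select(mtcars, mpg))
}

rm(select)  # remove the stand-in
```

In the reported code, the equivalent change would be writing dplyr::select(Bodyweight) in the pipeline.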

URL for csv file missing in dplyr exercises

See the first paragraph here - after "Download the CSV file from this location:" is blank. URL should be:

https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/msleep_ggplot2.csv
