genomicsclass / labs
Rmd source files for the HarvardX series PH525x
Home Page: http://genomicsclass.github.io/book
License: MIT License
Lines 875-877 of dataman_2019.Rmd generate a 404 not found error after authenticating with Google BigQuery and returning to R as directed by the browser:
tcgaCon %>% tbl("Somatic_Mutation") %>% dplyr::filter(project_short_name=="TCGA-GBM") %>%
dplyr::select(Variant_Classification, Hugo_Symbol) %>% group_by(Variant_Classification) %>%
summarise(n=n())
Error: HTTP error [404] Not Found
Is this the appropriate workflow? If so, what do learners need to know or do in order to not encounter this 404 error?
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] magrittr_1.5 dplyr_0.8.3
[3] bigrquery_1.2.0 RaggedExperiment_1.8.0
[5] curatedTCGAData_1.6.0 MultiAssayExperiment_1.10.4
[7] VariantTools_1.26.0 VariantAnnotation_1.30.1
[9] ph525x_0.0.48 png_0.1-7
[11] ldblock_1.14.2 erma_1.0.0
[13] Homo.sapiens_1.3.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[15] OrganismDbi_1.26.0 GenomicFeatures_1.36.4
[17] GenomicAlignments_1.20.1 GenomicFiles_1.20.0
[19] rtracklayer_1.44.2 Rsamtools_2.0.0
[21] RNAseqData.HNRNPC.bam.chr14_0.22.0 IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[23] IlluminaHumanMethylation450kmanifest_0.4.0 minfi_1.30.0
[25] bumphunter_1.26.0 locfit_1.5-9.1
[27] iterators_1.0.12 foreach_1.4.7
[29] Biostrings_2.52.0 XVector_0.24.0
[31] data.table_1.12.2 GO.db_3.8.2
[33] org.Hs.eg.db_3.8.2 airway_1.4.0
[35] SummarizedExperiment_1.14.1 DelayedArray_0.10.0
[37] BiocParallel_1.18.1 matrixStats_0.54.0
[39] GenomicRanges_1.36.0 GenomeInfoDb_1.20.0
[41] ArrayExpress_1.44.0 GEOquery_2.52.0
[43] annotate_1.62.0 XML_3.98-1.20
[45] AnnotationDbi_1.46.1 IRanges_2.18.1
[47] S4Vectors_0.22.0 Biobase_2.44.0
[49] BiocGenerics_0.30.0 GSE5859Subset_1.0
loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 RSQLite_2.1.2 munsell_0.5.0 codetools_0.2-16
[5] preprocessCore_1.46.0 withr_2.1.2 colorspace_1.4-1 knitr_1.24
[9] rstudioapi_0.10 labeling_0.3 GenomeInfoDbData_1.2.1 bit64_0.9-7
[13] rhdf5_2.28.0 vctrs_0.2.0 xfun_0.9 BiocFileCache_1.8.0
[17] affxparser_1.56.0 R6_2.4.0 illuminaio_0.26.0 AnnotationFilter_1.8.0
[21] bitops_1.0-6 reshape_0.8.8 assertthat_0.2.1 promises_1.0.1
[25] scales_1.0.0 gtable_0.3.0 ensembldb_2.8.0 rlang_0.4.0
[29] zeallot_0.1.0 genefilter_1.66.0 splines_3.6.1 lazyeval_0.2.2
[33] gargle_0.3.1 BiocManager_1.30.4 yaml_2.2.0 reshape2_1.4.3
[37] snpStats_1.34.0 backports_1.1.4 httpuv_1.5.1 RBGL_1.60.0
[41] tools_3.6.1 nor1mix_1.3-0 ggplot2_3.2.1 affyio_1.54.0
[45] ff_2.2-14 RColorBrewer_1.1-2 siggenes_1.58.0 Rcpp_1.0.1
[49] plyr_1.8.4 progress_1.2.2 zlibbioc_1.30.0 purrr_0.3.2
[53] RCurl_1.95-4.12 prettyunits_1.0.2 openssl_1.4.1 fs_1.3.1
[57] ProtGenerics_1.16.0 hms_0.5.1 mime_0.7 xtable_1.8-4
[61] mclust_5.4.5 gridExtra_2.3 compiler_3.6.1 biomaRt_2.40.4
[65] tibble_2.1.3 crayon_1.3.4 htmltools_0.3.6 later_0.8.0
[69] snow_0.4-3 tidyr_0.8.3 oligo_1.48.0 DBI_1.0.0
[73] ExperimentHub_1.10.0 dbplyr_1.4.2 MASS_7.3-51.4 rappdirs_0.3.1
[77] EnsDb.Hsapiens.v75_2.99.0 Matrix_1.2-17 readr_1.3.1 quadprog_1.5-7
[81] pkgconfig_2.0.2 registry_0.5-1 xml2_1.2.2 rngtools_1.4
[85] pkgmaker_0.27 multtest_2.40.0 beanplot_1.2 bibtex_0.4.2
[89] doRNG_1.7.1 scrime_1.3.5 stringr_1.4.0 digest_0.6.20
[93] graph_1.62.0 base64_2.0 DelayedMatrixStats_1.6.0 curl_4.0
[97] shiny_1.3.2 jsonlite_1.6 nlme_3.1-141 Rhdf5lib_1.6.0
[101] askpass_1.1 limma_3.40.6 BSgenome_1.52.0 pillar_1.4.2
[105] lattice_0.20-38 httr_1.4.1 survival_2.44-1.1 interactiveDisplayBase_1.22.0
[109] glue_1.3.1 UpSetR_1.4.0 bit_1.1-14 stringi_1.4.3
[113] HDF5Array_1.12.2 blob_1.2.0 oligoClasses_1.46.0 AnnotationHub_2.16.1
[117] memoise_1.1.0
Thanks!
Line 181 of bioc2_integExamps.Rmd generates the following error:
> phset = lapply( ovrngs, function(x)
+ unique( gwrngs19[ which(gwrngs19 %over% x) ]$Disease.Trait ) )
Error in getListElement(x, i, ...) :
GRanges objects don't support [[, as.list(), lapply(), or
unlist() at the moment
Thanks! Here is the sessionInfo:
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] tools grid stats4 parallel stats graphics grDevices utils
[9] datasets methods base
other attached packages:
[1] curatedTCGAData_1.6.0
[2] harbChIP_1.22.0
[3] yeastCC_1.24.0
[4] gwascat_2.16.0
[5] ERBS_1.0
[6] magrittr_1.5
[7] dplyr_0.8.3
[8] bigrquery_1.2.0
[9] VariantTools_1.26.0
[10] VariantAnnotation_1.30.1
[11] RaggedExperiment_1.8.0
[12] MultiAssayExperiment_1.10.4
[13] GenomicAlignments_1.20.1
[14] BiocStyle_2.12.0
[15] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[16] IlluminaHumanMethylation450kmanifest_0.4.0
[17] minfi_1.30.0
[18] bumphunter_1.26.0
[19] locfit_1.5-9.1
[20] iterators_1.0.12
[21] foreach_1.4.7
[22] annotate_1.62.0
[23] XML_3.98-1.20
[24] GSE5859Subset_1.0
[25] airway_1.4.0
[26] ph525x_0.0.48
[27] png_0.1-7
[28] RNAseqData.HNRNPC.bam.chr14_0.22.0
[29] erma_1.0.0
[30] GenomicFiles_1.20.0
[31] rtracklayer_1.44.2
[32] Rsamtools_2.0.0
[33] Biostrings_2.52.0
[34] XVector_0.24.0
[35] SummarizedExperiment_1.14.1
[36] DelayedArray_0.10.0
[37] BiocParallel_1.18.1
[38] matrixStats_0.54.0
[39] Homo.sapiens_1.3.1
[40] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[41] org.Hs.eg.db_3.8.2
[42] GO.db_3.8.2
[43] OrganismDbi_1.26.0
[44] GenomicFeatures_1.36.4
[45] GenomicRanges_1.36.0
[46] GenomeInfoDb_1.20.0
[47] AnnotationDbi_1.46.1
[48] IRanges_2.18.1
[49] S4Vectors_0.22.0
[50] GEOquery_2.52.0
[51] data.table_1.12.2
[52] Biobase_2.44.0
[53] BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 RSQLite_2.1.2
[3] devtools_2.1.0 munsell_0.5.0
[5] codetools_0.2-16 preprocessCore_1.46.0
[7] withr_2.1.2 colorspace_1.4-1
[9] knitr_1.24 rstudioapi_0.10
[11] labeling_0.3 GenomeInfoDbData_1.2.1
[13] bit64_0.9-7 rhdf5_2.28.0
[15] rprojroot_1.3-2 vctrs_0.2.0
[17] xfun_0.9 BiocFileCache_1.8.0
[19] R6_2.4.0 illuminaio_0.26.0
[21] AnnotationFilter_1.8.0 bitops_1.0-6
[23] reshape_0.8.8 assertthat_0.2.1
[25] promises_1.0.1 scales_1.0.0
[27] gtable_0.3.0 processx_3.4.1
[29] ensembldb_2.8.0 rlang_0.4.0
[31] zeallot_0.1.0 genefilter_1.66.0
[33] splines_3.6.1 lazyeval_0.2.2
[35] gargle_0.3.1 BiocManager_1.30.4
[37] yaml_2.2.0 snpStats_1.34.0
[39] backports_1.1.4 httpuv_1.5.1
[41] RBGL_1.60.0 usethis_1.5.1
[43] nor1mix_1.3-0 ggplot2_3.2.1
[45] RColorBrewer_1.1-2 siggenes_1.58.0
[47] sessioninfo_1.1.1 Rcpp_1.0.1
[49] plyr_1.8.4 progress_1.2.2
[51] zlibbioc_1.30.0 purrr_0.3.2
[53] RCurl_1.95-4.12 ps_1.3.0
[55] prettyunits_1.0.2 openssl_1.4.1
[57] fs_1.3.1 ProtGenerics_1.16.0
[59] pkgload_1.0.2 hms_0.5.1
[61] mime_0.7 evaluate_0.14
[63] xtable_1.8-4 mclust_5.4.5
[65] gridExtra_2.3 testthat_2.2.1
[67] compiler_3.6.1 biomaRt_2.40.4
[69] tibble_2.1.3 crayon_1.3.4
[71] htmltools_0.3.6 later_0.8.0
[73] tidyr_0.8.3 ldblock_1.14.2
[75] DBI_1.0.0 ExperimentHub_1.10.0
[77] dbplyr_1.4.2 rappdirs_0.3.1
[79] MASS_7.3-51.4 EnsDb.Hsapiens.v75_2.99.0
[81] Matrix_1.2-17 readr_1.3.1
[83] cli_1.1.0 quadprog_1.5-7
[85] pkgconfig_2.0.2 registry_0.5-1
[87] xml2_1.2.2 rngtools_1.4
[89] pkgmaker_0.27 multtest_2.40.0
[91] beanplot_1.2 bibtex_0.4.2
[93] doRNG_1.7.1 scrime_1.3.5
[95] stringr_1.4.0 callr_3.3.1
[97] digest_0.6.20 graph_1.62.0
[99] rmarkdown_1.15 base64_2.0
[101] DelayedMatrixStats_1.6.0 curl_4.0
[103] shiny_1.3.2 nlme_3.1-141
[105] jsonlite_1.6 Rhdf5lib_1.6.0
[107] desc_1.2.0 askpass_1.1
[109] limma_3.40.6 BSgenome_1.52.0
[111] pillar_1.4.2 lattice_0.20-38
[113] httr_1.4.1 pkgbuild_1.0.4
[115] survival_2.44-1.1 interactiveDisplayBase_1.22.0
[117] glue_1.3.1 remotes_2.1.0
[119] UpSetR_1.4.0 bit_1.1-14
[121] stringi_1.4.3 HDF5Array_1.12.2
[123] blob_1.2.0 AnnotationHub_2.16.0
[125] memoise_1.1.0
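A possible workaround for the `lapply()` error reported above, sketched under the assumption that GenomicRanges is installed: iterate over the indices of the GRanges and subset with `[` instead of passing the GRanges itself to `lapply()`. The objects `targets` and `queries` and the `trait` column are hypothetical stand-ins for `gwrngs19`, `ovrngs` and `Disease.Trait`.

```r
library(GenomicRanges)

# Hypothetical stand-ins for gwrngs19 (with a trait column) and ovrngs.
targets <- GRanges("chr1", IRanges(c(1, 50, 100), width = 10),
                   trait = c("t1", "t2", "t3"))
queries <- GRanges("chr1", IRanges(c(5, 200), width = 10))

# Loop over positions and take one-range subsets with `[`,
# which GRanges supports, instead of lapply() over the GRanges itself.
traits <- lapply(seq_len(length(queries)), function(i) {
  unique(targets[which(targets %over% queries[i])]$trait)
})
```

Applied to the lab code, the same pattern would be `lapply(seq_len(length(ovrngs)), function(i) ...)` with `ovrngs[i]` in place of `x`; this has not been checked against the actual lab data.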
Could you release an ePub version? When I compiled it by myself, there were lots of errors. Maybe because some files are deprecated. Thank you.
Hello! Thanks so much for the fantastic materials here. I am wondering if there are solutions for the exercises in the book Data Analysis for the Life Sciences? I think it will be helpful for the readers to verify the answers.
Is there a GSE5859 available for R version 3.3.1?
When running the following command to install the package:
biocLite('GSE5859')
I get the following error
Warning message:
package ‘GSE5859’ is not available (for R version 3.3.1)
thanks!
The text and code from line 106 (https://github.com/genomicsclass/labs/blob/master/week3/montecarlo.Rmd#L106) up until the end of the file, does not seem to belong there (at least it is not used/mentioned in the lecture videos). They might be remnants of an alternative example for the "Inference" and "Permutation" lectures.
Hi, Prof. Rafa!
I'm using R version 3.6.3 and working through the false negative demonstration from the edX PH525x course.
I'm using exactly the same code as the lecture video and the book. Here it is:
dat <- read.csv("mice_pheno.csv")
controlPopulation <- filter(dat,Sex == "F" & Diet == "chow") %>%
select(Bodyweight) %>% unlist
hfPopulation <- filter(dat,Sex == "F" & Diet == "hf") %>%
select(Bodyweight) %>% unlist
mu_hf <- mean(hfPopulation)
mu_control <- mean(controlPopulation)
mu_hf - mu_control
[1] 2.375517
(mu_hf - mu_control)/mu_control * 100 # percent increase
[1] 9.942157
So far the results are the same as in the video.
After that:
set.seed(1)
N <- 5
hf <- sample(hfPopulation,N)
control <- sample(controlPopulation,N)
t.test(hf,control)$p.value
The result is supposed to be 0.1410204, but my result is 0.5806661. I retried several times and with several ways of generating the values, but the result did not change.
Since this material was last edited 4 years ago, I think there is an algorithmic difference in the set.seed() function.
I would be glad if you could help.
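One likely explanation, offered as a sketch rather than a confirmed diagnosis: R 3.6.0 changed the default algorithm behind `sample()` from "Rounding" to "Rejection", so the same seed gives different draws than in older material. The snippet below uses a synthetic `population` vector (an assumption, since mice_pheno.csv is not loaded here) and the `sample.kind` argument of `set.seed()`, which exists in R 3.6.0 and later.

```r
# Synthetic stand-in for the course data (mice_pheno.csv is not loaded here).
population <- seq(20, 40, length.out = 200)

# Pre-3.6.0 behaviour; R warns that the "Rounding" sampler is non-uniform.
suppressWarnings(set.seed(1, sample.kind = "Rounding"))
old_style <- sample(population, 5)

# Current default sampler (this also restores the default for the session).
set.seed(1, sample.kind = "Rejection")
new_style <- sample(population, 5)

# The two draws usually differ even though the seed is identical.
print(rbind(old_style, new_style))
```

Calling `set.seed(1, sample.kind = "Rounding")` before the `sample()` calls in the course code may reproduce the numbers from the video; this has not been verified against the course data here.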
Hello,
I have been stuck for the past two days. I need to use genomicsclass/GSE5859Subset, and
for that I have installed "devtools". I have also installed Rtools (Rtools34) from CRAN. I am running version 3.6.0 of RStudio.
I get the warning and error messages below and cannot use the GSE5859Subset dataset. Any help would be greatly appreciated.
library(devtools)
Loading required package: usethis
Warning messages:
1: package ‘devtools’ was built under R version 3.6.3
2: package ‘usethis’ was built under R version 3.6.3
install_github("genomicsclass/GSE5859Subset")
Error: Failed to install 'unknown package' from GitHub:
HTTP error 403.
API rate limit exceeded for 157.32.239.55. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
Rate limit remaining: 0/60
Rate limit reset at: 2020-05-26 15:42:00 UTC
To increase your GitHub API rate limit
- Use usethis::browse_github_pat() to create a Personal Access Token.
- Use usethis::edit_r_environ() and add the token as GITHUB_PAT.
library(GSE5859Subset)
Error in library(GSE5859Subset) :
there is no package called ‘GSE5859Subset’
data(GSE5859Subset)
Warning message:
In data(GSE5859Subset) : data set ‘GSE5859Subset’ not found
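The 403 above is a GitHub rate limit, not a problem with the package itself. As a small sketch of the check the error message suggests, the helper below (the name `has_github_pat` is hypothetical) only tests whether a `GITHUB_PAT` environment variable is visible to the current R session; it does not validate the token.

```r
# TRUE when a token is visible to this R session (it is not validated).
has_github_pat <- function() nzchar(Sys.getenv("GITHUB_PAT"))

if (!has_github_pat()) {
  message("No GITHUB_PAT found; unauthenticated requests are limited to 60 per hour.")
}

# Once the token is set (for example in ~/.Renviron) and R is restarted, retry:
# devtools::install_github("genomicsclass/GSE5859Subset")
```

Waiting until the reset time shown in the error message is the other option if creating a token is not practical.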
Sorry to trouble you. I'm a beginner in bioinformatics and have recently been reading your book "PH525x series - Biomedical Data Science". I followed the chapters and did the exercises, but I can't find the answers, so I came here for help. Could you give me a link to the answers? Thank you.
When I encounter this error I cannot make progress. I also cannot go back to the menu to select another topic or skip the question. Can somebody help me?
Error in TRUE && c(TRUE, FALSE, FALSE) :
'length = 3' in coercion to 'logical(1)'
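For context, this message comes from R 4.3.0 and later, where `&&` and `||` raise an error if either operand has length greater than one. Where the offending expression lives is not stated in the report, so the following is only a sketch of the usual rewrites, with a hypothetical vector `x`:

```r
x <- c(TRUE, FALSE, FALSE)

# Element-wise AND keeps all three values.
elementwise <- TRUE & x

# Collapse the vector to a single value before using &&.
any_true   <- TRUE && any(x)
all_true   <- TRUE && all(x)

# Or state explicitly that only the first element is meant.
first_only <- TRUE && x[1]
```

Which rewrite is right depends on what the original condition was meant to test.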
Hi! It is impossible to download the RData File for the QQ Plot Exercise . Can you share it here please or share the link. I need it to complete the quiz
Thank you for your help !
The TCGA firehose data download on tcga.Rmd line 49 throws an error stating the connection cannot be opened:
> library(ph525x)
> firehose()
> library(RTCGAToolbox)
> readData = getFirehoseData (dataset="READ", runDate="20150402",forceDownload = TRUE,
+ Clinic=TRUE, Mutation=TRUE, Methylation=TRUE, RNASeq2GeneNorm=TRUE)
gdac.broadinstitute.org_READ.Clinical_Pick_Tier1.Level_4.2015040200.0.0.tar.gz
trying URL 'http://gdac.broadinstitute.org/runs/stddata__2015_04_02/data/READ/20150402/gdac.broadinstitute.org_READ.Clinical_Pick_Tier1.Level_4.2015040200.0.0.tar.gz'
Content type 'application/x-gzip' length 4754 bytes
downloaded 4754 bytes
gdac.broadinstitute.org_READ.Clinical_Pick_Tier1.Level_4.2015040200.0.0
gdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0.tar.gz
gdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0.tar.gz
trying URL 'http://gdac.broadinstitute.org/runs/stddata__2015_04_02/data/READ/20150402/gdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0.tar.gz'
Content type 'application/x-gzip' length 5917492 bytes (5.6 MB)
downloaded 5.6 MB
gdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0
cannot open file './20150402-READ-RNAseq2GeneNorm.txt': No such file or directory
Error in file(file, "rt") : cannot open the connection
Much of the following code in the section and the related course videos depends on the output of this command.
In addition, the following code block on line 53 appears to read a local path on your machine.
Here is the sessionInfo:
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] grid tools parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] RTCGAToolbox_2.14.0 ph525x_0.0.48 png_0.1-7
[4] yeastCC_1.24.0 harbChIP_1.22.0 Biostrings_2.52.0
[7] XVector_0.24.0 ERBS_1.0 gwascat_2.16.0
[10] Homo.sapiens_1.3.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 org.Hs.eg.db_3.8.2
[13] GO.db_3.8.2 OrganismDbi_1.26.0 GenomicFeatures_1.36.4
[16] GenomicRanges_1.36.0 GenomeInfoDb_1.20.0 ggbio_1.32.0
[19] ggplot2_3.2.1 AnnotationDbi_1.46.1 IRanges_2.18.1
[22] S4Vectors_0.22.0 Biobase_2.44.0 BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] ProtGenerics_1.16.0 bitops_1.0-6 matrixStats_0.54.0 bit64_0.9-7
[5] RColorBrewer_1.1-2 progress_1.2.2 httr_1.4.1 backports_1.1.4
[9] R6_2.4.0 rpart_4.1-15 Hmisc_4.2-0 DBI_1.0.0
[13] lazyeval_0.2.2 colorspace_1.4-1 nnet_7.3-12 withr_2.1.2
[17] tidyselect_0.2.5 gridExtra_2.3 prettyunits_1.0.2 GGally_1.4.0
[21] bit_1.1-14 curl_4.0 compiler_3.6.1 graph_1.62.0
[25] htmlTable_1.13.1 DelayedArray_0.10.0 rtracklayer_1.44.2 scales_1.0.0
[29] checkmate_1.9.4 RBGL_1.60.0 RCircos_1.2.1 stringr_1.4.0
[33] digest_0.6.20 Rsamtools_2.0.0 foreign_0.8-71 base64enc_0.1-3
[37] dichromat_2.0-0 pkgconfig_2.0.2 htmltools_0.3.6 limma_3.40.6
[41] ensembldb_2.8.0 BSgenome_1.52.0 htmlwidgets_1.3 rlang_0.4.0
[45] rstudioapi_0.10 RSQLite_2.1.2 BiocParallel_1.18.1 acepack_1.4.1
[49] dplyr_0.8.3 VariantAnnotation_1.30.1 RCurl_1.95-4.12 magrittr_1.5
[53] GenomeInfoDbData_1.2.1 Formula_1.2-3 Matrix_1.2-17 Rcpp_1.0.1
[57] munsell_0.5.0 stringi_1.4.3 RaggedExperiment_1.8.0 RJSONIO_1.3-1.2
[61] SummarizedExperiment_1.14.1 zlibbioc_1.30.0 plyr_1.8.4 blob_1.2.0
[65] crayon_1.3.4 lattice_0.20-38 splines_3.6.1 hms_0.5.1
[69] zeallot_0.1.0 knitr_1.24 pillar_1.4.2 reshape2_1.4.3
[73] biomaRt_2.40.4 XML_3.98-1.20 glue_1.3.1 biovizBase_1.32.0
[77] latticeExtra_0.6-28 BiocManager_1.30.4 data.table_1.12.2 vctrs_0.2.0
[81] gtable_0.3.0 purrr_0.3.2 reshape_0.8.8 assertthat_0.2.1
[85] xfun_0.9 AnnotationFilter_1.8.0 survival_2.44-1.1 tibble_2.1.3
[89] GenomicAlignments_1.20.1 memoise_1.1.0 cluster_2.1.0
Thanks!
Hi, I am using R 3.3.3 on Windows. I was trying to install "genomicsclass/GSE5859Subset", but it always fails with the following error:
Two small errors:
Create a Monte Carlo Simulation in which you simulate measurements from 8,793 genes for 24 samples, 12 cases and 12 controls. The for 100 genes create a difference of 1 between cases and
Change "The" to "Then"
n <- 24
m <- 8793
mat <- matrix(rnorm(n*m),m,n)
delta <- 1
positives <- 500 ###SHOULD BE 100
mat[1:positives,1:(n/2)] <- mat[1:positives,1:(n/2)]+delta
positives should be 100, or number of genes above should be 500.
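A sketch of the simulation with the count matching the exercise text (100 genes with a difference of 1), assuming the first 12 columns are the cases. The seed and the final sanity check are additions for illustration and are not part of the original lab code.

```r
set.seed(1)          # added so the sketch is reproducible
n <- 24              # samples: 12 cases followed by 12 controls (assumed order)
m <- 8793            # genes
mat <- matrix(rnorm(n * m), m, n)
delta <- 1
positives <- 100     # matches "for 100 genes" in the exercise text
mat[1:positives, 1:(n / 2)] <- mat[1:positives, 1:(n / 2)] + delta

# Sanity check: the average case-minus-control difference should be
# near delta for the shifted genes and near 0 for the rest.
diffs <- rowMeans(mat[, 1:(n / 2)]) - rowMeans(mat[, (n / 2 + 1):n])
shifted_mean <- mean(diffs[1:positives])
null_mean <- mean(diffs[-(1:positives)])
```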
Lines 96 and 97 of the file https://github.com/genomicsclass/labs/blob/master/week5/prediction.Rmd refer to two undefined objects, colshat and bayesrule (see code fragment below):
points(newx,col=colshat,pch=16,cex=0.35)
contour(tmpx,tmpy,matrix(round(bayesrule),GS,GS),levels=c(1,2),add=TRUE,drawlabels=FALSE)
In your result page for that file (http://genomicsclass.github.io/book/pages/prediction.html), this error is also indicated
Hello,
I am the maintainer of the Julia package DataFramesMeta.jl. It is a data manipulation package for the Julia language and it is very similar to dplyr. I would like permission to port your dplyr tutorial to DataFramesMeta.jl and host it on our documentation website.
We are getting very close to releasing version 1.0 of DataFramesMeta and as a result I'm working on tutorials to help new users get on board.
Because so many of our users will be coming from dplyr, it makes sense to not try and re-invent the wheel when it comes to tutorials and instead port over existing tutorials. Your dplyr tutorial ranks pretty high on Google search and is a nice introduction.
Can I modify your tutorial to be a tutorial for DataFramesMeta.jl and host it on our website? This pretty much just involves surface-level syntax changes, but most of the text will remain intact.
Thank you!
Greetings,
Is there any problem installing the package on R version 4.3.3?
I am having trouble doing this.
My R version is 4.3.3 and my RStudio version is 1.1.419.
Regards,
Building the biocintro_5x / bioc1_summex.Rmd throws an error:
Quitting from lines 109-115 (bioc1_summex.Rmd)
Error in .local(x, ...) :
unused argument (vals = list(tx_chrom = "chr14"))
Calls: ... withCallingHandlers -> withVisible -> eval -> eval -> genes -> genes
The second to last line of methyl/minfi.Rmd generates an error:
> plotSex(sex)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'colData' for signature '"DataFrame"'
I implemented some small fixes to this document in a PR to fix some deprecated functions which you may wish to apply first. I do not know where this DataFrame error comes from.
The session info is below. Thanks!
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 IlluminaHumanMethylation450kmanifest_0.4.0
[3] minfi_1.30.0 HistData_0.8-4
[5] broom_0.5.2 Lahman_7.0-1
[7] tidytext_0.2.2 gutenbergr_0.1.4
[9] rvest_0.3.4 xml2_1.2.2
[11] bumphunter_1.26.0 locfit_1.5-9.1
[13] iterators_1.0.12 foreach_1.4.7
[15] limma_3.40.6 coloncancermeth_1.0
[17] cummeRbund_2.26.0 Gviz_1.28.1
[19] rtracklayer_1.44.2 fastcluster_1.1.25
[21] reshape2_1.4.3 RSQLite_2.1.2
[23] DEXSeq_1.30.0 RColorBrewer_1.1-2
[25] pasilla_1.12.0 sva_3.32.1
[27] genefilter_1.66.0 mgcv_1.8-28
[29] nlme_3.1-141 org.Hs.eg.db_3.8.2
[31] pheatmap_1.0.12 vsn_3.52.0
[33] DESeq2_1.24.0 rafalib_1.0.0
[35] GenomicAlignments_1.20.1 Rsamtools_2.0.0
[37] Biostrings_2.52.0 XVector_0.24.0
[39] airway_1.4.0 SummarizedExperiment_1.14.1
[41] DelayedArray_0.10.0 BiocParallel_1.18.1
[43] matrixStats_0.54.0 forcats_0.4.0
[45] stringr_1.4.0 dplyr_0.8.3
[47] purrr_0.3.2 readr_1.3.1
[49] tidyr_0.8.3 tibble_2.1.3
[51] tidyverse_1.2.1 dslabs_0.7.1
[53] Cen.ele6_1.0.0 TxDb.Celegans.UCSC.ce6.ensGene_3.2.2
[55] org.Ce.eg.db_3.8.2 GO.db_3.8.2
[57] OrganismDbi_1.26.0 GenomicFeatures_1.36.4
[59] AnnotationDbi_1.46.1 Biobase_2.44.0
[61] GenomicRanges_1.36.0 GenomeInfoDb_1.20.0
[63] IRanges_2.18.1 S4Vectors_0.22.0
[65] ERBS_1.0 erbsViz_0.0.0.9000
[67] juxtaPack_0.0.0.9000 ggbio_1.32.0
[69] ggplot2_3.2.1 BiocGenerics_0.30.0
[71] usethis_1.5.1
loaded via a namespace (and not attached):
[1] rappdirs_0.3.1 SnowballC_0.6.0 GGally_1.4.0 pkgmaker_0.27 acepack_1.4.1
[6] bit64_0.9-7 knitr_1.24 data.table_1.12.2 rpart_4.1-15 hwriter_1.3.2
[11] GEOquery_2.52.0 RCurl_1.95-4.12 AnnotationFilter_1.8.0 generics_0.0.2 snow_0.4-3
[16] preprocessCore_1.46.0 callr_3.3.1 commonmark_1.7 bit_1.1-14 tokenizers_0.2.1
[21] lubridate_1.7.4 assertthat_0.2.1 xfun_0.9 hms_0.5.1 scrime_1.3.5
[26] fansi_0.4.0 progress_1.2.2 readxl_1.3.1 DBI_1.0.0 geneplotter_1.62.0
[31] htmlwidgets_1.3 reshape_0.8.8 selectr_0.4-1 backports_1.1.4 annotate_1.62.0
[36] textdata_0.3.0 biomaRt_2.40.4 vctrs_0.2.0 remotes_2.1.0 ensembldb_2.8.0
[41] withr_2.1.2 triebeard_0.3.0 BSgenome_1.52.0 checkmate_1.9.4 prettyunits_1.0.2
[46] mclust_5.4.5 cluster_2.1.0 lazyeval_0.2.2 crayon_1.3.4 pkgconfig_2.0.2
[51] labeling_0.3 pkgload_1.0.2 ProtGenerics_1.16.0 nnet_7.3-12 devtools_2.1.0
[56] rlang_0.4.0 registry_0.5-1 affyio_1.54.0 modelr_0.1.5 dichromat_2.0-0
[61] cellranger_1.1.0 rprojroot_1.3-2 graph_1.62.0 rngtools_1.4 base64_2.0
[66] Matrix_1.2-17 urltools_1.7.3 Rhdf5lib_1.6.0 base64enc_0.1-3 whisker_0.4
[71] processx_3.4.1 clisymbols_1.2.0 bitops_1.0-6 DelayedMatrixStats_1.6.0 blob_1.2.0
[76] doRNG_1.7.1 nor1mix_1.3-0 scales_1.0.0 memoise_1.1.0 magrittr_1.5
[81] plyr_1.8.4 hexbin_1.27.3 bibtex_0.4.2 zlibbioc_1.30.0 compiler_3.6.1
[86] illuminaio_0.26.0 cli_1.1.0 affy_1.62.0 janeaustenr_0.1.5 ps_1.3.0
[91] htmlTable_1.13.1 Formula_1.2-3 MASS_7.3-51.4 tidyselect_0.2.5 stringi_1.4.3
[96] askpass_1.1 latticeExtra_0.6-28 VariantAnnotation_1.30.1 tools_3.6.1 rstudioapi_0.10
[101] foreign_0.8-71 git2r_0.26.1 gridExtra_2.3 digest_0.6.20 BiocManager_1.30.4
[106] quadprog_1.5-7 Rcpp_1.0.1 siggenes_1.58.0 httr_1.4.1 biovizBase_1.32.0
[111] colorspace_1.4-1 XML_3.98-1.20 fs_1.3.1 splines_3.6.1 RBGL_1.60.0
[116] statmod_1.4.32 multtest_2.40.0 sessioninfo_1.1.1 xtable_1.8-4 jsonlite_1.6
[121] zeallot_0.1.0 testthat_2.2.1 R6_2.4.0 Hmisc_4.2-0 pillar_1.4.2
[126] htmltools_0.3.6 glue_1.3.1 beanplot_1.2 codetools_0.2-16 pkgbuild_1.0.5
[131] utf8_1.1.4 lattice_0.20-38 curl_4.0 openssl_1.4.1 survival_2.44-1.1
[136] roxygen2_6.1.1 desc_1.2.0 munsell_0.5.0 rhdf5_2.28.0 GenomeInfoDbData_1.2.1
[141] HDF5Array_1.12.2 haven_2.1.1 gtable_0.3.0
@vjcitn
Building the biocintro_5x / bioc1_LiftOver.Rmd throws an error:
Quitting from lines 66-70 (bioc1_liftOver.Rmd)
Error in `seqlevels<-`(`*tmp*`, force = TRUE, value = "chr1") :
unused argument (force = TRUE)
Calls: ... handle -> withCallingHandlers -> withVisible -> eval -> eval
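In current GenomeInfoDb/GenomicRanges releases the `force = TRUE` argument of the `seqlevels<-` setter has been replaced by `pruning.mode`. A minimal sketch on a toy GRanges (the object `gr` is hypothetical, and GenomicRanges is assumed to be installed):

```r
library(GenomicRanges)

# Toy object with ranges on two chromosomes.
gr <- GRanges(c("chr1", "chr2"), IRanges(start = c(1, 20), width = 5))

# pruning.mode = "coarse" drops the ranges on the removed seqlevels,
# which is what force = TRUE used to do.
seqlevels(gr, pruning.mode = "coarse") <- "chr1"
```

The same substitution in lines 66-70 of the Rmd is a plausible fix, though it has not been tested against the full document here.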
At: https://genomicsclass.github.io/book/pages/permutation_tests_exercises.html
The line:
download(url, destfile=filename)
gives an error: could not find function "download". It should be either downloader::download(url, destfile=filename) or download.file.
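Either suggestion can be wrapped in a small helper so the exercise code runs whether or not the downloader package is installed. The function name `fetch_file` is hypothetical, and the demonstration uses a local `file://` URL so that it does not depend on the course server.

```r
# Use downloader::download() when available, otherwise base download.file().
fetch_file <- function(url, destfile) {
  if (requireNamespace("downloader", quietly = TRUE)) {
    downloader::download(url, destfile = destfile, mode = "wb")
  } else {
    utils::download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Demonstration with a local file standing in for the course URL.
src <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), src)
src_url <- paste0("file:///", sub("^/", "", normalizePath(src, winslash = "/")))
dest <- tempfile(fileext = ".csv")
fetch_file(src_url, dest)
```

In the exercise itself the call would simply be `fetch_file(url, filename)`.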
I'm from HarvardX and have been assigned to test and update these courses for rerelease. I'm having trouble running several of the Rmd files and the associated code in the videos due to getGEO issues. Downloading files gives HTTP 404 errors.
For example, this code from "biocintro_5x/dataman2017.Rmd" gives such an error:
library(GEOquery)
glioMA <- getGEO("GSE78703")[[1]]
> Error in open.connection(x, "rb") : HTTP error 404.
Here's my session info if needed:
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] AnnotationDbi_1.46.0 IRanges_2.18.1 S4Vectors_0.22.0 GEOquery_2.52.0 data.table_1.12.2
[6] Biobase_2.44.0 BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 pillar_1.4.2 compiler_3.6.1 BiocManager_1.30.4 bitops_1.0-6
[6] tools_3.6.1 digest_0.6.20 zeallot_0.1.0 bit_1.1-14 memoise_1.1.0
[11] RSQLite_2.1.2 tibble_2.1.3 pkgconfig_2.0.2 rlang_0.4.0 DBI_1.0.0
[16] rstudioapi_0.10 yaml_2.2.0 curl_4.0 xfun_0.8 dplyr_0.8.3
[21] knitr_1.23 xml2_1.2.1 vctrs_0.2.0 hms_0.5.0 bit64_0.9-7
[26] tidyselect_0.2.5 glue_1.3.1 R6_2.4.0 limma_3.40.6 tidyr_0.8.3
[31] readr_1.3.1 purrr_0.3.2 blob_1.2.0 magrittr_1.5 backports_1.1.4
[36] assertthat_0.2.1 RCurl_1.95-4.12 crayon_1.3.4
Two errors are present that break the code in the "Working with TCGA mutation data" section.
When defining the gbm object in dataman2017.Rmd, there are errors. The gbm object is still defined, but I am not sure it is successfully updated.
>gbm = updateObject(gbm)
>gbm
A MultiAssayExperiment object of 12 listed
experiments with user-defined names and respective classes.
Containing an Error in vapply(object, FUN = function(obj) { : values must be length 1,
but FUN(X[[3]]) result is length 0
Error during wrapup: cannot get a slot ("slots") from an object of type "NULL"
This may be related to a downstream error in assayNames:
> mut = experiments(gbm)[["Mutations"]]
> head(assayNames(mut))
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'assayNames' for signature '"RangedRaggedAssay"'
Later components of the TCGA code rely on mut and cannot be performed due to current errors.
I'm trying to use the RMD files associated with the Statistics & R course, and I keep getting an error on the first chunk of code:
opts_chunk$set(fig.path=paste0("figure/", sub("(.*).Rmd","\\1",basename(knitr:::knit_concord$get('infile'))), "-"))
Error says:
Error in basename(knitr:::knit_concord$get("infile")) : a character vector argument expected
Any suggestions?
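A guess at the cause, not a confirmed diagnosis: the input file name is only known while the document is being knitted, so running that chunk interactively passes NULL to `basename()`. The sketch below guards against that case. The helper `fig_prefix` and the "figure/interactive-" fallback are hypothetical, and it uses the exported `knitr::current_input()` rather than the internal `knit_concord` object from the original chunk.

```r
# Hypothetical helper: build the figure path prefix from the input file name,
# falling back to a fixed prefix when no input file is known.
fig_prefix <- function(infile) {
  if (is.null(infile)) return("figure/interactive-")
  paste0("figure/", sub("(.*)\\.Rmd$", "\\1", basename(infile)), "-")
}

# knitr::current_input() returns NULL outside of a knit() call.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(fig.path = fig_prefix(knitr::current_input()))
}
```

Knitting the whole file instead of running the first chunk on its own should also avoid the error.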
@vjcitn
Building the biocintro_5x / bioc1_iranges.Rmd throws an error:
Error in elementLengths(grl) : could not find function "elementLengths"
Calls: ... handle -> withCallingHandlers -> withVisible -> eval -> eval
Execution halted
elementLengths() has been replaced with elementNROWS() in the latest IRanges package.
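A minimal illustration of the replacement on a toy GRangesList (the object here is hypothetical; the name `grl` is borrowed from the error message, and GenomicRanges is assumed to be installed):

```r
library(GenomicRanges)

# Toy list with three ranges in the first element and one in the second.
grl <- GRangesList(
  a = GRanges("chr1", IRanges(c(1, 10, 20), width = 5)),
  b = GRanges("chr2", IRanges(100, width = 5))
)

# elementNROWS() returns the number of ranges in each list element,
# which is what elementLengths() used to return.
sizes <- elementNROWS(grl)
```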
On line 36 (https://github.com/genomicsclass/labs/blob/master/packages.R#L36), it should read:
devtools::install_github("stephaniehicks/BackgroundExperimentYeast")
On line 48 (https://github.com/genomicsclass/labs/blob/master/packages.R#L48), it should read:
devtools::install_github("stephaniehicks/mycAffyData")
Update for the new version (3/15/2015):
line 131: It should have len=80 -- fixed
line 295: "should" should be "shade" -- now line 314
line 313: "specif" should be "specific" -- now line 332
line 317: There's something missing. Right now it looks like this:
In the code above you will notice that we created two sets data"" -- now line 336
line 426: \mobx should be \mbox -- now line 445
line 82: Not a typo, but there's a huge output from loading the SpikeIn library. You can suppress it by adding the chunk option message=FALSE. -- fixed
(There are some others, but I have to stop here.)
Hi. I am not sure whether this is a good place to ask, but I am wondering will there be a new run of the PH525 courses next year? I finished 3 courses in the Data Analysis for Genomics Certificate this year and I am interested in taking the other 4 courses in the coming year if possible. Thanks.
(Sorry if this is not the appropriate channel for reporting issues regarding potential typos etc.)
I got confused by the wording "This implies that with a 0.05 p-value cut-off, out of the 100 tests we incorrectly call between 4 and 5 significant on average. " in this line:
labs/advinference/multiple_testing.Rmd
Line 269 in c15a1a7
Stating "the 100 tests" made me initially think that the number 100 was supposed to refer to the number of experiments in the Monte Carlo simulation, which was obviously wrong since there are 10,000 experiments/tests for each replication. Did the authors mean something to this effect(?):
Since there are 9000 tests where the null hypothesis is true and the chosen significance level is 0.05, it follows that we incorrectly call between 400 and 500 tests significant on average (5% of 9000 equals 450).
I see several issues in this section:
PS: thanks for the great course and the book!
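The arithmetic in the proposed wording can be checked with a short simulation. This is a sketch that relies on the standard fact that p-values are uniformly distributed when the null hypothesis is true; the number of replications `B` and the seed are arbitrary choices, not values from the lab.

```r
set.seed(1)
m0 <- 9000      # tests where the null hypothesis is true
alpha <- 0.05   # p-value cut-off
B <- 200        # Monte Carlo replications (arbitrary)

# Under the null, p-values are uniform on (0, 1), so each replication
# counts how many of the m0 null tests fall below the cut-off.
false_positives <- replicate(B, sum(runif(m0) < alpha))

expected <- alpha * m0            # 5% of 9000
observed <- mean(false_positives) # should land close to `expected`
```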
It says here to reduce() the exonsBy() object in order to avoid duplicate counting, but this is not necessary:
https://github.com/genomicsclass/labs/blob/master/course4/HPCami.Rmd
In mapping_features.Rmd
idx <- match(rownames(e), res$PROBEID)
match() automatically returns only the first match, so when a probe maps to more than one gene symbol the remaining symbols are silently dropped.
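A small base R illustration of this behaviour, using a made-up annotation table (the probe IDs and symbols are hypothetical):

```r
# Hypothetical annotation: probe p1 maps to two gene symbols.
res <- data.frame(
  PROBEID = c("p1", "p1", "p2"),
  SYMBOL  = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
probe_ids <- c("p1", "p2")

# match() returns the position of the first hit only, so symbol "B" is lost.
idx <- match(probe_ids, res$PROBEID)
first_symbols <- res$SYMBOL[idx]

# split() keeps every symbol per probe, which makes the multi-mapping visible.
all_symbols <- split(res$SYMBOL, res$PROBEID)
```

Whether taking the first symbol is acceptable depends on the analysis; the point is only that the choice is implicit in `match()`.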
The stack1kg function does not run successfully:
>library(ldblock)
>sta = stack1kg()
Error in validObject(.Object) :
invalid class “VcfStack” object: all rownames(object) must be in seqlevels(object)
The content in the textbook section "1000 Genomes VCF in the cloud" depends on the sta object produced by running this function with no arguments.
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] ldblock_1.14.0 HDF5Array_1.12.2
[3] rhdf5_2.28.0 ArrayExpress_1.44.0
[5] magrittr_1.5 dplyr_0.8.3
[7] bigrquery_1.2.0 VariantTools_1.26.0
[9] VariantAnnotation_1.30.1 RaggedExperiment_1.8.0
[11] MultiAssayExperiment_1.10.4 GenomicAlignments_1.20.1
[13] BiocStyle_2.12.0 IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[15] IlluminaHumanMethylation450kmanifest_0.4.0 minfi_1.30.0
[17] bumphunter_1.26.0 locfit_1.5-9.1
[19] iterators_1.0.12 foreach_1.4.7
[21] GSE5859Subset_1.0 airway_1.4.0
[23] ph525x_0.0.48 png_0.1-7
[25] RNAseqData.HNRNPC.bam.chr14_0.22.0 erma_1.0.0
[27] GenomicFiles_1.20.0 rtracklayer_1.44.2
[29] Rsamtools_2.0.0 Biostrings_2.52.0
[31] XVector_0.24.0 SummarizedExperiment_1.14.1
[33] DelayedArray_0.10.0 BiocParallel_1.18.0
[35] matrixStats_0.54.0 Homo.sapiens_1.3.1
[37] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 org.Hs.eg.db_3.8.2
[39] GO.db_3.8.2 OrganismDbi_1.26.0
[41] GenomicFeatures_1.36.4 GenomicRanges_1.36.0
[43] GenomeInfoDb_1.20.0 GEOquery_2.52.0
[45] data.table_1.12.2 knitr_1.24
[47] geneplotter_1.62.0 annotate_1.62.0
[49] XML_3.98-1.20 AnnotationDbi_1.46.0
[51] IRanges_2.18.1 S4Vectors_0.22.0
[53] lattice_0.20-38 Biobase_2.44.0
[55] BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] snow_0.4-3 backports_1.1.4 plyr_1.8.4 lazyeval_0.2.2
[5] oligo_1.48.0 splines_3.6.1 ggplot2_3.2.1 digest_0.6.20
[9] htmltools_0.3.6 memoise_1.1.0 BSgenome_1.52.0 limma_3.40.6
[13] readr_1.3.1 askpass_1.1 siggenes_1.58.0 prettyunits_1.0.2
[17] colorspace_1.4-1 blob_1.2.0 xfun_0.8 jsonlite_1.6
[21] crayon_1.3.4 RCurl_1.95-4.12 graph_1.62.0 genefilter_1.66.0
[25] zeallot_0.1.0 survival_2.44-1.1 glue_1.3.1 registry_0.5-1
[29] gtable_0.3.0 zlibbioc_1.30.0 Rhdf5lib_1.6.0 scales_1.0.0
[33] DBI_1.0.0 rngtools_1.4 bibtex_0.4.2 Rcpp_1.0.1
[37] xtable_1.8-4 progress_1.2.2 bit_1.1-14 mclust_5.4.5
[41] preprocessCore_1.46.0 httr_1.4.1 RColorBrewer_1.1-2 ff_2.2-14
[45] pkgconfig_2.0.2 reshape_0.8.8 labeling_0.3 reshape2_1.4.3
[49] tidyselect_0.2.5 rlang_0.4.0 later_0.8.0 munsell_0.5.0
[53] tools_3.6.1 RSQLite_2.1.2 evaluate_0.14 stringr_1.4.0
[57] yaml_2.2.0 bit64_0.9-7 oligoClasses_1.46.0 beanplot_1.2
[61] scrime_1.3.5 purrr_0.3.2 RBGL_1.60.0 nlme_3.1-141
[65] doRNG_1.7.1 mime_0.7 nor1mix_1.3-0 xml2_1.2.2
[69] biomaRt_2.40.3 compiler_3.6.1 rstudioapi_0.10 curl_4.0
[73] affyio_1.54.0 tibble_2.1.3 stringi_1.4.3 Matrix_1.2-17
[77] multtest_2.40.0 vctrs_0.2.0 pillar_1.4.2 BiocManager_1.30.4
[81] snpStats_1.34.0 bitops_1.0-6 httpuv_1.5.1 R6_2.4.0
[85] promises_1.0.1 affxparser_1.56.0 codetools_0.2-16 MASS_7.3-51.4
[89] assertthat_0.2.1 openssl_1.4.1 pkgmaker_0.27 withr_2.1.2
[93] GenomeInfoDbData_1.2.1 hms_0.5.0 quadprog_1.5-7 tidyr_0.8.3
[97] base64_2.0 rmarkdown_1.14 DelayedMatrixStats_1.6.0 illuminaio_0.26.0
[101] shiny_1.3.2
From http://genomicsclass.github.io/book/pages/getting_started_exercises.html - question 1 doesn't make sense:
"Read in the file femaleMiceWeights.csv and report the body weight of the mouse in the exact name of the column containing the weights."
Perhaps it meant:
"Read in the file femaleMiceWeights.csv and report a) the body weights of all the mice, and b) the exact name of the column containing the weights."
Also, is the source for the exercises in this repo? I could not find the getting started exercises.
Questions 15 and 16 are identical, as are questions 17 and 18:
- What are the false negative rates for p.adjust?
- What are the false negative rates for p.adjust?
- What are the false negative rates for qvalues?
- What are the false negative rates for qvalues?
File http://genomicsclass.github.io/book/pages/assoctest.csv does not exist.
(Linked to from http://genomicsclass.github.io/book/pages/association_tests_exercises.html)
The file is found at https://studio.edx.org/c4x/HarvardX/PH525.1x/asset/assoctest.csv
In the first paragraph:
labs/batch/factor_analysis.Rmd
Line 13 in 544a5dc
The incomplete sentence is: "Karl Pearson noted that correlation between different subjects when the correlation was computed across students."
I don't know what this sentence is supposed to say, so I will not attempt to fix it.
Incidentally there is also a typo in the following equation,
labs/batch/factor_analysis.Rmd
Line 16 in 544a5dc
Y_ij should be Y_{ij}
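For reference, the two forms typeset differently because an unbraced underscore subscripts only the next single token. The snippet below is a generic illustration and not the equation from factor_analysis.Rmd.

```latex
% Without braces only the "i" is subscripted; the "j" stays on the baseline.
$Y_ij$

% With braces both indices are subscripted.
$Y_{ij}$
```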
Hi all,
Running the below code causes this error:
Error in select(., Bodyweight) : unused argument (Bodyweight)
Here is the full code:
library(rafalib)
library(downloader)
library(devtools)
library(dplyr)
install_github("genomicsclass/dagdata")
dir <- system.file(package="dagdata")
filename <- file.path(dir,"extdata/mice_pheno.csv")
dat <- read.csv(filename)
controlPopulation <- filter(dat,Sex == "F" & Diet == "chow") %>% select(Bodyweight) %>% unlist
I am running the following R installation through RStudio:
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)
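One common cause of an "unused argument" error from select is that another attached package masks dplyr's version. That is only an assumption here, since the report does not show the full search path. A minimal sketch of the namespace-qualified workaround follows, using a made-up data frame in place of mice_pheno.csv.

```r
library(dplyr)

# Toy stand-in for mice_pheno.csv (values are invented for illustration)
dat <- data.frame(
  Sex        = c("F", "F", "M"),
  Diet       = c("chow", "chow", "hf"),
  Bodyweight = c(20.1, 22.3, 30.5)
)

# Show which attached packages define a function called "select"
find("select")

# Calling the dplyr versions explicitly avoids any masking
controlPopulation <- dat %>%
  dplyr::filter(Sex == "F" & Diet == "chow") %>%
  dplyr::select(Bodyweight) %>%
  unlist()

controlPopulation
```

If `find("select")` lists a package ahead of dplyr, qualifying the call as above or attaching dplyr last should resolve the conflict.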
See the first paragraph here: the link after "Download the CSV file from this location:" is blank. The URL should be:
https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/msleep_ggplot2.csv