book's People

Contributors

al2na, borauyar, jonathanronen, snikumbh


book's Issues

expression data section 4.1.1

I cannot find the code used to create 'df' (table for expression data of leukemia patients) in clustering section 4.1.1.

Typos & wording

  • 5.1 Enumeration sometimes has periods at the end, sometimes not:
  1. Define a prediction function or method f(X)
  2. Devise a function (called loss or cost function) to optimize the difference between your predictions and observed values, such as ∑(Y−f(X))²
  3. Apply mathematical optimization methods to find best parameter values for f(X) in relation to the cost/loss function.
  • 5.1.1 "However, while doing so[, the] field of statistics developed..."

  • 5.2 Enumeration sometimes has periods at the end, sometimes not.

  • "algorithm becomes relevant.[ ]“Training” generally"

  • 5.4.2: library(caret) is loaded too late/inconsistently

  • 5.4.3: "Removing genes or samples have both downsides." Please ask a native English speaker. Recommendation:
    "Both removing genes and removing samples have downsides"

  • 5.5.1: "For starters, we will split the 30% of the data as test." Which the?

  • 5.7: "Accuracy is the first metric to look at. This metric is is simply..." Double is.

  • # gte k-NN prediction on the training data itself, with k=5. gte?

  • 5.12 "Another variable we can tune is the minimum node size of terminal nodes in the trees (min.node.size). This controls the depth of the trees grown. Setting this to larger numbers might cost a small loss in accuracy but the algorithm will run faster." Shouldn't it mean smaller numbers?

problems on R 4.1.0 on a DELL laptop

The text was cut and pasted from the electronic version of the book at https://compgenomr.github.io/book/

A small amount of follow-up was carried out to see if there was a simple explanation or workaround, but no attempt was made to go much beyond what someone fairly new to R might achieve.

P182

# fit logistic regression model
# method and family defines the type of regression
# in this case these arguments mean that we are doing logistic
# regression
lrFit = train(subtype ~ PDPN,
               data=training, trControl=trainControl("none"),
               method="glm", family="binomial")

Error in eval(predvars, data, env) : object 'PDPN' not found

Strangely, while this did not work with PDPN, it worked with other genes, e.g. CBLN1 and DDX3Y.
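One possible explanation, which the report does not confirm, is that the PDPN column was dropped from `training` by an earlier filtering step while CBLN1 and DDX3Y survived. The sketch below uses a toy data frame with made-up values and a hypothetical helper, `missing_predictors`, to check for absent columns before fitting. It uses base R `glm` rather than caret's `train` so that it runs without extra packages.

```r
# Toy stand-in for the book's `training` data frame (values are made up).
set.seed(1)
training <- data.frame(
  subtype = factor(rep(c("CIMP", "noCIMP"), each = 10)),
  CBLN1   = rnorm(20),
  DDX3Y   = rnorm(20)
)

# Hypothetical helper: return the requested predictors that are not columns.
missing_predictors <- function(data, predictors) {
  setdiff(predictors, colnames(data))
}

absent <- missing_predictors(training, c("PDPN", "CBLN1"))
print(absent)

# Fit only when the column is present, otherwise report it clearly.
gene <- "CBLN1"
if (gene %in% colnames(training)) {
  fit <- glm(reformulate(gene, response = "subtype"),
             data = training, family = binomial)
} else {
  message("Column '", gene, "' is not in the training data")
}
```

Running `missing_predictors(training, "PDPN")` on the real `training` object would show whether the gene is still among the columns at the point where the model is fitted.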

P209

require(rtracklayer)
session <- browserSession("UCSC",url = 'http://genome-euro.ucsc.edu/cgi-bin/')
genome(session) <- "mm9"
# choose CpG island track on chr12
query <- ucscTableQuery(session, track="CpG Islands",table="cpgIslandExt",
        range=GRangesForUCSCGenome("mm9", "chr12"))

Error in GRangesForGenome(genome, chrom = chrom, ranges = ranges, method = "UCSC", :
  Failed to obtain information for genome 'mm9'

# get the GRanges object for the track
track(query)

Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'object' in selecting a method for function 'track': object 'query' not found

P211

library(genomation)

Warning message:
replacing previous import ‘Biostrings::pattern’ by ‘grid::pattern’ when loading ‘genomation’

filePathPeaks=system.file("extdata",
          "wgEncodeHaibTfbsGm12878Sp1Pcr1xPkRep1.broadPeak.gz",
          package="compGenomRData")
# read the peaks from a bed file
pk1.gr=readBroadPeak(filePathPeaks)

Error: No such process

# get the peaks that overlap with CpG islands
subsetByOverlaps(pk1.gr,cpgi.gr)

Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'subsetByOverlaps': object 'pk1.gr' not found

P217

library(rtracklayer)
# File from ENCODE ChIP-seq tracks
bwFile=system.file("extdata","wgEncodeHaibTfbsA549.chr21.bw",package="compGenomRData")
bw.gr=import(bwFile, which=promoter.gr) # get coverage vectors

Error in .local(con, format, text, ...) : UCSC library operation failed
In addition: Warning message:
In .local(con, format, text, ...) : Invalid argument
lseek(3, 844957, invalid 'whence' value (1822621639)) failed

Leading to subsequent errors in rest of section

P225

gene.track <- BiomartGeneRegionTrack(genome = "hg19",
                                 chromosome = "chr21",
                                 start = 27698681, end = 28083310,
                                 name = "ENSEMBL")

Error in gzfile(file, mode) : cannot open the connection

Leading to subsequent errors in rest of section

P239

library(Rqc)
folder = system.file(package="ShortRead", "extdata/E-MTAB-1147")
# feeds fastq.qz files in "folder" to quality check function
qcRes=rqc(path = folder, pattern = ".fastq.gz", openBrowser=FALSE)

Error in file(file, ifelse(append, "a", "w")) :
  cannot open the connection
In addition: Warning messages:
1: In normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="C:\Users\david\AppData\Local\Temp\Rtmpg7tnGG": The system cannot find the file specified
2: In (function (filename = if (onefile) "Rplots.svg" else "Rplot%03d.svg", :
  cairo error 'error while writing to output stream'
3: In file(file, ifelse(append, "a", "w")) :
  cannot open file 'C:\Users\david\AppData\Local\Temp\Rtmpg7tnGG/rqc_report.md': No such file or directory

rqcCycleQualityBoxPlot(qcRes)

Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'perCycleQuality': object 'qcRes' not found

Leading to subsequent errors in rest of section

P243

install.packages("astqcr")

Installing package into 'C:/Users/david/Documents/R/win-library/4.1'
(as 'lib' is unspecified)
Warning: unable to access index for repository https://cran.ma.imperial.ac.uk/src/contrib:
  cannot open destfile 'C:\Users\david\AppData\Local\Temp\Rtmpg7tnGG\file1eb47b7e36f1', reason 'No such file or directory'
Warning: unable to access index for repository https://cran.ma.imperial.ac.uk/bin/windows/contrib/4.1:
  cannot open destfile 'C:\Users\david\AppData\Local\Temp\Rtmpg7tnGG\file1eb47c802d57', reason 'No such file or directory'
Warning message:
package 'astqcr' is not available for this version of R
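The P239 and P243 messages both point at the same session temporary directory (Rtmpg7tnGG) as missing. If that directory was removed while the R session was still open (an assumption, not something the report establishes), a check like the following sketch may help. It relies on the `check` argument of base R's `tempdir()`, which recreates the directory when it is no longer valid.

```r
# Path of the per-session temporary directory.
tmp <- tempdir()

# Does the directory still exist on disk?
exists_before <- dir.exists(tmp)
print(exists_before)

# Ask R to verify the directory and recreate it if it has gone missing.
tmp <- tempdir(check = TRUE)
exists_after <- dir.exists(tmp)
print(exists_after)
```

If `exists_before` is FALSE in the affected session, restarting R or recreating the directory this way would be worth trying before re-running the failing calls.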

P245

# write out fastq file with only reads where all
# quality scores per base are above 20.
writeFastq(fq[qcount == 0],
           paste(fastqFile, "Qfiltered", sep="_"))

Error: UserArgumentMismatch

P270

plotPCA(countsNormalized[selectedGenes,],
        col = as.numeric(colData$group), adj = 0.5,
        xlim = c(-0.5, 0.5), ylim = c(-0.5, 0.6))

Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function 'plotPCA' for signature '"matrix"'

Figure 11.9 is mis-attributed

FIGURE 11.9: A heatmap of NMF factors shows the separability of tumors into subtype clusters. This plot is more useful than a scatter plot when there are more than two factors.

This figure is misattributed as per the code shown

Screenshot attached

Typo in section 2.6.1.

Hello,

There is a typo in section 2.6.1:

library(data.table)
df.f=d(enhancerFilePath, header = FALSE,data.table=FALSE)

It should say:

library(data.table)
df.f=fread(enhancerFilePath, header = FALSE,data.table=FALSE)

Thanks,
Carlos.

Typos on chapter 3

Hi,
I just want to report some typos you may have missed.
In Chapter 3 > 3.1.2 Describing the spread: measurements of variation, in the probability section, you have written:

In this case, what we want is the are under the curve shaded in blue. To be able to that we need to integrate the probability density function but we will usually let

And then in the following paragraph:

After calculating the Z-score, we can go look up in a table, that contains the area under the curve for the left and right side of the Z-score, but again we use software for that tables are outdated.

Thank you so so much for such useful content!
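For context, the table lookup mentioned in the second quoted sentence is what base R's `pnorm` does in software. A minimal sketch, with made-up values for the observation, mean, and standard deviation:

```r
# Hypothetical values, chosen only for illustration.
x     <- 250
mu    <- 200
sigma <- 70

# Z-score: distance from the mean in units of standard deviation.
z <- (x - mu) / sigma

# Area under the standard normal curve to the left and right of z.
left_area  <- pnorm(z)
right_area <- pnorm(z, lower.tail = FALSE)

print(c(z = z, left = left_area, right = right_area))
```

The same left-tail area can be obtained without computing the Z-score by calling `pnorm(x, mean = mu, sd = sigma)`.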

Matching corrplot and pheatmap in Section 8.3.6.3

I came across your bookdown book while searching for how to display a correlation matrix with a hierarchical clustering tree. I noticed that your corrplot(correlationMatrix, order = 'hclust', addrect = 2) plot doesn't match the pheatmap below it in terms of variable order and clustering. This is because corrplot derives the distance matrix from the correlation matrix itself and runs hclust on it, whereas pheatmap treats the correlation matrix as an ordinary data set and recomputes a distance matrix before passing it to hclust.

To make the two plots consistent with each other, I suggest adding two arguments (clustering_distance_rows and clustering_distance_cols) to the pheatmap call. They tell pheatmap to derive the distance matrix from the current correlation matrix. The 1 - ensures that perfect positive correlation (1) is treated as the minimum distance and perfect negative correlation (-1) as the maximum distance.

pheatmap(correlationMatrix, 
         clustering_distance_rows = as.dist(1 - correlationMatrix), 
         clustering_distance_cols = as.dist(1 - correlationMatrix))
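The difference between the two clustering inputs can be seen without either plotting package. This base R sketch uses the built-in `mtcars` data as a stand-in for the book's data (an assumption) and compares the leaf order from `hclust` on `as.dist(1 - correlationMatrix)` with the order from Euclidean distances between rows of the correlation matrix, which is what pheatmap's default `clustering_distance_rows = "euclidean"` amounts to.

```r
# Stand-in correlation matrix; the book's data is not used here.
correlationMatrix <- cor(mtcars[, 1:7])

# Distance taken directly from the correlations (1 - r).
hc_corr <- hclust(as.dist(1 - correlationMatrix))

# Distance recomputed from the matrix rows, as if they were observations.
hc_eucl <- hclust(dist(correlationMatrix))

# Variable names in dendrogram leaf order for each approach.
order_corr <- rownames(correlationMatrix)[hc_corr$order]
order_eucl <- rownames(correlationMatrix)[hc_eucl$order]

print(order_corr)
print(order_eucl)
```

The two orders need not agree, which is the mismatch described above. Passing the same `as.dist(1 - correlationMatrix)` object to both tools removes that source of disagreement.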

Error in validObject(.Object) : invalid class "ScoreMatrix" object: superclass "mMatrix" not defined in the environment of the object's class

Hello

The code at lines 530-558 of 06-genomicIntervals.Rmd fails to run:

# get transcription start sites on chr20
library(genomation)
transcriptFile=system.file("extdata",
                      "refseq.hg19.chr20.bed",
                      package="compGenomRData")
feat=readTranscriptFeatures(transcriptFile,
                            remove.unusual = TRUE,
                            up.flank = 500, down.flank = 500)
prom=feat$promoters # get promoters from the features


# get for H3K4me3 values around TSSes
# we use strand.aware=TRUE so - strands will
# be reversed
H3K4me3File=system.file("extdata",
                      "H1.ESC.H3K4me3.chr20.bw",
                      package="compGenomRData")
sm=ScoreMatrix(H3K4me3File, prom,
               type="bigWig", strand.aware = TRUE)
Error in validObject(.Object) : 
  invalid class "ScoreMatrix" object: superclass "mMatrix" not defined in the environment of the object's class

How should I solve this?
Thanks

Missing license in repository root

While the license is clearly indicated on the landing page of the book, it is missing from the repository root. Please consider adding the license.

Download PDF version

I am trying to download the PDF version of the book, but I keep getting a 404 page saying

There isn't a GitHub Pages site here.

edit stats exercise question for clarity

Stats chapter:
How does the estimate from the random samples change if we simulate more data with data=matrix(rnorm(6000,mean=200,sd=70),ncol=6)

should be

How does the estimate from the random samples change if we simulate more data with data=matrix(rnorm(6000,mean=200,sd=70),ncol=6), keeping the number of samples per dataset constant at n=6?
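For reference, the call in the question can be read as 1000 simulated datasets with n=6 observations each. A small sketch, assuming each row of the matrix is treated as one dataset (an interpretation, not stated in the issue):

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# 6000 draws arranged as 1000 rows (datasets) of 6 observations each.
data <- matrix(rnorm(6000, mean = 200, sd = 70), ncol = 6)
dim(data)

# One mean estimate per simulated dataset.
sample_means <- rowMeans(data)

# Centre and spread of the estimates across datasets.
mean(sample_means)
sd(sample_means)

# Theoretical standard error of the mean for n = 6, for comparison.
70 / sqrt(6)
```

Increasing the first argument of `rnorm` while keeping `ncol = 6` adds more datasets without changing the size of each one.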

unsupervised learning chapter
reconstruction question should be:
Our next tasks are to remove eigenvectors and reconstruct the matrix using SVD, then calculate the reconstruction error as the difference between the original and reconstructed matrices. Remove a few eigenvectors, reconstruct the matrix, and calculate the reconstruction error. The reconstruction error can be the Euclidean distance between the original and reconstructed matrices.
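The reconstruction step described in this proposed wording can be sketched in base R. The matrix here is random stand-in data and `k`, the number of components kept, is a hypothetical choice; neither comes from the book's exercise.

```r
set.seed(7)  # arbitrary seed, only for reproducibility

# Stand-in data matrix: 10 rows, 6 columns.
X <- matrix(rnorm(60), nrow = 10, ncol = 6)

# Singular value decomposition: X = U D V^T.
s <- svd(X)

# Keep the first k components and drop the rest.
k <- 3
X_hat <- s$u[, 1:k] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k])

# Reconstruction error as the Euclidean (Frobenius) distance
# between the original and reconstructed matrices.
recon_error <- sqrt(sum((X - X_hat)^2))
print(recon_error)

# The same quantity computed from the discarded singular values.
discarded <- sqrt(sum(s$d[(k + 1):length(s$d)]^2))
print(discarded)
```

The two printed values agree because the Frobenius norm of the residual equals the root sum of squares of the dropped singular values, so the error grows as more components are removed.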

Unable to Open in PDF Format

Hello! I'm unable to open this in a PDF format, and instead have to access the book through the web interface. This is fine, but there is a PDF link/icon at the top right of the text, and it takes you to a 404 error.
