hadley / adv-r

Advanced R: a book

Home Page: http://adv-r.hadley.nz

License: Other

TeX 93.27% R 2.64% CSS 3.23% JavaScript 0.16% Lua 0.48% HTML 0.23%

book r programming bookdown

adv-r's Introduction

Advanced R

This is code and text behind the Advanced R book. The site is built with bookdown.

Diagrams

Omnigraffle:

Make sure that 100% is "one postscript point": this ensures canvas size matches physical size. Export at 300 dpi scaled to 100%.
Set grid to 1cm with 10 minor units. Ensure there is 2mm padding around all sides of each diagram.
Conventions:
- Text is set in inconsolata 10pt, with text padding set to 3.
- Emoji set in "Apple Color Emoji" 8pt.
- Default scalar size is 6mm x 6mm.
- Symbols have 4pt rounded corners and plum border.
- Arrow heads should be set to 75%.
- Names should be coloured in steel.

Book:

Inconsolata scaled (by fontspec) to match main font is 9.42pt.
Preview at 100% matches physical size of book. Maximum diagram width is 11cm.

RMarkdown

Remove dpi specification from include_graphics(), instead relying on common.R. Chunk should have output.width = NULL.
Beware caching: after changing the size of an image you may need to clear the cache before it is correctly updated.

To zip files for the publisher:

mkdir crc
cp _book/_main.tex crc
cp -r _bookdown_files/*_files crc
cp -r diagrams crc
cp -r screenshots crc
cp -r emoji crc
cp mina.jpg crc
cp krantz.cls crc
cp book.bib crc
rm crc/diagrams/*.graffle

zip -r adv-r-source.zip crc

Code of conduct

Please note that Advanced R is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

adv-r's People

Contributors

Stargazers

Watchers

Forkers

aaronwolen ameliamn darrkj juancentro jimhester cwickham ijlyttle zhanxw zkamvar ramnathv mjsduncan sxfmol dlebauer trestletech stephens999 michaellogothetis wch purcaro andxh absolutelynowarranty mdbrown sitems praveer13 eipi10 mfenner zackham abresler baptiste sardanza seancarmody myschizobuddy reinholdsson gregorypenn crtahlin robinlovelace demel cmuszynski dckc rmody-collective isomorphisms hovr2pi thomasherbig lipengyu antoinevernet aloknayak29 rtgarden jofrhwld ihar wildoane eddelbuettel aezzata lindbrook alstat jules32 dill gmonaie ggarza binarybana ajmann4 kbroman juil liangcj famuvie pelotom fsky lingbing drewhendrickson clemp ajschumacher jokame clayford parkerabercrombie jmarca tonytonov nstjhp penguinpa vzemlys askming shafiahmed shabbychef linzhp sglyon maartenkruijver marekrogala garnetvaz wilkinson bsvingen shannonrush evanz renkun-ken guttrd kevinushey eriqande agrabovsky nabilabd abzhaobo bsspirit swirlstudent juliakloiber superxroot

adv-r's Issues

Vocabulary section: suggestion

In the Vocabulary section, under Working with R/# Help, consider adding a reference to the function help.start() (which opens R's online documentation in your default browser).

Add search functionality for the book (using Google Custom Search Engine?)

There have been several times when I was trying to remember something that I learned from the book, and I could only remember a keyword. What I do is I go to the book homepage and open all the chapters in different tabs and then "Find on page" until I find what I was looking for.

It would be nice IMO to maybe add search to the site. Google makes it very easy to add a Google Search to your site.

Euclidean distance in "High performance functions with Rcpp"

In this section, the code to calculate "Euclidean distance between a value and a vector of values" is provided as follows:

pdistR <- function(x, ys) {
  sqrt((x - ys) ^ 2)
}

While the code is not incorrect, it is inefficient: squaring a value and then taking the square root simply yields its absolute value. The Euclidean distance between two values x and y in one-dimensional space is abs(x - y), and is more efficiently calculated as such. One should probably simplify expressions algebraically before trying to optimize this code with Rcpp.

Given that Euclidean distance is more often used and discussed in spaces with higher than one dimension, a more useful function would probably take a vector input and a matrix input and return a vector output. Perhaps a different example for a vector-input and vector-output function should be used, such as mean-centering a vector:

centerR <- function(ys) {
  ys - mean(ys)
}
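The algebra above can be checked directly. This is a minimal sketch: the names pdist_sqrt, pdist_abs and pdist_rows are hypothetical, and the matrix version assumes that each row of ys is one point.

```r
# One-dimensional case: both forms give the same distances.
pdist_sqrt <- function(x, ys) sqrt((x - ys)^2)
pdist_abs  <- function(x, ys) abs(x - ys)

ys <- c(-2, 0, 0.5, 3)
d_sqrt <- pdist_sqrt(0.5, ys)
d_abs  <- pdist_abs(0.5, ys)

# Higher-dimensional sketch (assumption: rows of `ys` are points, `x` is one point).
pdist_rows <- function(x, ys) {
  diffs <- sweep(ys, 2, x)   # subtract x from every row of ys
  sqrt(rowSums(diffs^2))
}

pts <- rbind(c(0, 0), c(3, 4))
d_rows <- pdist_rows(c(0, 0), pts)
```

In the one-dimensional case the two functions agree, so the abs() form loses nothing.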

Continuity error: Computing-on-the-language example doesn't work in way described

In Computing-on-the-language.rmd, the section "Calling from another function" has errors in the R output: it gives the impression of using subset() when an earlier version of subset2() is really being used.

Specifically in the debugging part:

> debugonce(subset)
> subscramble(mtcars, cyl == 4)
debugging in: subset(x, condition)
debug: {
    condition_call <- substitute(condition)
    r <- eval(condition_call, x)
    x[r, ]
}

This clearly uses a version of subset2() introduced at line 195. Trying it with the standard subset() gives me

R> debugonce(subset)
R> subscramble(mtcars, cyl == 4)
debugging in: subset(x, condition)
debug: UseMethod("subset")
Browse[2]> n
Error in eval(expr, envir, enclos) : object 'cyl' not found

Of course, this can be fixed by simply replacing subset() with subset2() throughout this first part of the section, but the problem then becomes that two further versions of subset2() (using parent.frame() and list2env()) were introduced between the definition of the example's subset2() and the example itself.
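For reference, the failure can be reproduced without debugonce(). This is a sketch under assumptions: it uses the parent.frame() variant of subset2(), the scramble() and subscramble() definitions are reconstructed rather than copied from the chapter, and it relies on no object called cyl existing in the calling environment.

```r
# Assumed variant of subset2(): evaluate the captured condition in `x`,
# falling back to the caller's environment.
subset2 <- function(x, condition) {
  condition_call <- substitute(condition)
  r <- eval(condition_call, x, parent.frame())
  x[r, ]
}

# Reconstructed helpers (assumptions, not copied from the chapter).
scramble <- function(x) x[sample(nrow(x)), ]
subscramble <- function(x, condition) scramble(subset2(x, condition))

# Inside subscramble(), substitute() only sees the symbol `condition`,
# so `cyl == 4` ends up being evaluated outside the data frame.
res <- tryCatch(
  subscramble(mtcars, cyl == 4),
  error = function(e) conditionMessage(e)
)
```

Here res holds the error message rather than a data frame, which matches the "object 'cyl' not found" failure reported above.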

40 bytes?

This is not a serious issue, more of a question.

In http://adv-r.had.co.nz/memory.html you list 40B as the size of various empty things, and give 4 + 2*8 + 8 + 4 + 4 + ?? as the way to arrive at that figure.

In my R 3.1.1 on 32-bit Ubuntu,

> object_size(list())
24 B
> object.size(list())
24 bytes

and in pqR 2.15.0 (same system)

> object.size(list())
32 bytes
> object.size(raw())
32 bytes

So what's up with the differences? Just the pointers and stuff you mentioned already run well over 24B.

Issue with 6.5.2 .Primitive example

Greetings from the LA Advanced R Reading Group!

The example dealing with .Primitive functions, in which the address doesn't change when you modify a value in a vector, doesn't seem to be true. In our session the address changes:

> x <- 1:10
> address(x)
[1] "0x104becdb8" 
> x[2] <- 7L
> address(x)
[1] "0x101f87a20"

With session info:

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base

other attached packages:
[1] pryr_0.1

loaded via a namespace (and not attached):
[1] codetools_0.2-8 Rcpp_0.11.3 stringr_0.6.2
[4] tools_3.1.1

90% of formatting lost from Testing.rmd

Most of the text on http://adv-r.had.co.nz/Testing.html appears as flat text without line breaks.

Add custom markdown renderer for rstudio

http://www.rstudio.com/ide/docs/authoring/markdown_custom_rendering

OO chapter: Add recommendation to use R6 rather than RC?

As per discussion today at masterR workshop

rolling loop (functionals)

Hi,

I have been working through your very good Advanced R book, but I think there may be a bug in the rollmean function in the functionals chapter.

I may be wrong, but I'm fairly sure that a centred moving average would be:

rollmean <- function(x, n){
    out <- rep(NA, length(x))
    offset <- trunc(n/2)
    for (i in (offset + 1):(length(x) - n + offset + 1)) {
        out[i] <- mean(x[(i - offset):(i + offset)])
    }
    out
}

The edits change the upper edge of the range to length(x) - n + offset + 1, and change the window in the mean step to (i - offset):(i + offset).
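A small check of the proposed function, as a sketch that assumes an odd window size. The function is repeated so that the check is self-contained. The input vector is made up for illustration.

```r
# Copy of the proposed function, so the check runs on its own.
rollmean <- function(x, n) {
  out <- rep(NA, length(x))
  offset <- trunc(n / 2)
  for (i in (offset + 1):(length(x) - n + offset + 1)) {
    out[i] <- mean(x[(i - offset):(i + offset)])
  }
  out
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
res <- rollmean(x, 3)

# Each interior value is the mean of itself and its two neighbours;
# the first and last positions stay NA.
expected <- c(NA, 2, 3, 4, 5, 6, 7, 8, 9, NA)
stopifnot(isTRUE(all.equal(res, expected)))
```

For even n the window (i - offset):(i + offset) spans n + 1 elements, so this check only covers the odd case.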

Happy to be corrected.

NB: I assign the copyright of this contribution to Hadley Wickham.

Lazy evaluation example no longer works

The code (http://adv-r.had.co.nz/Functions.html lazy evaluation)

> add <- function(x) {
+   function(y) x + y
+ }

is used to illustrate the issues with lazy evaluation. However, running the code doesn't reproduce the output in the book:

R> adders <- lapply(1:10, add)
R> adders[[1]](10)
[1] 11
R> adders[[10]](10)
[1] 20

R> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] colorout_1.1-0

loaded via a namespace (and not attached):
[1] tools_3.2.0    fortunes_1.5-2
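The output above is consistent with the R 3.2.0 release notes, which state that higher-order functions such as the apply family now force the arguments to the functions they apply. A sketch that should not depend on that behaviour forces x explicitly:

```r
add <- function(x) {
  force(x)              # evaluate x now, not when the closure is first called
  function(y) x + y
}

adders <- lapply(1:10, add)
first <- adders[[1]](10)    # 11
last  <- adders[[10]](10)   # 20
```

With force(), each closure captures its own value of x whether or not lapply() forces the argument itself.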

Minor confusion with f() in The downsides of non-standard evaluation

At the start of this section in Computing-on-the-language.rmd line 579 we have

x <- 10; y <- 10
f(10); f(x); f(y)

I admit I may be the only person getting confused, but I was not sure whether to try running this, especially since pryr was mentioned a few lines above, which reminded me that the package has a function f(). I thought I should try it, but got this:

R> f(10)
Error: is.language(body) is not TRUE

and then I thought, "well, that's not the same result as f(x)!"
It could just be referred to as fun(10) or func(10) etc., although I now understand the point being made.

Indentation and display error in DSL HTML section

When viewing in Chrome, the section between the HTML exercises and LaTeX goal is indented (wrongly), and the first exercise in HTML asks to escape <!--.
When processing the Rmd in RStudio, the indent is correct, and the exercise asks to escape the correct </.
This probably arises from a conflict between R Markdown and GitHub Flavored Markdown.

In Profiling pause() is referenced before it is defined

In the Optimizing Code chapter, pause() is used in the examples of the opening "Measuring Performance" section, but it is only defined in the much later "Parallelise" section.

Mistake in exercises on functional programming

In exercise:

"Create a function that creates functions that compute the ith central moment of a numeric vector. You can test it by running the following code:"

One of the lines to be used for testing reads:
"stopifnot(all.equal(m1(x), mean(x)))"

It compares the calculated 1st central moment to the mean of the data and "stops" if they are not equal. But they aren't supposed to be equal, since the 1st central moment is zero and not equal to the first ("non-central") moment, which is the mean. (See the interpretations on the referenced Wikipedia page : http://en.wikipedia.org/wiki/Central_moment )

A mistake in the test case or a misunderstanding on my part?
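The point can be checked numerically. This is a sketch of one possible answer to the exercise, not the book's solution. The function name moment and the seed are assumptions.

```r
# Function factory: returns a function computing the i-th central moment.
moment <- function(i) {
  force(i)
  function(x) mean((x - mean(x))^i)
}

m1 <- moment(1)
m2 <- moment(2)

set.seed(1)
x <- runif(100)

first_central  <- m1(x)   # numerically zero, not mean(x)
second_central <- m2(x)   # population variance
```

The first central moment is zero up to floating-point error, and the second equals var(x) rescaled by (n - 1) / n. A test against mean(x) would therefore fail for any correct implementation.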

Buggy output in Functional Programming chapter

The code output at http://adv-r.had.co.nz/Functional-programming.html#anonymous-functions doesn't seem quite right.

Suggestion: label rows&columns of the "Plyr package" tables

The tables under the "The plyr package" heading in Functionals.Rmd do not indicate whether the rows (or columns) represent the input (or output).

The text would be clearer if, for example, the top left cell of the table contained the text:
"Output

Input"

Or something even clearer, as this could be misinterpreted I guess ...

Better styling

Use col-xs-12 for both sidebar and turn off sidebar hover

Discuss global string pool in memory chapter

e.g. object.size("banana")

UI: The left panel bleeds onto the main content on wide screens

I'm not sure if this is an inherent problem in bootstrap JS or if it's specific to this book-site.

When I render the site on a wide monitor (currently 1920px), the left-hand panel that serves for navigation + announcements is initially ok. But after scrolling down (on any page), that column gets fixed to the top of the browser, and its width changes. It happens on all screen resolutions, but I had never noticed it because only on wide screens does the width change enough to actually overlap the content. It's not too terrible, but it does make the page a bit less readable, so I just wanted to bring it to your attention.

I took a screenshot of how it looks at the top of the page vs scrolling down a bit:

The cause
The navigation column is set to have width 25%. When we're at the top of the page, that means 25% of its parent container. But when scrolling down, the navigation gets detached from its parent with position: fixed, which means that now the 25% is referring to 25% of the viewport, which is much larger.

To confirm that this is indeed what's happening, here's a snippet from the JS console, starting when the page is at the top

where to document packages?

In http://adv-r.had.co.nz/Documenting-functions.html , the author says "As well [[package level documentation|documenting-packages]] resources, every package should also have its own documentation page." in the section "Documenting packages".

But I cannot figure out where to add the package-wide documentation comments. Can you please clarify which file they should go in?

Missing index.R file

The README mentions _plugins/index.R but it's absent from your repo. A related issue: I can find no indication of how toc.rds is created. Is this what index.R does, and where is this missing file? It is not here: https://github.com/hadley/adv-r/tree/master/_plugins

Add instructions for installing lineprof package to memory.Rmd

memory.Rmd contains a reference to the lineprof package, but no installation instructions (for installing from GitHub).

Instructions would be welcome, for completeness.

mobile version of Exceptions and Functionals pages are rendered too wide

I'm on a Nexus 4 and the Exceptions and the Functionals pages can't be read because they appear too wide and it is impossible to scroll. The rest of the pages I've checked look fine.

Assignment claim incorrect?

I think the following may be incorrect, or else I don't understand it fully (Environments - binding names to values).

name <<- value is equivalent to assign("name", value, inherits = TRUE)

However functions f3() and f4() imply they are not equivalent:

#1 bound to x in global environment
f1 <- function() x <<- 1
f1()
x
#2 bound to x in global environment
f2 <- function() assign("x", 2, inherits = TRUE)
f2()
x
#4 bound to x in global environment
f3 <- function() {x <- 3; x <<- 4}
f3()
x
#5 bound to x in execution environment
# value bound to name of x in global environment remains unchanged
f4 <- function() {x <- 5; assign("x", 6, inherits = TRUE)}
f4()
x
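The difference seems to come from where each form starts its search. The sketch below illustrates this. The function names are hypothetical, and the last variant is only one possible way to mimic `<<-` with assign().

```r
x <- 0

# `<<-` starts looking in the parent of the function's environment,
# so the local x is left alone.
f_super <- function() {
  x <- 3
  x <<- 4
  x
}
local_after_super <- f_super()    # value of the local x
outer_after_super <- x            # value of x in the enclosing environment

# assign(inherits = TRUE) starts in the current environment,
# so it finds and overwrites the local x.
f_assign <- function() {
  x <- 5
  assign("x", 6, inherits = TRUE)
  x
}
local_after_assign <- f_assign()
outer_after_assign <- x

# Starting the search one level up behaves like `<<-`.
f_parent <- function() {
  x <- 7
  assign("x", 8, envir = parent.env(environment()), inherits = TRUE)
  x
}
local_after_parent <- f_parent()
outer_after_parent <- x
```

Under this reading, the two forms agree only when the name is not already bound in the function's own environment.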

rollapply functions in Functionals.rmd are not comparable

The two versions of rollapply in Functionals.rmd are not comparable. The version with the for loop returns a rolling average with an offset (correctly "missing" the first few values), while the vapply version starts with no NA values. Below is a variant that should work.

rollapply <- function(x, n, f, ...) {
  offset <- trunc(n / 2)
  locs <- (offset + 1):(length(x) - n + offset - 1)
  tmp1 <- rep(NA, length=offset)
  tmp2 <- vapply(locs, function(i) f(x[(i - offset):(i + offset - 1)], ...),
         numeric(1))
  c(tmp1, tmp2)
}

confusion regarding "Matching and merging by hand "

Chapter "Subsetting"
Subsection "Matching and merging by hand"

I don't understand why

id <- match(grades, info$grade)

is needed as id comes out to be same as grades.

Also based on the example the following code

rownames(info) <- info$grade
info[as.character(grades), ]

can be simplified to just

info[grades, ]

can you give an example where it is clear why you are using the above commands.
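One case where match() matters is when the lookup table is not sorted by the key. The data below is made up for illustration and is not necessarily the data used in the chapter.

```r
grades <- c(1, 2, 2, 3, 1)

# Hypothetical lookup table, deliberately stored in reverse order of grade.
info <- data.frame(
  grade = 3:1,
  desc = c("Excellent", "Good", "Poor"),
  fail = c(FALSE, FALSE, TRUE)
)

# match() returns row positions in info, which differ from the grade values here.
id <- match(grades, info$grade)
by_match <- info[id, "grade"]

# Plain positional indexing picks rows 1, 2, 2, 3, 1, which is the wrong lookup.
by_position <- info[grades, "grade"]
```

When info$grade happens to be 1, 2, 3 in order, id equals grades and the shortcut info[grades, ] works by coincidence. With any other ordering it silently returns the wrong rows.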

Problems with running examples in memory.Rmd

I do not have time for proper testing or a full bug report, but as a note (you may already be aware of this): the memory.Rmd page has some examples that I cannot get working. It might just be something on my machine, but it should at least be checked quickly before publication.

- The command prof <- lineprof(read_delim("diamonds.csv")) apparently runs too quickly for any results to be generated: "Error: No parsing data available. Maybe your function was too fast?" Probably not a bug, but a bit of a nuisance.
- The command with "torture=TRUE" works, but then shine(prof) does not work, printing (among other things) "could not find function "slickgridOutput"".
- The pryr::refs() function always seems to return 2 in my case, whatever arguments are used, including those that should have returned 1.

Again, it might be that I have obsolete versions of packages, but I don't have time right now to play around.

Table doesn't appear correctly in HTML version

See for example third "paragraph" of http://adv-r.had.co.nz/Data-structures.html

C API needs to work through some examples

i.e. find some common base functions and explain line-by-line how they work.

Missing example

Missing example on page:
http://adv-r.had.co.nz/Functional-programming.html

After the text:
"We could write a closure to abstract this away:"

Automatically generate TOC

Needs a little jquery to scrape h2 and h3 and create a TOC in the sidebar.

toc = $("#toc");
add_entry = function() {
  $(this).tagName
};
$(".container").find("h2, h3").each(add_entry);

e.g. https://github.com/shamess/jQuery-Table-of-Contents-Plugin/blob/master/jquery.tableofcontents.js, https://github.com/jgallen23/toc/blob/master/lib/toc.js

Web page buglet

From IM:
Just tried to pry a trick or two from your adv-r webpage ... and noticed a BUG in the css / js / twitter bootstrap
On a wide screen (at work) scrolling down makes the 'learn in person' well run over the empty column and in the 'welcome' text
browser is chrome 30.0.1599....
[...]
Resolution is 1920x1200
That said my ultrabook is also 1600x900 and it is also fscked up
the well widens into the empty sep. column

Match code and pre styles

Add pale gray background to texttt

e.g. http://tex.stackexchange.com/questions/42961/adding-a-highlight-to-texttt-blocks

PDF version

Hello, guys!

First of all, thank you for this book. As to my question: I often read on my Android phone while commuting. Is there a PDF version of the book?

Dead link on page http://adv-r.had.co.nz/Exceptions-Debugging.html

On page:
http://adv-r.had.co.nz/Exceptions-Debugging.html

The following link:
http://adv-r.had.co.nz/beyond-exception-handling.html

Leads to a page with error:
404 Not Found

switched lines

# Create a big object
mem_change(x <- 1:1e6)
#> 4 MB
# Also point to 1:1e6 from y
mem_change(y <- x)
#> -4 MB
# Remove x, no memory freed because y is still pointing to it
mem_change(rm(x))
#> 1.42 kB
# Now nothing points to it and the memory can be freed
mem_change(rm(y))
#> -4 MB

I can see in the Sweave source for this page that these outputs are auto-generated, but 1.42 kB should go along with mem_change(y <- x).

Environments section: Layout bug on small screen

See attached screenshot. You cannot fully read the content.
Firefox Android 4.2 (also happens on the desktop firefox)

Latex output needs to use latex figure commands

Links that work for both html and pdf

Use internal link style (#abc) everywhere
Write pandoc writer that produces indexed list of sections and file names (start with pandoc --print-default-data-file sample.lua)
When building html, generate the index and use it to replace internal links with explicit links.

Match Rstudio syntax highlighting

https://github.com/rstudio/rstudio/blob/master/src/cpp/session/resources/r_highlight.html

Cache markdown and html separately

To make it easier to produce a copy of the book.

subsetting and which

When first learning subsetting, a common mistake is to use x[which(y)] instead of x[y]. Here the which() achieves nothing: it switches from logical to integer subsetting, but the result will be exactly the same.

I'm not sure I agree; if length(which(y)) << length(y) there can be obvious performance benefits:

x <- runif(1e8)
x[1] <- NA;

system.time(x[is.na(x)])
system.time(x[which(is.na(x))])
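Separately from performance, the two forms also differ when the logical index itself contains NA. A small sketch with made-up values:

```r
x <- c(10, 20, 30, 40)
y <- c(TRUE, NA, FALSE, TRUE)

by_logical <- x[y]          # an NA in the index yields an NA in the result
by_which   <- x[which(y)]   # which() drops NA, keeping only TRUE positions
```

Here by_logical has three elements, including an NA, while by_which has two. The results are identical only when y has no missing values.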

Links Broken in Package Basics

In the "Package Basics" page several links appear between [[ | ]] rather than as links.
E.g. [[documenting packages]] [[namespaces]] [[unit tests|testing]].

Performance chapter: profiling times in the text do not agree with reported times from code

When reading through the performance chapter, many times after running a microbenchmark() there is a piece of text describing the result. The numbers in the text are usually very different from what the code shows, which caused me some confusion initially.

For example, after this piece of code

x <- runif(100)
microbenchmark(
  sqrt(x),
  x ^ 0.5
)
#> Unit: nanoseconds
#>     expr    min     lq  mean median     uq    max neval
#>  sqrt(x)  1,570  1,830  2554  2,050  2,310 31,400   100
#>    x^0.5 15,200 15,500 16769 15,600 16,000 71,200   100

It says that each computation takes about 800 ns, but I could not figure out where you got that number from (I'm assuming in a previous compilation of the book, the median was 800?).

This happens several more times later. I think trying to use inline R while still keeping the numbers readable is overkill, but it might be a good idea to just add a short note somewhere saying that the numbers in the text do not necessarily match the numbers in the output, so that readers will not go crazy trying to figure out where you're getting the numbers from.

Unclear paragraph in Functional Programming

In:
http://adv-r.had.co.nz/Functional-programming.html

There is a paragraph which is IMO unclear. I quote:
"You can call anonymous functions directly, but the code is a little tricky to read because you must use parentheses in two different ways: to call a function, and to make it clear that we want to call the anonymous function function(x) 3, not inside our anonymous function call a function called 3 (which isn't a valid function name!): "

I guess you could skip the part "..., not inside our anonymous function call a function called 3 (which isn't a valid function name!)" and it would be better? The example below IMO demonstrates the situation well even without additional text.
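The distinction the quoted paragraph draws can be sketched as follows. The variable names are hypothetical, and the error is caught only so that the snippet runs to completion.

```r
# Parentheses around the definition, then () to call the anonymous function.
val <- (function(x) 3)()

# Without the outer parentheses, this defines a function whose body is the call 3().
g <- function(x) 3()

# Calling g() fails, because 3 is not a function.
res <- tryCatch(g(), error = function(e) "error: 3 is not a function")
```

The first form returns 3. The second only fails once g() is actually called.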

Clarify exercise

Exercise 1 in chapter "Functional programming", section "Lists of functions" says "Implement a summary function that works like base::summary()". But base::summary() is a generic. Do you mean that it should work like the default method? And what should it do about NAs, for which another column with their count is created if they exist?

Use inconsolata web font

http://www.google.com/fonts#UsePlace:use/Collection:Inconsolata

Wrong/misleading information in Environments (multiple names pointing to the same object)

Under "Environment Basics", it says "multiple names can point to the same object." followed by a diagram that shows two names pointing to the same object.

The diagram shows a and d both pointing to the same object. But this suggests that they are both actually using the same object in memory. This can be disproved with either pryr::address(e$a) == pryr::address(e$d) (returning FALSE) or by doing e$a[1] <- 100; e$d; and seeing that the vector hasn't changed in d.

Maybe I'm just misinterpreting the diagram.
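The second check described above can be written in base R. This is a sketch with made-up values. It shows that modifying one binding leaves the other unchanged, but on its own it does not show whether the two names shared memory before the modification.

```r
e <- new.env()
e$a <- c(1, 2, 3)
e$d <- e$a          # bind a second name to the same value

e$a[1] <- 100       # modify through the first name only

a_after <- e$a
d_after <- e$d
```

After the assignment, a_after starts with 100 while d_after is still 1, 2, 3. That outcome is what copy-on-modify would produce even if both names pointed to one object beforehand.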