
rbook's Introduction

Learning statistics with R

This repository contains all the source materials for Learning Statistics with R. There are two versions of the content, the original version (LSR v0.6) written in LaTeX and the bookdown adaptation (LSR v0.6.1). The two versions are kept in distinct folders to ensure they share no dependencies.

Bookdown

To generate the bookdown version, source the bookdown/serve_book.R script. The generated book appears in the bookdown/_book subdirectory.

Original

To typeset the original LaTeX version, compile the root file original/pdf/lsr.tex; the output is original/pdf/lsr.pdf.

Docs

GitHub Pages deploys the site from the docs directory. To publish an updated bookdown build to https://learningstatisticswithr.com, copy the entire contents of bookdown/_book to docs/book and push to GitHub.

rbook's People

Contributors

djnavarro, ekothe, gureckis, maschette, mchiovaro, wjschne, yanamal


rbook's Issues

About typos, small grammar fixes, etc.

Hello,

This is a great book and I'm loving it as a more pragmatic follow-up to the rather theoretical but still nice Statistics in Plain English by Urdan (which was the book we used in our research methods course---FWIW, I'm doing an MA in linguistics). Many thanks for authoring it and publishing it as an open-source resource.

Being the pedant that I am, and having access to the sources, I'm fixing typos and some wording as I go along the book. I wanted to ask whether you'd want the fixes submitted as a single PR or on a case-by-case basis. I have 30+ edits by now; most are just typos that wouldn't really warrant a review, but a couple are wording changes you might want to reject. I think a single PR would be better, since with so many minor edits, just picking the hunks you like from one big commit is probably more efficient, but I can send them one by one in distinct PRs if you prefer it that way.

Sum contrasts

Dear Danielle,

Thanks for writing the great book "Learning Statistics with R"!
I use it for my first-year stats teaching, and it is of great benefit!!

I noticed one error, though, that I wanted to draw your attention to:
In section "16.7.3 Sum to zero contrasts" (page 534), you state that sum-to-zero contrasts compare each group with a baseline category. That's actually not correct (it's what treatment contrasts do, but not sum contrasts). What the sum-to-zero contrast does is compare each group with the grand mean!
I think this misconception is rather widespread in the community; I have seen it made in other places.

I have worked on contrasts myself, written a tutorial on them, and built an R package called "hypr" to work more easily with contrasts in R (citations below).

In case you have any questions on this, let me know.
And again, thanks a lot for writing this great book! :-)

Best,
Daniel

Schad, D.J., Hohenstein, S., Vasishth, S., & Kliegl, R. (2020). How to capitalize on a priori contrasts in linear (mixed) models: A tutorial. Journal of Memory and Language, 110, 104038, doi: 10.1016/j.jml.2019.104038, arXiv preprint arXiv:1807.10451

Rabe, M.M., Vasishth, S., Hohenstein, S., Kliegl, R., & Schad, D.J. (2020). hypr: An R package for hypothesis-driven contrast coding. The Journal of Open Source Software, 2134. doi: 10.21105/joss.02134

To demonstrate the hypotheses tested by the treatment and the sum contrasts, one can use the hypr package as follows:

library(hypr)
hTreat <- hypr()
cmat(hTreat) <- cbind(1,contr.treatment(4,base=4))
hTreat
hSum <- hypr()
cmat(hSum) <- cbind(1,contr.sum(4))
hSum

The hypothesis matrix shows which comparisons between means are tested by a given contrast.
For the first sum contrast: 3/4 * mu1 - 1/4 * mu2 - 1/4 * mu3 - 1/4 * mu4 = 0, i.e., mu1 = (mu1 + mu2 + mu3 + mu4)/4
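The grand-mean interpretation can also be checked in base R without hypr: with a balanced design and sum-to-zero contrasts, the intercept of a linear model is the grand mean of the group means, and each contrast coefficient is a group mean's deviation from it. (A minimal sketch with simulated data; the data and variable names are made up for illustration, not taken from the book.)

```r
set.seed(1)
d <- data.frame(g = gl(4, 10), y = rnorm(40))            # balanced toy data
fit <- lm(y ~ g, data = d, contrasts = list(g = contr.sum))
group.means <- tapply(d$y, d$g, mean)
grand.mean  <- mean(group.means)

coef(fit)[1]  # intercept: equals the grand mean of the group means
coef(fit)[2]  # first sum contrast: mean(group 1) - grand mean,
              # NOT a comparison of group 1 against a baseline group
```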

——
Prof. Dr. Daniel J. Schad
Department of Cognitive Science and Artificial Intelligence
Tilburg University, Netherlands
Web: danielschad.github.io
Email: [email protected]
Phone: +49-179-9676111

The example in this section is not correct.

See:

group == gender

The text from the PDF version is as follows

And besides, the problem that this causes is much more serious than a single sad nerd… because R has no way of knowing that the 1s in the group variable are a very different kind of thing to the 1s in the gender variable. So if I try to ask which elements of the group variable are equal to the corresponding elements in gender, R thinks this is totally kosher, and gives me this:

group == gender
[1] TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE

Well, that’s … especially stupid. The problem here is that R is very literal minded. Even though you’ve declared both group and gender to be factors, it still assumes that a 1 is a 1 no matter which variable it appears in.

But the example actually throws the following error

group == gender
## Error in Ops.factor(group, gender): level sets of factors are different
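The error described above is easy to reproduce with toy factors whose level sets differ (hypothetical data mimicking the book's setup, not the book's own; behaviour as of recent R versions):

```r
group  <- factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3))  # levels: 1 2 3
gender <- factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2))  # levels: 1 2

# Different level sets: Ops.factor refuses to compare the two factors
res <- try(group == gender, silent = TRUE)
inherits(res, "try-error")  # TRUE

# Once the level sets match, the comparison runs label-by-label as the
# book's quoted output suggests
gender2 <- factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2), levels = 1:3)
group == gender2
```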

Any clue how to source it with 12pt font size?

My sight is not that great so I would appreciate having a larger font.

I've been trying $ pdflatex lsr.tex with no success. Maybe I am missing something obvious, but I just can't get it to produce the PDF.

I know I have to edit the parameter in the header.tex file, but I can't create the PDF in the first place.

Any help please? Maybe I have to install a font or some library?

Here is a pastebin link to the lsr.log (expires Dec 2023):

https://pastebin.com/rWh9aWi3

Tried on macOS 12.6 (Darwin 21.6.0, x86_64).

Typo - first paragraph in Chapter 9

Great voice, love your sense of humor. Thank you!

It looks like there is a typo at the end of the first paragraph in Chapter 9, on page 275. It currently reads,

The bigger and more useful part of statistics is that it provides that let you make inferences about data.

I think the first "that" should perhaps be "what".

Current known broken links

Figures where the image doesn't appear in learningstatisticswithr.com/book:
3.2 3.3 3.4 3.5 3.6
4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10
6.5 6.6 6.13 6.18

Broken cross-references in chapter 6:

"we discussed in Section ?? is shown in Figure ??. The R command that I used to draw it is this:"

"I'll use the tabulate() function to do this (which will be discussed properly in Section ??, since it creates a simple numeric vector:"

"To do this, we need to alter set the las parameter, which I discussed briefly in Section ??. What I'll do is tell R"

I have checked up to chapter 7 and will add others as I see them.

Example is deprecated

print.formula( my.formula ) # print it out using the print.formula() method

According to the PDF version this should result in

> print.formula( my.formula ) # print it out using the print.formula() method 
blah ~ blah.blah

However, when evaluated, this actually returns

print.formula( my.formula )       # print it out using the print.formula() method

## Appears to be deprecated

This should be replaced with an example using a non-deprecated generic.

img/ttest2/wilcox1.png missing

The image img/ttest2/wilcox1.png is missing.
It is used by the file bookdown/05.13-ttest.Rmd at line 1357:
knitr::include_graphics(file.path(projecthome, "img/ttest2/wilcox2.png"))

markdown mixed in with latex

The LaTeX files (from Chapter 5 on) are full of markdown (bookdown?) markup, such as *word* for \emph{word}.
(I wanted to re-LaTeX it with 12-point type, because the present PDF has lines that are too long for me. But it's no big deal.)

Thanks a lot for writing and sharing your book!

Typo in intro section

section 3.2.4 R can sometimes tell that you’re not finished yet (but not often)

It says

For example, if I then go on to type 3 and hit enter, what I get is this:

> 10+

+ 20

[1] 30

And as far as R is concerned, this is exactly the same as if you had typed 10 + 20.

Should say

For example, if I then go on to type 20 and hit enter, what I get is this:

i.e., replace “3” with “20”.

(spotted by Kimberly Barchard)

Typos

Thanks for sharing your book. It is a very nice read. I've collected some typos that caught my eyes, perhaps you have not noticed them yet.

page 6, paragraph 3: ...and the -> and they
p 12, footnote: in ant case -> in any case
p 19, last item: similar answers -> similar answers?
p 20, par 2: we doing -> we are doing
p 22, last par: able draw -> able to draw
p 25, par 2: An simple -> A simple
p 50, par 3: I thinks -> I think (2x, not completely sure if this is typo or a joke I don't get)
p 53, par 2: default values -> quote default values
p 78, par 2: command -> command.
p 81, last par: much easier more useful -> much more (?) much easier, more useful
p 82, par 4: don't access to -> don't have access to
p 83, par 3: location , -> location,
Sec 4.8.2: The first phrase states that there is only one
variable called expt, but age, gender group and score still exist
if a reader came from 4.8.1 right away. Beginners on R may think
that they were removed by data.frame().
p 114, par 2: The two most -> The three most
p 122, par 4: margin -> margin.
p 128, par 3: the we saw -> the way we saw
p 128, par 3: Chapter 10 I'll -> Chapter 10, I'll
p 140, par 1: old man -> old woman (?, after page 12.)
p 141, footnote: a single quote is in excess just before ``baby
p 145, par 4: The, -> The
p 145, par 4: the in file -> the data in a file
p 145, par 5: setswould -> sets would
p 150, par 4: is create -> I create
p 152, par 4: It seems that this whole paragraph is repeated, perhaps forgotten in a past edition.
p 155, last par: event -> even
p 157, last par: the these -> the
p 159, par 3: so may of the more -> so many of the more
p 170, par 4: covered -> covered the
p 180, last par: look they way -> look the way
p 195, par 1: sometimes might -> sometimes there might
p 209, par 3: one one -> one
p 215, par 1: has are -> has
p 224, par 3: and and -> and
p 225, par 2: correspond -> corresponding (?)
p 232, par 5: does does -> does
p 253, last par: instead you write -> you write
p 255, par 3: work other people -> work with other people
p 256: it seems like the font size got smaller.
p 257, par 1: new script -> a new script
p 257, par 1: "File " -> "File"
p 257, par 1: to be helpful. -> to be helpful.)
p 257, par 2: Rstudio provide -> Rstudio provides
p 258, par 3: it it -> it
p 273, par 3: relied on in -> relied on
p 273, par 3: dialog relied an -> dialog relied on an
p 273, par 3: Wellesely-Croker -> Wellesley-Croker
p 278, par 4: only 2 November -> only a single 2 November
p 280, par 1: frequentists do this -> frequentists doing this
p 288, par 3: the the -> the
p 298, 3rd code portion: a space is missing before ) in rchisq
p 302, par 2: most situations the situation -> most situations the scenario (?)
p 302, par 2: typical a psychological -> typical psychological
caption fig 10.8: 2,4 -> 2, 4
p 315, last par: statisticians often -> statisticians often use
p 326: a space is missing between ci.fun and =, 2x.
p 327, par 2: testing really a -> testing is really a
p 333, par 2: lead us to reject null -> lead us to reject the null
s. 11.4.2: footnote has slipped one page away.
p 335, par 5: only covers the -> only cover the
p 337, par 1: but my -> but by
Table 11.1: Rows 2 and 3 have a period but others don't, and they are after a number; a little odd.

Typo

Two spelling mistakes in section 5.7.7 (pg 163).

"I could correlate hours with pay quite using cor()..."

"It order to get the correlations that I want using the cor() function, is create a new data frame that doesn't contain the factor variables, and then feed that new data frame into the cor() function."

Love the book!!

Help wanted?

Hi,

I've been lurking for a bit and have enjoyed watching you get organized and start tackling this. I'm a long time user of the book and the lsr package in my teaching and actually thanked Danielle on a few occasions.

I'm not sure exactly how often or how much I can be of assistance, but if this is indeed one of those cases where many hands make light work rather than just adding more complexity, I'd love to try.

I'm much more a practitioner than a theorist, but I do believe I can help, even in small ways, if you're resolved to move towards the tidyverse. A simple example is the etaSquared issue: that function was written when there were a lot fewer options out there. There's a nice clean solution in the sjstats package that is tidy and easily dropped into the book to reinforce the teaching points around types of sums of squares and effect sizes...

Don't want to butt in since you all seem well-organized and moving forward but thought I would ask.

Chuck

xmtcars <- mtcars
xmtcars$cyl <- as.factor(xmtcars$cyl)
xmtcars$am <- as.factor(xmtcars$am)
sjstats::anova_stats(aov(mpg~am*cyl, xmtcars)) #type 1 order matters
#>        term df   sumsq  meansq statistic p.value etasq partial.etasq
#> 1        am  1 405.151 405.151    44.064   0.000 0.360         0.629
#> 2       cyl  2 456.401 228.200    24.819   0.000 0.405         0.656
#> 3    am:cyl  2  25.437  12.718     1.383   0.269 0.023         0.096
#> 4 Residuals 26 239.059   9.195        NA      NA    NA            NA
#>   omegasq partial.omegasq cohens.f power
#> 1   0.349           0.574    1.302 1.000
#> 2   0.386           0.598    1.382 1.000
#> 3   0.006           0.023    0.326 0.298
#> 4      NA              NA       NA    NA
sjstats::anova_stats(aov(mpg~cyl*am, xmtcars)) #type 1 order matters
#>        term df   sumsq  meansq statistic p.value etasq partial.etasq
#> 1       cyl  2 824.785 412.392    44.852   0.000 0.732         0.775
#> 2        am  1  36.767  36.767     3.999   0.056 0.033         0.133
#> 3    cyl:am  2  25.437  12.718     1.383   0.269 0.023         0.096
#> 4 Residuals 26 239.059   9.195        NA      NA    NA            NA
#>   omegasq partial.omegasq cohens.f power
#> 1   0.710           0.733    1.857 1.000
#> 2   0.024           0.086    0.392 0.515
#> 3   0.006           0.023    0.326 0.298
#> 4      NA              NA       NA    NA
sjstats::anova_stats(car::Anova(aov(mpg~am*cyl, xmtcars), type =2))
#>        term   sumsq  meansq df statistic p.value etasq partial.etasq
#> 1        am  36.767  36.767  1     3.999   0.056 0.049         0.133
#> 2       cyl 456.401 228.200  2    24.819   0.000 0.602         0.656
#> 3    am:cyl  25.437  12.718  2     1.383   0.269 0.034         0.096
#> 4 Residuals 239.059   9.195 26        NA      NA    NA            NA
#>   omegasq partial.omegasq cohens.f power
#> 1   0.036           0.086    0.392 0.515
#> 2   0.571           0.598    1.382 1.000
#> 3   0.009           0.023    0.326 0.298
#> 4      NA              NA       NA    NA
sjstats::anova_stats(car::Anova(aov(mpg~am*cyl, xmtcars), type =3))
#>          term    sumsq   meansq df statistic p.value etasq partial.etasq
#> 1 (Intercept) 1573.230 1573.230  1   171.104   0.000 0.762         0.868
#> 2          am   58.430   58.430  1     6.355   0.018 0.028         0.196
#> 3         cyl  167.710   83.855  2     9.120   0.001 0.081         0.412
#> 4      am:cyl   25.437   12.718  2     1.383   0.269 0.012         0.096
#> 5   Residuals  239.059    9.195 26        NA      NA    NA            NA
#>   omegasq partial.omegasq cohens.f power
#> 1   0.754           0.838    2.565 1.000
#> 2   0.024           0.140    0.494 0.712
#> 3   0.072           0.330    0.838 0.975
#> 4   0.003           0.023    0.326 0.298
#> 5      NA              NA       NA    NA

Created on 2019-01-11 by the reprex package (v0.2.1)

Formatting of 3.2.1

I'm not sure the code after "So for example if you copied the two lines of code from the book you'd get this" makes sense with the bookdown formatting ("## [1] 30" appears twice, once from the comment and once as output from the R chunk).

typo

from the email

I believe there is a typo on 11.2. On the table H0 true or false, there are two “retains H0”

Minor issue with Figure 13.9 (section 13.4, p. 399 of version 0.6)

Figure 13.9 suggests that under the null, both samples have the same mean and variance, but under the alternative, they have different means and variances. I thought that the null hypothesis also allows for the two samples to have different variances. It is very minor, and I'm not even sure if I'm right, but I thought I'd mention it.
Thank you for the great book!

In text aspirations: "Future versions of this book will..."

Section 4: "Finally, there are a number of packages that provide more advanced tools that I hope to talk about in future versions of the book, such as sem, ez, nlme and lme4. In any case, whenever I’m using a function that isn’t in the core packages, I’ll make sure to note this in the text."

Section 6: "In a future version of this book, I intend to finish this chapter off by talking about what makes a good or a bad graph, but I haven’t yet had the time to write that section."

Section 7.7: " In future versions of the book I plan to expand this discussion to include some of the more powerful tools that are available in R, but I haven’t had the time to do so yet." (on reshaping)
"In a future version of this book I intend to discuss melt() and cast() in a fair amount of detail."

Footnote of Section 10: "I am planning to add a bit more functionality to ciMean()"

Section 11.8: "...probably a future version of this book would include a more detailed discussion of power analysis, but for now this is about as much as I’m comfortable saying about the topic."

Section 15.9.4: "Variable transformation is another topic that deserves a fairly detailed treatment, but (again) due to deadline constraints, it will have to wait until a future version of this book."

section on quantiles needs some nuance

(from my correspondence with Sabine Schulte im Walde)

There are multiple ways of defining sample quantiles, and SAS, SPSS and R all have different defaults. The quantile() function has a type argument that lets you choose among 9 different definitions. Perhaps unfortunately, the afl.margins data is one for which the 0.25 quantile gives different answers in all three platforms:

> quantile(afl.margins, type = 1)
  0%  25%  50%  75% 100%
   0   12   30   50  116

> quantile(afl.margins, type = 2) # SAS default
   0%   25%   50%   75%  100%
  0.0  12.5  30.5  51.0 116.0

> quantile(afl.margins, type = 3)
  0%  25%  50%  75% 100%
   0   12   30   50  116

> quantile(afl.margins, type = 4)
  0%  25%  50%  75% 100%
   0   12   30   50  116

> quantile(afl.margins, type = 5)
   0%   25%   50%   75%  100%
  0.0  12.5  30.5  51.0 116.0

> quantile(afl.margins, type = 6) # SPSS default
    0%    25%    50%    75%   100%
  0.00  12.25  30.50  51.50 116.00

> quantile(afl.margins, type = 7) # R default
    0%    25%    50%    75%   100%
  0.00  12.75  30.50  50.50 116.00

> quantile(afl.margins, type = 8) 
       0%       25%       50%       75%      100%
  0.00000  12.41667  30.50000  51.16667 116.00000

> quantile(afl.margins, type = 9) 
      0%      25%      50%      75%     100%
  0.0000  12.4375  30.5000  51.1250 116.0000 

relevant paper:

Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, American Statistician 50, 361–365. doi: 10.2307/2684934.
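A tiny made-up vector makes the disagreement easy to see (the data are assumed, chosen only so the arithmetic is clean):

```r
x <- c(0, 12, 14, 44)
# With n = 4 and p = 0.25, the Hyndman-Fan plotting positions differ:
# type 6 (SPSS-style): h = p*(n+1) = 1.25 -> 0 + 0.25*(12 - 0) = 3
# type 7 (R default):  h = p*(n-1)+1 = 1.75 -> 0 + 0.75*(12 - 0) = 9
q6 <- quantile(x, probs = 0.25, type = 6)  # 3
q7 <- quantile(x, probs = 0.25, type = 7)  # 9
```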

Bibtex citation

Your book has been (and still is) very useful to me in my studies and consequently I want to cite it in my thesis and seminar papers, for which I use LaTeX and BibTeX. The author, year and title fields are no problem.
However, I am unsure what entry type this work is (online? misc? book?), in what field to put the version, and who the publisher is. Would this be lulu.com? Or just you as the author?

Epub format?

Thanks for writing and sharing this book, @djnavarro

Would it be possible to generate a copy in EPUB format for reading on an e-reader? AFAIK, bookdown allows it (https://bookdown.org/yihui/bookdown/e-books.html), but I have not managed to generate a file (I've never worked with it, though). Also, converting PDF to EPUB using calibre loses some formatting and makes the result extremely difficult to read, and pandoc does not convert from PDF.
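For what it's worth, bookdown ships an EPUB output format (bookdown::epub_book). A hypothetical addition to bookdown/_output.yml, untested against this repo's build, might look like:

```yaml
# Hypothetical _output.yml entry; field name follows bookdown::epub_book()
bookdown::epub_book:
  stylesheet: style.css
```

After adding the entry, the EPUB would in principle be built alongside the other formats by the usual bookdown::render_book() call.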
