avehtari / ros-examples

Regression and other stories R examples

Home Page: https://avehtari.github.io/ROS-Examples/

R 0.41% PostScript 0.04% Stan 0.01% HTML 99.53% CSS 0.01% JavaScript 0.03%

ros-examples's Issues

[Optional Suggestion] Show df head for NES Predictions

It would be nice for readers if in the code linpred, epred and postpred showed their first 5 rows and perhaps str(df) as well for shape.

That way it's very obvious at a glance what is different in each of these methods, and the reader doesn't need to run the code locally. The explanation on page 239 makes some sense as to what is happening, but the docs are a bit hard to decipher.

https://avehtari.github.io/ROS-Examples/NES/nes_logistic.html

https://www.rdocumentation.org/packages/rstanarm/versions/2.19.3/topics/posterior_predict.stanreg
https://www.rdocumentation.org/packages/rstantools/versions/2.1.1/topics/posterior_epred

PS
I will label issues that are "personal preference" as optional suggestions. Feel free to close without resolution if you disagree; I won't be offended.

Typo on p371

The first sentence on this page should read "The new estimated treatment effect is 1.7 with a standard error of 0.7.... in [Figure 19.4b]" (instead of Figure 19.4a).

Potential issue with "fishing" formula on p61

Hi, thanks very much for the book! I enjoyed reading through it. :)

This is the first of a number of typos/issues I spotted while going through the book. I've put one typo/issue in each issue for easier reference.

p61 near the bottom of the page: It says computing T(y;\phi(y)) for j = 1, \dots, J, but j doesn't appear in the formula.

Also, I found the notation a little confusing. It appears that T represents the test statistic while \phi represents the choices in the analysis. So, when choosing the best result given the data, shouldn't it be T^{best} instead of \phi^{best}, since we are performing J tests (presumably on different statistics)?

Figure 10.9 caption misleading (minor)

Minor nitpick.

The caption for Figure 10.9 says "from 1976 through 2000". Looking at the R Code, and the figures themselves, the range is 1972 through 2000.

Thought provoking book, thank you.

"uncertaintt" misspelled in nes_logistic

https://github.com/avehtari/ROS-Examples/blob/master/NES/nes_logistic.R#L53

Equation 21.7

On page 435, in equation 21.7, the coefficient for the math pre-test predictor should perhaps be beta3.

Minor typos in P231 and P236

P231 at the bottom, the first bullet point, there are two "is".

P236 in the paragraph above section "Comparing the coefficient estimates when adding a predictor", the last sentence has two "that".

Btw, I also did not understand the statement in this sentence about "the drawback that probabilities close to zero can't improve much in absolute value even if they can improve a lot relatively". Could you please explain in a bit more detail? Does this refer to the fact that in the logit function (e.g., Fig. 13.1a) the near-zero probability area is "flat", so that the absolute probability value varies only in a very small range?

Thanks,
Zhengchen

two minor typos

-p.39, in the second sentence of the first full paragraph, "Figure 3.4 displays data on log metabolic rate vs. body mass indicating..." should perhaps be "log metabolic rate vs. log body mass" since both variables are on the log scale.
-p.54, in "Comparisons, visual and numerical" subsection, "Figure 4.2" in the first sentence should be "Figure 4.3."

Poststratification using a weighted average, Chapter 17, page 314

Following Equation (17.1) the text says (in part)

"... which was too low because the sample did not include enough Republicans."

But as I read it, the sample had just the right fraction of Republicans, namely 33%. Instead it had too many Democrats and not enough Independents.

Missing xbox survey data

On page 6 there is a margin note for the "Estimating public opinion from an opt-in internet survey" example that points to the example data "Xbox survey", which is not in the repo.

Typo on p66: First bullet point

The sentence should read "... (cycle days 26-28) to avoid potential [confounders/confounding] due to premenstrual or menstrual symptoms". Unless they made this typo in the original text...

Should n be n_0 in simplest.r?

I believe this line should be round(sd(y_0)/sqrt(n_0), 2) but am not sure

https://github.com/avehtari/ROS-Examples/blob/master/Simplest/simplest.R#L79
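
For reference, a minimal sketch of a per-group standard error of the mean, where each group's standard deviation is divided by the square root of that same group's sample size. The sample sizes, means, and seed below are made up for illustration and are not taken from simplest.R.

```r
# Hypothetical simulated groups (values are illustrative only)
set.seed(2)
n_0 <- 20
n_1 <- 30
y_0 <- rnorm(n_0, 2, 5)
y_1 <- rnorm(n_1, 8, 5)

# Standard error of each group mean uses that group's own n
se_0 <- sd(y_0) / sqrt(n_0)
se_1 <- sd(y_1) / sqrt(n_1)

# Standard error of the difference between the two group means
se_diff <- sqrt(se_0^2 + se_1^2)

round(c(se_0 = se_0, se_1 = se_1, se_diff = se_diff), 2)
```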

Broken link!

Hello!

I think that there is a broken link here!

https://avehtari.github.io/ROS-Examples/PoissonExample/PoissonExample.html

Typo on p447

At the end of the first para under "Relation to formal models of causal inference", instead of "populations homogeneous in w_i", we should have "populations homogeneous in x_i".

Missing "healthdata.txt" file

On page 26 (Section 2.3) of the printed book, an example begins with

health <- read.table("healthdata.txt", header=TRUE)

The file healthdata.txt is not present in the repo, although more complex code for reproducing the graph in Figure 2.4 is present in HealthExpenditure/healthexpenditure.R.

Cannot find some code in Chapter 13

There is a code block in Chapter 13 that I cannot seem to find in the associated code files. Any help would be great; hopefully I am not just blind.

These are the notebooks I'm looking in

Typo on page 143 'Predictive simulation for a nonlinear function of new data'

On page 143 3rd line from bottom '...Figure 10.7 give a predictive mean of 260.1...'. I think this should be 262.1. Thanks for the great book. Eamonn.

Typo in p-value calculation in R on p. 57

I'm new to R, so there may be a misunderstanding on my part, but I believe there is a typo in the p-value calculation on page 57:

2*(1 - pt(abs(theta_hat)/se_theta, n_c+n_t, 2))

should be

2*(1 - pt(abs(theta_hat)/se_theta, n_c+n_t-2))
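
A minimal sketch of the corrected call, using made-up values for theta_hat, se_theta, n_c and n_t (they are not from the book). In pt(), the second argument is the degrees of freedom, and a third positional argument is taken as the noncentrality parameter ncp, so the comma in the printed version changes what is computed.

```r
# Hypothetical values, for illustration only (not from the book)
theta_hat <- 0.35
se_theta  <- 0.15
n_c <- 40
n_t <- 45

t_stat <- abs(theta_hat) / se_theta
dof    <- n_c + n_t - 2   # degrees of freedom for the two-group comparison

# Two-sided p-value from the t distribution
p_value <- 2 * (1 - pt(t_stat, dof))

# Equivalent form that asks for the upper tail directly
p_value_upper <- 2 * pt(t_stat, dof, lower.tail = FALSE)
```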

By the way, the book has been excellent so far!

Figures are not labeled with (a), (b) and so on throughout the book plus a question

Hi there,

I was on page 208. In the middle of the page, one paragraph says "Figure 12.11a shows that with the default prior on regression coefficients and sigma, the implied prior distribution of R^2 is strongly favouring larger values and thus is favouring overfitted models". However, the prior distribution looks like an exponential distribution, so why do you think it is strongly favouring larger values? Is this inference coming from the posterior? It would make sense to state this looking at the posterior, not the prior. Besides, I can barely see the differences between the three prior distributions visually. The rightmost one should be a horseshoe prior but still looks like an exponential. I might be wrong since I did not understand section 12.7 very well.

In general, there are no labels for (a), (b), (c) and so on in the figures of the book, it is not at all a problem because I could understand well with the nice captions, but not sure if this is good for formal textbook.

Thanks,
Zhengchen

Typo on p413

In the last para on the page (or 2nd para under Section 20.9), the sentence should read "Another concern [is] that the treatment or exposure in the data can differ from potential policies, interventions, or exposures for new people or cases."

Publishing date?

Not sure where to ask but would like to know when you expect this book to be available?

Thanks

Ch. 17, p. 316, poststratification data mismatch

On p. 316, the data given in the code for poststrat_data_2 does not match the data shown in the table that it is meant to correspond to (Figure 17.2).

More specifically, the values for N in poststrat_data_2 are 0.16, 0.17, 0.19, 0.17, 0.16, 0.15, but in the table in Figure 17.2 the values are 0.16, 0.17, 0.22, 0.14, 0.14, 0.17.
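
A quick check of the two sets of values quoted above, as a sketch. The vector names N_code and N_table are made up for this comparison.

```r
# Values quoted in this issue
N_code  <- c(0.16, 0.17, 0.19, 0.17, 0.16, 0.15)  # from poststrat_data_2 on p. 316
N_table <- c(0.16, 0.17, 0.22, 0.14, 0.14, 0.17)  # from the table in Figure 17.2

# Both sets of proportions sum to 1 (up to floating point rounding)
sums <- c(code = sum(N_code), table = sum(N_table))

# Positions where the two sets disagree
mismatch <- which(abs(N_code - N_table) > 1e-8)
```

Both vectors sum to 1, so each is a valid set of population proportions on its own; they disagree in the last four cells.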

Typo on P43: standard deviation subscripts change between 1st and 2nd paragraphs

In the print version of the book, section "Mean and standard deviation of the sum of correlated variables" (p43) displays standard deviations as \sigma_{a} and \sigma_{b} in paragraph 1. In paragraph 2, when discussing linear combinations, this changes to \sigma_{u} and \sigma_{v}. Mean subscripts remain unchanged throughout.

Typo on page 174

In the second paragraph of the subsection titled “Summarizing prediction error using the log score and deviance” on page 174 of the print edition, the normal log score is stated as “−½ log σ − […]” while it should be “−log σ − […]”.
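
A small numerical check of the leading term, as a sketch. It uses the standard normal log density rather than the book's exact expression, and the values of y, mu and sigma are made up. Note that −½ log(σ²) and −log σ are the same quantity, so the term is only wrong if it is written as −½ log σ.

```r
# Hypothetical values, for illustration only
y     <- 1.3
mu    <- 0.5
sigma <- 2.0

# Normal log density written out by hand:
# -log(sigma) - (1/2) log(2 pi) - (y - mu)^2 / (2 sigma^2)
log_dens_manual <- -log(sigma) - 0.5 * log(2 * pi) - (y - mu)^2 / (2 * sigma^2)

# Same quantity from base R
log_dens_dnorm <- dnorm(y, mean = mu, sd = sigma, log = TRUE)

# Version with -1/2 log(sigma) as the leading term, for comparison
log_dens_half <- -0.5 * log(sigma) - 0.5 * log(2 * pi) - (y - mu)^2 / (2 * sigma^2)

c(manual = log_dens_manual, dnorm = log_dens_dnorm, half = log_dens_half)
```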

Figure 7.2(b) y-intercept

The fitted linear model has an intercept of 46.3, but the graph shows the line intersecting the vertical axis at around 44, clearly below 45. Either the line is not from this model or the axis is incorrectly labeled.

Exercise reference p. 42

At the bottom of page 42 in the 'Linear transformations' section it says 'see Exercise 3.5'. I think this should be 'Exercise 3.6'.

Loving the book so far 🎉

Cannot load RDA file in R

Hi Aki,
I am unable to load this RDA file in R and am unsure what this error message means. If possible it would be helpful if the sexdata datafile was in a language agnostic format like csv, or txt or such so it can be loaded directly using Python
https://github.com/avehtari/ROS-Examples/tree/master/SexRatio/data
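
For anyone who does have R available, one possible way to convert an .rda file to CSV is sketched below. The object name example_df and the file paths are stand-ins created inside the example; they are not the repository's actual file or object names.

```r
# Build a small stand-in data frame and save it as .rda
# (hypothetical object and file names, not the repository's data)
example_df <- data.frame(x = c(1.5, 2.5, 3.5), y = c(0.5, 0.7, 0.6))
rda_path <- tempfile(fileext = ".rda")
save(example_df, file = rda_path)

# Load into a separate environment so existing objects are not overwritten;
# load() returns the names of the objects it restored
env <- new.env()
loaded_names <- load(rda_path, envir = env)

# Write each data frame found in the file to its own CSV
csv_paths <- character(0)
for (nm in loaded_names) {
  obj <- get(nm, envir = env)
  if (is.data.frame(obj)) {
    path <- file.path(tempdir(), paste0(nm, ".csv"))
    write.csv(obj, path, row.names = FALSE)
    csv_paths <- c(csv_paths, path)
  }
}
```

The resulting CSV files can then be read from Python or any other language.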

Missing Chile schools data

The Chile schools data for the regression discontinuity example [pg. 433] is not in the folder ChileSchools. Is this dataset available somewhere else?

Potential typo on p404

In the first para of Step 5, the comma after "A better strategy" reads awkwardly and it is probably better to remove it.

Typo on page 159 'Forming a linear predictor from a multiple regression'

This section has a reference to Figure 11.2. I believe this should be Figure 11.3.

Exercise 3.8 seems to be missing

Both Exercise 9.6 (page 128) and the description of the Jamaican study refer to Exercise 3.8, which is supposed to calculate the multiplicative factor distribution. However, Exercise 3.8 deals with the mean and standard deviation of correlated heights rather than with the Jamaican study.

Small typo in code example on page 179

There seems to be a small typo in the last code example on page 179.
It states

fit_2 <- update(fit_2, prior=hs())

however fit_2 wasn't defined before. I guess it should read:

fit_2 <- update(fit_1, prior=hs())

[optional]: soft coding text in figure 7.4 code

The means are currently hardcoded in the code for figure 7.4. Using bquote instead of expression could allow for whatever the simulation's means were in the plot in simplest.R lines 122 and 123:

text(.05, -1 + mean(y[x==0]), bquote(paste(bar(y)[0], " = ", .(round(mean(y[x==0]), 2)))), col="gray30", cex=.9, adj=0)

text(.95, 1 + mean(y[x==1]), bquote(paste(bar(y)[1], " = ", .(round(mean(y[x==1]), 2)))), col="gray30", cex=.9, adj=1)

LOVE the book!

A notebook/html has content for many chapters - a marker would be helpful

Hi,

First thanks for this excellent book and the teaching approach. I am a newbie with some theoretical understanding and no real-world experience in this domain. These sets of examples are going to really help me practice and self-evaluate if I understood the concepts correctly.

As I started to read the first chapter I found a disconnect between the content of the generated HTML files and the chapter. For example, at https://avehtari.github.io/ROS-Examples/examples.html the examples are listed by chapter (this is what I want); however, when I go to hibbs.html in chapter 1, I see that it includes illustrations and code not only from chapter 1 but also from many other chapters.

That made it a bit difficult to follow. At least a marker within each HTML page indicating the end of the illustrations for a given chapter would have been helpful. This way a reader would not look at the remaining portion of the HTML until reaching the corresponding chapter.

Regards & thanks
Kapil

page 11. difference between treatments and controls

On page 11 [Kindle version], the summary of the treatment-control comparison says "the treated units were 4.8 points higher than the controls, \bar{y} = 31.7 under the treatment and \bar{y} = 25.5 for the controls." Isn't the difference in means 6.2?

I also looked at the code in SimpleCausal/causal.R. It seems that the random seed was not fixed and so the simulations and the analyses might not be fully reproducible.
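
One common way to make such simulations reproducible is to fix the seed before drawing random numbers. This is a generic sketch, not the code in SimpleCausal/causal.R, and the seed value is arbitrary.

```r
# Hypothetical seed value; any fixed integer works
set.seed(1234)
draw_1 <- rnorm(5)

# Resetting the same seed reproduces exactly the same draws
set.seed(1234)
draw_2 <- rnorm(5)

identical(draw_1, draw_2)
```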

Typo on p469

Near the top of the page: "We will not go into the details of Hamiltonian Monte Carlo... in order not [to] get tripped up..."
In the sentence after that: "For our purposes here, the two most important aspects of HMC are that [it is] iterative..."

Typo on p353

In the middle of the first para of the SUTVA section, the sentence should read "Even with our small study with two levels of the treatment, there are 2^8 = 256 different possible allocations of [the 2 treatments] to these 8 people" (extra "treatments").
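
The count of 2^8 = 256 can be confirmed by enumerating every assignment of two treatment levels to 8 people. This is a sketch, and the 0/1 coding of the two levels is an assumption made for the example.

```r
# Each of the 8 people receives one of two treatment levels (coded 0 or 1)
n_people <- 8
allocations <- expand.grid(rep(list(c(0, 1)), n_people))

# One row per possible allocation
n_allocations <- nrow(allocations)
n_allocations == 2^n_people
```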

Mistakenly labelled equation on page 385

In the paragraph following equation 20.2 on p. 385, we can read

Equating the coefficients of z in (20.1) and (20.2) yields β₂* = β₁ + β₂γ₁.

However, there's no β₂* in equation 20.1. I believe the second equation in the subsection (the one with starred β's) should be marked as (20.1).

Typo on p406

In the last para of "Choosing covariates", the sentence should read "The classic example of a covariate that should not just be [included] as an additional predictor is an instrumental variable..."

Minor typo in the description of the parameters of the rss function (Chapter 8, p. 105)

I have the eBook version of the book. In the second paragraph of page 105 I think there is an error. It says:

We can try it out: rss(hibbs$growth, hibbs$vote, 46.3, 3.0) evaluates the residual sum of squares at the least squares estimate, (a,b) = (46.2, 3.1) ...

I think the last part should be (a,b) = (46.3, 3.0), or the other way around since the values (46.2, 3.1) are used in the following sections of the chapter.
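
To illustrate why the point at which rss is evaluated matters, here is a sketch with a function that takes its arguments in the same order as the call quoted above. The function body and the simulated data are assumptions made for the example; this is not the hibbs data.

```r
# Residual sum of squares for the line y = a + b*x
rss <- function(x, y, a, b) {
  resid <- y - (a + b * x)
  sum(resid^2)
}

# Simulated stand-in data (hypothetical, not the hibbs data)
set.seed(1)
x <- runif(20, 0, 4)
y <- 46 + 3 * x + rnorm(20, 0, 4)

# Least squares estimate from lm()
fit   <- lm(y ~ x)
a_hat <- coef(fit)[["(Intercept)"]]
b_hat <- coef(fit)[["x"]]

# rss is smallest at the least squares estimate; nearby values give a larger rss
rss_at_estimate <- rss(x, y, a_hat, b_hat)
rss_nearby      <- rss(x, y, a_hat + 0.1, b_hat - 0.1)
```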

Range of possible fits for election economy seemingly missing in html

In the HTML for hibbs.html the range of possible fits "seems to be missing", but I can see in the code that it's just commented out. It was a little disorienting as I read the book because in Figure 8.2 the median fit and the range of fits are side by side, but it took a bit of searching to find the code and figure lower down in the .rmd file.

https://github.com/avehtari/ROS-Examples/blob/master/ElectionsEconomy/hibbs.R#L161
https://avehtari.github.io/ROS-Examples/ElectionsEconomy/hibbs.html

Solutions

Does anyone know if there are solutions available for select problems for faculty members?

Formula for the standard error of a weighted average on page 55

Because of operation precedence in R, sqrt(sum(x)^2) is equal to sum(x).

    W <- c(0.3, 0.4, 0.3)
    se <- c(0.02, 0.03, 0.03)

    # The weighted average of the standard errors
    c(sqrt(sum(W * se)^2), sum(W * se))
    #> [1] 0.027 0.027


    # The standard error of the weighted average
    sqrt(sum((W * se)^2))
    #> [1] 0.01615549

Typo on p104: First para of "Estimation of residual standard deviation \sigma"

End of first sentence should read "... and the standard deviation of the errors can be estimated [from the] data".

And is misspelled in sex ratio cell title

Error is on this line

https://github.com/avehtari/ROS-Examples/blob/master/SexRatio/sexratio.R#L75

r quantile syntax in 5.3, page 73

The uncertainty intervals should be (something like)

quantile(z, c(0.25, 0.75))
quantile(z, c(0.025, 0.975))

instead of

quantile(z, 0.25, 0.75)
quantile(z, 0.025, 0.975)
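
A short sketch of the vector form, using simulated draws as a stand-in for z (the draws and the seed are made up). quantile() takes all probabilities in a single probs argument; when the two numbers are passed separately, the second one is matched to a different argument and is not treated as a probability.

```r
# Simulated draws standing in for z (hypothetical)
set.seed(42)
z <- rnorm(1000)

# Probabilities are passed together as one vector
interval_50 <- quantile(z, c(0.25, 0.75))
interval_95 <- quantile(z, c(0.025, 0.975))

interval_50
interval_95
```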

Again, really enjoying the book!

I think I noticed an erratum in the book: page 217.

I think I noticed an erratum in the book. On page 217, it reads: "The logistic function, logit(x) = log(x/(1-x)), maps the range (0,1) to (-inf to +inf)...". I think this is a mistake. It should be: "The logit function, logit(x) = log(x/(1-x)), maps the range (0,1) to (-inf to +inf)...".
I think the logistic function is the INVERSE of this logit function.

Also, on page 219 it reads: "The inverse logistic function is curved, and so the expected difference...". I don't know what shape the inverse logistic function has. But I think the text would be easier to follow, for readers of standard classical statistics like me, if you wrote "The inverse logit function (the logistic function) is curved, and so the expected difference...". This is my opinion.
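
The relationship between the two functions can be checked numerically. This is a sketch; the helper names logit and invlogit are defined here for illustration, while qlogis() and plogis() are the base R equivalents.

```r
# logit: maps probabilities in (0, 1) to the whole real line
logit <- function(p) log(p / (1 - p))

# inverse logit (the logistic function): maps real numbers back to (0, 1)
invlogit <- function(x) 1 / (1 + exp(-x))

p <- c(0.1, 0.5, 0.9)
x <- logit(p)

# The round trip recovers the original probabilities
p_back <- invlogit(x)

# Base R equivalents: qlogis() is the logit, plogis() is its inverse
same_logit    <- all.equal(x, qlogis(p))
same_invlogit <- all.equal(p_back, plogis(x))
```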

Thanks

Possible misspelling of reproducibility

I believe "reproducibility" is misspelled as "reproducability".

https://github.com/avehtari/ROS-Examples/blob/master/Earnings/earnings_regression.R#L36

PS:
I am submitting issues rather than PRs because, due to an earlier issue, I'm not sure if my R environment is set up correctly.

errata in book p.42

p. 42, the last line is the calculation for the standard error of the sample mean.
For the standard error of the difference we calculate:

sqrt(2.9^2+2.7^2)
4 approx.

Check:

h_m <- rnorm(1e4, 69.1, 2.9)
h_w <- rnorm(1e4, 63.7, 2.7)
h_diff <- h_m-h_w
sd(h_diff)

Regards.

a minor typo on p. 65

From Philip Hanser: "Also, I think there is a minor typo on p. 65. There is a double summation in paragraph 5. The inner summation sign has the range as ! to 6 and I believe you meant 1 to 6."