avehtari / ros-examples
Regression and other stories R examples
Home Page: https://avehtari.github.io/ROS-Examples/
It would be nice for readers if, in the code, linpred, epred and postpred showed their first 5 rows, and perhaps str(df) as well for shape.
That way it's very obvious at a glance what is different in each of these methods, and the reader doesn't need to run the code locally. The explanation on page 239 makes some sense as to what is happening, but the docs are a bit hard to decipher.
https://avehtari.github.io/ROS-Examples/NES/nes_logistic.html
https://www.rdocumentation.org/packages/rstanarm/versions/2.19.3/topics/posterior_predict.stanreg
https://www.rdocumentation.org/packages/rstantools/versions/2.1.1/topics/posterior_epred
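To illustrate the kind of output being requested, here is a minimal base R sketch of what the three objects look like for a logistic regression. It does not use rstanarm or the NES data: the posterior draws, the coefficient values, and the two new observations are all made up for illustration, and the names `draws` and `X_new` are hypothetical. In rstanarm the corresponding functions are posterior_linpred, posterior_epred and posterior_predict.

```r
# Sketch only: simulated "posterior draws" stand in for a fitted model.
set.seed(1)
n_draws <- 4000

# Hypothetical posterior draws of the intercept (a) and slope (b)
draws <- cbind(a = rnorm(n_draws, -1.4, 0.2),
               b = rnorm(n_draws, 0.33, 0.05))

# Two hypothetical new observations with predictor values 1 and 5
X_new <- cbind(1, c(1, 5))

# Linear predictor: one row per draw, one column per new observation
linpred <- draws %*% t(X_new)

# Expected value of the outcome: probabilities on the (0, 1) scale
epred <- plogis(linpred)

# Posterior predictive draws: simulated 0/1 outcomes
postpred <- matrix(rbinom(length(epred), size = 1, prob = epred),
                   nrow = n_draws)

# First 5 rows of each, plus the shape
head(linpred, 5)
head(epred, 5)
head(postpred, 5)
str(postpred)
```

All three are matrices with the same shape (draws by observations). They differ in scale: unbounded values, probabilities, and binary outcomes respectively.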
PS
I will label issues that are "personal preference" as optional suggestions. Feel free to close without resolution if you disagree; I won't be offended.
The first sentence on this page should read "The new estimated treatment effect is 1.7 with a standard error of 0.7.... in [Figure 19.4b]" (instead of Figure 19.4a).
Hi, thanks very much for the book! I enjoyed reading through it. :)
This is the first of a number of typos/issues I spotted while going through the book. I've put one typo/issue in each issue for easier reference.
p61 near the bottom of the page: It says computing T(y;\phi(y)) for j = 1, \dots, J, but j doesn't appear in the formula.
Also, I found the notation a little confusing. It appears that T represents the test statistic while \phi represents the choices in the analysis. So, when choosing the best result given the data, shouldn't it be T^{best} instead of \phi^{best}, since we are performing J tests (presumably on different statistics)?
Minor nitpick.
The caption for Figure 10.9 says "from 1976 through 2000". Looking at the R Code, and the figures themselves, the range is 1972 through 2000.
Thought provoking book, thank you.
On page 435, in equation 21.7, the coefficient for the math pre-test predictor should perhaps be beta3.
P231 at the bottom, the first bullet point, there are two "is".
P236 in the paragraph above section "Comparing the coefficient estimates when adding a predictor", the last sentence has two "that".
Btw, I also did not understand the statement in this sentence about "the drawback that probabilities close to zero can't improve much in absolute value even if they can improve a lot relatively". Could you please explain in a bit more detail? Does this refer to the fact that in the logit function (e.g., Fig. 13.1a) the near-zero probability area is "flat", so that the absolute probability value varies only within a very small range?
Thanks,
Zhengchen
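One possible reading of the quoted statement, offered as a sketch and not as the authors' answer: the same one-unit shift on the logit scale gives a small absolute change but a large relative change when the probability is near zero. The logit values -5, -4, 0 and 1 are arbitrary choices for illustration.

```r
# plogis() is base R's inverse logit function
p_low <- plogis(c(-5, -4))  # probabilities near zero
p_mid <- plogis(c(0, 1))    # probabilities near one half

# Absolute change for a one-unit shift on the logit scale
abs_low <- diff(p_low)
abs_mid <- diff(p_mid)

# Relative change (ratio of probabilities) for the same shift
rel_low <- p_low[2] / p_low[1]
rel_mid <- p_mid[2] / p_mid[1]

print(c(abs_low = abs_low, abs_mid = abs_mid))
print(c(rel_low = rel_low, rel_mid = rel_mid))
```

Near zero the absolute change is much smaller than in the middle of the curve, while the ratio of probabilities is larger.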
-p.39, in the second sentence of the first full paragraph, "Figure 3.4 displays data on log metabolic rate vs. body mass indicating..." should perhaps be "log metabolic rate vs. log body mass" since both variables are on the log scale.
-p.54, in "Comparisons, visual and numerical" subsection, "Figure 4.2" in the first sentence should be "Figure 4.3."
Following Equation (17.1) the text says (in part)
"... which was too low because the sample did not include enough Republicans."
But as I read it, the sample had just the right fraction of Republicans, namely 33%. Instead it had too many Democrats and not enough Independents.
On page 6 there is a margin note for the "Estimating public opinion from an opt-in internet survey" for example data "Xbox survey" that is not in the repo.
The sentence should read "... (cycle days 26-28) to avoid potential [confounders/confounding] due to premenstrual or menstrual symptoms". Unless they made this typo in the original text...
I believe this line should be round(sd(y_0)/sqrt(n_0), 2), but am not sure:
https://github.com/avehtari/ROS-Examples/blob/master/Simplest/simplest.R#L79
Hello!
I think that there is a broken link here!
https://avehtari.github.io/ROS-Examples/PoissonExample/PoissonExample.html
At the end of the first para under "Relation to formal models of causal inference", instead of "populations homogeneous in w_i", we should have "populations homogeneous in x_i".
On page 26 (Section 2.3) of the printed book, an example begins with
health <- read.table("healthdata.txt", header=TRUE)
The file healthdata.txt is not present in the repo, although more complex code for reproducing the graph in Figure 2.4 is present in HealthExpenditure/healthexpenditure.R.
On page 143 3rd line from bottom '...Figure 10.7 give a predictive mean of 260.1...'. I think this should be 262.1. Thanks for the great book. Eamonn.
I'm new to R, so there may be a misunderstanding on my part, but I believe there is a typo in the p-value calculation on page 57:
2*(1 - pt(abs(theta_hat)/se_theta, n_c+n_t, 2))
should be
2*(1 - pt(abs(theta_hat)/se_theta, n_c+n_t-2))
By the way, the book has been excellent so far!
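A small sketch of why the placement of the 2 matters. The third positional argument of pt() is the noncentrality parameter ncp, so the printed call evaluates a noncentral t distribution with n_c+n_t degrees of freedom. The numbers below are hypothetical and are not taken from the book.

```r
# Hypothetical estimate, standard error, and group sizes
theta_hat <- 0.5
se_theta <- 0.2
n_c <- 20
n_t <- 25

# As printed: df = n_c + n_t, and the trailing 2 is taken as ncp
p_as_printed <- 2 * (1 - pt(abs(theta_hat) / se_theta, n_c + n_t, 2))

# As proposed in this issue: df = n_c + n_t - 2, central t distribution
p_proposed <- 2 * (1 - pt(abs(theta_hat) / se_theta, n_c + n_t - 2))

print(c(as_printed = p_as_printed, proposed = p_proposed))
```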
Hi there,
I was on page 208. In the middle of the page, one paragraph says "Figure 12.11a shows that with the default prior on regression coefficients and sigma, the implied prior distribution of R^2 is strongly favouring larger values and thus is favouring overfitted models". However, the prior distribution looks like an exp distribution, so why do you think it is strongly favouring larger values? Is this inference coming from the posterior? It makes sense to state this looking at the posterior, not the prior. Besides, I can barely see the differences between the three prior distributions visually. The rightmost one should be a horseshoe prior but still looks like an exp. I might be wrong since I did not understand section 12.7 very well.
In general, there are no labels for (a), (b), (c) and so on in the figures of the book, it is not at all a problem because I could understand well with the nice captions, but not sure if this is good for formal textbook.
Thanks,
Zhengchen
In the last para on the page (or 2nd para under Section 20.9), the sentence should read "Another concern [is] that the treatment or exposure in the data can differ from potential policies, interventions, or exposures for new people or cases."
Not sure where to ask but would like to know when you expect this book to be available?
Thanks
On p. 316, the data given in the code for poststrat_data_2 does not match the data shown in the table that it is meant to correspond to (Figure 17.2).
More specifically, the values for N in poststrat_data_2 are 0.16, 0.17, 0.19, 0.17, 0.16, 0.15, but in the table in Figure 17.2 the values are 0.16, 0.17, 0.22, 0.14, 0.14, 0.17.
In the print version of the book, section "Mean and standard deviation of the sum of correlated variables" (p43) displays standard deviations as \sigma_{a} and \sigma_{b} in paragraph 1. In paragraph 2, when discussing linear combinations, this changes to \sigma_{u} and \sigma_{v}. Mean subscripts remain unchanged throughout.
In the second paragraph of the subsection titled “Summarizing prediction error using the log score and deviance” on page 174 of the print edition, the normal log score is stated as “−½ log σ − […]” while it should be “−log σ − […]”.
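For reference, the log density of a normal distribution can be worked out directly. This uses the standard normal density, with y, \mu and \sigma as generic symbols and not necessarily the book's exact notation:

```latex
\log p(y \mid \mu, \sigma)
  = \log\!\left[ \frac{1}{\sqrt{2\pi}\,\sigma}
      \exp\!\left( -\frac{(y-\mu)^2}{2\sigma^2} \right) \right]
  = -\tfrac{1}{2}\log(2\pi) - \log\sigma - \frac{(y-\mu)^2}{2\sigma^2}
```

The coefficient ½ would be correct only if the term were written with the variance, since -\tfrac{1}{2}\log\sigma^2 = -\log\sigma.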
The fitted linear model has an intercept of 46.3, but the graph shows the line intersecting the vertical axis at around 44, clearly below 45. Either the line is not from this model or the axis is incorrectly labeled.
At the bottom of page 42 in the 'Linear transformations' section it says 'see Exercise 3.5'. I think this should be 'Exercise 3.6'.
Loving the book so far 🎉
Hi Aki,
I am unable to load this RDA file in R and am unsure what this error message means. If possible it would be helpful if the sexdata datafile was in a language agnostic format like csv, or txt or such so it can be loaded directly using Python
https://github.com/avehtari/ROS-Examples/tree/master/SexRatio/data
The Chile schools data for the regression discontinuity example [pg. 433] is not in the folder ChileSchools. Is this dataset available somewhere else?
In the first para of Step 5, the comma after "A better strategy" reads awkwardly and it is probably better to remove it.
This section has a reference to Figure 11.2. I believe this should be Figure 11.3.
Both exercise 9.6 (page 128) and the description of the jamaican study refer to Exercise 3.8, which is supposed to calculate the multiplicative factor distribution. However, exercise 3.8 deals with the mean and standard deviation of correlated heights, rather than with the jamaican study.
There seems to be a small typo in the last code example on page 179.
It states
fit_2 <- update(fit_2, prior=hs())
however, fit_2 wasn't defined before. I guess it should read:
fit_2 <- update(fit_1, prior=hs())
The means are currently hardcoded in the code for Figure 7.4. Using bquote instead of expression would let the plot show whatever the simulation's means were, in simplest.R lines 122 and 123:
text(.05, -1 + mean(y[x==0]), bquote(paste(bar(y)[0], " = ", .(round(mean(y[x==0]), 2)))), col="gray30", cex=.9, adj=0)
text(.95, 1 + mean(y[x==1]), bquote(paste(bar(y)[1], " = ", .(round(mean(y[x==1]), 2)))), col="gray30", cex=.9, adj=1)
LOVE the book!
Hi,
First thanks for this excellent book and the teaching approach. I am a newbie with some theoretical understanding and no real-world experience in this domain. These sets of examples are going to really help me practice and self-evaluate if I understood the concepts correctly.
As I started to read the first chapter I found a disconnect between the content of the generated HTML files and the chapter. For example, at https://avehtari.github.io/ROS-Examples/examples.html, the examples are listed by chapter (this is what I want); however, when I go to hibbs.html in chapter 1, I see that it includes illustrations/code that come not only from chapter 1 but also from many other chapters.
That made it a bit difficult to follow. At least a marker within each HTML page indicating the end of the illustrations for a given chapter would have been helpful. This way a reader would not look at the remaining portion of the HTML until he/she reaches the corresponding chapter.
Regards & thanks
Kapil
On page 11 [Kindle version], the summary of the treatment-control comparison says "the treated units were 4.8 points higher than the controls, \bar{y} = 31.7 under the treatment and \bar{y} = 25.5 for the controls." Isn't the difference in means 6.2?
I also looked at the code in SimpleCausal/causal.R. It seems that the random seed was not fixed and so the simulations and the analyses might not be fully reproducible.
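A generic sketch of fixing the seed so that simulated data can be reproduced. This is not the code in SimpleCausal/causal.R; the function name `simulate_once` and the seed value are hypothetical.

```r
# Hypothetical helper: fix the seed, then simulate
simulate_once <- function(seed) {
  set.seed(seed)
  rnorm(5)
}

# The same seed gives identical draws on repeated runs
run_1 <- simulate_once(123)
run_2 <- simulate_once(123)
print(identical(run_1, run_2))
```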
Near the top of the page: "We will not go into the details of Hamiltonian Monte Carlo... in order not [to] get tripped up..."
In the sentence after that: "For our purposes here, the two most important aspects of HMC are that [it is] iterative..."
In the middle of the first para of the SUTVA section, the sentence should read "Even with our small study with two levels of the treatment, there are 2^8 = 256 different possible allocations of [the 2 treatments] to these 8 people" (extra "treatments").
In the paragraph following equation 20.2 on p. 385, we can read
Equating the coefficients of z in (20.1) and (20.2) yields β₂* = β₁ + β₂γ₁.
However, there's no β₂* in equation 20.1. I believe the second equation in the subsection (the one with starred β's) should be marked as (20.1).
In the last para of "Choosing covariates", the sentence should read "The classic example of a covariate that should not just be [included] as an additional predictor is an instrumental variable..."
I have the eBook version of the book. In the second paragraph of page 105 I think there is an error. It says:
We can try it out: rss(hibbs$growth, hibbs$vote, 46.3, 3.0) evaluates the residual sum of squares at the least squares estimate, (a,b) = (46.2, 3.1) ...
I think the last part should be (a,b) = (46.3, 3.0), or the other way around, since the values (46.2, 3.1) are used in the following sections of the chapter.
In the HTML for hibbs.html the range of possible fits "seems to be missing", but I can see in the code that it's just commented out. This was a little disorienting as I read the book, because in Figure 8.2 the median fit and the range of fits are side by side, but it took a bit of searching to find the code and figure lower down in the .rmd file.
https://github.com/avehtari/ROS-Examples/blob/master/ElectionsEconomy/hibbs.R#L161
https://avehtari.github.io/ROS-Examples/ElectionsEconomy/hibbs.html
Does anyone know if there are solutions available for select problems for faculty members?
Because of operator precedence in R, sqrt(sum(x)^2) is equal to sum(x).
W <- c(0.3, 0.4, 0.3)
se <- c(0.02, 0.03, 0.03)
# The weighted average of the standard errors
c(sqrt(sum(W * se)^2), sum(W * se))
#> [1] 0.027 0.027
# The standard error of the weighted average
sqrt(sum((W * se)^2))
#> [1] 0.01615549
End of first sentence should read "... and the standard deviation of the errors can be estimated [from the] data".
Error is on this line
https://github.com/avehtari/ROS-Examples/blob/master/SexRatio/sexratio.R#L75
The uncertainty intervals should be (something like)
quantile(z, c(0.25, 0.75))
quantile(z, c(0.025, 0.975))
instead of
quantile(z, 0.25, 0.75)
quantile(z, 0.025, 0.975)
Again, really enjoying the book!
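A short sketch of what the two call forms do, using simulated data in place of the z in sexratio.R. In quantile(z, 0.25, 0.75) the second positional argument is probs and the third is matched to na.rm, so only one quantile comes back.

```r
# Simulated stand-in for z; not the data from sexratio.R
set.seed(1)
z <- rnorm(1000)

# 0.75 is matched to na.rm, so this returns only the 25% quantile
wrong <- quantile(z, 0.25, 0.75)

# Passing both probabilities as one vector returns the interval endpoints
right <- quantile(z, c(0.25, 0.75))

print(wrong)
print(right)
```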
I think I noticed an erratum in the book. On page 217, it reads: "The logistic function, logit(x) = log(x/(1-x)), maps the range (0,1) to (-inf, +inf)...". I think this is a mistake. It should be: "The logit function, logit(x) = log(x/(1-x)), maps the range (0,1) to (-inf, +inf)...".
I think the logistic function is the INVERSE of this logit function.
I also think that on page 219, it reads: "The inverse logistic function is curved, and so the expected difference..." I don't know what shape the inverse logistic function has. But I think the text would be more understandable for standard classical statistics readers like me if you wrote "The inverse logit function (the logistic function) is curved, and so the expected difference...". This is my opinion.
Thanks
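The relationship between the two functions can be checked numerically. This sketch defines both by hand (the names `logit` and `inv_logit` are chosen here for illustration) and compares them with base R's qlogis and plogis.

```r
# Logit: maps probabilities in (0, 1) to the whole real line
logit <- function(p) log(p / (1 - p))

# Logistic (inverse logit): maps the real line back to (0, 1)
inv_logit <- function(x) 1 / (1 + exp(-x))

p <- c(0.1, 0.5, 0.9)

# Round trip recovers the original probabilities
print(all.equal(inv_logit(logit(p)), p))

# Base R equivalents: qlogis() is the logit, plogis() the logistic
print(all.equal(logit(p), qlogis(p)))
print(all.equal(inv_logit(logit(p)), plogis(qlogis(p))))
```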
I believe "reproducibility" is misspelled as "reproducability":
https://github.com/avehtari/ROS-Examples/blob/master/Earnings/earnings_regression.R#L36
PS:
I am submitting issues rather than PRs because, due to an earlier issue, I'm not sure if my R environment is set up correctly.
p. 42, the last line is the calculation for the standard error of the sample mean.
For the standard error of the difference we calculate:
sqrt(2.9^2+2.7^2)
4 approx.
h_m <- rnorm(1e4, 69.1, 2.9)
h_w <- rnorm(1e4, 63.7, 2.7)
h_diff <- h_m-h_w
sd(h_diff)
Regards.
From Philip Hanser: "Also, I think there is a minor typo on p. 65. There is a double summation in paragraph 5. The inner summation sign has the range as ! to 6 and I believe you meant 1 to 6."