
Comments (19)

flothesof commented on June 15, 2024

Hi, it's me again, still in chapter 4. Sorry to write this here again, but it's unrelated to my previous issue.

The code below works well but I'm a bit surprised by the naming choice for the variable t. Wouldn't it have been better to call it sample? It was a little bit confusing to me when I first saw it. Also, you use the word "sample" later on to describe the list of values you're using. This might be worth considering (if you have the time).

def EvalCdf(t, x):
    count = 0.0
    for value in t:
        if value <= x:
            count += 1
    prob = count / len(t)
    return prob

Thanks!
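
For illustration, here is a minimal sketch of what the suggested rename could look like. It only changes the parameter name from t to sample; the logic is the same as in the snippet above, and the example list is an assumption chosen for the demonstration, not taken from this thread.

```python
def EvalCdf(sample, x):
    """Return the fraction of values in sample that are <= x."""
    count = 0
    for value in sample:
        if value <= x:
            count += 1
    return count / len(sample)


# Hypothetical example data for the demonstration.
example = [1, 2, 2, 3, 5]
prob_at_2 = EvalCdf(example, 2)
```

With the parameter called sample, the call site and the surrounding prose can use the same word for the same thing.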

from thinkstats2.

flothesof commented on June 15, 2024

Another comment:

Figure 5.8 shows normal probability plots for adult weights, w, and for their
logarithms, log10 w. Now it is apparent that the data deviate substantially
from the normal model. The lognormal model is a good match for the data
within a few standard deviations of the mean, but it deviates in the tails. I
conclude that the lognormal distribution is a good model for this data.

Isn't this supposed to be

The normal model is a good match for the data within a few standard deviations of the mean, but it deviates in the tails.
?


AllenDowney commented on June 15, 2024

Thank you for all of these. It will take me a while to process them, but I
will get to it soon!

Allen


flothesof commented on June 15, 2024

Another small comment about the normal / lognormal models. Figure 5.7 has the following caption:

CDF of adult weights on a linear scale (left) and log scale (right).

One thing that doesn't come across clearly in this caption is that the model on the left is a normal one, while the model on the right is a lognormal one. So I would suggest modifying the labels within the figure (to "normal model" and "lognormal model") and changing the caption to:

CDF of adult weights on a linear scale, fitted using a normal model (left) and log scale, fitted using a lognormal model (right).

Thanks!


flothesof commented on June 15, 2024

Another one: (I'm using your PDF version 2.0.23) in Chapter 6 it reads

>>> sample = [random.gauss(mean, std) for i in range(500)]
>>> sample_pdf = thinkstats2.EstimatedPdf(sample)
>>> thinkplot.Pdf(pdf, label='sample KDE')

I believe this should be

>>> thinkplot.Pdf(sample_pdf, label='sample KDE')


flothesof commented on June 15, 2024

Small typo here:

If you are not familiar with moment of inertia, see
\url{http://en.wikipedia.org/wiki/Moment_of_inertia}.  \index{moment
  of inertia}.

There's a dot that shouldn't be there after the \index (this dot shows up in the PDF document).


flothesof commented on June 15, 2024

Also I was surprised by this:

def Median(xs):
    cdf = thinkstats2.MakeCdfFromList(xs)
    return cdf.Value(0.5)

Why don't we use just thinkstats2.Cdf(xs) instead? This is the way we were "taught" to create CDFs so far in the book, so why use this other, unintroduced function there?


flothesof commented on June 15, 2024

In the solutions to the exercises of chapter 6:

With a higher upper bound, the moment-based skewness increases, as
expected.  Surprisingly, the Person skewness goes down!  The reason
seems to be that increasing the upper bound has a modest effect on the
mean, and a stronger effect on standard deviation.  Since std is in
the denominator with exponent 3, it has a stronger effect on the
result.

The comment about std being in the denominator with exponent 3 is incorrect, isn't it? It's exponent 1!


AllenDowney commented on June 15, 2024

I've processed these and made corrections and changes. I'd like to add you to the contributor list. Should I use your github login, or do you want to email me your IRL name?

About skewness, the std does appear in the sample skewness with exponent 3. See http://en.wikipedia.org/wiki/Skewness#Sample_skewness


flothesof commented on June 15, 2024

Addition to my previous comment: the reason I was saying the exponent is 1 is that the sentence you wrote in the solution file is about Pearson's measure of skewness, not the sample skewness (if you'd been talking about the sample skewness, the comment would have been correct, obviously). Therefore I'd suggest the following rephrase: (also, there was a typo on "Pearson")

With a higher upper bound, the moment-based skewness increases, as
expected.  

Surprisingly, the Pearson skewness goes down!  The reason
seems to be that increasing the upper bound has a modest effect on the
mean, and a stronger effect on standard deviation, which is in
the denominator, and thus has a stronger effect on the
result.
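
To make the difference in exponents concrete, here is a minimal sketch of the two measures being discussed. It assumes the usual definitions: Pearson's median skewness as 3 * (mean - median) / std, and the moment-based sample skewness as m3 / m2**(3/2). The function names and the example data are hypothetical, chosen for illustration, and are not the thinkstats2 implementations.

```python
import statistics


def pearson_median_skewness(xs):
    """Pearson's median skewness: std appears in the denominator with exponent 1."""
    mean = statistics.mean(xs)
    median = statistics.median(xs)
    std = statistics.pstdev(xs)
    return 3 * (mean - median) / std


def sample_skewness(xs):
    """Moment-based skewness: m3 / m2**1.5, i.e. std appears with exponent 3."""
    mean = statistics.mean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / len(xs)
    m3 = sum((x - mean) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


# Hypothetical data with one large value in the right tail.
skewed = [1, 2, 3, 4, 100]
gp = pearson_median_skewness(skewed)
g1 = sample_skewness(skewed)
```

Both measures are positive for this right-skewed list and zero for a symmetric one, but only the moment-based version divides by the cube of the standard deviation.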


flothesof commented on June 15, 2024

Chapter 7, scatter plots: your default code for scatter plots includes the following options

options = _Underride(options, color='blue', alpha=0.2, 
                        s=30, edgecolors='none')

Therefore, the code which you say yields Figure 7.1, thinkplot.Scatter(heights, weights), does not actually reproduce that figure, because of the default transparency, which is a little misleading.

However, it's nice to have transparency by default, so I guess it would be more helpful to say that the code is thinkplot.Scatter(heights, weights, alpha=1)? But then you need to explain what alpha does...
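
As a rough illustration of why passing alpha=1 would work, here is a minimal sketch of an "underride" helper. It assumes that _Underride only fills in options the caller did not supply, which is what the name suggests; the helper names underride and scatter_options are hypothetical stand-ins, not the thinkplot code.

```python
def underride(options, **defaults):
    """Add default key-value pairs to options without overwriting existing keys."""
    if options is None:
        options = {}
    for key, value in defaults.items():
        options.setdefault(key, value)
    return options


def scatter_options(**options):
    """Hypothetical stand-in: resolve the options a scatter call would end up with."""
    return underride(options, color='blue', alpha=0.2,
                     s=30, edgecolors='none')


default_call = scatter_options()          # caller gives nothing: alpha stays 0.2
opaque_call = scatter_options(alpha=1)    # caller's alpha wins over the default
```

Under this assumption, an explicit alpha from the caller takes precedence, while the other defaults (color, s, edgecolors) are still applied.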


flothesof commented on June 15, 2024

Docstring for HexBin: shouldn't that be "makes a hexbin plot"?

def HexBin(xs, ys, **options):
    """Makes a scatter plot.
...


flothesof commented on June 15, 2024

Hi Allen,

small typo: you have an unnecessary parenthesis at the end of the following line of code found in section 7.7 (Spearman's correlation)

thinkstats2.Corr(df.htm3, np.log(df.wtkg2)))

It should be:

thinkstats2.Corr(df.htm3, np.log(df.wtkg2))


AllenDowney commented on June 15, 2024

Thanks again. I will get to all of these soon!


flothesof commented on June 15, 2024

Hi Allen,

I just finished the exercises for chapter 8 and have a couple of remarks regarding Exercise 8.3 (hockey / soccer games).

The problem statement is:

Is this way of making an estimate biased?  Plot the sampling
distribution of the estimates and the 90\% confidence interval.  What
is the standard error?  What happens to sampling error for increasing
values of {\tt lam}?

Your solution does not address the confidence interval. This is actually a good point: when I computed the confidence interval, I realized it is not very meaningful in this context. In one of my tests, I set lambda=0.3 and got a confidence interval of [0, 1], which is to say that we expect either 0 or 1 goals per match. Since the problem statement asks for the interval, maybe you could point out that it is not useful in this context (you probably have a better way of expressing this...)?

My second point pertains to the second question. Did you really mean to ask what happens when lam increases? As far as I could tell, nothing. Judging from your solution, you probably meant the variable m (the number of games).

As always, thanks for writing this book! :)
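
The confidence-interval observation above can be checked with a small simulation. This is only a sketch under stated assumptions: each game lasts one time unit, the gaps between goals are exponential with rate lam, and the number of goals in a single game is used as the estimate of lam. The function names, the seed, and the number of simulated games are hypothetical choices, not the book's solution code.

```python
import random


def simulate_game(lam, rng):
    """Count goals in one game of unit length with exponential gaps between goals."""
    goals = 0
    t = 0.0
    while True:
        t += rng.expovariate(lam)
        if t > 1:
            break
        goals += 1
    return goals


def percentile(sorted_values, p):
    """Return the value at percentile p (0-100) of an already sorted list."""
    index = int(p / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


def sampling_distribution(lam, games=10000, seed=17):
    """Simulate many games and return the sorted single-game estimates of lam."""
    rng = random.Random(seed)
    return sorted(simulate_game(lam, rng) for _ in range(games))


estimates = sampling_distribution(0.3)
ci = (percentile(estimates, 5), percentile(estimates, 95))
mean_estimate = sum(estimates) / len(estimates)
```

Because the estimates are small integers, the 90% interval for a low scoring rate can only land on values such as 0 and 1, which is why it says so little here.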


flothesof commented on June 15, 2024

Hi Allen,

small typo (line 6949 of the TeX source):

statistically significant. But considering the two tests togther, I


flothesof commented on June 15, 2024

Hi Allen,

I've just gone through the exercises of chapter 9 and I have a couple of thoughts:

  • exercise 9.1 is harder than exercise 9.2, so I'd just swap them
  • exercise 9.1 seemed a little bit ill-defined to me when I started working on it. For instance, it didn't occur to me right away how to reduce the sample size of the data and redo the tests. In fact, I first tried to rerun the first test you present in the chapter, namely the coin test with 140 heads and 110 tails, with more samples, and just doubled the numbers to (280, 220). I realized only later that this doesn't make any sense. So maybe introducing an exercise of "in-between" difficulty would ease the learning.

Other than that, great chapter. Thanks!


AllenDowney commented on June 15, 2024

Changes in chapter 4 as of 3b598ed


AllenDowney commented on June 15, 2024

I think I have finally processed all of these. Thank you!
