
Comments (3)

lebebr01 commented on July 18, 2024

Thanks! Looks great. The added clarification about the methods' assumptions is helpful; I had misread/misunderstood that.

rempsyc commented on July 18, 2024

Thanks for the review @lebebr01!

pg 2, lines 43 - 46: There is a discussion of the mean and standard deviation not being robust, which is great. I was surprised to see the second part regarding these statistics assuming a Normal distribution. I agree and understand your point, but this is overstated for those learning statistics.

To be clear, in the sentence "they assume normally distributed data", "they" refers to the methods based on the mean and SD, not to the mean and SD themselves. What if we rephrase this sentence to specify that we are referring to the methods? Would you be OK with that?
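To make the distinction concrete, here is a minimal Python sketch comparing a classic z-score rule (mean/SD) with a robust one (median/MAD). The function names, the toy data, and the |z| > 3 cutoff are illustrative assumptions, not the manuscript's or any package's implementation. The two large values inflate the SD enough to mask themselves under the classic rule, while the median/MAD rule still flags them. Both rules still lean on normality for calibration: the cutoff of 3 and the 1.4826 factor (which makes the MAD estimate the SD when data are normal) are chosen with a normal distribution in mind.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Indices whose classic z-score (mean/SD) exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [i for i, x in enumerate(values) if abs(x - mean) / sd > threshold]

def robust_zscore_outliers(values, threshold=3.0):
    """Indices whose robust z-score (median/MAD) exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    # 1.4826 rescales the MAD so it estimates the SD under normality
    scaled_mad = 1.4826 * mad
    if scaled_mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [i for i, x in enumerate(values) if abs(x - med) / scaled_mad > threshold]

# Hypothetical toy data: eight typical values and two extreme ones
data = [10, 11, 12, 11, 10, 12, 11, 13, 40, 42]
print(zscore_outliers(data))         # []
print(robust_zscore_outliers(data))  # [8, 9]
```

Here the value 42 has a classic z-score of only about 1.97, whereas its robust z-score is above 20.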

pg 7, lines 244 - 247: I liked this example and appreciate the idea of thinking about research context for extreme values/outliers. Would it be worth adding/framing this idea into statistical terms and being very explicit about what you mean by context?

For context, the example is:

For example, if we are studying the effects of X on Y among teenagers and we have one observation from a 20-year-old, this observation might not be a statistical outlier, but it is an outlier in the context of our research, and should be discarded to allow for valid inferences.

Here, I think the key point of this example is that this observation is an undetected error outlier: it is probably not detected by statistical outlier detection methods, but it still does not belong to the theoretical or empirical distribution of interest (i.e., teenagers). So the take-away from this paragraph is that we should not blindly rely on statistical outlier detection methods, and that we should do our due diligence to investigate error outliers that these methods miss. I will try to clarify this paragraph, but I am not sure I can reframe it in statistical terms, since here we are in a way zooming out of the statistical perspective; I can, however, mention the distribution of interest.

rempsyc commented on July 18, 2024

Here is the revised paragraph for point 2 (updated on the JOSE branch):

 We should also keep in mind that there might be error outliers that are not detected by statistical tools, but should nonetheless be found and removed. For example, if we are studying the effects of X on Y among teenagers and we have one observation from a 20-year-old, this observation might not be a statistical outlier, but it is an outlier in the context of our research, and should be discarded. We could call these observations undetected error outliers, in the sense that although they do not statistically stand out, they do not belong to the theoretical or empirical distribution of interest (e.g., teenagers). In this way, we should not blindly rely on statistical outlier detection methods; doing our due diligence to investigate undetected error outliers relative to our specific research question is also essential for valid inferences.
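The idea of an undetected error outlier can be sketched in Python as two separate screens: a statistical check (robust z-scores) and an eligibility check against the population of interest. The ages, the 13 to 19 eligibility range, and the function names below are hypothetical, chosen only to mirror the teenager example; they are not taken from the manuscript.

```python
import statistics

# Hypothetical sample: participant ages in a study of teenagers (assumed 13-19)
ages = [13, 14, 15, 15, 16, 16, 17, 17, 18, 19, 20]

def robust_z(values):
    """Robust z-scores based on the median and the scaled MAD."""
    med = statistics.median(values)
    scaled_mad = 1.4826 * statistics.median([abs(x - med) for x in values])
    if scaled_mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(x - med) / scaled_mad for x in values]

def statistical_outliers(values, threshold=3.0):
    """Indices flagged by the statistical screen."""
    return [i for i, z in enumerate(robust_z(values)) if abs(z) > threshold]

def out_of_population(values, low=13, high=19):
    """Indices of observations outside the population of interest."""
    return [i for i, x in enumerate(values) if not low <= x <= high]

print(statistical_outliers(ages))  # []
print(out_of_population(ages))     # [10]
```

The 20-year-old (index 10) has a robust z-score of about 2.7, below the cutoff of 3, so only the eligibility rule, which encodes the research context, catches it.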