Comments (3)
Thanks! Looks great. The added piece about being explicit about the method's assumptions is helpful; I had misread/misunderstood that.
from performance.
Thanks for the review @lebebr01!
pg 2, lines 43 - 46: There is a discussion of the mean and standard deviation not being robust, which is great. I was surprised to see the second part regarding these statistics assuming a Normal distribution. I agree and understand your point, but this is overstated for those learning statistics.
To be clear, in the sentence "they assume normally distributed data", the "they" refers to the methods based on the mean and SD, not the mean and SD themselves. What if we rephrase this to specify that we are referring to the methods? Would you be OK with that?
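The distinction matters in practice, and can be illustrated with a small sketch in plain Python (this is an illustration of the statistical idea, not the performance package's R API): an outlier rule built on the mean and SD assumes roughly normal data and can be distorted by the very outlier it should flag, whereas a rule built on the median and MAD is robust to it.

```python
import statistics

def zscore_outliers(data, threshold=3.0):
    """Flag points whose z-score (mean/SD-based) exceeds the threshold.

    Because the mean and SD are themselves pulled toward extreme values,
    this method can mask the very outliers it is meant to find.
    """
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > threshold]

def robust_outliers(data, threshold=3.0):
    """Flag points using the median and MAD, which resist extreme values."""
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)
    # 1.4826 rescales the MAD to be consistent with the SD under normality.
    robust_sd = 1.4826 * mad
    return [x for x in data if abs(x - med) / robust_sd > threshold]

data = [9, 10, 10, 11, 10, 9, 11, 10, 120]  # one extreme value

print(zscore_outliers(data))  # -> [] : the extreme value inflates the SD and masks itself
print(robust_outliers(data))  # -> [120] : the MAD-based rule still flags it
```

The point is not that one rule is always right, but that the mean/SD-based rule quietly imports a normality assumption that the data may not satisfy.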
pg 7, lines 244 - 247: I liked this example and appreciate the idea of thinking about research context for extreme values/outliers. Would it be worth adding/framing this idea into statistical terms and being very explicit about what you mean by context?
For context, the example is:
For example, if we are studying the effects of X on Y among teenagers and we have one observation from a 20-year-old, this observation might not be a statistical outlier, but it is an outlier in the context of our research, and should be discarded to allow for valid inferences.
Here, I think the point of this example is that this is an undetected error outlier: it is perhaps not flagged by statistical outlier detection methods, but it still does not belong to the theoretical or empirical distribution of interest (i.e., teenagers). So the take-away from this paragraph is that we should not blindly rely on statistical outlier detection methods, and we should do our due diligence to investigate error outliers that the statistical methods miss. I will try to clarify this paragraph, but I am not sure I can reframe this in statistical terms, since we are zooming out of the stats perspective here in a way, except that I can mention the distribution-of-interest bit.
Here is the revised paragraph for point 2 (updated on the JOSE branch):
We should also keep in mind that there might be error outliers that are not detected by statistical tools, but should nonetheless be found and removed. For example, if we are studying the effects of X on Y among teenagers and we have one observation from a 20-year-old, this observation might not be a statistical outlier, but it is an outlier in the context of our research, and should be discarded. We could call these observations undetected error outliers, in the sense that although they do not statistically stand out, they do not belong to the theoretical or empirical distribution of interest (e.g., teenagers). In this way, we should not blindly rely on statistical outlier detection methods; doing our due diligence to investigate undetected error outliers relative to our specific research question is also essential for valid inferences.
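The revised paragraph's point can be made concrete with a minimal sketch in plain Python (field names and the age cutoff are hypothetical, and this is not the performance package's API): screening each observation against the study's inclusion criteria catches the 20-year-old even when no statistical rule would flag them.

```python
# Inclusion criterion of the (hypothetical) study: teenagers, ages 13-19.
TEEN_RANGE = range(13, 20)

# Hypothetical sample; obs 3 is unremarkable statistically, but it falls
# outside the population of interest, i.e., an "undetected error outlier".
sample = [
    {"id": 1, "age": 16, "y": 4.2},
    {"id": 2, "age": 19, "y": 3.9},
    {"id": 3, "age": 20, "y": 4.1},
]

# Screen against inclusion criteria BEFORE any statistical outlier check.
eligible = [obs for obs in sample if obs["age"] in TEEN_RANGE]
excluded = [obs["id"] for obs in sample if obs["age"] not in TEEN_RANGE]

print(excluded)  # -> [3]
```

The design choice here is simply ordering: contextual eligibility screening comes first, and statistical outlier detection is then run on the eligible subset only.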
Related Issues (20)
- difficult-to-diagnose errors using "difftime" response in a linear model
- `check_singularity` doesn't work for `glmmTMB`
- `icc` doesn't work for `glmmTMB`
- R-squared for Dirichlet regression (`r2`)
- QQ plot blank in check model for glmmTMB with tweedie distribution
- Error checking normality for t.test
- spurious(?) viewport-too-small error with new ggplot2 version 3.5.0
- incorrect warning with old `ggplot2`/failure to load `see`
- check_model "Error in match.arg"
- Error in performance::check_distribution(): in call bw.SJ()
- Revising `check_model()`
- check_model failing on logistic regression
- Check_model in version 0.11.0 no longer produces qq plot residuals
- r2_nakagawa and glmmTMB with beta_family
- Outlier detection in Linear mixed models failed?
- cannot apply check_model title with patchwork::plot_annotation
- check_model error suggestions are not complete
- Error and Incomplete Output Using performance::check_collinearity with Cox Models
- Normality of Residuals of check_model is abnormal.
- Revise compare_models() for Bayesian models