noRmtest

Contributors: Constantin Shuster, Sylvia Lee, Richie Zitomer

This is an R package that tests your data for normality!

Overview

A common and important assumption made by many widely used parametric statistical methods (t-tests, ANOVA and linear regression) is that the dependent (response) variable is normally distributed within each category of the independent variables (predictors); for linear regression this amounts to assuming that the residuals are normally distributed. Testing the data for normality is therefore an important step before applying parametric statistical methods.
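A minimal sketch in base R (it does not use noRmtest) of what checking this assumption can look like. The data are simulated and the variable names are illustrative: for a t-test or one-way ANOVA the check is done within each group, and for linear regression it is usually done on the model residuals.

```r
# Simulated, illustrative data: a response measured in two groups
set.seed(1)
response <- c(rnorm(30, mean = 10, sd = 2), rnorm(30, mean = 12, sd = 2))
group <- factor(rep(c("control", "treatment"), each = 30))

# t-test / ANOVA: one Shapiro-Wilk p-value per group
group_p <- tapply(response, group, function(v) shapiro.test(v)$p.value)
print(group_p)

# Linear regression: check the same assumption on the model residuals
fit <- lm(response ~ group)
resid_p <- shapiro.test(residuals(fit))$p.value
print(resid_p)
```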

Graphical and statistical methods can be used to test whether sample data were drawn from a normal population. In normality testing, it is important to remember that the null hypothesis is that the sample data are NOT different from a normal population with the same mean and variance. If we fail to reject this null hypothesis (i.e., the resulting p-value is greater than the chosen significance level, commonly 0.05), the data show no significant evidence of non-normality, and we can reasonably apply the appropriate parametric statistical methods. Failing to reject does not prove normality, however: with small samples the test has little power to detect departures from normality. Normality testing can also be used to check whether any sample data approximate a normally distributed population. More on this topic can be found here and here.
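The decision rule can be sketched with base R's shapiro.test(). Here normality_decision() is a hypothetical helper written for illustration (it is not part of noRmtest), and both samples are simulated.

```r
# Simulated samples (illustrative only)
set.seed(123)
normal_sample <- rnorm(200, mean = 5, sd = 2)
skewed_sample <- rexp(200, rate = 1)

# Hypothetical helper: apply the Shapiro-Wilk decision rule at level alpha
normality_decision <- function(x, alpha = 0.05) {
  p <- shapiro.test(x)$p.value
  if (p > alpha) {
    "fail to reject H0: no evidence against normality"
  } else {
    "reject H0: evidence of non-normality"
  }
}

normality_decision(normal_sample)
normality_decision(skewed_sample)
```

A strongly skewed sample of this size is rejected; a sample drawn from a normal distribution will usually, but not always, fail to be rejected, since at alpha = 0.05 about 5% of truly normal samples are rejected by chance.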

This package will test your data for normality using one graphical and one statistical method. As the graphical method, quantile-quantile (Q-Q) plots will be constructed so that you can see whether the plotted points lie close to a straight line, which indicates that the data are approximately normally distributed. As the statistical method, the Shapiro-Wilk test statistic will be calculated along with its corresponding p-value. The Shapiro-Wilk test has better power than most other statistical normality tests, as long as most of the values are unique [1]. This package will also estimate the parameters (mean and variance) of the normal distribution that best fits your data, using maximum likelihood estimation.
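For reference, the maximum likelihood estimates mentioned above have a closed form. Writing x_1, ..., x_n for the observations of one variable, setting the partial derivatives of the normal log-likelihood with respect to mu and sigma^2 to zero gives the sample mean and a variance that divides by n (not by n - 1, as R's var() does):

```latex
\ell(\mu, \sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(x_i - \mu\right)^2

\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{\mu}\right)^2
```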

Package functions:

  1. make_qqplot()
    • description: this function will read in data and will create a Q-Q plot for each continuous variable in the data. It will return a list of plot objects and, by default, print them to the screen (the user will have the option of not printing the plots).
    • input: dataframe, list, array, or matrix
    • output: list of plots
  2. shapiro_wilk()
    • description: this function will read in data and will output the Shapiro-Wilk test for normality for each continuous variable in the data. The output will be a tuple of lists: the first list contains the test statistics in the order of the variables in the input dataframe, and the second list contains the p-values in the same order.
    • input: dataframe, list, array, or matrix
    • output: tuple of lists
      • first list: test statistic
      • second list: p-value
  3. params_mle()
    • description: this function will read in data and, for each continuous variable, will output the maximum likelihood estimates of the mean and variance of a normal distribution fitted to that variable (i.e., assuming the data are normal). The output will be a dataframe with one row for the mean and one row for the variance, and one column for each original variable in the data.
    • input: dataframe, list, array, or matrix
    • output: dataframe
      • columns: variables
      • rows: mean, variance
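The three functions above can be approximated with base R. The following is a minimal sketch under stated assumptions, not the noRmtest implementation: the names numeric_columns(), qq_points_sketch(), shapiro_wilk_sketch() and params_mle_sketch() are hypothetical, the input is assumed to be coercible to a data frame (equal-length variables), the Q-Q sketch returns the coordinates a plot would draw rather than plot objects, and a named list of two lists stands in for the "tuple of lists".

```r
# Hypothetical base-R sketches; NOT the noRmtest implementation.

# Keep only the numeric (continuous) columns of a data frame, list, vector or matrix.
numeric_columns <- function(data) {
  df <- as.data.frame(data)
  df[vapply(df, is.numeric, logical(1))]
}

# make_qqplot() stand-in: theoretical vs sample quantiles per column, no plotting.
qq_points_sketch <- function(data) {
  lapply(numeric_columns(data), function(x) qqnorm(x, plot.it = FALSE))
}

# shapiro_wilk() stand-in: statistics first, then p-values, in column order.
shapiro_wilk_sketch <- function(data) {
  tests <- lapply(numeric_columns(data), shapiro.test)
  list(
    statistic = lapply(tests, function(t) unname(t$statistic)),
    p_value = lapply(tests, function(t) t$p.value)
  )
}

# params_mle() stand-in: normal MLEs per column (rows: mean, variance).
# The MLE of the variance divides by n, i.e. (n - 1) / n times var().
params_mle_sketch <- function(data) {
  estimates <- sapply(numeric_columns(data), function(x) {
    mu <- mean(x)
    c(mean = mu, variance = mean((x - mu)^2))
  })
  as.data.frame(estimates)
}

# Usage on simulated data; the character column "label" is skipped
set.seed(42)
df <- data.frame(a = rnorm(50), b = runif(50), label = rep(c("x", "y"), 25))
qq <- qq_points_sketch(df)
sw <- shapiro_wilk_sketch(df)
mle <- params_mle_sketch(df)
print(mle)
```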

Where does this fit into the R ecosystem?

The stats package in R has a Shapiro-Wilk test function named shapiro.test(). Its input is a numeric vector and its output is a list (of class htest) containing the statistic value and the p-value. The ggpubr package provides ggqqplot(), whose input is a dataframe and whose output is a ggplot2 object. The car package also has a qqPlot() function, which can be used to make a Q-Q plot of any numeric data.
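For comparison, the base R tools can be used directly on a single numeric vector. This minimal sketch uses simulated data and only the stats package (shapiro.test(), plus qqnorm() and qqline() for the plot); it does not use noRmtest, ggpubr or car.

```r
# Base R (stats package) on simulated data
set.seed(7)
x <- rnorm(100)

res <- shapiro.test(x)   # an object of class "htest", which is a list
res$statistic            # named numeric with name "W"
res$p.value

# Base-graphics Q-Q plot; qqline() adds a reference line through the quartiles
qqnorm(x, main = "Normal Q-Q plot")
qqline(x, col = "red")
```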
