Resampling Methods - Lab

Introduction

Now that you have some preliminary background on bootstrapping, jackknife, and permutation tests, it's time to practice those skills by coding them into functions. You'll then apply these tests to a hypothesis test and compare the results to a parametric t-test.

Objectives

You will be able to:

  • Understand permutation testing
  • Understand what jackknife is
  • Understand what bootstrapping is

Bootstrapping

Write a function that takes a sample and generates n additional samples of the same size using bootstrapping. (Recall that bootstrapping creates additional sets by sampling with replacement.)

def bootstrap(sample, n):
    # Your code here
    pass
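
As a point of comparison for your own solution, here is a minimal sketch of one way to do this with the standard library. The `seed` parameter is a hypothetical addition for reproducibility and is not part of the lab's specification.

```python
import random

def bootstrap(sample, n, seed=None):
    """Return n bootstrap resamples of `sample`, each the same size as the original.

    `seed` is a hypothetical convenience parameter for reproducible output."""
    rng = random.Random(seed)
    size = len(sample)
    # Sampling with replacement: every draw picks from the full original sample.
    return [rng.choices(sample, k=size) for _ in range(n)]

resamples = bootstrap([1, 2, 2, 1, 3], n=4, seed=0)
print(len(resamples), len(resamples[0]))  # 4 5
```

Because each draw is made from the full sample, a resample may repeat some observations and omit others.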

Jackknife

Write a function that creates additional samples by removing one element at a time. The function should do this for each of the n items in the original sample, returning n samples, each with n-1 members.

def jack1(sample):
    """Take a list of n observations and return n lists,
    the i-th of which has the i-th observation removed."""
    # Your code here
    pass
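
The leave-one-out idea can be sketched with list slicing. This is one possible approach, assuming the sample is a plain Python list, and is not the only valid solution.

```python
def jack1(sample):
    """Return the n leave-one-out samples of a list of n observations."""
    # Slicing around index i drops exactly the i-th observation.
    return [sample[:i] + sample[i + 1:] for i in range(len(sample))]

print(jack1([10, 20, 30]))  # [[20, 30], [10, 30], [10, 20]]
```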

Permutation Testing

Define a function that generates all possible two-set splits of the pooled members of two sets A and B. Sets A and B need not be the same size, but every generated split must have the same sizes as the original pair. For example, if we had a set with 5 members and a set with 7 members, the function would return all possible 5-7 splits of the 12 items.

Here's a more in-depth example:

```
A = [1,2,2]
B = [1,3]
permT(A, B) = [
                ([1,2,2], [1,3]),
                ([1,2,3], [1,2]),
                ([1,2,1], [2,3]),
                ([1,1,3], [2,2]),
                ([2,2,3], [1,1])
              ]
```
These are all the possible 3-2 member splits of the 5 elements: 1, 1, 2, 2, 3.
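
One way to enumerate such splits is to choose which positions of the pooled data go to the first group, as in the sketch below. It assumes list inputs and, to match the example, keeps each distinct split only once; keeping repeats instead would weight each split by the number of index choices that produce it, which is a design choice this sketch does not settle.

```python
from itertools import combinations

def permT(a, b):
    """Return every distinct split of the pooled data into groups of len(a) and len(b)."""
    pooled = list(a) + list(b)
    seen = set()
    splits = []
    # Choose which positions go to the first group; the rest form the second.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        first = sorted(pooled[i] for i in idx)
        second = sorted(pooled[i] for i in range(len(pooled)) if i not in chosen)
        key = (tuple(first), tuple(second))
        if key not in seen:  # skip splits that differ only in which duplicate was picked
            seen.add(key)
            splits.append((first, second))
    return splits

splits = permT([1, 2, 2], [1, 3])
print(len(splits))  # 5
```

Each group is returned sorted, so the split written as ([1,2,1], [2,3]) appears here as ([1, 1, 2], [2, 3]).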


```python
def permT(a, b):
    # Your code here
    pass
```

Permutation Testing in Practice

Let's further investigate the scenario proposed in the previous lesson. Below are two samples A and B. The samples are mock data for the blood pressure of sample patients. The research study is looking to validate whether there is a statistical difference in the blood pressure of these two groups using a 5% significance level.

First, calculate the mean blood pressure of each of the two samples. Then, calculate the difference of these means. From there, use your permT() function, defined above, to generate all the possible combinations of the entire sample data into A-B splits of the same sizes as the original sets. For each of these combinations, calculate the mean blood pressure of the two groups and record the difference between these sample means. The total number of recorded differences serves as the denominator when calculating the p-value associated with the difference between the original sample means.

For example, in our small handwritten example above:

$\mu_a = \frac{1+2+2}{3} = \frac{5}{3}$
and
$\mu_b = \frac{1+3}{2} = \frac{4}{2} = 2$

Giving us

$\mu_a - \mu_b = \frac{5}{3} - 2 = -\frac{1}{3}$

In comparison, for our various combinations we have:

([1,2,2], [1,3]): $\mu_a - \mu_b = \frac{5}{3} - 2 = -\frac{1}{3}$
([1,2,3], [1,2]): $\mu_a - \mu_b = 2 - \frac{3}{2} = \frac{1}{2}$
([1,2,1], [2,3]): $\mu_a - \mu_b = \frac{4}{3} - \frac{5}{2} = -\frac{7}{6}$
([1,1,3], [2,2]): $\mu_a - \mu_b = \frac{5}{3} - 2 = -\frac{1}{3}$
([2,2,3], [1,1]): $\mu_a - \mu_b = \frac{7}{3} - 1 = \frac{4}{3}$

A standard hypothesis test for this scenario might be:

$h_0: \mu_a = \mu_b$
$h_1: \mu_a > \mu_b$

Thus, comparing our sample difference to the differences of our possible combinations, we count the permutations whose difference was the same as or greater than our sample statistic and divide by the total number of permutations. In this case, 4 out of 5 of the permutation cases produced the same or greater differences in the two sample means. This value, 0.8, is a strong indication that we cannot reject the null hypothesis in this instance.
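
The arithmetic for the handwritten example can be checked with a short sketch. It uses exact fractions to avoid rounding, hard-codes the five splits listed above, and follows the "same or greater" counting rule from the text; the `mean` helper is a name introduced here for illustration.

```python
from fractions import Fraction

def mean(values):
    """Exact mean of a list of integers."""
    return Fraction(sum(values), len(values))

# The five splits from the handwritten example; the first is the observed sample.
splits = [
    ([1, 2, 2], [1, 3]),
    ([1, 2, 3], [1, 2]),
    ([1, 2, 1], [2, 3]),
    ([1, 1, 3], [2, 2]),
    ([2, 2, 3], [1, 1]),
]

diffs = [mean(a) - mean(b) for a, b in splits]
observed = diffs[0]

# Share of splits whose difference is the same as or greater than the observed one.
p_value = Fraction(sum(d >= observed for d in diffs), len(diffs))

print([str(d) for d in diffs])  # ['-1/3', '1/2', '-7/6', '-1/3', '4/3']
print(p_value)                  # 4/5
```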

a = []
b = []
# Your code here

T-test Revisited

The parametric statistical test equivalent to our permutation test above would be a t-test of the two groups. Perform a t-test on the same data above in order to calculate the p-value. How does this compare to the above results?

# Your code here
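
The lab does not say which t-test variant to use, so the sketch below assumes Welch's unequal-variance form and computes only the test statistic and approximate degrees of freedom with the standard library, using the small handwritten example as data. Turning the statistic into a p-value requires the t distribution; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` is one option that reports both.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom for two samples."""
    # Squared standard errors of each sample mean (sample variance / n).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t([1, 2, 2], [1, 3])
print(round(t, 4), round(df, 4))
```

`welch_t` is a hypothetical helper name; each sample needs at least two observations for the sample variance to be defined.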

Bootstrap Applied

Use your code above to apply the bootstrap technique to this hypothesis testing scenario. In other words, similar to the permutation testing you performed above, compute additional samples (arbitrarily, let's say 1000) of the same size as the original sample, with replacement. For each of these additional samples, compute whether the difference in sample means is the same as or greater than that of the original samples. Use this to calculate an overall p-value for the null hypothesis.

# Your code here
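
The lab leaves the resampling scheme open. The sketch below assumes one common choice: pool both groups (as the null hypothesis of no difference suggests), draw each group with replacement from the pool, and count how often the resampled difference is the same as or greater than the observed one. The function name, the `seed` parameter, and the use of the small handwritten example as data are all assumptions for illustration.

```python
import random

def bootstrap_p_value(a, b, n_resamples=1000, seed=None):
    """Estimate a one-sided p-value for mean(a) - mean(b) by pooled bootstrap resampling."""
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)  # under the null, both groups share one distribution
    hits = 0
    for _ in range(n_resamples):
        # Draw both groups with replacement from the pooled data.
        new_a = rng.choices(pooled, k=len(a))
        new_b = rng.choices(pooled, k=len(b))
        diff = sum(new_a) / len(new_a) - sum(new_b) / len(new_b)
        # Small tolerance so float rounding does not drop exact ties.
        if diff >= observed - 1e-12:
            hits += 1
    return hits / n_resamples

p = bootstrap_p_value([1, 2, 2], [1, 3], n_resamples=1000, seed=42)
print(p)
```

Unlike the permutation test, the result is an estimate that varies from run to run unless the seed is fixed.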

Summary

Well done! In this lab you practiced coding modern statistical resampling techniques of the 20th century! You also started to compare these non-parametric methods to parametric methods such as the t-test that we previously discussed.
