Monte Carlo Simulator

A command-line binary written in C for performing fast Monte Carlo permutation tests to validate A/B test results.

It uses the double-precision SIMD-oriented Fast Mersenne Twister (dSFMT) for wicked fast pseudo-random number generation.

Why?

Our clients at Monetate run thousands of A/B tests each year. We've recently updated our statistical models to calculate the significance of these campaigns' goals (Conversion Rate, Revenue Per Visit, etc.) and wanted a way to:

  • Validate the accuracy of these statistical models
  • Create a feedback loop to further improve the models

We chose Monte Carlo to accomplish this, but quickly found that running simulations in our language of choice, Python, on A/B Tests with several million visitors wasn't going to cut it.

Monte Carlo Testing Intro

Jump right to Installation, CLI Usage or straight to the code if you're already familiar with Monte Carlo Testing.

Say you're running an A/B Test on a site to see if the experiment variant had a significant effect on Revenue Per Visit. To simplify things a bit, let's begin by looking at just three of these visitors. Our first visitor, John, visited twice but did not buy anything. Suzy visited once and made one $9 purchase. Bob visited twice and made two purchases at $8 and $9.

In this case, a visitor may have multiple visits but the A/B Test randomizes based on visitor to give everyone a consistent experience.

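| User | Group      | Visits (y0) | Purchase Amount Sum (y1) | Purchase Amount Sum of Squares (y2) * |
|------|------------|-------------|--------------------------|----------------------------------------|
| john | Experiment | 2           | 0                        | 0                                      |
| suzy | Control    | 1           | 9                        | 81                                     |
| bob  | Experiment | 2           | 17                       | 145                                    |

* We include y2 here for calculating variance in our models.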

Storing the data this way lets us compute the observed difference in revenue per visit between the two groups, along with its statistical significance.
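As a rough illustration of why the three sums are enough, the sketch below computes revenue per visit and its sample variance for the groups in the table above. The function names are hypothetical, and the variance formula assumes each visit contributes a single revenue value (so y2 is the sum of squared per-visit revenue). It is not the project's actual statistical model.

```python
def revenue_per_visit(y0, y1):
    """Mean revenue per visit: total revenue divided by number of visits."""
    return y1 / y0


def sample_variance(y0, y1, y2):
    """Unbiased sample variance from the running sums; needs at least two visits."""
    if y0 < 2:
        return None  # variance is undefined for fewer than two samples
    return (y2 - y1 * y1 / y0) / (y0 - 1)


# Group totals (y0, y1, y2) from the three-visitor table above.
experiment = (2 + 2, 0 + 17, 0 + 145)  # john + bob
control = (1, 9, 81)                   # suzy

observed_difference = revenue_per_visit(*experiment[:2]) - revenue_per_visit(*control[:2])
print(observed_difference)             # Experiment mean minus Control mean
print(sample_variance(*experiment))
```

With only three visitors the numbers mean little, but the same arithmetic applies unchanged to group totals built from millions of visitors.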

To verify the computed significance, we can also send this data through a Monte Carlo Simulator to estimate how likely it is that the difference was due to randomness.

The simulator performs many permutations. On each iteration, it randomly assigns every visitor to a theoretical Experiment or Control group and sums the y0, y1, and y2 values for all visitors in each group.

The table below shows two simulations. The first put all three visitors in the Experiment group and none in the Control group. The second put John and Suzy in the Experiment group and Bob in the Control group.

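| Simulation | Group      | Visits (y0) | Purchase Amount Sum (y1) | Purchase Amount Sum of Squares (y2) |
|------------|------------|-------------|--------------------------|--------------------------------------|
| 0          | Experiment | 5           | 26                       | 226                                  |
| 0          | Control    | 0           | 0                        | 0                                    |
| 1          | Experiment | 3           | 9                        | 81                                   |
| 1          | Control    | 2           | 17                       | 145                                  |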
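The permutation step can be sketched in a few lines of Python. This is only an illustration of the idea, not the C implementation: it assumes each visitor is assigned independently according to the group weights, it uses Python's built-in generator in place of dSFMT, and the `simulate` function and its parameters are hypothetical names.

```python
import random


def simulate(visitors, weights, iterations, seed=0):
    """Yield (simulation, group, y0, y1, y2) rows for random group assignments."""
    rng = random.Random(seed)  # stand-in for dSFMT; seeded so runs are repeatable
    groups = list(range(len(weights)))
    for simulation in range(iterations):
        totals = [[0.0, 0.0, 0.0] for _ in groups]
        for _visitor_id, y0, y1, y2 in visitors:
            # Each visitor lands in exactly one group, chosen by weight.
            group = rng.choices(groups, weights=weights)[0]
            totals[group][0] += y0
            totals[group][1] += y1
            totals[group][2] += y2
        for group in groups:
            yield (simulation, group, *totals[group])


visitors = [("john", 2, 0.0, 0.0), ("suzy", 1, 9.0, 81.0), ("bob", 2, 17.0, 145.0)]
rows = list(simulate(visitors, weights=[0.5, 0.5], iterations=2))
for row in rows:
    print(row)
```

Whatever the random assignment, the two group rows of a simulation always add up to the overall totals (5, 26, 226), which is a handy sanity check on any implementation.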

Now What?

Let's now assume we had 2 million visitors split evenly into Experiment and Control groups. We observed a difference in revenue per visit of $1.50 with a p-value of 11% in our two-tailed t-test.

Next, we run the Monte Carlo simulator with 10,000 iterations and calculate the difference between the two groups for each of the ten thousand simulations. Most of these differences will be near zero because we randomly distributed the visitors between the two groups, but some may lie outside of our $1.50 observed difference.

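| Simulation | Group      | Visits (y0) | Purchase Amount Sum (y1) | Purchase Amount Sum of Squares (y2) |
|------------|------------|-------------|--------------------------|--------------------------------------|
| 0          | Experiment | 1000129     | 124124                   | 9193930                              |
| 0          | Control    | 999871      | 111123                   | 10003234                             |
| 1          | Experiment | 999976      | 154320                   | 8100857                              |
| 1          | Control    | 1000024     | 82394                    | 7231043                              |
| ...        | ...        | ...         | ...                      | ...                                  |
| 9999       | Experiment | 993429      | 100001                   | 9534543                              |
| 9999       | Control    | 1006571     | 129993                   | 8738439                              |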

If 1,000 of the 10,000 random iterations show a difference of more than $1.50, then random assignment alone produces a difference that large about 10% of the time. In other words, there is roughly a 10% chance of seeing our $1.50 observed difference through randomness alone.

Although technically not a direct comparison, we can compare our computed p-value of 11% to our simulated 10% result to determine whether or not the model is accurate enough.
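The counting step described above might look like the following sketch. It assumes the per-simulation differences have already been computed, and it counts differences in either direction to mirror the two-tailed t-test. The function name and the toy numbers are made up for illustration.

```python
def simulated_p_value(differences, observed_difference):
    """Fraction of simulated differences at least as extreme as the observed one."""
    threshold = abs(observed_difference)
    extreme = sum(1 for diff in differences if abs(diff) >= threshold)
    return extreme / len(differences)


# Toy data: 10 simulated differences, one of which is as large as the observed $1.50.
differences = [0.1, -0.3, 0.0, 0.7, -1.6, 0.2, -0.1, 0.4, -0.5, 0.3]
print(simulated_p_value(differences, 1.50))  # 1 of the 10 simulations qualifies
```

In practice the list would hold one difference per simulation, for example 10,000 values, and the result would be compared against the model's p-value.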

Installation

Downloading a Release

You can grab a pre-compiled binary for your OS and architecture from a GitHub Release:

wget https://github.com/monetate/monte-carlo-simulator/releases/download/v0.1.0/monte-carlo-simulator-v0.1.0-Linux-i386.tar.gz \
  -O monte-carlo-simulator-v0.1.0-Linux-i386.tar.gz
tar -zxvf monte-carlo-simulator-v0.1.0-Linux-i386.tar.gz
cd monte-carlo-simulator-v0.1.0-Linux-i386

Building From Source

Currently, the simulator only builds on a machine whose CPU supports Intel's SSE2 instructions, using a C compiler that supports them as well.

It's known to work on:

  • Amazon's EC2 instances with gcc 4.1.2
  • Travis CI Bluebox workers with gcc 4.6 and clang 3.3

The default make target will build a binary named simulate in the project's root directory.

CC=gcc make

CLI Usage

The simulate binary takes a csv on stdin and outputs a resulting csv on stdout.

cat /path/to/samples.csv | (./simulate 10000 0.5 0.5) > /path/to/results.csv

Arguments

It accepts the number of simulations to run as the first positional argument. The following arguments describe the weighting for each of your groups.

You can pass weights as fractions or whole numbers; only their relative proportions matter. The following three variants are all equivalent:

./simulate 10000 2 3 5
./simulate 10000 0.2 0.3 0.5
./simulate 10000 400 600 1000
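To see why these three invocations are equivalent, one can normalize each set of weights so that it sums to 1. The snippet below is a hypothetical illustration of that idea in Python and is not taken from the simulator's source.

```python
import math


def normalize(weights):
    """Scale weights so they sum to 1, leaving their ratios unchanged."""
    total = sum(weights)
    return [w / total for w in weights]


variants = [[2, 3, 5], [0.2, 0.3, 0.5], [400, 600, 1000]]
for weights in variants:
    print(normalize(weights))
```

Each variant normalizes to roughly 20%, 30%, and 50% (up to floating-point rounding), so all three describe the same split across three groups.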

Input CSV

The input csv is assumed to have exactly four columns with no header row.

  • id: Unique id
  • y0: The number of samples
  • y1: The sum of the samples
  • y2: The sum of squares of the samples

For example:

john,2,0.0,0.0
suzy,1,9.0,81.0
bob,2,17,145.0
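If your raw data is one revenue value per visit, rows in this format could be produced with a small script like the one below. The `visits` dictionary and the `to_input_rows` helper are hypothetical and only show how y0, y1, and y2 relate to the underlying samples.

```python
import csv
import io


def to_input_rows(visits_by_visitor):
    """Collapse each visitor's per-visit revenue into (id, y0, y1, y2)."""
    rows = []
    for visitor_id, revenues in visits_by_visitor.items():
        y0 = len(revenues)                 # number of visits
        y1 = sum(revenues)                 # total revenue
        y2 = sum(r * r for r in revenues)  # sum of squared revenue
        rows.append((visitor_id, y0, y1, y2))
    return rows


# Hypothetical raw data matching the example: one revenue value per visit.
visits = {"john": [0.0, 0.0], "suzy": [9.0], "bob": [8.0, 9.0]}

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(to_input_rows(visits))
print(buffer.getvalue(), end="")
```

Writing to a file instead of an in-memory buffer gives a csv that can be piped to the simulate binary on stdin.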

Output

The result csv will contain 5 columns with no header row.

  • simulation: Simulation index
  • group_id: Group id
  • y0: The number of samples in the group
  • y1: The sum of the samples in the group
  • y2: The sum of squares in the group

For example:

0,0,5.0,26.0,226.0
0,1,0.0,0.0,0.0
1,0,3.0,9.0,81.0
1,1,2.0,17.0,145.0
...
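The result rows can then be reduced to one difference per simulation. The following sketch assumes exactly two groups with ids 0 and 1 and reports the mean of group 0 minus the mean of group 1. The helper name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def differences_from_results(lines):
    """Return {simulation: mean(group 0) - mean(group 1)} from result rows."""
    means = defaultdict(dict)
    for simulation, group_id, y0, y1, _y2 in csv.reader(lines):
        y0, y1 = float(y0), float(y1)
        # A group can end up empty in a tiny sample; record None for its mean.
        means[int(simulation)][int(group_id)] = y1 / y0 if y0 else None
    return {
        simulation: None if None in (groups[0], groups[1]) else groups[0] - groups[1]
        for simulation, groups in means.items()
    }


sample = io.StringIO(
    "0,0,5.0,26.0,226.0\n"
    "0,1,0.0,0.0,0.0\n"
    "1,0,3.0,9.0,81.0\n"
    "1,1,2.0,17.0,145.0\n"
)
result = differences_from_results(sample)
print(result)
```

Simulation 0 has an empty group, so its difference is reported as None rather than raising a division error. With millions of visitors per test, empty groups should not occur in practice.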

Contributors

  • Jeffrey Persch
  • Chris Conley
  • Gil Raphaelli
  • Austin Rochford

Running Tests

CC=gcc make test

License

This project is released under the MIT License.
