This is a 2015 Data Science for Social Good project focused on using data science methods to help several partner public school districts improve their respective high school graduation rates and outcomes.
- `/experiments`: Experimental results generated via our data science pipeline.
- `/hspipeline`: Data science pipeline that generates predictive model output from raw data input.
- `/presentations`: Presentation material for deep dives, meetups, partner updates, etc.
- `/resources`: Literature and material related to partner districts and project topic.
- Kerstin Frailey
- Robin Gong
- Siobhan Greatorex-Voith
- Reid Johnson
Starting from a standard AWS install (Ubuntu):
- Install the Anaconda Scientific Python Distribution. We did most of our analyses in Python using Anaconda, a free, enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing.
- Clone this repository using Git. Git is the version control system we used to organize our code, which we hosted on GitHub. If you do not already have Git, download and install it, and consult a getting-started guide if you are new to it. You will also need to create a GitHub account. Once Git is installed, navigate in the command line to the folder in which you want to download the code, then clone the repository.
- Navigate in the command line to the base directory of the repository and run `python setup.py install` to install the modeling pipeline.
- Create a file `config.yaml` in the `/experiments` directory that conforms to the provided `config.yaml.example` file.
- Follow the `Example.ipynb` IPython Notebook in the `/experiments` directory.
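
Before opening the notebook, it can help to confirm that your `config.yaml` actually matches the example file. The following is a minimal sketch, not part of the pipeline itself. It assumes Python 3, that "conforms" means "contains every key found in the example", that both files hold a YAML mapping at the top level, and that the example file sits next to `config.yaml` in the experiments directory. The helper names `missing_keys` and `check_config` are hypothetical, and PyYAML (typically included with Anaconda) is assumed to be installed.

```python
from collections.abc import Mapping


def missing_keys(example, config, prefix=""):
    """Return dotted paths of keys that appear in `example` but not in `config`.

    Only key presence is compared; values are never inspected, so the
    settings you fill in can differ freely from the example's placeholders.
    """
    missing = []
    for key, example_value in example.items():
        path = "{}{}".format(prefix, key)
        if key not in config:
            missing.append(path)
        elif isinstance(example_value, Mapping):
            # Recurse into nested sections. If the user's value is not a
            # mapping, every nested key of the example counts as missing.
            nested = config[key] if isinstance(config[key], Mapping) else {}
            missing.extend(missing_keys(example_value, nested, path + "."))
    return missing


def check_config(example_path="experiments/config.yaml.example",
                 config_path="experiments/config.yaml"):
    """Load both YAML files and report keys missing from the user's config.

    The default paths are assumptions (relative to the repository base).
    """
    import yaml  # PyYAML; imported lazily so missing_keys works without it

    with open(example_path) as f:
        example = yaml.safe_load(f) or {}
    with open(config_path) as f:
        config = yaml.safe_load(f) or {}
    return missing_keys(example, config)
```

Calling `check_config()` from the base directory of the repository returns a list of dotted key paths that still need to be added to `config.yaml`. An empty list means every key in the example is present.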
Code is copyright 2015 Data Science for Social Good Summer Fellowship and released under the MIT license.