Babble Labble

A Python implementation of Babble Labble, a framework for creating training data via natural language explanations.
Presented at NIPS 2017 (demo) and ACL 2018 (paper).

Getting Started

About Babble Labble

The main idea behind Babble Labble is that when annotators label training sets, there are reasons behind each label. With Babble Labble, we collect those reasons as natural language explanations, which a semantic parser then converts into labeling functions: executable functions that can be used to automatically label additional data. When many such labeling functions are combined, they can generate training sets large and accurate enough to train classifiers with reasonable performance, despite using only a small number of user inputs (e.g., tens of explanations instead of thousands of individual labels).
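To make the notion of a labeling function concrete, here is a minimal hand-written sketch of the kind of function a semantic parser might produce from a hypothetical explanation such as "Label true because the words 'his wife' appear between the two people." The `Candidate` class, the function name, and the label convention (1 = true, 0 = abstain) are assumptions made for illustration; they are not this repository's API.

```python
from dataclasses import dataclass

# Assumed label convention for this sketch: 1 = true, 0 = abstain.
ABSTAIN, TRUE = 0, 1


@dataclass
class Candidate:
    """Hypothetical candidate: a sentence plus two person mentions."""
    text: str
    person1: str
    person2: str


def lf_his_wife_between(c: Candidate) -> int:
    """Vote TRUE if 'his wife' occurs between the two mentions, else abstain."""
    start = c.text.find(c.person1)
    end = c.text.find(c.person2)
    if start == -1 or end == -1:
        return ABSTAIN  # a mention is missing, so the rule does not apply
    lo, hi = sorted((start, end))
    # Text from the start of the earlier mention up to the start of the later one.
    between = c.text[lo:hi]
    return TRUE if "his wife" in between else ABSTAIN


candidates = [
    Candidate("Barack and his wife Michelle attended.", "Barack", "Michelle"),
    Candidate("Barack met Angela in Berlin.", "Barack", "Angela"),
]
labels = [lf_his_wife_between(c) for c in candidates]
print(labels)
```

A single function like this is noisy and abstains often; the approach relies on combining many of them.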

In the larger picture, we envision systems like Babble Labble serving as higher-level "supervision compilers" for the Software 2.0 systems of the future. Babble Labble is just one of many projects exploring how weak supervision sources can be used to train machine learning systems. Related work includes:

  • Snorkel: The flagship system for data programming with user-provided labeling functions
  • Snorkel MeTaL: Extends Snorkel to multi-task learning settings and includes a data programming formulation with better scaling properties
  • Reef: Automatically generates labeling functions from a small labeled dataset
  • Coral: Improves the label aggregation process by inferring generative model structure via static analysis of labeling functions

You can find links to papers, repositories, and blog posts on the Snorkel landing page.

Disclaimer

The code in this repository is very much research code, a proof of concept. There are many ways it could be improved, optimized, made more user-friendly, etc. Unfortunately, we do not have the manpower to provide ongoing support and have no plans to publish further updates. However, the individual components of the framework are readily available in other applications with better ongoing support:

  • semantic parser: The SEMPRE toolkit makes it easy to build semantic parsers for new tasks in flexible ways, and SippyCup (which the Babble Labble parser was built on) has some nice tutorials. If you want to use a trained neural semantic parser, many open source variants exist.
  • filter bank: The simple filters described in the paper can each be expressed with just a few lines of code, and are by no means comprehensive. Refer to the paper for details.
  • label aggregator: The LabelModel class in Snorkel-MeTaL provides the latest implementation of a data programming engine for aggregating noisy weak supervision sources.
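As a rough illustration of how small such filters can be, the sketch below drops labeling functions that assign the same label to every example and labeling functions whose outputs duplicate an earlier one. The function names and the dictionary-of-columns representation are assumptions for this example; the paper remains the reference for the actual filter bank.

```python
def uniform_filter(label_columns):
    """Drop labeling functions that give every example the same label."""
    return {name: col for name, col in label_columns.items() if len(set(col)) > 1}


def duplicate_filter(label_columns):
    """Keep only the first labeling function for each distinct label signature."""
    seen, kept = set(), {}
    for name, col in label_columns.items():
        signature = tuple(col)
        if signature not in seen:
            seen.add(signature)
            kept[name] = col
    return kept


# Hypothetical outputs of four labeling functions on four examples
# (assumed convention: 1 = true, -1 = false, 0 = abstain).
columns = {
    "lf_a": [1, 0, -1, 0],
    "lf_b": [0, 0, 0, 0],   # abstains everywhere
    "lf_c": [1, 0, -1, 0],  # same signature as lf_a
    "lf_d": [0, 1, 0, -1],
}
kept = duplicate_filter(uniform_filter(columns))
print(list(kept))
```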

There's nothing special about our particular implementation of this pipeline; the power is in the combination of tools that allow high-level inputs to be converted into weak supervision resources, and a way to use those resources to ultimately train a model. The framework is fairly modular because the interfaces between the components are all simply labels: a label matrix passes from the semantic parser/filter bank to the label aggregator, and a set of training labels passes from the label aggregator to the discriminative model.
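The label-only interfaces can be sketched in a few lines. The example below takes a small label matrix (one row per example, one column per labeling function) and turns it into training labels using a plain majority vote. Majority vote is a deliberately simple stand-in chosen for this sketch, not the data programming model used by the label aggregator, and the label convention (1 = true, -1 = false, 0 = abstain) is an assumption.

```python
def majority_vote(label_matrix):
    """Aggregate each row of votes (+1, -1, 0 = abstain) into a single label."""
    labels = []
    for row in label_matrix:
        total = sum(row)
        # Positive total -> true, negative -> false, tie or all abstains -> abstain.
        labels.append(1 if total > 0 else -1 if total < 0 else 0)
    return labels


# Hypothetical label matrix: 4 examples x 3 labeling functions.
L = [
    [1, 1, 0],
    [-1, 0, -1],
    [1, -1, 0],
    [0, 0, 0],
]
training_labels = majority_vote(L)
print(training_labels)
```

Because each stage only consumes and produces labels, any component (parser, filters, aggregator, or end model) can be swapped out as long as it honors the same matrix or vector shape.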

References

@inproceedings{hancock2018babble,
  title={Training Classifiers with Natural Language Explanations},
  author={Hancock, Braden and Varma, Paroma and Wang, Stephanie and Bringmann, Martin and Liang, Percy and R{\'e}, Christopher},
  booktitle = {Association for Computational Linguistics (ACL)},
  year={2018},
}

Hancock, B., Varma, P., Wang, S., Bringmann, M., Liang, P. and Ré, C. Training Classifiers with Natural Language Explanations. ACL 2018.

Setup

There are two ways to set up Babble Labble:

  • Option A: Docker
  • Option B: Local

The first step for both options is the same:
[0] Read the Disclaimer

Steps 4 & 5 are identical as well.

Option A: Docker

[1] Install Docker (instructions)

[2] Pull docker image:

docker pull bhancock8/babble

[3] Run docker container

docker run --rm -i -p 8080:8080 -t bhancock8/babble /bin/bash

Skip to Step 4.

Option B: Local

[1] Install Anaconda 3.6 (instructions)

[2] Clone the repository:

git clone https://github.com/HazyResearch/babble.git
cd babble

[3] Set up environment:

conda env create -f environment.yml
source activate babble
source add_to_path.sh

Continue to Step 4.

Options A & B

[4] Run unit tests:

nosetests

If the tests run successfully, you will see an "OK" printed at the end.
If you chose Option B, the first time you run this may take extra time to install a language model for spaCy.

[5] Run the tutorial:

If you'd like to try out the tutorials, continue on to the Tutorial README.
