This is the public release of the Bingo interactive alarm prioritization system described in the PLDI 2018 paper User-Guided Program Reasoning Using Bayesian Inference. This document describes the system workflow, its constituent scripts, and instructions to use them.
NOTE 1: The Bingo system is agnostic of the underlying static analysis. The PLDI 2018 paper nominally assumes that the analysis is expressed in Datalog. However, this code distribution can work more generally with any analysis that can be conceptualized as consisting of derivation rules which are repeatedly instantiated until fixpoint. In particular, we require that the analysis produces a set of output conclusions (a subset of which are reported to the user as warnings) and a derivation graph which connects them together. In the terminology that follows, the alarms are reported in a file named base_queries.txt
, the analysis rules are listed in a file called rule_dict.txt
, and the derivation graph is contained in a file called named_cons_all.txt
.
NOTE 2: The PLDI 2018 paper describes the operation of Bingo with two instance static analyses: a datarace analysis for Java programs and a taint analysis for Android apps. Running these specific analyses is a somewhat involved process. Furthermore, this intial analysis run is unimportant to anyone wishing to port Bingo to a new analysis of their choice. In this code distribution, we therefore only include the portion of the workflow after the analysis run has been completed, the alarms have been reported, and the derivation graph has been extracted.
Bingo is mostly written in Python. A few small performance-critical pieces of code have been written in C++. The core of Bingo crucially depends on the LibDAI inference library which the build script will clone itself. Ensure that you have the Boost C++ libraries and the gmpxx wrapper to the GMP library installed. On Ubuntu, the appropriate dependencies can be installed by running:
sudo apt install libboost-dev libboost-program-options-dev libboost-test-dev libgmp-dev
Now, build Bingo by running:
./scripts/build.sh
from the main Bingo directory.
Bingo is architected as a sequence of small scripts intended to be run one after the other. We assume that all data
related to the program being analyzed are stored in a directory named PROBLEM_DIR
. These scripts comprising Bingo
communicate by creating several files in PROBLEM_DIR
whose purpose we will now describe. We have provided an example
PROBLEM_DIR
directory for the lusearch
benchmark, available by navigating to the path bingo/examples/lusearch/
:
these are the warnings generated by a static datarace detector when applied to a program from the Dacapo benchmark
suite.
-
Analysis Run: After the analysis run is complete, we assume that the
PROBLEM_DIR
directory contains the following files:-
base_queries.txt
. This file contains the list of alarms reported by the analysis. Note that each alarm is listed in the form of a tuple, rather than as the warning message string presented to the user. Each tuple is of the formRelName(v1,v2,v3,...,vk)
, for some relation nameRelName
, and wherev1
,v2
, ...,vk
are the fields of the tuple. However, note that this format is merely conventional: the only concrete requirements that Bingo makes are that each tuple is represented by a globally unique string, and that there are no spaces in this string.See the
base_queries.txt
file included with thelusearch
example. This file contains a list of tuples of the formracePairs_cs(x,y)
. Each of these tuples indicates a possible datarace between source locationx
and source locationy
. For example, the alarm tupleracePairs_cs(15562,15645)
may be decoded into the following human-readable warning message: "There may be a datarace between the field accesses at line 40 of the fileorg/apache/lucene/analysis/LowerCaseFilter.java
and at line 685 of the fileorg/apache/lucene/analysis/Token.java
." The fileracePairs_cs.txt
contains the corresponding human-readable warning messages. In general, how the alarm tuples map to the actual warning message is an analysis-specific design decision, and irrelevant to Bingo's ranking algorithm.Of the 237 alarms in the
lusearch
example, only the four alarms listed inoracle_queries.txt
are real bugs. Of course, when analyzing a new program, the user is unaware of which alarms are real bugs and which are false positives: we can only provideoracle_queries.txt
for the example problem because we have laboriously triaged each of the alarms. -
rule_dict.txt
. Each derivation rule of the analysis is assigned a name, conventionally of the formRn
, for some numbern
. Therule_dict.txt
file contains a mapping between the rule name and the underlying rule. Each line of this file contains an element of this mapping in the formRn: rule description
. This file is for human consumption only, and is not strictly required, and the format is not strictly regulated. -
named_cons_all.txt
. This file contains the derivation graph. Each line of the formRn: NOT h1, NOT h2, ..., NOT hk, t
. HereRn
is the name of the rule, which can be instantiated to produce the concrete constraint (noth1
or noth2
or ... or nothk
ort
). This is equivalent to writing (h1
andh2
and ... andhk
) =>t
. In other words, the ruleRn
can be instantiated to produce the output conclusiont
from the input hypothesesh1
,h2
, ...,hk
. We assume that tuples which are not the conclusion of any rule instantiation are facts about the program which have been supplied as input to the analysis (also called the EDB). -
rule-prob.txt
. This file contains the mapping from rule names to their probability values. Each rule is given a a line of the formRn: pn
, whereRn
is the name of the rule andpn
is its probability value. The nextbuild-bnet.sh
step will, by default, assign the probability value 0.999 to rules which do not appear inrule-prob.txt
. As a result, an emptyrule-prob.txt
is completely acceptable:cat /dev/null > $PROBLEM_DIR/rule-prob.txt
. In fact, our experiments in the PLDI 2018 paper were conducted with an emptyrule-prob.txt
file.
-
-
Building the Bayesian Network (
build-bnet.sh
): Thebuild-bnet.sh
script converts the constraints innamed_cons_all.txt
into a Bayesian network. We provide two versions of this script. Either run:./scripts/bnet/compressed/build-bnet.sh $PROBLEM_DIR noaugment_base $PROBLEM_DIR/rule-prob.txt
or run
./scripts/bnet/build-bnet.sh $PROBLEM_DIR noaugment_base $PROBLEM_DIR/rule-prob.txt
The first script applies the chain compression procedure described in Algorithm 3 of our PLDI 2018 paper, but the two two commands are otherwise interchangeable. We recommend the use of the compressed version unless it is specifically inapplicable, for example when learning rule probabilities with the Expectation Maximization algorithm.
The meaning of the arguments are as follows:
PROBLEM_DIR
. The location of the problem-specific files.noaugment_base
. Augmentation was one of the options we experimented with while developing Bingo. Our hope was that it would significantly improve accuracy at a small cost in inference time. In practice, we found no perceptible increase in accuracy but dramatically higher costs for constraint pruning and for inference. We therefore recommend that this option be turned off.$PROBLEM_DIR/rule-prob.txt
. Location of the file containing the rule probabilities.
The script will place the Bayesian network in the directory
$PROBLEM_DIR/bnet/noaugment_base/
. The most important files in this directory are:named-cons-all.txt.pruned
. This contains the output of the cycle elimination procedure described in Algorithm 2 of the paper.named-bnet.out
. Contains the constraint graph as a Bayesian network with deterministic or and probabilistic and nodes. The essential difference between this file andnamed-cons-all.txt.pruned
is that the alternate derivations of tuples are made explicit in the form of disjunctions.bnet-dict.out
. The nodes of the Bayesian network innamed-bnet.out
are given numerical names to simplify processing. Thebnet-dict.out
file contains a mapping between these numbers and the tuples ofnamed-cons-all.txt
.factor-graph.fg
. The underlying marginal inference library, LibDAI, works with factor graphs rather than Bayesian networks. This file contains the same Bayesian network asnamed-bnet.out
in the form of a factor graph.
In addition, the version of the script with compression produces the intermediate files
named-cons-all.txt.ep
,named-cons-all.txt.cep
andnew-rule-prob.txt
. These files contain versions ofnamed-cons-all.txt.pruned
with various semantics preserving optimizations applied. -
Running the Bingo Interaction Loop (
driver.py
): We are now ready to run the main Bingo interaction loop. Run the command:./scripts/bnet/driver.py $PROBLEM_DIR/bnet/noaugment_base/bnet-dict.out \ $PROBLEM_DIR/bnet/noaugment_base/factor-graph.fg \ $PROBLEM_DIR/base_queries.txt \ $PROBLEM_DIR/oracle_queries.txt
The arguments are as follows:
$PROBLEM_DIR/bnet/noaugment_base/bnet-dict.out
and$PROBLEM_DIR/bnet/noaugment_base/factor-graph.fg
. This is the Bayesian network over which marginal inference is to be performed.$PROBLEM_DIR/base_queries.txt
. This is the list of alarms which have been ultimately produced by the analysis tool. This is therefore the set of alarms which have to be ranked by marginal probabilities, rather than all tuples present in the Bayesian network.$PROBLEM_DIR/oracle_queries.txt
. This contains the subset of alarms which are actually true bugs. Note that this argument was solely used for our experiments while developing the system: in practical deployments, the user is unaware of which alarms are real bugs until after triaging them. In these cases, it is completely acceptable to use an empty file,cat /dev/null > $PROBLEM_DIR/oracle_queries.txt
, and it will leave the interaction unchanged.
This script is an interactive driver that provides an interface between the user and the the LibDAI inference library. It repeatedly reads user commands from stdin and prints output to stdout and stderr. The most user-friendly command is called
MAC
(short for manual alarm carousel):MAC 1e-6 500 1000 100 stats.txt combined out
The first four arguments set various properties of the belief propagation loop, respectively the tolerance at which to declare convergence, the minimum and maximum number of iterations of belief propagation, and the width of the low-pass filter in case of non-convergence. The arguments we have specified in the example invocation worked well for the benchmarks in the paper. The next argument,
stats.txt
, is the name of the file in which to print aggregate statistics from the interaction loop. The last two arguments,combined
andout
, indicate the format of the filenames in which to print the ranked list of alarms after each iteration.
As specified in the previous section, the easiest way to operate driver.py
is to invoke the MAC
command:
MAC tolerance minIters maxIters histLength stats.txt combined out
with tolerance
= 1e-6, minIters
= 500, maxIters
= 1000 and histLength
= 100. Users desiring greater flexibility
in the interaction loop may find the following commands useful:
BP tolerance minIters maxIters histLength
. Run the loopy belief propagation algorithm given the evidence provided so far.HA
. Print the top ranked alarm and its confidence value.Q t
. Given a tuplet
(for exampleracePairs_cs(15974,16487)
), print its marginal probability.O t ground
. Specify the ground truth (ground
=true
orground
=false
) of the tuplet
.P filename
. Print the ranked list of alarms to a file. If the providedoracle_queries.txt
is empty, then it automatically labels each alarm asFalseGround
in the output file, but this is merely labelling and does not affect anything the actual ranking.AC tolerance minIters maxIters histLength stats.txt combined out
. Run the interaction loop, but assuming that the providedoracle_queries.txt
file is the ground truth. This was the command we used to test the system while running the experiments in the paper.
Note that driver.py
is very sensitive to the order in which these commands are issued. Please make sure that the
sequence of commands is from the following regular language: MAC | (BP (HA | Q | P)* O)* | AC
. All commands except
MAC
and AC
are idempotent, and HA
, Q
and P
do not change the state of the system.