This project implements a distributed dataflow analysis that can run across multiple machines and processes. Distributing the work lets us analyze large programs that may not fit on a single machine: the worker machines communicate with each other and with a primary machine.
Our implementation is written in Scala and is designed to analyze JVM bytecode.
- macOS
- Scala 3
- To install or upgrade using brew:
brew install coursier/formulas/coursier && cs setup
Simply navigate to the root folder and run
sbt "run path/to/java/dir"
We have included 3 test Java programs in the src/test/testFiles directory. For example, to run the program on a small test case:
sbt "run src/test/testFiles/espTest"
To run this analysis on your own Java files, compile your Java code first, then pass the path to your directory as the argument.
We have a comprehensive suite of unit tests for individual parts of the codebase. These can be found in src/test/scala and can be run with
sbt test
├── analysis
├── project
├── src
│ ├── main
│ │ ├── protobuf
│ │ ├── scala
│ │ │ ├── analyses
│ │ │ ├── cfg
│ │ │ ├── cli
│ │ │ ├── dataflow
│ │ │ ├── lattice
│ ├── test
│ │ ├── scala
│ │ ├── testFiles
├── target
└── build.sbt
- The analysis folder contains the outputs of tcpdump, which keeps track of all communication between machines in our program. These outputs are in binary form in the .pcap files and converted to ASCII in the .txt files. Python analysis of these files for message size can also be found in this folder.
- Both the project and target folders are generated by sbt, the interactive build tool for Scala and Java projects. These should not be manually modified.
- The directory src contains the main source code.
  - main/protobuf contains all the proto definitions of the servers and messages that we use.
  - main/scala contains the Scala source code.
    - analyses contains the code for performing constant propagation analysis.
    - cfg contains all code related to constructing the control flow graph from Java bytecode and performing operations on the graph.
    - cli contains code for handling the CLI arguments.
    - dataflow contains the code specifying the worker machine implementations and handles how machines communicate computation units to each other.
    - lattice contains helper class definitions and utils.
  - test/scala contains unit tests.
  - test/testFiles contains 3 example Java programs that don't do anything meaningful but demonstrate how our program works across programs of different sizes and complexities.
- build.sbt is the config for our project and specifies the dependencies.
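
As a rough illustration of the kind of lattice that a constant propagation analysis typically relies on, here is a minimal Scala 3 sketch. It is not taken from this project's source: the names ConstValue, join, State, and joinStates are hypothetical, and the actual code in analyses and lattice may be organized quite differently.

```scala
// Hypothetical sketch: none of these names come from the project's source.

// Flat lattice for a single variable. Bottom means "no information yet",
// Const(n) means "always the constant n", Top means "not a constant".
enum ConstValue:
  case Bottom
  case Const(value: Int)
  case Top

import ConstValue.*

// Least upper bound of two lattice values, applied where control-flow paths merge.
def join(a: ConstValue, b: ConstValue): ConstValue = (a, b) match
  case (Bottom, x)                    => x
  case (x, Bottom)                    => x
  case (Const(x), Const(y)) if x == y => Const(x)
  case _                              => Top

// An abstract state maps variable names to lattice values.
// A variable missing from the map is treated as Bottom.
type State = Map[String, ConstValue]

// Pointwise join of two states over the union of their variables.
def joinStates(s1: State, s2: State): State =
  (s1.keySet ++ s2.keySet).map { v =>
    v -> join(s1.getOrElse(v, Bottom), s2.getOrElse(v, Bottom))
  }.toMap

@main def demo(): Unit =
  val thenBranch: State = Map("x" -> Const(1), "y" -> Const(2))
  val elseBranch: State = Map("x" -> Const(1), "y" -> Const(3))
  // After the merge, x is still Const(1), while y becomes Top.
  println(joinStates(thenBranch, elseBranch))
```

In a sketch like this, a worker would apply joinStates whenever facts for the same control flow graph node arrive from more than one predecessor; how the project actually splits that work between machines is defined in dataflow.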
Our design notebook can be found here.