EQUI-VOCAL

A prototype implementation of EQUI-VOCAL, a system that automatically synthesizes compositional queries over videos from limited user interactions. See the technical report for more details.

Setup Instructions

The project uses conda to manage dependencies. To install conda, follow the official conda installation instructions.

# Clone the repository
git clone https://github.com/uwdb/EQUI-VOCAL.git
cd EQUI-VOCAL

# Create a conda environment (called equi-vocal) and install dependencies
conda env create -f environment.yml --name equi-vocal
conda env export --no-builds > environment.yml
conda activate equi-vocal
python -m pip install -e .

The project uses Git Large File Storage to track large files.

# Pull large files
git lfs install
git lfs pull
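
If `git lfs pull` was skipped or failed, the tracked files remain small text pointers instead of real data. The following is a minimal sketch, not part of the repository, for spotting such files. It relies only on the fact that a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`; the function names are hypothetical.

```python
from pathlib import Path

# First line of every Git LFS pointer file (pointer format v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` still holds an LFS pointer instead of real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unpulled_files(root):
    """List files under `root` that are still LFS pointers, skipping .git."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts
        and is_lfs_pointer(p)
    )
```

Calling `find_unpulled_files(".")` from the project root should return an empty list once all large files have been pulled.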

In postgres/create_udf.sql, change every file path so that it points to your local EQUI-VOCAL/postgres/functors directory.

In src/methods/vocal_postgres.py:

Line 87: change the user value to your user name.

Line 128: replace the file path with the correct path to the functors folder.
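
Editing the paths by hand is error-prone when a file contains many of them. Below is a minimal sketch of a search-and-replace helper. It assumes that the paths in the file share a common old directory prefix, which you need to look up in the file yourself; the helper names and the example prefix are hypothetical and not part of the repository.

```python
from pathlib import Path


def rewrite_paths(text, old_dir, new_dir):
    """Replace every occurrence of old_dir with new_dir.

    Returns the new text and the number of replacements made.
    """
    old = str(old_dir).rstrip("/")
    new = str(new_dir).rstrip("/")
    if not old:
        raise ValueError("old_dir must not be empty or the filesystem root")
    return text.replace(old, new), text.count(old)


def rewrite_file(path, old_dir, new_dir):
    """Rewrite `path` in place and return how many paths were changed."""
    path = Path(path)
    new_text, count = rewrite_paths(path.read_text(), old_dir, new_dir)
    if count:
        path.write_text(new_text)
    return count


# Hypothetical usage from the project root; the old prefix is a placeholder:
# rewrite_file("postgres/create_udf.sql",
#              "/old/path/EQUI-VOCAL/postgres/functors",
#              Path("postgres/functors").resolve())
```

A return value of 0 means the old prefix was not found, which usually indicates that the prefix was mistyped.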

Example Usage

Set up your PostgreSQL server

Run the following commands to create a PostgreSQL server instance and load data into the database. This creates a database cluster mylocal_db, whose data is stored in <project_root_dir>/mylocal_db.

cd <project_root_dir>
# Create a PostgreSQL server instance
initdb -D mylocal_db --no-locale --encoding=UTF8
# Start the server
pg_ctl -D mylocal_db start
# Create a database
createdb --owner=<user_name> myinner_db
# Configure
psql -f postgres/alter_config-cpu_1-mem_100.sql myinner_db
# Restart the server
pg_ctl -D mylocal_db restart
# Create relations
psql -f postgres/create_table.sql myinner_db
# Load data
psql -f postgres/load_data.sql myinner_db
# Load user-defined functions
psql -f postgres/create_udf.sql myinner_db
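# Recompile and link the C functions. The include path and linker flags below are for
# PostgreSQL 14.7 installed with Homebrew on macOS; adjust them for your installation
# (pg_config --includedir-server prints the server include directory).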
cc -I /usr/local/Cellar/postgresql@14/14.7/include/postgresql@14/server -c functors.c
cc -bundle -flat_namespace -undefined suppress -o functors.so functors.o
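
The PostgreSQL commands above (everything except the C compilation step) can also be scripted. The following is a minimal sketch, assuming that the commands are run from the project root and that `initdb`, `pg_ctl`, `createdb`, and `psql` are on the PATH. The function names are hypothetical, and the default mode only prints the commands.

```python
import subprocess


def postgres_setup_commands(user_name, cluster="mylocal_db", db="myinner_db"):
    """Return the setup commands from this section, in order, as argv lists."""

    def run_sql(file_name):
        return ["psql", "-f", f"postgres/{file_name}", db]

    return [
        ["initdb", "-D", cluster, "--no-locale", "--encoding=UTF8"],
        ["pg_ctl", "-D", cluster, "start"],
        ["createdb", f"--owner={user_name}", db],
        run_sql("alter_config-cpu_1-mem_100.sql"),
        ["pg_ctl", "-D", cluster, "restart"],
        run_sql("create_table.sql"),
        run_sql("load_data.sql"),
        run_sql("create_udf.sql"),
    ]


def run_all(commands, cwd, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)
```

Review the printed commands first, then call `run_all(..., dry_run=False)` to execute them.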

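Set up PL/Python in Postgres. Installing PL/Python through conda did not work for the author, so the workaround was to install it with apt-get and then copy the files into the conda environment. Adjust the PostgreSQL version and the home directory in the paths below to match your machine: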

cp /usr/share/postgresql/12/extension/plpython3u.control /home/enhao/miniconda3/envs/equi-vocal/share/extension/
cp /usr/share/postgresql/12/extension/plpython3u--1.0.sql /home/enhao/miniconda3/envs/equi-vocal/share/extension/
cp /usr/share/postgresql/12/extension/plpgsql--unpackaged--1.0.sql /home/enhao/miniconda3/envs/equi-vocal/share/extension/
cp /usr/lib/postgresql/12/lib/plpython3.so /home/enhao/miniconda3/envs/equi-vocal/lib/
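
The copy commands above hard-code one machine's paths. Here is a minimal sketch of the same step with the directories passed in as parameters. The file names come from the commands above; the function name is hypothetical, and the source directories depend on where apt-get installed PostgreSQL on your system.

```python
import shutil
from pathlib import Path

# Extension files copied by the commands above.
EXTENSION_FILES = [
    "plpython3u.control",
    "plpython3u--1.0.sql",
    "plpgsql--unpackaged--1.0.sql",
]


def copy_plpython(system_ext_dir, system_lib_dir, conda_prefix):
    """Copy PL/Python files from a system PostgreSQL install into a conda env."""
    conda_prefix = Path(conda_prefix)
    ext_dst = conda_prefix / "share" / "extension"
    lib_dst = conda_prefix / "lib"
    ext_dst.mkdir(parents=True, exist_ok=True)
    lib_dst.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in EXTENSION_FILES:
        copied.append(Path(shutil.copy2(Path(system_ext_dir) / name, ext_dst / name)))
    copied.append(
        Path(shutil.copy2(Path(system_lib_dir) / "plpython3.so", lib_dst / "plpython3.so"))
    )
    return copied


# Example call; conda sets CONDA_PREFIX when an environment is active:
# import os
# copy_plpython("/usr/share/postgresql/12/extension",
#               "/usr/lib/postgresql/12/lib",
#               os.environ["CONDA_PREFIX"])
```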

Run query synthesis

To synthesize a query, run this command under the <project_root_dir>/src directory:

python synthesize.py [-h] [--method {vocal_postgres,vocal_postgres_no_active_learning,quivr_original,quivr_original_no_kleene}]
                     [--n_init_pos N_INIT_POS] [--n_init_neg N_INIT_NEG]
                     [--dataset_name {synthetic_scene_graph_easy,synthetic_scene_graph_medium,synthetic_scene_graph_hard,without_duration-sampling_rate_4,trajectories_duration,trajectories_handwritten,without_duration-sampling_rate_4-fn_error_rate_0.1-fp_error_rate_0.01,without_duration-sampling_rate_4-fn_error_rate_0.3-fp_error_rate_0.03}]
                     [--npred NPRED] [--n_nontrivial N_NONTRIVIAL] [--n_trivial N_TRIVIAL] [--depth DEPTH]
                     [--max_duration MAX_DURATION] [--beam_width BEAM_WIDTH] [--pool_size POOL_SIZE] [--k K] [--budget BUDGET]
                     [--multithread MULTITHREAD] [--strategy STRATEGY] [--max_vars MAX_VARS] [--query_str QUERY_STR]
                     [--run_id RUN_ID] [--output_to_file] [--port PORT] [--lru_capacity LRU_CAPACITY] [--reg_lambda REG_LAMBDA]
                     [--input_dir INPUT_DIR] [--output_dir OUTPUT_DIR]

options:
  -h, --help            show this help message and exit
  --method {vocal_postgres,vocal_postgres_no_active_learning,quivr_original,quivr_original_no_kleene}
                        Query synthesis method.
  --n_init_pos N_INIT_POS
                        Number of initial positive examples provided by the user.
  --n_init_neg N_INIT_NEG
                        Number of initial negative examples provided by the user.
  --dataset_name {synthetic_scene_graph_easy,synthetic_scene_graph_medium,synthetic_scene_graph_hard,without_duration-sampling_rate_4,trajectories_duration,trajectories_handwritten,without_duration-sampling_rate_4-fn_error_rate_0.1-fp_error_rate_0.01,without_duration-sampling_rate_4-fn_error_rate_0.3-fp_error_rate_0.03}
                        Name of the dataset.
  --npred NPRED         Maximum number of predicates that the synthesized queries can have.
  --n_nontrivial N_NONTRIVIAL
                        Maximum number of non-trivial predicates that the synthesized queries can have. Used by Quivr.
  --n_trivial N_TRIVIAL
                        Maximum number of trivial predicates (i.e., <True>* predicate) that the synthesized queries can have.
                        Used by Quivr.
  --depth DEPTH         For EQUI-VOCAL: Maximum number of region graphs that the synthesized queries can have. For Quivr:
                        Maximum depth of the nested constructs that the synthesized queries can have.
  --max_duration MAX_DURATION
                        Maximum number of the duration constraint.
  --beam_width BEAM_WIDTH
                        Beam width.
  --pool_size POOL_SIZE
                        Number of queries sampled during example selection.
  --k K                 Number of queries in the final answer.
  --budget BUDGET       Labeling budget.
  --multithread MULTITHREAD
                        Number of CPUs to use.
  --strategy STRATEGY   Strategy for query sampling.
  --max_vars MAX_VARS   Maximum number of variables that the synthesized queries can have.
  --query_str QUERY_STR
                        Target query written in the compact notation.
  --run_id RUN_ID       Run ID. This sets the random seed.
  --output_to_file      Whether to write the output to a file or print it to the terminal console.
  --port PORT           Port on which Postgres is to listen.
  --lru_capacity LRU_CAPACITY
                        LRU cache capacity. Only used for Quivr due to its large memory footprint.
  --reg_lambda REG_LAMBDA
                        Regularization parameter.
  --input_dir INPUT_DIR
                        Input directory.
  --output_dir OUTPUT_DIR
                        Output directory.
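
As an illustration of how the options above combine, here is a minimal sketch that assembles a synthesize.py command line from keyword arguments. Only flag names listed in the help text are used. The helper name and all numeric values are hypothetical, and `<target_query>` is a placeholder for a query in the compact notation.

```python
import shlex


def synthesize_command(method, dataset_name, query_str, **options):
    """Build an argv list for synthesize.py from keyword options."""
    cmd = [
        "python", "synthesize.py",
        "--method", method,
        "--dataset_name", dataset_name,
        "--query_str", query_str,
    ]
    for name, value in options.items():
        if value is True:
            # Flag without an argument, such as --output_to_file.
            cmd.append(f"--{name}")
        elif value is not None and value is not False:
            cmd += [f"--{name}", str(value)]
    return cmd


# Hypothetical values, chosen only to show the shape of a call.
cmd = synthesize_command(
    "vocal_postgres", "trajectories_duration", "<target_query>",
    n_init_pos=2, n_init_neg=10, budget=50, run_id=0, output_to_file=True,
)
print(shlex.join(cmd))
```

Run the printed command from <project_root_dir>/src after substituting a real target query.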

The following scripts provide example configurations for the trajectories dataset and the scene graphs dataset used in the paper:

cd scripts
# Trajectories dataset
./run_vocal_trajectory.sh
# Scene graphs dataset
./run_vocal_scene_graph.sh

Evaluate query performance

To evaluate the performance of synthesized queries, run this command under the <project_root_dir>/experiments/analysis directory:

python evaluate_vocal.py [-h]
                         [--dataset_name {synthetic_scene_graph_easy,synthetic_scene_graph_medium,synthetic_scene_graph_hard,without_duration-sampling_rate_4,trajectories_duration,trajectories_handwritten}]
                         [--query_str QUERY_STR] [--method {vocal_postgres_no_active_learning-topk,vocal_postgres-topk}]
                         [--port PORT] [--multithread MULTITHREAD] [--budget BUDGET]
                         [--task_name {trajectory,budget,bw,k,num_init,cpu,reg_lambda}] [--value VALUE] [--run_id RUN_ID]
                         [--input_dir INPUT_DIR] [--output_dir OUTPUT_DIR]

options:
  -h, --help            show this help message and exit
  --dataset_name {synthetic_scene_graph_easy,synthetic_scene_graph_medium,synthetic_scene_graph_hard,without_duration-sampling_rate_4,trajectories_duration,trajectories_handwritten}
                        Dataset to evaluate.
  --query_str QUERY_STR
                        Target query to evaluate, written in the compact notation.
  --method {vocal_postgres_no_active_learning-topk,vocal_postgres-topk}
                        Query synthesis method.
  --port PORT           Port on which Postgres is to listen.
  --multithread MULTITHREAD
                        Number of CPUs to use.
  --budget BUDGET       Labeling budget.
  --task_name {trajectory,budget,bw,k,num_init,cpu,reg_lambda}
                        Task name, e.g., the name of the tested hyperparameter.
  --value VALUE         Value of the tested hyperparameter. If specified, evaluate on the single value; otherwise, evaluate on
                        all values tested in our experiment.
  --run_id RUN_ID       Run ID.
  --input_dir INPUT_DIR
                        Input directory.
  --output_dir OUTPUT_DIR
                        Output directory.
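
To evaluate several runs of the same configuration, the command can be generated once per run ID. The following is a minimal sketch that validates `--method` and `--task_name` against the choices listed in the help text before building the commands. The helper name, the run IDs, and the example value are hypothetical.

```python
# Choices copied from the help text above.
METHODS = {"vocal_postgres_no_active_learning-topk", "vocal_postgres-topk"}
TASKS = {"trajectory", "budget", "bw", "k", "num_init", "cpu", "reg_lambda"}


def evaluate_commands(dataset_name, query_str, method, task_name, run_ids,
                      value=None, port=None):
    """Build one evaluate_vocal.py argv list per run ID."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if task_name not in TASKS:
        raise ValueError(f"unknown task name: {task_name}")

    commands = []
    for run_id in run_ids:
        cmd = [
            "python", "evaluate_vocal.py",
            "--dataset_name", dataset_name,
            "--query_str", query_str,
            "--method", method,
            "--task_name", task_name,
            "--run_id", str(run_id),
        ]
        if value is not None:
            # Without --value, all values tested in the experiment are evaluated.
            cmd += ["--value", str(value)]
        if port is not None:
            cmd += ["--port", str(port)]
        commands.append(cmd)
    return commands
```

For example, `evaluate_commands("trajectories_duration", "<target_query>", "vocal_postgres-topk", "budget", range(3), value=20)` yields three commands that differ only in `--run_id`.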

The following script provides an example configuration used in the paper:

cd <project_root_dir>/scripts
./eval_vocal.sh
