This is the repository for the paper submission.
The technical report can be found in samcomb_TR.pdf.
Tpch-skew: the tpch-skew dataset can be generated using the tool in TPCH-H-Skew.zip. The official download link is https://www.microsoft.com/en-us/download/details.aspx?id=52430.
Loan: the loan dataset can be downloaded from https://www.kaggle.com/skihikingkevin/online-p2p-lending.
The generated queries can be found in tpch_query.csv and loan_query.csv.
The code is in the samcomb folder. The entry point is the com.samcomb.Main class. Other main components include:
- com.samcomb.config: the place to set experiment settings.
- com.samcomb.experiment: experiment implementation.
- com.samcomb.sampler: sampler implementation, where the SamCombSampler class is the proposed SamComb approach.
To run the code, first load the dataset into a DBMS:
- For each dataset, create a schema. For example, we create the schemas skew_s1_z2 and loan for the tpch-skew and loan datasets, respectively.
- Load each dataset into a table named orgtable under its corresponding schema.
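The two loading steps above can be sketched as a small helper that prints the schema-creation statements and the fully qualified table names to load into. This is only an illustration: the README does not name the DBMS, so the plain `CREATE SCHEMA` syntax is an assumption, the class `DatasetSetup` and its methods are hypothetical and not part of the samcomb code, and the column definitions and bulk-load command are left out because they depend on the DBMS and on the dataset files.

```java
import java.util.List;

// Hypothetical helper, not part of the samcomb package.
class DatasetSetup {
    // Table name required by the README for every dataset.
    static final String TABLE = "orgtable";

    // Reject anything that is not a plain SQL identifier, since the
    // name is concatenated into a statement.
    static String checkIdentifier(String name) {
        if (!name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("not a plain identifier: " + name);
        }
        return name;
    }

    // Assumption: the target DBMS accepts the plain CREATE SCHEMA form.
    static String createSchemaSql(String schema) {
        return "CREATE SCHEMA " + checkIdentifier(schema);
    }

    // Fully qualified name of the table the dataset is loaded into.
    static String qualifiedTable(String schema) {
        return checkIdentifier(schema) + "." + TABLE;
    }

    public static void main(String[] args) {
        // Schema names taken from the example in the README.
        for (String schema : List.of("skew_s1_z2", "loan")) {
            System.out.println(createSchemaSql(schema) + ";");
            System.out.println("-- load the dataset into " + qualifiedTable(schema));
        }
    }
}
```

The printed statements can be run in the DBMS client before loading the data with whatever bulk-load tool the DBMS provides.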
Next, adjust the settings in com.samcomb.config (e.g., the JDBC connection string). After that, run com.samcomb.Main. It prints a list of actions and asks you to choose one. To choose an action, you only need to enter its corresponding id.
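Before running the experiments, it may help to confirm that the connection string works and that orgtable is reachable. The following is a minimal sketch using the standard JDBC API. The class `OrgTableCheck` is hypothetical and not part of the samcomb code, the URL, user, and password are placeholders you supply yourself, and the matching JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical connectivity check, not part of the samcomb package.
class OrgTableCheck {
    // Build a row-count query for <schema>.orgtable.
    static String countQuery(String schema) {
        // Only plain identifiers are accepted because the name is concatenated.
        if (!schema.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("not a plain identifier: " + schema);
        }
        return "SELECT COUNT(*) FROM " + schema + ".orgtable";
    }

    // Open a connection with the given (placeholder) credentials and
    // return the number of rows in <schema>.orgtable.
    static long countRows(String jdbcUrl, String user, String password, String schema)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(countQuery(schema))) {
            rs.next(); // COUNT(*) always yields exactly one row
            return rs.getLong(1);
        }
    }
}
```

Calling `countRows` with the same connection string that is set in com.samcomb.config and a schema such as loan should return the number of loaded rows; an `SQLException` points to a connection or loading problem.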