
awesome-olap-paper's Introduction

Awesome-OLAP-Paper 666

A curated list of papers on Online Analytical Processing (OLAP) databases, frameworks, resources, tools, and other awesomeness, for data engineers.

New PRs are welcome. Please follow the entry format: paperName (with link) [MeetingName Year]

If the paper has open-source code, please include its GitHub link alongside MeetingName.

Query-Aware Database Generation

  1. QAGen: Generating Query-Aware Test Databases [SIGMOD 07]
  2. Generating Targeted Queries for Database Testing [SIGMOD 08]
  3. Generating Databases for Query Workloads [VLDB 10]
  4. Data Generation using Declarative Constraints [SIGMOD 11]
  5. MyBenchmark: generating databases for query workloads [VLDB 14]
  6. Scalable and Dynamic Regeneration of Big Data Volumes [EDBT 18]
  7. Touchstone: Generating Enormous Query-Aware Test Databases [ATC 18]
  8. Synthesizing Linked Data Under Cardinality and Integrity Constraints [SIGMOD 21]
  9. Projection-Compliant Database Generation [VLDB 22]
  10. SAM: Database Generation from Query Workloads with Supervised Autoregressive Models [SIGMOD 22]
  11. PrivLava: Synthesizing Relational Data with Foreign Keys under Differential Privacy [SIGMOD 23]
  12. Mirage: Generating Enormous Databases for Complex Workloads [ICDE 24]

Survey

  1. Synthetic Data Generation for Enterprise DBMS [ICDE 23]

Query Schedule

  1. Memory Efficient Scheduling of Query Pipeline Execution [CIDR 22]

Query Optimization

  1. Sampling-Based Query Re-Optimization [SIGMOD 16]
  2. Kepler: Robust Learning for Parametric Query Optimization [SIGMOD 23]
  3. Rethink Query Optimization in HTAP Databases [SIGMOD 24]
  4. Optimizing Nested Recursive Queries [SIGMOD 24]

Query Rewrite

  1. QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting [VLDB 23]
  2. SlabCity: Whole-Query Optimization using Program Synthesis [VLDB 23]
  3. GEqO: ML-Accelerated Semantic Equivalence Detection [SIGMOD 24]
  4. Proving Query Equivalence Using Linear Integer Arithmetic [SIGMOD 24]

Cardinality Estimation

Histogram

  1. Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries [SIGMOD 88]
  2. Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results [ACM Transactions on Database Systems 93]
  3. Independence is good: Dependency-based histogram synopses for high-dimensional data [SIGMOD 01]
  4. STHoles: a multidimensional workload-aware histogram [SIGMOD 01]
  5. A multi-dimensional histogram for selectivity estimation and fast approximate query answering [CASCON 03]
  6. The history of histograms (abridged) [VLDB 03]
  7. ISOMER: Consistent histogram construction using query feedback [ICDE 06]
  8. Join Over Histograms [Alberto Dell'Era 07]
  9. Improving accuracy and robustness of self-tuning histograms by subspace clustering [ICDE 16]
  10. LHist: Towards Learning Multidimensional Histogram for Massive Spatial Data [ICDE 21]

Sampling

  1. Two-Level Sampling for Join Size Estimation [SIGMOD 17]
  2. Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing [SIGMOD 21]

Others

  1. Access path selection in a relational database management system [SIGMOD 79]
  2. Approximating multi-dimensional aggregate range queries over real attributes [SIGMOD 00]
  3. Selectivity estimators for multidimensional range queries over real attributes [VLDB 05]
  4. Plan Bouquets: Query Processing without Selectivity Estimation [SIGMOD 14]
  5. Exact Cardinality Query Optimization with Bounded Execution Cost [SIGMOD 19]
  6. JoinSketch: A Sketch Algorithm for Accurate and Unbiased Inner-Product Estimation [SIGMOD 23]
  7. Efficient and Effective Cardinality Estimation for Skyline Family [SIGMOD 23]

Survey

  1. Preventing bad plans by bounding the impact of cardinality estimation errors [VLDB 09]
  2. Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server [VLDB 23]
  3. Sub-optimal Join Order Identification with L1-error [SIGMOD 24]

Join Order

  1. Join Order Selection with Deep Reinforcement Learning: Fundamentals, Techniques, and Challenges [VLDB 23]
  2. Efficiently Computing Join Orders with Heuristic Search [SIGMOD 23]
  3. Ready to Leap (by Co-Design)? Join Order Optimisation on Quantum Hardware [SIGMOD 23]
  4. Quantum-Inspired Digital Annealing for Join Ordering [VLDB 24]
  5. POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance [VLDB 24]

Join Algorithms

  1. Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems [VLDB 12]
  2. Leapfrog Triejoin: a worst-case optimal join algorithm [International Conference on Database Theory 12]
  3. An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory [SIGMOD 16]
  4. Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems [SIGMOD 18]
  5. Adopting Worst-Case Optimal Joins in Relational Database Systems [VLDB 20]
  6. Free Join: Unifying Worst-Case Optimal and Traditional Joins [arXiv 23]
  7. Reservoir Sampling over Joins [SIGMOD 24]

Cost Model

  1. LEO – DB2’s LEarning Optimizer [VLDB 01]
  2. Predicting query execution time: are optimizer cost models really unusable? [ICDE 13]
  3. Towards Predicting Query Execution Time for Concurrent and Dynamic Database Workloads [VLDB 13]
  4. Forecasting the cost of processing multi-join queries via hashing for main-memory databases [SoCC 15]
  5. Query Performance Prediction for Concurrent Queries using Graph Embedding [VLDB 20]
  6. Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload [arXiv 21]
  7. Rethinking Learned Cost Models: Why Start from Scratch? [SIGMOD 24]
  8. Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools [SIGMOD 24]

View

  1. Foreign Keys Open the Door for Faster Incremental View Maintenance [SIGMOD 23]

Survey

  1. How Good Are Query Optimizers, Really? [VLDB 15]
  2. Cardinality Estimation: An Experimental Survey [VLDB 17]
  3. A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration [VLDB 21]
  4. Have query optimizers hit the wall? [VLDB Journal 22]
  5. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation [VLDB 22]
  6. Data dependencies for query optimization: a survey [VLDB Journal 22]
  7. Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis [VLDB 23]

Index

  1. SQL Server Column Store Indexes [SIGMOD 11]
  2. Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation [SIGMOD 18]

Query Execution

  1. MonetDB/X100: Hyper-Pipelining Query Execution [CIDR 05]
  2. Materialization Strategies in the Vertica Analytic Database: Lessons Learned [ICDE 13]
  3. Rethinking SIMD Vectorization for In-Memory Databases [SIGMOD 15]
  4. Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe? [SIGMOD 17]
  5. Building Advanced SQL Analytics From Low-Level Plan Operators [SIGMOD 21]
  6. ChainedFilter: Combining Membership Filters by Chain Rule [SIGMOD 24]

Data Dependency Search

  1. Discovering Functional Dependencies through Hitting Set Enumeration [SIGMOD 24]

Query Compilation

  1. How to Architect a Query Compiler [SIGMOD 16]
  2. Adaptive Execution of Compiled Queries [ICDE 18]

Logic Bug Detection

  1. Detecting Logic Bugs of Join Optimizations in DBMS [SIGMOD 23 Best Paper]

Storage

  1. What Modern NVMe Storage Can Do, And How To Exploit It: High-Performance I/O for High-Performance Storage Engines [VLDB 23]
  2. An Empirical Evaluation of Columnar Storage Formats [VLDB 24]

LSM-Tree

  1. Dissecting, Designing, and Optimizing LSM-based Data Stores [SIGMOD 22 Tutorial]

Proxy

  1. Tigger: A Database Proxy That Bounces With User-Bypass [VLDB 23]

Data Loading

  1. ConnectorX: Accelerating Data Loading From Databases to Dataframes [VLDB 22]

Database Kernel

  1. Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics [CIDR 21]
  2. Disaggregated Database Systems [VLDB 23 Tutorial]
  3. GPU Database Systems Characterization and Optimization [VLDB 24]
  4. The Art of Latency Hiding in Modern Database Engines [VLDB 24]
  5. DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay [SIGMOD 24]

Others

MVCC

  1. Scalable Garbage Collection for In-Memory MVCC Systems [VLDB 13]
  2. Rethinking serializable multiversion concurrency control [VLDB 15]
  3. An Empirical Evaluation of In-Memory Multi-Version Concurrency Control [VLDB 17]
  4. Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting [SIGMOD 18]
  5. Long-lived Transactions Made Less Harmful [SIGMOD 20]
  6. Rethink the Scan in MVCC Databases [SIGMOD 21]
  7. Diva: Making MVCC Systems HTAP-Friendly [SIGMOD 22]
  8. Memory-Optimized Multi-Version Concurrency Control for Disk-Based Database Systems [VLDB 22]
  9. Scalable and Robust Snapshot Isolation for High-Performance Storage Engines [VLDB 23]
  10. One-shot Garbage Collection for In-memory OLTP through Temporality-aware Version Storage [SIGMOD 23]

HTAP

System Architecture

Linear Consistency

  1. HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots [ICDE 12]
  2. TiDB: A Raft-based HTAP Database [VLDB 20]
  3. OceanBase Paetica: A Hybrid Shared-Nothing/Shared-Everything Database for Supporting Single Machine and Distributed Cluster [VLDB 23]

Sequential Consistency

  1. BatchDB: Efficient Isolated Execution of Hybrid OLTP+OLAP Workloads for Interactive Applications [SIGMOD 17]
  2. F1 Lightning: HTAP as a Service [VLDB 20]
  3. Retrofitting High Availability Mechanism to Tame Hybrid Transaction/Analytical Processing [ATC 21]
  4. ByteHTAP: ByteDance’s HTAP System with High Data Freshness and Strong Data Consistency [VLDB 22]

Session Consistency

  1. PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database [VLDB 18]
  2. PolarDB-IMCI: A Cloud-Native HTAP Database System at Alibaba [SIGMOD 23]

Survey

  1. HTAP Databases: What is New and What is Next [SIGMOD 22]
  2. Data Sharing Model and Optimization Strategies in HTAP Database Systems [Journal of Software 23]
  3. HTAP Databases: A Survey [TKDE 24]

Kernel Optimization

  1. Log Replaying for Real-Time HTAP: An Adaptive Epoch-based Two-Stage Framework [ICDE 24]

Result Replay

  1. DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay [SIGMOD 24]

Benchmark

  1. How Good is My HTAP System? [SIGMOD 22]
  2. OLxPBench: Real-time, Semantically Consistent, and Domain-specific are Essential in Benchmarking, Designing, and Implementing HTAP Systems [ICDE 22]
  3. Dike: A Benchmark Suite for Distributed Transactional Databases [SIGMOD 23]
  4. M2Bench: A Database Benchmark for Multi-Model Analytic Workloads [VLDB 23]
  5. Cloud Analytics Benchmark [VLDB 23]
  6. Pollock: A Data Loading Benchmark [VLDB 23]
  7. VeriBench: Analyzing the Performance of Database Systems with Verifiability [VLDB 23]
  8. TSM-Bench: Benchmarking Time Series Database Systems for Monitoring Applications [VLDB 23]
  9. CDSBen: Benchmarking the Performance of Storage Services in Cloud-native Database System at ByteDance [VLDB 23]
  10. FEBench: A Benchmark for Real-Time Relational Data Feature Extraction [VLDB 23]
  11. TPCx-AI - An Industry Standard Benchmark for Artificial Intelligence and Machine Learning Systems [VLDB 23]
  12. ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems [VLDB 23]
  13. DBPA: A Benchmark for Transactional Database Performance Anomalies [SIGMOD 23]
  14. HyBench: A New Benchmark for HTAP Databases [VLDB 24]

Time Series

  1. An Experimental Evaluation of Anomaly Detection in Time Series [VLDB 24]

Vector Data

  1. Are There Fundamental Limitations in Supporting Vector Data Management in Relational Databases? A Case Study of PostgreSQL [ICDE 24]
