The UPDATE data contains data for tables that are no longer needed, and we should drop them. See the details in the Specification (v3.2.0) PDF, page 7 of 141:
Remove tables that are no longer part of the data maintenance refresh in TPC-DS v2.0: A-1 (s_zip_to_cmt), A-3 (s_customer), A-7 (s_item), A-10 (s_store), A-11 (s_call_center), A-12 (s_web_site), A-13 (s_warehouse), A-14 (s_web_page), A-15 (s_promotion), A-20 (s_catalog_page) (FogBugz 2178)
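A minimal sketch of how files for the obsolete source tables could be filtered out of a generated update set. It assumes the update files are named `<table>_<n>.dat` (for example `s_customer_1.dat`); that naming pattern and the helper names `OBSOLETE_TABLES`, `is_obsolete` and `keep_needed` are assumptions for illustration, not taken from the Specification or from this repository.

```python
import re

# Source tables listed in the Specification note quoted above as no longer
# part of the data maintenance refresh.
OBSOLETE_TABLES = {
    "s_zip_to_cmt", "s_customer", "s_item", "s_store", "s_call_center",
    "s_web_site", "s_warehouse", "s_web_page", "s_promotion", "s_catalog_page",
}

# Assumed file naming: <table>_<update number>.dat
_FILE_RE = re.compile(r"^(?P<table>[a-z_]+?)_(?P<n>\d+)\.dat$")


def is_obsolete(filename: str) -> bool:
    """Return True if the file belongs to one of the obsolete source tables."""
    m = _FILE_RE.match(filename)
    return bool(m) and m.group("table") in OBSOLETE_TABLES


def keep_needed(filenames):
    """Keep only the files that are still used by data maintenance."""
    return [f for f in filenames if not is_obsolete(f)]
```

With this, `keep_needed(["s_item_1.dat", "s_purchase_1.dat"])` keeps only `s_purchase_1.dat`; files that do not match the assumed pattern are left untouched.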
The Specification does not give the DELETE query strings explicitly, so I came up with the following SQL statements (see Section 5.3.11).
-- DF_CS
-- for catalog_returns (run first: the subquery needs the catalog_sales rows that the next statement deletes)
delete from catalog_returns where cr_order_number in (
select cs_order_number from catalog_sales where cs_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2)
);
-- for catalog_sales
delete from catalog_sales where cs_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2);
-- DF_SS
-- for store_returns (run first: the subquery needs the store_sales rows that the next statement deletes)
delete from store_returns where sr_ticket_number in (
select ss_ticket_number from store_sales where ss_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2)
);
-- for store_sales
delete from store_sales where ss_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2);
-- DF_WS
-- for web_returns (run first: the subquery needs the web_sales rows that the next statement deletes)
delete from web_returns where wr_order_number in (
select ws_order_number from web_sales where ws_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2)
);
-- for web_sales
delete from web_sales where ws_sold_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2);
-- DF_I
-- for inventory
delete from inventory where inv_date_sk in (select d_date_sk from date_dim where d_date between DATE1 and DATE2);
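DATE1 and DATE2 are placeholders in these statements. When the statements run in sequence, the order matters: each returns delete looks up order or ticket numbers in the matching sales table, so it only finds rows while those sales still exist. The sketch below shows both points for the catalog channel, using an in-memory SQLite database with a heavily reduced schema. The function names `delete_catalog_range` and `make_demo_db`, the sample rows, and the use of SQLite in place of Spark SQL are assumptions for illustration only.

```python
import sqlite3

# Same statements as above, with DATE1/DATE2 turned into named parameters.
DELETE_RETURNS = """
delete from catalog_returns where cr_order_number in (
  select cs_order_number from catalog_sales where cs_sold_date_sk in
    (select d_date_sk from date_dim where d_date between :date1 and :date2))
"""
DELETE_SALES = """
delete from catalog_sales where cs_sold_date_sk in
  (select d_date_sk from date_dim where d_date between :date1 and :date2)
"""


def delete_catalog_range(conn, date1, date2):
    """Delete catalog returns, then catalog sales, for the given date range."""
    params = {"date1": date1, "date2": date2}
    # Returns first: the subquery needs the sales rows to still be present.
    returns_deleted = conn.execute(DELETE_RETURNS, params).rowcount
    sales_deleted = conn.execute(DELETE_SALES, params).rowcount
    conn.commit()
    return returns_deleted, sales_deleted


def make_demo_db():
    """Build a tiny demo database (hypothetical rows, reduced columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table date_dim (d_date_sk integer, d_date text);
        create table catalog_sales (cs_sold_date_sk integer, cs_order_number integer);
        create table catalog_returns (cr_order_number integer);
        insert into date_dim values (1, '2000-01-01'), (2, '2000-01-02'), (3, '2000-02-01');
        insert into catalog_sales values (1, 100), (2, 101), (3, 102);
        insert into catalog_returns values (100), (102);
    """)
    return conn
```

Calling `delete_catalog_range(make_demo_db(), "2000-01-01", "2000-01-31")` removes the one return tied to a January sale and then the two January sales. If the two statements were swapped, the returns delete would match nothing, because its subquery would run against an already emptied date range in `catalog_sales`.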
from spark-rapids-benchmarks.