This process is for categorising data (brands) based on an existing list of brands and their search strings.
query_strings.py For each delta file in an S3 bucket, the python script will download the file and call the Sql function to import the data and start the branding process.
wloaddelta.sql Creates the table, imports and creates necessary indexes. Calls the branding process - then exports the results.
brand_match_exper.sql This branding process takes a large list of brands and searches the delta table for these brands. It searches in multiple columns, can return cases of multiple matches - and can accept additional sql input.
Input brands
brand | search string | additional sql |
---|---|---|
TESCO | \mTESCO\M% | |
RAPHAELS BANK | RAPHAELS BANK | AND SIC8_DESCRIPTION LIKE '%BANK%' AND NOT SIC8_DESCRIPTION LIKE '%ATM%' |
EE | T-MOBILE | |
ABBEYFIELD | ABBEYFIELD % | |
BARCLAYS BANK | BARCLAYS PLC% | AND SIC8_DESCRIPTION LIKE '%BANK%' AND NOT SIC8_DESCRIPTION LIKE '%ATM%' |
WHSMITH | W_?H_?SMITH% |
Results example
name | trade_name | franchise_name | sic8_description | brandname | new_brand | multiple_brands | matches |
---|---|---|---|---|---|---|---|
EXXONMOBIL POWER AND GASE SERVICES INC. | GAS AND OTHER SERVICES COMBINED | EXXON MOBIL | 1 | ;EXXON MOBIL | 1 | ||
GODDARD DAIRY QUEEN | ICE CREAM STANDS OR DAIRY BARS | DAIRY QUEEN | 1 | ;DAIRY QUEEN | 1 | ||
DAIRY QUEEN | DAIRY QUEEN | ICE CREAM STANDS OR DAIRY BARS | DAIRY QUEEN | 1 | ;DAIRY QUEEN | 2 | |
COURTYARD BY MARRIOTT CARROLLTON | COURTYARD CARROLLTON | COURTYARD BY MARRIOTT | HOTELS | COURTYARD BY MARRIOTT | 1 | ;COURTYARD BY MARRIOTT;MARRIOTT | 2 |
COURTYARD BY MARRIOTT DELRAY BEACH | COURTYARD BY MARRIOTT | HOTELS AND MOTELS | COURTYARD BY MARRIOTT | 1 | ;COURTYARD BY MARRIOTT;MARRIOTT | 2 |