feature_eng_for_lm's Introduction

Preprocessing

Scripts for table transformation

cv_model_selection.py

Tests different feature combinations: media features from the pool are rotated into a linear regression alongside the constant features, and each resulting model is cross-validated on N folds. The total number of fitted models is therefore combinations * N.

args:

  • file_name - {str} name of the file with the transformed features
  • date_time_col - {str} name of the [date_time] feature column, filled with the values 1, 2, 3, ..., n_observations
  • max_date_num_col - {int} last value of the [date_time] column, i.e. n_observations
  • const_features - {list} constant features that are not rotated !!!Y_val HAS TO BE FIRST IN THE LIST!!!
  • to_rotate - {list} regular expressions, one per media feature to rotate, e.g. ['SUB_OLV_Imp_d0.?[2-8]', 'SUB_OOH_d0.?[2-8]']
  • regex_decode - {list} plain names of the media to rotate, e.g. ['OLV', 'OOH']
  • nfolds - {int} number of CV splits
  • save - {str} name of the file to save the results to

return:

  • res - {pd.DataFrame} pivoted table of feature combinations and their statistics

from cv_model_selection import model_selection

model_selection(
                file_name = 'bike_sharing_demand', 
                date_time_col = False, 
                max_date_num_col = False, 
                const_features = ['season', 'holiday', 'weather', 'usd'], 
                to_rotate = ['TV_d?[0-9]', 'OOH_d?[0-9]', 'OLV_d?[0-9]', 'Comp_d?[0-9]'], 
                regex_decode = ['TV', 'OOH', 'OLV', 'Comp'],
                nfolds = 5,
                save = 'filename'
                )
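
The "combinations * N" count can be illustrated with a minimal sketch. It assumes that "rotating" means picking exactly one matching column per media pattern for each candidate model, and that patterns must match the whole column name; both are interpretations of the description above, not the script's confirmed logic. The helper name `rotation_combinations` and the sample column names are hypothetical.

```python
import itertools
import re


def rotation_combinations(columns, to_rotate, const_features):
    """Build candidate feature sets: constants plus one column per media pattern.

    Assumption: exactly one variant of every media feature enters each model.
    """
    groups = []
    for pattern in to_rotate:
        regex = re.compile(pattern)
        # Collect every column whose full name matches this media pattern.
        groups.append([col for col in columns if regex.fullmatch(col)])
    # Cartesian product: one column from each media group per candidate model.
    return [list(const_features) + list(combo)
            for combo in itertools.product(*groups)]


# Hypothetical transformed columns: 3 TV variants and 2 OOH variants.
columns = ['season', 'holiday', 'TV_d2', 'TV_d5', 'TV_d8', 'OOH_d3', 'OOH_d6']
combos = rotation_combinations(
    columns,
    to_rotate=['TV_d?[0-9]', 'OOH_d?[0-9]'],
    const_features=['season', 'holiday'],
)

nfolds = 5
total_fits = len(combos) * nfolds  # combinations * N
```

Under these assumptions the number of fits grows multiplicatively with the number of variants per media, so narrow regular expressions in to_rotate keep the run time manageable.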

distance_calculation.py

Calculates the distance between each competitor/OOH object and each client store

  • file_name - name of file with client and competitor/OOH objects, latitude, longitude and open-close dates
  • client - column of client's objects names
  • competitor - column of competitor's objects names
  • competitor_latitude - column of competitor's latitude
  • client_latitude - column of client's latitude
  • competitor_longitude - column of competitor's longitude
  • client_longitude - column of client's longitude
  • threshold_tuple - interval (min_dist, max_dist) in metres; the feature is calculated for objects whose distance to the client store falls within it
  • open_date - column with competitor open dates
  • close_date - column with competitor close dates
  • model_start_date - first Monday of the model period
  • model_end_date - last Monday of the model period
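
The distance step could look roughly like the sketch below. It uses the haversine great-circle formula, which is an assumption: the README does not say which distance formula the script applies. The names `haversine_m` and `within_threshold` are hypothetical, and treating the threshold interval as inclusive on both ends is also an assumption.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_threshold(distance_m, threshold_tuple):
    """True if the distance lies inside (min_dist, max_dist), ends included."""
    min_dist, max_dist = threshold_tuple
    return min_dist <= distance_m <= max_dist


# Distance between two nearby points, then the (0, 200) metre check.
d = haversine_m(59.9268, 30.3161, 59.9258, 30.3144)
is_close = within_threshold(d, (0, 200))
```

Haversine ignores the Earth's flattening, which is usually acceptable at the scale of a few hundred metres between stores.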

Input and output dataframes:

    e.g.:
        comp    comp_latitude    comp_longitude    client     client_latitude    client_longitude    comp_open_date    comp_close_date
        1       60.0453568       30.3746659        store_1    59.9268            30.3161             2016-04-04        2019-11-08
        2       59.9571587       30.3144672        store_2    59.9258            30.3144             2016-04-04
        n       59.862598        30.338264         store_n    60.0264            30.2229             2016-04-11        2019-06-04

                                                                        to
        date          client_store    value
        2016-03-28    store_1         0
        2016-04-04    store_1         1
        2016-04-11    store_1         2
        2016-04-18    store_1         2
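
The value column in the output above reads as the number of competitor objects within the threshold that are open in a given week. A minimal sketch of that counting step follows; it assumes weekly steps starting on model_start_date, treats an empty close date as "still open", and counts the closing week as open. The function names and the choice of which competitors lie near store_1 are hypothetical.

```python
from datetime import date, timedelta


def weekly_mondays(start, end):
    """Yield dates from start to end (inclusive) in 7-day steps."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=7)


def open_competitor_counts(competitors, start, end):
    """Count competitors open on each week.

    competitors: list of (open_date, close_date or None) pairs that were
    already filtered to lie within the distance threshold of one client store.
    """
    counts = {}
    for monday in weekly_mondays(start, end):
        counts[monday] = sum(
            1
            for open_d, close_d in competitors
            if open_d <= monday and (close_d is None or monday <= close_d)
        )
    return counts


# Hypothetical: two competitors within the threshold of store_1.
near_store_1 = [
    (date(2016, 4, 4), date(2019, 11, 8)),
    (date(2016, 4, 11), date(2019, 6, 4)),
]
counts = open_competitor_counts(near_store_1, date(2016, 3, 28), date(2016, 4, 18))
```

With these two hypothetical competitors the weekly counts are 0, 1, 2, 2, which matches the value column of the example output.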

Main function of the script

from distance_calculation import dist_feature

dist_feature(file_name = 'dist_filename', 
             client = 'client_store_col_name', 
             competitor = 'competitor_store_col_name', 
             competitor_latitude = 'competitor_latitude_col_name', 
             client_latitude = 'client_latitude_col_name', 
             competitor_longitude = 'competitor_longitude_col_name', 
             client_longitude = 'client_longitude_col_name', 
             threshold_tuple = (0, 200), 
             open_date = 'competitor_store_open_date_col_name', 
             close_date = 'competitor_store_close_date_col_name', 
             model_start_date = '2017-01-02', 
             model_end_date = '2020-02-17'
             )
