China_Cup

For the China Cup.
The share of users missing from each file is as follows:

Train

  • overdue:
    original shape: (55596, 2)
    transform shape: (55596, 2)
    unique user number: 55596
    0.0 % users are missing

  • user_info:
    original shape: (55596, 6)
    transform shape: (55596, 6)
    unique user number: 55596
    0.0 % users are missing

  • loan_time:
    original shape: (55596, 2)
    transform shape: (55596, 2)
    unique user number: 55596
    0.0 % users are missing

  • browse_history:
    original shape: (22919547, 4)
    transform shape: (22919547, 4)
    unique user number: 47330
    14.87 % users are missing

  • bill_detail:
    original shape: (2338118, 15)
    transform shape: (2338118, 15)
    unique user number: 53174
    4.36 % users are missing

  • bank_detail:
    original shape: (6070197, 5)
    transform shape: (6070197, 5)
    unique user number: 9294
    83.28 % users are missing

Test

  • usersID:
    original shape: (13899, 1)
    transform shape: (13899, 1)
    unique user number: 13899
    0.0 % users are missing

  • user_info:
    original shape: (13899, 6)
    transform shape: (13899, 6)
    unique user number: 13899
    0.0 % users are missing

  • loan_time:
    original shape: (13899, 2)
    transform shape: (13899, 2)
    unique user number: 13899
    0.0 % users are missing

  • browse_history:
    original shape: (5476055, 4)
    transform shape: (5476055, 4)
    unique user number: 11997
    13.68 % users are missing

  • bill_detail:
    original shape: (414895, 15)
    transform shape: (414895, 15)
    unique user number: 13643
    1.84 % users are missing

  • bank_detail:
    original shape: (376409, 5)
    transform shape: (376409, 5)
    unique user number: 709
    94.9 % users are missing
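
The "unique user number" and "% users are missing" figures above can be reproduced with a small helper. The following is a minimal sketch, assuming each file can be reduced to a list of user ids; the function name `missing_user_report` and the toy ids are hypothetical and not taken from the original code.

```python
def missing_user_report(all_user_ids, table_user_ids):
    """Summarise how many known users have no row in one table."""
    all_users = set(all_user_ids)
    # Only count ids that belong to the reference user list.
    present = set(table_user_ids) & all_users
    missing_pct = 100.0 * (len(all_users) - len(present)) / len(all_users)
    return {
        "unique user number": len(present),
        "missing %": round(missing_pct, 2),
    }


# Toy data: five users overall, two of them never appear in the table.
report = missing_user_report([1, 2, 3, 4, 5], [1, 1, 3, 3, 3, 4])
print(report)  # {'unique user number': 3, 'missing %': 40.0}
```

With 55596 train users and 47330 of them present in browse_history, the same formula gives the 14.87 % reported above.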

Features of the data

  • sample imbalance
    The sampling rate of each class can be adjusted so that the training set gets close to the class balance of the online samples.
  • browse_history
    A user can perform several activities at the same time, and the file records each user's activities at different times. Many features can be derived from this file, such as how many times the user browsed and how many times the user took a given activity or activity label.
  • bill info
    A customer can have credit cards from different banks; there are 29 banks in total.
  • The activity number is an identifier with no physical meaning, so features are based on the total number of times an activity is taken.
  • The activity label number is an identifier with no physical meaning, so features are based on the total number of times an activity label is taken.
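
One possible way to adjust the sampling rate is to drop negatives at random until positives reach a target share. This is only a sketch under stated assumptions: label 1 is taken to mean "overdue", the target rate of 0.25 is a made-up value standing in for the online balance, and `downsample_negatives` is a hypothetical helper, not the method used in the original project.

```python
import random


def downsample_negatives(samples, target_positive_rate, seed=0):
    """Drop negatives at random until positives make up the target share.

    `samples` is a list of (user_id, label) pairs; label 1 is assumed
    to mean "overdue" and label 0 "not overdue".
    """
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    # Number of negatives for which positives / total equals the target rate.
    keep = round(len(positives) * (1 - target_positive_rate) / target_positive_rate)
    keep = min(keep, len(negatives))
    rng = random.Random(seed)
    resampled = positives + rng.sample(negatives, keep)
    rng.shuffle(resampled)
    return resampled


# Toy data: 10 overdue users among 100, rebalanced to 25 % positives.
toy = [(i, 1 if i < 10 else 0) for i in range(100)]
balanced = downsample_negatives(toy, target_positive_rate=0.25)
print(len(balanced), sum(label for _, label in balanced))  # 40 10
```

Downsampling discards data; oversampling positives or using class weights are alternatives when the negatives are too valuable to drop.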

Extract Features (5 + 1 + 1050 + 2551 = 3607)

Basic information (5):

  • gender, occupation, education, marriage, residence

Loan time (1):

  • loan time

Browse activity (1050):

  • the total number of times the customer browses (1)
  • the time difference between the customer's first and last browse (1)
  • the customer's browse frequency (1)
  • total_times/frequency/average/min/max for each single activity label the customer takes (11 different types * 5 = 55)
  • total_times/frequency for each single activity the customer takes (216 different types * 2 = 432)
  • the average/min/max/count of the number of activities the customer takes in one browse (4)
  • the average/min/max/count of the number of activity labels the customer takes in one browse (4)
  • total_times/frequency for each single activity+label combination the customer takes (276 different types * 2 = 552)
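
A few of the browse features listed above can be sketched as follows. The record layout `(timestamp, activity, label)`, the definition of frequency as browses per unit of time, and the grouping of "one browse" by identical timestamp are all assumptions made for illustration; `browse_features` is a hypothetical helper and expects at least one record.

```python
from collections import Counter


def browse_features(records):
    """Build a few browse features for one customer.

    `records` is a non-empty list of (timestamp, activity, label) tuples.
    """
    times = [t for t, _, _ in records]
    total = len(records)
    span = max(times) - min(times)
    features = {
        "total_times": total,
        "time_span": span,
        # Assumed definition: records per unit of time, 0 if there is no span.
        "frequency": total / span if span else 0.0,
    }
    # Total times per single activity and per single activity label.
    for activity, n in Counter(a for _, a, _ in records).items():
        features[f"activity_{activity}_total"] = n
    for label, n in Counter(l for _, _, l in records).items():
        features[f"label_{label}_total"] = n
    # Activities per browse: records sharing a timestamp count as one browse.
    per_browse = Counter(times).values()
    features["per_browse_avg"] = sum(per_browse) / len(per_browse)
    features["per_browse_min"] = min(per_browse)
    features["per_browse_max"] = max(per_browse)
    return features


toy = [(100, 7, 1), (100, 9, 1), (160, 7, 2), (220, 7, 1)]
feats = browse_features(toy)
print(feats["total_times"], feats["time_span"], feats["activity_7_total"])  # 4 120 3
```

The per-activity frequencies and the activity+label combinations follow the same counting pattern with a different key.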

Bill information (2551)

  • the total number of bills the customer has (1)
  • the average/min/max/sum/cov/std of each bill detail column for each customer (14 * 6 = 84)
  • the number of credit cards the customer has (1)
  • the total number of bills the customer has at each bank (29)
  • the average/min/max/sum/cov/std of each bill detail column for each customer, computed per bank (14 * 6 * 29 = 2436)
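
The overall and per-bank bill aggregates can be sketched as below. The dict layout with a `bank_id` key, the column name `amount`, and the helper names are hypothetical; the number of distinct banks is used as a stand-in for the number of credit cards, which is an assumption. The `cov` statistic is left out because the source does not define it, and `std` is taken here as the population standard deviation.

```python
import statistics
from collections import defaultdict


def aggregate(values):
    """avg/min/max/sum/std of one bill column (population std)."""
    return {
        "avg": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "std": statistics.pstdev(values),
    }


def bill_features(bills, columns):
    """`bills` is a list of dicts with a `bank_id` key plus numeric columns."""
    by_bank = defaultdict(list)
    for bill in bills:
        by_bank[bill["bank_id"]].append(bill)

    features = {
        "total_bills": len(bills),
        # Assumption: one card per bank, so distinct banks approximate cards.
        "n_banks": len(by_bank),
    }
    for bank, rows in by_bank.items():
        features[f"bank{bank}_bills"] = len(rows)
    for col in columns:
        # Aggregates over all of the customer's bills.
        for stat, value in aggregate([b[col] for b in bills]).items():
            features[f"{col}_{stat}"] = value
        # The same aggregates restricted to each bank.
        for bank, rows in by_bank.items():
            for stat, value in aggregate([b[col] for b in rows]).items():
                features[f"bank{bank}_{col}_{stat}"] = value
    return features


toy = [
    {"bank_id": 1, "amount": 100.0},
    {"bank_id": 1, "amount": 300.0},
    {"bank_id": 2, "amount": 50.0},
]
feats = bill_features(toy, columns=["amount"])
print(feats["total_bills"], feats["n_banks"], feats["amount_avg"])  # 3 2 150.0
```

Banks where a customer has no bills produce no keys in this sketch; a fixed-width feature matrix would need those entries filled with a default value.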

Bank detail

  • too many users are missing from this file (83.28 % in train, 94.9 % in test); to be considered later
