copyrosicky / kagglemlb

77 / 852 post-mortem and code improvements

License: MIT License

Python 39.16% Jupyter Notebook 60.84%

kagglemlb's Introduction

MLB Player Digital Engagement Forecasting: Post-Mortem and Summary

The post-mortem code is based on the open-sourced 3rd-place solution, with my own improvements.

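The competition task is a regression problem on tabular data. The training data is large, reaching gigabyte scale, so pandas is too slow during the feature-engineering stage. The feature dimensionality is also high, which leaves room for constructing many features.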

Handling the large data volume

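The data is processed in batches; the full dataset usually has to be split into 20-25 chunks (called "epochs" in the original notes) before it is fully processed. For the post-mortem code I followed a high-scoring reference solution: each kind of data gets its own class that defines its data structure, feature processing is done outside the pandas framework, and the constructed features are returned as dictionaries, which speeds up feature processing.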
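The batched, dictionary-based processing described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the 3rd-place code: the column names `playerId` and `target1`, the class `PlayerEngagementStore`, and the helper `iter_chunks` are hypothetical, and a real pipeline would keep far more state per player. Rows are fed in contiguous chunks, per-player state lives in dicts, and features come back as a plain dict instead of a DataFrame row.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def iter_chunks(rows: List[dict], n_chunks: int) -> Iterator[List[dict]]:
    """Split rows into at most n_chunks contiguous batches of near-equal size."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division, never zero
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


class PlayerEngagementStore:
    """Hypothetical per-player state held in plain dicts instead of a DataFrame."""

    def __init__(self) -> None:
        self.count: Dict[int, int] = defaultdict(int)
        self.total: Dict[int, float] = defaultdict(float)

    def update(self, rows: Iterable[dict]) -> None:
        # Assumed row layout: {"playerId": int, "target1": float, ...}
        for row in rows:
            pid = row["playerId"]
            self.count[pid] += 1
            self.total[pid] += row["target1"]

    def features(self, player_id: int) -> Dict[str, float]:
        # Return features as a dict; unseen players get zeros.
        n = self.count.get(player_id, 0)
        return {
            "n_days": float(n),
            "target1_mean": self.total[player_id] / n if n else 0.0,
        }


rows = [
    {"playerId": 1, "target1": 2.0},
    {"playerId": 2, "target1": 5.0},
    {"playerId": 1, "target1": 4.0},
]
store = PlayerEngagementStore()
for chunk in iter_chunks(rows, n_chunks=2):
    store.update(chunk)
print(store.features(1))  # {'n_days': 2.0, 'target1_mean': 3.0}
```

Because only the per-player aggregates stay in memory, each chunk can be discarded after `update()`, which is what makes splitting a multi-gigabyte file into 20-25 pieces workable.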

Handling the high feature dimensionality

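Features were constructed over several rounds, combining lag features, sliding-window features, and per-player and per-team statistical features to extract information from the data.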
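Lag and sliding-window features can be maintained incrementally per key with a bounded `deque`, in the same pandas-free style. In this sketch the class name `LagWindowFeatures` and the default `lags`/`window` values are assumptions for illustration, not the settings of the original solution; keying by team ID instead of player ID gives team-level features from the same code.

```python
import math
from collections import defaultdict, deque
from statistics import mean
from typing import Deque, Dict, Tuple


class LagWindowFeatures:
    """Hypothetical per-key lag and rolling-window features over a daily series."""

    def __init__(self, lags: Tuple[int, ...] = (1, 7), window: int = 7) -> None:
        self.lags = tuple(lags)
        self.window = window
        maxlen = max(max(self.lags), window)  # keep only as much history as needed
        self.history: Dict[int, Deque[float]] = defaultdict(lambda: deque(maxlen=maxlen))

    def features(self, key: int) -> Dict[str, float]:
        """Features built only from values already passed to update()."""
        hist = self.history.get(key, deque())  # .get does not create an entry
        out: Dict[str, float] = {}
        for lag in self.lags:
            # hist[-lag] is the value observed `lag` steps ago, if it exists
            out[f"lag_{lag}"] = hist[-lag] if len(hist) >= lag else math.nan
        recent = list(hist)[-self.window:]
        out[f"roll{self.window}_mean"] = mean(recent) if recent else math.nan
        out[f"roll{self.window}_max"] = max(recent) if recent else math.nan
        return out

    def update(self, key: int, value: float) -> None:
        self.history[key].append(value)


feats = LagWindowFeatures(lags=(1, 2), window=3)
for v in [1.0, 2.0, 3.0, 4.0]:
    feats.update(101, v)
print(feats.features(101))
# {'lag_1': 4.0, 'lag_2': 3.0, 'roll3_mean': 3.0, 'roll3_max': 4.0}
```

Calling `features()` for a day before calling `update()` with that day's value keeps every feature restricted to the past, so the target never leaks into its own lag or window features.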

Reflections and possible improvements

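Could MLB domain knowledge be used to build embedding features in advance from each player's and team's statistics, extracting an embedding vector for every player and team? Could the code be ported to PySpark on the Spark framework to shorten its run time?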
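For the first question, one simple baseline (an assumption, not something this repository implements) is a fixed-length profile vector per player: average each statistic over the player's games, then z-score each column across players so that no statistic dominates by scale. This is not a learned embedding; a learned version might factorize the player-by-statistic matrix or train an embedding layer, but the sketch stays in the standard library. The function name `stat_profile_vectors` and its input layout are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def stat_profile_vectors(stats: Dict[int, List[List[float]]]) -> Dict[int, List[float]]:
    """Map each player ID to a z-scored vector of per-column averages.

    stats[player_id] is a list of game rows; every row has the same numeric columns.
    """
    # 1) Average each statistic column per player.
    avg = {pid: [mean(col) for col in zip(*rows)] for pid, rows in stats.items()}
    # 2) Standardize each column across players; constant columns get sd = 1.
    columns = list(zip(*avg.values()))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]
    return {
        pid: [(v - m) / s for v, m, s in zip(vec, mus, sds)]
        for pid, vec in avg.items()
    }
```

With two players whose averaged stats are `[2.0, 10.0]` and `[4.0, 20.0]`, the vectors come out as `[-1.0, -1.0]` and `[1.0, 1.0]`; the same function applied to team-level rows would give team vectors.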
