ju-ki / my_pipeline

Python 100.00%

my_pipeline's Introduction

Hi 👋, I'm Jukiya

I'm a psychology student interested in web development and game development.

🌱 I'm currently learning game development and web development
📫 How to reach me: [email protected]

my_pipeline's Issues

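About the gb_model parameters

Document the parameters in more detail where possible (cat and xgb in particular have too little)
Read the official documentation carefully
Ideally, also read the papers so that the effect of each parameter is understood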

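About XGBoost

XGB is the only model without features (特徴量, presumably the feature importance output), which feels inconsistent, so add it
It may be better to document the parameters in more detail (not only for xgboost)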

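How features are saved

Features are currently saved under the block name and column name, so changing a value such as n_component means going to Drive and deleting the saved files by hand
If no good fix comes to mind, consider the param management that tubo used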
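
One possible way around the manual deletion is to put a hash of the block's parameters into the saved file name, so that a changed n_component simply maps to a new file. The following is a minimal sketch under that assumption; `feature_cache_path`, the `.pkl` extension, and the directory layout are hypothetical and are not taken from my_pipeline or from tubo's setup.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict


def feature_cache_path(cache_dir: str, block_name: str, column: str,
                       params: Dict[str, Any]) -> Path:
    """Build a cache file path that changes whenever the block parameters change."""
    # Serialize with sorted keys so the same settings always give the same hash.
    payload = json.dumps(params, sort_keys=True, default=str)
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()[:8]
    return Path(cache_dir) / f"{block_name}_{column}_{digest}.pkl"


# Hypothetical usage: a different n_components yields a different file name.
old_path = feature_cache_path("features", "TfidfBlock", "text", {"n_components": 50})
new_path = feature_cache_path("features", "TfidfBlock", "text", {"n_components": 64})
```

Stale files are no longer loaded by mistake with this scheme, although they still accumulate and need occasional cleanup.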

Mysterious error on Kaggle

pip install does not work

Idea for an ensemble helper library

Writing out the paths every time is tedious

Sketch (the most straightforward idea)

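from typing import Callable, List, Optional, Union

import pandas as pd


def ensemble(exp_name: Union[List[str], str], model_name: Union[List[str], str],
             path: Optional[str] = None, target: Optional[pd.Series] = None,
             metric: Optional[Callable] = None, is_oof: bool = False,
             is_logarithm: bool = False) -> pd.DataFrame:
    """Collect OOF or submission predictions of every (experiment, model) pair."""
    # Accept a single name as well as a list of names.
    if isinstance(exp_name, str):
        exp_name = [exp_name]
    if isinstance(model_name, str):
        model_name = [model_name]

    # NOTE: is_logarithm is accepted but not used yet.
    # OOF files hold an "oof" column, submission files a "target" column.
    suffix, src_col = ("oof", "oof") if is_oof else ("sub", "target")
    frames = []
    for exp in exp_name:
        for model in model_name:
            # Name columns per (exp, model) so several models of one experiment do not collide.
            col = f"{exp}_{model}"
            _df = pd.read_csv(f"{path}/{exp}/{exp}_{model}_{suffix}.csv").rename(columns={src_col: col})
            # Print the per-model OOF score when both a target and a metric are given.
            if is_oof and target is not None and metric is not None:
                print(col)
                print(metric(target.values, _df[col].values))
            frames.append(_df)
    return pd.concat(frames, axis=1) if frames else pd.DataFrame()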

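However, this also pulls in models from experiments I do not want to use (for example, lgb and xgb are fine but catboost leaked), so having to drop them afterwards is a bit tedious
As for models, gb_model already has a few, so that part should be fine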
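
One way to avoid dropping unwanted predictions afterwards would be to filter the (experiment, model) pairs before any file is read. The sketch below assumes a hypothetical helper `select_pairs` with an `exclude` argument. Neither exists in the idea above, and the experiment names are made up.

```python
from itertools import product
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[str, str]


def select_pairs(exp_names: Iterable[str], model_names: Iterable[str],
                 exclude: Optional[Iterable[Pair]] = None) -> List[Pair]:
    """Return every (exp, model) combination except the excluded ones."""
    excluded = set(exclude or [])
    return [(exp, model)
            for exp, model in product(exp_names, model_names)
            if (exp, model) not in excluded]


# Hypothetical usage: skip catboost only in the experiment where it leaked.
pairs = select_pairs(["exp001", "exp002"], ["lgb", "xgb", "cat"],
                     exclude=[("exp002", "cat")])
```

The loading loop could then iterate over `pairs` instead of the two nested loops.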

The kaggle_api function does not work

An error is raised and the call gets skipped

Things to improve

lag feature
Add TabNet
feature_block (a function that applies to both train and test)
Add pytorch and tensorflow if possible
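
For the lag feature item, a minimal sketch with pandas could look like the following. The function name `add_lag_features` and all column names are hypothetical, and the sketch assumes the data has a grouping column and a column that defines the time order.

```python
from typing import List

import pandas as pd


def add_lag_features(df: pd.DataFrame, group_col: str, value_col: str,
                     lags: List[int], sort_col: str) -> pd.DataFrame:
    """Add `<value_col>_lag<k>` columns holding the value k rows earlier within each group."""
    # Sort inside each group by time so that "previous row" means "previous time step".
    out = df.sort_values([group_col, sort_col]).copy()
    for lag in lags:
        out[f"{value_col}_lag{lag}"] = out.groupby(group_col)[value_col].shift(lag)
    return out


# Hypothetical usage with made-up data.
sales = pd.DataFrame({
    "store": ["a", "a", "b", "b"],
    "day": [1, 2, 1, 2],
    "sales": [10, 20, 30, 40],
})
lagged = add_lag_features(sales, "store", "sales", lags=[1], sort_col="day")
```

The first row of each group has no earlier value, so its lag column is NaN.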

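plot_values functions

These should be consolidated. There are too many branches, which makes them hard to maintain
The NLP visualizations cannot handle NaN data (work around it with fillna("NaN") or similar)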

LightGBM logger output

https://amalog.hateblo.jp/entry/lightgbm-logging-callback
I want to know whether this also works with verbose=False
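
A rough sketch of such a logging callback is shown below. It relies on LightGBM calling each callback with an `env` object that exposes `iteration` and `evaluation_result_list`. The function name `log_evaluation_to_logger` and the message format are assumptions made for illustration, not the code from the linked article. Whether the callback keeps firing with verbose=False has not been verified here.

```python
import logging


def log_evaluation_to_logger(logger: logging.Logger, period: int = 100,
                             level: int = logging.INFO):
    """Create a LightGBM-style callback that sends evaluation results to `logger`."""

    def _callback(env) -> None:
        # env.evaluation_result_list holds tuples that start with
        # (dataset name, metric name, metric value, ...).
        if period > 0 and env.evaluation_result_list and (env.iteration + 1) % period == 0:
            parts = [f"{item[0]}'s {item[1]}: {item[2]:g}"
                     for item in env.evaluation_result_list]
            logger.log(level, "[%d]\t%s", env.iteration + 1, "\t".join(parts))

    # LightGBM sorts after-iteration callbacks by this attribute.
    _callback.order = 10
    return _callback
```

The returned function would be passed as `callbacks=[log_evaluation_to_logger(logger, period=100)]` to `lgb.train`.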

Text features for tabular data

Manage the dimensionality reduction methods by name (NMF, TruncatedSVD)
confirm_explained_ratio does not work in some cases
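
Managing the reducers by name could be done with a small registry that maps a string to a factory. The sketch below assumes scikit-learn's `TruncatedSVD` and `NMF`. The registry, the names "svd" and "nmf", and the helper functions are hypothetical and not part of my_pipeline.

```python
from typing import Any, Callable, Dict

# Maps a reducer name to a factory that builds the estimator.
REDUCERS: Dict[str, Callable[..., Any]] = {}


def register_reducer(name: str):
    """Decorator that stores a factory in REDUCERS under `name`."""
    def _decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        REDUCERS[name] = factory
        return factory
    return _decorator


@register_reducer("svd")
def _make_svd(n_components: int, random_state: int = 0):
    from sklearn.decomposition import TruncatedSVD  # imported only when needed
    return TruncatedSVD(n_components=n_components, random_state=random_state)


@register_reducer("nmf")
def _make_nmf(n_components: int, random_state: int = 0):
    from sklearn.decomposition import NMF  # imported only when needed
    return NMF(n_components=n_components, random_state=random_state)


def make_reducer(name: str, n_components: int, **kwargs: Any):
    """Look up a reducer by name and build it."""
    if name not in REDUCERS:
        raise ValueError(f"unknown reducer {name!r}; available: {sorted(REDUCERS)}")
    return REDUCERS[name](n_components, **kwargs)
```

A check such as confirm_explained_ratio could then branch on the name, since only some reducers (TruncatedSVD, but not NMF) expose `explained_variance_ratio_`.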
