
my_pipeline's Introduction

Hi 👋, I'm Jukiya

I'm a student majoring in psychology. I'm interested in Web development and Game development.


  • 🌱 I’m currently learning Game development and Web development

  • 📫 How to reach me: [email protected]

Connect with me:

juki_dsandgm jukijuki jukiya

Languages and Tools:

C#, CSS3, Django, Docker, Express, Git, Heroku, HTML5, JavaScript, Laravel, Linux, MySQL, Node.js, pandas, PHP, Python, PyTorch, React, Unity


my_pipeline's People

Contributors

ju-ki


my_pipeline's Issues

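About the gb_model parameters

  • Document them in more detail where possible (the CatBoost and XGBoost parts are far too thin).
  • Read the official documentation carefully.
  • Ideally, also read the papers so I understand what effect each parameter actually has.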

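About XGBoost

  • It feels odd that XGB is the only model without feature (importance) output, so add it.
  • It might be better to document the parameters in more detail (not only for XGBoost).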

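How features are saved

  • Features are currently saved under the block name and column name, so whenever I want to change a value such as n_component, I have to go to Drive and delete the old file by hand.
  • If I can't think of a good fix, consider the parameter management that tubo-san used.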
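One possible direction, sketched under assumptions: fold the block's parameters into the cache file name, so changing n_components writes a new file instead of silently reusing the old one. `param_key`, `load_or_create`, the pickle format and the 8-character MD5 prefix are hypothetical choices for illustration, not the pipeline's existing API and not tubo-san's method.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict

import pandas as pd


def param_key(block_name: str, params: Dict[str, Any]) -> str:
    """Hypothetical helper: block name plus a short hash of its params."""
    # sort_keys makes the hash independent of dict insertion order
    blob = json.dumps(params, sort_keys=True, default=str)
    digest = hashlib.md5(blob.encode("utf-8")).hexdigest()[:8]
    return f"{block_name}_{digest}"


def load_or_create(
    cache_dir: str,
    block_name: str,
    params: Dict[str, Any],
    create_fn: Callable[..., pd.DataFrame],
) -> pd.DataFrame:
    """Hypothetical helper: return cached features for (block_name, params),
    computing and saving them only on a cache miss."""
    path = Path(cache_dir) / f"{param_key(block_name, params)}.pkl"
    if path.exists():
        return pd.read_pickle(path)
    feat = create_fn(**params)
    path.parent.mkdir(parents=True, exist_ok=True)
    feat.to_pickle(path)
    return feat
```

With this layout, `load_or_create(cache_dir, "svd", {"n_components": 16}, make_svd)` and the same call with `n_components=32` land in different files, so nothing has to be deleted by hand; stale files can be cleaned up in bulk later.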

Idea for an ensembling library

  • Writing out every path by hand is tedious.

Sketch (the safest idea)

import pandas as pd
from typing import Callable, List, Optional, Union


def ensemble(exp_name: Union[List[str], str], model_name: List[str], path: str,
             target: Optional[pd.Series] = None, metric: Optional[Callable] = None,
             is_oof: bool = False, is_logarithm: bool = False) -> pd.DataFrame:
    # Collect OOF (is_oof=True) or submission predictions side by side,
    # one column per "{exp}_{model}" pair. is_logarithm is not used yet.
    if isinstance(exp_name, str):
        exp_name = [exp_name]

    kind = "oof" if is_oof else "sub"
    src_col = "oof" if is_oof else "target"  # prediction column inside each CSV
    frames = []
    for exp in exp_name:
        for model in model_name:
            col = f"{exp}_{model}"
            _df = pd.read_csv(f"{path}/{exp}/{exp}_{model}_{kind}.csv").rename(columns={src_col: col})
            if is_oof and target is not None and metric is not None:
                # per-model CV score against the training target
                print(col, metric(target.values, _df[col].values))
            # keep only the prediction column so id columns are not duplicated
            frames.append(_df[[col]])
    return pd.concat(frames, axis=1)
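  • However, this also pulls in models from experiments I don't want to use (e.g. lgb and xgb are fine but catboost leaked), so I have to delete them afterwards, which is a bit of a hassle.
  • The model side is already partly covered by gb_model, so that should be fine.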
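To avoid deleting unwanted runs afterwards, one option is to name them up front. The sketch below assumes the same file layout as the `ensemble` function above (`{path}/{exp}/{exp}_{model}_{kind}.csv`, prediction column `oof` or `target`); `collect_preds` and its `exclude` argument are hypothetical, and a plain mean is used only as the simplest possible blend.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple, Union

import pandas as pd


def collect_preds(
    path: str,
    exp_names: Union[str, List[str]],
    model_names: List[str],
    kind: str = "oof",
    exclude: Optional[Iterable[Tuple[str, str]]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: read {path}/{exp}/{exp}_{model}_{kind}.csv for every
    (exp, model) pair, skipping the pairs listed in `exclude`."""
    if isinstance(exp_names, str):
        exp_names = [exp_names]
    skip: Set[Tuple[str, str]] = set(exclude or [])
    # assumed prediction column inside each CSV
    src_col = "oof" if kind == "oof" else "target"
    cols: Dict[str, pd.Series] = {}
    for exp in exp_names:
        for model in model_names:
            if (exp, model) in skip:
                continue  # e.g. a CatBoost run known to leak
            df = pd.read_csv(f"{path}/{exp}/{exp}_{model}_{kind}.csv")
            cols[f"{exp}_{model}"] = df[src_col]
    return pd.DataFrame(cols)
```

A simple average blend is then `collect_preds(path, ["exp001", "exp002"], ["lgb", "xgb", "cat"], exclude=[("exp002", "cat")]).mean(axis=1)`, where the experiment names are placeholders.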

Things to improve

  • Lag features
  • Add TabNet
  • feature_block (a function that applies the same processing to both train and test)
  • If possible, add PyTorch and TensorFlow
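A minimal sketch of two items from this list: a lag feature, wrapped in a feature_block-style helper that computes it once over train and test together and then splits the result back. It assumes rows are already in time order within each key and that test continues train in time; `LagBlock` and `apply_block` are hypothetical names, not existing functions in the pipeline.

```python
from typing import List, Tuple

import pandas as pd


class LagBlock:
    """Hypothetical block: per-key lag features of one column."""

    def __init__(self, key: str, col: str, lags: List[int]):
        self.key, self.col, self.lags = key, col, lags

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # shift within each group so a lag never leaks across ids
        grouped = df.groupby(self.key)[self.col]
        return pd.DataFrame(
            {f"{self.col}_lag{lag}": grouped.shift(lag) for lag in self.lags},
            index=df.index,
        )


def apply_block(block, train: pd.DataFrame, test: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper: run one block on train+test concatenated, then split."""
    whole = pd.concat([train, test], ignore_index=True)
    feat = block.transform(whole)
    n = len(train)
    return feat.iloc[:n].reset_index(drop=True), feat.iloc[n:].reset_index(drop=True)
```

Concatenating first means the first test row of each id gets its lag from the last train row of that id, which is the usual intent for time-ordered data; for non-temporal blocks the same `apply_block` wrapper keeps train and test encodings consistent.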

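plot_values and related functions

  • These should be consolidated; there are too many branches, which makes them hard to work with.
  • The NLP visualizations don't handle NaN data (work around it with something like fillna("NaN")).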
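One way to cut the branching, sketched under assumptions: map each column kind to a small function that prepares the values to plot, and apply `fillna("NaN")` inside the text handler. The kind names and `prepare_plot_values` are hypothetical; the actual drawing (matplotlib, seaborn, etc.) is left out, so each handler just returns the series that would be plotted.

```python
from typing import Callable, Dict

import pandas as pd


def _numeric_values(s: pd.Series) -> pd.Series:
    # histograms and similar plots just skip missing values
    return s.dropna()


def _category_values(s: pd.Series) -> pd.Series:
    # keep missing values visible as their own "NaN" category
    return s.fillna("NaN").value_counts()


def _text_values(s: pd.Series) -> pd.Series:
    # fillna("NaN") so missing texts count as the single token "NaN"
    # instead of staying NaN through the .str methods
    return s.fillna("NaN").str.split().str.len()


# one lookup table instead of a chain of if/elif branches (hypothetical kinds)
VALUE_HANDLERS: Dict[str, Callable[[pd.Series], pd.Series]] = {
    "numeric": _numeric_values,
    "category": _category_values,
    "text": _text_values,
}


def prepare_plot_values(s: pd.Series, kind: str) -> pd.Series:
    """Hypothetical helper: return the values a plot of this kind would draw."""
    try:
        handler = VALUE_HANDLERS[kind]
    except KeyError:
        raise ValueError(f"unknown kind {kind!r}; expected one of {sorted(VALUE_HANDLERS)}") from None
    return handler(s)
```

Supporting a new kind of plot then means adding one entry to `VALUE_HANDLERS` rather than another branch.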
