
open-solution-cdiscount-starter's Introduction

What is Cdiscount starter?

This is a ready-to-use, end-to-end sample solution for the currently running Kaggle Cdiscount challenge.

It covers data loading and augmentation, model training (with many different architectures), ensembling, and submission generation.

More competitions 🎇

Check out the collection of public projects 🎁, where you can find multiple Kaggle competitions with code, experiments and outputs.

Disclaimer

In this open source solution you will find references to neptune.ml. It is a free platform for community users, which we use daily to keep track of our experiments. Please note that using neptune.ml is not necessary to proceed with this solution. You may run it as a plain Python script 😉.

How to run Cdiscount starter?

Installation

  1. Install the requirements

    pip install -r requirements.txt
  2. Install neptune by running

    pip install neptune-cli
  3. Finish the neptune installation by running

    neptune login
  4. Finally, open neptune and create a project named cdiscount. Note the project key, because you will use it later (most likely it is CDIS).

Now, you are ready to run the code and train some models...

Run code

Remark about the competition data: we have uploaded the data to the neptune platform. It is available in the /public/Cdiscount directory. In addition, we created a meta_data file for the large .bson files in the /public/Cdiscount/meta directory, which makes data loading much faster.

You can run this end-to-end solution in two ways:

  • To work on your own machine, run
    neptune run run_manager.py -- run_pipeline
  • Deploying to the cloud via neptune takes one command:
    • the simplest option is to run

      source run_neptune_command.sh
    • a more advanced option is to run

      neptune send run_manager.py \
      --config experiment_config.yaml \
      --pip-requirements-file requirements.txt \
      --project-key CDIS \
      --environment keras-2.0-gpu-py3 \
      --worker gcp-gpu-medium \
      -- run_pipeline

Collect results and upload to Kaggle

Navigate to /output/project_data/submissions, get your submission file, upload it to Kaggle and check your rank in the competition!

Advanced options

custom data directories

If you do not wish to use the default data directories, you can specify custom paths in data_config.yaml:

raw_data_dir: /public/Cdiscount
meta_data_dir: /public/Cdiscount/meta
meta_data_processed_dir: /output/project_data/meta_processed
models_dir: /output/project_data/models
predictions_dir: /output/project_data/predictions
submissions_dir: /output/project_data/submissions
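As an illustration of how such a file could be consumed, here is a minimal sketch that reads a flat `key: value` configuration into a dictionary of paths. It is not the project's own loader: the function name `parse_flat_config` and the `/data/...` paths are hypothetical, and the parser assumes the file stays a flat mapping with no nesting or lists. A real project would more likely use a YAML library.

```python
from pathlib import Path


def parse_flat_config(text):
    """Parse flat 'key: value' lines into a dict mapping keys to Paths.

    Handles only a flat mapping; blank lines, '#' comments and lines
    without a colon are skipped. (Hypothetical helper, not project code.)
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        config[key.strip()] = Path(value.strip())
    return config


# Hypothetical local paths, standing in for your own directories.
EXAMPLE_CONFIG = """\
raw_data_dir: /data/cdiscount
meta_data_dir: /data/cdiscount/meta
submissions_dir: /data/cdiscount/output/submissions
"""

if __name__ == "__main__":
    for key, path in parse_flat_config(EXAMPLE_CONFIG).items():
        print(f"{key} -> {path}")
```

Converting the values to `Path` objects early makes it easy to check that each directory exists before a long training run starts.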

meta data creation

If you want to create the metadata locally, run

python run_manager.py create_metadata

Your metadata will be stored in the meta_data_dir.
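The README does not describe what the metadata contains. One plausible form, shown here purely as an assumption, is a byte-offset index over the .bson file, so that individual records can be read without scanning the whole file. The sketch below relies only on the BSON format itself: every document starts with a little-endian int32 holding its total length. The function name `index_bson_offsets` is hypothetical and is not part of this repository.

```python
import io
import struct


def index_bson_offsets(stream):
    """Return a list of (offset, length) pairs, one per top-level document.

    Assumes `stream` is a seekable binary stream of concatenated BSON
    documents. (Hypothetical helper, not the project's create_metadata.)
    """
    index = []
    offset = 0
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break  # end of stream
        (length,) = struct.unpack("<i", header)
        if length < 5:
            # The smallest valid document is 4 length bytes plus a terminator.
            raise ValueError(f"invalid document length {length} at offset {offset}")
        index.append((offset, length))
        offset += length
        stream.seek(offset)  # jump over the document body
    return index


if __name__ == "__main__":
    # Two tiny hand-built documents: an empty one and {"a": 7}.
    empty_doc = struct.pack("<i", 5) + b"\x00"
    int_doc = struct.pack("<i", 12) + b"\x10a\x00" + struct.pack("<i", 7) + b"\x00"
    print(index_bson_offsets(io.BytesIO(empty_doc + int_doc)))
```

With such an index stored on disk, a data loader can seek straight to a record's offset and read exactly `length` bytes, which is one way a metadata file could speed up loading.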

data sampling

Since the dataset is very large, we suggest that you sample the training dataset down to a manageable size. Something like the 1000 most common categories and 1000 images per category seems reasonable to start with. You can tune these values in the experiment_config.yaml file; the excerpt below uses 100 categories and 100 images per category:

properties:
  - key: top_categories
    value: 100
  - key: images_per_category
    value: 100
  - key: epochs
    value: 10
  - key: pipeline_name
    value: InceptionPipeline
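To make the effect of `top_categories` and `images_per_category` concrete, here is a minimal sketch of this kind of sampling: keep the most frequent categories, then draw at most a fixed number of images from each. It is an illustration under stated assumptions, not the pipeline's actual code; the function name `sample_training_set` and the `(image_id, category_id)` record layout are hypothetical.

```python
import random
from collections import Counter, defaultdict


def sample_training_set(records, top_categories, images_per_category, seed=0):
    """Sample (image_id, category_id) pairs from the most common categories.

    Keeps the `top_categories` most frequent categories and draws at most
    `images_per_category` images from each, without replacement.
    (Hypothetical helper, not project code.)
    """
    records = list(records)
    counts = Counter(category for _, category in records)
    keep = {category for category, _ in counts.most_common(top_categories)}

    by_category = defaultdict(list)
    for image_id, category in records:
        if category in keep:
            by_category[category].append(image_id)

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for category, image_ids in by_category.items():
        size = min(images_per_category, len(image_ids))
        for image_id in rng.sample(image_ids, size):
            sample.append((image_id, category))
    return sample


if __name__ == "__main__":
    toy = [(i, "A") for i in range(5)] + [(i, "B") for i in range(5, 8)] + [(8, "C")]
    print(sample_training_set(toy, top_categories=2, images_per_category=2))
```

Capping the number of images per category also reduces class imbalance in the sample, which can matter when only a subset of the data is used for training.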

hyperparameter space search

If you would like to search the hyperparameter space, neptune can do this for you. Check out hyperparameter optimization.

training without neptune

We give you an option to run this code without neptune. The transition is seamless, just follow these steps:

  1. Download the competition data to some folder your_raw_data_dir

  2. Specify the data directories in data_config.yaml

  3. Run the pipeline as a plain Python script

      python run_manager.py run_pipeline

Final remarks

Please feel free to modify this code in order to improve your score. Add new models, pre- and post-processing routines or ensembling methods.

Have fun competing on this Kaggle challenge!

open-solution-cdiscount-starter's People

Contributors

anitaka-codilime, cubestone, dependabot[bot], i008, jakubczakon, kamil-kaczmarek


open-solution-cdiscount-starter's Issues

Gets stuck after 10 epochs

Following the README, I ran the command
source run_neptune_command.sh
After 10 epochs (about 1.5 hours), it gets stuck, and stdout stops at

1629.251008,"
 1/79 [..............................] - ETA: 259s
 2/79 [..............................] - ETA: 152s
 3/79 
...
75/79 [===========================>..] - ETA: 2s
76/79 [===========================>..] - ETA: 2s
77/79 [============================>.] - ETA: 1s
78/79 [============================>.] - ETA: 0s
79/79 [==============================] - 52s
"

Is it in the prediction step?
If so, how long should it take?
Others on Kaggle have reported the same problem:
https://www.kaggle.com/c/cdiscount-image-classification-challenge/discussion/41478
It seems that someone was stuck for nearly 4 hours.
