

Distributed training for Criteo dataset on GCP using BigQuery reader

Overview

Sample Structure

  • trainer directory: contains the training package to be submitted to AI Platform.
    • __init__.py is an empty file. It is needed to make this directory a Python package.
    • task.py contains the training code. It creates a simple dummy linear dataset, trains a linear regression model with scikit-learn, and saves the trained model object to a directory (local or GCS) given by the user.
  • scripts directory: command-line scripts to train the model locally or on AI Platform. We recommend running the scripts in this directory in the following order, using the source command so that the environment variables exported at each step are available to the next:
    • train-local.sh trains the model locally using gcloud. It is always a good idea to train the model locally for debugging before submitting it to AI Platform.
    • train-cloud.sh submits a training job to AI Platform.
    • deploy.sh creates a model resource and a model version for the newly trained model.
    • cleanup.sh deletes all the resources created in this tutorial.
  • prediction directory: contains sample Python code to invoke the model for prediction.
    • predict.py invokes the model for some predictions.
  • setup.py: lists all the required Python packages for this tutorial.

Running the Sample

TODO: update

Explaining Key Elements

In this section, we'll highlight the main elements of this sample.

In this sample we are not passing the input dataset as a parameter, but we do need to save the trained model. To keep things simple, the code expects a single argument: the path to store the model in. In other examples, we will be using argparse to process the input arguments; in this sample, we simply read the argument from sys.argv[1].

Also note that we save the model as model.joblib, which is the filename AI Platform expects for models saved with joblib.

Finally, we use tf.gfile from TensorFlow to upload the model to GCS. This does not mean we are actually using TensorFlow to train a model in this sample. You could also use the google.cloud.storage library for uploading to and downloading from GCS. The advantage of tf.gfile is that it works seamlessly whether the file path is local or a GCS bucket.
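A sketch of this pattern, using tf.io.gfile (the TensorFlow 2.x name for tf.gfile). The helper name and the model filename placement are illustrative assumptions; the point is that the same call works for both local paths and gs:// URLs.

```python
import os

import tensorflow as tf


def copy_model(local_path: str, model_dir: str) -> str:
    """Copy a saved model file to model_dir, which may be local or gs://."""
    # tf.io.gfile transparently handles local filesystem paths and
    # GCS URLs, so no branching on the path scheme is needed.
    tf.io.gfile.makedirs(model_dir)
    dest = os.path.join(model_dir, "model.joblib")
    tf.io.gfile.copy(local_path, dest, overwrite=True)
    return dest
```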

TODO: update

The command to run the training job locally is:

gcloud ai-platform local train \
        --module-name=trainer.task \
        --package-path=${PACKAGE_PATH} \
        -- \
        ${MODEL_DIR}
  • module-name is the name of the Python module inside the package that runs the training job.
  • package-path determines where the training Python package is.
  • -- is just a separator. Anything after it is passed to the training job as input arguments.
  • ${MODEL_DIR} will be passed to task.py as sys.argv[1].

TODO: update

To submit a training job to AI Platform, the main command is:

gcloud ai-platform jobs submit training ${JOB_NAME} \
        --job-dir=${MODEL_DIR} \
        --runtime-version=${RUNTIME_VERSION} \
        --region=${REGION} \
        --scale-tier=${TIER} \
        --module-name=trainer.task \
        --package-path=${PACKAGE_PATH}  \
        --python-version=${PYTHON_VERSION} \
        -- \
        ${MODEL_DIR}
  • ${JOB_NAME} is a unique name for each job. We build it from a timestamp so it is unique for each submission.
  • scale-tier selects the machine tier. For this sample, we use BASIC. However, if you need accelerators or distributed training, for instance, you will need a different tier.
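One way to build such a timestamped job name, sketched in Python (the "criteo_training" prefix is an assumption, not taken from the scripts; AI Platform job names allow letters, digits, and underscores):

```python
from datetime import datetime

# Append a UTC timestamp so each submission gets a unique job name.
job_name = "criteo_training_{}".format(
    datetime.utcnow().strftime("%Y%m%d_%H%M%S")
)
print(job_name)
```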
