googlecloudplatform / tensorflow-recommendation-wals

An end-to-end solution for website article recommendations based on Google Analytics data. Uses WALS matrix-factorization in TensorFlow, trained on Cloud ML Engine. Recommendations served with App Engine Flex and Cloud Endpoints. Orchestration is performed using Airflow on Cloud Composer. See the solution tutorials at:

Home Page: https://cloud.google.com/solutions/machine-learning/recommendation-system-tensorflow-overview

License: Apache License 2.0

Python 73.39% Shell 10.63% Jupyter Notebook 15.98%

tensorflow-recommendation-wals's Introduction

Recommendations on GCP with TensorFlow 1.x and WALS

This project deploys a solution for a recommendation service on GCP, using the WALS algorithm in the contrib.factorization module of TensorFlow 1.x. Components include:

  • Recommendation model code, and scripts to train and tune the model on ML Engine
  • A REST endpoint using Google Cloud Endpoints for serving recommendations
  • An Airflow server managed by Cloud Composer (or alternatively, running on GKE) for running scheduled model training

Before you begin

  1. Create a new Cloud Platform project.

  2. Enable billing for your project.

  3. Enable APIs for the following (a gcloud sketch for enabling them follows this list):

  • BigQuery API
  • Cloud Resource Manager
  • AI Platform Training & Prediction API
  • App Engine Admin
  • Container Engine (if using Airflow on GKE)
  • Cloud SQL API (if using Airflow on GKE)
  • Cloud Composer API (if using Cloud Composer for Airflow)
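
If you prefer the command line, the same APIs can be enabled with gcloud. This sketch is not part of the original instructions; enable only the services relevant to your chosen Airflow option:

    # Core APIs used by the tutorial
    gcloud services enable \
        bigquery.googleapis.com \
        cloudresourcemanager.googleapis.com \
        ml.googleapis.com \
        appengine.googleapis.com

    # Only if running Airflow on GKE
    gcloud services enable container.googleapis.com sqladmin.googleapis.com

    # Only if using Cloud Composer for Airflow
    gcloud services enable composer.googleapis.com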

Installation

Option 1: Use Google Cloud Shell

  1. Open the Google Cloud Platform Console.

  2. Click the Cloud Shell icon at the top of the screen.

Option 2: Run Locally in Linux or Mac OS X

These scripts will not work on Windows. If you have a Windows machine, we recommend using Google Cloud Shell.

  1. Download and install the Google Cloud SDK, which includes the gcloud command-line tool.

  2. Initialize the Cloud SDK.

    gcloud init
    
  3. Set your default project (replace YOUR-PROJECT-ID with the ID of your project).

    gcloud config set project YOUR-PROJECT-ID
    

Install Miniconda 2

This project assumes Python 2.

  • Install miniconda 2:

https://docs.conda.io/en/latest/miniconda.html#installing

  • Create environment and install packages:

Install packages in conda.txt:

cd tensorflow-recommendation-wals
conda create -y -n recserve
source activate recserve
conda install -y -n recserve --file conda.txt
  • Install TensorFlow version 1.x. This code should work with any 1.x version; we are using 1.15, the latest as of June 2020.

CPU:

pip install tensorflow==1.15

Or GPU, if one is available in your environment:

pip install tensorflow-gpu==1.15

Install other requirements not available from conda:

pip install -r requirements.txt
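
To confirm the environment is set up correctly, you can check the Python and TensorFlow versions from inside the activated recserve environment (an optional sanity check, not part of the original steps):

python -c "import sys; print(sys.version)"                  # should report Python 2.7.x
python -c "import tensorflow as tf; print(tf.__version__)"  # should report 1.15.x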

Upload sample data to BigQuery

This tutorial comes with a sample Google Analytics data set, containing page tracking events from the Austrian news site Kurier.at. The schema file ga_sessions_sample_schema.json is located in the data folder of the tutorial code, and the data file ga_sessions_sample.json.gz is located in a public Cloud Storage bucket associated with this tutorial. To upload this data set to BigQuery:

  1. Make a GCS bucket with the name recserve_[YOUR-PROJECT-ID]:

    export BUCKET=gs://recserve_$(gcloud config get-value project 2> /dev/null)
    gsutil mb ${BUCKET}
    
  2. Copy the data file ga_sessions_sample.json.gz to the bucket:

    gsutil cp gs://solutions-public-assets/recommendation-tensorflow/data/ga_sessions_sample.json.gz ${BUCKET}/data/ga_sessions_sample.json.gz
    
  3. (Option 1) Go to the BigQuery web UI. Create a new dataset named "GA360_test". In the navigation panel, hover over the dataset, click the down arrow icon, and click Create new table. On the Create Table page, in the Source Data section:

    • For Location, select Google Cloud Storage, and enter the file path [your_bucket]/data/ga_sessions_sample.json.gz (without the gs:// prefix).
    • For File format, select JSON.
    • On the Create Table page, in the Destination Table section, for Table name, choose the dataset and enter ga_sessions_sample as the table name.
    • Verify that Table type is set to Native table.
    • In the Schema section, enter the schema definition.
    • Open the file data/ga_sessions_sample_schema.json in a text editor, select all, and copy the complete text of the file to the clipboard. Click Edit as text and paste the table schema into the text field in the web UI.
    • Click Create Table.
  4. (Option 2) Using the command line:

    export PROJECT=$(gcloud config get-value project 2> /dev/null)
    
    bq --project_id=${PROJECT} mk GA360_test
    
    bq load --source_format=NEWLINE_DELIMITED_JSON \
     GA360_test.ga_sessions_sample \
     ${BUCKET}/data/ga_sessions_sample.json.gz \
     data/ga_sessions_sample_schema.json
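
Once the load job completes, you can optionally verify the table with a quick row count (an extra check, not part of the tutorial steps):

    bq query --use_legacy_sql=false \
     'SELECT COUNT(*) AS sessions FROM `GA360_test.ga_sessions_sample`'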
    

Install WALS model training package and model data

  1. Create a distributable package. Copy the package up to the code folder in the bucket you created previously.

    pushd wals_ml_engine
    python setup.py sdist
    gsutil cp dist/wals_ml_engine-0.1.tar.gz ${BUCKET}/code/
    
  2. Run the WALS model on the sample data set:

    ./mltrain.sh local ../data recommendation_events.csv --data-type web_views --use-optimized
    

This takes a couple of minutes and creates a job directory under wals_ml_engine/jobs, such as "wals_ml_local_20180102_012345/model", containing the model files saved as numpy arrays.
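
Before copying the files in the next step, you can sanity-check the most recent job output (the exact .npy file names depend on the trainer code, so treat this as an optional peek):

    ls -l $(find jobs -name "model" | tail -1)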

  3. Copy the model files from this directory to the model folder in the project bucket:

    export JOB_MODEL=$(find jobs -name "model" | tail -1)
    gsutil cp ${JOB_MODEL}/* ${BUCKET}/model/
    
  4. Copy the sample data file up to the project bucket:

    gsutil cp ../data/recommendation_events.csv ${BUCKET}/data/
    popd
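
At this point the project bucket should contain code, data, and model folders. An optional check (not part of the original steps):

    gsutil ls ${BUCKET}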
    

Install the recserve endpoint

This step can take several minutes to complete. You can do it in a separate shell so that you can deploy the Airflow service in parallel. Remember to run 'source activate recserve' in any new shell you open, to activate the recserve environment.

source activate recserve
  1. Create the App Engine app in your project:

    gcloud app create --region=us-east1
    
  2. Prepare the deploy template for the Cloud Endpoints API:

    cd scripts
    ./prepare_deploy_api.sh                         # Prepare config file for the API.
    

This will output something like:

...
To deploy:  gcloud endpoints services deploy /var/folders/1m/r3slmhp92074pzdhhfjvnw0m00dhhl/T/tmp.n6QVl5hO.yaml
  3. Run the endpoints deploy command output above:

    gcloud endpoints services deploy [TEMP_FILE]
    
  4. Prepare the deploy template for the App Engine app:

    ./prepare_deploy_app.sh
    

You can ignore the script output "ERROR: (gcloud.app.create) The project [...] already contains an App Engine application. You can deploy your application using gcloud app deploy." This is expected.

The script will output something like:

   ...
   To deploy:  gcloud -q app deploy ../app/app_template.yaml_deploy.yaml
  5. Run the command above:

    gcloud -q app deploy ../app/app_template.yaml_deploy.yaml
    

This will take several minutes.

   cd ..
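
Once the deploy finishes, you can optionally confirm the App Engine service is serving by listing its versions (an extra check, not part of the original steps):

    gcloud app versions list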

Deploy the Airflow service

Option 1 (recommended): Use Cloud Composer

Cloud Composer is the GCP managed service for Airflow. It is in beta at the time this code is published.

  1. Create a new Cloud Composer environment in your project:

    CC_ENV=composer-recserve

    gcloud composer environments create $CC_ENV --location us-central1

This process takes a few minutes to complete.

  2. Get the name of the Cloud Storage bucket created for you by Cloud Composer:

    gcloud composer environments describe $CC_ENV \
        --location us-central1 --format="csv[no-heading](config.dag_gcs_prefix)" | sed 's/.\{5\}$//'

In the output, you see the location of the Cloud Storage bucket, like this:

gs://[region-environment_name-random_id-bucket]

This bucket contains subfolders for DAGs and plugins.

  3. Set a shell variable that contains the bucket path from that output:

    export AIRFLOW_BUCKET="gs://[region-environment_name-random_id-bucket]"

  4. Copy the DAG training.py file to the dags folder in your Cloud Composer bucket:

    gsutil cp airflow/dags/training.py ${AIRFLOW_BUCKET}/dags

  5. Import the solution plugins into your Cloud Composer environment:

    gcloud composer environments storage plugins import \
        --location us-central1 --environment ${CC_ENV} --source airflow/plugins/
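
To confirm Cloud Composer picked up the DAG and plugins, you can list the environment's storage. This assumes the storage list subcommands available in recent gcloud SDK versions:

    gcloud composer environments storage dags list \
        --environment ${CC_ENV} --location us-central1
    gcloud composer environments storage plugins list \
        --environment ${CC_ENV} --location us-central1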

Option 2: Create an Airflow cluster running on GKE

This can be done in parallel with the app deploy step in a different shell.

  1. Deploy the Airflow service using the script in airflow/deploy:

    source activate recserve
    cd airflow/deploy
    ./deploy_airflow.sh
    

This will take a few minutes to complete.

  1. Create "dags," "logs" and "plugins" folders in the GCS bucket created by the deploy script named (managed-airflow-{random hex value}), e.g. gs://managed-airflow-e0c99374808c4d4e8002e481. See https://storage.googleapis.com/solutions-public-assets/recommendation-tensorflow/images/airflow_buckets.png. The name of the bucket is available in the ID field of the airflow/deploy/deployment-settings.yaml file created by the deploy script. You can create the folders in the cloud console, or use the following script:

    cd ../..
    python airflow/deploy/create_buckets.py
    
  3. Copy training.py to the dags folder in your airflow bucket:

    export AIRFLOW_BUCKET=`python -c "\
    import yaml;\
    f = open('airflow/deploy/deployment-settings.yaml');\
    settings=yaml.load(f);\
    f.close();\
    print settings['id']"`
    
    gsutil cp airflow/dags/training.py gs://${AIRFLOW_BUCKET}/dags
    
  4. Copy plugins to the plugins folder of your airflow bucket:

    gsutil cp -r airflow/plugins gs://${AIRFLOW_BUCKET}
    
  5. Restart the airflow webserver pod:

    WS_POD=`kubectl get pod | grep -o 'airflow-webserver-[0-9a-z]*-[0-9a-z]*'`
    kubectl get pod ${WS_POD} -o yaml | kubectl replace --force -f -
    

Usage

rec_serve endpoint service

cd scripts
./query_api.sh          # Query the API.
./generate_traffic.sh   # Send traffic to the API.
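
Under the hood, query_api.sh issues an HTTP GET against the Cloud Endpoints API with an API key. The call below is a hypothetical sketch; the path, parameter names, and placeholder values are illustrative, so check scripts/query_api.sh for the exact request:

export PROJECT=$(gcloud config get-value project 2> /dev/null)
curl "https://${PROJECT}.appspot.com/recommendation?userId=[A-CLIENT-ID-FROM-THE-SAMPLE-DATA]&numRecs=5&key=[YOUR-API-KEY]"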

Airflow

The Airflow web console can be used to update the schedule for the DAG, inspect logs, manually execute tasks, etc.

Option 1 (Cloud Composer)

Note that after you create the Cloud Composer environment, it takes approximately 25 minutes for the Airflow web interface to become accessible.

Type this command to print the URL for the Cloud Composer web console:

gcloud composer environments describe $CC_ENV --location us-central1 \
    --format="csv[no-heading](config.airflow_uri)"

You see output that looks like the following:

https://x6c9aa336e72ad0dd-tp.appspot.com

To access the Airflow console for your Cloud Composer instance, go to the URL displayed in the output.

Option 2 (GKE)

You can find the URL and login credentials for the airflow admin interface in the file airflow/deploy/deployment-settings.yaml.

e.g.

...
web_ui_password: IiDYrpwJcT...
web_ui_url: http://35.226.101.220:8080
web_ui_username: airflow

The Airflow service can also be accessed from the airflow-webserver pod in the GKE cluster. Open your project console, navigate to the "Discovery and load balancing" page in GKE, and click on the endpoint link for the airflow-webserver to access the Airflow admin app.

tensorflow-recommendation-wals's People

Contributors

lukmanr


tensorflow-recommendation-wals's Issues

WARNING: The `gcloud ml-engine` commands have been renamed and will soon be removed. Please use `gcloud ai-platform` instead.

Traceback (most recent call last):
  File "/home/contacto/miniconda2/envs/tfrec/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/contacto/miniconda2/envs/tfrec/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/contacto/tensorflow-recommendation-wals/wals_ml_engine/trainer/task.py", line 22, in <module>
    import model
  File "trainer/model.py", line 25, in <module>
    import wals
  File "trainer/wals.py", line 21, in <module>
    from tensorflow.contrib.factorization.python.ops import factorization_ops
ImportError: No module named contrib.factorization.python.ops

ModuleNotFoundError: No module named 'setuptools'

While trying to run the command

./mltrain.sh train ${BUCKET} data/ratings.dat --delimiter ::

the following error appears. I tried deactivating conda and installing setuptools, but it still shows the same error:

Fri Feb 5 10:11:32 UTC 2021
ERROR: (gcloud.ai-platform.jobs.submit.training) Packaging of user Python code failed with message:
Traceback (most recent call last):
  File "/home/ramiz_f/tensorflow-recommendation-wals/wals_ml_engine/setup.py", line 15, in <module>
    from setuptools import find_packages
ModuleNotFoundError: No module named 'setuptools'
Try manually building your Python code by running:
$ python setup.py sdist
and providing the output via the --packages flag (for example, --packages dist/package.tar.gz,dist/package2.whl)
Fri Feb 5 10:11:32 UTC 2021

Can't run mltrain script locally due to <3 arguments

There are only two parameters for the mltrain script if you want to run it locally: 'local' and the path to the data. However, in your mltrain script you don't allow fewer than three arguments to be passed:

if [[ $# < 3 ]]; then
  usage
  exit 1
fi

I have to add a bogus argument to get this to work.
Ex: mltrain.sh local data.csv does not work
Ex: mltrain.sh local garbage data.csv does work

google tutorial: model tuning error

Hi,

I went through the tutorial by Google on recommendation systems with tensorflow given in the link below:
https://cloud.google.com/solutions/machine-learning/recommendation-system-tensorflow-apply-to-analytics-data

I went through this tutorial until I reached the part linked above: Apply to Data from Google Analytics (Part 3).
The training part was successful. However, the tuning part was not. I used the command from the tutorial to run the tuning, which is

./mltrain.sh tune gs://your_bucket data/ga_pageviews.csv --data-type web_views

replacing your_bucket with the name of my bucket, which is rec_sys_bucket_part2.

I get a few errors in the following order in the log:

2019-07-23T19:50:33.494398117Z master-replica-0 Trial Id : 1 usage: task.py [-h] --train-file TRAIN_FILE --job-dir JOB_DIR

2019-07-23T19:50:33.494867086Z master-replica-0 Trial Id : 1 [--latent_factors LATENT_FACTORS] [--num_iters NUM_ITERS]

2019-07-23T19:50:33.495102882Z master-replica-0 Trial Id : 1 [--regularization REGULARIZATION] [--unobs_weight UNOBS_WEIGHT]

2019-07-23T19:50:33.495305061Z master-replica-0 Trial Id : 1 [--wt_type WT_TYPE] [--feature_wt_factor FEATURE_WT_FACTOR]

2019-07-23T19:50:33.495655059Z master-replica-0 Trial Id : 1 [--feature_wt_exp FEATURE_WT_EXP] [--gcs-bucket GCS_BUCKET]

2019-07-23T19:50:33.495904922Z master-replica-0 Trial Id : 1 [--output-dir OUTPUT_DIR] [--verbose-logging] [--hypertune]

2019-07-23T19:50:33.496123075Z master-replica-0 Trial Id : 1 [--data-type DATA_TYPE] [--delimiter DELIMITER] [--headers]

2019-07-23T19:50:33.496273040Z master-replica-0 Trial Id : 1 [--use-optimized]

2019-07-23T19:50:33.496448040Z master-replica-0 Trial Id : 1 task.py: error: argument --train-file: expected one argument

2019-07-23T19:50:34.097594976Z master-replica-0 Trial Id : 1 Command '['python', '-m', u'trainer.task', u'--hypertune', u'--gcs-bucket', u'gs://rec_sys_bucket_part2/ga_pageviews.csv', u'--train-file', u'--data-type', u'--verbose-logging', u'web_views', u'--unobs_weight', u'0.10133471358860466', u'--latent_factors', u'43', u'--regularization', u'4.9950598320961008', u'--feature_wt_exp', u'6.3398703380823136', '--job-dir', 'gs://rec_sys_bucket_part2/ga_pageviews.csv/jobs/wals_ml_tune_20190723_194628/1']' returned non-zero exit status 2

I'm not sure which part of task.py is responsible for this. I added some print lines to check, but I don't see their output when I run the job.

Any clues on how I can debug this?
Thanks,
A

Why python 2?

Hi I am just curious why python 2 version is required to run this project?

Training in GCP failed with error

  File "/root/.local/lib/python2.7/site-packages/trainer/model.py", line 203, in _page_views_train_and_test
    ix = pds_items.searchsorted(item)[0]
IndexError: invalid index to scalar variable.

Problem with /usr/lib/python2.7/runpy.py

WARNING: The gcloud ml-engine commands have been renamed and will soon be removed. Please use gcloud ai-platform instead.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/contacto/tensorflow-recommendation-wals/wals_ml_engine/trainer/task.py", line 22, in <module>
    import model
  File "trainer/model.py", line 20, in <module>
    import pandas as pd
ImportError: No module named pandas

Wrong computation for RMSE ?

for i in xrange(actual.data.shape[0]):
    row_pred = output_row[actual.row[i]]
    col_pred = output_col[actual.col[i]]
    err = actual.data[i] - np.dot(row_pred, col_pred)
    mse += err * err

Are you sure about your calculation of the RMSE?
I may be wrong, but your calculation seems to leave out all the null values of the 'actual' variable (a scipy coo_matrix), because a scipy coo_matrix is a sparse matrix (that is to say, zero values are not stored in the matrix).

So we may need to add those predictions to the error calculation, because in that case a prediction is made but the real value is 0.

Am I clear? Did I miss something? Am I wrong?

problem with Recommendations on GCP with TensorFlow and WALS

WARNING: The gcloud ml-engine commands have been renamed and will soon be removed. Please use gcloud ai-platform instead.
Traceback (most recent call last):
  File "/home/contacto/miniconda2/envs/tfrec/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/contacto/miniconda2/envs/tfrec/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/contacto/tensorflow-recommendation-wals/wals_ml_engine/trainer/task.py", line 22, in <module>
    import model
  File "trainer/model.py", line 25, in <module>
    import wals
  File "trainer/wals.py", line 21, in <module>
    from tensorflow.contrib.factorization.python.ops import factorization_ops
ImportError: No module named contrib.factorization.python.ops
Tue Nov 26 01:21:40 UTC 2019
