PCF + GCP Retail Demo

Diagram showing major components and how data flows. (Caveat: this version has no Persistence Service.)

A demo for retailers to see how PCF and GCP turn streams of data into action.

Prerequisites

  • A Pivotal Cloud Foundry (PCF) installation
  • Install the GCP Service Broker, available on Pivotal Network (source is on GitHub)
  • Java 8 JDK installed
  • CF Command Line Interface (CLI):
    1. Navigate to the releases page on GitHub
    2. Download and install the appropriate one for your platform
  • Git client installed
  • Using the Git client, clone this repo
  • Change into this newly created directory: cd ./oss-pcf-gcp-retail-demo
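
For reference, those last two prerequisites look like this (the GitHub org in the URL is a placeholder; substitute wherever you are cloning this repo from):

    git clone https://github.com/YOUR_ORG/oss-pcf-gcp-retail-demo.git
    cd ./oss-pcf-gcp-retail-demo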

Deploy some Python apps

Several Python applications are used here, and the easiest way to get everything deployed is to get a few of them running first.

  • Image Resizing Service (used by ds_app_09):

    1. git clone https://github.com/cf-platform-eng/image-resizing-service.git
    2. cd ./image-resizing-service/
    3. cf push
    4. cd -
  • The product recommendation is based in part on matching image features. The requirements are as follows:

    1. A Collection of images of inventory items, in JPEG format
    2. The file names of these images encode the SKU, price, and description: SKU-PRICE-THE_FULL_DESCRIPTION.jpg
    3. Example image file name: L57817-42-Polka_Dot_Wrap_Midi_Dress.jpg
    4. The entire image collection is housed within a Google Cloud Storage bucket (can be a different one from the bucket created in the next section)
    5. Within that same bucket, there needs to be a "table of contents" (TOC) file, containing the image names, one per line (with no path)
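
    A minimal sketch of preparing this bucket, assuming the JPEG files sit in the current directory and the TOC file is named toc.txt (both the bucket and TOC names are placeholders; use whatever IMAGE_TOC_URL will point at):

        # One image name per line, no path
        ls *.jpg > toc.txt
        # Copy the images and the TOC into the bucket
        gsutil cp *.jpg toc.txt gs://YOUR_IMAGE_BUCKET/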
  • Inventory Matcher (used by ds_app_15):

    1. cd ./inventory_match/
    2. Edit ./manifest.yml, setting the appropriate value for IMAGE_TOC_URL (see above)
    3. Create an instance of google-storage, specifying your bucket name: cf create-service google-storage standard storage -c '{ "name": "BUCKET_NAME" }'
    4. Push the app without starting it: cf push --no-start
    5. Bind to the storage instance you created: cf bs image-match storage -c '{"role": "editor"}'
    6. Set the health check to process-based during the indexing phase, since the app will not be listening on its port during that time (which could be an hour): cf set-health-check image-match process
    7. Start the app up: cf start image-match
    8. Tail the logs as it indexes images: cf logs image-match
    9. Once indexing completes, the app prints a message saying it is listening on its port. When you see this, run the next step.
    10. Run a standard cf push
    11. It stores the model as a Zip file in your storage bucket, so future restarts will be quicker.
    12. cd -
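
    If you would rather adjust IMAGE_TOC_URL after pushing than edit the manifest, the same value can be set directly on the app (a sketch; the URL is a placeholder):

        cf set-env image-match IMAGE_TOC_URL https://storage.googleapis.com/YOUR_IMAGE_BUCKET/toc.txt
        cf restage image-match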
  • Data Science Interrogator App:

    1. cd ./ds_app_09/
    2. Create an instance of GCP ML: cf cs google-ml-apis default gcp-ml -c '{ "name": "gcp-ml" }'
    3. Push the app without starting it: cf push --no-start
    4. Bind the app to the GCP ML service instance: cf bs ds_app_09 gcp-ml -c '{ "role": "viewer" }'
    5. Create an instance of the Redis service: cf cs p-redis shared-vm redis (NOTE: the code looks specifically for p-redis; a quick way to verify the binding appears after these steps)
    6. Bind this Redis instance to the app: cf bs ds_app_09 redis
    7. Start the app: cf start ds_app_09. Once it starts, the urls: field of the output will contain the value you need in the next step
    8. Create a service based on this app: cf cups ds_app_09-service -p '{ "uri": "http://ds_app_09.YOUR_PCF_INSTALL.DOMAIN" }'
    9. Use the app to bootstrap your termSet (see the code): time curl http://ds_app_09.YOUR_PCF_INSTALL.DOMAIN/genTermSet
    10. Similarly, for labelSet: time curl http://ds_app_09.YOUR_PCF_INSTALL.DOMAIN/genLabelSet/400 (The 400 here is just a significant subset of your inventory set; I've been using a value of about 10% of the total.) This step takes some time, as it hits the GCP Vision API for each image.
    11. cd -
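
    As noted in step 5, the app locates its Redis credentials via the p-redis entry in VCAP_SERVICES. A quick sanity check that both bindings are in place:

        # Both "p-redis" and the google-ml-apis binding should appear under VCAP_SERVICES
        cf env ds_app_09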
  • Data Science Evaluator App:

    1. cd ./ds_app_15/
    2. cf push
    3. Create a service based on this app: cf cups ds_app_15-service -p '{ "uri": "http://ds_app_15.YOUR_PCF_INSTALL.DOMAIN" }'
    4. NOTE: This app exposes a REST endpoint at /lastMessage, returning the most recent JSON data in the system (see the example after these steps)
    5. cd -
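
    To verify the endpoint once the app is up (same domain placeholder as above):

        curl http://ds_app_15.YOUR_PCF_INSTALL.DOMAIN/lastMessage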

Install Spring Cloud Dataflow (SCDF) server

SCDF is the foundation of the data flow through this system. The server orchestrates the data streams, which are composed of modular building blocks: Sources, Processors, and Sinks. A large set of out-of-the-box components is available and, since they are Spring Boot apps, it is easy to build a customized module.

  1. Download the SCDF server and client JAR files, as documented here
  2. Configure the manifest
  3. Ensure RabbitMQ, MySQL, and Redis tiles are installed (using Ops Manager, in PCF)
  4. Create service instances of each of these, using cf cs ... (a sketch of these commands follows this list)
  5. Push this app to PCF on GCP
  6. Ensure it is running
  7. Access the SCDF Dashboard (at https://dataflow-server.YOUR_PCF_INSTALL.DOMAIN/dashboard/)
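
A sketch of step 4, assuming the usual tile service names and plans (confirm the exact names with cf marketplace, and make sure the instance names match those referenced in your SCDF manifest):

    cf cs p-rabbitmq standard rabbit
    cf cs p-mysql 100mb mysql
    cf cs p-redis shared-vm redis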

Set up the SCDF stream "data backbone" consisting of the following components

  • An HTTP Source (item 6 in the diagram), which accepts incoming data from any of the social media or other adapters shown as items 1 through 5 in the diagram. We will use the out-of-the-box HTTP source.
  • A custom SCDF Processor (item 8 in diagram), which hands off the data stream to the data science app shown as number 9 in the diagram
  • A second instance of the same SCDF Processor, which will take the enriched data stream and hand it off to the second data science app, item 15 in the diagram
  • This stream terminates at the SCDF Sink component (item 19 in diagram), which will pass the offer notification to the delivery agent.
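
Conceptually, this backbone is a single SCDF stream expressed in the pipe-and-filter DSL. The sketch below is illustrative only: the labels are assumptions, and the real definition lives in ./scdf/scdf_create_stream.sh (covered in the next sections). The stream name socialmedia matches the deployed app names you will see later.

    stream create socialmedia --definition "http | interrogate: transform-proc | evaluate: transform-proc | offer-sink"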

Build and upload your SCDF modules

  1. Build the Processor project: ( cd ./transform-proc/ && ./build.sh )
  2. Upload the resulting JAR, ./transform-proc/target/transform-proc-0.0.1-SNAPSHOT.jar, into a Cloud Storage bucket, so SCDF is able to access it.
  3. Build the Sink project: ( cd ./offer-sink/ && ./build.sh )
  4. And upload its JAR, ./offer-sink/target/offer-sink-0.0.1-SNAPSHOT.jar, to Cloud Storage.
  5. In your Google Cloud Platform web console, within the Storage view, make each of these JAR files public by checking the Share publicly box (a command-line sketch follows this list)
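
The same upload-and-share flow from the command line, assuming gsutil is installed and BUCKET is your bucket name (gsutil acl ch is the CLI equivalent of the Share publicly box):

    gsutil cp ./transform-proc/target/transform-proc-0.0.1-SNAPSHOT.jar gs://BUCKET/
    gsutil cp ./offer-sink/target/offer-sink-0.0.1-SNAPSHOT.jar gs://BUCKET/
    gsutil acl ch -u AllUsers:R gs://BUCKET/transform-proc-0.0.1-SNAPSHOT.jar
    gsutil acl ch -u AllUsers:R gs://BUCKET/offer-sink-0.0.1-SNAPSHOT.jar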

Set up the SCDF stream

Here, you will register your modules in SCDF, then define and deploy a stream.

  1. Register the default list of modules with the UI
    • Use your browser to go to the bulk import UI
    • Click the Action box associated with the Maven based Stream Applications with RabbitMQ Binder line
    • Verify you have a set of modules by clicking the APPS link in the dashboard UI. They should be listed there.
  2. Edit the script, ./scdf/scdf_create_stream.sh, substituting the appropriate values under # EDIT THESE VALUES
  3. Run the script: ./scdf/scdf_create_stream.sh (NOTE, 24 April 2017: review this script to see whether the binding approach can be simplified). A sketch of the registration commands it wraps follows this list.
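
For orientation, the custom-module registration boils down to commands like these in the SCDF shell (a sketch, not the script's literal contents; the public GCS URLs are the ones from the previous section):

    app register --name transform-proc --type processor --uri https://storage.googleapis.com/BUCKET/transform-proc-0.0.1-SNAPSHOT.jar
    app register --name offer-sink --type sink --uri https://storage.googleapis.com/BUCKET/offer-sink-0.0.1-SNAPSHOT.jar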

Create your Twitter API credentials

  1. Go to http://apps.twitter.com and click the Create New App button.
  2. Follow the steps on that page (this requires a Twitter account); the consumer key and secret will be generated.
  3. After the step above, you will be redirected to your app's page.
  4. There, create an access token under the Your access token section.
  5. Make a note of these for the next step, below.

Set up the Twitter app

  1. cd ./twitter/
  2. Edit ./twitter.py, setting the value for OUR_SCREEN_NAME to the one for your Twitter account.
  3. Copy the ./manifest_template.yml to ./manifest.yml
  4. Edit ./manifest.yml, replacing each instance of YOUR_KEY_HERE with the appropriate value from your Twitter API credential set
  5. Push the app without starting it: cf push --no-start
  6. Using cf apps, note the value in the "urls" column for the app whose name ends in "-http"
  7. Now, create a service named "http-hub" using a URI based on that value: cf cups http-hub -p '{"uri": "http://dataflow-server-hf30QYI-socialmedia-http.YOUR_PCF_INSTALL.DOMAIN"}' (the hostname shown is an example; substitute the one you noted in the previous step)
  8. Bind the Twitter app to this service instance: cf bs twitter http-hub
  9. Start the Twitter app: cf start twitter
  10. Create a service based on the Twitter app: cf cups twitter-service -p '{ "uri": "http://twitter.YOUR_PCF_INSTALL.DOMAIN" }'
  11. Next, bind this service to the offer-sink app: cf bs dataflow-server-SOME_ID_STRING-socialmedia-offer-sink twitter-service
  12. Finally, restage the offer-sink app to get the binding to take effect: cf restage dataflow-server-SOME_ID_STRING-socialmedia-offer-sink
  13. cd -
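
Before relying on live tweets, you can smoke-test the backbone by POSTing directly to the HTTP source noted in step 6 (the hostname is the example from step 7, and the JSON fields are illustrative; use whatever shape twitter.py actually emits):

    curl -X POST http://dataflow-server-hf30QYI-socialmedia-http.YOUR_PCF_INSTALL.DOMAIN \
      -H 'Content-Type: application/json' \
      -d '{"text": "Loving this dress!", "user": "SOME_SCREEN_NAME"}'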

Set up the Web UI

This is a small single-page UI showing the progression from incoming data to outbound offer.

  • cd web-ui
  • Edit the file ./src/main/webapp/scripts/controllers/main.js, changing the line under the comment // EDIT THIS TO CORRESPOND TO THE URL OF YOUR "ds_app_15" as indicated. This is the REST endpoint providing data to this UI.
  • Edit ./src/main/webapp/views/dashboard/home.html, replacing @bohochicsf with your account's Twitter handle.
  • Build: ./mvnw clean package
  • Push: cf push
  • Access the UI in your web browser

Review of the deployed system

SCDF Dashboard Streams View

  • Navigating to the STREAMS link on the SCDF Dashboard app, then clicking the arrow icon pointing to the socialmedia app yields the view shown above.
  • The system should now behave as follows:
    • If we run the Twitter app on our phone and follow bohochicsf,
    • and we tweet something with a positive sentiment which also contains an image of clothing,
    • we should receive a tweet with an offer containing images of items similar to the one we tweeted, as in the image below.

Twitter App Showing Offer
