Copyright 2019 Google LLC. This solution, including any related sample code or data, is made available on an “as is,” “as available,” and “with all faults” basis, solely for illustrative purposes, and without warranty or representation of any kind. This solution is experimental, unsupported and provided solely for your convenience. Your use of it is subject to your agreements with Google, as applicable, and may constitute a beta feature as defined under those agreements. To the extent that you make any data available to Google in connection with your use of the solution, you represent and warrant that you have all necessary and appropriate rights, consents and permissions to permit Google to use and process that data. By using any portion of this solution, you acknowledge, assume and accept all risks, known and unknown, associated with its usage, including with respect to your deployment of any portion of this solution in your systems, or usage in connection with your business, if at all.
Feedloader is a packaged solution for Google Cloud that automates ingestion of Google Merchant Center ("GMC") Shopping feeds, and sends the data to GMC via a Content API interface at minimal cost and maximal performance.
Feedloader also features automatic prevention of product expiration in GMC, along with logging, alerts, and potential for user-contributed optimization expansions.
Users of GMC that have a Multi-Client Account ("MCA") or sub-accounts with Merchant Center IDs and feed files (CSV/TSV format) of their products can leverage Feedloader to automate Shopping data uploads at scale.
The only costs required to operate Feedloader are those to run the GCP environment's components, which are listed below. Cost to run Feedloader will vary depending on usage frequency and volume of data processed.
Pricing for each product can be found on its pricing page, linked below. Feedloader is designed to reduce GCP usage cost as much as possible, and many of the services fall under the free-tier usage limits for most Feedloader use cases.
- App Engine (https://cloud.google.com/appengine/pricing)
- BigQuery (https://cloud.google.com/bigquery/pricing)
- Cloud Build (https://cloud.google.com/cloud-build/pricing)
- Cloud Composer (https://cloud.google.com/composer/pricing)
- Cloud Functions (https://cloud.google.com/functions/pricing)
- Cloud Run (https://cloud.google.com/run/pricing)
- Cloud Source Repositories (https://cloud.google.com/source-repositories/pricing)
- Cloud Storage (https://cloud.google.com/storage/pricing)
- Cloud Tasks (https://cloud.google.com/tasks/docs/pricing)
- Key Management Service (https://cloud.google.com/kms/pricing)
- Pub/Sub (https://cloud.google.com/pubsub/pricing)
- Cloud Logging (https://cloud.google.com/stackdriver/pricing)
This section explains how to set up Feedloader for use in a GCP environment.
DISCLAIMER: This guide assumes that your GCP environment is at default settings and is not customized in any way. Any pre-existing GCP settings modifications, configurations, or customizations may affect the installation and operation of this solution. For the best experience, use a new/unused GCP project when proceeding with the below guide.
- Install the Google Cloud SDK to be able to run CLI commands. Ensure both alpha and beta are installed, and update components to the latest versions:
  - gcloud components install alpha
  - gcloud components install beta
  - gcloud components update
- Install Python version 3 or greater.
  - Then install the following packages:
    pip3 install google-auth requests
- Have a Google Cloud Project available, or create a new one.
  - If a billing account has not been set on the project, set one by following these instructions.
- Clone the code from this repository.
- Move to the root directory of the project and open the file feed_schema_config.json. Customize the mapping by naming the csvHeader fields to match your feed file’s schema/headers.
  - The fields for 'bqColumn' should not be changed, but if your feed has Content API fields that are not in the default JSON schema, additional fields can be added. Refer to the full Content API product spec here.
  - Ensure that the naming and order match your feed file’s headers exactly, or the resulting data in GMC will be corrupted.
  - If your feeds specify different GMC IDs per item, append the following column entry for the GMC ID, and ensure your feed contains the header google_merchant_id for these IDs:
    {"csvHeader": "google_merchant_id", "bqColumn": "google_merchant_id", "columnType": "STRING"}
    See the section MCID-Per-Item for details.
- Edit the env.sh file in the repository root directory and supply values for all of the variables.
  - For the variables ending in "BUCKET", follow the naming conventions for Google Cloud Storage buckets or they will not be created properly.
- Run the initialization script in the root directory:
  bash install_to_gcp.sh
  - Due to dependencies on Cloud Composer environment initialization, the installation can take a long time (~45 minutes).
  - If the script hangs (no log output for more than 30 minutes), press Ctrl-C. This either unblocks the script, if a step was stalled, or terminates it. If it is terminated, you can safely re-run it, because the script is written to be re-runnable.
  - Check the logs of the script for any errors.
- Add the service account created in the script to GMC by following these steps:
  - Navigate to your GMC UI.
  - Go to User access for GMC (section: “Invite a new user”).
  - Add the service account’s email address (replace [PROJECT_ID] with your GCP project ID):
    merchant-center@[PROJECT_ID].iam.gserviceaccount.com
- To deploy the Feedloader solution to your GCP project, you must perform a git push to your GCP project's Cloud Source Repository (reference: Cloud Source Repositories). Cloud Build will auto-trigger on the git push and deploy the code to GCP.
  - Generate a new identifier for Cloud Source Repositories by running the command shown on https://source.developers.google.com/new-password
  - Run the following git commands in the same local repository you cloned and ran the initialization script in:
    git remote add google ssh://[EMAIL]@source.developers.google.com:2022/p/[PROJECT_ID]/r/[REPO_NAME]
    (where PROJECT_ID is your GCP project ID and REPO_NAME is the repository name you set for SOURCE_REPO in env.sh)
    git add -A
    git status
    - Ensure no service_account .json files are being added, just the .enc files and a variables.json file.
    git commit -m "[Your commit message]"
    git push --all google
    (If you see a prompt for choosing a configuration, choose option 1.)
  - This will trigger the build scripts installed in your GCP project to deploy all the code and necessary resources. It should take around 2 minutes. You can check the status in your GCP Console's Cloud Build dashboard or history tab. Ensure that the logs show no errors.
- Feedloader should now be ready to use.
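The header-matching requirement from the feed_schema_config.json step above can be checked locally before any upload. The following is a minimal sketch and not part of Feedloader: it assumes the config file keeps its entries in a list under a top-level "mapping" key (an assumption; adjust it to the actual file layout), the helper names are hypothetical, and the sample header item_id is illustrative only. It compares the csvHeader values, in order, against the first row of a feed file.

```python
import csv
import io
import json


def expected_headers(config_text, list_key="mapping"):
    """Return the csvHeader values in config order.

    Assumption: entries live in a list under `list_key`; a bare top-level
    list of entries is accepted as well.
    """
    config = json.loads(config_text)
    entries = config[list_key] if isinstance(config, dict) else config
    return [entry["csvHeader"] for entry in entries]


def header_mismatches(feed_text, expected, delimiter=","):
    """Compare the feed's first row to the expected headers, position by position.

    Returns a list of (index, expected_header, actual_header) tuples;
    an empty list means names and order match exactly.
    """
    reader = csv.reader(io.StringIO(feed_text), delimiter=delimiter)
    actual = next(reader, [])
    problems = []
    for index in range(max(len(actual), len(expected))):
        want = expected[index] if index < len(expected) else None
        got = actual[index] if index < len(actual) else None
        if want != got:
            problems.append((index, want, got))
    return problems


if __name__ == "__main__":
    # Illustrative data only; real files would be read from disk.
    config_text = json.dumps({"mapping": [
        {"csvHeader": "item_id", "bqColumn": "item_id", "columnType": "STRING"},
        {"csvHeader": "google_merchant_id", "bqColumn": "google_merchant_id",
         "columnType": "STRING"},
    ]})
    feed_text = "item_id\tgoogle_merchant_id\n123\t456\n"
    print(header_mismatches(feed_text, expected_headers(config_text), delimiter="\t"))
```

Pass delimiter="\t" for TSV feeds. Any tuple in the result points at a column whose name or position differs from the config.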
- Set the following environment variables in your shell:
  - GCP_PROJECT: The ID of your GCP project.
  - FEED_PATH: Set this to the directory where your feed data files reside. They must be in CSV or TSV format. It can be a relative path, absolute path, or ~ path, but do not use double quotes if you use a ~ in the path.
  - FEED_BUCKET: The GCP Storage Bucket to upload feed files to, e.g. “gs://shopping-feed-bucket”.
  - UPDATE_BUCKET: The GCP Storage Bucket to upload the EOF file to, e.g. “gs://update-bucket”.
- Log in to GCP with gcloud and set your current project:
  gcloud auth login
  gcloud config set project $GCP_PROJECT
- Upload your feed files to GCP Cloud Storage using this command:
  gsutil cp -j csv $FEED_PATH/* $FEED_BUCKET
- Wait until all feed files are uploaded. Depending on the number and size of the files, it may take several minutes for them to fully load into BigQuery. We recommend waiting approximately 5 minutes for a batch of 100 files at ~50,000 products per file.
- Prepare an empty zero-byte file named “EOF” (do not put it in the same folder as the feed files) and upload it to the UPDATE_BUCKET GCS bucket:
  touch ~/[PATH_TO_EOF]/EOF
  gsutil cp ~/[PATH_TO_EOF]/EOF $UPDATE_BUCKET
- Feedloader will process items that require changes in Content API (new items, updated items, deleted items, and items about to expire), and send them to GMC. You can view the results in your GMC UI.
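The upload steps above depend on FEED_PATH resolving to a folder of CSV/TSV files and on the EOF file living outside that folder. As a rough sketch of a local pre-upload check (the function names are hypothetical and not part of Feedloader), the following Python snippet expands a ~ path, lists the feed files that would be uploaded, and creates a zero-byte EOF file in a separate directory. It does not call gsutil itself.

```python
import os
import tempfile
from pathlib import Path

FEED_SUFFIXES = {".csv", ".tsv"}


def list_feed_files(feed_path):
    """Expand ~ and return the CSV/TSV files directly inside feed_path, sorted."""
    folder = Path(os.path.expanduser(feed_path)).resolve()
    return sorted(
        path for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in FEED_SUFFIXES
    )


def create_eof_file(eof_dir, feed_path):
    """Create a zero-byte file named EOF in eof_dir.

    Raises ValueError if eof_dir is the feed folder, since the EOF file
    must not sit next to the feed files.
    """
    eof_folder = Path(os.path.expanduser(eof_dir)).resolve()
    feed_folder = Path(os.path.expanduser(feed_path)).resolve()
    if eof_folder == feed_folder:
        raise ValueError("EOF must not be placed in the same folder as the feed files")
    eof_folder.mkdir(parents=True, exist_ok=True)
    eof_file = eof_folder / "EOF"
    eof_file.write_bytes(b"")  # zero-byte trigger file
    return eof_file


if __name__ == "__main__":
    # Demo with throwaway directories instead of real feed data.
    feed_dir = tempfile.mkdtemp()
    Path(feed_dir, "sample_feed.csv").write_text("item_id\n123\n")
    for feed_file in list_feed_files(feed_dir):
        print("would upload:", feed_file)
    print("EOF file:", create_eof_file(tempfile.mkdtemp(), feed_dir))
```

The returned paths can then be passed to the gsutil commands shown in the steps above.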
Important: This feature is only supported when all MCIDs fall under a single MCA.
Some users of Feedloader will want to send different items within a single feed file to more than one Merchant Center ID ("MCID").
For these cases, Feedloader supports an optional “google_merchant_id” column in the feed files. Its value specifies the destination Merchant Center for that item row.
If an MCID is specified for an item row, Feedloader automatically routes that item to the MCID via Content API.
The value is used as the account ID in calls to Content API and determines the destination account that the item is sent to.
The “google_merchant_id” column is loaded into BigQuery in the same way as other columns.
Feedloader's behavior when this column is included in the feed file depends on whether the environment variable "IS_MCA" is set to True or False. This behavior is defined in the table below:
Condition | IS_MCA = True | IS_MCA = False |
---|---|---|
"google_merchant_id" column is missing or value is equivalent to boolean False (empty, 0, etc) | Item is "skipped" in processing and not added to the batch + a warning is logged | The env var "DEFAULT_MERCHANT_ID" is used as the destination merchant_id in API calls |
"google_merchant_id" exists and is set to a value not equivalent to boolean False | The "google_merchant_id" value in each row of the feed file is used as the destination merchant_id in API calls | The env var "DEFAULT_MERCHANT_ID" is used as the destination merchant_id in API calls |
Feedloader has the option to send product data to Content API as LIA (Local Inventory Ads) in addition to online ads. If your inventory is also local and you would like to take advantage of LIA, set USE_LOCAL_INVENTORY_ADS in env.sh to "True" and install Feedloader to enable this.
By default, this setting is off, meaning that products are sent only to the online destination when USE_LOCAL_INVENTORY_ADS is not set or is set to "False".
IMPORTANT NOTE: Currently Feedloader does not support "partial-LIA" (i.e. by enabling this setting, all products will be sent as both online and LIA ads).
Before Feedloader sends product data to Content API, it can optionally call a "Shoptimizer" REST API. Shoptimizer is a standalone solution that can be hosted in a Docker container. It includes various optimization logic that attempts to automatically "fix" the product data being uploaded (based on best practices) before the data goes into GMC.
This is controlled by setting the environment variables "SHOPTIMIZER_API_INTEGRATION_ON" and "SHOPTIMIZER_URL" upon installation of Feedloader. See the env.sh file for explanations of these variables.
The Shoptimizer API can also be used standalone, for users that already have a Content API interface.
See the Shoptimizer repository here.
Feedloader comes packaged with suites of unit tests, API integration tests, and end-to-end tests.
Some unit tests are currently run as part of the build process, i.e., the code will not deploy unless all of the tests pass in Cloud Build. These include the Shoptimizer API unit tests.
Other tests will need to be run manually via Ava (for Cloud Functions) and the packaged test runners for Python App Engine code.
Feedloader comes packaged with several bash test scripts that kick off actual data uploads and EOF triggers to run the solution automatically for various use cases. Each time the scripts are run, GCP resources are force-cleaned so that the scripts can be run repeatedly.
This repository may be updated with new releases. To get the newly released code deployed to your GCP environment, either perform a git pull from this repository, or overwrite your local repository with one of the tagged releases. Then perform a git add, commit, and push into your Cloud Source Repository. Refer to the diagram below for an overview.
If infrastructure or environment changes are required (i.e. not solely code-based changes), then documentation will be provided on how to update for that release.
For Developers/Maintainers of a Feedloader deployment, please see this guide for common troubleshooting procedures and frequently asked questions.