kasna-cloud / dataflow-fsi-example

Using Google Cloud, this project is an example of how to detect anomalies in financial technical indicators by modeling their expected distribution, and thereby flag when the Relative Strength Index (RSI) is unreliable.

License: MIT License

Dockerfile 0.03% Java 12.25% Python 1.31% Shell 0.13% HCL 0.20% Jupyter Notebook 86.08%
gcp rsi dataflow fsi tensorflow

Dataflow Financial Services Time-Series Example

This project is an example of how to detect anomalies in financial technical indicators by modeling their expected distribution, and thereby flag when the Relative Strength Index (RSI) is unreliable. RSI is a popular indicator among traders of financial assets, so it helps to understand when it can and cannot be relied on. This example shows how to implement an RSI model using realistic foreign exchange market data, Google Cloud Platform and the Dataflow time-series sample library.

[Image: Dashboards]

The Dataflow samples library is a fast, flexible library for processing time-series data, and is particularly suited to financial market data because of its large volume. Its ability to generate useful metrics in real time significantly reduces the time and effort needed to build machine learning models and solve problems in the finance domain. This library is used in the metrics generator component of this example, and detailed information on its usage can be found in docs.

The GCP infrastructure used in this example includes Dataflow, Pub/Sub, BigQuery, Kubernetes Engine, and AI Platform. Further information on components, flows and diagrams can be found in the docs directory.

A great place to start is to run this example in GCP and view the excellent blog for a detailed walk-through of the solution.

Quickstart

Run from laptop

To install:

  1. Create a new project in GCP.
  2. Install gcloud and set PROJECT_ID.
  3. Execute this script to create the base infrastructure. This will take about 5-10 minutes.
    ./deploy-infra.sh
  4. After this has completed, deploy the pipelines and model by executing the run-app script. This will take about 5 minutes.
    ./run-app.sh
  5. View the Grafana dashboard. The username and password are both your PROJECT_ID; the dashboard location is shown in the Cloud Console and printed in the build log.

Run on Cloud Shell

You can also run this example using Cloud Shell. To begin, log in to the GCP console and select the “Activate Cloud Shell” icon in the top right of your project dashboard. Then run the following:

  1. Clone the repo:
    git clone https://github.com/kasna-cloud/dataflow-fsi-example.git && cd dataflow-fsi-example
  2. Execute this script to create the base infrastructure. This will take about 5-10 minutes.
    ./deploy-infra.sh
  3. After this has completed, deploy the pipelines and model by executing the run-app script. This will take about 5 minutes.
    ./run-app.sh
  4. View the Grafana dashboard. The username and password are both your PROJECT_ID; the dashboard location is shown in the Cloud Console and printed in the build log.

Problem Domain

The Relative Strength Index, or RSI, is a popular financial technical indicator that measures the magnitude of recent price changes to evaluate whether an asset is currently overbought or oversold.
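As a rough illustration of the indicator itself, the sketch below computes RSI as 100 - 100 / (1 + RS), where RS is the average gain divided by the average loss over a look-back window. It makes several assumptions: it uses a simple average rather than Wilder's smoothed average, it defaults to the commonly used 14-period window, it returns 50 for a completely flat window, and the function name `rsi` is hypothetical. It is not necessarily how the metrics pipeline in this repo computes RSI.

```python
from typing import Sequence


def rsi(prices: Sequence[float], period: int = 14) -> float:
    """Return the RSI over the last `period` price changes (simple-average variant)."""
    if period < 1:
        raise ValueError("period must be at least 1")
    if len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")

    # Price changes across the most recent `period` intervals.
    window = list(prices[-(period + 1):])
    changes = [b - a for a, b in zip(window, window[1:])]

    # Average gain and average loss, both expressed as non-negative numbers.
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period

    if avg_loss == 0:
        # No losses in the window: treat as 100, or 50 if prices were flat
        # (the flat case is a choice made for this sketch).
        return 100.0 if avg_gain > 0 else 50.0

    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

A window containing only gains yields 100, only losses yields 0, and equal average gains and losses yield 50.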

To detect whether RSI is reliable for a given asset, the modelling approach is as follows. We train an anomaly detection model to learn the expected behaviour of metrics describing the asset when RSI is greater than 70 or less than 30. When an anomaly is detected, the model is indicating that these input metrics are behaving differently from how they usually behave at those RSI levels, so in these instances RSI is not reliable and a trade is not advised. If no anomaly is detected, the metrics are behaving as expected, so you can trust RSI and make a trade.

NOTE: This blog contains general advice only. It was prepared without taking into account your objectives, financial situation, or needs. You should speak to a financial planner before making a financial decision, and you should speak to a licensed ML practitioner before making an ML decision.
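The decision rule in the modelling approach above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's implementation: it assumes the anomaly score is a reconstruction error compared against a threshold, and the names `Signal`, `evaluate`, `recon_error` and `error_threshold` are hypothetical. Only the 70 and 30 RSI levels come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    rsi_extreme: bool  # RSI is above the upper level or below the lower level
    anomalous: bool    # metrics deviate from their usual behaviour at extreme RSI
    trust_rsi: bool    # RSI is extreme and no anomaly was detected


def evaluate(rsi_value: float, recon_error: float, error_threshold: float,
             lower: float = 30.0, upper: float = 70.0) -> Signal:
    """Combine an RSI reading with an anomaly score (assumed: reconstruction error)."""
    rsi_extreme = rsi_value > upper or rsi_value < lower
    # The model only describes behaviour at extreme RSI, so anomalies are
    # only evaluated there.
    anomalous = rsi_extreme and recon_error > error_threshold
    return Signal(rsi_extreme, anomalous, rsi_extreme and not anomalous)
```

With this shape, a caller would act on RSI only when `trust_rsi` is true; how the threshold is chosen is left open here.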

A deep dive into the problem domain, data science and model creation is available in the Jupyter notebooks in the notebooks folder, which you can run yourself or view on GitHub.

Be sure to view the blog for a detailed walk-through of the solution.

Repo Layout

This repo is organised into folders containing the logical functions of the example. A brief description of each is below:

  • app
    • app/bootstrap_models: The LSTM TFX model, pre-populated with the RSI example so that dashboards can immediately render RSI values. During the run-app.sh deployment of components, this model is uploaded into GCS and a new Cloud Machine Learning model version is created for the inference pipeline to use. This model is then updated by the re-training data pipeline.
    • app/grafana: Contains the visualization configuration used in the Grafana dashboards.
    • app/java: Holds the Dataflow pipeline code that uses the Dataflow samples library. The pipeline creates metrics from the prices stream.
    • app/kubernetes: Deployment manifests for starting the Dataflow pipelines, prices generator and retraining job.
    • app/python: Contains a containerized Python program for:
      • inference and retraining pipelines
      • Pub/Sub to BigQuery pipeline
      • forex generator to create realistic prices
  • docs: Contains further example information and diagrams.
  • infra: Contains the Cloud Build and Terraform code to deploy the GCP infrastructure for this example.
  • notebooks: Contains detailed AI Notebooks which step through the RSI use case from a data science perspective.

Further information is available in the directory READMEs and the docs directory.

Components

This example can be thought of as two distinct logical functions: one for real-time ingestion of prices and determining whether an RSI signal is present, and another for re-training the model to improve its predictions.

The logical diagram for the real-time and training components in GCP is here:

[Image: Logical diagram]

A detailed list of the components and data flows can be found in the FLOWS doc.

Storage Components

  • Three Pub/Sub topics:
    • prices
    • metrics
    • reconerr
  • One BigQuery dataset with three tables, with schemas defined in table_schemas:
    • prices
    • metrics
    • reconerr
  • One AI Platform Model
  • One Cloud SQL Database for ML Metadata

Compute Components

Deployment

This repo uses Java, Python, Cloud Build, Terraform and other technologies which require configuration. For this example we have chosen to store all configuration values in the config.sh file. You can change any value in this file to modify the behaviour or deployment of the example.

This example is designed to be run in a fresh GCP project and requires Owner privileges on the project. All further IAM permissions are set by Cloud Build or Terraform.

Deployment of this example is done in two steps:

  1. Infrastructure deployment into GCP using Cloud Build and Terraform
  2. Application and pipeline deployment using Cloud Build

Both of these Cloud Build steps can be triggered using the deploy-infra.sh and run-app.sh scripts, and require only the Google Cloud SDK (gcloud) to be installed locally.

To install this example repo into your Google Cloud project, follow the instructions in the Quickstart section. If needed, this example can be run using GCP Cloud Shell.

Further information is available in the app and infra directories.

License

This code is licensed under the terms of the MIT license and is available for free.

Links

This repo has been built with the support of Google, Kasna and Eliiza. Links to the relevant documentation, libraries and resources are below:

Contributing

The excellent contributors to this repo are listed in the AUTHORS file and in the git history. If you would like to contribute please see the CODE-OF-CONDUCT and CONTRIBUTING info.

dataflow-fsi-example's People

Contributors

jj11teen, patrickflavel, troybebee, viohman

dataflow-fsi-example's Issues

Limit maxNumWorkers in Dataflow Streaming Engine

Currently there is no limit on the number of workers, which means they can scale up to 100. It would be good practice to set a reasonable limit to prevent a cost explosion if something goes wrong.
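One way to apply such a limit is to make sure the pipeline launch arguments always carry a worker cap. The sketch below assumes the Java Dataflow pipeline is started with a list of command-line options and uses the Dataflow `--maxNumWorkers` pipeline option; the helper name `with_worker_cap` and the example cap of 5 are hypothetical, and how this repo actually assembles its launch arguments is not shown here.

```python
from typing import List


def with_worker_cap(args: List[str], max_workers: int) -> List[str]:
    """Return a copy of `args` with --maxNumWorkers set, unless already present."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")

    # Respect an explicit setting supplied by the caller.
    already_set = any(
        a == "--maxNumWorkers" or a.startswith("--maxNumWorkers=") for a in args
    )
    if already_set:
        return list(args)

    return list(args) + [f"--maxNumWorkers={max_workers}"]


# Example: cap autoscaling at 5 workers (arbitrary value for illustration).
launch_args = with_worker_cap(["--runner=DataflowRunner", "--streaming=true"], 5)
```

Keeping the cap in one helper means every pipeline launched through it gets a limit by default, while an explicitly supplied value is left untouched.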
