
Data Pipelines - Infrastructure

Infrastructure component of a Data Workflow Management system.

This repository manages the customised Docker image build of Airflow. The new Docker image is based on the Docker Hub apache/airflow image. Customised Airflow DAGs and Plugins are built into the image and must be installable as a Python pip package. This provides an immutable deployment, bound within the Docker container at run time. The dependent DAG pip install defaults to a simple Airflow DAG bookend example.

Note

The dependent DAG pip install can be overridden to suit your project's requirements as detailed under Image Build.

Prerequisites

Getting Started

Get the code and change into the top level git project directory:

$ git clone https://github.com/loum/data-pipelines-infrastructure.git && cd data-pipelines-infrastructure

Note

Run all commands from the top-level directory of the git repository.

For first-time setup, get the Makester project:

$ git submodule update --init

Keep Makester project up-to-date with:

$ make submodule-update

Setup the environment:

$ make init

Getting Help

There should be a make target for most things you need to do. Check the help for more information:

$ make help

Image Build

The image build process takes the Docker Hub apache/airflow image and installs a Python package of custom Airflow DAG and Plugin definitions via the normal Python package management process using pip. The upstream dependency for the Airflow image build is defined by the DATA_PIPELINES_DAG_REPO variable in the Makefile. This defaults to git+https://github.com/loum/[email protected], a simple DAG that satisfies the requirements of the new Docker image build. To start the default data-pipelines-infrastructure Docker image build:

$ make build-image
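As a sketch of the override described above (the repository URL, tag, and organisation name below are placeholders, not part of this project), the build command for a custom DAG package takes this shape:

```shell
# Placeholder DAG package source: substitute your own repository and tag.
DAG_REPO="git+https://github.com/your-org/[email protected]"

# Compose the build command with the Makefile default overridden.
CMD="make build-image DATA_PIPELINES_DAG_REPO=${DAG_REPO}"
echo "${CMD}"
```

Any pip-installable source (Git URL, local path or package index) should work here, as the variable is passed straight through to pip.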

To list the available data-pipelines-infrastructure Docker images:

$ make search-image

On a successful build of the Docker image, the typical output is:

REPOSITORY                      TAG                 IMAGE ID            CREATED             SIZE
data-pipelines-infrastructure   eda36a5             0671ab41ad37        19 hours ago        766MB

Here, the TAG is important as it identifies the local data-pipelines-infrastructure build and is used directly in the Infrastructure Build and Setup.

Infrastructure Build and Setup

Start Infrastructure Components

Note

Triggering the local-build-up target forces a make build-image to ensure a data-pipelines-infrastructure Docker image exists locally.

To build a Dockerised Airflow platform running under Celery Executor mode:

$ make local-build-up

Navigate to the Airflow console at http://localhost:<AIRFLOW__WEBSERVER__WEB_SERVER_PORT>

Note

The AIRFLOW__WEBSERVER__WEB_SERVER_PORT value can be identified with:

$ make print-AIRFLOW__WEBSERVER__WEB_SERVER_PORT
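For example (8080 is an assumed value here, not the project default; use the port reported by the target above), the console URL is formed as:

```shell
# Assumed example value: take the real port from
# `make print-AIRFLOW__WEBSERVER__WEB_SERVER_PORT`.
AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8080

URL="http://localhost:${AIRFLOW__WEBSERVER__WEB_SERVER_PORT}"
echo "${URL}"
```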

Destroy Infrastructure Components

To release all Docker resources:

$ make local-build-down

Image Tag

To tag the image as latest:

$ make tag

Or to align with tagging policy <airflow-version>-<data-pipeline-dags-tag>-<image-release-number>:

$ make tag-version

Note

Control version values by setting MAKESTER__VERSION and MAKESTER__RELEASE_NUMBER in the project Makefile.
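As a sketch of how the tag components combine (the version values below, and the mapping of the MAKESTER variables onto the tag components, are assumptions for illustration, not values taken from the project Makefile):

```shell
# Assumed example values for the three tag components.
AIRFLOW_VERSION=2.3.2        # <airflow-version>
MAKESTER__VERSION=0.1.1      # assumed to supply <data-pipeline-dags-tag>
MAKESTER__RELEASE_NUMBER=1   # assumed to supply <image-release-number>

TAG="${AIRFLOW_VERSION}-${MAKESTER__VERSION}-${MAKESTER__RELEASE_NUMBER}"
echo "${TAG}"
```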

Kubernetes Integration

Kubernetes shakeout and troubleshooting.

Prerequisites

(Optional) Convert existing docker-compose.yml to Kubernetes Manifests

Kubernetes provides the kompose conversion tool that can help you migrate to Kubernetes from docker-compose. Ensure that your docker-compose.yml file exists in the top-level directory of your project repository.

To create your Kubernetes manifests:

$ make k8s-manifests

This will deposit the generated Kubernetes manifests under the ./k8s directory.

Create A Local Kubernetes Cluster (Minikube) and Create Resources

Create a Pod and required Services from the manifests under the ./k8s directory:

$ make kube-apply

Interact with Kubernetes Resources

View the Pods and Services:

$ make kube-get

Delete the Pods and Services:

$ make kube-del

Bring up the Airflow Webserver UI

The Kubernetes deployment exposes the Airflow Webserver UI, which can be browsed to. Obtain its URL with:

$ minikube service webserver --url

Cleanup Kubernetes

$ make mk-del
