
Getting started with the official Apache Airflow Helm chart

Introduction

Let's follow the instructions from the webinar Getting Started With the Official Airflow Helm Chart: https://www.youtube.com/watch?v=39k2Sz9jZ2c

Then, we can follow the official documentation of the Apache Airflow Helm Chart: https://airflow.apache.org/docs/helm-chart/stable/index.html

Currently there are at least three Helm charts:

· The first is very complete, with plenty of versions and all the executors, and it lets you load DAG files from an existing ConfigMap or from a Git repository. However, I've found no easy way to add your own Python modules or extra dependencies.

· The second offers, again, several versions, but is based on Helm 2; it also lets you add DAGs with a git-sync sidecar, in a Kubernetes PersistentVolume, or by providing your custom image. Not yet tested.

· The third is the official Apache Airflow chart: of course, the one to use. It comes with great documentation.

Configure cluster

Prerequisites: Docker, Docker Compose, kubectl, Helm... and Kind.

Note: the kubectl cheat sheet at https://kubernetes.io/docs/reference/kubectl/cheatsheet/ is a useful reference.

Let's follow the steps from the Kind quick start (https://kind.sigs.k8s.io/docs/user/quick-start/) to install Kind. (Note: at the end I found this link useful: https://iximiuz.com/en/posts/kubernetes-kind-load-docker-image/ )

$ go get sigs.k8s.io/kind@v0.11.0

$ export PATH=$PATH:$(go env GOPATH)/bin

$ kind version
kind v0.11.0 go1.16.4 linux/amd64
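
The cluster layout is defined in kind-cluster.yaml. Here is a minimal sketch, assuming one control plane plus the three workers that show up in the node listing below (Kind's v1alpha4 config format):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker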

Now let's create the cluster:

$ kind create cluster --name airflow-cluster --config kind-cluster.yaml
Creating cluster "airflow-cluster" ...
 ✓ Ensuring node image (kindest/node:v1.21.1) 🖼
 ✓ Preparing nodes 📦 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-airflow-cluster"
You can now use your cluster with:

kubectl cluster-info --context kind-airflow-cluster

Have a nice day! 👋

Finally, let's verify the cluster:

$ kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:35839
CoreDNS is running at https://127.0.0.1:35839/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

$ kubectl get nodes -o wide
NAME                            STATUS   ROLES                  AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION     CONTAINER-RUNTIME
airflow-cluster-control-plane   Ready    control-plane,master   3m41s   v1.21.1   172.18.0.3    <none>        Ubuntu 20.10   5.8.0-59-generic   containerd://1.5.1
airflow-cluster-worker          Ready    <none>                 3m8s    v1.21.1   172.18.0.2    <none>        Ubuntu 20.10   5.8.0-59-generic   containerd://1.5.1
airflow-cluster-worker2         Ready    <none>                 3m8s    v1.21.1   172.18.0.5    <none>        Ubuntu 20.10   5.8.0-59-generic   containerd://1.5.1
airflow-cluster-worker3         Ready    <none>                 3m8s    v1.21.1   172.18.0.4    <none>        Ubuntu 20.10   5.8.0-59-generic   containerd://1.5.1

Deploy Apache Airflow

Create namespace:

$ kubectl create namespace airflow
namespace/airflow created

$ kubectl get ns
NAME                 STATUS   AGE
airflow              Active   31s
default              Active   9m7s
kube-node-lease      Active   9m9s
kube-public          Active   9m9s
kube-system          Active   9m9s
local-path-storage   Active   9m2s

Add the official repository to Helm.

If you don't have the repo yet:

$ helm repo add apache-airflow https://airflow.apache.org

Otherwise:

$ helm repo add apache-airflow https://airflow.apache.org
"apache-airflow" already exists with the same configuration, skipping

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...
...Successfully got an update from the "apache-airflow" chart repository
...
Update Complete. ⎈Happy Helming!⎈

At the end you'll have:

...
$ helm repo list
NAME                    URL                                                  
apache-airflow          https://airflow.apache.org                           
...

Now you can search for Helm charts in the official Apache Airflow repo:

...
$ helm search repo airflow
NAME                    CHART VERSION   APP VERSION     DESCRIPTION                                       
apache-airflow/airflow  1.0.0           2.0.2           Helm chart to deploy Apache Airflow, a platform...
...

Note: this command finds every chart named airflow across all the repos you have configured.

Once we have the chart we can deploy it:

$ helm install airflow apache-airflow/airflow --namespace airflow --debug --timeout 30m0s
NOTES:
Thank you for installing Apache Airflow 2.0.2!

Your release is named airflow.
You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser:

Airflow Webserver:     kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow
Flower dashboard:      kubectl port-forward svc/airflow-flower 5555:5555 --namespace airflow
Default Webserver (Airflow UI) Login credentials:
    username: admin
    password: admin
Default Postgres connection credentials:
    username: postgres
    password: postgres
    port: 5432

You can get Fernet Key value by running the following:

    echo Fernet Key: $(kubectl get secret --namespace airflow airflow-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode)

$ helm ls -n airflow
NAME    NAMESPACE       REVISION        UPDATED                                         STATUS          CHART          APP VERSION
airflow airflow         1               2021-07-02 13:44:55.554129864 +0200 CEST        deployed        airflow-1.0.0  2.0.2      

Now you can see the deployed pods:

$ kubectl get pods -n airflow
NAME                                 READY   STATUS    RESTARTS   AGE
airflow-flower-54ff94c998-s8kct      1/1     Running   0          1m
airflow-postgresql-0                 1/1     Running   0          1m
airflow-redis-0                      1/1     Running   0          1m
airflow-scheduler-67c4958b86-lz885   2/2     Running   0          1m
airflow-statsd-7586f9998-kbp8m       1/1     Running   0          1m
airflow-webserver-6b447d5776-vfrzw   1/1     Running   0          1m
airflow-worker-0                     2/2     Running   0          1m

Since these are regular pods, you can work with them as usual, for example by viewing their logs:

$ kubectl logs airflow-webserver-6b447d5776-vfrzw -n airflow
10.244.3.1 - - [02/Jul/2021:12:06:49 +0000] "GET /health HTTP/1.1" 200 187 "-" "kube-probe/1.21"
[2021-07-02 12:06:50 +0000] [24] [INFO] Handling signal: ttin
[2021-07-02 12:06:50 +0000] [223] [INFO] Booting worker with pid: 223

$ kubectl logs airflow-worker-0 -n airflow
error: a container name must be specified for pod airflow-worker-0, choose one of: [worker worker-gc] or one of the init containers: [wait-for-airflow-migrations]

$ kubectl get po -o jsonpath='{.items[*].spec.containers[*].name}' -n airflow
flower airflow-postgresql redis scheduler scheduler-gc statsd webserver worker worker-gc

$ kubectl get po airflow-worker-0 -o jsonpath='{.spec.containers[*].name}' -n airflow
worker worker-gc

$ kubectl get po airflow-scheduler-67c4958b86-lz885 -o jsonpath='{.spec.containers[*].name}' -n airflow
scheduler scheduler-gc

$ kubectl logs airflow-worker-0 -n airflow -c worker
[2021-07-02 11:48:15,664: INFO/MainProcess] Connected to redis://:**@airflow-redis:6379/0
[2021-07-02 11:48:15,701: INFO/MainProcess] mingle: searching for neighbors
[2021-07-02 11:48:16,754: INFO/MainProcess] mingle: all alone
[2021-07-02 11:48:16,776: INFO/MainProcess] celery@airflow-worker-0 ready.
[2021-07-02 11:48:19,305: INFO/MainProcess] Events of group {task} enabled by remote.

Now, let's set up a port-forward to access the Airflow UI:

$ kubectl get svc -n airflow
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
airflow-flower                ClusterIP   10.96.140.231   <none>        5555/TCP            79m
airflow-postgresql            ClusterIP   10.96.104.236   <none>        5432/TCP            79m
airflow-postgresql-headless   ClusterIP   None            <none>        5432/TCP            79m
airflow-redis                 ClusterIP   10.96.145.252   <none>        6379/TCP            79m
airflow-statsd                ClusterIP   10.96.10.167    <none>        9125/UDP,9102/TCP   79m
airflow-webserver             ClusterIP   10.96.139.226   <none>        8080/TCP            79m
airflow-worker                ClusterIP   None            <none>        8793/TCP            79m

$ kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080

Now go to http://localhost:8080 and log in with user/password admin/admin.

Configure Airflow

Let's update Airflow to version 2.1.0. First, dump the chart's default values:

$ helm show values apache-airflow/airflow > values.yaml

Now edit:

· defaultAirflowTag to 2.1.0
· airflowVersion to 2.1.0
· executor to KubernetesExecutor

We'll also add a ConfigMap via extraEnvFrom (see the new values).
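
As a sketch, the relevant excerpt of the edited values.yaml would look like this (key names follow the official chart's values; the ConfigMap name matches the one we create next):

defaultAirflowTag: "2.1.0"
airflowVersion: "2.1.0"
executor: "KubernetesExecutor"
extraEnvFrom: |
  - configMapRef:
      name: airflow-variables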

We need to create this ConfigMap in Kubernetes; we'll use variables.yaml for it:
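
A sketch of variables.yaml, consistent with the ConfigMap name and the Airflow Variable we check later (Airflow exposes environment variables prefixed with AIRFLOW_VAR_ as Variables):

apiVersion: v1
kind: ConfigMap
metadata:
  name: airflow-variables
  namespace: airflow
data:
  AIRFLOW_VAR_MY_S3_BUCKET: "my_s3_name"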

$ kubectl apply -f variables.yaml 
configmap/airflow-variables created

$ kubectl get configmap -n airflow
NAME                     DATA   AGE
airflow-airflow-config   1      3h26m
airflow-variables        1      36s
kube-root-ca.crt         1      4h31m

Now, let's upgrade Airflow:

$ helm ls -n airflow
NAME    NAMESPACE       REVISION        UPDATED                                         STATUS          CHART             APP VERSION
airflow airflow         1               2021-07-02 13:44:55.554129864 +0200 CEST        deployed        airflow-1.0.0     2.0.2      

$ helm upgrade --install airflow apache-airflow/airflow --namespace airflow -f values.yaml --debug --timeout 30m0s
...
NOTES:
Thank you for installing Apache Airflow 2.1.0!

Your release is named airflow.
You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser:

Airflow Webserver:     kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow
Default Webserver (Airflow UI) Login credentials:
    username: admin
    password: admin
Default Postgres connection credentials:
    username: postgres
    password: postgres
    port: 5432

You can get Fernet Key value by running the following:

    echo Fernet Key: $(kubectl get secret --namespace airflow airflow-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode)

$ helm ls -n airflow
NAME    NAMESPACE       REVISION        UPDATED                                         STATUS          CHART             APP VERSION
airflow airflow         2               2021-07-02 17:20:23.126990423 +0200 CEST        deployed        airflow-1.0.0     2.0.2      

$ kubectl get po -n airflow
NAME                                 READY   STATUS    RESTARTS   AGE
airflow-postgresql-0                 1/1     Running   0          3h54m
airflow-scheduler-5797df5d6f-mt9rn   2/2     Running   0          1m
airflow-statsd-7586f9998-kbp8m       1/1     Running   0          3h54m
airflow-webserver-7d49bc9c68-cwhx4   1/1     Running   0          1m

Let's access the UI. In another terminal:

$ kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080

Go to http://localhost:8080 with admin/admin

Let's also verify the ConfigMap variable (actually, an Airflow Variable). Using the webserver pod name from the listing above:

$ kubectl exec --stdin --tty airflow-webserver-7d49bc9c68-cwhx4 -n airflow -- /bin/bash
Defaulted container "webserver" out of: webserver, wait-for-airflow-migrations (init)
airflow@airflow-webserver-7d49bc9c68-cwhx4:/opt/airflow$ python
Python 3.6.13 (default, May 12 2021, 16:48:24) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from airflow.models import Variable
>>> Variable.get("my_s3_bucket")
'my_s3_name'
>>> exit()
airflow@airflow-webserver-7d49bc9c68-cwhx4:/opt/airflow$ exit
exit
$
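
This works because extraEnvFrom injects the ConfigMap entries into the Airflow containers as environment variables, and Airflow treats any environment variable prefixed with AIRFLOW_VAR_ as a Variable, so AIRFLOW_VAR_MY_S3_BUCKET becomes my_s3_bucket.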

Next steps

Custom image with your dependencies and DAGs (sketched below).

Load the image in Kind: https://kind.sigs.k8s.io/docs/user/quick-start/#loading-an-image-into-your-cluster.

Follow https://airflow.apache.org/docs/helm-chart/stable/quick-start.html
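
As a sketch of those steps (the image name and tag here are hypothetical; images.airflow.repository and images.airflow.tag are the chart values that point the deployment at your image):

$ docker build -t my-airflow:0.0.1 .
$ kind load docker-image my-airflow:0.0.1 --name airflow-cluster
$ helm upgrade --install airflow apache-airflow/airflow --namespace airflow -f values.yaml --set images.airflow.repository=my-airflow --set images.airflow.tag=0.0.1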

For the time being, let's tear down the cluster:

$ kind get clusters
airflow-cluster

$ kind delete cluster --name airflow-cluster

$ kind get clusters
No kind clusters found.
