
airflow-local-playground's Introduction

Airflow Examples

apache/airflow version: 2.1.3

Other 2.x versions might also work but have not been tested.

Official doc: Run Airflow in Docker

Three executor setups are provided for local testing. The aim is to quickly check DAGs for parsing errors.

  • SequentialExecutor: the scheduler can only run tasks one at a time, because SQLite cannot accept multiple connections. All Airflow components run in one container.
    • Data in SQLite is not preserved; shutting down Airflow wipes all data.
      • To preserve data, create an empty ./airflow.db before the first start and mount it as a volume.
  • LocalExecutor: workers execute tasks concurrently. The scheduler and workers run in the same container.
    • Data in PostgreSQL is preserved with a volume.
  • CeleryExecutor: the scheduler sends tasks to a Redis queue, and Celery workers pull tasks from the queue and execute them. This is the same setup as the official docker-compose file.
    • Data in PostgreSQL is preserved with a volume.
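For the SequentialExecutor case, creating the empty database file could look like the sketch below. It covers only the file creation; the volume mapping itself belongs in the compose file and is not shown here.

```shell
# Create an empty SQLite file only if one does not already exist,
# so an existing database is never overwritten.
db_file="./airflow.db"
[ -f "$db_file" ] || touch "$db_file"
```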

Start Airflow

[All] Set UID and GID in .env

On Linux, volumes mounted in the container use the native Linux filesystem user/group permissions, so make sure the container and the host have matching file permissions.

echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
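The echo -e form relies on the shell's echo supporting -e, which not every shell does (dash, for example, prints the -e literally). A minimal alternative sketch using printf, writing the same two variables to the same .env file:

```shell
# Write AIRFLOW_UID and AIRFLOW_GID with printf, which handles \n
# consistently across POSIX shells.
printf 'AIRFLOW_UID=%s\nAIRFLOW_GID=0\n' "$(id -u)" > .env

# Print the file for a quick visual check.
cat .env
```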

[LocalExecutor/CeleryExecutor] Init Airflow

Run this step once before the first start. It will:

  • Check resources and print the Airflow version
  • Create the logs and plugins folders
  • Run database migrations and create the first user account
    • Because the table schema has evolved, the scheduler cannot run successfully until the migrations have been applied

The log should end with messages like the following:

airflow-init_1       | Upgrades done
airflow-init_1       | [2021-09-06 02:35:47,963] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
airflow-init_1       | Admin user airflow created
airflow-init_1       | 2.1.3
airflow-examples_airflow-init_1 exited with code 0

SequentialExecutor does not need this step: the airflow-init commands (except the resource check and the version printout) are merged into the single container and run when it starts.

  • LocalExecutor
docker-compose -f docker-compose-local.yaml up airflow-init
  • CeleryExecutor
docker-compose -f docker-compose-celery.yaml up airflow-init

[All] Up and Down

Airflow takes some time to start; wait for the UI to become available at localhost:8080.
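Instead of refreshing the browser by hand, the wait can be scripted. The sketch below defines a small retry helper; wait_for is a hypothetical name introduced here and is not part of this repository.

```shell
# wait_for RETRIES DELAY COMMAND...
# Re-runs COMMAND until it succeeds, giving up after RETRIES failed attempts.
# DELAY is the number of seconds to sleep after each failed attempt.
wait_for() {
  retries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # Discard the probe's output; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_for 30 5 curl -fsS http://localhost:8080/health would probe up to 30 times, 5 seconds apart. The /health path is the Airflow 2.x webserver health endpoint; that these compose files expose it unchanged on port 8080 is an assumption.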

  • SequentialExecutor
docker-compose -f docker-compose-sequential.yaml up -d
docker-compose -f docker-compose-sequential.yaml down
  • LocalExecutor
docker-compose -f docker-compose-local.yaml up -d
docker-compose -f docker-compose-local.yaml down
  • CeleryExecutor
docker-compose -f docker-compose-celery.yaml up -d
docker-compose -f docker-compose-celery.yaml down

Login

The default username and password are both airflow. They can be changed with _AIRFLOW_WWW_USER_USERNAME and _AIRFLOW_WWW_USER_PASSWORD.
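One possible way to set these two variables is through the .env file, sketched below. This assumes the compose files here substitute them from the environment the way the official docker-compose file does; the values admin and change-me are placeholders. The variables are read when the first user account is created, so changing them later would not update an existing user.

```shell
# Append custom web UI credentials to .env (placeholder values).
# Assumption: the compose files reference these variables via ${...} substitution.
cat >> .env <<'EOF'
_AIRFLOW_WWW_USER_USERNAME=admin
_AIRFLOW_WWW_USER_PASSWORD=change-me
EOF
```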

Check DAG examples

All DAGs in the dags folder carry the sh tag, which can be used to filter DAGs in the UI. Check README.md or the markdown in each DAG for how to trigger it and observe the result.

Examples given by airflow

Since AIRFLOW__CORE__LOAD_EXAMPLES is true in docker-compose-*.yaml, the Airflow example DAGs also show up in the UI, which is useful for learning how to construct DAGs.

Debug DAGs on the fly

The DAG folder inside the container is /opt/airflow/dags. After docker-compose up, use docker cp to copy local DAG files to that path in the scheduler container. Then click the refresh DAG button in the UI; the DAG is expected to update without restarting the containers.

Where is the scheduler?

  • docker-compose-sequential.yaml: airflow-service
    • container name: airflow-examples_airflow-service_1
  • docker-compose-local.yaml: airflow-scheduler
    • container name: airflow-examples_airflow-scheduler_1
  • docker-compose-celery.yaml: airflow-scheduler
    • container name: airflow-examples_airflow-scheduler_1
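The mapping above can be wrapped in a small helper so the docker cp target does not have to be typed by hand. This is a sketch: scheduler_container and my_dag.py are hypothetical names, and the container names are the ones listed above, which depend on the compose project (directory) name being airflow-examples.

```shell
# Print the scheduler container name for a given executor setup.
scheduler_container() {
  case "$1" in
    sequential)   echo "airflow-examples_airflow-service_1" ;;
    local|celery) echo "airflow-examples_airflow-scheduler_1" ;;
    *)            echo "unknown executor: $1" >&2; return 1 ;;
  esac
}

# Example usage (kept as a comment because it needs running containers;
# my_dag.py is a hypothetical file name):
#   docker cp ./dags/my_dag.py "$(scheduler_container local)":/opt/airflow/dags/
```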

Note: If you are using Airflow < 1.10.7 without DAG serialization, DAG files must be copied to both the webserver and the scheduler. Check dag-serialization for more details.
