
airflow_in_docker_compose's Introduction

You may also be interested in

  • airflow-helper - a fairly fresh command-line tool to set up Apache Airflow connections, variables & pools from a YAML config. It supports config inheritance and can fetch settings from an existing server.

Official Docker-Compose

Note that an official docker-compose.yml already exists: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html (it may be better to use it).
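
At the time of writing, the official quick start documents fetching that compose file with a command along these lines (a sketch; check the linked page for the exact URL for your Airflow version):

    # download the official compose file (the URL pattern may change between versions)
    curl -LfO 'https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml'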

Apache Airflow version 2.0.0

(2.0 is not 100% backward compatible with 1.10+, which is why I moved it to a separate compose file):

RBAC is now turned on by default, which means that to use the Airflow UI you need to create a user first. For this, a command to create a default user was added to the db_init service:

airflow users create --firstname admin --lastname admin --email admin --password admin --username admin --role Admin

Change the login and password as you like. By default the login is admin and the password is admin.
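
For example, to add a user with your own credentials you can run the same command inside the webserver container (a sketch; the service name webserver matches the compose files here, and the values below are placeholders):

    # placeholders: replace myuser / change_me / the email with your own values
    docker-compose exec webserver airflow users create \
        --username myuser --password change_me \
        --firstname My --lastname User \
        --email my.user@example.com --role Admin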

New Apache Airflow 2.0

Note: if you run docker-compose a second (or later) time, you will see this in the init_db log:

    initdb_1     | admin already exist in the db
    airflow_in_docker_compose_initdb_1 exited with code 0

docker-compose-with-celery-executor.yml

NOTE: if you previously ran Airflow 1.10, remove your DB volume files before running 2.0, or change the db init command to db upgrade. See the cleanup sketch below.
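
A minimal cleanup sketch, assuming your Postgres data lives in the ./database folders used by the compose files in this repo:

    docker-compose down
    # removes the old 1.10 metadata DB so 2.0 can run a fresh "airflow db init"
    rm -rf ./database/data ./database/logs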

    git clone https://github.com/xnuinside/airflow_in_docker_compose
    cd airflow_in_docker_compose
    
    docker-compose -f docker-compose-2.0-with-celery-executor.yml up --build

Apache Airflow 2.* with 2 Celery Workers (or more)

Because there was an issue about running Airflow 2.0 with 2 Celery workers, I thought it would be useful to have a docker-compose file with such a setup.

I added it as separate compose file:

docker-compose-2.0-with-celery-executor-2-workers.yml

To check that your workers are up and running, use the Flower UI (it is included in the docker-compose setup): Flower UI with 2 workers
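
As an alternative to a dedicated compose file, docker-compose can also scale a single worker service definition (a sketch, assuming the worker service is named worker as in the compose files here):

    docker-compose -f docker-compose-2.0-with-celery-executor.yml up --build --scale worker=2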

Apache Airflow version 1.10.14:

    git clone https://github.com/xnuinside/airflow_in_docker_compose
    cd airflow_in_docker_compose

    # to run airflow with 1 Celery worker
    docker-compose up --build

Wait until all services come up successfully, then open http://localhost:8080/admin.
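
If you prefer to wait from a script instead of watching the logs, the webserver exposes a /health endpoint you can poll (a minimal sketch):

    # poll until the Airflow webserver answers on /health
    until curl -sf http://localhost:8080/health > /dev/null; do
        echo "webserver is not ready yet, waiting..."
        sleep 5
    done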

FAQ & Help

Docker Compose behaves differently on different operating systems with respect to file system specifics, access rights, etc. I have tested this docker-compose file mostly on macOS; from time to time I also bring it up on WSL (but not for every update).

In the issues you can find cases where something went wrong; they may help you solve your own problem.

Ubuntu Issues:

  1. Permission denied error

WSL Issues:

  1. No DAGs in the UI in Airflow 2.0 & failed airflow init on second runs - not resolved yet

There is also a section at the end of this README.md, https://github.com/xnuinside/airflow_in_docker_compose#for-windows-10-users, with some information for WSL users. It may also help.

Problem connecting to PostgreSQL (on the first run):

If you allocate few resources to Docker or your machine has low performance, bringing PostgreSQL up for the first time can take significant time, and you may see errors like this:

    initdb_1     |  Is the server running on host "postgres" (172.25.0.3) and accepting
    initdb_1     |  TCP/IP connections on port 5432?

Under normal behaviour, the auto-restarts I added in docker-compose mean that all services will be up after 10-15 seconds, but sometimes 3 retries are not enough.
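
For reference, the restart-policy fragment in the compose files looks roughly like this (exact values vary per service; max_attempts is the "3 retries" mentioned above):

    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3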

In this case, I recommend starting the postgres service separately the first time, until you see the message that Postgres is up and ready to accept connections:

    docker-compose -f docker-compose-2.0-with-celery-executor-2-workers.yml up --build postgres
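
To confirm that Postgres is actually ready before starting the rest of the stack, you can run pg_isready inside the container (an optional check; the airflow user matches the compose configuration):

    docker-compose -f docker-compose-2.0-with-celery-executor-2-workers.yml exec postgres pg_isready -U airflow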

If you had any trouble and successfully solved it, please open an issue with the solution; I will add it to this README.md file. Thank you!

Apache Airflow with Docker Compose examples

UPD from July 2020: these articles were created before the release of the official Apache Airflow Docker image, and they use puckel/docker-airflow. The official image apache/airflow now exists, so those docker-compose files became 'legacy' and all sources were moved to 'docker_with_puckel_image'. The main Docker Compose cluster is now based on the apache/airflow image.

Docker-compose config based on the official image (requires docker-compose version 3.7 or higher):

docker-compose-with-celery-executor.yml

And the env file with configuration settings for Airflow (used in docker-compose-with-celery-executor.yml): .env
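
For reference, the kind of settings that .env carries looks like the fragment below (illustrative values; see the actual .env in the repo):

    AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    AIRFLOW__CORE__LOAD_EXAMPLES=False
    AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
    AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow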

Source files for the articles with descriptions on Medium:

Apache Airflow with LocalExecutor: https://medium.com/@xnuinside/quick-guide-how-to-run-apache-airflow-cluster-in-docker-compose-615eb8abd67a

Apache Airflow with CeleryExecutor: https://medium.com/@xnuinside/quick-tutorial-apache-airflow-with-3-celery-workers-in-docker-composer-9f2f3b445e4

Install Python dependencies to docker-compose cluster without re-build images https://medium.com/@xnuinside/install-python-dependencies-to-docker-compose-cluster-without-re-build-images-8c63a431e11c

Changelog

10.12.2022:

  1. Updated version to 2.5.0

20.09.2022:

  1. Updated version to 2.4.0
  2. All files with version 1.* & puckel images moved to the "archive" folder
  3. 2.* became the default version
  4. Updated docker-compose version

03.02.2021:

  1. In the docker-compose files for Airflow 2.0, the scheduler service restart policy was changed to 'any', because for some reason the scheduler exits with code 0 when there is a DB error and init has not finished yet, so the 'on-failure' restart policy does not work.
  2. Added an example for Apache Airflow 2.0 with 2 workers.

02.02.2021:

  1. Added FAQ section with issues that might help
  2. Updated fernet key in .env

18.12.2020:

  1. Added separate docker-compose file for Apache Airflow 2.0 version

16.12.2020:

  1. Update Apache Airflow version to 1.10.14
  2. Change init db command to "airflow db init"

29.11.2020:

  1. Update Apache Airflow version to 1.10.12
  2. Update PostgreSQL DB to 13.1
  3. Added restart_policy to services in docker-compose

07.2020:

  1. All compose files with puckel_image moved to docker_with_puckel_image
  2. Created docker-compose config based on the official image (requires docker-compose version 3.7 and higher): docker-compose-with-celery-executor.yml, and an env file with configuration settings for Airflow (used in docker-compose-with-celery-executor.yml): .env
  3. Added a note for Windows 10 users at the bottom of the README

21.07.2020:

  1. Docker Compose files with puckel images moved to docker_with_puckel_image
  2. Added docker-compose-with-celery.yml based on official image.

18.12.19 changes:

  1. added samples for article https://medium.com/@xnuinside/install-python-dependencies-to-docker-compose-cluster-without-re-build-images-8c63a431e11c (docker-compose-volume-packages.yml, packages.pth, added commented lines to Dockerfile)
  2. added .dockerignore

29.11.19 changes:

  1. Apache Airflow Image was updated to version 1.10.6
  2. Added test_dag into airflow_files

For Windows 10 Users

If you work on Windows 10 and run docker-compose there, you will hit an issue with the postgres service:

FATAL: data directory "/var/lib/postgresql/data/pgdata" has wrong ownership

To solve this issue you must take additional steps (unfortunately there is no quicker workaround; see: https://forums.docker.com/t/data-directory-var-lib-postgresql-data-pgdata-has-wrong-ownership/17963/23 and https://forums.docker.com/t/trying-to-get-postgres-to-work-on-persistent-windows-mount-two-issues/12456/5?u=friism):

  1. Create a docker volume:

    docker volume create --name volume-postgresql -d local

  2. In docker-compose.yml: 2.1 add the volume at the top of the file, next to the 'networks' definition, like this:

    networks:
      airflow:
    
    volumes:
      volume-postgresql:
        external: true
    

    2.2 change the postgres service volumes:

     was:  
    
      - ./database/data:/var/lib/postgresql/data/pgdata
      - ./database/logs:/var/lib/postgresql/data/log
    
     becomes:
    
      - volume-postgresql:/var/lib/postgresql/data/pgdata
      - volume-postgresql:/var/lib/postgresql/data/log
    

Or use WSL and run docker under it.

If you have never used Docker with local folders mounted as volumes under WSL, you may first need to follow this article: https://nickjanetakis.com/blog/setting-up-docker-for-windows-and-wsl-to-work-flawlessly#ensure-volume-mounts-work because by default volumes are not mounted correctly and you will not see any 'dags' in Airflow.

airflow_in_docker_compose's Issues

Create new worker for airflow 2.0.0

When I try to create a new worker for the Celery executor with the following .yml, the webserver dies.

services:
  worker_2:
    image: apache/airflow:2.0.0-python3.8
    env_file:
      - .env
    volumes:
      - ./airflow_files/dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    command: celery worker
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    networks:
      - airflow

[Screenshot from 2021-01-11 at 2:34 PM]

Do you know how to scale out with the airflow 2.0.0 image?

docker image from official

Hi!
Apache has published the official version on Docker Hub.
Have you updated the compose file yet?

Windows docker container

When you run docker-compose in a Windows environment, you may encounter the following two problems:

  1. A Postgres mount error because of wrong permissions, which can be solved by using volumes
  2. A Postgres DNS resolution error, which can be solved by assigning a fixed IP address

The changed docker-compose file and env file are shown below.

  • docker-compose.yml
version: '3.2'
networks:
  airflow:
    ipam:
      config:
        - subnet: 172.32.0.0/16

volumes:
  pgdata:
    driver: local
  pglog: 
    driver: local 

services:
  postgres:
    image: postgres:13.1
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_DB=airflow
      - POSTGRES_PASSWORD=airflow
      - PGDATA=/var/lib/postgresql/data/pgdata
    ports:
      - 5432:5432
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - pgdata:/var/lib/postgresql/data/pgdata
      - pglog:/var/lib/postgresql/data/log
    command: >
     postgres
       -c listen_addresses=*
       -c logging_collector=on
       -c log_destination=stderr
       -c max_connections=200
    networks:
      airflow:
         ipv4_address: 172.32.0.2
      
  redis:
    image: redis:5.0.5
    environment:
      REDIS_HOST: redis
      REDIS_PORT: 6379
    ports:
      - 6379:6379
    networks:
      airflow:
         ipv4_address: 172.32.0.3
      
  webserver:
    env_file:
      - .env
    image: apache/airflow:2.0.0-python3.8
    ports:
      - 8080:8080
    volumes:
      - E:\airflow_in_docker_compose\airflow_files\dags:/opt/airflow/dags
      - E:\airflow_in_docker_compose\logs:/opt/airflow/logs
      - E:\airflow_in_docker_compose\files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
    depends_on:
      - postgres
      - redis
      - initdb
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
    networks:
      airflow:
         ipv4_address: 172.32.0.4
      
  flower:
    image: apache/airflow:2.0.0-python3.8
    env_file:
      - .env
    ports:
      - 5555:5555
    depends_on:
      - redis
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    volumes:
      - E:\airflow_in_docker_compose\logs:/opt/airflow/logs
    command: celery flower
    networks:
      airflow:
         ipv4_address: 172.32.0.5
      
  scheduler:
    image: apache/airflow:2.0.0-python3.8
    env_file:
      - .env
    volumes:
      - E:\airflow_in_docker_compose\airflow_files\dags:/opt/airflow/dags
      - E:\airflow_in_docker_compose\logs:/opt/airflow/logs
      - E:\airflow_in_docker_compose\files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    command: scheduler
    depends_on:
      - initdb
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        window: 120s
    networks:
      airflow:
         ipv4_address: 172.32.0.6
  
  initdb:
    image: apache/airflow:2.0.0-python3.8    
    env_file:
      - .env
    volumes:
      - E:\airflow_in_docker_compose\airflow_files\dags:/opt/airflow/dags
      - E:\airflow_in_docker_compose\logs:/opt/airflow/logs
      - E:\airflow_in_docker_compose\files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    entrypoint: /bin/bash
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 5
    command: -c "airflow db init && airflow users create --firstname admin --lastname admin --email admin --password admin --username admin --role Admin"
    depends_on:
      - redis
      - postgres
    networks:
      airflow:
         ipv4_address: 172.32.0.7
  
  worker:
    image: apache/airflow:2.0.0-python3.8
    env_file:
      - .env
    volumes:
      - E:\airflow_in_docker_compose\airflow_files\dags:/opt/airflow/dags
      - E:\airflow_in_docker_compose\logs:/opt/airflow/logs
      - E:\airflow_in_docker_compose\files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    command: celery worker
    depends_on:
      - scheduler
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    networks:
      - airflow
  • .env file
AIRFLOW__CORE__EXECUTOR=CeleryExecutor
AIRFLOW__WEBSERVER__RBAC=False
AIRFLOW__CORE__CHECK_SLAS=False
AIRFLOW__CORE__STORE_SERIALIZED_DAGS=False
AIRFLOW__CORE__PARALLELISM=50
AIRFLOW__CORE__LOAD_EXAMPLES=False
AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC=10
AIRFLOW__CELERY__BROKER_URL=redis://:@172.32.0.3:6379/0
AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:airflow@172.32.0.2:5432/airflow
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@172.32.0.2:5432/airflow
AIRFLOW__CORE__FERNET_KEY=P_gYHVxUHul5GNhev_Pde-Kr8qvCeurfSCF9OT7cJQM=

Permission Denied Error

When I run docker-compose up --build, I get
PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler'

Full log below:

flower_1     | Unable to load the config, contains a configuration error.
flower_1     | Traceback (most recent call last):
flower_1     |   File "/usr/local/lib/python3.6/logging/config.py", line 565, in configure
flower_1     |     handler = self.configure_handler(handlers[name])
flower_1     |   File "/usr/local/lib/python3.6/logging/config.py", line 738, in configure_handler
flower_1     |     result = factory(**kwargs)
flower_1     |   File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", line 50, in __init__
flower_1     |     os.makedirs(self._get_log_directory())
flower_1     |   File "/usr/local/lib/python3.6/os.py", line 210, in makedirs
flower_1     |     makedirs(head, mode, exist_ok)
flower_1     |   File "/usr/local/lib/python3.6/os.py", line 220, in makedirs
flower_1     |     mkdir(name, mode)
flower_1     | PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler'
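
A commonly reported cause is that the mounted ./logs folder on the host is not writable by the airflow user inside the container (UID 50000 by default in the apache/airflow images). A hedged workaround sketch:

    mkdir -p ./logs
    # 50000 is the default airflow UID in the apache/airflow images; adjust if your image differs
    sudo chown -R 50000:0 ./logs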

InitDB fails: Unable to load the config, contains a configuration error.

I am getting the following error and am not 100% sure how to resolve it. I have now tried on both WSL and Ubuntu systems.

Unable to load the config, contains a configuration error.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/logs/scheduler/2023-07-25'

Latest docker compose not working at startup

Hi,

I just cloned your repo and ran: docker-compose -f docker-compose-with-celery-executor.yml up
But I get these messages:

worker_1 | DB_BACKEND=postgresql+psycopg2
worker_1 | DB_HOST=postgres
worker_1 | DB_PORT=5432
redis_1 | 1:M 13 Aug 2020 14:08:08.174 * DB loaded from disk: 0.000 seconds
redis_1 | 1:M 13 Aug 2020 14:08:08.174 * Ready to accept connections
initdb_1 | DB: postgresql+psycopg2://airflow:***@postgres:5432/airflow
initdb_1 | [2020-08-13 14:08:10,833] {db.py:378} INFO - Creating tables
initdb_1 | Traceback (most recent call last):
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2339, in _wrap_pool_connect
initdb_1 | return fn()
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
initdb_1 | return _ConnectionFairy._checkout(self)
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
initdb_1 | fairy = _ConnectionRecord.checkout(pool)
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
initdb_1 | rec = pool._do_get()
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 140, in _do_get
initdb_1 | self.dec_overflow()
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
initdb_1 | exc_value, with_traceback=exc_tb,
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
initdb_1 | raise exception
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 137, in _do_get
initdb_1 | return self._create_connection()
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
initdb_1 | return _ConnectionRecord(self)
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
initdb_1 | self.__connect(first_connect_check=True)
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
initdb_1 | pool.logger.debug("Error on connect(): %s", e)
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
initdb_1 | exc_value, with_traceback=exc_tb,
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
initdb_1 | raise exception
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
initdb_1 | connection = pool._invoke_creator(self)
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
initdb_1 | return dialect.connect(*cargs, **cparams)
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 493, in connect
initdb_1 | return self.dbapi.connect(*cargs, **cparams)
initdb_1 | File "/home/airflow/.local/lib/python3.6/site-packages/psycopg2/__init__.py", line 127, in connect
initdb_1 | conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
initdb_1 | psycopg2.OperationalError: could not connect to server: Connection refused
initdb_1 | Is the server running on host "postgres" (172.25.0.3) and accepting
initdb_1 | TCP/IP connections on port 5432?
initdb_1 |
initdb_1 |
