
awesome-apache-airflow's Introduction

Awesome Apache Airflow


This is a curated list of resources about Apache Airflow. Please feel free to contribute any items that should be included. Items are generally added at the top of each section so that newer items are featured more prominently.

Contents

Vital links

Airflow deployment solutions

Introductions and tutorials

Airflow Summit 2020 videos

The first Airflow Summit (Airflow Summit 2020) was held in July 2020. It was a truly global, fully online event co-hosted by 9 Airflow Meetups from all over the world (Melbourne, Tokyo, Bangalore, Warsaw, Amsterdam, London, NYC, Bay Area).

It featured 40+ talks and three workshops. You can watch the talk recordings in the Airflow Summit 2020 YouTube playlist or see the individual talks here:

Best practices, lessons learned and cool use cases

Books, blogs, podcasts, and such

Slide deck presentations and online videos

Libraries, Hooks, Utilities

  • Domino - Domino is an open source Graphical User Interface platform for creating data and Machine Learning workflows (DAGs) with no-code, visually intuitive drag-and-drop actions. It is also a standard for publishing and sharing your Python code so it can be automatically used by anyone, directly in the GUI.
  • Airflow-Helper - Sets up Airflow Variables, Connections, and Pools from a YAML configuration file.
  • AirFly - Auto-generates Airflow's dag.py on the fly.
  • DEAfrica Airflow - Airflow libraries used by Digital Earth Africa, a humanitarian effort to utilize satellite imagery of Africa.
  • Airflow plugins - Central collection of repositories of various plugins for Airflow, including Mailchimp, Trello, SFTP, GitHub, etc.
  • fileflow - Collection of modules to support large data transfers between Airflow operators through either the local file system or S3. This addresses a gap where data is too large for XComs but too small or inconvenient for loading directly in the operator. Built by Industry Dive.
  • fairflow - Library to abstract away Airflow's Operators with functional pieces that transform the data from one operator to another.
  • airflow-maintenance-dags - Clairvoyant has a repo of Airflow DAGs that operate on Airflow itself, clearing out various bits of the backing metadata store.
  • test_dags - A more complete solution for DAG integrity tests (DAG integrity tests are the first circle of Data's Inferno).
  • dag-factory - A library for dynamically generating Apache Airflow DAGs from YAML configuration files.
  • whirl - Fast iterative local development and testing of Apache Airflow workflows.
  • airflow-code-editor - A plugin for Apache Airflow that allows you to edit DAGs in the browser.
  • Pylint-Airflow - A Pylint plugin for static code analysis on Airflow code.
  • afctl - A CLI tool that includes everything required to create, manage, and deploy Airflow projects faster and more smoothly.
  • Dag Dependencies viewer - A plugin that creates a view to visualize dependencies between Airflow DAGs.
  • Airflow ECR Plugin - Plugin to refresh AWS ECR login token at regular intervals. This is helpful where DockerOperator needs to pull images hosted on ECR.
  • AirflowK8sDebugger - A library for generating Kubernetes pod YAML templates from an Airflow DAG that uses the KubernetesPodOperator.
  • Oozie to Airflow - A tool to easily convert between Apache Oozie workflows and Apache Airflow workflows.
  • Airflow Ditto - An extensible framework for transforming an Airflow DAG into another DAG that is flow-isomorphic with the original, so that it can run on different environments (e.g. on different clouds, or even on different container frameworks such as Apache Spark on YARN vs. Kubernetes). Comes with out-of-the-box support for EMR-to-HDInsight DAG transforms.
  • gusty - Create a DAG using any number of YAML, Python, Jupyter Notebook, or R Markdown files that represent individual tasks in the DAG. gusty also configures dependencies, DAGs, and TaskGroups, features support for your local operators, and more. A fully containerized demo is available here.
  • Meltano - Open source, self-hosted, CLI-first, debuggable, and extensible ELT tool that embraces Singer for extraction and loading, leverages dbt for transformation, and integrates with Airflow for orchestration.
  • DAG checks - A collection of checks that help you maintain your Apache Airflow instance.
  • Airflow DVC plugin - Plugin integrating DVC, the open-source version-control system for data science and machine learning pipelines.
  • Airflow Vars - A CLI for managing Airflow Variables, created for CD pipelines to enable robust and safe variable management.
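Several of the libraries above (dag-factory, gusty, test_dags, DAG checks) share one idea: describe tasks and their dependencies as data, then check that the result is a valid DAG before deploying it. The sketch below illustrates that idea in plain Python under stated assumptions: the config format (each task ID mapped to a list of upstream task IDs) is hypothetical, and the code does not use the API of any listed project or of Airflow itself. Real DAG integrity tests typically load DAG files through Airflow's DagBag and inspect its import errors instead.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical config format: task_id -> list of upstream task_ids.
# In practice this mapping might be parsed from a YAML file.
PIPELINE = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "notify": ["load"],
}


def validate_and_order(config: dict[str, list[str]]) -> list[str]:
    """Check a task-dependency config and return one valid execution order.

    Raises ValueError if a task references an unknown upstream task or if
    the dependencies form a cycle (i.e. the graph is not a DAG).
    """
    # Integrity check 1: every upstream reference must point to a defined task.
    for task_id, upstreams in config.items():
        unknown = [u for u in upstreams if u not in config]
        if unknown:
            raise ValueError(f"{task_id!r} depends on unknown tasks: {unknown}")

    # Integrity check 2: the graph must be acyclic.
    # TopologicalSorter expects a mapping of node -> predecessors.
    sorter = TopologicalSorter(config)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # The second element of exc.args lists the nodes forming the cycle.
        raise ValueError(f"dependency cycle detected: {exc.args[1]}") from exc


if __name__ == "__main__":
    print(validate_and_order(PIPELINE))
```

graphlib requires Python 3.9 or later. Running the file prints ['extract', 'transform', 'load', 'notify'] for the sample pipeline, while a config such as {"a": ["b"], "b": ["a"]} raises ValueError because of the cycle.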

Meetups

Commercial Airflow-as-a-service providers

  • Google Cloud Composer - Google Cloud Composer is a managed service built atop Google Cloud and Airflow.
  • Qubole - Qubole is mainly known as a service-and-support company for Apache Hive, but also provides Airflow as a component of its platform.
  • Astronomer.io - Astronomer provides complete ETL lifecycle solutions and appears to be entirely focused on providing Airflow-based products.
  • AWS MWAA - Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale.

Cloud Composer resources

This section contains articles that apply to Cloud Composer, a service built by Google Cloud on top of Apache Airflow. The tricks and solutions described here are intended for Cloud Composer but may also apply to vanilla Airflow.

Non-English resources

Sample projects

License

CC0

To the extent possible under law, Jakob Homan has waived all copyright and related or neighboring rights to this work.

awesome-apache-airflow's People

Contributors

asafmaor, asandeep, barney-s, basph, chriscardillo, daniel-cortez-stevenson, duyet, eyaltrabelsi, feng-tao, germaintanguy, hankehly, jghoman, kaxil, marcosmarxm, marijaselakovic, mik-laj, msumit, potiuk, prabeesh, ryanchao2012, sann3, siddhantttt, sprohaska, tedmiston, tekn0ir, tfayyaz, turbaszek, villasv, vinayak-mehta, viniciusdsmello

awesome-apache-airflow's Issues

Add ECS Fargate deployment solution

I created an example project demonstrating deployment to Elastic Container Service (ECS). I would like to include a link to it in this repository if you find it meets the criteria. It covers various use cases:

  • autoscale workers to zero
  • route logs to CloudWatch or Kinesis Firehose using fluentbit
  • use remote_logging to put/get task logs to S3
  • use the AWS provider SecretsManagerBackend to store/consume sensitive configuration options in SecretsManager
  • run a single command as a standalone ECS task (e.g. airflow db init)
  • get a shell into a running container via ECS exec
  • send Airflow statsd metrics to CloudWatch

https://github.com/hankehly/deploy-airflow-on-ecs-fargate
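The remote_logging and SecretsManagerBackend items in the list above are regular Airflow configuration options. Airflow reads any option from an environment variable named AIRFLOW__{SECTION}__{KEY}, which is a convenient way to pass configuration to containers in an ECS task definition. The sketch below builds those variables from a nested dict. It is not taken from the linked project: the bucket, prefixes, and connection ID are placeholders, and the [logging] and [secrets] section names assume Airflow 2.x (in Airflow 1.10, the remote logging options lived in [core]).

```python
import json

# Placeholder values (assumptions): replace the bucket, prefixes, and
# connection ID with your own. Section names assume Airflow 2.x.
AIRFLOW_CONFIG = {
    "logging": {
        "remote_logging": "True",
        "remote_base_log_folder": "s3://my-airflow-logs/tasks",
        "remote_log_conn_id": "aws_default",
    },
    "secrets": {
        "backend": "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend",
        # backend_kwargs is passed to the backend as a JSON string.
        "backend_kwargs": json.dumps(
            {
                "connections_prefix": "airflow/connections",
                "variables_prefix": "airflow/variables",
            }
        ),
    },
}


def to_env_vars(config: dict[str, dict[str, str]]) -> dict[str, str]:
    """Flatten {section: {key: value}} into AIRFLOW__SECTION__KEY variables."""
    env = {}
    for section, options in config.items():
        for key, value in options.items():
            env[f"AIRFLOW__{section.upper()}__{key.upper()}"] = value
    return env


if __name__ == "__main__":
    for name, value in to_env_vars(AIRFLOW_CONFIG).items():
        print(f"{name}={value}")
```

Each printed NAME=value pair can go into the environment section of a container definition. Environment values are always strings, which is why remote_logging is "True" rather than a Python bool.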

Add a link to the airflow-dvc project

Airflow-dvc provides a plugin that integrates the DVC tool into Airflow. DVC is a version-control system for ML artifacts, intermediate data-science datasets, and other large files. It is an alternative to Git LFS with better support for pipelines and for reproducing results.

Link to the PR

Ok to add company providing Airflow support and dev services?

Hi all,
was wondering whether it would be considered appropriate to add a company that provides operations, support, and development services for (mostly on-premise) Airflow systems in the "Commercial Airflow-as-a-service providers" category? Looking at the project ecosystem page and this awesome list, there seems to be no other category that fits this kind of service. Yet, I think this is information that people are looking for (as I certainly have been in the past for Airflow as well as various other open source projects).

Any thoughts on this? Did I maybe miss a place where this kind of information would fit better? Or just create a PR and see what happens in review?

Best,
Markus

Docker-based Airflow container for standalone or cluster deployment.

I recently published code for a Docker-based Airflow container in a GitHub repo. The container has many features, some of which are listed below:

  • Supports Airflow versions 1.10.0 and 1.9.0.
  • Standalone or cluster mode deployment of the container.
  • Standalone mode is intended for exploration and learning, and uses the SequentialExecutor with a SQLite database.
  • Cluster mode is intended for production and long-running use cases, and supports two deployment roles: 1) as a server and 2) as a worker.
  • Supports writing logs to AWS S3.
  • Also includes support for GCP.

There are many other salient features. Please include it in your curated list.

Link to the GitHub repo: https://github.com/abhioncbr/docker-airflow
