Giter VIP home page Giter VIP logo

emr-eks-airflow2-plugin's Introduction

EMR on EKS Airflow v2 Plugin

๐Ÿ’ This plugin is now officially incoporated into official Airflow Amazon Provider as of August 2021.

You can find instructions on using it on the Amazon EMR on EKS Operators documentation page.

!! Please note that Airflow 2.1.x or greater is required. If you want to use EMR on EKS on Amazon MWAA, you can follow the instructions below.

It's been tested with self-managed Airflow 2.0 (see Run EMR on EKS jobs on Apache Airflow) and Airflow 2.0 on MWAA.

Requirements

  • Python >= 3.8

Installing on MWAA

If you're running Airflow 2.0 on Amazon WMAA, create a zip file with the plugin and an entrypoint. Then follow the instructions.

    zip -j plugins.zip mwaa/mwaa_plugin.py
    cd emr_containers
    zip -r ../plugins.zip .

Alternatively, you can create a requirements.txt file that points to this repository and contains the following line:

emr-containers @ https://github.com/dacort/emr-eks-airflow2-plugin/archive/main.zip
apache-airflow[amazon]==2.0.2

Usage

See airflow2_emr_eks.py for an example Airflow 2.0 DAG.

That example requires a new Connection in Airflow with the connection ID of emr_eks and the following "extra" config:

{
    "virtual_cluster_id": "abcdefghijklmno0123456789",
    "job_role_arn": "arn:aws:iam::123456789012:role/emr_eks_default_role"
}

(Deprecated) Installing

Airflow 2.0 no longer supports importing plugins via airflow.{operators,sensors,hooks}.<plugin_name, so extensions need to be imported as regular Python modules.

As I'm intending to get this merged into the AWS providers package in Airflow and not a pip-installable package, there are a couple ways of doing this:

  1. Run python setup.py and pip install the resulting wheel file on your Airflow installation

     python setup.py bdist_wheel
    
  2. If you're running a containerized version of Airflow, the Dockerfile builds a custom container based off Airflow 2.1.0

     docker build -t airflow2-emr-eks .
     docker tag airflow2-emr-eks ghcr.io/OWNER/airflow-emr-eks:2.1.0
     docker push ghcr.io/OWNER/airflow-emr-eks:2.1.0
    

emr-eks-airflow2-plugin's People

Contributors

dacort avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Forkers

pserrano

emr-eks-airflow2-plugin's Issues

The plugin doesn't run on Airflow 2.1.0

Environment

  • Airflow 2.1.0
  • Kubernetes Executor

Issue

When I trigger the DAG, the worker pod runs and completes immediately with below logs. EMR job doesn't create inside EMR console.

/home/airflow/.local/lib/python3.6/site-packages/airflow/configuration.py:346 DeprecationWarning: The default_queue option in [celery] has been moved to the default_queue option in [operators] - the old setting has been used, but please update your config.
/home/airflow/.local/lib/python3.6/site-packages/airflow/configuration.py:346 DeprecationWarning: The default_queue option in [celery] has been moved to the default_queue option in [operators] - the old setting has been used, but please update your config.
[2021-06-17 10:09:43,692] {dagbag.py:487} INFO - Filling up the DagBag from /opt/airflow/dags/airflow2_emr_on_k8s.py
Running <TaskInstance: airflow2_emr_on_k8s.start_job 2021-06-16T00:00:00+00:00 [queued]> on host airflow2emronk8sstartjob.ba123f091ceb4988b564036610e0e90d

Other options

I built the airflow image based on your Dockerfile. However, there is a broken issue with Airflow 2.0.1 that we couldn't deploy it. Logs below

/home/airflow/.local/lib/python3.8/site-packages/azure/cosmos/session.py:186 SyntaxWarning: "is not" with a literal. Did you mean "!="?
WARNI [airflow.providers_manager] Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
WARNI [airflow.providers_manager] Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/alembic/script/base.py", line 162, in _catch_revision_errors
    yield
  File "/home/airflow/.local/lib/python3.8/site-packages/alembic/script/base.py", line 356, in _upgrade_revs
    revs = list(revs)
  File "/home/airflow/.local/lib/python3.8/site-packages/alembic/script/revision.py", line 904, in _iterate_revisions
    requested_lowers = self.get_revisions(lower)
  File "/home/airflow/.local/lib/python3.8/site-packages/alembic/script/revision.py", line 455, in get_revisions
    return sum([self.get_revisions(id_elem) for id_elem in id_], ())
  File "/home/airflow/.local/lib/python3.8/site-packages/alembic/script/revision.py", line 455, in <listcomp>
    return sum([self.get_revisions(id_elem) for id_elem in id_], ())
  File "/home/airflow/.local/lib/python3.8/site-packages/alembic/script/revision.py", line 458, in get_revisions
    return tuple(
  File "/home/airflow/.local/lib/python3.8/site-packages/alembic/script/revision.py", line 459, in <genexpr>
    self._revision_for_ident(rev_id, branch_label)
  File "/home/airflow/.local/lib/python3.8/site-packages/alembic/script/revision.py", line 525, in _revision_for_ident
    raise ResolutionError(
alembic.script.revision.ResolutionError: No such revision or branch 'a13f7613ad25'
airflow command error: argument GROUP_OR_COMMAND: `airflow upgradedb` command, has been removed, please use `airflow db upgrade`, see help above.
usage: airflow [-h] GROUP_OR_COMMAND ...

positional arguments:
  GROUP_OR_COMMAND

    Groups:
      celery         Celery components
      config         View configuration
      connections    Manage connections
      dags           Manage DAGs
      db             Database operations
      kubernetes     Tools to help run the KubernetesExecutor
      pools          Manage pools
      providers      Display providers
      roles          Manage roles
      tasks          Manage tasks
      users          Manage users
      variables      Manage variables

    Commands:
      cheat-sheet    Display cheat sheet
      info           Show information about current Airflow and environment
      kerberos       Start a kerberos ticket renewer
      plugins        Dump information about loaded plugins
      rotate-fernet-key
                     Rotate encrypted connection credentials and variables
      scheduler      Start a scheduler instance
      sync-perm      Update permissions for existing roles and DAGs
      version        Show the version
      webserver      Start a Airflow webserver instance
      ```

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.