๐ This plugin is now officially incoporated into official Airflow Amazon Provider as of August 2021.
You can find instructions on using it on the Amazon EMR on EKS Operators documentation page.
!! Please note that Airflow 2.1.x or greater is required. If you want to use EMR on EKS on Amazon MWAA, you can follow the instructions below.
It's been tested with self-managed Airflow 2.0 (see Run EMR on EKS jobs on Apache Airflow) and Airflow 2.0 on MWAA.
- Python >= 3.8
If you're running Airflow 2.0 on Amazon WMAA, create a zip file with the plugin and an entrypoint. Then follow the instructions.
zip -j plugins.zip mwaa/mwaa_plugin.py
cd emr_containers
zip -r ../plugins.zip .
Alternatively, you can create a requirements.txt
file that points to this repository and contains the following line:
emr-containers @ https://github.com/dacort/emr-eks-airflow2-plugin/archive/main.zip
apache-airflow[amazon]==2.0.2
See airflow2_emr_eks.py
for an example Airflow 2.0 DAG.
That example requires a new Connection in Airflow with the connection ID of emr_eks
and the following "extra" config:
{
"virtual_cluster_id": "abcdefghijklmno0123456789",
"job_role_arn": "arn:aws:iam::123456789012:role/emr_eks_default_role"
}
Airflow 2.0 no longer supports importing plugins via airflow.{operators,sensors,hooks}.<plugin_name
, so extensions need to be imported as regular Python modules.
As I'm intending to get this merged into the AWS providers package in Airflow and not a pip-installable package, there are a couple ways of doing this:
-
Run
python setup.py
andpip install
the resulting wheel file on your Airflow installationpython setup.py bdist_wheel
-
If you're running a containerized version of Airflow, the
Dockerfile
builds a custom container based off Airflow 2.1.0docker build -t airflow2-emr-eks . docker tag airflow2-emr-eks ghcr.io/OWNER/airflow-emr-eks:2.1.0 docker push ghcr.io/OWNER/airflow-emr-eks:2.1.0