bashsz / data-on-eks

This project forked from awslabs/data-on-eks

DoEKS is a tool to build, deploy and scale Data Platforms on Amazon EKS

Home Page: https://awslabs.github.io/data-on-eks/

License: Apache License 2.0

Shell 15.49% JavaScript 1.88% Python 13.63% TypeScript 0.77% CSS 0.34% PLpgSQL 20.77% HCL 46.00% Jupyter Notebook 1.01% Dockerfile 0.11%

Data on Amazon EKS (DoEKS)

💥 Welcome to Data on Amazon EKS (DoEKS) 💥

Data on Amazon EKS (DoEKS) is a comprehensive tool that allows you to build scalable data platforms on Amazon EKS, whether you choose an AWS-managed or self-managed approach. This repository provides you with a wealth of resources, including Infrastructure as Code templates (like Terraform, AWS CDK), sample Apache Spark/ML jobs, references to AWS Data blogs, performance benchmark reports, and guidance on best practices for deploying data solutions on Amazon EKS.

Note: DoEKS is actively being developed for various patterns. To see what features are in progress, please check out the issues section of our repository.

🏗️ Architecture

The following diagram illustrates the open source data tools, Kubernetes operators, and frameworks covered by DoEKS, as well as the integration of AWS Data Analytics managed services with DoEKS open source tools.

[Image: DoEKS architecture diagram]

🌟 Features

The Data on EKS (DoEKS) solution is organized into the following areas.

🎯 Data Analytics on EKS

🎯 AI/ML on EKS

🎯 Distributed Databases & Query Engine on EKS

🎯 Streaming Platforms on EKS

🎯 Scheduler Workflow Platforms on EKS

🏃‍♀️ Getting Started

This repository contains a variety of deployment examples for creating data platforms with Amazon EKS clusters and Kubernetes add-ons. The examples below are only a small selection of the available blueprints. Visit the DoEKS website for the complete list.

🚀 EMR on EKS with Karpenter - 👈 Start here if you are new to EMR on EKS. This template deploys an EMR on EKS cluster and uses Karpenter to scale Spark jobs.

🚀 Spark Operator on EKS - This template deploys an EKS cluster and uses Spark Operator and Apache YuniKorn to run self-managed Spark jobs.

🚀 Ray on EKS - This template deploys Ray Operator on EKS with sample scripts.

🚀 Amazon Managed Workflows for Apache Airflow (MWAA) - This template deploys an EMR on EKS cluster and uses Amazon Managed Workflows for Apache Airflow (MWAA) to run Spark jobs.

🚀 Self-managed Airflow on EKS - This template sets up self-managed Apache Airflow on an Amazon EKS cluster, following best practices.

🚀 Argo Workflows on EKS - This template sets up self-managed Argo Workflows on an Amazon EKS cluster, following best practices.

🚀 Kafka on EKS - This template deploys self-managed Kafka on EKS using the popular Strimzi Kafka operator.
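
The templates above are Infrastructure as Code blueprints. Assuming a Terraform-based blueprint, a deployment might look roughly like the sketch below. `BLUEPRINT_DIR`, `DRY_RUN`, and the `run` helper are hypothetical names introduced for illustration and are not part of the repository; the real blueprint directory and any blueprint-specific steps are described on the DoEKS website. The script defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of a typical Terraform-based blueprint deployment (assumed workflow,
# not the project's official install procedure).
set -euo pipefail

# Hypothetical placeholder: set this to the blueprint directory named in the DoEKS docs.
BLUEPRINT_DIR="${BLUEPRINT_DIR:-path/to/blueprint}"
# With DRY_RUN=1 (the default), commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it in the current shell.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy_blueprint() {
  run git clone https://github.com/awslabs/data-on-eks.git
  run cd "data-on-eks/$BLUEPRINT_DIR"
  run terraform init    # download providers and modules
  run terraform plan    # review the resources that would be created
  run terraform apply   # create the EKS cluster and add-ons
}

deploy_blueprint
```

To run the commands for real, set `BLUEPRINT_DIR` to an actual blueprint directory and invoke the script with `DRY_RUN=0`. Reviewing the `terraform plan` output before applying is advisable, since these blueprints create billable AWS resources.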

🗂️ Documentation

For instructions on how to deploy Data on EKS patterns and run sample tests, visit the DoEKS website.

🏆 Motivation

Kubernetes is a widely used system for orchestrating containerized software at large scale. It has become better suited to running stateful workloads since several storage options were introduced in version 1.19. The availability of Spark on Kubernetes and the versatility of Kubernetes have encouraged many users to migrate their existing Hadoop-based clusters to Kubernetes.

However, deploying and managing Kubernetes clusters and scaling data workloads can still be challenging, because users need to be proficient in both Kubernetes and data workloads. To address this, we developed Data on EKS (DoEKS) to help users easily run Spark on EKS, Kubeflow, MLflow, Airflow, Presto, Kafka, Cassandra, and other data workloads.

🤝 Support & Feedback

DoEKS is maintained by AWS Solutions Architects and is not an AWS service. Support is provided on a best-effort basis by the Data on EKS Blueprints community. If you have feedback or feature ideas, or wish to report bugs, please use the Issues section of this GitHub repository.

🔐 Security

See CONTRIBUTING for more information.

💼 License

This library is licensed under the Apache 2.0 License.

🙌 Community

We welcome all individuals who are enthusiastic about data on Kubernetes to become a part of this open source community. Your contributions and participation are invaluable to the success of this project.

Built with ❤️ at AWS.
