Giter VIP home page Giter VIP logo

cde_tour_ace_hol's Introduction

CDE Tour ACE Workshop HOL

โš  Warning
This Guide was created for CDE versions up until 1.18. CDE 1.19 was released in May 2023 and includes some important updates. If you have a CDE 1.19 Virtual Cluster at your disposal we recommend using the updated version of the HOL available at this GitHub repository.

Objective

CDE is the Cloudera Data Engineering Service, a containerized managed service for Cloudera Data Platform designed for Large Scale Batch Pipelines with Spark, Airflow and Iceberg. It allows you to submit batch jobs to auto-scaling virtual clusters. As a Cloud-Native service, CDE enables you to spend more time on your applications, and less time on infrastructure.

CDE allows you to create, manage, and schedule Apache Spark jobs without the overhead of creating and maintaining Spark clusters. With CDE, you define virtual clusters with a range of CPU and memory resources, and the cluster scales up and down as needed to run your Spark workloads, helping to control your cloud costs.

This Hands On Lab is designed to walk you through the Services's main capabilities. Throughout the exercises you will:

  1. Deploy an Ingestion, Transformation and Reporting pipeline with Spark 3.2.
  2. Learn about Iceberg's most popular features.
  3. Orchestrate pipelines with Airflow.
  4. Use the CDE CLI to execute Spark Submits and more from your local machine.

Step by Step Instructions

Detailed instructions in English are provided in the step_by_step_guides folder.

Next Steps

CDE is the Cloudera Data Engineering Service, a containerized managed service for Spark and Airflow.

If you are exploring CDE you may find the following tutorials relevant:

  • Spark 3 & Iceberg: A quick intro of Time Travel Capabilities with Spark 3.

  • Simple Intro to the CDE CLI: An introduction to the CDE CLI for the CDE beginner.

  • CDE CLI Demo: A more advanced CDE CLI reference with additional details for the CDE user who wants to move beyond the basics.

  • CDE Resource 2 ADLS: An example integration between ADLS and CDE Resource. This pattern is applicable to AWS S3 as well and can be used to pass execution scripts, dependencies, and virtually any file from CDE to 3rd party systems and viceversa.

  • Using CDE Airflow: A guide to Airflow in CDE including examples to integrate with 3rd party systems via Airflow Operators such as BashOperator, HttpOperator, PythonOperator, and more.

  • GitLab2CDE: a CI/CD pipeline to orchestrate Cross-Cluster Workflows for Hybrid/Multicloud Data Engineering.

  • CML2CDE: an API to create and orchestrate CDE Jobs from any Python based environment including CML. Relevant for ML Ops or any Python Users who want to leverage the power of Spark in CDE via Python requests.

  • Postman2CDE: An example of the Postman API to bootstrap CDE Services with the CDE API.

  • Oozie2CDEAirflow API: An API to programmatically convert Oozie workflows and dependencies into CDE Airflow and CDE Jobs. This API is designed to easily migrate from Oozie to CDE Airflow and not just Open Source Airflow.

For more information on the Cloudera Data Platform and its form factors please visit this site.

For more information on migrating Spark jobs to CDE, please reference this guide.

If you have any questions about CML or would like to see a demo, please reach out to your Cloudera Account Team or send a message through this portal and we will be in contact with you soon.

alt text

cde_tour_ace_hol's People

Contributors

maxhardt avatar pdefusco avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.