
Stacks Azure Data Platform

Link to the official documentation: Stacks Azure Data Platform.

Overview

The Ensono Stacks Azure Data Platform solution provides a framework for accelerating the deployment of a production-ready modern data platform in Azure.

Ensono Stacks Data Overview

  1. Use the Ensono Stacks CLI to generate a new data platform project.
  2. Build and deploy the data platform infrastructure into your Azure environment.
  3. Accelerate development of data workloads and ELT pipelines with the Datastacks CLI.

The Ensono Stacks Data Platform delivers a modern Lakehouse solution, based upon the medallion architecture, with Bronze, Silver and Gold layers for various stages of data preparation. The platform utilises tools including Azure Data Factory for data ingestion and orchestration, Databricks for data processing and Azure Data Lake Storage Gen2 for data lake storage. It provides a foundation for data analytics and reporting through Microsoft Fabric and Power BI.
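As a loose illustration of how the medallion layers map onto data lake storage, the sketch below builds ADLS Gen2 paths for each layer. The storage account name and path layout are invented for this example and are not the platform's actual convention:

```python
# Illustrative sketch of medallion-layer paths in ADLS Gen2.
# The storage account name and path layout are invented for this
# example — they are not the platform's actual convention.

LAYERS = ("bronze", "silver", "gold")

def lake_path(layer: str, domain: str, dataset: str) -> str:
    """Build an ADLS Gen2 (abfss) path for a dataset at a medallion layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"abfss://{layer}@examplelake.dfs.core.windows.net/{domain}/{dataset}"

print(lake_path("bronze", "sales", "orders"))
# abfss://bronze@examplelake.dfs.core.windows.net/sales/orders
```

Each medallion layer maps to its own container (`bronze`, `silver`, `gold`), which keeps access control and lifecycle policies separable per stage of data preparation.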

Key elements of the solution include:

  • Infrastructure as code (IaC) for all infrastructure components (Terraform).
  • Deployment pipelines to enable CI/CD and DataOps for the platform and all data workloads.
  • Sample data ingest pipelines that transfer data from a source into the landing (Bronze) data lake zone.
  • Sample data processing pipelines performing data transformations from Bronze to Silver and Silver to Gold layers.
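As a loose illustration of the Bronze-to-Silver step mentioned above: the real pipelines run on Databricks, but the standalone sketch below shows the general shape of such a transformation (standardising column names, trimming values, de-duplicating):

```python
# Illustrative Bronze-to-Silver cleaning step. The platform's real
# pipelines run on Databricks; this standalone sketch only shows the
# general shape of such a transformation.

def bronze_to_silver(rows: list[dict]) -> list[dict]:
    """Standardise column names, strip whitespace and drop duplicate rows."""
    seen = set()
    silver = []
    for row in rows:
        clean = {
            k.strip().lower(): (v.strip() if isinstance(v, str) else v)
            for k, v in row.items()
        }
        key = tuple(sorted(clean.items()))
        if key not in seen:
            seen.add(key)
            silver.append(clean)
    return silver

raw = [
    {"Customer ID": " C001 ", "Amount": 10},
    {"customer id": "C001", "amount": 10},  # duplicate once cleaned
]
print(bronze_to_silver(raw))
# [{'customer id': 'C001', 'amount': 10}]
```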

The solution utilises the Stacks Data Python library, which offers a suite of utilities to support the platform's data engineering workloads.

High-level architecture


Repository structure

stacks-azure-data
├── build           # Deployment pipeline configuration for building and deploying the core infrastructure
├── de_build        # Deployment pipeline configuration for building and deploying data engineering resources
├── de_workloads    # Data engineering workload resources, including data pipelines, tests and deployment configuration
│   ├── generate_examples    # Example config files for generating data engineering workloads using Datastacks
│   ├── ingest               # Data ingestion workloads
│   ├── processing           # Data processing and transformation workloads
│   ├── shared_resources     # Shared resources used across data engineering workloads
├── deploy          # Terraform modules to deploy core Azure resources (used by the `build` directory)
├── docs            # Documentation
├── stacks-cli      # Example config to use when scaffolding a project using stacks-cli
├── utils           # Python utilities package used across solution for local testing
├── .pre-commit-config.yaml         # Configuration for pre-commit hooks
├── Makefile        # Includes commands for environment setup
├── pyproject.toml  # Project dependencies
├── README.md       # This file
├── stackscli.yml   # Tells the Stacks CLI what operations to perform when the project is scaffolded
├── taskctl.yaml    # Controls the independent runner
└── yamllint.conf   # Linter configuration for YAML files used by the independent runner

Terraform Locals

In the deploy/azure/infra and deploy/azure/networking directories there is a locals.tf file. This file is used to compute values derived from input variable values and/or to determine whether a value should be retrieved from a data block.

Additionally, it is used to store complex object values that have traditionally been declared in the vars.tf file. For example, network_details in deploy/azure/networking/locals.tf describes all of the networking information required to establish the hub-and-spoke network.

Because these details are stored in the locals file, they cannot be overridden with variables; if the defaults do not meet your requirements, the file itself must be updated to reflect your specific network configuration.
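To show the pattern, a simplified locals.tf might look like the following. The attribute names and values here are illustrative only, not the platform's actual schema:

```hcl
# Illustrative sketch of a locals.tf — attribute names and values
# are examples only, not the platform's actual schema.
locals {
  # Value derived from input variables
  resource_prefix = "${var.project}-${var.environment}"

  # Choose between a supplied variable and a data block lookup
  resource_group_name = var.resource_group_name != "" ? var.resource_group_name : data.azurerm_resource_group.existing.name

  # Complex object that would traditionally live in vars.tf;
  # because it is a local, it cannot be overridden by variables.
  network_details = {
    hub_vnet_cidr   = "10.0.0.0/16"
    spoke_vnet_cidr = "10.1.0.0/16"
    peering_enabled = true
  }
}
```

Keeping such objects in locals trades configurability for simplicity: the values are fixed at plan time, so changing them means editing the file rather than passing variables.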

Developing the solution

Please refer to the documentation for getting started with developing Stacks: Local Development Quickstart.

Contributors

satenderrathee, balpurewal, lorraine-houston-ensono, russellseymour, motylp, adurkan-amido, james-horrocks, trishisingh, rhysbushnell, lhsnaddon, sam-deeble, danphillipz, mehdi-kimakhe-amido, elvenspellmaker, samgreig, george-calvert-ensono, deimantehennelly, elephantei

