Topic: dataengineering Goto Github
Something interesting about dataengineering
dataengineering,The developer framework for your data & analytics stack
Organization: 514-labs
Home Page: https://www.moosejs.com
dataengineering,Tutorial on how to setup Trino and Apache Ranger using docker
User: aakashnand
dataengineering,Build & Learn Data Engineering and Machine Learning on Kubernetes. No-shortcut approach.
User: abhishek-ch
dataengineering,This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP sockets, Apache Spark, OpenAI LLM, Kafka and Elasticsearch. It covers each stage: data acquisition, processing, sentiment analysis with ChatGPT, production to a Kafka topic, and connection to Elasticsearch.
User: airscholar
Home Page: https://www.youtube.com/watch?v=ETdyFfYZaqU
dataengineering,A distributed, low-code, end-to-end data collection and analysis tool for data folks. Take the pain out of data collection from your pipeline!
User: anuran-roy
Home Page: https://anuran-roy.github.io/serpytor
dataengineering,An open source development framework to help you build data workflows and modern data architecture on AWS.
Organization: awslabs
Home Page: https://awslabs.github.io/aws-ddk/
dataengineering,A Data Platform built for AWS, powered by Kubernetes.
Organization: awslabs
Home Page: https://awslabs.github.io/aws-orbit-workbench/
dataengineering,Code/Notes for the Data Engineering Zoomcamp by DataTalksClub
User: balajirvp
dataengineering,Bridge Four is a simple, functional, effectful, single-leader, multi-worker, distributed compute system optimized for embarrassingly parallel workloads.
User: chollinger93
dataengineering,Forecasting Solar Power: Analysis of using an LSTM Neural Network
User: cynthiakoopman
dataengineering,A simple Data Engineering solution for testing or education purposes. You only need to know SQL and Python to understand this project. Dagster handles data orchestration, dbt handles data transformation, and Metabase is the BI tool.
User: daihuynh
dataengineering,Data Engineering/Scraping Project. Creating a detailed Sports Relational Database for the Top European Soccer Leagues.
User: danielsaban
dataengineering,
Organization: data-burst
Home Page: https://databurst.tech
dataengineering,This is a repo with links to everything you'd ever want to learn about data engineering
Organization: dataexpert-io
dataengineering,Compare tables within or across databases
Organization: datafold
Home Page: https://docs.datafold.com
dataengineering,A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Organization: datavault-uk
Home Page: https://www.automate-dv.com
dataengineering,Index of online reading materials for learning Python and backend development/engineering concepts from scratch and developing mastery sufficient for Senior/Principal Backend Engineers and Data Engineers
User: eldar1205
dataengineering,Roadmap for Data Engineering
User: erdemozgen
dataengineering,Predict stock price based on financial news feeds
Organization: finance-and-ml
dataengineering,Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market
User: franloza
dataengineering,
Organization: grai-io
Home Page: https://www.grai.io
dataengineering,Simple stream processing pipeline
User: josephmachado
Home Page: https://www.startdataengineering.com/post/data-engineering-project-for-beginners-stream-edition/
dataengineering,Example repo for creating end-to-end tests for a data pipeline.
User: josephmachado
Home Page: https://www.startdataengineering.com/post/setting-up-e2e-tests/
dataengineering,Project for "Data pipeline design patterns" blog.
User: josephmachado
Home Page: https://www.startdataengineering.com/post/code-patterns/
dataengineering,An end-to-end ETL data pipeline that leverages PySpark parallel processing to handle about 25 million rows of data from a SaaS application, using Apache Airflow for orchestration and various data warehouse technologies, with Apache Superset connected to the DWH to generate BI dashboards for weekly reports
User: judeleonard
Home Page: https://judeleonard.github.io/Prescriber-ETL-data-pipeline/
dataengineering,Code and data for the Modern Polars book
User: kevinheavey
Home Page: https://kevinheavey.github.io/modern-polars/
dataengineering,Everything I have ever been asked in interviews, plus other useful knowledge, in a concise format
User: kirilldikalin
Home Page: https://kirilldikalin.github.io/kirilldikalin.io/knowlege_base/iKnowledge_base.html
dataengineering,Data engineering interviews Q&A for data community by data community
User: kislerdm
Home Page: https://data-engineering-interviews.org
dataengineering,Data Pipeline from the Global Historical Climatology Network Dataset
User: marcosmjd
dataengineering,End-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck & Evidence
User: mehd-io
Home Page: https://duckdbstats.com/
dataengineering,An open-source project dedicated to constructing robust data pipelines and scalable software infrastructure. We leverage industry-standard tools favored by developers to enhance efficiency and reliability. Uniquely, these pipelines are field-tested on farms across Sumatra, Indonesia, ensuring real-world applicability and resilience.
User: mikestack15
Home Page: https://orangutan-stem.com
dataengineering,Found a data engineering challenge or participated in a selection process? Share it with us!
User: minhadona
dataengineering,Duke MIDS: Data Engineering and DataOps Course
User: noahgift
Home Page: https://noahgift.github.io/data-engineering-and-dataops/
dataengineering,Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in the Philippines. Data Engineering Pilipinas is a PyData group.
User: ogbinar
Home Page: https://dataengineering.ph
dataengineering,Apply for a job at Olist's Data Team: https://olist.gupy.io/
Organization: olist
Home Page: https://olist.gupy.io/
dataengineering,OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
Organization: open-metadata
Home Page: https://open-metadata.org
dataengineering,Build, test, deploy, iterate - Dev and prod tool for data science pipelines
Organization: prodmodel
dataengineering,A guide for leading a data (engineering) team
User: sbalnojan
Home Page: https://www.thdpth.com/
dataengineering,This repository provides various demos/examples of using Snowpark for Python.
Organization: snowflake-labs
dataengineering,Recohut - Learn data engineering, data science
User: sparsh-ai
Home Page: https://www.recohut.in/docs/introduction
dataengineering,Instant search for and access to many datasets in PySpark.
User: spratiher9
Home Page: https://pypi.org/project/sparkdataset/
dataengineering,Resources about data science, machine learning, deep learning, data engineering, and SQL.
User: tirendazacademy
dataengineering,Efficient data transformation and modeling framework that is backwards compatible with dbt.
Organization: tobikodata
Home Page: https://sqlmesh.com
dataengineering,Kedro CLI plugin for generating a static Kedro-Viz site (HTML, CSS, JS) that can be deployed on many serverless tools.
User: waylonwalker
Home Page: https://static-viz.kedro.dev
dataengineering,Dockerizing an Apache Spark Standalone Cluster
User: wittline
Home Page: https://wittline.github.io/apache-spark-docker/
dataengineering,Challenge Data Engineer
User: wittline
Home Page: https://wittline.github.io/data-engineer-challenge/
dataengineering,Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
User: wittline
Home Page: https://wittline.github.io/pyDag/
dataengineering,The goal of this project is to offer a ready-to-use AWS EMR template that uses Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.
User: wittline
Home Page: https://wittline.github.io/pyspark-on-aws-emr/
dataengineering,Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Organization: zinggai