Hello! We created this list to help you get started with Data Engineering. It collects links that have helped our Data Engineers along the way and will hopefully help you too.
This roadmap also includes a web framework and a template engine. Neither falls under Data Engineering, but both are useful for building autonomous, dynamic data pipelines.
-
What is Big Data? [en.wikipedia.org/wiki/Big_data]
-
Data Engineer vs Data Analyst vs Data Scientist
-
Data Engineer
-
Data Mining for getting insights from data
-
Converting erroneous data into a usable form for analysis
-
Writing queries on data
-
Maintenance of the data design and architecture
-
Develop large data warehouses with the help of extract, transform, load (ETL) processes
-
-
Data Analyst
-
Collecting information from a database with the help of queries
-
Enable data processing and summarize results
-
Use basic algorithms such as logistic regression and linear regression in their work
-
Possess and display deep expertise in data munging, data visualization, exploratory data analysis and statistics
-
-
Data Scientist
-
Manage, mine, and clean unstructured data to prepare it for practical use
-
Develop models that can operate on Big Data
-
Understand and interpret Big Data analysis
-
Take charge of the data team and help its members reach their respective goals
-
Deliver results that have an impact on business outcomes
-
-
-
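The Data Engineer tasks listed above (cleaning erroneous data, loading it into a store, and querying it) can be sketched as a tiny ETL pipeline. This is only an illustrative sketch using the Python standard library: the sample CSV, the `orders` table, and the function names `extract`, `transform`, `load`, and `run_pipeline` are all hypothetical, and a real warehouse would use dedicated tooling such as the resources linked in this roadmap.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: row 2 has a missing amount, row 3 a non-numeric one.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,bob,
3,carol,abc
4, dave,7
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, cast types, and drop rows with unusable amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # missing or non-numeric amount: skip the row
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract, transform, load, then query the loaded data."""
    load(transform(extract(text)), conn)
    query = "SELECT customer, amount FROM orders ORDER BY order_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory database for the demo
    print(run_pipeline(RAW_CSV, connection))
```

Only the two rows with a valid numeric amount reach the table. Silently dropping bad rows is a simplification chosen for brevity; a production pipeline would more likely log or quarantine them.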
[Sandbox/OS] Hortonworks Data Platform [cloudera.com/downloads/hortonworks-sandbox/hdp.html]
-
[Cloud] Itversity labs for Hadoop, Spark, and Kafka [labs.itversity.com]
-
[Cloud] Databricks Community Cloud for Apache Spark [community.cloud.databricks.com]
-
Download Python [python.org/downloads]
-
Python IDE - Community Edition [jetbrains.com/pycharm/download]
-
Learn Python [learnpython.org]
-
Python Data Structures [edureka.co/blog/data-structures-in-python]
-
Python Coding Standard [python.org/dev/peps/pep-0008]
-
Learn Pandas [pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html]
-
Short hands-on challenge for Pandas [kaggle.com/learn/pandas]
-
Unit Testing for Data Science in Python [towardsdatascience.com/unit-testing-for-data-scientists]
-
Download Kafka [kafka.apache.org/downloads]
-
Get Started with Kafka [kafka.apache.org/documentation]
-
Kafka in Python
-
HDFS Architecture Overview [hadoop.apache.org/docs/r1.2.1/hdfs_design.html]
-
Hive Overview & Documentation [hive.apache.org]
-
HBase Overview & Documentation [hbase.apache.org]
-
Airflow Overview [airflow.apache.org]
-
Airflow Installation [airflow.apache.org/docs/apache-airflow/2.0.1/installation.html]
-
Airflow Tutorial [airflow.apache.org/docs/apache-airflow/2.0.1/tutorial.html]
-
Spark Overview [spark.apache.org/docs/latest/]
-
Download Spark [spark.apache.org/downloads.html]
-
Spark The Definitive Guide [github.com/databricks/Spark-The-Definitive-Guide]
-
DBeaver: All-in-One GUI Client for SQL Databases [dbeaver.io/download]
-
PostgreSQL
-
Download PostgreSQL Server [postgresql.org/download]
-
Learn PostgreSQL [postgresqltutorial.com]
-
-
MongoDB
-
Download MongoDB Server [mongodb.com/try/download/community]
-
Compass MongoDB GUI [mongodb.com/try/download/compass]
-
Learn MongoDB [mongodb.com/what-is-mongodb]
-
-
Learn Django [djangoproject.com/start]
-
Build Portfolio Project for Practice [realpython.com/get-started-with-django-1]
-
Blog Website with Django Admin Panel [github.com/shubham-thakare/tech-blog]
-
Django REST Framework [django-rest-framework.org/community/tutorials-and-resources]
-
Django REST Framework Boilerplate [github.com/pyset/django-rest-framework-boilerplate]
-
Jinja2 Python Package [pypi.org/project/Jinja2]
-
Learn Jinja Templating [jinja.palletsprojects.com/en/3.0.x]
-
Download Atlas [atlas.apache.org/#/Downloads]
-
Get Started with Atlas [atlas.apache.org]
-
Download Ranger [ranger.apache.org/download.html]
-
Get Started with Ranger [ranger.apache.org]
-
Apache Drill [drill.apache.org]
-
Apache Phoenix [phoenix.apache.org]
-
Presto [prestodb.io]
-
Get Started with Trifacta [community.trifacta.com/s/academywelcome]
-
Databricks - Certified Associate Developer for Apache Spark 3.0 [View]
-
Airflow - Astronomer Certification: Apache Airflow Fundamentals [View]
-
Confluent - Certified Developer for Apache Kafka [View]
-
Microsoft Azure - Exam DP-203: Data Engineering on Microsoft Azure (Azure Data Engineer Associate) [View]
-
Google - Professional Data Engineer [View]
-
AWS - Certified Big Data - Specialty [View]
-
Google [google.com]
-
YouTube [youtube.com]
-
Datacamp [datacamp.com/career-tracks/data-engineer-with-python]
-
Kaggle [kaggle.com]
-
Wikipedia [en.wikipedia.org/wiki/Big_data]
-
The Data Engineering Cookbook [github.com/andkret/Cookbook]
-
Google Docs to MD File Converter [github.com/mangini/gdocs2md]
-
Spark - The Definitive Guide [https://books.google.co.in/books]