mbd-bidw's Introduction

Business Intelligence and Data Warehouse Course

This repository contains all necessary inputs to run the course hands-on labs.

Repository contents (by session)

  • Additional articles and documents
  • MySQL Workbench Schemas
  • ETL processes
  • Datasets
  • Tableau files
  • Videos

Software Installation

  • Data Warehouse: MySQL (database) and MySQL Workbench (database modeling and SQL development)
  • ETL: Pentaho Data Integration (PDI)
  • Business Intelligence/Data Visualization: Tableau Desktop
  • Self-Service of Data Lakes: Dremio

Steps

Install Java

  • Check whether Java is already installed on your system. If you have more than one version, uninstall all of them and follow the steps below. If you already have Java JDK v8, you can skip these steps.
  • Download Java JDK v8 from: http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html (in our case: Java SE 8u231). You may need to create an Oracle account to download the JDK.
  • Run the installer and follow the instructions.
  • [Optional] Instead of the Oracle Java JDK, you can use:
    • Amazon Corretto, in particular version 8. Choose the right installer for your OS. This is a long-term support, production-ready distribution of the Open Java Development Kit (OpenJDK) supported by Amazon.
    • OpenJDK. Only use one JDK version.

If you have problems with Oracle Java, uninstall it and switch to Amazon Corretto.
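
Before installing anything, it can help to confirm which Java version is on the PATH. The following is a minimal sketch, assuming a POSIX shell (for example the macOS Terminal or Git Bash on Windows). The helper name `is_java8` is hypothetical; it only looks for the `1.8` version prefix that Java 8 runtimes report.

```shell
#!/bin/sh
# is_java8 (hypothetical helper): succeeds when the text passed as $1,
# expected to be the output of `java -version`, reports a Java 8 runtime.
# Java 8 identifies itself with a version string that starts with "1.8".
is_java8() {
  printf '%s\n' "$1" | grep -Eq 'version "1\.8\.'
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, so redirect it to capture the text.
  java_info=$(java -version 2>&1)
  if is_java8 "$java_info"; then
    echo "Java 8 found"
  else
    echo "A different Java version is installed:"
    echo "$java_info"
  fi
else
  echo "No Java found on the PATH"
fi
```

If the script reports a different version, or several JDKs are installed, follow the uninstall advice above before continuing.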

Install MySQL

  • Download the right version of MySQL and MySQL Workbench for your OS (in our case: MySQL Community Server 8.0.18 and MySQL Workbench 8.0.18). Check in advance if your system is supported: MySQL and MySQL Workbench.
  • Download the program(s) for your specific OS:
  • Install all the programs and follow the instructions:
    • [Windows] Use a custom installation and choose only MySQL Server and MySQL Workbench as the components to install. During installation you will set the password for the root user (choose iembd2019). Choose legacy password encryption. If you forget the password, you can change it using MySQL Workbench.
    • [Mac] During installation you will set the password for the root user (choose iembd2019). Choose legacy password encryption. If you forget the password, you can change it from System Preferences.
    • PDI and Tableau only support legacy password encryption, not the new strong encryption available in MySQL 8. Keep the legacy option selected until they support strong encryption.

Note: on Microsoft Windows there is a single installer; on Mac there are two files.

Remember to start the server to be able to use the database. Open MySQL Workbench and create a new connection using the right user and password and the standard parameters for configuration.
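
Once the server is running, the root account's authentication method can also be checked from the command line. This is a sketch only, assuming the `mysql` command-line client is on the PATH and the server runs locally on the default port 3306. In MySQL 8 the legacy password option corresponds to the `mysql_native_password` plugin, while the strong default is `caching_sha2_password`. Both helper names are hypothetical.

```shell
#!/bin/sh
# uses_legacy_auth (hypothetical helper): succeeds when the plugin name in $1
# is the legacy authentication plugin that PDI and Tableau can work with.
uses_legacy_auth() {
  [ "$1" = "mysql_native_password" ]
}

# root_auth_plugin (hypothetical helper): prints the authentication plugin of
# 'root'@'localhost'. -p makes the client prompt for the password, and
# -N -B print the bare value without column names or table borders.
root_auth_plugin() {
  mysql -u root -p -N -B \
    -e "SELECT plugin FROM mysql.user WHERE user = 'root' AND host = 'localhost';"
}

# Usage (prompts for the root password):
#   plugin=$(root_auth_plugin)
#   if uses_legacy_auth "$plugin"; then echo "legacy auth OK"; else echo "plugin is $plugin"; fi
```

If the plugin turns out to be `caching_sha2_password`, reinstalling with the legacy option, or changing the account's plugin, should bring it in line with the setup described above.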

Install PDI

We will use the community version of Pentaho Data Integration (a.k.a. PDI). It can be downloaded from this link (in our case: pdi-ce-8.3.0.0-371.zip).

  • Download the file and unzip.
    • [Mac] Move the data-integration folder into the Applications folder
    • [Windows] Move the data-integration folder into the C:\ folder
  • Open PDI
    • [Windows] Double-click spoon.bat inside data-integration folder
    • [Mac] Open the terminal and execute:
cd /Applications/data-integration/
./spoon.sh
  • [Optional, Recommended, Mac] Enable Data Integration.app as a double-click app by removing its quarantine attribute from the terminal:
sudo xattr -dr com.apple.quarantine /Applications/data-integration/Data\ Integration.app
  • Configuring a JDBC Connection to MySQL 8.x Using PDI:
    • Download the MySQL 8.x JDBC driver (select platform independent, zip) to the computer running Pentaho from: https://dev.mysql.com/downloads/connector/j/
    • Unzip the file mysql-connector-java-8.0.18.zip
    • Copy mysql-connector-java-8.0.18.jar to the Pentaho lib folder. [Windows]: C:\data-integration\lib. [Mac OS]: …/Applications/data-integration/lib
    • Configure a Generic Database connection in Pentaho: (1) Connection URL: jdbc:mysql://localhost:3306/<database_name> (initially the only database is sys, so substitute sys for <database_name>) (2) Driver Class Name: com.mysql.cj.jdbc.Driver (3) Use the user and password configured earlier
    • If the server time zone value 'AEST' (or another value) is reported as unrecognized or as representing more than one time zone, use this URL instead: jdbc:mysql://localhost:3306/<database_name>?useLegacyDatetimeCode=false&serverTimezone=UTC
  • [Not required, only if you use MySQL 5.x] Install MySQL 5.x plugin for PDI:
    • Open PDI
    • Go to the Tools menu > Marketplace > MySQL Plugin and install it
    • Restart PDI
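
The two connection URL forms given for MySQL 8.x above differ only in the optional time zone parameters. As a minimal sketch (the helper name `jdbc_url` is hypothetical and not part of PDI), the URL can be assembled like this and then pasted into the Generic Database connection dialog:

```shell
#!/bin/sh
# jdbc_url (hypothetical helper): prints a MySQL JDBC URL for the database
# named in $1 on localhost:3306. When a time zone is given in $2, the two
# time zone parameters from the steps above are appended.
jdbc_url() {
  db=$1
  tz=${2:-}
  url="jdbc:mysql://localhost:3306/${db}"
  if [ -n "$tz" ]; then
    url="${url}?useLegacyDatetimeCode=false&serverTimezone=${tz}"
  fi
  printf '%s\n' "$url"
}

jdbc_url sys        # prints jdbc:mysql://localhost:3306/sys
jdbc_url sys UTC    # prints jdbc:mysql://localhost:3306/sys?useLegacyDatetimeCode=false&serverTimezone=UTC
```

Start with the plain form and only switch to the time zone form if PDI reports the time zone error.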

Install Tableau Desktop

We can access student licenses due to the Academic Partnership. Tableau has versions for Mac and Windows. Follow these instructions:

  • Download the latest version of Tableau Desktop here.
  • Copy Tableau Desktop License from campus.
  • Install the software following the instructions on the screen.
  • Update your license in the application: Help menu -> Manage Product Keys
  • Download the driver for MySQL from here

(Optional) Dremio

We will illustrate the connectivity against a pool of resources. It can be downloaded from this link. One of the options is to use Docker. This requires installing Docker first and then downloading the Docker file.
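
For the Docker option, a possible starting point is sketched below. It rests on assumptions that are not part of this course material: that Docker is already installed, that the public `dremio/dremio-oss` image is used rather than a downloaded Docker file, and that the published ports are 9047 (web UI), 31010 (client connections) and 45678 (internal). The helper name `dremio_run_cmd` is hypothetical. The script only prints the command, so nothing is started until you run it yourself.

```shell
#!/bin/sh
# Assumed image name; check the Dremio download page before relying on it.
DREMIO_IMAGE="dremio/dremio-oss"

# dremio_run_cmd (hypothetical helper): prints the docker command that would
# start a single Dremio container with the assumed ports published.
dremio_run_cmd() {
  printf 'docker run -p 9047:9047 -p 31010:31010 -p 45678:45678 %s\n' "$DREMIO_IMAGE"
}

# Print the command for review; copy it into a terminal to start Dremio.
dremio_run_cmd
```

With the container running under these assumptions, the web interface would be reachable at http://localhost:9047.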

(Optional) Atom

In case you need a multipurpose text editor, I recommend Atom.

FAQ

Is there a Pentaho Release Product Version Matrix?

Yes! You can find it here.

Any recommendation for MySQL SQL syntax?

Yes, check MySQL™ Notes for Professionals book and MySQL Documentation.

Any tutorials for MySQL Workbench?

Yes, check MySQL Workbench Manual.

Any book for SQL?

How can I have this repository?

Fork it using GitHub and GitHub Desktop. Are you interested in how GitHub works? Start here. Steps:

mbd-bidw's People

Contributors

josepcurto
