To be able to transfer data from one format to another. Examples of such formats include:
- CSV
- PostgreSQL
- Excel
- MySQL
- MongoDB
- Big Data (Hadoop, Cassandra), etc.
What is ETL?
We will be focussing on all of these today. We will try to:
- Extract data out of CSV and load it into PSQL
- Extract data out of PSQL, transform it, and load it into Excel
- Get a bunch of files (CSVs) and load them into PSQL.
Then automate it so that it runs on its own without human intervention. [Will be done in the next session]
A tool that helps us do all of the above.
New Concept
- Components and Connectors: All operations in Talend are performed by components and connectors. Input and output components are linked to each other through connectors.
{
"postgres_host": "64.225.85.167",
"postgres_port": 5432,
"postgres_database": "postgres",
"postgres_username": "postgres",
"postgres_password": ""
}
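A minimal Python sketch of how settings like these could be turned into a connection string outside Talend. The helper name `build_dsn` is hypothetical, the host is replaced by a `localhost` placeholder, and the output is a libpq-style keyword/value string; this is an illustration, not how Talend itself reads the settings.

```python
import json

# Same keys as the settings above; the host here is a placeholder (assumption).
CONFIG_TEXT = """
{
  "postgres_host": "localhost",
  "postgres_port": 5432,
  "postgres_database": "postgres",
  "postgres_username": "postgres",
  "postgres_password": ""
}
"""


def build_dsn(config: dict) -> str:
    """Build a libpq-style key=value connection string (hypothetical helper)."""
    parts = {
        "host": config["postgres_host"],
        "port": config["postgres_port"],
        "dbname": config["postgres_database"],
        "user": config["postgres_username"],
    }
    # Only include the password when one is actually set.
    if config.get("postgres_password"):
        parts["password"] = config["postgres_password"]
    return " ".join(f"{key}={value}" for key, value in parts.items())


config = json.loads(CONFIG_TEXT)
dsn = build_dsn(config)
print(dsn)
```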
- Creating a folder and saving both files into the system.
Steps
- Add the delimited file
- Connect it to the database
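The two steps above can be sketched in plain Python to show what the job does under the hood. This is only an illustration under stated assumptions: SQLite (in memory) stands in for PostgreSQL so the example runs anywhere, and the sample data, the table name `events`, and the helper `load_csv` are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the delimited file.
CSV_TEXT = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"


def load_csv(conn, csv_file, table):
    """Create a table from the CSV header and insert every data row."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first line holds the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# SQLite is used here only as a stand-in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, io.StringIO(CSV_TEXT), "events")
print(loaded, conn.execute('SELECT name FROM "events" ORDER BY id').fetchall())
```

Every value is loaded as text in this sketch, because no datatypes have been declared for the columns.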
New Concepts
- Schema: How the data is defined in a database. It can describe a file as well. Datatypes: String, Integer, Decimal, JSON, etc. It is a generic definition of the data, and it helps translate data from one format to another.
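One way to picture a schema is as a table of column names and datatypes that gets applied to raw text values. The sketch below assumes a hypothetical four-column schema and a hypothetical helper `apply_schema`; it is not Talend's internal representation.

```python
import json
from decimal import Decimal

# Hypothetical schema: column name -> converter for its datatype.
SCHEMA = {
    "id": int,           # Integer
    "name": str,         # String
    "price": Decimal,    # Decimal
    "tags": json.loads,  # JSON
}


def apply_schema(raw_row: dict, schema: dict) -> dict:
    """Convert each raw text value to the datatype the schema declares."""
    return {column: convert(raw_row[column]) for column, convert in schema.items()}


# A row as it would arrive from a CSV file: everything is a string.
raw = {"id": "7", "name": "ticket", "price": "12.50", "tags": '["vip", "early"]'}
typed = apply_schema(raw, SCHEMA)
print(typed)
```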
New Concepts
- Map: How a field in one format translates to a field in a different format.
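A map can be thought of as one rule per output field. The following is a rough Python sketch with hypothetical field names and a hypothetical helper `map_row`; in Talend the same idea is configured visually rather than written as code.

```python
# Hypothetical mapping: output column name -> rule that computes it from an input row.
FIELD_MAP = {
    "Event ID": lambda row: row["id"],
    "Title": lambda row: row["name"].title(),
    "Location": lambda row: f'{row["city"]}, {row["country"]}',
}


def map_row(row: dict, field_map: dict) -> dict:
    """Build one output row by applying every mapping rule to the input row."""
    return {target: rule(row) for target, rule in field_map.items()}


source_row = {"id": 1, "name": "data meetup", "city": "Pune", "country": "IN"}
mapped = map_row(source_row, FIELD_MAP)
print(mapped)
```

A rule can copy a field unchanged, reshape it, or combine several input fields into one output field.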
Steps
- Adding events-2 to the table
- Setting up map
- Adding Excel output component
New Concepts
- FileIterator
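The idea behind iterating over files is to run the same read step once per file in a folder. A small Python sketch, assuming hypothetical file names and a hypothetical helper `iter_csv_rows`; it only mimics the concept and is not the Talend component itself.

```python
import csv
import tempfile
from pathlib import Path


def iter_csv_rows(folder: Path, pattern: str = "*.csv"):
    """Yield (file name, row dict) for every row of every matching file."""
    for path in sorted(folder.glob(pattern)):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                yield path.name, row


# Demo with two hypothetical CSV files created in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "events-1.csv").write_text("id,name\n1,Asha\n")
    (folder / "events-2.csv").write_text("id,name\n2,Ravi\n3,Meera\n")
    (folder / "notes.txt").write_text("ignored\n")  # does not match *.csv
    rows = list(iter_csv_rows(folder))

print(len(rows), [name for name, _ in rows])
```

Each yielded row could then be handed to a load step, so that a whole folder of CSVs ends up in a single table.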