Create an Amazon S3 bucket named data-engg-demo.
Copy the sample_dataset directory to the S3 bucket:
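# Destination S3 URI (bucket plus key prefix) for the sample data
BUCKET_NAME='s3://data-engg-demo/dataset/covid-data/input/'
# Upload the local directory recursively; the variable is quoted to avoid word splitting
aws s3 cp --recursive sample_dataset/ "$BUCKET_NAME"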
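The bucket has to exist before the copy succeeds, so the following minimal sketch shows one way to create it from the CLI and then list what was uploaded. The `run` helper, the `DRY_RUN` switch, and the default region are assumptions added for illustration and are not part of the original steps; with `DRY_RUN` left at its default the script only prints the commands.

```shell
# Sketch: create the bucket, then list the uploaded objects.
# DRY_RUN is a hypothetical switch: it defaults to 1, so nothing touches AWS
# until you set DRY_RUN=0.
set -eu

BUCKET='data-engg-demo'               # bucket name from the steps above
PREFIX='dataset/covid-data/input/'    # key prefix used by the copy command
REGION="${AWS_REGION:-us-east-1}"     # assumption: change to your own region

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the bucket (run this before the copy).
run aws s3 mb "s3://${BUCKET}" --region "${REGION}"

# After the copy, list the objects under the prefix to confirm the upload.
run aws s3 ls "s3://${BUCKET}/${PREFIX}" --recursive
```

Run it once as is to inspect the commands, then again with `DRY_RUN=0` to execute them. Bucket names are globally unique, so `aws s3 mb` fails if the name is already taken by another account.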
Open the AWS Glue console and go to Data Catalog -> Crawlers.
Click on Add a data source and add the S3 location of the JSON dataset: s3://data-engg-demo/dataset/covid-data/input/json
Provide an IAM role that allows Glue to read the data from Amazon S3.
Select the target database and add a table name prefix.
Once this is done, review all the parameters, click on Create, and then Run the crawler.
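The console steps above can also be approximated from the CLI. The sketch below uses `aws glue create-crawler`, `start-crawler`, `get-crawler`, and `get-tables`. The crawler name, IAM role name, database name, and table prefix are hypothetical placeholders, since the original steps do not name them; only the S3 path comes from the walkthrough. The `run` helper and `DRY_RUN` switch are also assumptions, used so that the script only prints commands by default.

```shell
# Sketch: create and run a Glue crawler over the JSON dataset.
# DRY_RUN is a hypothetical switch: it defaults to 1 (print only).
set -eu

CRAWLER_NAME='covid-json-crawler'     # hypothetical crawler name
GLUE_ROLE='AWSGlueServiceRole-demo'   # hypothetical IAM role with S3 read access
DATABASE='covid_db'                   # hypothetical Data Catalog database
TABLE_PREFIX='raw_'                   # hypothetical table name prefix
S3_PATH='s3://data-engg-demo/dataset/covid-data/input/json'  # from the steps above

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# JSON structure describing the S3 data source for the crawler.
TARGETS="{\"S3Targets\":[{\"Path\":\"${S3_PATH}\"}]}"

run aws glue create-crawler \
  --name "${CRAWLER_NAME}" \
  --role "${GLUE_ROLE}" \
  --database-name "${DATABASE}" \
  --table-prefix "${TABLE_PREFIX}" \
  --targets "${TARGETS}"

# Start the crawler; it runs asynchronously.
run aws glue start-crawler --name "${CRAWLER_NAME}"

# Check the state; repeat until it reports READY again.
run aws glue get-crawler --name "${CRAWLER_NAME}" \
  --query 'Crawler.State' --output text

# Once the crawler has finished, list the tables it created.
run aws glue get-tables --database-name "${DATABASE}" \
  --query 'TableList[].Name'
```

Because the crawler runs in the background, the final `get-tables` call returns the new tables only after the state check reports `READY`.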
We can now create an interactive Glue Studio notebook.
Go to the Glue console and go to Data Integration and ETL -> AWS Glue Studio -> Jobs, click on Visual with a blank canvas, and then click on Create.
You can now follow this workshop.