Learn to run Jupyter Notebook in Docker and use it to analyze data with Spark SQL
- Reuse the configuration from an earlier repo (Project4_hadoop-mapper)
- Create `docker-compose.yml` to serve Jupyter Notebook at http://localhost:8888/
- Run `spark/sparksql.ipynb` to analyze `input/yellow_tripdata_2021-01.csv.gz`
- Save the output to `output/spark_write_parquet.parquet`
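
The analysis steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual contents: the view name `trips`, the helper names `build_query` and `run`, and the aggregation itself are hypothetical, and the column names `passenger_count` and `trip_distance` are assumed to be present in the trip data. Only the input and output paths come from this README.

```python
# Paths taken from the README; everything else below is illustrative.
INPUT_PATH = "input/yellow_tripdata_2021-01.csv.gz"
OUTPUT_PATH = "output/spark_write_parquet.parquet"


def build_query(view_name: str) -> str:
    """Return an example aggregation (hypothetical; the notebook's query may differ)."""
    return (
        "SELECT passenger_count, COUNT(*) AS trips, "
        "ROUND(AVG(trip_distance), 2) AS avg_distance "
        f"FROM {view_name} "
        "GROUP BY passenger_count "
        "ORDER BY passenger_count"
    )


def run(input_path: str = INPUT_PATH, output_path: str = OUTPUT_PATH) -> None:
    """Read the gzipped CSV, query it with Spark SQL, and write Parquet."""
    # Imported here so build_query() can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sparksql-sketch").getOrCreate()
    try:
        # Spark decompresses .gz input based on the file extension.
        trips = spark.read.csv(input_path, header=True, inferSchema=True)

        # Register the DataFrame so it can be queried by name in SQL.
        trips.createOrReplaceTempView("trips")
        result = spark.sql(build_query("trips"))

        # Spark writes a directory of part files at this path.
        result.write.mode("overwrite").parquet(output_path)
    finally:
        spark.stop()
```

Calling `run()` from a notebook cell inside the container should perform the read, query, and write in one go, provided the `input/` and `output/` directories are mounted into the container at those relative paths.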