Hadoop beginner exercise - using the Streaming API to determine the word count of select Wikipedia articles.
The solution is available as a Jupyter notebook.
Open up the main.ipynb notebook file to view the solution, along with the pre-computed results.
Alternatively, you can re-run the solution on your own machine:
- Run `make` to build the project with Docker
- Go to `localhost:8888`
- Open `main.ipynb` in the Jupyter notebook
- Run all the cells
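
For readers new to Hadoop Streaming, the word count idea can be sketched roughly as follows. This is a minimal illustration, not the code from `main.ipynb`: the function names `map_lines` and `reduce_lines`, the tokenizing regex, and the sample input are all assumptions made for this example. It simulates the map, sort, and reduce phases locally so it can be run without a cluster.

```python
import re
from itertools import groupby

# Hypothetical tokenizer: lowercase letters, digits and apostrophes form a word.
WORD_RE = re.compile(r"[a-z0-9']+")


def map_lines(lines):
    """Mapper: emit one tab-separated 'word<TAB>1' record per word."""
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            yield f"{word}\t1"


def reduce_lines(records):
    """Reducer: sum the counts of consecutive records that share a key.

    This relies on the input being sorted by key, which Hadoop Streaming
    does between the map and reduce phases.
    """
    pairs = (rec.rstrip("\n").split("\t", 1) for rec in records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Local simulation of map -> sort -> reduce on made-up sample text.
    sample = ["Hadoop streams text", "text in, text out"]
    for record in reduce_lines(sorted(map_lines(sample))):
        print(record)
```

In an actual Streaming job the two functions would typically live in separate scripts that read from standard input and write to standard output, passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options; the notebook may organize this differently.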