GitHub Repository: https://github.com/s566319/web-scrapping
Complete the tasks in the Python notebook in this repository. Make sure to add and push the pkl or text file containing your scraped HTML (this is specified in the notebook).
- (Question 1) Article HTML stored in a separate file that is committed and pushed: 1 pt
- (Question 2) Article text is correct: 1 pt
- (Question 3) Correct tokens printed (or equivalent tokens, when several tokens share the same frequency): 1 pt
- (Question 4) Correct lemmas printed (or equivalent lemmas, when several lemmas share the same frequency): 1 pt
- (Question 5) Correct scores for first sentence printed: 2 pts (1 per function)
- (Question 6) Histogram shown with appropriate labelling: 1 pt
- (Question 7) Histogram shown with appropriate labelling: 1 pt
- (Question 8) Thoughtful answer provided: 1 pt
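For Question 1, the scraped HTML has to live in its own file so it can be committed and pushed. The following is a minimal sketch of one way to do that with Python's standard `pickle` module. The helper names `save_html` and `load_html` and the file name `article.pkl` are hypothetical and used only for illustration; follow the notebook's instructions if they specify something different.

```python
import pickle


def save_html(html: str, path: str) -> None:
    """Serialize the scraped HTML string to a pickle file (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(html, f)


def load_html(path: str) -> str:
    """Read the HTML string back from the pickle file (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Example usage, assuming `html` already holds the page source as a string:
# save_html(html, "article.pkl")   # "article.pkl" is an assumed file name
# html = load_html("article.pkl")
```

After the file is written, it still needs to be added, committed, and pushed along with the notebook to receive credit.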
Requirements
- Markdown introduction with name and clickable link is required.
- Markdown Section Headings for each Question are required.
- Execute your code before exporting HTML and pushing notebooks. (See FAQ for help.)
- Unexecuted code is not eligible for credit.