The script wiki_extractor.py performs a Wikipedia search for the provided keyword and returns the URLs of "n" related Wikipedia pages, along with the first paragraph of each page.
How it works:
- Related keywords are fetched from the Wikipedia search page using BeautifulSoup.
- For each keyword returned by the previous step, the script visits the corresponding page and fetches only its first paragraph (again, using BeautifulSoup).
- The URL and paragraph are stored in a dict:
{'url': fetched_url, 'paragraph': fetched_para}
- Each dict is appended to a list, and the list is finally converted to JSON and saved to a file with the user-defined name.
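
The last two steps above can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper names build_records and save_records are hypothetical, and the fetching and BeautifulSoup parsing are assumed to have already produced (url, paragraph) pairs.

```python
import json


def build_records(pages):
    """Turn (url, paragraph) pairs into the list of dicts described above.

    `pages` is assumed to be the output of the fetching step, which is
    not shown here.
    """
    records = []
    for url, paragraph in pages:
        records.append({"url": url, "paragraph": paragraph})
    return records


def save_records(records, output_path):
    """Write the list of dicts to `output_path` as JSON."""
    with open(output_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII article text readable in the file.
        json.dump(records, f, ensure_ascii=False, indent=2)
```

Calling save_records(build_records(pages), "out.json") would then produce a JSON array with one object per page.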
To run the program:
python wiki_extractor.py --keyword="Indian Historical Events" --num_urls=100 --output="out.json"
- Enter the keyword to search for (--keyword).
- Specify the total number of URLs required (--num_urls).
- Specify the desired name for the resulting JSON file (--output).
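
For reference, the three options could be parsed with the standard argparse module roughly as shown below. This is a sketch under assumptions: the README does not say how wiki_extractor.py reads its arguments, so the parse_args helper, the help strings, and the choice to make every option required are illustrative only.

```python
import argparse


def parse_args(argv=None):
    """Parse the three command-line options shown in the run command.

    Hypothetical helper: the real script may parse its arguments differently.
    """
    parser = argparse.ArgumentParser(
        description="Fetch first paragraphs of Wikipedia pages related to a keyword."
    )
    parser.add_argument("--keyword", required=True,
                        help="keyword to search for on Wikipedia")
    parser.add_argument("--num_urls", type=int, required=True,
                        help="total number of URLs to return")
    parser.add_argument("--output", required=True,
                        help="name of the resulting JSON file")
    return parser.parse_args(argv)


# Same values as the run command above, passed as an explicit list.
args = parse_args(["--keyword=Indian Historical Events",
                   "--num_urls=100",
                   "--output=out.json"])
print(args.keyword, args.num_urls, args.output)
```

Passing type=int means a non-numeric --num_urls value is rejected by argparse before any request is made.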