Uses tweepy to interact with Twitter's API. Set the size of each chunk (chunk size) and the number of chunks needed (chunk count). Save tweets into the current chunk until it's full, then write the chunk to a file, tweets<filecount>.txt. Python's json module converts JSON to a Python dictionary directly. Keep saving files until the chunk count is reached.
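The chunking logic described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `ChunkWriter` name, the `out_dir` parameter, and the assumption that tweets arrive as raw JSON strings are all assumptions for the sketch (in the real script they would come from a tweepy stream callback).

```python
import json
import os

class ChunkWriter:
    """Buffers incoming tweets and writes them out in fixed-size chunks."""

    def __init__(self, chunk_size, chunk_count, out_dir="."):
        self.chunk_size = chunk_size    # tweets per file
        self.chunk_count = chunk_count  # number of files to produce
        self.out_dir = out_dir
        self.buffer = []
        self.file_count = 0

    def on_tweet(self, raw_json):
        """Handle one raw tweet; returns False once all chunks are written."""
        if self.file_count >= self.chunk_count:
            return False
        tweet = json.loads(raw_json)  # json turns the payload into a dict
        self.buffer.append(tweet)
        if len(self.buffer) >= self.chunk_size:
            path = os.path.join(self.out_dir, "tweets%d.txt" % self.file_count)
            with open(path, "w") as f:
                for t in self.buffer:
                    f.write(json.dumps(t) + "\n")
            self.buffer = []
            self.file_count += 1
        return self.file_count < self.chunk_count
```

Returning False from the callback is how tweepy stream listeners conventionally signal that the stream should stop, which fits the "stop after chunk count files" behavior.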
Tweets will be missed while the stream is reinitialized if the Twitter connection is interrupted.
./twitterStream.sh <chunk size> <chunk count>
- Scans files for URLs. If a URL is found, retrieves the title of the linked page. Parses through files tweets0.txt - tweets<fileCount>.txt
twitterURLFinder.py <fileCount>
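The URL-finding step above can be sketched with the standard library. This is an illustrative sketch, not the project's script: the regex, the `TitleParser` class, and the helper names are assumptions, and the actual fetch of the page (e.g. with urllib) is omitted so only the parsing is shown.

```python
import re
from html.parser import HTMLParser

# Crude URL matcher; good enough for picking links out of tweet text.
URL_RE = re.compile(r"https?://\S+")

class TitleParser(HTMLParser):
    """Extracts the contents of the first <title> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True

    def handle_data(self, data):
        if self.in_title:
            self.title = data.strip()
            self.in_title = False

def find_urls(text):
    """Return all URLs found in a tweet's text."""
    return URL_RE.findall(text)

def page_title(html):
    """Return the <title> of an HTML page, or None if it has none."""
    p = TitleParser()
    p.feed(html)
    return p.title
```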
Implement indexing and search functions with the Whoosh library in Python.
Build a web-based interface with the Flask library in Python. Implement the map feature with the flask-googlemaps library.
Implement the Okapi BM25 scoring function with Whoosh.
Add a time factor: timeFactor = 1/(currentTime - tweetTime).
The most recent posts get the largest values. The factor is then weighted to balance its contribution against the BM25 score.
Overall Score = Score(BM25) + Score(time)
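The combined score above can be written as a small helper. The weight `alpha` is an assumed tuning parameter for the "weighted to balance" step; the project's actual weight is not stated.

```python
def time_factor(current_time, tweet_time):
    """Reciprocal recency: the newest tweets get the largest values."""
    return 1.0 / (current_time - tweet_time)

def overall_score(bm25_score, current_time, tweet_time, alpha=100.0):
    """Combine BM25 relevance with a weighted recency bonus.

    alpha is an illustrative weight balancing recency against relevance,
    implementing Overall Score = Score(BM25) + Score(time).
    """
    return bm25_score + alpha * time_factor(current_time, tweet_time)
```

With this shape, two tweets of equal BM25 relevance rank by recency, while a much more relevant old tweet can still beat a barely relevant new one, depending on `alpha`.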
- Takes in duplicates and retweets like normal tweets.
- The interface is not dynamic.
- Most tweets don't have geo information.
- pip install the dependencies: flask, flask-googlemaps, geocoder, and whoosh:
pip install flask flask-googlemaps geocoder whoosh
- run indexSearch.py:
python indexSearch.py
- run app.py:
python app.py
- Enter a search query into the search box