Final Project for DS5999 by Jingnan Yang
The F4 version of the data and the topic modeling database are too large to upload here.
They can be accessed through this link to the public UVA Box:
https://virginia.box.com/s/akdfxa5a4n0bkz7mfvuygalzijtcgd6r
If you wish to view the two CSV files in a Jupyter Notebook,
please import them with the following code:
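import pandas as pd
# Token table: the first three columns form a three-level row MultiIndex
tokens = pd.read_csv("tokens_f4.csv", index_col=[0,1,2])
# Vocabulary table: the first column is the row index
vocab = pd.read_csv("vocab_f4.csv", index_col=0)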
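To illustrate what index_col=[0,1,2] does when loading the token table, here is a minimal sketch that reads a small, made-up stand-in for tokens_f4.csv from an in-memory string. The column names (book_id, chap_num, token_num, token_str, term_str) and the rows are hypothetical and chosen only for illustration. The real file's columns may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for tokens_f4.csv: column names and rows are invented
# for illustration only; the real file's layout may differ.
sample_csv = io.StringIO(
    "book_id,chap_num,token_num,token_str,term_str\n"
    "1,1,0,The,the\n"
    "1,1,1,Sea,sea\n"
    "1,2,0,Call,call\n"
)

# index_col=[0, 1, 2] turns the first three columns into a three-level
# row MultiIndex, mirroring the read_csv call used for tokens_f4.csv.
tokens = pd.read_csv(sample_csv, index_col=[0, 1, 2])

print(tokens.index.nlevels)        # number of index levels
print(list(tokens.index.names))    # names of the index levels

# Partial indexing on the first two levels selects every token
# in book 1, chapter 1.
print(tokens.loc[(1, 1), :])
```

The remaining columns stay as ordinary data columns, so with this layout each row is addressed by its (book, chapter, token position) key.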