The raw data is the daily oil price from 2017.10.24 to 2022.10.24, with some days missing. The data is processed in the following steps:
Fill in the missing days by cubic interpolation using the scipy.interpolate package. See code here. See output file here.
Fetch the HTML pages from OilPrice with the requests package and parse them with BeautifulSoup 4. Use pandas to export the parsed data to .xls files. See code here. See output file here.
Get the content of every news article and save each one as a separate file in /News, named [index].txt. See code here. See output files here.
Clean the text. Replace every character that is not a lowercase letter with a space, then split the text on spaces and trim the whitespace from each word. See code here.
Calculate the frequency of each word for each day and export the result to a .csv file. See code here. See output file here.
Sort the columns by word. See code here. See output file here.
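The interpolation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the dates have already been converted to integer day offsets, uses `scipy.interpolate.interp1d` with `kind="cubic"` as one plausible way to do "cubic" interpolation, and the helper name `fill_missing_days` and the sample prices are made up.

```python
import numpy as np
from scipy.interpolate import interp1d


def fill_missing_days(day_index, prices):
    """Interpolate prices onto every day between the first and last observation.

    day_index: increasing day offsets of the observed prices (hypothetical input format).
    prices:    observed prices, same length as day_index (at least 4 points for cubic).
    """
    day_index = np.asarray(day_index, dtype=float)
    prices = np.asarray(prices, dtype=float)

    # Build a cubic interpolant through the observed points.
    cubic = interp1d(day_index, prices, kind="cubic")

    # Evaluate it on a gap-free daily grid.
    all_days = np.arange(day_index[0], day_index[-1] + 1)
    return all_days, cubic(all_days)


# Made-up sample: days 2 and 5 are missing from the raw series.
days = [0, 1, 3, 4, 6, 7]
prices = [52.1, 52.4, 53.0, 52.8, 53.5, 53.9]
all_days, filled = fill_missing_days(days, prices)
```

The interpolant passes through every observed price, so only the missing days receive new values.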
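The cleaning, counting, and sorting steps might look something like the sketch below, using only the standard library. Several details are assumptions rather than facts about the project's code: the text is lowercased before non-letter characters are replaced (otherwise capitalised words would lose their first letter), the columns are sorted alphabetically, and the function names and sample articles are hypothetical.

```python
import re
from collections import Counter


def clean_text(text):
    """Return the list of lowercase words in text."""
    # Assumption: lowercase first, so "Oil" becomes "oil" instead of " il".
    text = text.lower()
    # Replace every character that is not a lowercase letter with a space.
    text = re.sub(r"[^a-z]", " ", text)
    # split() with no argument splits on runs of whitespace and drops empty
    # strings, so each word comes out already trimmed.
    return text.split()


def daily_word_counts(articles):
    """articles: iterable of (date_string, text). Returns {date: Counter}."""
    counts = {}
    for date, text in articles:
        counts.setdefault(date, Counter()).update(clean_text(text))
    return counts


def to_table(counts):
    """Build a header and rows with one column per word, sorted alphabetically."""
    vocabulary = sorted({word for c in counts.values() for word in c})
    header = ["date"] + vocabulary
    rows = [[date] + [counts[date][word] for word in vocabulary]
            for date in sorted(counts)]
    return header, rows


# Made-up sample articles.
sample = [
    ("2022-10-21", "Oil prices rise; OPEC+ cuts output."),
    ("2022-10-21", "Oil demand is up 2%."),
    ("2022-10-24", "Prices fall."),
]
counts = daily_word_counts(sample)
header, rows = to_table(counts)
```

The resulting `header` and `rows` can be written to a .csv file with `csv.writer` or loaded into a pandas DataFrame.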