MLDS @ ICL Unstructured Data course final project
Matthew Sit, CID: 02273408
Fall 2023
I used Google Colab to run the code.
- Download the contents of this repository and upload the notebook to your Google Drive account. Open the notebook using Google Colab (free for all users).
- Connect to the default runtime and run all cells, paying careful attention to the first few cells, which are the prerequisite setup cells.
- In the second cell, a "Choose Files" button will appear and execution will pause until you click it and upload the required data files, which are all the files in the two data directories in this repository:
  - the Meta earnings call raw transcript files (15 files), copied and pasted from the investor relations site PDFs into txt files
  - the Microsoft earnings call raw transcript files (15 files), downloaded from the investor relations site and re-saved as txt
  - Total: 30 files to upload
- The third cell installs some packages that are not available in the default runtime. These installations should complete automatically without issue.
- The rest of the notebook should now run.
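
The upload step above expects exactly 30 txt transcripts, so a quick check on the uploaded file names can catch a missed file before the rest of the notebook runs. The following is a minimal sketch and not part of the notebook itself; `check_uploaded` and `EXPECTED_TOTAL` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# 15 Meta + 15 Microsoft transcripts, as listed in the steps above.
EXPECTED_TOTAL = 30


def check_uploaded(filenames, expected_total=EXPECTED_TOTAL):
    """Return a list of problems with the uploaded file names (empty if none)."""
    names = list(filenames)
    txt_files = [n for n in names if Path(n).suffix.lower() == ".txt"]
    other_files = [n for n in names if Path(n).suffix.lower() != ".txt"]

    problems = []
    if len(txt_files) != expected_total:
        problems.append(
            f"expected {expected_total} .txt files, got {len(txt_files)}"
        )
    if other_files:
        problems.append(f"unexpected non-.txt files: {sorted(other_files)}")
    return problems
```

In Colab, `google.colab.files.upload()` returns a dict keyed by file name, so, assuming the notebook's upload cell stores that result in a variable such as `uploaded`, calling `check_uploaded(uploaded.keys())` right after the upload cell would report any problems.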
Expected run time: 8 minutes
No additional hardware or software is needed beyond the default Google Colab runtime: there are no requirements for special clusters, parallel jobs, SLURM, OpenPBS, nodes, cores, CPU/GPU, or memory per CPU.
The Google Colab default runtime is called the "Python 3 Google Compute Engine backend" and has 12.7 GB system RAM and 107.7 GB disk available for free.
- numpy: 1.23.5
- matplotlib: 3.7.1
- google.colab: 0.0.1a2
- textdescriptives: 2.7.1
- spacytextblob: 4.0.0
- pandas: 1.5.3
- requests: 2.31.0
- spacy: 3.6.1
- wordcloud: 1.9.3
- sklearn: 1.2.2
- gensim: 4.3.2
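
To confirm that a runtime matches the versions listed above, the installed versions can be compared against the list. This is a minimal sketch using the standard library's `importlib.metadata`; `PINNED`, `installed_version`, and `version_mismatches` are hypothetical names. It assumes the keys are PyPI distribution names, so `sklearn` is looked up as `scikit-learn`, and `google.colab` is left out because it only exists inside Colab.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above. Keys are assumed to be PyPI
# distribution names ("sklearn" is distributed as "scikit-learn").
PINNED = {
    "numpy": "1.23.5",
    "matplotlib": "3.7.1",
    "textdescriptives": "2.7.1",
    "spacytextblob": "4.0.0",
    "pandas": "1.5.3",
    "requests": "2.31.0",
    "spacy": "3.6.1",
    "wordcloud": "1.9.3",
    "scikit-learn": "1.2.2",
    "gensim": "4.3.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_mismatches(pinned, lookup=installed_version):
    """Map each mismatched package to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `version_mismatches(PINNED)` in a notebook cell returns an empty dict when every listed package is installed at the listed version. A mismatch does not necessarily mean the notebook will fail; it only flags where the environment differs from the one used here.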