iclunstructureddata's Introduction

Topic modeling on earnings calls transcripts from big tech 2020-2023

MLDS @ ICL Unstructured Data course final project

Matthew Sit, CID: 02273408

Fall 2023

How to run the code

I used Google Colab to run the code.

  1. Download the contents of this repository and upload the notebook to your Google Drive account. Open the notebook using Google Colab (free for all users).
  2. Connect to the default runtime and run all cells, paying careful attention to the first few cells, which handle the prerequisite setup.
  3. In the second cell, a "Choose Files" button will appear and execution will wait until you click the button and upload the necessary data files, which are all the files contained in the two data directories in this repository:
    • the Meta earnings call raw transcript files (15 files), copied from the PDFs on the investor relations site and pasted into txt files
    • the Microsoft earnings call raw transcript files (15 files), downloaded from the investor relations site and re-saved as txt
    • Total: 30 files to be uploaded
  4. The third cell installs a few packages that are not available in the default runtime. These installations should complete automatically without issue.
  5. The rest of the notebook should now run.
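
If the upload step goes wrong (a missing file, or a PDF uploaded instead of a txt), later cells may fail in confusing ways. The sketch below is not part of the notebook; it is a hypothetical helper, `check_upload`, that inspects the dictionary returned by Colab's `files.upload()` (file name mapped to file bytes) and reports obvious problems. The expected total of 30 and the `.txt` extension come from the steps above; everything else is an assumption.

```python
def check_upload(uploaded, expected_total=30):
    """Return a list of problems found in an upload dict (file name -> bytes).

    Hypothetical helper: the name and the checks are illustrative only.
    """
    problems = []
    # The two data directories hold 15 + 15 = 30 transcript files.
    if len(uploaded) != expected_total:
        problems.append(f"expected {expected_total} files, got {len(uploaded)}")
    for name, content in uploaded.items():
        if not name.lower().endswith(".txt"):
            problems.append(f"{name}: not a .txt file")
        elif len(content) == 0:
            problems.append(f"{name}: file is empty")
    return problems


# In Colab, the dict would typically come from the upload cell:
#   from google.colab import files
#   uploaded = files.upload()
#   print(check_upload(uploaded) or "upload looks complete")
```

An empty list means the upload matches the counts described above; it does not verify that the files are the right transcripts.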

Approximate runtime for notebook on Google Colab

8 minutes

Hardware/software requirements

No additional hardware or software is needed beyond the default Google Colab runtime. The notebook has no requirements for special clusters, parallel jobs, SLURM, OpenPBS, nodes, cores, CPU/GPU, or memory per CPU.

The Google Colab default runtime is called the "Python 3 Google Compute Engine backend" and has 12.7 GB system RAM and 107.7 GB disk available for free.

Python module dependency versions

  • numpy: 1.23.5
  • matplotlib: 3.7.1
  • google.colab: 0.0.1a2
  • textdescriptives: 2.7.1
  • spacytextblob: 4.0.0
  • pandas: 1.5.3
  • requests: 2.31.0
  • spacy: 3.6.1
  • wordcloud: 1.9.3
  • sklearn: 1.2.2
  • gensim: 4.3.2
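
Colab's preinstalled packages change over time, so the versions in a fresh runtime may differ from the list above. The following is a minimal sketch, not part of the notebook, that compares installed versions against that list using the standard library's `importlib.metadata`. The function name `find_mismatches` is hypothetical, and the pip distribution names `scikit-learn` (for `sklearn`) and `google-colab` (for `google.colab`) are assumptions about how those modules are packaged.

```python
from importlib import metadata

# Versions copied from the list above, keyed by pip distribution name.
# Assumption: sklearn is distributed as "scikit-learn" and
# google.colab as "google-colab".
PINNED = {
    "numpy": "1.23.5",
    "matplotlib": "3.7.1",
    "google-colab": "0.0.1a2",
    "textdescriptives": "2.7.1",
    "spacytextblob": "4.0.0",
    "pandas": "1.5.3",
    "requests": "2.31.0",
    "spacy": "3.6.1",
    "wordcloud": "1.9.3",
    "scikit-learn": "1.2.2",
    "gensim": "4.3.2",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {name: (wanted, found)} for packages whose version differs.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Example use in a notebook cell:
#   for name, (wanted, found) in find_mismatches(PINNED).items():
#       print(f"{name}: listed {wanted}, installed {found}")
```

A mismatch does not necessarily mean the notebook will fail; it only flags where the environment differs from the one the results were produced in.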
