risk_conversation's Introduction

Risk Prediction with Conversation Graph

This repo provides a reference implementation of Risk-ConversationGraph, as described in the paper Reading Between the Lines: A Graph-based Financial Risk Prediction Approach.

Conversation Graph

Submitted to Information Systems Research for review

While the earnings transcript dataset cannot be publicly released due to the data provider's policy, we make our code implementation publicly available.

We hope that our design can benefit researchers and practitioners and shed light on other financial prediction tasks.

Note: Researchers can easily obtain earnings conference call data from Seeking Alpha or databases such as Thomson Reuters StreetEvents.

How to run the code

Dependencies

Run the following to install the necessary Python packages for our code:

pip install -r requirements.txt

Usage

Train and evaluate our models through main.py. Here are some important options:

main.py:
  --config: Training configuration. Readers can change it to adapt to their specific tasks.
    (default: 'conversation_graph.yml')
  --trainer: Initializes and runs the model. In our design, each trainer is bound to its specific model.
    (default: 'ConversationGraph')
  --test: Whether to evaluate the model with the best checkpoint.

Configurations for evaluation

After training, turn on --test to evaluate the model with the best checkpoint. We select the best checkpoint according to the MSE on the validation dataset.
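These metrics can be computed with NumPy and SciPy along the following lines. This is only a sketch of the evaluation step (the function name `evaluate` is ours), not the repo's own implementation:

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

def evaluate(y_true, y_pred):
    """Compute MSE, MAE, Spearman's rho, and Kendall's tau for risk predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))   # squared-error loss used for model selection
    mae = float(np.mean(np.abs(y_true - y_pred)))
    rho = spearmanr(y_true, y_pred).correlation    # rank correlation
    tau = kendalltau(y_true, y_pred).correlation   # pairwise-concordance correlation
    return {"mse": mse, "mae": mae, "rho": rho, "tau": tau}
```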

Folder structure:

  • assets: contains best_checkpoint.pth and lda_topic_rep.pkl. Necessary for computing MSE, MAE, Spearman's rho, and Kendall's tau.
  • data: contains data_2015.pkl, data_2016.pkl, data_2017.pkl, data_2018.pkl that represent data of different years. The detailed content of each .pkl can be found in trainer/conversation_graph_trainer.py -- loading_train_dataset function.
  • model: contains conversation_graph.py, our model's implementation.
  • utils: We implement attention and other functions here.
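As a hypothetical illustration, one of the yearly .pkl files could be inspected like this (the real schema is defined in the loading_train_dataset function mentioned above):

```python
import pickle

def inspect_pickle(path):
    """Load one year's data file, e.g. data/data_2015.pkl, and report its top-level type.

    The authoritative schema is defined in
    trainer/conversation_graph_trainer.py -- loading_train_dataset.
    """
    with open(path, "rb") as f:
        data = pickle.load(f)
    print(type(data))  # e.g. <class 'dict'> -- depends on how the year's data is packed
    return data
```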

Data structure

We use pre-trained BERT-base-uncased to generate sentence embeddings. Note that other pretrained language models such as RoBERTa, Sentence-BERT, and FinBERT can also be used as the text encoder, but we find the results are similar.
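A common pooling step for turning BERT token vectors into one sentence embedding is masked mean pooling. The sketch below is our own helper, not code from this repo; it assumes the encoder's last hidden states and attention mask have already been obtained:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings into a sentence embedding, ignoring padding.

    token_embeddings: (seq_len, hidden) array, e.g. last hidden states from
    BERT-base-uncased; attention_mask: (seq_len,) array of 0/1 flags.
    """
    mask = np.asarray(attention_mask, dtype=float)[:, None]   # (seq_len, 1)
    summed = (np.asarray(token_embeddings) * mask).sum(axis=0)
    count = mask.sum()
    return summed / max(count, 1.0)  # guard against an all-padding sequence
```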

Train LDA topic model

We select the number of topics for the LDA model by coherence score:

import os
import gensim
from gensim.models import CoherenceModel

def compute_coherence_values(dictionary, corpus, texts, limit, start=2, step=3):
    """
    Compute c_v coherence for various numbers of topics
    Parameters:
    ----------
    dictionary : Gensim dictionary
    corpus : Gensim corpus
    texts : List of input texts
    limit : Max num of topics (exclusive)
    Returns:
    -------
    model_list : List of LDA topic models
    coherence_values : Coherence value of the LDA model with the respective number of topics
    """
    coherence_values = []
    model_list = []
    # Note: gensim.models.wrappers was removed in Gensim 4.0, so this requires gensim<4.0.
    mallet_path = 'packages/mallet-2.0.8/bin/mallet'
    for num_topics in range(start, limit, step):
        model = gensim.models.wrappers.LdaMallet(mallet_path,
                                                 corpus=corpus,
                                                 num_topics=num_topics,
                                                 id2word=dictionary,
                                                 random_seed=1234,
                                                 workers=os.cpu_count())
        model_list.append(model)  # keep the model so the returned list matches coherence_values
        coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')
        coherence_values.append(coherencemodel.get_coherence())

    return model_list, coherence_values
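Given the returned coherence curve, the topic count can then be selected by taking the argmax. `best_num_topics` is a small helper of ours (its `start`/`step` arguments must match the call above):

```python
def best_num_topics(coherence_values, start=2, step=3):
    """Return the number of topics whose model achieved the highest c_v coherence."""
    best_idx = max(range(len(coherence_values)), key=coherence_values.__getitem__)
    return start + best_idx * step  # map the list index back to a topic count
```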

Some NLP preprocessing techniques are needed, including converting text to lowercase, removing emojis, expanding contractions, removing punctuation, removing numbers, removing stopwords, lemmatization, etc.
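A minimal sketch of such a preprocessing pipeline (our own illustration; a real pipeline would use NLTK/spaCy stopword lists, contraction expansion, and a proper lemmatizer such as WordNetLemmatizer):

```python
import re
import string

# Tiny stand-in stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are"}

def preprocess(text):
    """Lowercase, remove numbers and punctuation, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)                               # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    return [t for t in text.split() if t not in STOPWORDS]         # remove stopwords
```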

risk_conversation's People

Contributors

judiebig
