
sentivent-economic-event-detection's Introduction

Economic event detection in company-specific news text

Economic event type detection on the SentiFM dataset using biLSTM and SVM for the paper: Gilles Jacobs, Els Lefever, and Véronique Hoste. 2018. Economic event detection in company-specific news text. In Proceedings of the 1st Workshop on Economics and NLP (ECONLP). ACL 2018, Melbourne, AUS, 1-10.

This repo includes data and code for company-specific, sentence-level event type classification on the English SentiFM dataset.

Please cite the original paper when using the dataset.

This code fully replicates the experiments described in the paper: pre-processing, word-vector creation and evaluation, hyperparameter optimization in cross-validation, and holdout prediction.

Set-up:

  1. Install non-python dependencies:

    • Install CUDA if not installed (it is already on phil)
    • sudo apt-get install libopenblas-base python-dev
    • Download and unpack the latest Stanford CoreNLP:
        cd ~/software
        wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
        unzip stanford-corenlp-full-2018-02-27.zip
        rm stanford-corenlp-full-2018-02-27.zip
      Then set the environment variable that the Python CoreNLP package uses:
        CORENLP_HOME=~/software/stanford-corenlp-full-2018-02-27
  2. Configure Keras to use TensorFlow:

    • set $HOME/.keras/keras.json to:
    {
        "image_data_format": "channels_last",
        "epsilon": 1e-07,
        "floatx": "float32",
        "backend": "tensorflow"
    }
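
If you prefer to create the Keras configuration from a script rather than by hand, the following is a minimal sketch. The helper name `write_keras_config` and its `home` parameter are illustrative assumptions, not part of this repo; the settings themselves are copied from the step above.

```python
import json
from pathlib import Path

# Settings copied from the set-up step above.
KERAS_CONFIG = {
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow",
}


def write_keras_config(home=None):
    """Write KERAS_CONFIG to <home>/.keras/keras.json and return the path.

    `home` defaults to the current user's home directory (hypothetical
    helper, not part of the repo). An existing file is overwritten.
    """
    home_dir = Path(home) if home is not None else Path.home()
    config_path = home_dir / ".keras" / "keras.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(KERAS_CONFIG, indent=4) + "\n")
    return config_path
```

Calling `write_keras_config()` with no argument targets `$HOME/.keras/keras.json`; pass a directory to write the file somewhere else first and inspect it.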

Contents and usage

Set your experiment storage/output paths and experimental settings in settings.py

  • settings.py defines the experimental constants for cross-validation optimization, testing, and word-vector training.
  • featurize.py feature engineering: tokenisation, indexing, sequencing, and building the embedding matrix.
  • crossvalidate.py runs the validation test and the multi-label cross-validation experiment on featurized data.
  • crossvalidate.py runs the validation test and the one-vs-rest cross-validation experiment on featurized data.
  • datahandler.py loading, parsing, writing, making splits, and general handling of the dataset.
  • classifier.py custom sklearn-compatible classifiers and classifier handling.
  • scorer.py custom classifier scoring for logging multiple scores in cross-validation.
  • wordvectors_train.py script for training GloVe word vectors.
  • wordvectors_eval.py script for evaluating trained GloVe vectors with the Google analogy suite.
  • util.py commonly used, general-purpose Python utility functions.
  • clean_output_dir.py removes empty directories created as output when testing.
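
To make the featurization steps listed for featurize.py concrete (tokenisation, indexing, sequencing, embedding matrix), here is a minimal, dependency-free sketch. It is not the repo's implementation: the regex tokenizer is a stand-in for the real tokenizer, and all function names, the special tokens, and the toy vectors are assumptions made for illustration.

```python
import re

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def tokenize(sentence):
    """Lowercase and split into word and punctuation tokens (toy tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def build_index(token_lists):
    """Map each distinct token to an integer id; 0 is padding, 1 is unknown."""
    index = {PAD: 0, UNK: 1}
    for tokens in token_lists:
        for token in tokens:
            index.setdefault(token, len(index))
    return index


def to_sequence(tokens, index, max_len):
    """Convert tokens to ids, truncating or right-padding to max_len."""
    ids = [index.get(t, index[UNK]) for t in tokens[:max_len]]
    return ids + [index[PAD]] * (max_len - len(ids))


def embedding_matrix(index, vectors, dim):
    """One row per token id; tokens without a pretrained vector stay all-zero."""
    matrix = [[0.0] * dim for _ in range(len(index))]
    for token, i in index.items():
        if token in vectors:
            matrix[i] = list(vectors[token])
    return matrix


sentences = ["Apple raises its dividend.", "Shares fell sharply."]
tokens = [tokenize(s) for s in sentences]
index = build_index(tokens)
sequences = [to_sequence(t, index, max_len=6) for t in tokens]
toy_vectors = {"dividend": [0.1, 0.2], "shares": [0.3, 0.4]}  # made-up values
matrix = embedding_matrix(index, toy_vectors, dim=2)
```

Row `i` of `matrix` lines up with token id `i` in `sequences`, which is the property an embedding layer relies on when it looks up vectors by id.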

Results

Best score: 0.75 F1.
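
This README does not state which F1 averaging the reported score uses. For readers unfamiliar with multi-label F1, the sketch below computes both micro- and macro-averaged F1 from 0/1 label rows. The function names and the toy labels are assumptions for illustration and are unrelated to the repo's scorer.py or to the reported result.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined here as 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for lists of equal-length 0/1 label rows."""
    n_labels = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_labels)]  # tp, fp, fn per label
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                counts[j][0] += 1  # true positive
            elif p:
                counts[j][1] += 1  # false positive
            elif t:
                counts[j][2] += 1  # false negative
    # Macro: average the per-label F1 scores.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(*[sum(col) for col in zip(*counts)])
    return micro, macro


# Toy example with two event types (made-up labels).
y_true = [[1, 0], [1, 0], [0, 1], [1, 0]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]
micro, macro = multilabel_f1(y_true, y_pred)
```

In this toy case the rare second label is never predicted, so macro F1 is pulled down much more than micro F1; which average is reported matters when event types are imbalanced.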


sentivent-economic-event-detection's Issues

Some problems

Hello! After reading your code, I still don't understand how the data pre-processing works. On line 102 of datahandler.py, does the .txt file matched by f"{dirp}/*.txt" correspond to the .ann file in the sentivent data? If so, I could not find any examples with a label of 1 in the unify_dataset function. Perhaps I have misunderstood something. Could you describe your data pre-processing steps and how the data files map to the scripts? Thank you!
