topic-coherence-sensitivity's Introduction

This repository contains the code and dataset described in the publication "The Sensitivity of Topic Coherence Evaluation to Topic Cardinality".

Running the System

  • The code depends on jhlau/topic_interpretability, so check out that repository first: https://github.com/jhlau/topic_interpretability
  • Use run-wordcount.sh to collect word co-occurrence statistics between topic words
  • If doing word intrusion, use run-wi.sh; the script will:
      • generate SVM features based on word count features
      • train an SVM rank model to predict intruder words
  • If doing NPMI, use run-npmi.sh; the script will:
      • compute topic coherence using word count features
  • Both scripts aggregate coherence scores over different cardinalities and print them at the end
  • Note: an example toy dataset is given in example_data. To test, execute run-wordcount.sh followed by either run-npmi.sh or run-wi.sh
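
For readers unfamiliar with the NPMI route, the following is a minimal Python sketch of the idea, not the code that run-npmi.sh executes. It assumes the usual definition of NPMI over word (co-)occurrence counts, scores a topic as the mean pairwise NPMI of its top-N words, and aggregates over cardinalities by taking a plain mean. The function names, the cardinalities (5, 10, 15, 20), the choice of returning 0.0 for pairs that never co-occur, and the toy counts are all assumptions made for illustration.

```python
import math
from itertools import combinations


def npmi(count_i, count_j, count_ij, total):
    """NPMI of a word pair from raw counts over `total` text windows."""
    # Assumed convention: a pair with any zero count scores 0.0.
    if min(count_i, count_j, count_ij) == 0:
        return 0.0
    # Avoid dividing by -log(1) = 0 when the pair occurs in every window.
    if count_ij == total:
        return 1.0
    p_i = count_i / total
    p_j = count_j / total
    p_ij = count_ij / total
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / -math.log(p_ij)


def topic_coherence(words, unigram, pair, total):
    """Mean pairwise NPMI over the given topic words."""
    scores = [
        npmi(unigram.get(a, 0), unigram.get(b, 0),
             pair.get(frozenset((a, b)), 0), total)
        for a, b in combinations(words, 2)
    ]
    return sum(scores) / len(scores)


def aggregate_over_cardinalities(topic_words, unigram, pair, total,
                                 cardinalities=(5, 10, 15, 20)):
    """Score the top-N words for each N, then average the scores (assumed)."""
    per_n = {
        n: topic_coherence(topic_words[:n], unigram, pair, total)
        for n in cardinalities
    }
    return per_n, sum(per_n.values()) / len(per_n)


# Made-up toy counts, only to show the call shape.
toy_unigram = {"a": 10, "b": 10, "c": 10}
toy_pair = {frozenset(("a", "b")): 10, frozenset(("a", "c")): 1}
per_n, mean_score = aggregate_over_cardinalities(
    ["a", "b", "c"], toy_unigram, toy_pair, total=100, cardinalities=(2, 3)
)
print(per_n, mean_score)
```

Keying the pair counts by frozenset makes the lookup independent of word order, which suits a symmetric measure such as NPMI.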

Scripts

  • run-wordcount.sh: runs topic_interpretability/ComputeWordCount.py to collect word statistics
  • run-wi.sh: computes topic coherence using word intrusion
  • run-npmi.sh: computes topic coherence using NPMI

Mechanical Turk Annotations

The topic coherence ratings collected via Mechanical Turk are in mturk_annotation/annotations.csv (tab-delimited, despite the .csv extension).

Description of columns:

  • domain: domain of topic (wiki or news)
  • topic: top-20 words of the topic
  • top-N: average rating when only the top N words are shown (e.g. for top-5, only the top 5 of the 20 words were presented when collecting the rating)
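
As a rough illustration of how the file could be read, the sketch below parses tab-delimited text with Python's csv module. It assumes the file has a header row whose names match the column descriptions above (domain, topic, and columns such as top-5), and that the topic words are separated by spaces. None of this is confirmed by the file itself, so check the header before relying on it. The function name load_annotations and the sample row are hypothetical.

```python
import csv
import io


def load_annotations(handle):
    """Parse tab-delimited annotation rows from an open text handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for record in reader:
        # Collect every "top-N" column as {N: average rating}.
        ratings = {
            int(name.split("-")[1]): float(value)
            for name, value in record.items()
            if name and name.startswith("top-") and value
        }
        rows.append({
            "domain": record["domain"],
            # Assumption: topic words are space-separated.
            "topic": record["topic"].split(),
            "ratings": ratings,
        })
    return rows


# Made-up sample row, not taken from the real annotations file.
sample = (
    "domain\ttopic\ttop-5\ttop-10\n"
    "news\tstock market share price trade\t2.5\t2.0\n"
)
rows = load_annotations(io.StringIO(sample))
print(rows[0]["domain"], rows[0]["ratings"])
```

To read the real file, pass an open handle instead of the in-memory sample, for example open("mturk_annotation/annotations.csv", newline="").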

Processed Corpus (News and Wiki)

Publication

  • Jey Han Lau and Timothy Baldwin. The Sensitivity of Topic Coherence Evaluation to Topic Cardinality. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2016), San Diego, California, to appear.
