Giter VIP home page Giter VIP logo

addmoreclusters's Introduction

AddMoreClusters

Shared repository with code from paper "Capturing Evolution in Word Usage: Just Add More Clusters?" presented at WWW2020 Temporal Web workshop.

Installation, documentation

Published results were produced in Python 3 programming environment. Instructions for installation assume the usage of PyPI package manager.
To get the source code, clone the project from the repository with 'git clone https://github.com/smontariol/AddMoreClusters/'
Install dependencies if needed: pip install -r requirements.txt

To reproduce the results on the COHA corpus published in the paper run the code in the command line using following commands:

To fine-tune custom BERT model on the COHA corpus:

python3 BERT_finetuning.py --train_data_file PATH_TO_COHA_CORPUS --output_dir PATH_TO_SAVED_MODEL --model_type bert --mlm --do_train --num_train_epochs 5 --per_gpu_train_batch_size 8 --model_name_or_path bert-base-uncased

Script src/main.py generates BERT contextual embeddings, clusters them and detect semantic drift for each target word given the following conditions:

  • COHA corpus is in src/corpora/COHA/text
  • Gulordava and Baroni dataset is in src/Gulordava_word_meaning_change_evaluation_dataset.csv
  • The path to BERT model is pretrained_models/bert-base-uncased.tar.gz

Note!

The results reported in the paper for the Affinity Propagation are reproducible with Scikit version 0.21. The Scikit-learn implementation of Affinity Propagation behaves different from the version we used (ver 0.21) to subsequent versions (ver 0.22 and higher): some bigger datasets cannot be clustered. To replicate the Affinity Propagation results, we recommend downgrading to Scikit to 0.21, as it is specified in requirements.txt

Acknowledgements

This work has been partly supported by the European Union’s Horizon 2020 research and innovation programme under grant 770299 (NewsEye) and 825153 (EMBEDDIA).

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.