
nlp4if-2021's Introduction

Cross-lingual misinformation detection

This repo contains the code for cross-lingual misinformation detection. See paper 📔 here.

Quick start

Install PyTorch 1.1.0 from the official website, then install the other dependencies listed in requirements.txt.

Prepare data

For details of the data, see

cd src
python prepare_data.py  # prepare data without the additional data
python prepare_data_additional.py  # prepare data with the additional data

Analysis of the data is available in notebooks/analyze_data.ipynb and notebooks/analyze_data_additional.ipynb.

Training

To train without the additional data, choose the appropriate script in the bash folder; to train with the additional data, use the bash_additional folder instead. For example, to fine-tune multilingual BERT with English as the source language while using the additional data, run the following commands.

cd bash_additional
chmod +x train_multilingual_bert_src_en.sh
./train_multilingual_bert_src_en.sh

The training logs are written to the file given by the --log_file_path argument. The same file also stores the evaluation results once training completes.
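As a rough illustration of how a --log_file_path argument can be wired to a log file, here is a minimal sketch using only the Python standard library. It is not the repo's training code: only the flag name comes from this README, and build_parser, make_logger, main, and the log messages are hypothetical.

```python
import argparse
import logging


def build_parser():
    # Hypothetical parser; only the --log_file_path flag is taken from the README.
    parser = argparse.ArgumentParser(description="Training sketch")
    parser.add_argument("--log_file_path", required=True,
                        help="File that receives training and evaluation logs")
    return parser


def make_logger(log_file_path):
    # Route every record to the file named on the command line.
    logger = logging.getLogger("train_sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.FileHandler(log_file_path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def main(argv=None):
    args = build_parser().parse_args(argv)
    logger = make_logger(args.log_file_path)
    logger.info("training started")
    # ... the actual training loop would run here ...
    logger.info("evaluation results would be written here after training")
    for handler in logger.handlers:
        handler.flush()
    return args.log_file_path
```

Because training and evaluation messages go through the same logger, a single file ends up holding both, which matches the behaviour described above.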

Note: To tabulate the results from the log files and pick the best hyperparameters across multiple runs, see notebooks/tabulate_results_v{1,2,3}.ipynb.
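The selection step itself can be sketched in a few lines. This is not the notebooks' code: it assumes each run has already been parsed into a dict of hyperparameters plus a dev-set score, and the key name dev_f1 and the function pick_best_run are hypothetical.

```python
def pick_best_run(runs, metric="dev_f1"):
    """Return the run (a dict) with the highest value of `metric`."""
    if not runs:
        raise ValueError("no runs to compare")
    return max(runs, key=lambda run: run[metric])


# Toy values for illustration only; these are not results from the paper or the logs.
runs = [
    {"lr": 1e-3, "batch_size": 32, "dev_f1": 0.10},
    {"lr": 1e-4, "batch_size": 32, "dev_f1": 0.30},
    {"lr": 1e-5, "batch_size": 16, "dev_f1": 0.20},
]
best = pick_best_run(runs)
```

Selecting on a dev-set metric rather than a test-set one keeps the test set untouched until the final prediction step.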

Predict labels for the test set

cd bash_predict
chmod +x predict_best_sys.sh
./predict_best_sys.sh

Training logs

  • logs_v1 contains the training logs from runs that use our own train-dev splits for en and ar and the provided train and dev data for bg.
  • logs_v2 contains the training logs from runs that use the provided train and dev data for all languages.
  • logs contains the training logs from runs that use the provided additional train and dev data for all languages.

Citation

@inproceedings{detecting-multilingual-misinformation,
    title = "Detecting Multilingual {COVID}-19 Misinformation on Social Media via Contextualized Embeddings",
    author = "Panda, Subhadarshi and Levitan, Sarah Ita",
    booktitle = "Proceedings of the Fourth Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda",
    series = {NLP4IF@NAACL'~21},
    month = {June},
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
}
