

ClaimDetective

ClaimDetective is a Python class that ranks a list of sentences (i.e., potential claims) from most to least check-worthy, that is, by the priority with which they should be fact-checked.
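The ranking idea itself can be sketched in a few lines of plain Python. This is not ClaimDetective's actual interface (example_small.py shows that). `rank_by_checkworthiness` and `toy_score` are hypothetical names, and `toy_score` is a crude digit-counting stand-in for the check-worthiness score a fine-tuned model would produce.

```python
from typing import Callable, List, Tuple


def rank_by_checkworthiness(
    sentences: List[str], score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Return (sentence, score) pairs ordered from most to least check-worthy.

    score_fn is a hypothetical stand-in for a trained model: it maps a sentence
    to a score where higher means "fact-check this first".
    """
    scored = [(sentence, score_fn(sentence)) for sentence in sentences]
    # sorted() is stable even with reverse=True, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(sentence: str) -> float:
    """Toy heuristic for illustration only: the share of characters that are digits."""
    return sum(ch.isdigit() for ch in sentence) / max(len(sentence), 1)


sentences = [
    "Thank you all for coming tonight.",
    "Unemployment fell to 4.1 percent in 2017.",
    "I think we can do better.",
]
ranked = rank_by_checkworthiness(sentences, toy_score)
for sentence, score in ranked:
    print(f"{score:.3f}  {sentence}")
```

Keeping the scorer as a parameter means the sorting logic stays the same whichever model supplies the scores.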

Under the hood, ClaimDetective uses deep-learning models built by fine-tuning RoBERTa to identify and rank claims worth fact-checking. The code used to train the ClaimDetective models is available separately.

Separate ClaimDetective documentation is also available.

Overview

  1. claim_detective.py contains all the necessary source code to use the check-worthiness detection models located in the models directory.

  2. models is a directory containing the latest trained models. See the Models section below for details.

  3. requirements.txt lists the packages and versions used to write claim_detective.py.

  4. example_small.py contains a very brief example of loading and using one of the models. Read this file before using ClaimDetective: it essentially provides all the documentation needed. Its output is in the example_outputs directory, in small_output.csv.

  5. example_big.py is another example of how to load and use a model, in a more realistic setting. Note: to run it you will need packages beyond those listed in requirements.txt (e.g., nltk and BeautifulSoup). Its output is in the example_outputs directory, in the files named big_output_[model].csv, where [model] is the name of the model used to generate the file.

  6. example_outputs contains the .csv output files from the two example scripts, example_small.py and example_big.py.

  7. misclassified.py is another example of how to load and use the model. The output of this file can be seen in the incorrect_preds directory.
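The kind of pipeline that example_big.py's extra dependencies suggest (parse HTML, split the text into sentences, rank them, save a .csv) can be sketched with the standard library alone. Assumptions: `html.parser` stands in for BeautifulSoup, a naive regular expression stands in for nltk's sentence tokenizer, `toy_score` is a hypothetical placeholder for a ClaimDetective model, and the CSV columns are illustrative rather than the actual format of the big_output_[model].csv files.

```python
import csv
import io
import re
from html.parser import HTMLParser
from typing import Callable, List, TextIO


class TextExtractor(HTMLParser):
    """Collect visible text from HTML (stdlib stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_sentences(html: str) -> List[str]:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # collapse all whitespace
    # Naive stand-in for nltk.sent_tokenize: split after ., ! or ? plus whitespace.
    # It wrongly splits after abbreviations such as "Dr." or "U.S." mid-sentence.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def toy_score(sentence: str) -> float:
    """Hypothetical placeholder for a model score: share of characters that are digits."""
    return sum(ch.isdigit() for ch in sentence) / max(len(sentence), 1)


def write_ranked_csv(
    sentences: List[str], score_fn: Callable[[str], float], out: TextIO
) -> None:
    """Write sentences to CSV, most check-worthy first (illustrative columns)."""
    scored = sorted(((s, score_fn(s)) for s in sentences), key=lambda p: p[1], reverse=True)
    writer = csv.writer(out)
    writer.writerow(["rank", "sentence", "score"])
    for rank, (sentence, score) in enumerate(scored, start=1):
        writer.writerow([rank, sentence, f"{score:.4f}"])


html_doc = (
    "<html><head><style>p { color: red; }</style></head><body>"
    "<p>Taxes rose 12% last year.</p><p>Thanks for listening! Any questions?</p>"
    "</body></html>"
)
buffer = io.StringIO()
write_ranked_csv(html_to_sentences(html_doc), toy_score, buffer)
print(buffer.getvalue())
```

For real documents the naive splitter is the weakest link, which is presumably why example_big.py relies on nltk instead.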

Models

Each model is located in its own subdirectory of models, and each subdirectory contains two files:

  1. logfile.txt, a log of all the training and testing the model has undergone, along with the model's architecture.
  2. model.pth, a PyTorch checkpoint file containing the model weights as a state_dict object.

Because the models are so large, you must download their respective .zip files from Google Drive, then unzip each model inside the models directory.
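Unzipping can be scripted with Python's `zipfile` module. This is a sketch under an unverified assumption: each downloaded archive is named after its model (e.g. claimbuster.zip) and contains a single top-level folder of the same name, so check the real archive layout first. `install_model_zip` is a hypothetical helper, and the usage code builds a dummy archive so it runs without the real downloads.

```python
import tempfile
import zipfile
from pathlib import Path

EXPECTED_FILES = ("logfile.txt", "model.pth")  # per the Models section


def install_model_zip(zip_path, models_dir="models") -> Path:
    """Extract a downloaded model archive into models_dir and check its contents.

    Assumption: the archive is named after the model (e.g. claimbuster.zip) and
    holds a top-level folder with the same name.
    """
    zip_path, models_dir = Path(zip_path), Path(models_dir)
    models_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(models_dir)
    model_dir = models_dir / zip_path.stem
    missing = [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing {', '.join(missing)}")
    return model_dir


# Usage sketch with a dummy archive, so it runs without the real Google Drive files.
with tempfile.TemporaryDirectory() as tmp:
    dummy_zip = Path(tmp) / "claimbuster.zip"
    with zipfile.ZipFile(dummy_zip, "w") as archive:
        archive.writestr("claimbuster/logfile.txt", "dummy training log")
        archive.writestr("claimbuster/model.pth", b"dummy weights")
    installed = install_model_zip(dummy_zip, Path(tmp) / "models")
    installed_files = sorted(p.name for p in installed.iterdir())
    print(installed.name, installed_files)
```

Failing loudly on a missing model.pth is cheaper than discovering a bad unzip when the model is first loaded.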

At the time of writing, the following models are available on Google Drive:

  • claimbuster was trained on the ClaimBuster dataset described in Arslan et al. Briefly, the ClaimBuster dataset consists of 23,533 statements extracted from all U.S. general election presidential debates (1960-2016), which were then annotated by human coders.

  • clef19 was trained first on the ClaimBuster dataset described above and then on the CLEF-2019 CheckThat! dataset (CT19-T1 corpus) described in Atanasova et al. Briefly, the CT19-T1 corpus contains 23,500 human-annotated sentences from political speeches and debates during the 2016 U.S. presidential election.

  • clef20 was trained solely on the CLEF-2020 CheckThat! dataset (CT20-T1(en) corpus) described in Barrón-Cedeño et al. Briefly, the CT20-T1(en) corpus contains 962 human-annotated tweets about COVID-19, the disease caused by the novel coronavirus SARS-CoV-2.
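To see which of these models have been unzipped locally, a small catalog keyed by model name is enough. `MODEL_CATALOG` and `installed_models` are hypothetical names, the keys are assumed to match the subdirectory names inside models, and the check only confirms that models/<name>/model.pth exists, not that the checkpoint is valid.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Training data per model, as summarized in this README. The keys are assumed
# to match the subdirectory names inside models/.
MODEL_CATALOG: Dict[str, str] = {
    "claimbuster": "ClaimBuster dataset (U.S. presidential debates, 1960-2016)",
    "clef19": "ClaimBuster, then CLEF-2019 CheckThat! CT19-T1 (2016 U.S. election)",
    "clef20": "CLEF-2020 CheckThat! CT20-T1(en) (COVID-19 tweets)",
}


def installed_models(models_dir="models") -> List[str]:
    """Return the catalog models whose model.pth exists under models_dir."""
    root = Path(models_dir)
    return [name for name in MODEL_CATALOG if (root / name / "model.pth").is_file()]


# Usage sketch against a temporary directory where only clef20 is unzipped.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "clef20").mkdir()
    (Path(tmp) / "clef20" / "model.pth").write_bytes(b"dummy weights")
    found = installed_models(tmp)
    for name in found:
        print(f"{name}: {MODEL_CATALOG[name]}")
```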

Note that the first time you run a model, it takes a few minutes to load and set everything up. After that, identifying claims with the model is very fast.
