acl-18's Introduction

A Stylometric Inquiry into Hyperpartisan and Fake News

This repository contains the code for reproducing results of the paper:

Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. A Stylometric Inquiry into Hyperpartisan and Fake News. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 18), July 2018.

Resources

  • Download the dataset, place it under data, and extract it there.
  • Get the required libraries, aitools4-ie-uima.jar and jsoup-1.6.1.jar, from the resources page and place them under lib.
  • Download the TreeTagger binaries that match your operating system and add them to the directory structure as detailed below (naming must be exact). Please visit the TreeTagger homepage beforehand to view the license terms (and the instructions for the Windows installation).
    • Linux to lib/thirdparty-tt4j-1.1.0/tree-tagger-Linux-3.2
    • Windows to lib/thirdparty-tt4j-1.1.0/tree-tagger-Win-3.2
    • MacOSX to lib/thirdparty-tt4j-1.1.0/tree-tagger-MacOSX-3.2-intel
  • In all cases, there should be a bin directory directly within the operating-system-specific directory. Create a lib directory next to this bin directory and place the parameter file you extract from this archive in it as english.par.
  • Get the TeX hyphenation patterns ZIP, place it next to the ACL-18 directory, and extract it there. This should create a directory called thirdparty next to the ACL-18 directory of this project.
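
Since the build and the experiments depend on exact names and locations, it can help to check the layout before running anything. The following is a minimal sketch, not part of the project: the class name LayoutCheck and its method are hypothetical, and the paths are taken from the list above (the TreeTagger directory name is passed in because it depends on the operating system).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the repository.
class LayoutCheck {

    /** Returns the expected files and directories that do not exist for the given project directory. */
    static List<Path> missingPaths(Path projectDir, String treeTaggerDirName) {
        Path treeTagger = projectDir.resolve("lib/thirdparty-tt4j-1.1.0").resolve(treeTaggerDirName);
        List<Path> expected = Arrays.asList(
            projectDir.resolve("data"),
            projectDir.resolve("lib/aitools4-ie-uima.jar"),
            projectDir.resolve("lib/jsoup-1.6.1.jar"),
            treeTagger.resolve("bin"),
            treeTagger.resolve("lib/english.par"),
            // The hyphenation patterns are extracted next to the project directory.
            projectDir.toAbsolutePath().normalize().resolveSibling("thirdparty"));
        List<Path> missing = new ArrayList<>();
        for (Path path : expected) {
            if (!Files.exists(path)) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path projectDir = Paths.get(args.length > 0 ? args[0] : ".");
        String treeTaggerDirName = args.length > 1 ? args[1] : "tree-tagger-Linux-3.2";
        for (Path path : missingPaths(projectDir, treeTaggerDirName)) {
            System.out.println("missing: " + path);
        }
    }
}
```

Run it from the project directory, passing the TreeTagger directory name for your operating system as the second argument. If it prints nothing, all paths listed above exist.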

Building

Run ant in this directory. This creates a single JAR file, acl18-bundle.jar, that contains everything you need.

Classification experiments

Split the data into three folds (by portal/publisher) and convert to UIMA XMI.

java -cp acl18-bundle.jar de.aitools.ie.articles.DataPreprocessor data/articles data/xmi
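
Splitting by publisher means that all articles of one publisher end up in the same fold, so no publisher appears in both the training and the test set. The sketch below only illustrates that idea. How DataPreprocessor actually assigns publishers to folds is not described here, so the round-robin rule, the class name PublisherFolds, and the publisher names are all assumptions.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of a publisher-wise split, not the project's code.
class PublisherFolds {

    /** Maps each distinct publisher to a fold index in [0, numFolds), round-robin over sorted names. */
    static Map<String, Integer> assignFolds(Collection<String> publishers, int numFolds) {
        Map<String, Integer> foldOf = new TreeMap<>();
        int next = 0;
        for (String publisher : new TreeSet<>(publishers)) {
            foldOf.put(publisher, next);
            next = (next + 1) % numFolds;
        }
        return foldOf;
    }

    public static void main(String[] args) {
        // One entry per article; the publisher names are made up.
        List<String> articlePublishers =
            Arrays.asList("portal-b", "portal-a", "portal-c", "portal-a", "portal-d");
        Map<String, Integer> foldOf = assignFolds(articlePublishers, 3);
        for (String publisher : articlePublishers) {
            // Every article inherits the fold of its publisher.
            System.out.println(publisher + " -> fold" + (foldOf.get(publisher) + 1));
        }
    }
}
```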

Then extract the features using UIMA and generate WEKA ARFF files for each task. Note that this extracts all features; the feature set that is actually used is selected in the next step.

java -cp acl18-bundle.jar de.aitools.ie.articles.FeatureExtractor VERACITY data/xmi data/veracity
java -cp acl18-bundle.jar de.aitools.ie.articles.FeatureExtractor ORIENTATION data/xmi data/orientation
java -cp acl18-bundle.jar de.aitools.ie.articles.FeatureExtractor HYPERPARTISANSHIP data/xmi data/hyperpartisanship

You can then train and test the classifier. Available feature sets are: TOPIC, TEXT_STYLE, HYPERTEXT_STYLE, STYLE (= TEXT_STYLE + HYPERTEXT_STYLE), ALL (= TOPIC + STYLE). The following command will build the TOPIC classifier for VERACITY on the first fold training set and evaluate it on the first fold test set.

java -cp acl18-bundle.jar de.aitools.ie.articles.RandomForestClassifier TOPIC data/veracity/*-fold1-training.arff data/veracity/*-fold1-test.arff 
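
To cover every feature set and fold for a task, the command above can be generated in a loop. This is a minimal sketch: the class ExperimentCommands is hypothetical, and it assumes that the files of the second and third fold follow the same naming pattern as the first (fold2, fold3), which the example above does not confirm. It only prints the command lines and does not run them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that prints one classifier command per feature set and fold.
class ExperimentCommands {

    static final String[] FEATURE_SETS = {"TOPIC", "TEXT_STYLE", "HYPERTEXT_STYLE", "STYLE", "ALL"};

    /** Builds the command lines for all feature sets and folds of one task directory. */
    static List<String> commands(String taskDir, int numFolds) {
        List<String> commands = new ArrayList<>();
        for (String featureSet : FEATURE_SETS) {
            for (int fold = 1; fold <= numFolds; fold++) {
                // Assumption: every fold uses the "-fold<N>-training/test.arff" naming.
                commands.add("java -cp acl18-bundle.jar de.aitools.ie.articles.RandomForestClassifier "
                    + featureSet + " "
                    + taskDir + "/*-fold" + fold + "-training.arff "
                    + taskDir + "/*-fold" + fold + "-test.arff");
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        String taskDir = args.length > 0 ? args[0] : "data/veracity";
        for (String command : commands(taskDir, 3)) {
            System.out.println(command);
        }
    }
}
```

The printed lines can be pasted into a shell, which expands the * patterns as in the command above.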

acl-18's People

Contributors

johanneskiesel, potthast

acl-18's Issues

Results vary a bit by operating system

We found that Java reads the input files in a different order on different operating systems, so the results vary slightly between operating systems. The results we reported were obtained on Linux.
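
A common way to remove this kind of dependence is to sort directory listings explicitly, because File.listFiles() does not guarantee any particular order. The sketch below shows the idea with a hypothetical helper class SortedListing. It is not how the project reads its input, and whether sorting would reproduce the numbers obtained on Linux has not been tested.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of the repository.
class SortedListing {

    /** Lists the entries of a directory sorted by name; returns an empty array if it cannot be listed. */
    static File[] listSorted(File directory) {
        File[] files = directory.listFiles();
        if (files == null) {
            return new File[0];
        }
        // String.compareTo is the same on every platform, unlike File.compareTo,
        // which ignores case on Windows.
        Arrays.sort(files, Comparator.comparing(File::getName));
        return files;
    }

    public static void main(String[] args) {
        File directory = new File(args.length > 0 ? args[0] : ".");
        for (File file : listSorted(directory)) {
            System.out.println(file.getName());
        }
    }
}
```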

Partial source code is not available?

Hi johanneskiesel,
I just read the code and can't find the source under the "de.aitools.ie.uima" directory, although the .class files are in aitools4-ie-uima.jar. Is this source code publicly available?

Can't initializeAnalysisComponent

Thank you very much for sharing your code. Your idea is great and I would like to test it. But when I ran the command 'java -cp acl18-bundle.jar de.aitools.ie.articles.FeatureExtractor VERACITY data/xmi data/veracity', it failed with the following error.

Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "de.aitools.ie.uima.analysis.segmentation.TexHyphJSyllableDetector" failed. (Descriptor: jar:file:/D:/Projects/BuzzFeed/master/acl18-bundle.jar!/uima-descriptors/primitive-AEs/segmentation/TextHyphJSyllableDetector.xml)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:264)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:169)
at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:371)
... 5 more
Caused by: java.lang.NullPointerException
at java.io.FileInputStream.<init>(FileInputStream.java:130)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at de.aitools.ie.uima.analysis.segmentation.TexHyphJSyllableDetector.initialize(TexHyphJSyllableDetector.java:87)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:262)

It would be great if you could check and fix this.
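
The innermost cause in this trace is a NullPointerException thrown inside the FileInputStream constructor, which is what Java does when the file name passed to it is null. One possible but unconfirmed explanation is that TexHyphJSyllableDetector did not find the hyphenation patterns that the Resources section expects in the thirdparty directory next to the ACL-18 directory. The sketch below only reproduces the exception type with a hypothetical class NullPathDemo and says nothing about the detector's actual code.

```java
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical demonstration, not part of the repository.
class NullPathDemo {

    /** Returns true if opening the given file name fails with a NullPointerException. */
    static boolean failsWithNpe(String fileName) {
        try (FileInputStream in = new FileInputStream(fileName)) {
            return false; // the file could be opened
        } catch (NullPointerException e) {
            return true; // a null file name was passed
        } catch (IOException e) {
            return false; // for example, the file does not exist
        }
    }

    public static void main(String[] args) {
        System.out.println("null name: " + failsWithNpe(null));
        System.out.println("missing file: " + failsWithNpe("no-such-file.txt"));
    }
}
```

A missing file with a non-null name would fail with a FileNotFoundException instead, so the trace suggests that the path itself was never set rather than that it points to the wrong place.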

Some problem in running build.xml

Excuse me, could you help me with a problem?
When I run build.xml with ant, it reports "package javax.xml.bind is not visible". What is wrong?
Thank you very much!
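
This message typically appears when compiling with Java 9 or 10, where the java.xml.bind module is still shipped but no longer resolved by default. The module was removed entirely in Java 11. A quick way to see whether the running JDK exposes the package is to try loading one of its classes. The class JaxbCheck below is a hypothetical sketch of such a check and is not part of the build.

```java
// Hypothetical check, not part of the repository.
class JaxbCheck {

    /** Returns true if the named class can be loaded by this class's class loader. */
    static boolean isClassAvailable(String className) {
        try {
            // Load without initializing; only the presence of the class matters here.
            Class.forName(className, false, JaxbCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("javax.xml.bind available: "
            + isClassAvailable("javax.xml.bind.JAXBContext"));
    }
}
```

If the check prints false, building with a Java 8 JDK is one option to try, since javax.xml.bind is part of the default class path there.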
