
spacy_arguing_lexicon's Introduction

SpaCy Arguing Lexicon

A spaCy extension that wraps the MPQA arguing lexicon. It provides easy programmatic access to labeled sentences that contain arguing lexicon entries. With spaCy you can then apply the latest machine learning technologies with little effort.

Use the arguing lexicon extension, for instance, for deep argument mining. The lexicon is available in English and Dutch.

Getting started

You can install the spaCy extension through pip. It requires spaCy 2.

pip install spacy_arguing_lexicon
python -m spacy download en  # optional, downloads a spaCy language model if you haven't downloaded one already

Then enable the extension by adding the arguing lexicon parser to the spaCy pipeline.

import spacy
from spacy_arguing_lexicon import ArguingLexiconParser

nlp = spacy.load("en")
nlp.add_pipe(ArguingLexiconParser(lang=nlp.lang))

Now you can process any document and access the parts of it that contain arguments. You access the arguments through the doc._.arguments attribute, which this extension adds.

doc = nlp("""
    A changing society should not cling to traditional family models. 
    Society is changing, and the traditional idea of the nuclear family 
    with married mother and father 
    is no longer the only acceptable alternative.
""")

argument_span = next(doc._.arguments.get_argument_spans())
print("Argument lexicon:", argument_span.text)
print("Label of lexicon:", argument_span.label_)
print("Sentence where lexicon occurs:", argument_span.sent.text.strip())

The above will output

Argument lexicon: should
Label of lexicon: necessity
Sentence where lexicon occurs: A changing society should not cling to traditional family models.
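
Because get_argument_spans is a generator, you can also iterate over every lexicon match in a document. A minimal sketch, counting matches per category for the doc from above:

from collections import Counter

label_counts = Counter()
for span in doc._.arguments.get_argument_spans():
    print(span.label_, "->", span.text)
    label_counts[span.label_] += 1
print(label_counts)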

As get_argument_spans yields spaCy Spans, it is trivial to retrieve things like average word embeddings for sentences that contain arguing lexicon. These average embeddings can serve as input for your deep learning models.

You can, for example, access the built-in spaCy vectors for a sentence containing arguing lexicon with:

print("Vector type:", type(argument_span.sent.vector))
print("Vector shape:", argument_span.sent.vector.shape)

Which will output

Vector type: <class 'numpy.ndarray'>
Vector shape: (384,)
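
You could, for example, stack these sentence vectors into a single feature matrix for a classifier. A minimal sketch, assuming numpy is available (spaCy already depends on it):

import numpy as np

# Collect one average word embedding per sentence that contains arguing lexicon.
sentence_vectors = {}
for span in doc._.arguments.get_argument_spans():
    sentence_vectors[span.sent.start] = span.sent.vector

if sentence_vectors:
    features = np.stack(list(sentence_vectors.values()))
    print("Feature matrix shape:", features.shape)  # (number_of_sentences, 384)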

How it works

The MPQA arguing lexicon is made available under the GNU General Public License. It is a set of about 200 regular expressions with macros, divided into 17 categories. For more information about how the lexicon was created, we refer to the arguing lexicon homepage.

The Dutch arguing lexicon is a translation of the English lexicon and is available only through this extension.

Under the hood, this extension parses the regular expressions and unpacks any macros inside them. The doc._.arguments.get_argument_spans method tries to match every lexicon regular expression against the text of the input spaCy Doc. When a match is found, it is transformed into a spaCy Span before being yielded.
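
The details live inside the extension, but the principle can be illustrated with plain spaCy: a character-offset match from a regular expression can be mapped onto tokens with doc.char_span. A rough sketch of the idea, not the extension's actual code, using a heavily simplified stand-in pattern:

import re

# Heavily simplified stand-in for one unpacked "necessity" lexicon pattern.
necessity_pattern = re.compile(r"\bshould\b", re.IGNORECASE)

match = necessity_pattern.search(doc.text)
if match is not None:
    # char_span returns None when the offsets do not align with token boundaries.
    span = doc.char_span(match.start(), match.end(), label="necessity")
    if span is not None:
        print(span.text, span.label_)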

doc._.arguments.get_argument_spans is the only recommended way of using this extension at the moment.

As the MPQA arguing lexicon is distributed as a list of regular expressions, we sidestepped the spaCy Matcher, but we think that loading the lexicon as a set of matchers might improve performance.
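
For reference, a token-based approach with spaCy's own Matcher might look like the sketch below. This uses the spaCy 2 Matcher.add signature, and the pattern is a hypothetical, simplified entry rather than the lexicon itself:

from spacy.matcher import Matcher

matcher = Matcher(nlp.vocab)
# Hypothetical token pattern standing in for a "necessity" lexicon entry.
matcher.add("NECESSITY", None, [{"LOWER": "should"}])

for match_id, start, end in matcher(doc):
    print(nlp.vocab.strings[match_id], doc[start:end].text)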

Citation

Please cite the following when using this software:

Swapna Somasundaran, Josef Ruppenhofer and Janyce Wiebe (2007) Detecting Arguing and Sentiment in Meetings, 
SIGdial Workshop on Discourse and Dialogue, Antwerp, Belgium, September 2007 (SIGdial Workshop 2007)


spacy_arguing_lexicon's Issues

Make changes for spaCy universe approval

There are a few things pointed out in the spaCy documentation that would get this package rejected from the spaCy universe.

  • remove "spacy" from the module name attribute
  • accept an "attrs" argument on init to allow overriding the "arguments" attribute name (see the sketch below)

More details are at this link: https://spacy.io/usage/processing-pipelines#extensions

When this is done, also add an entry to the spaCy universe repo to get included: https://github.com/explosion/spacy/blob/master/website/universe/README.md
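
For reference, the convention behind the second point could look roughly like the sketch below; this is a hypothetical constructor for illustration, not the package's current code:

from spacy.tokens import Doc

class ArguingLexiconParser(object):

    def __init__(self, lang="en", attrs=("arguments",)):
        # Let users override the name of the Doc extension attribute.
        self.arguments_attr, = attrs
        if not Doc.has_extension(self.arguments_attr):
            Doc.set_extension(self.arguments_attr, default=None)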

Unable to add_pipe

Hi creator,

I have been unable to run the following command: nlp.add_pipe(ArguingLexiconParser(lang=nlp.lang))

As recommended in the error, I tried converting add_pipe's argument to a string: nlp.add_pipe('ArguingLexiconParser(lang=nlp.lang)').

It still did not work, and now there is a new error saying:
"can't find factory for ArguingLexiconParser(lang=nlp.lang) for Language English(en)."

Could you please help resolve this?

Thanks
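
This package targets spaCy 2, where add_pipe accepts a component instance. Under spaCy 3 a component must instead be registered as a factory and added by name; a rough sketch of that pattern is shown below, with the factory name arguing_lexicon made up for illustration, and it is untested with this package:

import spacy
from spacy.language import Language
from spacy_arguing_lexicon import ArguingLexiconParser

@Language.factory("arguing_lexicon")
def create_arguing_lexicon(nlp, name):
    return ArguingLexiconParser(lang=nlp.lang)

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("arguing_lexicon")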

StopIteration error on documents

I keep getting this error on strings other than the one used in your example ("A changing society should not cling to...").

StopIteration                             Traceback (most recent call last)
<ipython-input> in <module>()
     11 #     """)
     12
---> 13 argument_span = next(doc._.arguments.get_argument_spans())
     14 print("Argument lexicon:", argument_span.text)
     15 print("Label of lexicon:", argument_span.label_)

StopIteration:
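
get_argument_spans yields nothing when none of the lexicon patterns match the document, so calling next() on it raises StopIteration. A safer pattern is to pass a default to next() or to loop over the generator:

spans = doc._.arguments.get_argument_spans()
argument_span = next(spans, None)
if argument_span is None:
    print("No arguing lexicon found in this document.")
else:
    print("Argument lexicon:", argument_span.text)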
