spacy_ke's Introduction

spacy_ke: Keyword Extraction with spaCy.

⏳ Installation

pip install spacy_ke

🚀 Quickstart

Usage as a spaCy pipeline component (spaCy v2.x.x)

import spacy
import spacy_ke

# load spacy model
nlp = spacy.load("en_core_web_sm")

# spacy v3.0.x factory.
# if you're using spacy v2.x.x swich to `nlp.add_pipe(spacy_ke.Yake(nlp))`
nlp.add_pipe("yake")

doc = nlp(
    "Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence "
    "concerned with the interactions between computers and human language, in particular how to program computers "
    "to process and analyze large amounts of natural language data. "
)

for keyword, score in doc._.extract_keywords(n=3):
    print(keyword, "-", score)

Configure the pipeline component

Normally you'd want to configure the keyword extraction pipeline according to its implementation.

window: int = 2  # default
lemmatize: bool = False  # default
candidate_selection: str = "ngram"  # default, use "chunk" for noun phrase selection.

nlp.add_pipe(
    Yake(
        nlp,
        window=window,  # default
        lemmatize=lemmatize,  # default
        candidate_selection="ngram"  # default, use "chunk" for noun phrase selection
    )
)

And if you want to define a custom candidate selection use the example below.

from typing import Iterable
from spacy.tokens import Doc
from spacy_ke.util import registry, Candidate


@registry.candidate_selection.register("custom")
def custom_selection(doc: Doc, n=3) -> Iterable[Candidate]:
    ...


nlp.add_pipe(
    Yake(
        nlp,
        candidate_selection="custom"
    )
)

Development

Set up virtualenv

$ python -m venv .venv
$ source .venv/bin/activate

Install dependencies

$ pip install -U pip
$ pip install -r requirements-dev.txt

Run unit test

$ pytest

Run black (code formatter)

$ black spacy_ke/ --config=pyproject.toml

Release package (via twine)

$ python setup.py upload

References

[1] A Review of Keyphrase Extraction

@article{DBLP:journals/corr/abs-1905-05044,
  author    = {Eirini Papagiannopoulou and
               Grigorios Tsoumakas},
  title     = {A Review of Keyphrase Extraction},
  journal   = {CoRR},
  volume    = {abs/1905.05044},
  year      = {2019},
  url       = {http://arxiv.org/abs/1905.05044},
  archivePrefix = {arXiv},
  eprint    = {1905.05044},
  timestamp = {Tue, 28 May 2019 12:48:08 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1905-05044.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

[2] pke: an open source python-based keyphrase extraction toolkit.

@InProceedings{boudin:2016:COLINGDEMO,
  author    = {Boudin, Florian},
  title     = {pke: an open source python-based keyphrase extraction toolkit},
  booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},
  month     = {December},
  year      = {2016},
  address   = {Osaka, Japan},
  pages     = {69--73},
  url       = {http://aclweb.org/anthology/C16-2015}
}

spacy_ke's People

Contributors

Stargazers

Watchers

spacy_ke's Issues

Support spaCy v3.0

Originally posted by @kavorite in #1 (comment)

Config issue

In example:
you cannot run this:
yake.cfg["candidate_selection"] = {"ngram": 3}

because TypeError: unhashable type: 'dict'

cant use with zh model

im use spacy and spacy_ke with zh_core_web_md to extract keyword but get some errors hop can tell me how to fix

'''
ValueError: [E002] Can't find factory for 'PositionRank' for language Chinese (zh). This usually happens when spaCy calls nlp.create_pipe with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator @Language.component (for function components) or @Language.factory (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, spancat, textcat_multilabel, yake, textrank, positionrank, topicrank, transformer

'''

Recommend Projects

talmago / spacy_ke Goto Github PK

spacy_ke's Introduction

spacy_ke: Keyword Extraction with spaCy.

⏳ Installation

🚀 Quickstart

Usage as a spaCy pipeline component (spaCy v2.x.x)

Configure the pipeline component

Development

References

spacy_ke's People

Contributors

Stargazers

Watchers

Forkers

spacy_ke's Issues

Support spaCy v3.0

Config issue

cant use with zh model

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent