
abbreviation-extraction's Introduction

Extraction of abbreviation-definition pairs

Version: 0.1.5

This is a Python 3 implementation of the Schwartz-Hearst algorithm for identifying abbreviations and their corresponding definitions in free text [1].
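For readers unfamiliar with the algorithm, its core matching step can be sketched as follows. This is a minimal illustration based on the description in [1], not the code shipped in this package: the function name `find_best_long_form` is hypothetical, and the sketch omits the candidate-window and validity checks that the full algorithm applies. It scans the abbreviation and the candidate text from right to left, requiring each abbreviation character to appear in order, and requiring the first character to start a word.

```python
def find_best_long_form(short_form, candidate):
    """Return the shortest trailing part of `candidate` that matches
    `short_form`, or None if no match exists (illustrative sketch only)."""
    s_index = len(short_form) - 1
    l_index = len(candidate) - 1
    while s_index >= 0:
        ch = short_form[s_index].lower()
        # Non-alphanumeric characters in the abbreviation are ignored.
        if not ch.isalnum():
            s_index -= 1
            continue
        # Move left until the character matches. The first character of the
        # abbreviation must additionally sit at the start of a word.
        while (l_index >= 0 and candidate[l_index].lower() != ch) or (
            s_index == 0 and l_index > 0 and candidate[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
        s_index -= 1
    # Extend back to the beginning of the word where the match started.
    start = candidate.rfind(" ", 0, l_index + 1) + 1
    return candidate[start:]


print(find_best_long_form("ER", "The emergency room"))
```

In the full algorithm the candidate text is limited to a small number of words before the parenthesis, which is not modelled here.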

The original implementation is in Java. Vincent Van Asch created a Python 2 implementation, available at:

http://www.cnts.ua.ac.be/~vincent/scripts/abbreviations.py

I have taken Vincent's code, simplified it a little, refactored it for Python 3, and added some tests.

This version outputs a Python dictionary of abbreviation:definition pairs.

Like Vincent's code, this version is licensed under GPLv3. See LICENSE.txt.

Installation for command-line use

pip install -r requirements.txt

Usage

From the command line

python abbreviations/schwartz_hearst.py <input file>

Installation as a module

python3 setup.py install

or

pip install abbreviations

Usage

from abbreviations import schwartz_hearst

pairs = schwartz_hearst.extract_abbreviation_definition_pairs(doc_text='The emergency room (ER) was busy')
pairs = schwartz_hearst.extract_abbreviation_definition_pairs(file_path='<path_to_file>')
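Since the result is a plain dictionary of abbreviation:definition pairs, it can be used directly for downstream text processing. The sketch below shows one possible use, expanding abbreviations in later text. The helper `expand_abbreviations` is hypothetical and not part of this package, and the literal dictionary stands in for the value assumed to be returned for the first example sentence above.

```python
import re


def expand_abbreviations(text, pairs):
    """Replace each standalone abbreviation in `text` with its definition."""
    if not pairs:
        return text
    # Try longer abbreviations first so that a longer key wins over a prefix.
    keys = sorted(pairs, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: pairs[match.group(1)], text)


# Stand-in for the dictionary produced by extract_abbreviation_definition_pairs
# (assumed shape: abbreviation -> definition).
pairs = {"ER": "emergency room"}
print(expand_abbreviations("She was admitted to the ER overnight.", pairs))
```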

[1] A. Schwartz and M. Hearst (2003). A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text. Biocomputing, 451-462.
