hearst

Uses Python code for Hearst patterns from https://github.com/mmichelsonIF/hearst_patterns_python

USAGE: python3 hearst.py [] [-o ] [-s]

If no input file is given, it processes text read from stdin. If no output file is given, it writes to stdout. The -s flag limits the patterns to the original (simple) ones from the 1992 paper.
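The stdin/stdout fallback and the -s switch described above are a common command-line idiom. The sketch below shows one way to express them with Python 3's `argparse`. It is illustrative only: `build_parser`, `open_streams`, and the argument names `input`, `output`, and `simple` are hypothetical, and hearst.py's actual option handling may differ.

```python
import argparse
import sys
from typing import TextIO, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented hearst.py interface."""
    parser = argparse.ArgumentParser(prog="hearst.py")
    # Optional positional input file; None means "read from stdin".
    parser.add_argument("input", nargs="?", default=None,
                        help="input text file (default: stdin)")
    # -o names an output file; None means "write to stdout".
    parser.add_argument("-o", dest="output", default=None,
                        help="output file (default: stdout)")
    # -s restricts extraction to the original 1992 patterns.
    parser.add_argument("-s", dest="simple", action="store_true",
                        help="use only the original (simple) Hearst patterns")
    return parser


def open_streams(args: argparse.Namespace) -> Tuple[TextIO, TextIO]:
    """Open the named files, falling back to stdin/stdout when none is given."""
    src = open(args.input, encoding="utf-8") if args.input else sys.stdin
    dst = open(args.output, "w", encoding="utf-8") if args.output else sys.stdout
    return src, dst
```

Using `None` as the default (rather than opening files inside `argparse`) keeps parsing free of side effects, so the parser can be tested without touching the filesystem.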

Output consists of tab-separated lines of the form 'specific\tgeneral\tsource_file', e.g., 'apple\tfruit\trecipes.txt'.
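Because each output line is just three tab-separated fields, downstream code only needs to split on tabs. Here is a minimal reader sketch, assuming exactly three fields per line; `read_pairs` and `HyponymPair` are hypothetical names, not part of hearst.py.

```python
from typing import Iterable, Iterator, NamedTuple


class HyponymPair(NamedTuple):
    specific: str  # more specific term (hyponym), e.g. "apple"
    general: str   # more general term (hypernym), e.g. "fruit"
    source: str    # file the pair was extracted from


def read_pairs(lines: Iterable[str]) -> Iterator[HyponymPair]:
    """Parse 'specific<TAB>general<TAB>source_file' lines, skipping blank ones."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Assumes exactly three tab-separated fields; raises ValueError otherwise.
        specific, general, source = line.split("\t")
        yield HyponymPair(specific, general, source)


print(list(read_pairs(["apple\tfruit\trecipes.txt\n"])))
# [HyponymPair(specific='apple', general='fruit', source='recipes.txt')]
```

An open file object can be passed directly as `lines`, since iterating over it yields one line at a time.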

From Python, call hearst.hearstify(, []) to get an iterable of pairs.

Works with both Python 3 and Python 2.x.

background

See: Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th conference on Computational linguistics - Volume 2 (COLING '92), Vol. 2. Association for Computational Linguistics, Stroudsburg, PA, USA, 539-545. DOI: https://doi.org/10.3115/992133.992154 http://people.ischool.berkeley.edu/~hearst/papers/coling92.pdf
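To make the idea behind the paper concrete, here is a minimal sketch of one of the lexico-syntactic patterns it describes, roughly "NP such as NP, NP ... and/or NP", written with Python's `re` module. This is not the code hearst.py uses (that builds on hearst_patterns_python). It makes the simplifying assumption that every noun phrase is a single word, whereas a real extractor would rely on a part-of-speech tagger or chunker; `such_as_pairs` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Simplifying assumption: a "noun phrase" is a single word that is not
# "and"/"or". A real Hearst extractor would find NP boundaries with a chunker.
WORD = r"(?!(?:and|or)\b)[A-Za-z][A-Za-z-]*"

# Pattern: NP_0 such as NP_1, NP_2, ..., (and | or) NP_n
# Each NP_i (i >= 1) is taken as a hyponym of NP_0.
SUCH_AS = re.compile(
    rf"\b(?P<general>{WORD}),?\s+such\s+as\s+"
    rf"(?P<specifics>{WORD}(?:\s*,\s*{WORD})*(?:\s*,?\s*(?:and|or)\s+{WORD})?)"
)

# Separators inside the enumerated list: a comma (optionally followed by
# and/or) or a bare "and"/"or".
LIST_SEP = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def such_as_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (specific, general) pairs found by the 'such as' pattern."""
    pairs = []
    for m in SUCH_AS.finditer(text):
        general = m.group("general")
        for item in LIST_SEP.split(m.group("specifics")):
            if item:
                pairs.append((item, general))
    return pairs


print(such_as_pairs("We grow fruits such as apples, pears and plums."))
# [('apples', 'fruits'), ('pears', 'fruits'), ('plums', 'fruits')]
```

The pairs come out in the same (specific, general) order that hearst.py writes to its output. The other patterns in the paper, such as "NP including NP" and "NP and other NP", can be added as further regexes in the same style.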

some sample output

apt_2017 is a directory of sample text files that you can use for testing or experimentation.

apt_sg.txt is the output produced by running hearst.py on more than 400 APT documents; the first column is the specific term and the second is the generic one.

apt_sg_uniq.txt contains those results after processing with sort and uniq.

apt_gs.txt is the output produced by running hearst.py on more than 400 APT documents; the first column is the generic term and the second is the specific one.

apt_gs_uniq.txt contains those results after processing with sort and uniq.
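Converting between the specific-first and generic-first column orders, and deduplicating as sort and uniq do, takes only a few lines. Here is a sketch, assuming each line begins with two tab-separated terms. `swap_columns` and `sort_uniq` are hypothetical helpers, not scripts shipped with the repository.

```python
from typing import Iterable, List


def swap_columns(line: str) -> str:
    """Turn 'specific<TAB>general[<TAB>...]' into 'general<TAB>specific[<TAB>...]'."""
    fields = line.rstrip("\r\n").split("\t")
    fields[0], fields[1] = fields[1], fields[0]  # any extra columns are kept as-is
    return "\t".join(fields)


def sort_uniq(lines: Iterable[str]) -> List[str]:
    """Rough equivalent of `sort | uniq`: sorted lines with duplicates dropped.

    Python sorts strings by code point, which matches `LC_ALL=C sort`;
    other locales may order lines differently.
    """
    return sorted(set(line.rstrip("\r\n") for line in lines))
```

For example, `sort_uniq(swap_columns(l) for l in sg_lines)` turns specific-first lines into a deduplicated, generic-first list.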

contributors

finin
