DistantTermExtractor

Performs term extraction using distant supervision.
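This README does not describe how training labels are produced. The general idea of distant supervision is to label raw text automatically by matching already-known terms (seeds) against it, then train a sequence labeler on the result. CRF++ is listed as a requirement, so the following is a minimal sketch of that idea. It assumes pre-tokenized sentences (for example, split by MeCab), a B/I/O tag scheme, and greedy longest-match labeling. The function names `bio_labels` and `to_crfpp` are hypothetical and are not taken from this repository.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def bio_labels(tokens: Sequence[str], seeds: Set[Tuple[str, ...]]) -> List[str]:
    """Tag tokens B/I/O by greedy longest match against seed terms.

    Each seed term is a tuple of tokens, segmented the same way as the text.
    """
    labels = ["O"] * len(tokens)
    max_len = max((len(s) for s in seeds), default=0)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest possible span starting at i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in seeds:
                matched = n
                break
        if matched:
            labels[i] = "B"
            for j in range(i + 1, i + matched):
                labels[j] = "I"
            i += matched
        else:
            i += 1
    return labels


def to_crfpp(sentences: Iterable[Sequence[str]], seeds: Set[Tuple[str, ...]]) -> str:
    """Render sentences in CRF++ training format.

    One token per line, tab-separated columns with the answer tag last,
    and an empty line between sentences.
    """
    blocks = []
    for tokens in sentences:
        labels = bio_labels(tokens, seeds)
        blocks.append("\n".join(f"{tok}\t{tag}" for tok, tag in zip(tokens, labels)))
    return "\n\n".join(blocks) + "\n"
```

A real pipeline would also add feature columns (such as part of speech from MeCab) before the tag column, and would need a CRF++ feature template. Both are omitted here.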

Usage

python scripts/main.py -h

Usage:
    test (-c <root_cat> | --category <root_cat>) [-d <depth> | --depth <depth>] [-o <output_dir> | --output <output_dir>] [-l <log_file> | --log <log_file>]
    test -h | --help
    test -v | --version

Option:
     -h, --help
        Show this screen.
     -v, --version
        Show version.
     -c <root_cat>, --category <root_cat>
        Root category name
     -d <depth>, --depth <depth>
        Category depth [default: 1]
     -o <output_dir>, --output <output_dir>
        Directory where the fetched seeds, article texts, and extracted terms are written [default: root/data]
     -l <log_file>, --log <log_file> [default: ]
        Log output file
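The interface above can be expressed with the standard-library `argparse` module. The following is a minimal sketch of such a parser, not the project's own code (the project uses docopt). The function name `build_parser` is hypothetical, and `-v/--version` is omitted because no version string is given. The help text says the default output directory is `root/data`, but the README also says a `./data` directory is created, so the sketch assumes the relative path `data`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative argparse equivalent of the documented docopt interface."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Term extraction with distant supervision",
    )  # -h/--help is added automatically
    parser.add_argument("-c", "--category", required=True, metavar="root_cat",
                        help="root category name")
    parser.add_argument("-d", "--depth", type=int, default=1, metavar="depth",
                        help="category depth (default: 1)")
    # Assumed default; see the note on root/data vs ./data above.
    parser.add_argument("-o", "--output", default="data", metavar="output_dir",
                        help="directory for seeds, article texts, and extracted terms")
    parser.add_argument("-l", "--log", default=None, metavar="log_file",
                        help="log output file")
    return parser


if __name__ == "__main__":
    # e.g. python this_sketch.py -c 自動車工学 -l log.txt
    print(build_parser().parse_args())
```

One practical difference is type handling. docopt returns option values as strings, so a docopt-based script must convert `--depth` itself, whereas `type=int` does the conversion here.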


python scripts/main.py -c 自動車工学 -l log.txt

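The example above uses 自動車工学 (automotive engineering) as the root category and writes the log to log.txt.
Various files are written to the directory specified with the -o option. If the option is omitted, a ./data directory is created.
The final set of extracted terms is written to ./data/output/fp_words.txt.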
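To use the extracted terms from another Python script, something like the following minimal sketch may help. The README does not document the file format, so two things here are assumptions. First, it assumes `fp_words.txt` holds one UTF-8 term per line. Second, it assumes the file sits under `<output_dir>/output/` when `-o` is given. The function name `load_terms` is hypothetical.

```python
from pathlib import Path
from typing import List


def load_terms(output_dir: str = "data") -> List[str]:
    """Read extracted terms, assuming one UTF-8 term per line in fp_words.txt."""
    path = Path(output_dir) / "output" / "fp_words.txt"
    with path.open(encoding="utf-8") as f:
        # Skip blank lines and surrounding whitespace.
        return [line.strip() for line in f if line.strip()]
```

If the actual file turns out to carry extra columns (scores, for example), the line parsing would need to change accordingly.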

Required tools

  • docopt
  • CRF++
  • mecab
  • unidic-mecab

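Install docopt with pip, and download CRF++ and unidic-mecab from their respective websites.
For unidic-mecab, download the bin version.
Then copy everything from the extracted archive except dicrc into the repository's my_unidic directory.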
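The copy step above can be scripted. The following is a minimal sketch using the standard library. It assumes `extracted_dir` is the directory that directly contains `dicrc` and the dictionary files, and that the script is run from the repository root. The function name `install_unidic` is hypothetical. `dirs_exist_ok` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path
from typing import List


def install_unidic(extracted_dir: str, repo_root: str = ".") -> List[str]:
    """Copy the extracted unidic-mecab files, except dicrc, into my_unidic."""
    src = Path(extracted_dir)
    dest = Path(repo_root) / "my_unidic"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in src.iterdir():
        if entry.name == "dicrc":
            continue  # skipped, as the installation instructions require
        target = dest / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, target)
        copied.append(entry.name)
    return sorted(copied)
```

The function returns the names it copied, which makes it easy to confirm that `dicrc` was left out.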

Licence

MIT

Contributors

ryosukee