Keyphrase-extraction-for-research-articles

A model for extracting and ranking keyphrases from research articles using an unsupervised, theme- and position-biased PageRank graph. Built as part of the Deep Learning for NLP course in the Fall 2019 semester at the University of Illinois at Chicago.

Requirements:

  • Numpy
  • Scipy
  • stemming
  • StanfordCoreNLP
  • fastText embeddings

    Embeddings used: wiki-news-300d-1M-subword.vec.zip. 1 million word vectors trained with subword information on Wikipedia 2017, the UMBC webbase corpus and the statmt.org news dataset (16B tokens). T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, A. Joulin. Advances in Pre-Training Distributed Word Representations, LREC 2018
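The unzipped `.vec` file is plain text: a header line with the vector count and dimension, followed by one word and its vector per line. A minimal sketch of reading it with Numpy is shown below. The function name `load_vec` and the `limit` parameter are hypothetical and not taken from this project's code.

```python
import io
import numpy as np

def load_vec(lines, limit=None):
    """Parse fastText .vec text: header 'count dim', then 'word v1 ... vd' per line.

    `lines` is any iterable of text lines (an open file or a StringIO).
    `limit` (hypothetical helper parameter) caps how many vectors are read.
    """
    it = iter(lines)
    _count, dim = map(int, next(it).split())
    vectors = {}
    for i, line in enumerate(it):
        if limit is not None and i >= limit:
            break
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.array([float(x) for x in parts[1:]], dtype=np.float32)
    return vectors, dim

# Tiny in-memory stand-in for the real file, so the sketch runs without the download.
demo = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
vecs, dim = load_vec(demo)
```

For the real embeddings, the same function can be given an open file handle, for example `open("wiki-news-300d-1M-subword.vec", encoding="utf-8", errors="ignore")`. Passing a `limit` keeps memory use down while experimenting.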

Dataset used

  • Data from KDD and WWW conferences

    Sujatha Das Gollapalli and Cornelia Caragea. "Extracting Keyphrases from Research Papers using Citation Networks." In: Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI 2014)

Model

The goal of this project is to build a keyphrase extraction model that extracts candidate keyphrases from scholarly articles and ranks them with a modified PageRank algorithm in an unsupervised graph model.

Keyphrase extraction enables faster processing by mapping to each document the multiword phrases that describe it best. The task is important for building automated systems that provide high-level contextual and descriptive information about research articles, which may be used for recommending articles to readers, identifying potential reviewers, highlighting research trends and mapping citations to articles. This project aims to generate candidate keyphrases using an embedding model and rank them with a modified PageRank algorithm, while capturing the information that most accurately represents or describes the paper. Previous graph-based models such as Key2vec and PositionRank, among other supervised and unsupervised keyphrase extraction models, served as background and inspiration for the project.
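To make the ranking step concrete, here is a minimal sketch of a position-biased PageRank over a word co-occurrence graph, in the spirit of PositionRank. It is an illustration under stated assumptions, not this project's implementation: the function names, the window size and the damping factor `alpha=0.85` are all assumed. A theme bias (for example, embedding similarity to a document-level theme vector) could be folded into the same bias vector or into the edge weights, but that part is not shown.

```python
import numpy as np

def position_bias(tokens):
    """Weight each word by the sum of its inverse positions (1-indexed), normalized to sum to 1."""
    weights = {}
    for pos, tok in enumerate(tokens, start=1):
        weights[tok] = weights.get(tok, 0.0) + 1.0 / pos
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

def cooccurrence_graph(tokens, window=3):
    """Undirected graph: edge weight counts how often two distinct words
    occur at most `window - 1` positions apart."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != tok:
                A[index[tok], index[tokens[j]]] += 1
                A[index[tokens[j]], index[tok]] += 1
    return vocab, A

def biased_pagerank(A, bias, alpha=0.85, tol=1e-6, max_iter=100):
    """Power iteration where the teleport step follows `bias` instead of a uniform distribution."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    # Row-normalize; words with no edges fall back to the bias distribution.
    P = np.where(out > 0, A / np.where(out > 0, out, 1.0), bias)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = alpha * (P.T @ scores) + (1 - alpha) * bias
        converged = np.abs(new - scores).sum() < tol
        scores = new
        if converged:
            break
    return scores

# Toy input: in practice the tokens would be filtered, stemmed candidate words.
tokens = ["graph", "model", "ranks", "graph", "keyphrase"]
vocab, A = cooccurrence_graph(tokens)
bias_by_word = position_bias(tokens)
bias = np.array([bias_by_word[w] for w in vocab])
scores = biased_pagerank(A, bias)
ranked = sorted(zip(vocab, scores), key=lambda pair: -pair[1])
```

Words that appear early and often receive a larger share of the teleport probability, so they tend to rank higher than they would under uniform PageRank. Multiword keyphrases could then be scored by summing the scores of their member words.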

Credits

  • PositionRank: Corina Florescu and Cornelia Caragea. "PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents." In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2017)
  • Key2vec: Debanjan Mahata, John Kuriakose, Rajiv Ratn Shah, Roger Zimmermann. "Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings." In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
  • Ideas for PositionRank implementation: https://github.com/ymym3412/position-rank

Contributors

  • tuhinkundu
