Text-Summarization-on-webpages

This notebook implements text summarisation using Natural Language Processing (NLP). It can be used to summarise articles, reviews, and news reports available on webpages. The text to be summarised is collected from those webpages by web scraping.
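
As a rough illustration of the scraping step, the sketch below pulls the paragraph text out of an HTML string using only Python's standard library. It is not the notebook's own code: the `ParagraphExtractor` class and `extract_paragraphs` function are hypothetical names, and the choice to keep only `<p>` content is an assumption. In practice the HTML would first be downloaded from the target webpage.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> tags and ignores everything else."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while we are inside a <p> element
        self._chunks = []      # text pieces of the paragraph being read
        self.paragraphs = []   # finished paragraphs, in page order

    def _flush(self):
        # Join the pieces and collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # tolerate a previous <p> that was never closed
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)


def extract_paragraphs(html):
    """Return the list of paragraph texts contained in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Joining the returned paragraphs with spaces gives a single block of text that can then be split into sentences for scoring.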

There are two types of Text Summarisation techniques:

  1. Extractive Text Summarisation: selects the most informative sentences from a large body of text and uses them, unchanged, to form the summary.
  2. Abstractive Text Summarisation: generates new, concise phrases that are semantically consistent with the large body of text.

We have implemented Extractive Text Summarisation in this Notebook.

The notebook scores sentences using two methods:

  1. Tf-idf Scoring Method: Tf-idf stands for "Term Frequency - Inverse Document Frequency". The tf-idf weight is a statistical measure of how important a word is to a document in a collection or corpus. We use this weight to estimate the importance of each sentence in the text.
  2. Word Embeddings and PageRank Scoring Method: Word embeddings represent words as vectors, so that mathematical operations can be applied to them to capture features of the words. Using word embeddings, we compute the cosine similarity between sentences and pass the resulting similarities to the PageRank algorithm. Together, these steps implement the TextRank algorithm.
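
The tf-idf scoring idea can be sketched as follows. This is a minimal illustration, not the notebook's implementation. It assumes that each sentence is treated as its own "document", that a sentence's score is the average tf-idf weight of its distinct words, and that sentences are split with a naive regular expression. All function names (`split_sentences`, `tokenize`, `tfidf_sentence_scores`, `summarize`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def tfidf_sentence_scores(sentences):
    """Score each sentence by the mean tf-idf weight of its distinct words."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: the number of sentences in which each word occurs.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        total = sum(
            (count / len(tokens)) * math.log(n / df[word])
            for word, count in tf.items()
        )
        scores.append(total / len(tf))
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = tfidf_sentence_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Words that occur in every sentence get an inverse document frequency of zero under this formula, so they contribute nothing to a sentence's score.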

The second method is implemented using Google's Word2Vec and Stanford's GloVe word embeddings.
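
The embedding-based method can be sketched in plain Python as below. This is an illustration under stated assumptions, not the notebook's code. Here `embeddings` stands in for a word-to-vector lookup that would normally be loaded from Word2Vec or GloVe. A sentence vector is assumed to be the average of its word vectors, negative similarities are assumed to be clamped to zero, and PageRank is computed by simple power iteration. All function names are hypothetical.

```python
import math


def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of known words; all zeros if no word is known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank on a weighted graph given as a square matrix."""
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        new_ranks = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                if out_sum[j] > 0:
                    incoming += ranks[j] * weights[j][i] / out_sum[j]
                else:
                    # A node with no outgoing weight spreads its rank evenly.
                    incoming += ranks[j] / n
            new_ranks.append((1 - damping) / n + damping * incoming)
        ranks = new_ranks
    return ranks


def textrank_scores(tokenized_sentences, embeddings, dim):
    """Rank sentences by PageRank over their pairwise cosine similarities."""
    vectors = [sentence_vector(t, embeddings, dim) for t in tokenized_sentences]
    n = len(vectors)
    weights = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    return pagerank(weights)
```

Sentences that are similar to many other sentences receive a higher rank, so the top-ranked sentences can be taken as the summary.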

Finally, an All-in-One model is implemented by combining the two methods mentioned above.
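
The text does not say how the two methods are combined. One plausible approach, shown here purely as an assumption, is to rescale each method's sentence scores to the range 0 to 1 and take a weighted average. The names `min_max_normalize`, `combine_scores`, and the `alpha` weight are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def combine_scores(tfidf_scores, textrank_scores, alpha=0.5):
    """Weighted average of two normalised score lists.

    alpha is the weight given to the tf-idf scores (assumed, not from the
    original notebook); 1 - alpha goes to the TextRank scores.
    """
    if len(tfidf_scores) != len(textrank_scores):
        raise ValueError("both score lists must cover the same sentences")
    a = min_max_normalize(tfidf_scores)
    b = min_max_normalize(textrank_scores)
    return [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
```

Normalising first keeps one method from dominating simply because its raw scores are on a larger scale.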
