sentence_similarity's Introduction

Measuring Pairwise Sentence Similarities

Motivation

Recent applications of deep learning to language modelling have produced a variety of context-free and contextual natural language representation models. Context-free embeddings map each word or phrase in the vocabulary to a single real-valued vector in a continuous semantic space. A limitation of this approach is that each vocabulary element has only one vector representation, even when it has multiple meanings (for example, the word ‘lead’ in the phrases ‘lead astray’ and ‘lead pipe’). Contextual embeddings attempt to resolve this issue by incorporating information from neighboring words into each word’s vector representation. The objective of contextual embeddings is to produce representation spaces for sequences of words (such as sentences and paragraphs) that generalize well across NLP tasks.
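The limitation of context-free embeddings described above can be illustrated with a minimal sketch. The three-dimensional vectors and the `context_free_lookup` helper are hypothetical and invented purely for illustration; they are not the pre-trained embeddings used in this project.

```python
import numpy as np

# Toy 3-dimensional vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "lead": np.array([0.2, 0.7, 0.1]),
    "astray": np.array([0.9, 0.1, 0.3]),
    "pipe": np.array([0.1, 0.2, 0.8]),
}


def context_free_lookup(tokens):
    """Return one vector per token, ignoring the neighbouring words."""
    return [TOY_EMBEDDINGS[token] for token in tokens]


# "lead" is looked up in two different phrases.
lead_in_astray = context_free_lookup(["lead", "astray"])[0]
lead_in_pipe = context_free_lookup(["lead", "pipe"])[0]

# A context-free table cannot tell the verb from the metal:
# both lookups return exactly the same vector.
same_vector = bool(np.array_equal(lead_in_astray, lead_in_pipe))
print(same_vector)
```

A contextual model would instead compute the vector for ‘lead’ from the whole phrase, so the two occurrences would generally receive different representations.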

The aim of this project is to investigate the performance of context-free and contextual embeddings in measuring the semantic similarity of sentence pairs. There are four questions we explore:

How well can context-free and context-aware embeddings identify relatedness in sentence pairs? (section 4 of write-up)
How do NLP pre-processing techniques (such as removing stopwords, lemmatization, stemming) affect the performance of context-free and context-aware embeddings? (section 5 of write-up)
Can the performance of context-free models be improved by incorporating information about part-of-speech? (section 6 of write-up)
Do dimension reduction techniques (like PCA and t-SNE) qualitatively capture the similarities between sentences? Do unsupervised clustering methods reveal meaningful groups of sentences? (section 7 of write-up)
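As a rough illustration of the first two questions, the sketch below scores a sentence pair by averaging word vectors and taking the cosine similarity, with an optional stopword-removal step. This is one common baseline and is only assumed here; the write-up may use a different pooling or scoring method. The toy vectors, the stopword set, and the function names are all hypothetical.

```python
import numpy as np

# Toy 3-dimensional vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "the": np.array([0.5, 0.5, 0.5]),
    "a": np.array([0.5, 0.4, 0.5]),
    "cat": np.array([0.9, 0.1, 0.2]),
    "dog": np.array([0.8, 0.2, 0.3]),
    "sat": np.array([0.1, 0.9, 0.1]),
    "ran": np.array([0.2, 0.8, 0.3]),
}
STOPWORDS = {"the", "a"}  # assumed, deliberately tiny stopword list
DIM = 3


def sentence_vector(sentence, remove_stopwords=False):
    """Mean-pool the vectors of the known words in a sentence."""
    tokens = sentence.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        # No known words: fall back to a zero vector.
        return np.zeros(DIM)
    return np.mean(vectors, axis=0)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0


def pairwise_similarity(first, second, remove_stopwords=False):
    """Similarity score for a sentence pair under mean pooling."""
    return cosine_similarity(
        sentence_vector(first, remove_stopwords),
        sentence_vector(second, remove_stopwords),
    )


raw_score = pairwise_similarity("the cat sat", "a dog ran")
filtered_score = pairwise_similarity("the cat sat", "a dog ran", remove_stopwords=True)
print(raw_score, filtered_score)
```

Comparing `raw_score` with `filtered_score` across a labelled set of sentence pairs is one way to measure how much a pre-processing step changes the similarity estimates.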

Results and Discussion

Results and interpretations can be found in the write-up. The sources for the pre-trained embeddings used in this analysis are also listed in that PDF.

Code

Jupyter notebook with code and comments

ijazulhaq1 / sentence_similarity
