
xswem's Introduction

XSWEM


A simple and explainable deep learning model for NLP implemented in TensorFlow.

Based on SWEM-max as proposed by Shen et al. in Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms, 2018.

This package is currently in development. Its purpose is to make it easy to train and explain SWEM-max.

You can find demos of the functionality we have implemented in the notebooks directory of the package. Each notebook has a badge that allows you to run it yourself in Google Colab. We will add more notebooks as new functionality is added.

For a demo of how to train a basic SWEM-max model see train_xswem.ipynb.
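As a rough illustration of the architecture (not the xswem API itself), a basic SWEM-max classifier in Keras looks something like the sketch below. The vocabulary size, embedding dimension, dropout rate, and number of classes are illustrative values, not defaults fixed by the package.

```python
import tensorflow as tf

# Illustrative hyperparameters, not values fixed by xswem.
VOCAB_SIZE = 20000   # assumed vocabulary size
EMBED_DIM = 300      # assumed embedding dimension
NUM_CLASSES = 4      # e.g. the four ag_news classes

# SWEM-max: word embeddings -> dropout -> max over the sequence -> classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```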

Local Explanations

We are currently implementing some methods we have developed for local explanations.

local_explain_most_salient_words

So far we have only implemented the local_explain_most_salient_words method. This method extracts the words the model has learnt as most salient from a given input sentence. Below we show an example of this method using a sample from the ag_news dataset. This method is explained in more detail in the local_explain_most_salient_words.ipynb notebook.

local_explain_most_salient_words.png

Global Explanations

We have implemented the global explainability method proposed in section 4.1.1 of the original paper. You can see a demo of this method in the notebook global_explain_embedding_components.ipynb.

How to install

This package is hosted on PyPI and can be installed using pip.

pip install xswem

xswem's People

Contributors

kieranlitschel


xswem's Issues

Add option to initialize model with pre-trained GloVe word embeddings

We have found that we are able to achieve similar performance by initializing word embeddings randomly, but in the original paper the authors initialized them with pre-trained GloVe word embeddings. We should enable this functionality by implementing the following:

  • Allow users to initialize the embedding layer with their own pre-trained weights. We should recommend that they use the pre-trained GloVe weights. Words that the user does not have pre-trained weights for should be initialized using a random uniform distribution with values in the range -0.01 to 0.01.

The authors also sometimes added a Dense layer between the embedding and pooling layers to allow the model to adapt the embeddings to the task of interest. We should allow users to do this by implementing the following:

  • Allow users to optionally specify that a Dense layer should be included between the embedding and pooling layers. It should have the same number of units as the embedding layer and use a ReLU activation function.

Typically we would freeze the embedding layer when using pre-trained weights, but the authors do not mention this explicitly in their paper, nor do they freeze the weights in their original source code. So in our implementation the embedding weights are trainable.
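A minimal sketch of how this could look in Keras is shown below. The helper names (`build_embedding_matrix`, `glove_vectors`, `word_index`, `adapt_embeddings`) are hypothetical and only illustrate the behaviour described above; they are not part of the xswem API.

```python
import numpy as np
import tensorflow as tf

def build_embedding_matrix(word_index, glove_vectors, embed_dim=300):
    """Initialize rows from GloVe where available, otherwise U(-0.01, 0.01).

    word_index: hypothetical dict mapping each word to a row index.
    glove_vectors: hypothetical dict mapping words to pre-trained GloVe vectors.
    """
    matrix = np.random.uniform(-0.01, 0.01, size=(len(word_index) + 1, embed_dim))
    for word, idx in word_index.items():
        if word in glove_vectors:
            matrix[idx] = glove_vectors[word]
    return matrix

def build_swem_max(embedding_matrix, num_classes, adapt_embeddings=False):
    vocab_size, embed_dim = embedding_matrix.shape
    layers = [
        tf.keras.layers.Embedding(
            vocab_size, embed_dim,
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
            trainable=True,  # embeddings stay trainable, matching the original code
        ),
    ]
    if adapt_embeddings:
        # Optional Dense layer between embedding and pooling, with the same
        # number of units as the embedding layer and a ReLU activation.
        layers.append(tf.keras.layers.Dense(embed_dim, activation="relu"))
    layers += [
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ]
    return tf.keras.Sequential(layers)
```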

Dropout before max pooling killing embedding components during training

When a unit is dropped out, its value is set to 0. As we are applying dropout directly to the word embeddings, for long input sequences it becomes increasingly likely that at least one component in each dimension will be set to zero. This means that negative components can often die, as they get stuck at negative values because the zeros introduced by dropout are taken as the maximum instead.

This is particularly problematic as our distribution for initializing embeddings is centred at zero, meaning around half of the components are initialized to values less than zero. The histogram below exemplifies this issue.

[Histogram of the trained embedding weights]

One possible solution is to initialize all embedding weights with values greater than zero. This should significantly reduce the number of dying units, but units will still die if they are updated to a value less than zero.

A better solution would be to make it so that zero is ignored during the max-pooling operation, but this may slow down training significantly, which would make the first solution preferable.
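As an illustration of the second idea, the sketch below replaces exact zeros with a very negative value before taking the max, so dropped-out entries can never win the max operation. This is only an assumption about how it could be implemented, not code from the package, and the cost of the extra masking would need to be measured against the concern about training speed noted above.

```python
import tensorflow as tf

class NonZeroGlobalMaxPooling1D(tf.keras.layers.Layer):
    """Max over the sequence dimension that ignores exact zeros."""

    def call(self, inputs):
        # inputs: (batch, sequence_length, embed_dim). Entries that are exactly
        # zero (e.g. zeroed by dropout) are replaced with a very negative value
        # so they can never be selected as the maximum.
        very_negative = tf.ones_like(inputs) * inputs.dtype.min
        masked = tf.where(tf.equal(inputs, 0.0), very_negative, inputs)
        return tf.reduce_max(masked, axis=1)
```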

Implement global explainability for word embedding components

In section 4.1.1 of the original paper, the authors proposed a method for interpreting the components of the embeddings learned by SWEM-max. We should implement this method in XSWEM.

To do this we first need to implement a function that allows users to generate a histogram of their word embeddings, so that they can confirm whether the embeddings learned by their model are also sparse. Second, we need to implement a function that returns the n words with the largest values for each component (n=5 should be the default).
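A rough sketch of what these two helpers could look like, operating on a NumPy embedding matrix and an index-to-word mapping (both names are illustrative, not the xswem API), is given below.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_embedding_histogram(embedding_weights, bins=100):
    """Histogram of all embedding components, to check whether they are sparse."""
    plt.hist(embedding_weights.flatten(), bins=bins)
    plt.xlabel("Embedding component value")
    plt.ylabel("Count")
    plt.show()

def top_words_per_component(embedding_weights, index_to_word, n=5):
    """For each embedding component, return the n words with the largest values."""
    # embedding_weights: (vocab_size, embed_dim); index_to_word maps row -> word.
    top = {}
    for component in range(embedding_weights.shape[1]):
        top_indices = np.argsort(embedding_weights[:, component])[::-1][:n]
        top[component] = [index_to_word[i] for i in top_indices]
    return top
```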

Implement method to determine most salient words

At most, d words (where d is the embedding dimension) from the input sentence contribute to the output of the network. This is because the max-pooling layer keeps only the maximum value of each dimension across the embeddings of the input sentence, so at most d words contribute to its output.

Thus, where d is smaller than the number of unique words in the input sentence, the max-pooling layer has the effect of shortlisting the d most important words needed to make a prediction. If d is larger than the number of unique words in the input sentence, it can still have the effect of shortlisting words, because some words may have the maximum value for multiple dimensions, but shortlisting is not guaranteed.

We can find the shortlisted words by taking an argmax for each dimension across the embeddings of the input sentence. We should add a function to XSWEM to do this, which can then be used as a method for local explainability.
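A minimal sketch of this argmax-based shortlisting is given below. The function name and the idea of ranking words by how many dimensions they win are illustrative assumptions, not the exact xswem implementation.

```python
import numpy as np

def most_salient_words(sentence_tokens, embedding_weights, word_index):
    """Words whose embeddings win the max in at least one dimension.

    sentence_tokens: list of words in the input sentence (hypothetical input).
    embedding_weights: (vocab_size, embed_dim) embedding matrix.
    word_index: maps word -> row in the embedding matrix.
    """
    # Embeddings of the words in the sentence: (sentence_length, embed_dim).
    sentence_embeddings = np.stack(
        [embedding_weights[word_index[w]] for w in sentence_tokens])
    # For each dimension, the position of the word with the maximum value.
    winners = np.argmax(sentence_embeddings, axis=0)
    # Count how many dimensions each word wins; more wins = more salient
    # (ranking by win count is an assumption made for this sketch).
    counts = np.bincount(winners, minlength=len(sentence_tokens))
    order = np.argsort(counts)[::-1]
    return [(sentence_tokens[i], int(counts[i])) for i in order if counts[i] > 0]
```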

Allow users to set the parameters of any layer

We should allow users to set the parameters of any layer.

We could do this using a configuration dictionary passed to the constructor of the model, which would map layer names to a configuration dictionary for the corresponding layer. Each layer's configuration dictionary could then be unpacked in that layer's constructor using **kwargs.
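A minimal sketch of this idea is shown below; the layer names used as dictionary keys and the `build_model` helper are illustrative, not part of the xswem API.

```python
import tensorflow as tf

def build_model(vocab_size, embed_dim, num_classes, layer_config=None):
    """Sketch: `layer_config` maps layer names to kwargs for that layer's constructor."""
    layer_config = layer_config or {}
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embed_dim,
                                  **layer_config.get("embedding", {})),
        tf.keras.layers.Dropout(0.5, **layer_config.get("dropout", {})),
        tf.keras.layers.GlobalMaxPooling1D(**layer_config.get("max_pool", {})),
        tf.keras.layers.Dense(num_classes, activation="softmax",
                              **layer_config.get("output", {})),
    ])

# Example: customise the embedding initializer and disable the output bias.
model = build_model(
    vocab_size=20000, embed_dim=300, num_classes=4,
    layer_config={"embedding": {"embeddings_initializer": "uniform"},
                  "output": {"use_bias": False}})
```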

Investigate setting embedding weights not seen in training to zero to reduce saved model size

In #10 we observed that a lot of the weights in the embeddings appear never to be seen during training and so maintain their initialized values. From what we can tell, this seems to happen for most weights initialized with a negative value. If we randomly initialize our embedding layer, weights that have never been seen during training contribute little to prediction at test time, as their values are random. We may be able to use this to make our saved models smaller.

After training, we could check which weights have not changed from their initialized values and set them to zero. Then, when saving the matrix of embedding weights, we only need to save the non-zero values and where they occur in the matrix.
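A rough sketch of this pruning and sparse-saving step is given below; the use of SciPy's sparse matrices and the function name are purely illustrative, not the package's saving mechanism.

```python
import numpy as np
from scipy import sparse

def sparsify_unchanged_weights(initial_weights, trained_weights):
    """Zero weights that never moved from their initial values, then keep only
    the non-zero entries and their positions as a sparse matrix."""
    unchanged = np.isclose(initial_weights, trained_weights)
    pruned = np.where(unchanged, 0.0, trained_weights)
    return sparse.coo_matrix(pruned)

# Illustrative save/restore of the pruned embedding matrix:
# sparse.save_npz("embedding_weights.npz",
#                 sparsify_unchanged_weights(w_init, w_trained).tocsr())
# restored = sparse.load_npz("embedding_weights.npz").toarray()
```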
