jessevig / bertviz

BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

Home Page: https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1

License: Apache License 2.0

JavaScript 11.38% Python 86.19% Jupyter Notebook 2.43%
natural-language-processing machine-learning visualization neural-network pytorch nlp bert transformer gpt2 roberta

bertviz's Introduction

BertViz

Visualize Attention in NLP Models

BertViz is an interactive tool for visualizing attention in Transformer language models such as BERT, GPT2, or T5. It can be run inside a Jupyter or Colab notebook through a simple Python API that supports most Huggingface models. BertViz extends the Tensor2Tensor visualization tool by Llion Jones, providing multiple views that each offer a unique lens into the attention mechanism.

Get updates for this and related projects on Twitter.

🚀 Quick Tour

Head View

The head view visualizes attention for one or more attention heads in the same layer. It is based on the excellent Tensor2Tensor visualization tool by Llion Jones.

🕹 Try out the head view in the Interactive Colab Tutorial (all visualizations pre-loaded).

Model View

The model view shows a bird's-eye view of attention across all layers and heads.

🕹 Try out the model view in the Interactive Colab Tutorial (all visualizations pre-loaded).

model view

Neuron View

The neuron view visualizes individual neurons in the query and key vectors and shows how they are used to compute attention.

🕹 Try out the neuron view in the Interactive Colab Tutorial (all visualizations pre-loaded).

neuron view

⚡️ Getting Started

Running BertViz in a Jupyter Notebook

From the command line:

pip install bertviz

You must also have Jupyter Notebook and ipywidgets installed:

pip install jupyterlab
pip install ipywidgets

(If you run into any issues installing Jupyter or ipywidgets, consult the Jupyter and ipywidgets documentation.)

To create a new Jupyter notebook, simply run:

jupyter notebook

Then click New and select Python 3 (ipykernel) if prompted.

Running BertViz in Colab

To run in Colab, simply add the following cell at the beginning of your Colab notebook:

!pip install bertviz

Sample code

Run the following code to load the xtremedistil-l12-h384-uncased model and display it in the model view:

from transformers import AutoTokenizer, AutoModel, utils
from bertviz import model_view
utils.logging.set_verbosity_error()  # Suppress standard warnings

model_name = "microsoft/xtremedistil-l12-h384-uncased"  # Find popular HuggingFace models here: https://huggingface.co/models
input_text = "The cat sat on the mat"  
model = AutoModel.from_pretrained(model_name, output_attentions=True)  # Configure model to return attention values
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer.encode(input_text, return_tensors='pt')  # Tokenize input text
outputs = model(inputs)  # Run model
attention = outputs[-1]  # Retrieve attention from model outputs
tokens = tokenizer.convert_ids_to_tokens(inputs[0])  # Convert input ids to token strings
model_view(attention, tokens)  # Display model view

The visualization may take a few seconds to load. Feel free to experiment with different input texts and models. See Documentation for additional use cases and examples, e.g., encoder-decoder models.
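For example, to display the head view instead, you can reuse the attention and tokens computed above (a minimal sketch; the head view is described further in the Documentation section below):

from bertviz import head_view
head_view(attention, tokens)  # Display head view for the same inputs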

Running sample notebooks

You may also run any of the sample notebooks included with BertViz:

git clone --depth 1 git@github.com:jessevig/bertviz.git
cd bertviz/notebooks
jupyter notebook

🕹 Interactive Tutorial

Check out the Interactive Colab Tutorial to learn more about BertViz and try out the tool. Note: all visualizations are pre-loaded, so there is no need to execute any cells.

Tutorial

📖 Documentation

Table of contents

  • Self-attention models (BERT, GPT-2, etc.): Head and Model Views • Neuron View
  • Encoder-decoder models (BART, T5, etc.)
  • Installing from source
  • Additional options: Dark / light mode • Filtering layers • Setting default layer/head(s) • Visualizing sentence pairs • Obtain HTML representations
  • Non-Huggingface models

Self-attention models (BERT, GPT-2, etc.)

Head and Model Views

First load a Huggingface model, either a pre-trained model as shown below, or your own fine-tuned model. Be sure to set output_attentions=True.

from transformers import AutoTokenizer, AutoModel, utils
utils.logging.set_verbosity_error()  # Suppress standard warnings
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

Then prepare inputs and compute attention:

inputs = tokenizer.encode("The cat sat on the mat", return_tensors='pt')
outputs = model(inputs)
attention = outputs[-1]  # Output includes attention weights when output_attentions=True
tokens = tokenizer.convert_ids_to_tokens(inputs[0]) 

Finally, display the attention weights using the head_view or model_view functions:

from bertviz import head_view
head_view(attention, tokens)

Examples: DistilBERT (Model View Notebook, Head View Notebook)

For full API, please refer to the source code for the head view or model view.

Neuron View

The neuron view is invoked differently from the head view and model view because it requires access to the model's query/key vectors, which are not returned through the Huggingface API. It is currently limited to the custom versions of BERT, GPT-2, and RoBERTa included with BertViz.

# Import specialized versions of models (that return query/key vectors)
from bertviz.transformers_neuron_view import BertModel, BertTokenizer
from bertviz.neuron_view import show

model_type = 'bert'
model_version = 'bert-base-uncased'
do_lower_case = True
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
model = BertModel.from_pretrained(model_version, output_attentions=True)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
show(model, model_type, tokenizer, sentence_a, sentence_b, layer=2, head=0)

Examples: BERT (Notebook, Colab) • GPT-2 (Notebook, Colab) • RoBERTa (Notebook)

For full API, please refer to the source.

Encoder-decoder models (BART, T5, etc.)

The head view and model view both support encoder-decoder models.

First, load an encoder-decoder model:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model = AutoModel.from_pretrained("Helsinki-NLP/opus-mt-en-de", output_attentions=True)

Then prepare the inputs and compute attention:

encoder_input_ids = tokenizer("She sees the small elephant.", return_tensors="pt", add_special_tokens=True).input_ids
with tokenizer.as_target_tokenizer():
    decoder_input_ids = tokenizer("Sie sieht den kleinen Elefanten.", return_tensors="pt", add_special_tokens=True).input_ids

outputs = model(input_ids=encoder_input_ids, decoder_input_ids=decoder_input_ids)

encoder_text = tokenizer.convert_ids_to_tokens(encoder_input_ids[0])
decoder_text = tokenizer.convert_ids_to_tokens(decoder_input_ids[0])

Finally, display the visualization using either head_view or model_view.

from bertviz import model_view
model_view(
    encoder_attention=outputs.encoder_attentions,
    decoder_attention=outputs.decoder_attentions,
    cross_attention=outputs.cross_attentions,
    encoder_tokens=encoder_text,
    decoder_tokens=decoder_text
)

You may select Encoder, Decoder, or Cross attention from the drop-down in the upper left corner of the visualization.
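The head view can be invoked with a similar set of arguments (a sketch assuming the same encoder/decoder/cross-attention parameter names as model_view above; see the head view source for the exact API):

from bertviz import head_view
head_view(
    encoder_attention=outputs.encoder_attentions,
    decoder_attention=outputs.decoder_attentions,
    cross_attention=outputs.cross_attentions,
    encoder_tokens=encoder_text,
    decoder_tokens=decoder_text
)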

Examples: MarianMT (Notebook) • BART (Notebook)

For full API, please refer to the source code for the head view or model view.

Installing from source

git clone https://github.com/jessevig/bertviz.git
cd bertviz
python setup.py develop

Additional options

Dark / light mode

The model view and neuron view support dark (default) and light modes. You may set the mode using the display_mode parameter:

model_view(attention, tokens, display_mode="light")
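The neuron view accepts the same parameter (a sketch reusing show() from the neuron view example above, and assuming display_mode is passed directly to show()):

show(model, model_type, tokenizer, sentence_a, sentence_b, layer=2, head=0, display_mode="light")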

Filtering layers

To improve the responsiveness of the tool when visualizing larger models or inputs, you may set the include_layers parameter to restrict the visualization to a subset of layers (zero-indexed). This option is available in the head view and model view.

Example: Render model view with only layers 5 and 6 displayed

model_view(attention, tokens, include_layers=[5, 6])

For the model view, you may also restrict the visualization to a subset of attention heads (zero-indexed) by setting the include_heads parameter.
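For example, to render the model view with only heads 0 and 3 displayed within layers 5 and 6 (a minimal sketch combining the two parameters):

model_view(attention, tokens, include_layers=[5, 6], include_heads=[0, 3])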

Setting default layer/head(s)

In the head view, you may choose a specific layer and collection of heads as the default selection when the visualization first renders. Note: this is different from the include_heads/include_layers parameter (above), which removes layers and heads from the visualization completely.

Example: Render head view with layer 2 and heads 3 and 5 pre-selected

head_view(attention, tokens, layer=2, heads=[3,5])

You may also pre-select a specific layer and single head for the neuron view.
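For example, reusing show() from the neuron view example above, layer 2 and head 0 are pre-selected via the layer and head parameters:

show(model, model_type, tokenizer, sentence_a, sentence_b, layer=2, head=0)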

Visualizing sentence pairs

Some models, e.g., BERT, accept a pair of sentences as input. BertViz optionally supports a drop-down menu that allows the user to filter attention based on which sentence the tokens are in, e.g., only show attention between tokens in the first sentence and tokens in the second sentence.

Head and model views

To enable this feature when invoking the head_view or model_view functions, set the sentence_b_start parameter to the start index of the second sentence. Note that the method for computing this index will depend on the model.

Example (BERT):

from bertviz import head_view
from transformers import AutoTokenizer, AutoModel, utils
utils.logging.set_verbosity_error()  # Suppress standard warnings

# NOTE: This code is model-specific
model_version = 'bert-base-uncased'
model = AutoModel.from_pretrained(model_version, output_attentions=True)
tokenizer = AutoTokenizer.from_pretrained(model_version)
sentence_a = "the rabbit quickly hopped"
sentence_b = "The turtle slowly crawled"
inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt')
input_ids = inputs['input_ids']
token_type_ids = inputs['token_type_ids'] # token type id is 0 for Sentence A and 1 for Sentence B
attention = model(input_ids, token_type_ids=token_type_ids)[-1]
sentence_b_start = token_type_ids[0].tolist().index(1) # Sentence B starts at first index of token type id 1
token_ids = input_ids[0].tolist() # Batch index 0
tokens = tokenizer.convert_ids_to_tokens(token_ids)    
head_view(attention, tokens, sentence_b_start)
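
The model view accepts sentence_b_start in the same way; a minimal sketch reusing the variables computed above:

from bertviz import model_view
model_view(attention, tokens, sentence_b_start=sentence_b_start)
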
Neuron view

To enable this option in the neuron view, simply set the sentence_a and sentence_b parameters in neuron_view.show().

Obtain HTML representations

Support for retrieving the generated HTML representations has been added to head_view, model_view, and neuron_view.

Setting the 'html_action' parameter to 'return' will make the function call return a single HTML Python object that can be further processed. Remember you can access the HTML source using the data attribute of a Python HTML object.

The default behavior for 'html_action' is 'view', which will display the visualization but won't return the HTML object.

This functionality is useful if you need to:

  • Save the representation as an independent HTML file that can be accessed via web browser
  • Use custom display methods, such as the ones needed in Databricks, to visualize HTML objects

Example (head and model views):

from transformers import AutoTokenizer, AutoModel, utils
from bertviz import head_view

utils.logging.set_verbosity_error()  # Suppress standard warnings
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer.encode("The cat sat on the mat", return_tensors='pt')
outputs = model(inputs)
attention = outputs[-1]  # Output includes attention weights when output_attentions=True
tokens = tokenizer.convert_ids_to_tokens(inputs[0]) 

html_head_view = head_view(attention, tokens, html_action='return')

with open("PATH_TO_YOUR_FILE/head_view.html", 'w') as file:
    file.write(html_head_view.data)
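
model_view follows the same pattern (a sketch reusing the attention and tokens from above; the output filename is only an example):

from bertviz import model_view

html_model_view = model_view(attention, tokens, html_action='return')

with open("PATH_TO_YOUR_FILE/model_view.html", 'w') as file:
    file.write(html_model_view.data)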

Example (neuron view):

# Import specialized versions of models (that return query/key vectors)
from bertviz.transformers_neuron_view import BertModel, BertTokenizer
from bertviz.neuron_view import show

model_type = 'bert'
model_version = 'bert-base-uncased'
do_lower_case = True
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
model = BertModel.from_pretrained(model_version, output_attentions=True)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
html_neuron_view = show(model, model_type, tokenizer, sentence_a, sentence_b, layer=2, head=0, html_action='return')

with open("PATH_TO_YOUR_FILE/neuron_view.html", 'w') as file:
    file.write(html_neuron_view.data)

Non-Huggingface models

The head view and model view may be used to visualize self-attention for any standard Transformer model, as long as the attention weights are available and follow the format specified in head_view and model_view (which is the format returned from Huggingface models). In some cases, TensorFlow checkpoints may be loaded as Huggingface models as described in the Huggingface docs.
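As a rough sketch of the expected format (an assumption based on the Huggingface convention of one attention tensor per layer, each of shape batch × heads × seq_len × seq_len with rows summing to 1; check the head_view/model_view source for the authoritative spec):

import torch
from bertviz import head_view

tokens = ["The", "cat", "sat", "on", "the", "mat"]
num_layers, num_heads, seq_len = 12, 12, len(tokens)

# One attention tensor per layer, shape (batch, heads, seq_len, seq_len);
# uniform dummy weights stand in for a real model's attention
attention = tuple(
    torch.full((1, num_heads, seq_len, seq_len), 1.0 / seq_len)
    for _ in range(num_layers)
)

head_view(attention, tokens)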

⚠️ Limitations

Tool

  • This tool is designed for shorter inputs and may run slowly if the input text is very long and/or the model is very large. To mitigate this, you may wish to filter the layers displayed by setting the include_layers parameter, as described above.
  • When running on Colab, some of the visualizations will fail (runtime disconnection) when the input text is long. To mitigate this, you may wish to filter the layers displayed by setting the include_layers parameter, as described above.
  • The neuron view only supports the custom BERT, GPT-2, and RoBERTa models included with the tool. This view requires access to the query and key vectors, which entailed modifying the model code (see the transformers_neuron_view directory); this has only been done for these three models.

Attention as "explanation"?

  • Visualizing attention weights illuminates one part of the model architecture but does not necessarily provide a direct explanation for predictions [1, 2, 3].
  • If you wish to understand how the input text influences output predictions more directly, consider saliency methods provided by tools such as the Language Interpretability Toolkit or Ecco.

🔬 Paper

A Multiscale Visualization of Attention in the Transformer Model (ACL System Demonstrations 2019).

Citation

@inproceedings{vig-2019-multiscale,
    title = "A Multiscale Visualization of Attention in the Transformer Model",
    author = "Vig, Jesse",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-3007",
    doi = "10.18653/v1/P19-3007",
    pages = "37--42",
}

Authors

Jesse Vig

🙏 Acknowledgments

We are grateful to the authors of the following projects, which are incorporated into this repo:

  • The Tensor2Tensor visualization tool by Llion Jones, which BertViz extends
  • Huggingface Transformers, modified versions of which are included in the transformers_neuron_view directory

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details

bertviz's People

Contributors

cyber-raskolnikov • jessevig • luvata • martinsotir • pglock


bertviz's Issues

How do I export vector graphics?

After using BertViz to visualize attention in a notebook, how can I export vector graphics that meet the requirements of a paper?

Can the Chinese BERT model be visualized with words instead of characters?

I want to ask whether the Chinese BERT model can be visualized at the word level instead of the character level. When I run the BERT visualization for Chinese, I can only see attention from character to character. I want to see attention from word to word. Can I change this?

Thanks a lot for your help

Saving visualizations

Thanks for the great tool!

It would be nice to be able to save the visualizations for specific layers/heads as images. I have not been able to find a spot in the model/head/neuron_view.js file to add a saving function.

Do you maybe have a suggestion on how to save the visualizations as images?

Thanks!

Attention weight for a sentence

Hi,
Is it possible to calculate the attention weight for a full sentence?

For example, for sentence "This is test"

the attention matrix for the first layer and the first attention head is
[
[0.4,0.1,0.2],
[0.1,0.5,0.3],
[0.7,0.3,0.2]
]

If I take the average of that matrix, does that reflect the attention weight for the full sentence, or does that not make sense?

Attention visualization for RoBERTa is blank, raises KeyError using call_html()

Hi @jessevig ,

Thank you so much for making this library. It's an incredibly effective and easy way to create beautiful attention visualizations. I'm currently trying to use it for a tutorial in the DeepChem library, but I'm running into several issues. When I use the RoBERTa notebook code, the visualization is blank. When I try to re-use the BERT Colab code on RoBERTa, it throws a KeyError in call_html(). The example BERT code, however, runs successfully.

I'd love to get your advice on how I can fix this. I'm really interested in visualizing the attention using the library for validating the robustness of the model, and for future technical presentations and papers.

The Colab notebook can be accessed here.

Thanks!

General Question about word embedding and gradient output

Hi,
I am using BertForSequenceClassification for a binary classification task. Do you know how to get the word embeddings and gradient output for each word in BertForSequenceClassification (https://github.com/huggingface/transformers/blob/7d7fe4997f83d6d858849a659302b9fdc32c3337/src/transformers/modeling_bert.py#L1075)?

My use case: I want the model to automatically tag a certain number of indicative words in a test sentence to interpret why the sentence is labeled as COVID or non-COVID. My solution is to interpret this using gradients, but I haven't figured out how to output the word embeddings and gradients.

Can I use this to do previous-word prediction?

Hi,
I'm using RoBERTa, GPT-2, BERT, and Grover for various tasks, and I was hoping to see whether I could use BertViz to help me work out what the previous word would have been, whether or not it was present.

Thinking about it, I could probably just use RoBERTa for that, but regardless, my question still stands.

Is it possible to use this for previous-word prediction?

Many thanks.
Vince.

visualization of only 3 layers / example model_view_xlnet.ipynb

I tried to load XLNet with only three layers (it does work with the full XLNet), but with three layers the example model_view_xlnet.ipynb does not work:

config = XLNetConfig.from_pretrained('/transformers/')
config.n_layer = 3
config.num_labels = 3
model = XLNetModel.from_pretrained('/transformers/')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-7c9c3356caa4> in <module>
     17 input_id_list = input_ids[0].tolist() # Batch index 0
     18 tokens = tokenizer.convert_ids_to_tokens(input_id_list)
---> 19 model_view(attention, tokens)

~/projects/bertviz/bertviz/model_view.py in model_view(attention, tokens, sentence_b_start, prettify_tokens)
     78     attn_seq_len = len(attn_data['all']['attn'][0][0])
     79     if attn_seq_len != len(tokens):
---> 80         raise ValueError(f"Attention has {attn_seq_len} positions, while number of tokens is {len(tokens)}")
     81     display(Javascript('window.params = %s' % json.dumps(params)))
     82     display(Javascript(vis_js))

ValueError: Attention has 768 positions, while number of tokens is 14

Runtime disconnected frequently

I am using BertViz with BertModel for visualization, specifically the neuron detail view. Every time I pass a sentence of more than 5 words, the runtime gets disconnected. Is there some other way to still visualize them? I am trying to understand long-term dependencies.

BertForSequenceClassification.from_pretrained

Hi, Thank you for this great work.
Can I use this code to visualize my model? (I am using BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2).)

model_type = 'bert'
model_version = 'bert-base-uncased'
do_lower_case = True
model = model  # (this is my model)
#tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
sentence_a = sentences[0]
sentence_b = sentences[1]
call_html()
show(model, model_type, tokenizer, sentence_a, sentence_b)
I changed only the model (to my own model) and the sentences, and I got the error below. Please help, or share any blog that explains how to visualize my model:
AttributeError: 'BertTokenizer' object has no attribute 'cls_token'

Thank you in advance

word attention weights

Hi,

Thank you for writing this tool.

I was wondering if there is any way to compute word-level attention and also to locate the words' respective positions within the context of a sentence in a given multi-sentence text.

Sticking to the example below, how can I find the BERT word attention for "cat" in two different sentences:
[the cat sat on the mat. the cat lay on the rug.]

Thanks!

encode_plus is not in GPT2 Tokenizer

It seems encode_plus was removed; what is its successor? All of the notebooks include inputs = tokenizer.encode_plus(text, return_tensors='pt', add_special_tokens=True), which no longer works and raises an error.

"require"

Hey, first, thank you for both creating + maintaining this repo!

I'm trying to get the basic BERT visualization working using Chrome on my local machine. The notebook is throwing an error when I execute the JS cell.

%%javascript
require.config({
  paths: {
      d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min',
      jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
  }
});

returns an error: Javascript Error: require is not defined

I tried:
(1) doing a clean pip install from your requirements.txt to confirm I had everything
(2) I also found documentation from Jupyter indicating there's a specific jupyter-require package, which I installed; and also used the magic %load_ext jupyter_require; stopping / restarting / reloading the kernel, etc.

Any idea what's going on here?

Update: I just tried opening it in a notebook (vs. the lab environment) and it worked fine! So that's a totally reasonable workaround for me. If there's an easy fix, I would love to hear it, but otherwise feel free to just close this. Thank you!

How to use a model I've trained with this project

I've already used BERT from Huggingface to fine-tune a model for my task, and I'm trying to use my model with this project so I can see it in the three views provided.
By the way, my task is text classification with sentence pairs.
Finally, thanks a lot for the paper and project; it's excellent.

Save attention visualizations as local html file

I'm running the attention visualizations on a server without GUI.
Is there an easy way to run, e.g., head_view_bert.py and save the interactive visualizations to a local .html file which can then be viewed on another machine?

Missing [CLS] token in XLNet

I want to know why there is no [CLS] token in the visualization of XLNet. I used your XLNet code to visualize my fine-tuned XLNet, but I am not getting any [CLS] token in the input. Can you tell me how I can get that?

Horizontal head view feature

Hi, thanks for the great visualization tool!

I'm just wondering whether we could have a feature that renders the head view in a horizontal direction? The reason is that showing the sequence of tokens horizontally is more suitable for languages like Chinese, Japanese, or Korean.

Typical sentences in Chinese run to about 60-70 characters, but the current head view already uses a lot of space to show just 10 of them.

Thanks again for the great tool!

not working on Safari?

Hey!
First of all, this is such great work!
But some notebooks, e.g. "bertviz_detail.ipynb", are not working properly in Safari.
Should it work in Safari when running with Jupyter?

Issues in visualizing a fine tuned model

A BertModel fine-tuned for a sequence classification task does not give the expected results on visualization.
Ideally, the pretrained model should be loaded into BertForSequenceClassification, but that model does not return attention scores for visualization.
When it is loaded into BertModel (layers 0 to 11), I assume the 11th layer (right before the classification layer in BertForSequenceClassification) is the right layer to check the attention distribution.
But every word is equally attentive to every other word.
I am wondering what the possible reasons could be and how I can fix it.
Thanks.

Using the attention to summarize a document

I've been looking for a tool which can give me some type of token-based extractive summarization to solve an especially interesting problem in the Competitive Debate community. I think that this tool will help me solve it.

I've wanted to create a neural network that summarizes texts by using a "highlighter", that is, by building summaries out of the words used in the original document (but NOT the sentences). I cannot seem to find a neural-network-based method that does exactly what I'm asking, but the attention mechanism (and its visualizations) highlights the most important parts of a source document that cause a transformation to document B. This seems to be what I want.

Actually, just as I typed out the previous paragraph, I got the idea to do something like this: take a news article and an abstractively made short "summary" of the news article, then take the most-attended-to tokens in the transformation between the news article and the summary and use those as the summary itself. Can I use BertViz to do what I am describing via BERT, and if I can't, what are my best options?

standalone web demo

Thanks for making such great visualization!

I wonder whether it is possible to render the visualization into a normal HTML page for easier exploration, rather than showing it in a Jupyter notebook?

Any suggestion or example for achieving this would be appreciated!

model_view shows black image?

Hi there,

First of all, thank you for building this package. It's exactly what I'm looking for right now! :)

I'm trying to use model_view to visualize the attention. I can use head_view and it works perfectly fine, but when I try to use model_view with everything else the same, I just get a black rectangle.

Any suggestions would be much appreciated.

How to use saved model for bertviz

Hi,

I am using Huggingface's transformers BertForSequenceClassification to train a BERT model. Now I want to load my saved model and use it in your head_view_bert notebook, but the notebook does not assume a local model. Can you tell me how to fix this? Here's my code:

model_version = 'bert-base-cased'
do_lower_case = True
modelpath = '~/Documents/insight/projects/factCC/models/saved_models/'
model = BertForSequenceClassification.from_pretrained(modelpath, from_tf=True)

model = BertModel.from_pretrained(model_version, output_attentions=True)

tokenizer = TFBertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
show_head_view(model, tokenizer, sentence_a, sentence_b)

This is the way I load my model normally for test data and prediction, yet I can't load it here. Here's the error I get:

OSError Traceback (most recent call last)
in
2 do_lower_case = True
3 modelpath = '~/Documents/insight/projects/factCC/models/saved_models/'
----> 4 model = BertForSequenceClassification.from_pretrained(modelpath, from_tf=True)
5
6 # model = BertModel.from_pretrained(model_version, output_attentions=True)

/usr/local/lib/python3.6/site-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
348 resume_download=resume_download,
349 proxies=proxies,
--> 350 **kwargs
351 )
352 else:

/usr/local/lib/python3.6/site-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
171 ', '.join(cls.pretrained_config_archive_map.keys()),
172 config_file, CONFIG_NAME)
--> 173 raise EnvironmentError(msg)
174
175 except json.JSONDecodeError:

OSError: Model name '~/Documents/insight/projects/factCC/models/saved_models/' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-japanese, bert-base-japanese-whole-word-masking, bert-base-japanese-char, bert-base-japanese-char-whole-word-masking, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1). We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/~/Documents/insight/projects/factCC/models/saved_models//config.json' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.

[MASK] token attention patterns.

Thanks for this repo and article. I also think it would be interesting to visualize attention for the special [MASK] token used in BERT pre-training, and how it interacts with the other tokens in the sequence.

Load directly huggingface/transformers models

Hi,
thank you for the nice tool!
I would like to understand why you are not loading the models directly from the transformers package.

Which part of the transformers model do you need to adapt to make it compatible with bertviz? I would like to add other huggingface models such as DistilBERT.

Did you consider a way to load the model directly from transformers?

Thank you in advance for your kind response.

How can I use BertViz for BERT question answering?

Is there any way to see the attention visualization for a BERT question answering model? I couldn't find a BertForQuestionAnswering class in bertviz.pytorch_transformers_attn. I have fine-tuned on a QA dataset using Huggingface transformers and want to see the visualization for it. Can you suggest any way of doing it?

attention details

Hi,

I found a non-working colab notebook here while I was reading here.

I was wondering if there is any way to get the attention details including the keys and vectors.

Thanks!

I have a problem loading my own model

Hi, I have a problem loading my own model.
I can successfully load my fine-tuned BERT model,
but I get the error TypeError: object of type 'float' has no len() when I call head_view(attention, tokens).
I don't know how to solve this bug.
Thanks a lot for your help

Interpretability of a BERT model's intermediate layers

Hi,

I have a general question. If you feel like this is not relevant / don't feel like discussing this here, feel free to close this issue. Why do you associate the values of intermediate layers of the transformer with the input tokens? Is there a property of BERT/transformers that binds the representation to those specific tokens?

The way I see it, the representation of layer i + 1 is essentially a weighted sum over the input sequence. If attention chooses to give zero weight to the input token/hidden vector, the new representation will be absolutely unrelated to the initial input sequence. So what does it tell us that layer 5 is paying attention to the hidden representation of elements 4-5 in layer 4? Are they really still associated with the initial input tokens? Do you see what I'm getting at?

layer and attention are empty.

I'm using colab but it doesn't work. Help.

%%javascript
require.config({
  paths: {
      d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min',
      jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
  }
});

def show_head_view(model, tokenizer, sentence_a, sentence_b=None):
    inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
    input_ids = inputs['input_ids']
    if sentence_b:
        token_type_ids = inputs['token_type_ids']
        attention = model(input_ids, token_type_ids=token_type_ids)[-1]
        sentence_b_start = token_type_ids[0].tolist().index(1)
    else:
        attention = model(input_ids)[-1]
        sentence_b_start = None
    input_id_list = input_ids[0].tolist()  # Batch index 0
    tokens = tokenizer.convert_ids_to_tokens(input_id_list)
    head_view(attention, tokens, sentence_b_start)

model_version = 'bert-base-uncased'
do_lower_case = True

model = BertModel.from_pretrained(model_version, output_attentions=True)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)

sentence_a = "the cat sat on the mat"
sentence_b = "the cat lay on the rug"

show_head_view(model, tokenizer, sentence_a, sentence_b)


How to use the BertForSequenceClassification class for a TF checkpoint fine-tuned on a sentence classification task (e.g., a GLUE task)

model_type = 'bert'
model_version_3 = './bertviz/tests/saved_model'
model_config = './bertviz/tests/saved_model/bert_config.json'
do_lower_case = False
config = BertConfig.from_json_file(model_config)
model = BertForSequenceClassification.from_pretrained(model_version_3, from_tf=True, config=config)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
show(model, model_type, tokenizer, sentence_a, sentence_b)

I have tried this code, but it shows an error:
AttributeError: 'BertForSequenceClassification' object has no attribute 'bias'

Also, I'm unable to pass the parameter num_labels to BertForSequenceClassification.from_pretrained().
It was showing the error: init() got an unexpected keyword argument 'num_label'

Help me to fix it.

Thanks.

A question for the output of the neuron_view_roberta.ipynb

Hi, I have a question about the output of neuron_view_roberta.ipynb. I did not get a result that includes Q, K, and Q dot K like the example shows. Below are the result I got and the result from the example.

The result I got: (screenshot attached)

The result from the example: (screenshot attached)

Many thanks!

Unable to visualize more than once in the same notebook

Thank you for your work on this repo - the visualizations are really useful and helpful for my research. I'm wondering if I've run into some sort of bug or limitation, because I'm unable to show more than one visualization in a notebook, i.e., in two different cells. The second cell always shows an empty output when I try to run it (see attached screenshot). I see the same behavior with all the views.

Update: I've also noticed that sometimes the second visualization is displayed, but then the first visualization disappears.

Multiple visualization for multiple sentences at one display

Hi,
Thanks for your great tool.
I was wondering if it is possible to display multiple visualizations for multiple sentences at once in one display? If yes, I would be grateful if you could guide me on how to do it.
Thank you very much in advance

How do I use this tool for my own model?

Hi, I have trained an XLM model that translates from English to Spanish. A model for this language pair is not available on huggingface's repo. Is there any way to load my saved model?

Visualization for query-result pairs

I am very impressed by the features BertViz offers so far. I was wondering if there is any approach for visualizing the relation between a query and the results ranked by a BERT model fine-tuned on the passage re-ranking task. This fine-tuned model predicts the relevance of a passage as the right "response" to a query, and it would be nice to somehow visualize how the query is connected to the passages.

Thanks in advance!

It's not compatible with the latest pytorch-transformers anymore

From the official documentation:

The main breaking change when migrating from pytorch-pretrained-bert to pytorch-transformers is that the models forward method always outputs a tuple with various elements depending on the model and the configuration parameters.

It seems like the structure of attentions also changed. The attention for each layer is just a tensor, not key-value pairs.

Neuron View Inconsistency

As you can see in the following example, the neuron view output is different from what you've provided in the readme of this repo. Please also check this one.

The current output: (screenshot attached)

The expected output: (screenshot attached)

Attention matrix is asymmetric

Hi Jesse,
I find your work very interesting, thanks a lot for putting it out there!

I was digging into the attention values being visualized in the BERT map, specifically the return value of _get_attentions(), and found that the token-to-token attention weights are not symmetrical, as I would have expected. For instance, consider:

layer, head = 11, 0
att = _get_attentions(tokens_a, tokens_b, atts)
attmx = np.array(att['a']['att'][layer][head])

Here, the matrix attmx might look like this:

array([[0.10391058, 0.09832697, 0.09166335, 0.14575878, 0.08784127],
       [0.09632228, 0.09650009, 0.09524056, 0.12355924, 0.09061429],
       [0.12465193, 0.10896012, 0.11306546, 0.11939598, 0.10786319],
       [0.09877665, 0.10982872, 0.08591022, 0.11621149, 0.1339225 ],
       [0.11143579, 0.0954979 , 0.09444219, 0.1312461 , 0.07381313]])

How should the fact that it's asymmetrical be interpreted? If we consider the [CLS] token at the output layer (layer 11 in bert-base?), would the attention it receives from the second token in the previous layer be attmx[0][1] == 0.09832697 (or attmx[1][0] == 0.09632228)? Are either of these values incidental and can be safely ignored?

Thanks in advance!
-Samuel

Some words are unable to be visualized?

It seems that some words not in the BERT vocab can be broken down into WordPiece tokens that are in the pretrained model and visualized, but others cannot?

I tried this on some words from Norse mythology using the Colab notebook. "Ragnarok" can be broken down into rag ##nar ##ok but somehow "Valhalla" cannot. I tried it on sentences from tax documents and it doesn't quite work there either.

Hovering issue

Hello, thanks for the work.
Short bug report.
In Safari, when you hover over the visualization in Google Colab, the visualization window scrolls down automatically, making it impossible to work with model_view and neuron_view. It works in Chrome.

Thanks again

Visualize large BERT model

Thank you for the great work! Is it possible to make this work with the large uncased BERT model as well? Currently it loads the model but the Layer dropdown is not populated and the visualization is not shown. Simply changing the hardcoded layers from 12 to 24 in attention.js does not work.

Unable to load weights properly from tf checkpoint

The function load_tf_weights_in_bert in modeling_bert.py is buggy and throws a lot of attribute errors because of what seems like improper parsing of the variable names, with the pointer pointing to the entire model.

For instance, for the variable bert/encoder/layer_0/attention/output/dense/kernel it throws an attribute error along the lines of "BertModel has no attribute weight", because the pointer is the model bert itself, whereas the pointer should be bert.encoder.layer.0.attention.output.dense.
