nlp-with-transformers / notebooks
Jupyter notebooks for the Natural Language Processing with Transformers book
Home Page: https://transformersbook.com/
License: Apache License 2.0
The problem arises in chapter:
The Introduction notebook is incomplete
Steps to reproduce the behavior:
The question or comment is about chapter:
FARMReader was initialized with model_ckpt = "deepset/minilm-uncased-squad2":
reader = FARMReader(model_name_or_path=model_ckpt, progress_bar=False,
max_seq_len=max_seq_length, doc_stride=doc_stride, return_no_answer=True)
With the evaluate_reader function defined on page 198 of the book, I ran reader_eval["Fine-tune on SQuAD"] = evaluate_reader(reader).
The EM and F1 scores I got for "Fine-tune on SQuAD" are both zero. This also affects the domain adaptation results for "Fine-tune on SQuAD+SubjQA". I followed the notebook up to this step and am wondering what I've overlooked in evaluating the reader.
Thank you!
Running the notebook failed with an error:
Steps to reproduce the behavior:
Run training.
Fix: do not pass the tokenizer to Trainer():
from transformers import Trainer
trainer = Trainer(model=model, args=training_args,
compute_metrics=compute_metrics,
train_dataset=emotions_encoded["train"],
eval_dataset=emotions_encoded["validation"],
#tokenizer=tokenizer
)
The problem arises in chapter:
When I try to execute cell 26 in 08_model-compression.ipynb on my local machine, I get the following error.
Steps to reproduce the behavior:
RuntimeError Traceback (most recent call last)
Input In [28], in <cell line: 6>()
1 distilbert_trainer = DistillationTrainer(model_init=student_init,
2 teacher_model=teacher_model, args=student_training_args,
3 train_dataset=clinc_enc['train'], eval_dataset=clinc_enc['validation'],
4 compute_metrics=compute_metrics, tokenizer=student_tokenizer)
----> 6 distilbert_trainer.train()
File ~\anaconda3\lib\site-packages\transformers\trainer.py:1316, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1314 tr_loss_step = self.training_step(model, inputs)
1315 else:
-> 1316 tr_loss_step = self.training_step(model, inputs)
1318 if (
1319 args.logging_nan_inf_filter
1320 and not is_torch_tpu_available()
1321 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
1322 ):
1323 # if loss is nan or inf simply add the average of previous logged losses
1324 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
File ~\anaconda3\lib\site-packages\transformers\trainer.py:1849, in Trainer.training_step(self, model, inputs)
1847 loss = self.compute_loss(model, inputs)
1848 else:
-> 1849 loss = self.compute_loss(model, inputs)
1851 if self.args.n_gpu > 1:
1852 loss = loss.mean() # mean() to average on multi-gpu parallel training
Input In [17], in DistillationTrainer.compute_loss(self, model, inputs, return_outputs)
10 def compute_loss(self, model, inputs, return_outputs=False):
---> 11 outputs_stu = model(**inputs)
12 # Extract cross-entropy loss and logits from student
13 loss_ce = outputs_stu.loss
File ~\anaconda3\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:729, in DistilBertForSequenceClassification.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
721 r"""
722 labels (:obj:torch.LongTensor
of shape :obj:(batch_size,)
, optional
):
723 Labels for computing the sequence classification/regression loss. Indices should be in :obj:[0, ..., 724 config.num_labels - 1]
. If :obj:config.num_labels == 1
a regression loss is computed (Mean-Square loss),
725 If :obj:config.num_labels > 1
a classification loss is computed (Cross-Entropy).
726 """
727 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
--> 729 distilbert_output = self.distilbert(
730 input_ids=input_ids,
731 attention_mask=attention_mask,
732 head_mask=head_mask,
733 inputs_embeds=inputs_embeds,
734 output_attentions=output_attentions,
735 output_hidden_states=output_hidden_states,
736 return_dict=return_dict,
737 )
738 hidden_state = distilbert_output[0] # (bs, seq_len, dim)
739 pooled_output = hidden_state[:, 0] # (bs, dim)
File ~\anaconda3\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:550, in DistilBertModel.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
547 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
549 if inputs_embeds is None:
--> 550 inputs_embeds = self.embeddings(input_ids) # (bs, seq_length, dim)
551 return self.transformer(
552 x=inputs_embeds,
553 attn_mask=attention_mask,
(...)
557 return_dict=return_dict,
558 )
File ~\anaconda3\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:130, in Embeddings.forward(self, input_ids)
127 position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device) # (max_seq_length)
128 position_ids = position_ids.unsqueeze(0).expand_as(input_ids) # (bs, max_seq_length)
--> 130 word_embeddings = self.word_embeddings(input_ids) # (bs, max_seq_length, dim)
131 position_embeddings = self.position_embeddings(position_ids) # (bs, max_seq_length, dim)
133 embeddings = word_embeddings + position_embeddings # (bs, max_seq_length, dim)
File ~\anaconda3\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\lib\site-packages\torch\nn\modules\sparse.py:158, in Embedding.forward(self, input)
157 def forward(self, input: Tensor) -> Tensor:
--> 158 return F.embedding(
159 input, self.weight, self.padding_idx, self.max_norm,
160 self.norm_type, self.scale_grad_by_freq, self.sparse)
File ~\anaconda3\lib\site-packages\torch\nn\functional.py:2183, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2177 # Note [embedding_renorm set_grad_enabled]
2178 # XXX: equivalent to
2179 # with torch.no_grad():
2180 # torch.embedding_renorm_
2181 # remove once script supports set_grad_enabled
2182 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 2183 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
Even when I set the device to "cpu" at the start of the notebook, I still get this issue.
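Not a definitive fix, but a hedged diagnostic sketch (using the variable names from the issue's own code) to confirm where each side lives before digging into library versions:
import torch

# Confirm where the student's weights ended up (expect cuda:0 on a GPU box)
print(next(distilbert_trainer.model.parameters()).device)
# The Trainer only moves torch tensors to the GPU; plain Python lists here
# would mean the dataset is not in torch format
batch = clinc_enc["train"][:2]
print({k: type(v).__name__ for k, v in batch.items()})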
Describe the bug
There is a reference to a function display_df(df.T, header=None), but the display_df function is not defined in the notebook.
It needs to perform something like this (or a better-written version of this):
def display_df(df, header=True):
    if header:
        return df
    # Work on a copy so the caller's DataFrame is not mutated
    df = df.copy()
    df.columns = ["" for _ in df.columns]
    return df
The problem arises in chapter:
The function tag_text has a bug where a variable is defined as input_ids and then used as inputs.
Steps to reproduce the behavior:
In the last cell before "Error Analysis", the code
text_de = "Jeff Dean ist ein Informatiker bei Google in Kalifornien"
tag_text(text_de, tags, trainer.model, xlmr_tokenizer)
produces this error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipykernel_33/24045397.py in <module>
1 # hide_output
2 text_de = "Jeff Dean ist ein Informatiker bei Google in Kalifornien"
----> 3 tag_text(text_de, tags, trainer.model, xlmr_tokenizer)
/tmp/ipykernel_33/469722974.py in tag_text(text, tags, model, tokenizer)
5 input_ids = xlmr_tokenizer(text, return_tensors="pt").input_ids.to(device)
6 # Get predictions as distribution over 7 possible classes
----> 7 outputs = model(inputs)[0]
8 # Take argmax to get most likely class per token
9 predictions = torch.argmax(outputs, dim=2)
NameError: name 'inputs' is not defined
Expect the notebook to run without error.
Replace inputs with input_ids in the definition of tag_text.
Current:
def tag_text(text, tags, model, tokenizer):
# Get tokens with special characters
tokens = tokenizer(text).tokens()
# Encode the sequence into IDs
input_ids = xlmr_tokenizer(text, return_tensors="pt").input_ids.to(device)
# Get predictions as distribution over 7 possible classes
outputs = model(inputs)[0]
# Take argmax to get most likely class per token
predictions = torch.argmax(outputs, dim=2)
# Convert to DataFrame
preds = [tags.names[p] for p in predictions[0].cpu().numpy()]
return pd.DataFrame([tokens, preds], index=["Tokens", "Tags"])
Proposed:
def tag_text(text, tags, model, tokenizer):
# Get tokens with special characters
tokens = tokenizer(text).tokens()
# Encode the sequence into IDs
input_ids = xlmr_tokenizer(text, return_tensors="pt").input_ids.to(device)
# Get predictions as distribution over 7 possible classes
outputs = model(input_ids)[0] # <-- Change here
# Take argmax to get most likely class per token
predictions = torch.argmax(outputs, dim=2)
# Convert to DataFrame
preds = [tags.names[p] for p in predictions[0].cpu().numpy()]
return pd.DataFrame([tokens, preds], index=["Tokens", "Tags"])
The question or comment is about chapter:
Hi,
First, I would like to say thanks for writing this amazing book. I have a question about the attention mechanism in Transformers (referring to page 61). I am trying to compare what the book calls self-attention in Transformers with the self-attention I previously knew from this paper: https://aclanthology.org/N16-1174.pdf, and with the local and global attention from https://arxiv.org/pdf/1508.04025.pdf.
What those papers used was a HAN model with self, local, or global attention on top of RNN, GRU, LSTM, or CNN layers. Since Transformers are a new architecture, I am wondering whether the mathematics behind their attention is the same as in these two papers.
Please forgive me if the question seems very basic.
Regards,
Shabnam
Hi, how can we host the notebooks like fastbook?
The question or comment is about chapter:
In the declaration of the method extract_hidden_states, in section "Transformers as Feature Extractors", subsection "Extracting the last hidden states", the inclusion of the line if k in tokenizer.model_input_names is not explained in the text. This unexplained line is also a deviation from the previous code demonstrating how to extract the hidden state for a single line of text, so it tripped me up, personally.
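For what it's worth, that line keeps only the columns the model's forward pass actually accepts. A quick way to see what it filters on (standard transformers API; the checkpoint is the chapter's):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
print(tokenizer.model_input_names)  # ['input_ids', 'attention_mask']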
The question or comment is about chapter:
I got the RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
Since the model is already on the GPU, I assume the emotion Dataset is still on the CPU, which creates the error. I'm confused about where I should add ".to(device)" to get the Dataset onto the GPU.
RuntimeError Traceback (most recent call last)
Input In [63], in <cell line: 8>()
1 from transformers import Trainer
3 trainer = Trainer(model=model, args=training_args,
4 compute_metrics=compute_metrics,
5 train_dataset=emotions_encoded["train"],
6 eval_dataset=emotions_encoded["validation"],
7 tokenizer=tokenizer)
----> 8 trainer.train()
File ~\anaconda3\envs\book\lib\site-packages\transformers\trainer.py:1316, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1314 tr_loss_step = self.training_step(model, inputs)
1315 else:
-> 1316 tr_loss_step = self.training_step(model, inputs)
1318 if (
1319 args.logging_nan_inf_filter
1320 and not is_torch_tpu_available()
1321 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
1322 ):
1323 # if loss is nan or inf simply add the average of previous logged losses
1324 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
File ~\anaconda3\envs\book\lib\site-packages\transformers\trainer.py:1849, in Trainer.training_step(self, model, inputs)
1847 loss = self.compute_loss(model, inputs)
1848 else:
-> 1849 loss = self.compute_loss(model, inputs)
1851 if self.args.n_gpu > 1:
1852 loss = loss.mean() # mean() to average on multi-gpu parallel training
File ~\anaconda3\envs\book\lib\site-packages\transformers\trainer.py:1881, in Trainer.compute_loss(self, model, inputs, return_outputs)
1879 else:
1880 labels = None
-> 1881 outputs = model(**inputs)
1882 # Save past state if it exists
1883 # TODO: this needs to be fixed and made cleaner later.
1884 if self.args.past_index >= 0:
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\envs\book\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:729, in DistilBertForSequenceClassification.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
721 r"""
722 labels (:obj:torch.LongTensor
of shape :obj:(batch_size,)
, optional
):
723 Labels for computing the sequence classification/regression loss. Indices should be in :obj:[0, ..., 724 config.num_labels - 1]
. If :obj:config.num_labels == 1
a regression loss is computed (Mean-Square loss),
725 If :obj:config.num_labels > 1
a classification loss is computed (Cross-Entropy).
726 """
727 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
--> 729 distilbert_output = self.distilbert(
730 input_ids=input_ids,
731 attention_mask=attention_mask,
732 head_mask=head_mask,
733 inputs_embeds=inputs_embeds,
734 output_attentions=output_attentions,
735 output_hidden_states=output_hidden_states,
736 return_dict=return_dict,
737 )
738 hidden_state = distilbert_output[0] # (bs, seq_len, dim)
739 pooled_output = hidden_state[:, 0] # (bs, dim)
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\envs\book\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:550, in DistilBertModel.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
547 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
549 if inputs_embeds is None:
--> 550 inputs_embeds = self.embeddings(input_ids) # (bs, seq_length, dim)
551 return self.transformer(
552 x=inputs_embeds,
553 attn_mask=attention_mask,
(...)
557 return_dict=return_dict,
558 )
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\envs\book\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:130, in Embeddings.forward(self, input_ids)
127 position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device) # (max_seq_length)
128 position_ids = position_ids.unsqueeze(0).expand_as(input_ids) # (bs, max_seq_length)
--> 130 word_embeddings = self.word_embeddings(input_ids) # (bs, max_seq_length, dim)
131 position_embeddings = self.position_embeddings(position_ids) # (bs, max_seq_length, dim)
133 embeddings = word_embeddings + position_embeddings # (bs, max_seq_length, dim)
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\modules\sparse.py:158, in Embedding.forward(self, input)
157 def forward(self, input: Tensor) -> Tensor:
--> 158 return F.embedding(
159 input, self.weight, self.padding_idx, self.max_norm,
160 self.norm_type, self.scale_grad_by_freq, self.sparse)
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\functional.py:2183, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2177 # Note [embedding_renorm set_grad_enabled]
2178 # XXX: equivalent to
2179 # with torch.no_grad():
2180 # torch.embedding_renorm_
2181 # remove once script supports set_grad_enabled
2182 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 2183 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
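One thing worth checking, sketched with the chapter's variable names: the Trainer only moves tensor columns onto the GPU, so the encoded dataset should be in torch format:
# Make sure the encoded dataset yields torch tensors, not Python lists
emotions_encoded.set_format("torch",
                            columns=["input_ids", "attention_mask", "label"])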
The problem arises in chapter:
I found two issues related to the code that produces the confusion matrix:
- In the plot_confusion_matrix function definition the input order is (y_preds, y_true, labels). However, when the function is called the input order is (df_tokens["labels"], df_tokens["predicted_label"], tags.names). In other words, predicted and ground-truth labels are passed in the wrong order.
- The labels are passed as tag-name strings (['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']) rather than IDs from 0 to 6. The confusion_matrix function re-orders a list of strings alphabetically (['B-LOC', 'B-ORG', 'B-PER', 'I-LOC', 'I-ORG', 'I-PER', 'O']), whereas when the function is called the tag names follow the order defined by the IDs.
It's possible to verify the issue by checking individual numbers in df_tokens:
df_one_label = df_tokens[df_tokens['labels'] == 'I-LOC']
(df_one_label['labels'] == df_one_label['predicted_label']).sum() / len(df_one_label)
With the current code the confusion matrix shows that I-LOC is predicted properly 99% of the time. However, the underlying numbers give an accuracy of around 85%; the label actually predicted with 99% accuracy is O.
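A hedged sketch of the fix: swap the call's first two arguments to match the declared (y_preds, y_true) order, and pin the label order explicitly via scikit-learn's labels parameter so strings are not re-sorted alphabetically:
from sklearn.metrics import confusion_matrix

# Call matching the declared order (y_preds, y_true, labels):
# plot_confusion_matrix(df_tokens["predicted_label"], df_tokens["labels"], tags.names)
# Pinning the label order inside the function would look like this:
cm = confusion_matrix(df_tokens["labels"], df_tokens["predicted_label"],
                      labels=tags.names, normalize="true")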
The problem arises in chapter:
The to_tf_dataset method throws an error: TypeError: to_tf_dataset() missing 1 required positional argument: 'collate_fn'. This could be a version issue.
The problem arises in chapter:
The datasets module cannot load the cnn_dailymail dataset correctly due to datasets bug #3787. The bug should have been fixed in the datasets v2.0.0 release, but the notebook environment uses v1.16.1. After upgrading the datasets library to v2.0.0, the bug persists.
Steps to reproduce the behavior:
The dataset should load flawlessly, as bug #3787 is reported closed and the problem solved.
The problem arises in chapter:
The pipeline can no longer handle the absence of a correct answer, which is permitted when the parameter handle_impossible_answer is set to True.
Steps to reproduce the behavior:
In transformers\pipelines\question_answering.py, line 409: min_null_score = min(min_null_score, (start_[0] * end_[0]).item()) causes
ValueError: can only convert an array of size 1 to a Python scalar
Expected: it should return {'score': 0.9068416357040405, 'start': 0, 'end': 0, 'answer': ''}
The question or comment is about chapter:
After loading gpt2-xl I get this message prompting me to upgrade to Colab Pro:
Your session crashed after using all available RAM. If you are interested in access to high-RAM runtimes, you may want to check out [Colab Pro]
The problem arises in chapter:
These are misc minor bugs related to haystack >=1.0.0, since the book mentions the repo is updated. Edit: actually you mention 0.10.0, not 1.0.0, so most likely these are not issues after all.
- In pipeline you should use top_k, as topk is deprecated: pipe(question=question, context=context, top_k=3)
- ExtractiveQAPipeline is imported through pipelines instead of pipeline in newer haystack versions: from haystack.pipelines import ExtractiveQAPipeline
- The call
preds = pipe.run(query=query, top_k_retriever=3, top_k_reader=n_answers,
filters={"item_id": [item_id], "split":["train"]})
should be
preds = pipe.run(
query=query,
params={"Retriever": {"top_k": 3},
"Reader": {"top_k": n_answers},
"filters": {"item_id": [item_id], "split":["train"]}}
)
- This causes issues:
print(f"Answer {idx+1}: {preds['answers'][idx]['answer']}")
print(f"Review snippet: ...{preds['answers'][idx]['context']}...")
It should be:
print(f"Answer {idx+1}: {preds['answers'][idx].answer}")
print(f"Review snippet: ...{preds['answers'][idx].context}...")
- Label has different param expectations, such as answer being an Answer instance, question being renamed to query, and the document needing to be passed.
The question or comment is about chapter:
I would just like to know why we use just [CLS] to represent each tweet, and why we use the last hidden state: does it make sense as an embedding representation of the tweet?
When I run the command "conda env create -f environment.yml", it shows:
Solving environment: failed
ResolvePackageNotFound:
- libsndfile
Steps to reproduce the behavior:
The question or comment is about chapter:
In the section "Beam Search Decoding" of chapter 5, on page 132, the authors include the following function for calculating a sequence log-probability:
def sequence_logprob(model, labels, input_len=0):
with torch.no_grad():
output = model(labels)
log_probs = log_probs_from_logits(
output.logits[:, :-1, :], labels[:, 1:])
seq_log_prob = torch.sum(log_probs[:, input_len:])
return seq_log_prob.cpu().numpy()
Here, labels corresponds to output_greedy, which is calculated as:
max_length = 128
input_txt = """In a shocking finding, scientist discovered \
a herd of unicorns living in a remote, previously unexplored \
valley, in the Andes Mountains. Even more surprising to the \
researchers was the fact that the unicorns spoke perfect English.\n\n
"""
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
output_greedy = model.generate(input_ids, max_length=max_length,
do_sample=False)
print(tokenizer.decode(output_greedy.squeeze()))
While the alignment between logits and labels is clear (i.e., labels are shifted by 1), it is not clear to me why, when calculating the sequence log probability, we slice the log_probs tensor using input_len instead of input_len - 1 (see below):
seq_log_prob = torch.sum(log_probs[:, input_len:])
Let me walk you through the example in the book in detail:
- input_ids is a 47-token sequence
- output_greedy is a 128-token sequence (the original 47 input tokens plus 81 model-generated tokens; max_length was set to 128)
When doing a forward pass with the model, i.e., outputs = model(output_greedy) (output_greedy being what is then passed as labels to the sequence_logprob function), the output includes logits whose dimension 1 is 128 (our max_length). We know that the logits at index 0 actually refer to what would be the second token in our output sequence. Python 0-indexing is confusing here, but we can say that the logit at index 0 in outputs.logits corresponds to the logits of the second word in our sequence.
In other words, logit index >> (index + 2)th word in the sequence (where the sequence is 1-indexed).
Following the same reasoning, we know that the first truly model-generated token (i.e., a token not present in the initial prompt) is the 48th word in the sequence, i.e., the (48 - 2)th logit in outputs.logits. We know that the model-generated text starts with "The researchers, from the University of California" and we can verify it by running:
tokenizer.decode(torch.argmax(output.logits[0, 46:52], dim=-1))
In other words, the delta between logit indices and word positions in the output sequence is 2 because of: (1) the one-position shift between logits and labels (each logit predicts the next token), and (2) Python 0-indexing versus 1-indexed word positions.
As a consequence, the sequence_logprob
function should be changed as follows:
def sequence_logprob(model, labels, input_len=0):
with torch.no_grad():
output = model(labels)
log_probs = log_probs_from_logits(
output.logits[:, :-1, :], labels[:, 1:])
# CHANGE HERE
seq_log_prob = torch.sum(log_probs[:, input_len-1:])
return seq_log_prob.cpu().numpy()
Am I missing something?
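For readers following along, here is a small self-contained check of the index mapping (a toy sketch with random logits, not from the book; log_probs_from_logits is re-defined here to keep it runnable):
import torch
import torch.nn.functional as F

def log_probs_from_logits(logits, labels):
    logp = F.log_softmax(logits, dim=-1)
    return torch.gather(logp, 2, labels.unsqueeze(2)).squeeze(-1)

input_len = 3
labels = torch.tensor([[5, 7, 2, 9, 4]])   # 3 prompt tokens + 2 generated tokens
logits = torch.randn(1, 5, 11)             # batch of 1, seq len 5, vocab size 11
log_probs = log_probs_from_logits(logits[:, :-1, :], labels[:, 1:])
# log_probs[:, j] scores labels[:, j+1], so the first generated token
# labels[:, 3] is scored at index input_len - 1 == 2, not input_len
print(log_probs.shape)  # torch.Size([1, 4])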
The problem arises in chapter:
The NER predictions (in terms of tags) on the Jack Sparrow example are odd and don't match those from the book. They also change each time you re-run all the code above. The tokenization looks fine, so the issue seems to be coming from the model itself.
Bug encountered locally and in Google colab.
Steps to reproduce the behavior:
I was expecting to get the relevant tags found in the book for cell 115, i.e. [O I-LOC B-LOC B-LOC O I-LOC O O I-LOC B-LOC].
The problem arises in chapter:
from transformers import Trainer
trainer = Trainer(model=model, args=training_args,
compute_metrics=compute_metrics,
train_dataset=emotions_encoded["train"],
eval_dataset=emotions_encoded["validation"],
tokenizer=tokenizer)
trainer.train();
When running the code in a Jupyter notebook:
...
OSError: Looks like you do not have git-lfs installed, please install. You can install from [https://git-lfs.github.com/.](https://git-lfs.github.com/) Then run `git lfs install` (you only have to do this once).
I wonder if some other settings are necessary. Thank you.
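One possible remedy, assuming a Debian/Ubuntu-like environment such as Colab (package availability may vary):
# Install git-lfs at the OS level, then activate it once per machine
!sudo apt-get install git-lfs
!git lfs install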
The question or comment is about chapter:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
To recreate:
device = "cuda" if torch.cuda.is_available() else "cpu"
def chunks(list_of_elements, batch_size):
"""
Yield successive batch-sized chunks from list_of_elements.
"""
for i in range(0, len(list_of_elements), batch_size):
yield list_of_elements[i : i + batch_size]
def evaluate_summaries_pegasus(dataset, metric, model, tokenizer,
batch_size=16, device=device,
column_text="text",
column_summary="title"):
abstract_batches = list(chunks(dataset[column_text], batch_size))
target_batches = list(chunks(dataset[column_summary], batch_size))
for abstract_batch, target_batch in tqdm(
zip(abstract_batches, target_batches), total=len(abstract_batches)):
inputs = tokenizer(abstract_batch, max_length=1024, truncation=True,
padding="max_length", return_tensors="pt")
summaries = model.generate(input_ids=inputs["input_ids"].to(device),
attention_mask=inputs["attention_mask"].to(device),
length_penalty=0.8, num_beams=8, max_length=128)
decoded_summaries = [tokenizer.decode(s, skip_special_tokens=True,
clean_up_tokenization_spaces=True)
for s in summaries]
decoded_summaries = [d.replace("<n>", " ") for d in decoded_summaries]
metric.add_batch(predictions=decoded_summaries,
references=target_batch)
score = metric.compute()
return score
tokenizer = AutoTokenizer.from_pretrained("google/pegasus-xsum")
pegasus_xsum_model = AutoModelForSeq2SeqLM.from_pretrained("google/pegasus-xsum").to(device)
score_pegasus_xsum = evaluate_summaries_pegasus(test_sampled, rouge_metric,
pegasus_xsum_model, tokenizer, batch_size=4)
rouge_dict_pegasus_xsum = dict((rn, score_pegasus_xsum[rn].mid.fmeasure) for rn in rouge_names)
pegasus_xsum_sample = pd.DataFrame(rouge_dict_pegasus_xsum, index=["pegasus_xsum_sample"])
The problem arises in chapter:
huggingface-cli repo create --type dataset --organization transformersbook codeparrot-train
Instead, I got this error:
403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create (Request ID: hsKOgZ5w0izNyONWMl-wl) - You don't have the rights to create a dataset under this namespace {"error":"You don't have the rights to create a dataset under this namespace"}
Wondering if anyone else has encountered this issue.
Where can we download the PDF of the book? Thanks.
The problem arises in chapter:
The utility function tag_text references xlmr_tokenizer instead of tokenizer (the code does not break, as xlmr_tokenizer is a global variable in the notebook, but I believe it is not what the authors had in mind).
Permalink >>> here
Code should be:
def tag_text(text, tags, model, tokenizer):
# Get tokens with special characters
tokens = tokenizer(text).tokens()
### CHANGE HERE ###
# Encode the sequence into IDs
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
### CHANGE ENDS ###
# Get predictions as distribution over 7 possible classes
outputs = model(input_ids)[0]
# Take argmax to get most likely class per token
predictions = torch.argmax(outputs, dim=2)
# Convert to DataFrame
preds = [tags.names[p] for p in predictions[0].cpu().numpy()]
return pd.DataFrame([tokens, preds], index=["Tokens", "Tags"])
In the notebook context, behaviour would not change. However, if not corrected, tag_text would throw an error if no xlmr_tokenizer variable exists.
The problem arises in chapter:
TypeError: export() got an unexpected keyword argument 'use_external_data_format'
In the notebook from this repository, the same problem generated a warning, not a TypeError.
Steps to reproduce the behavior:
The problem arises in chapter:
After creating the conda env using environment.yml, the local GPU cannot be detected, as shown below.
No GPU was detected! This notebook can be very slow without a GPU 🐢
Using transformers v4.11.3
Using datasets v1.16.1
$ python
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
Steps to reproduce the behavior:
Expected behavior: torch.cuda.is_available() should return True.
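A quick check worth running first (plain PyTorch): a CPU-only build reports None for its CUDA version, which would point at environment.yml pinning a CPU wheel rather than at a driver problem.
import torch
print(torch.__version__)         # a "+cpu" suffix means a CPU-only build
print(torch.version.cuda)        # None on a CPU-only build
print(torch.cuda.is_available())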
The problem arises in chapter:
When running 02_classification.ipynb on Kaggle with a P100 GPU, I receive a RuntimeError: CUDA out of memory after running cell 58:
#hide_output
emotions_hidden = emotions_encoded.map(extract_hidden_states, batched=True)
Steps to reproduce the behavior:
Stack trace (partially):
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_34/3668832236.py in <module>
1 #hide_output
----> 2 emotions_hidden = emotions_encoded.map(extract_hidden_states, batched=True)
/opt/conda/lib/python3.7/site-packages/datasets/dataset_dict.py in map(self, function, with_indices, input_columns, batched, batch_size, remove_columns, keep_in_memory, load_from_cache_file, cache_file_names, writer_batch_size, features, disable_nullable, fn_kwargs, num_proc, desc)
502 desc=desc,
503 )
--> 504 for k, dataset in self.items()
505 }
506 )
...
...
...
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/sparse.py in forward(self, input)
158 return F.embedding(
159 input, self.weight, self.padding_idx, self.max_norm,
--> 160 self.norm_type, self.scale_grad_by_freq, self.sparse)
161
162 def extra_repr(self) -> str:
/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2041 # remove once script supports set_grad_enabled
2042 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2043 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
2044
2045
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.90 GiB total capacity; 512.05 MiB already allocated; 167.75 MiB free; 530.00 MiB reserved in total by PyTorch)
Complete stack trace:
Stacktrace_RuntimeError_ch2_NLP_Transformers.txt
GPU (nvidia-smi):
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Tue Mar 15 13:44:03 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04 Driver Version: 450.119.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:00:04.0 Off | 0 |
| N/A 41C P0 35W / 250W | 16113MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
As mentioned in the README.md, I would have expected the P100 with its 16 GB to have enough GPU memory to run this code without issues. I also tried to free up some cache with torch.cuda.empty_cache(), but it did not suffice.
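A hedged workaround: datasets' map pushes 1,000 rows through the model per batch by default, so shrinking batch_size bounds the memory used by each forward pass (names follow the chapter's notebook):
# Smaller batches through the GPU: slower, but with bounded memory
emotions_hidden = emotions_encoded.map(extract_hidden_states, batched=True,
                                       batch_size=64)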
The problem arises in chapter:
When fine-tuning the model in Google Colab it throws the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
Note that the notebook runs successfully in Kaggle.
Steps to reproduce the behavior:
RuntimeError Traceback (most recent call last)
<ipython-input-66-55916e0ed5b3> in <module>()
6 eval_dataset=emotions_encoded["validation"],
7 tokenizer=tokenizer)
----> 8 trainer.train();
11 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2181 # remove once script supports set_grad_enabled
2182 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2183 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
2184
2185
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
Expected behavior: the fine-tuning completes successfully.
The problem arises in chapter:
original_input_ids and masked_input_ids are not defined. In fact, they correspond respectively to inputs["input_ids"][0] and outputs["input_ids"][0].
Steps to reproduce the behavior:
pd.DataFrame({
"Original tokens": tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
"Masked tokens": tokenizer.convert_ids_to_tokens(outputs["input_ids"][0]),
"Original input_ids": inputs["input_ids"][0],
"Masked input_ids": outputs["input_ids"][0],
"Labels": outputs["labels"][0]}
).T
The problem arises in chapter:
This error appears when trying to initialize training_args:
AttributeError Traceback (most recent call last)
Input In [69], in <cell line: 6>()
4 logging_steps = len(emotions_encoded["train"]) // batch_size
5 model_name = f"{model_ckpt}-finetuned-emotion"
----> 6 training_args = TrainingArguments(output_dir=model_name,
7 num_train_epochs=2,
8 learning_rate=2e-5,
9 per_device_train_batch_size=batch_size,
10 per_device_eval_batch_size=batch_size,
11 weight_decay=0.01,
12 evaluation_strategy="epoch",
13 disable_tqdm=False,
14 logging_steps=logging_steps,
15 push_to_hub=False,
16 log_level="error")
File <string>:91, in __init__(self, output_dir, overwrite_output_dir, do_train, do_eval, do_predict, evaluation_strategy, prediction_loss_only, per_device_train_batch_size, per_device_eval_batch_size, per_gpu_train_batch_size, per_gpu_eval_batch_size, gradient_accumulation_steps, eval_accumulation_steps, eval_delay, learning_rate, weight_decay, adam_beta1, adam_beta2, adam_epsilon, max_grad_norm, num_train_epochs, max_steps, lr_scheduler_type, warmup_ratio, warmup_steps, log_level, log_level_replica, log_on_each_node, logging_dir, logging_strategy, logging_first_step, logging_steps, logging_nan_inf_filter, save_strategy, save_steps, save_total_limit, save_on_each_node, no_cuda, seed, data_seed, bf16, fp16, fp16_opt_level, half_precision_backend, bf16_full_eval, fp16_full_eval, tf32, local_rank, xpu_backend, tpu_num_cores, tpu_metrics_debug, debug, dataloader_drop_last, eval_steps, dataloader_num_workers, past_index, run_name, disable_tqdm, remove_unused_columns, label_names, load_best_model_at_end, metric_for_best_model, greater_is_better, ignore_data_skip, sharded_ddp, deepspeed, label_smoothing_factor, optim, adafactor, group_by_length, length_column_name, report_to, ddp_find_unused_parameters, ddp_bucket_cap_mb, dataloader_pin_memory, skip_memory_metrics, use_legacy_prediction_loop, push_to_hub, resume_from_checkpoint, hub_model_id, hub_strategy, hub_token, gradient_checkpointing, fp16_backend, push_to_hub_model_id, push_to_hub_organization, push_to_hub_token, mp_parameters)
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/training_args.py:865, in TrainingArguments.__post_init__(self)
857 warnings.warn(
858     "`--adafactor` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--optim adafactor` instead",
859     FutureWarning,
860 )
861 self.optim = OptimizerNames.ADAFACTOR
863 if (
864 is_torch_available()
--> 865 and (self.device.type != "cuda")
866 and not (self.device.type == "xla" and "GPU_NUM_DEVICES" in os.environ)
867 and (self.fp16 or self.fp16_full_eval or self.bf16 or self.bf16_full_eval)
868 ):
869 raise ValueError(
870     "Mixed precision training with AMP or APEX (`--fp16` or `--bf16`) and half precision evaluation (`--fp16_full_eval` or `--bf16_full_eval`) can only be used on CUDA devices."
871 )
873 if is_torch_available() and self.tf32 is not None:
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/utils/import_utils.py:781, in torch_required.<locals>.wrapper(*args, **kwargs)
778 @wraps(func)
779 def wrapper(*args, **kwargs):
780 if is_torch_available():
--> 781 return func(*args, **kwargs)
782 else:
783 raise ImportError(f"Method `{func.__name__}` requires PyTorch.")
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/training_args.py:1099, in TrainingArguments.device(self)
1093 @property
1094 @torch_required
1095 def device(self) -> "torch.device":
1096 """
1097 The device used by this process.
1098 """
-> 1099 return self._setup_devices
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/utils/generic.py:48, in cached_property.__get__(self, obj, objtype)
46 cached = getattr(obj, attr, None)
47 if cached is None:
---> 48 cached = self.fget(obj)
49 setattr(obj, attr, cached)
50 return cached
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/utils/import_utils.py:781, in torch_required.<locals>.wrapper(*args, **kwargs)
778 @wraps(func)
779 def wrapper(*args, **kwargs):
780 if is_torch_available():
--> 781 return func(*args, **kwargs)
782 else:
783 raise ImportError(f"Method `{func.__name__}` requires PyTorch.")
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/training_args.py:1024, in TrainingArguments._setup_devices(self)
1020 @cached_property
1021 @torch_required
1022 def _setup_devices(self) -> "torch.device":
1023 logger.info("PyTorch: setting up devices")
-> 1024 if torch.distributed.is_initialized() and self.local_rank == -1:
1025 logger.warning(
1026 "torch.distributed process group is initialized, but local_rank == -1. "
1027 "In order to use Torch DDP, launch your script with `python -m torch.distributed.launch"
1028 )
1029 if self.no_cuda:
AttributeError: module 'torch.distributed' has no attribute 'is_initialized'
Steps to reproduce the behavior:
Note: the notebook is running on a Mac M1, on CPU.
from transformers import Trainer, TrainingArguments
batch_size = 64
logging_steps = len(emotions_encoded["train"]) // batch_size
model_name = f"{model_ckpt}-finetuned-emotion"
training_args = TrainingArguments(output_dir=model_name,
num_train_epochs=2,
learning_rate=2e-5,
per_device_train_batch_size=batch_size,
per_device_eval_batch_size=batch_size,
weight_decay=0.01,
evaluation_strategy="epoch",
disable_tqdm=False,
logging_steps=logging_steps,
push_to_hub=False,
log_level="error")
The problem arises in chapter:
Running the chapter 8 code on Colab, I got the following error when training DistilBERT:
distilbert_trainer.train()
Error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0
Steps to reproduce the behavior:
Expected behavior: no error.
Modifying the initial command to install_requirements(is_chapter2=True) solved the issue, so it seems to be related to the transformers library version (4.11 vs 4.13).
The problem arises in chapter:
Chapter 7 doesn't run on SM Notebook instance
Steps to reproduce the behavior:
07_question-answering.ipynb
The notebook runs successfully without errors.
Several minor fixes are required to get the notebook running on SM. I have fixed all of them, see this fork. However, I'm hesitant to create a PR for it as I'm not sure whether this will create issues for other environments (e.g. Colab, etc).
How would you like to incorporate these fixes into your repo, if at all?
The question or comment is about chapter:
Has anyone else noticed that nlpaug's naw.ContextualWordEmbsAug(model_path="distilbert-base-uncased", device="cuda", action="substitute") is extremely slow, even on a "cuda" device?
I have a decent computer (Xeon E5, 64 GB RAM, RTX Titan 24 GB), so I cannot understand why this module is so slow.
Thanks
Best regards
Jerome
The problem arises in chapter:
Located in 04_multilingual-ner.ipynb under the "Loading a Custom Model" subheading. Notebook code below:
preds = [tags.names[p] for p in predictions[0].cpu().numpy()]
pd.DataFrame([xlmr_tokens, preds], index=["Tokens", "Tags"])
My output:
Tags B-ORG I-ORG I-ORG I-ORG B-ORG I-ORG B-ORG B-ORG I-ORG I-PER
Steps to reproduce the behavior:
Expected: Tags O I-LOC B-LOC B-LOC O I-LOC O O I-LOC B-LOC
Seems to be OK on Kaggle.
The problem arises in chapter:
As flagged by Julian Risch from deepset, there's a small bug when creating the labels for the document store, because multiple labels (i.e. duplicates) are being initialised with the same ID:
from haystack import Label
labels = []
for i, row in dfs["test"].iterrows():
# Metadata used for filtering in the Retriever
meta = {"item_id": row["title"], "question_id": row["id"]}
# Populate labels for questions with answers
if len(row["answers.text"]):
for answer in row["answers.text"]:
label = Label(
question=row["question"], answer=answer, id=i, origin=row["id"],
meta=meta, is_correct_answer=True, is_correct_document=True,
no_answer=False)
labels.append(label)
# Populate labels for questions without answers
else:
label = Label(
question=row["question"], answer="", id=i, origin=row["id"],
meta=meta, is_correct_answer=True, is_correct_document=True,
no_answer=True)
labels.append(label)
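A possible fix, sketched only (answer_idx is our hypothetical enumerate index, not in the original code): make each label's ID unique instead of reusing the row index for every answer:
for answer_idx, answer in enumerate(row["answers.text"]):
    label = Label(
        question=row["question"], answer=answer,
        id=f"{i}-{answer_idx}",  # unique per (row, answer) pair
        origin=row["id"], meta=meta, is_correct_answer=True,
        is_correct_document=True, no_answer=False)
    labels.append(label)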
The problem arises in chapter:
Running the following code:
import tensorflow as tf
tf_model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=tf.metrics.SparseCategoricalAccuracy())
tf_model.fit(
x=tf_train_dataset,
y=None,
validation_data=tf_eval_dataset,
batch_size=batch_size,
epochs=1
)
I get this error
AttributeError: module 'keras.engine.data_adapter' has no attribute 'expand_1d'
and searching online suggests it should have been fixed in transformers:
huggingface/transformers#20750
Steps to reproduce the behavior:
full stack track
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[89], line 8
1 import tensorflow as tf
3 tf_model.compile(
4 optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
5 loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
6 metrics=tf.metrics.SparseCategoricalAccuracy())
----> 8 tf_model.fit(
9 x=tf_train_dataset,
10 y=None,
11 validation_data=tf_eval_dataset,
12 batch_size=batch_size,
13 epochs=1
14 )
File ~/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
File /var/folders/hd/csbqkrzd3s95b5c5hfv4cp8c0000gn/T/__autograph_generated_filerzz7novs.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
13 try:
14 do_return = True
---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
16 except:
17 do_return = False
File ~/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/transformers/modeling_tf_utils.py:1476, in TFPreTrainedModel.train_step(self, data)
1474 output_to_label = {val: key for key, val in label_to_output.items()}
1475 if not self._using_dummy_loss:
-> 1476 data = data_adapter.expand_1d(data)
1477 x, y, sample_weight = data_adapter.unpack_x_y_sample_weight(data)
1478 # If the inputs are mutable dictionaries, make a shallow copy of them because we will modify
1479 # them during input/label pre-processing. This avoids surprising the user by wrecking their data.
1480 # In addition, modifying mutable Python inputs makes XLA compilation impossible.
AttributeError: in user code:
File "/Users/samueljoseph/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function *
return step_function(self, iterator)
File "/Users/samueljoseph/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/Users/samueljoseph/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step **
outputs = model.train_step(data)
File "/Users/samueljoseph/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1476, in train_step
data = data_adapter.expand_1d(data)
AttributeError: module 'keras.engine.data_adapter' has no attribute 'expand_1d'
Expected behavior: the code example runs without errors.
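Since the linked PR has been merged upstream, one hedged remedy is simply upgrading transformers so its train_step no longer calls the removed Keras helper:
!pip install -U transformers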
The problem arises in chapter:
Steps to reproduce the behavior:
Hello,
There is a missing image in the Images folder for "Transformer Anatomy".
The problem arises in chapter:
The image given in the file path below does not exist in the Images folder.
from IPython.display import Image
Image(filename="images/chapter03_bertviz-neuron-light.png")
Please could you put this image into the Images folder?
Thanks.
The problem arises in chapter:
Cannot push datasets.
Steps to reproduce the behavior:
$ git push
batch response: Authorization error. B | 0 B/s
error: failed to push some refs to 'https://huggingface.co/datasets/Shuchen/codeparrot-valid'
Authorization error happens here, but I have logged in successfully.
$ huggingface-cli login
(Hugging Face ASCII art banner)
To login, `huggingface_hub` now requires a token generated from https://huggingface.co/settings/token.
(Deprecated, will be removed in v0.3.0) To login with username and password instead, interrupt with Ctrl+C.
Token:
Login successful
Your token has been saved to /home/shuchen/.huggingface/token
Anybody can help?
v11 of cudatoolkit is not available for osx-64. Possibly removed by Nvidia?
https://anaconda.org/anaconda/cudatoolkit
https://developer.nvidia.com/nvidia-cuda-toolkit-11_6_0-developer-tools-mac-hosts
I switched to v9 to get the environment to build, but the GPU is not detected and the kernel crashes when running the first command of the first introduction notebook.
#hide
from utils import *
setup_chapter()
Steps to reproduce the behavior:
>>> torch.cuda.is_available()
False
>>> exit()
Expected behavior: the environment file works as is, the GPU is available, and the kernel doesn't crash on the first command.
The question or comment is about chapter:
On p. 272, the authors say "we'll focus on using synonym replacement...", but on p. 273 the example uses naw.ContextualWordEmbsAug() instead of naw.SynonymAug(). Can the authors explain the difference?
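Not an authoritative answer, but for comparison here is a minimal sketch of the two nlpaug augmenters (the WordNet source is SynonymAug's usual default): SynonymAug swaps words for dictionary synonyms, while ContextualWordEmbsAug asks a masked language model for in-context replacements, which is presumably why the book's example differs from the prose.
import nlpaug.augmenter.word as naw

text = "The quick brown fox jumps over the lazy dog"
# Dictionary-based synonym replacement (WordNet)
syn_aug = naw.SynonymAug(aug_src="wordnet")
# Masked-language-model-based substitution, as in the book's example
ctx_aug = naw.ContextualWordEmbsAug(model_path="distilbert-base-uncased",
                                    action="substitute")
print(syn_aug.augment(text))
print(ctx_aug.augment(text))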
The problem arises in chapter:
Steps to reproduce the behavior:
Run the notebook on a CUDA/GPU enabled device- A100 card
trainer = Trainer(model=model,
args=training_args,
tokenizer=tokenizer, data_collator=seq2seq_data_collator,
train_dataset=dataset_samsum_pt["train"],
eval_dataset=dataset_samsum_pt["validation"])
Trainer() fails with the following error:
Traceback (most recent call last):
File "/home/shabnam/anaconda3/envs/rapids-22.08/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3378, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<string>", line 1, in <module>
trainer = Trainer(model=model, args=training_args,
File "/home/shabnam/anaconda3/envs/rapids-22.08/lib/python3.9/site-packages/transformers/trainer.py", line 450, in __init__
self._move_model_to_device(model, args.device)
File "/home/shabnam/anaconda3/envs/rapids-22.08/lib/python3.9/site-packages/transformers/trainer.py", line 722, in _move_model_to_device
model = model.to(device)
AttributeError: 'str' object has no attribute 'to'
Expected behavior:
training ...
The problem arises in chapter:
When loading the emotion dataset by calling load_dataset("emotion"), an exception is thrown. It seems the files were removed from Dropbox.
(screenshot omitted)
Expected behavior: load_dataset("emotion") runs successfully.
The problem arises in chapter:
RuntimeError: CUDA out of memory.
Steps to reproduce the behavior:
So do you have a solution to deal with the CUDA OOM problem in Jupyter notebook?
The TensorFlow version of the code is sometimes incompatible. If someone has already made the conversion, I would be grateful if they could share the link.
Thanks
Hi,
I am trying to run chapter 7 to learn about Haystack for QA. I am using a Jupyter notebook connected to my GCP VM (Debian GNU/Linux 9, Tesla V100). I installed the Debian version but am facing the following error: ConnectionError: Initial connection to Elasticsearch failed. Make sure you run an Elasticsearch instance at [{'host': 'localhost', 'port': 9200}] and that it has finished the initial ramp up (can take > 30s).
!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.2-linux-x86_64.tar.gz
!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.2-linux-x86_64.tar.gz.sha512
!shasum -a 512 -c elasticsearch-8.1.2-linux-x86_64.tar.gz.sha512
!tar -xzf elasticsearch-8.1.2-linux-x86_64.tar.gz
!cd elasticsearch-8.1.2/
!pip install pymilvus
import pymilvus
import os
from subprocess import Popen, PIPE, STDOUT
!chown -R daemon:daemon elasticsearch-8.1.2
es_server = Popen(args=['elasticsearch-8.1.2/bin/elasticsearch'])
!sleep 30
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
#document_store = ElasticsearchDocumentStore(host='localhost', port= 9201, username='', password='')
document_store = ElasticsearchDocumentStore(return_embedding=True)
I would appreciate your support on this.
Shabnam
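A hedged guess at the cause: the book targets the Elasticsearch 7.x series, while 8.x enables TLS and authentication by default, so a plain localhost:9200 connection is refused. One sketch of a local-testing workaround (the -E flag and the xpack.security.enabled setting are standard Elasticsearch options):
from subprocess import Popen, PIPE, STDOUT

# Disable Elasticsearch 8.x security for a throwaway local instance
# (fine for local testing only; never do this in production)
es_server = Popen(args=["elasticsearch-8.1.2/bin/elasticsearch",
                        "-E", "xpack.security.enabled=false"],
                  stdout=PIPE, stderr=STDOUT)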
The question or comment is about chapter:
The book shows a really interesting example of getting the loss returned along with the predicted class probability, in the "Error Analysis" section of chapter 2:
Before moving on, we should investigate our model’s predictions a little bit further. A simple yet powerful technique is to sort the validation samples by the model loss. When we pass the label during the forward pass, the loss is automatically calculated and returned. Here’s a function that returns the loss along with the predicted label:
from torch.nn.functional import cross_entropy
def forward_pass_with_label(batch):
# Place all input tensors on the same device as the model
inputs = {k:v.to(device) for k,v in batch.items()
if k in tokenizer.model_input_names}
with torch.no_grad():
output = model(**inputs)
pred_label = torch.argmax(output.logits, axis=-1)
loss = cross_entropy(output.logits, batch["label"].to(device),
reduction="none")
# Place outputs on CPU for compatibility with other dataset columns
return {"loss": loss.cpu().numpy(),
"predicted_label": pred_label.cpu().numpy()}
Does anyone have any idea how to do similar for a tensorflow based approach?
I've been reading the documentation for the TF model predict function (https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict) but can't immediately see anything that would correspond, although to be honest I'm not quite following what the example code is doing. Is it taking the validation dataset, re-predicting the output for each item, and then calculating the loss as the cross entropy of the output logits against the correct label? So to do that with TF I'd need to take the validation set and do something similar?
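A minimal sketch of one way to do this in TF/Keras, assuming tf_eval_dataset yields (features, labels) batches and tf_model returns logits (names follow the chapter's notebook; this is a sketch, not the book's method):
import numpy as np
import tensorflow as tf

# Per-sample loss, mirroring reduction="none" in the PyTorch version
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

losses, pred_labels = [], []
for features, labels in tf_eval_dataset:
    logits = tf_model(features, training=False).logits
    losses.append(loss_fn(labels, logits).numpy())
    pred_labels.append(tf.argmax(logits, axis=-1).numpy())

losses = np.concatenate(losses)
pred_labels = np.concatenate(pred_labels)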
The problem arises in chapter:
As per huggingface/datasets#3830, trying to load the dataset fails. The fix is still not in the latest release, so the notebook will likely need an update.
Steps to reproduce the behavior:
Just run
load_dataset("cnn_dailymail", '3.0.0')
For the exact error message, see the linked issue.
The problem arises in chapter:
Error when importing EvalDocuments
ImportError: cannot import name 'EvalDocuments' from 'haystack.modeling.evaluation.eval' (unknown location)
Steps to reproduce the behavior:
pip install farm-haystack
from haystack.eval import EvalDocuments
Expected behavior: EvalDocuments imports without error.