nlp-with-transformers / notebooks
Jupyter notebooks for the Natural Language Processing with Transformers book
Home Page: https://transformersbook.com/
License: Apache License 2.0
The problem arises in chapter:
The Introduction notebook is incomplete
Steps to reproduce the behavior:
The question or comment is about chapter:
FARMReader was initialized with model_ckpt = "deepset/minilm-uncased-squad2":
reader = FARMReader(model_name_or_path=model_ckpt, progress_bar=False,
max_seq_len=max_seq_length, doc_stride=doc_stride, return_no_answer=True)
With the evaluate_reader function defined on page 198 of the book, I ran reader_eval["Fine-tune on SQuAD"] = evaluate_reader(reader).
The EM and F1 scores I got for "Fine-tune on SQuAD" are both zero. This also affects the domain adaptation results for "Fine-tune on SQuAD+SubjQA". I followed the notebook up to this step and am wondering what I've overlooked in evaluating the reader.
Thank you!
Running the notebook failed with an error:
Steps to reproduce the behavior:
Run training.
Fix: do not pass the tokenizer to Trainer():
from transformers import Trainer
trainer = Trainer(model=model, args=training_args,
compute_metrics=compute_metrics,
train_dataset=emotions_encoded["train"],
eval_dataset=emotions_encoded["validation"],
#tokenizer=tokenizer
)
The problem arises in chapter:
When I try to execute cell 26 in 08_model-compression.ipynb on my local machine, I get the following error.
Steps to reproduce the behavior:
RuntimeError Traceback (most recent call last)
Input In [28], in <cell line: 6>()
1 distilbert_trainer = DistillationTrainer(model_init=student_init,
2 teacher_model=teacher_model, args=student_training_args,
3 train_dataset=clinc_enc['train'], eval_dataset=clinc_enc['validation'],
4 compute_metrics=compute_metrics, tokenizer=student_tokenizer)
----> 6 distilbert_trainer.train()
File ~\anaconda3\lib\site-packages\transformers\trainer.py:1316, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1314 tr_loss_step = self.training_step(model, inputs)
1315 else:
-> 1316 tr_loss_step = self.training_step(model, inputs)
1318 if (
1319 args.logging_nan_inf_filter
1320 and not is_torch_tpu_available()
1321 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
1322 ):
1323 # if loss is nan or inf simply add the average of previous logged losses
1324 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
File ~\anaconda3\lib\site-packages\transformers\trainer.py:1849, in Trainer.training_step(self, model, inputs)
1847 loss = self.compute_loss(model, inputs)
1848 else:
-> 1849 loss = self.compute_loss(model, inputs)
1851 if self.args.n_gpu > 1:
1852 loss = loss.mean() # mean() to average on multi-gpu parallel training
Input In [17], in DistillationTrainer.compute_loss(self, model, inputs, return_outputs)
10 def compute_loss(self, model, inputs, return_outputs=False):
---> 11 outputs_stu = model(**inputs)
12 # Extract cross-entropy loss and logits from student
13 loss_ce = outputs_stu.loss
File ~\anaconda3\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:729, in DistilBertForSequenceClassification.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
721 r"""
722 labels (:obj:torch.LongTensor
of shape :obj:(batch_size,)
, optional
):
723 Labels for computing the sequence classification/regression loss. Indices should be in :obj:[0, ..., 724 config.num_labels - 1]
. If :obj:config.num_labels == 1
a regression loss is computed (Mean-Square loss),
725 If :obj:config.num_labels > 1
a classification loss is computed (Cross-Entropy).
726 """
727 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
--> 729 distilbert_output = self.distilbert(
730 input_ids=input_ids,
731 attention_mask=attention_mask,
732 head_mask=head_mask,
733 inputs_embeds=inputs_embeds,
734 output_attentions=output_attentions,
735 output_hidden_states=output_hidden_states,
736 return_dict=return_dict,
737 )
738 hidden_state = distilbert_output[0] # (bs, seq_len, dim)
739 pooled_output = hidden_state[:, 0] # (bs, dim)
File ~\anaconda3\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:550, in DistilBertModel.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
547 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
549 if inputs_embeds is None:
--> 550 inputs_embeds = self.embeddings(input_ids) # (bs, seq_length, dim)
551 return self.transformer(
552 x=inputs_embeds,
553 attn_mask=attention_mask,
(...)
557 return_dict=return_dict,
558 )
File ~\anaconda3\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:130, in Embeddings.forward(self, input_ids)
127 position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device) # (max_seq_length)
128 position_ids = position_ids.unsqueeze(0).expand_as(input_ids) # (bs, max_seq_length)
--> 130 word_embeddings = self.word_embeddings(input_ids) # (bs, max_seq_length, dim)
131 position_embeddings = self.position_embeddings(position_ids) # (bs, max_seq_length, dim)
133 embeddings = word_embeddings + position_embeddings # (bs, max_seq_length, dim)
File ~\anaconda3\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\lib\site-packages\torch\nn\modules\sparse.py:158, in Embedding.forward(self, input)
157 def forward(self, input: Tensor) -> Tensor:
--> 158 return F.embedding(
159 input, self.weight, self.padding_idx, self.max_norm,
160 self.norm_type, self.scale_grad_by_freq, self.sparse)
File ~\anaconda3\lib\site-packages\torch\nn\functional.py:2183, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2177 # Note [embedding_renorm set_grad_enabled]
2178 # XXX: equivalent to
2179 # with torch.no_grad():
2180 # torch.embedding_renorm_
2181 # remove once script supports set_grad_enabled
2182 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 2183 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
Even when I set the device to "cpu" at the start of the notebook, I still get this issue.
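Not a definitive fix, but a hedged diagnostic sketch (using the variable names from the issue's own code) to confirm where each side lives before digging into library versions:
import torch

# Confirm where the student's weights ended up (expect cuda:0 on a GPU box)
print(next(distilbert_trainer.model.parameters()).device)
# The Trainer only moves torch tensors to the GPU; plain Python lists here
# would mean the dataset is not in torch format
batch = clinc_enc["train"][:2]
print({k: type(v).__name__ for k, v in batch.items()})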
Describe the bug
There is a reference to a function display_df(df.T, header=None), but the display_df function is not defined in the notebook.
It needs to perform something like this (or a better-written version of this):
def display_df(df, header=True):
    if header:
        return df
    # Work on a copy so the caller's DataFrame is not mutated
    df = df.copy()
    df.columns = ["" for _ in df.columns]
    return df
The problem arises in chapter:
The function tag_text has a bug where a variable is defined as input_ids and then used as inputs.
Steps to reproduce the behavior:
In the last cell before "Error Analysis", the code
text_de = "Jeff Dean ist ein Informatiker bei Google in Kalifornien"
tag_text(text_de, tags, trainer.model, xlmr_tokenizer)
produces this error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipykernel_33/24045397.py in <module>
1 # hide_output
2 text_de = "Jeff Dean ist ein Informatiker bei Google in Kalifornien"
----> 3 tag_text(text_de, tags, trainer.model, xlmr_tokenizer)
/tmp/ipykernel_33/469722974.py in tag_text(text, tags, model, tokenizer)
5 input_ids = xlmr_tokenizer(text, return_tensors="pt").input_ids.to(device)
6 # Get predictions as distribution over 7 possible classes
----> 7 outputs = model(inputs)[0]
8 # Take argmax to get most likely class per token
9 predictions = torch.argmax(outputs, dim=2)
NameError: name 'inputs' is not defined
Expect the notebook to run without error.
Replace inputs with input_ids in the definition of tag_text.
Current:
def tag_text(text, tags, model, tokenizer):
# Get tokens with special characters
tokens = tokenizer(text).tokens()
# Encode the sequence into IDs
input_ids = xlmr_tokenizer(text, return_tensors="pt").input_ids.to(device)
# Get predictions as distribution over 7 possible classes
outputs = model(inputs)[0]
# Take argmax to get most likely class per token
predictions = torch.argmax(outputs, dim=2)
# Convert to DataFrame
preds = [tags.names[p] for p in predictions[0].cpu().numpy()]
return pd.DataFrame([tokens, preds], index=["Tokens", "Tags"])
Proposed:
def tag_text(text, tags, model, tokenizer):
# Get tokens with special characters
tokens = tokenizer(text).tokens()
# Encode the sequence into IDs
input_ids = xlmr_tokenizer(text, return_tensors="pt").input_ids.to(device)
# Get predictions as distribution over 7 possible classes
outputs = model(input_ids)[0] # <-- Change here
# Take argmax to get most likely class per token
predictions = torch.argmax(outputs, dim=2)
# Convert to DataFrame
preds = [tags.names[p] for p in predictions[0].cpu().numpy()]
return pd.DataFrame([tokens, preds], index=["Tokens", "Tags"])
The question or comment is about chapter:
Hi,
First, I would like to say thanks for writing this amazing book. I have a question about the attention mechanism in Transformers (referring to page 61). I am trying to compare what the book calls self-attention in Transformers with the self-attention I previously knew from this paper: https://aclanthology.org/N16-1174.pdf, and with the local and global attention from https://arxiv.org/pdf/1508.04025.pdf.
What those papers used was a HAN model with self, local, or global attention on top of RNN, GRU, LSTM, or CNN layers. Since Transformers are a new architecture, I am wondering whether the mathematics behind their attention is the same as in these two papers.
Please forgive me if the question seems very basic.
Regards,
Shabnam
Hi, how can we host the notebooks like fastbook?
The question or comment is about chapter:
In the declaration of the method extract_hidden_states, in section "Transformers as Feature Extractors", subsection "Extracting the last hidden states", the inclusion of the line if k in tokenizer.model_input_names is not explained in the text. This unexplained line is also a deviation from the previous code demonstrating how to extract the hidden state for a single line of text, so it tripped me up, personally.
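For what it's worth, that line keeps only the columns the model's forward pass actually accepts. A quick way to see what it filters on (standard transformers API; the checkpoint is the chapter's):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
print(tokenizer.model_input_names)  # ['input_ids', 'attention_mask']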
The question or comment is about chapter:
I got the RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
Since the model is already on the GPU, I assume the emotion Dataset is still on the CPU, which creates the error. I'm confused about where I should add ".to(device)" to get the Dataset onto the GPU.
RuntimeError Traceback (most recent call last)
Input In [63], in <cell line: 8>()
1 from transformers import Trainer
3 trainer = Trainer(model=model, args=training_args,
4 compute_metrics=compute_metrics,
5 train_dataset=emotions_encoded["train"],
6 eval_dataset=emotions_encoded["validation"],
7 tokenizer=tokenizer)
----> 8 trainer.train()
File ~\anaconda3\envs\book\lib\site-packages\transformers\trainer.py:1316, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1314 tr_loss_step = self.training_step(model, inputs)
1315 else:
-> 1316 tr_loss_step = self.training_step(model, inputs)
1318 if (
1319 args.logging_nan_inf_filter
1320 and not is_torch_tpu_available()
1321 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
1322 ):
1323 # if loss is nan or inf simply add the average of previous logged losses
1324 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
File ~\anaconda3\envs\book\lib\site-packages\transformers\trainer.py:1849, in Trainer.training_step(self, model, inputs)
1847 loss = self.compute_loss(model, inputs)
1848 else:
-> 1849 loss = self.compute_loss(model, inputs)
1851 if self.args.n_gpu > 1:
1852 loss = loss.mean() # mean() to average on multi-gpu parallel training
File ~\anaconda3\envs\book\lib\site-packages\transformers\trainer.py:1881, in Trainer.compute_loss(self, model, inputs, return_outputs)
1879 else:
1880 labels = None
-> 1881 outputs = model(**inputs)
1882 # Save past state if it exists
1883 # TODO: this needs to be fixed and made cleaner later.
1884 if self.args.past_index >= 0:
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\envs\book\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:729, in DistilBertForSequenceClassification.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
721 r"""
722 labels (:obj:torch.LongTensor
of shape :obj:(batch_size,)
, optional
):
723 Labels for computing the sequence classification/regression loss. Indices should be in :obj:[0, ..., 724 config.num_labels - 1]
. If :obj:config.num_labels == 1
a regression loss is computed (Mean-Square loss),
725 If :obj:config.num_labels > 1
a classification loss is computed (Cross-Entropy).
726 """
727 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
--> 729 distilbert_output = self.distilbert(
730 input_ids=input_ids,
731 attention_mask=attention_mask,
732 head_mask=head_mask,
733 inputs_embeds=inputs_embeds,
734 output_attentions=output_attentions,
735 output_hidden_states=output_hidden_states,
736 return_dict=return_dict,
737 )
738 hidden_state = distilbert_output[0] # (bs, seq_len, dim)
739 pooled_output = hidden_state[:, 0] # (bs, dim)
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\envs\book\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:550, in DistilBertModel.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
547 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
549 if inputs_embeds is None:
--> 550 inputs_embeds = self.embeddings(input_ids) # (bs, seq_length, dim)
551 return self.transformer(
552 x=inputs_embeds,
553 attn_mask=attention_mask,
(...)
557 return_dict=return_dict,
558 )
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\envs\book\lib\site-packages\transformers\models\distilbert\modeling_distilbert.py:130, in Embeddings.forward(self, input_ids)
127 position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device) # (max_seq_length)
128 position_ids = position_ids.unsqueeze(0).expand_as(input_ids) # (bs, max_seq_length)
--> 130 word_embeddings = self.word_embeddings(input_ids) # (bs, max_seq_length, dim)
131 position_embeddings = self.position_embeddings(position_ids) # (bs, max_seq_length, dim)
133 embeddings = word_embeddings + position_embeddings # (bs, max_seq_length, dim)
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\modules\module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\modules\sparse.py:158, in Embedding.forward(self, input)
157 def forward(self, input: Tensor) -> Tensor:
--> 158 return F.embedding(
159 input, self.weight, self.padding_idx, self.max_norm,
160 self.norm_type, self.scale_grad_by_freq, self.sparse)
File ~\anaconda3\envs\book\lib\site-packages\torch\nn\functional.py:2183, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2177 # Note [embedding_renorm set_grad_enabled]
2178 # XXX: equivalent to
2179 # with torch.no_grad():
2180 # torch.embedding_renorm_
2181 # remove once script supports set_grad_enabled
2182 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 2183 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
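One thing worth checking, sketched with the chapter's variable names: the Trainer only moves tensor columns onto the GPU, so the encoded dataset should be in torch format:
# Make sure the encoded dataset yields torch tensors, not Python lists
emotions_encoded.set_format("torch",
                            columns=["input_ids", "attention_mask", "label"])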
The problem arises in chapter:
I found two issues related to the code that produces the confusion matrix:
- In the plot_confusion_matrix function definition the input order is (y_preds, y_true, labels). However, when the function is called the input order is (df_tokens["labels"], df_tokens["predicted_label"], tags.names). In other words, predicted and ground-truth labels are passed in the wrong order.
- The labels are passed as tag-name strings (['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']) rather than IDs from 0 to 6. The confusion_matrix function re-orders a list of strings alphabetically (['B-LOC', 'B-ORG', 'B-PER', 'I-LOC', 'I-ORG', 'I-PER', 'O']), whereas when the function is called the tag names follow the order defined by the IDs.
It's possible to verify the issue by checking individual numbers in df_tokens:
df_one_label = df_tokens[df_tokens['labels'] == 'I-LOC']
(df_one_label['labels'] == df_one_label['predicted_label']).sum() / len(df_one_label)
With the current code the confusion matrix shows that I-LOC is predicted properly 99% of the time. However, the underlying numbers give an accuracy of around 85%; the label actually predicted with 99% accuracy is O.
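A hedged sketch of the fix: swap the call's first two arguments to match the declared (y_preds, y_true) order, and pin the label order explicitly via scikit-learn's labels parameter so strings are not re-sorted alphabetically:
from sklearn.metrics import confusion_matrix

# Call matching the declared order (y_preds, y_true, labels):
# plot_confusion_matrix(df_tokens["predicted_label"], df_tokens["labels"], tags.names)
# Pinning the label order inside the function would look like this:
cm = confusion_matrix(df_tokens["labels"], df_tokens["predicted_label"],
                      labels=tags.names, normalize="true")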
The problem arises in chapter:
The to_tf_dataset method throws an error: TypeError: to_tf_dataset() missing 1 required positional argument: 'collate_fn'. This could be a version issue.
The problem arises in chapter:
The datasets module cannot load the cnn_dailymail dataset correctly due to datasets bug #3787. The bug should have been fixed in the datasets v2.0.0 release, but the notebook environment uses v1.16.1. After upgrading the datasets library to v2.0.0, the bug persists.
Steps to reproduce the behavior:
The dataset should load flawlessly, as bug #3787 is reported closed and the problem solved.
The problem arises in chapter:
The pipeline can no longer handle the absence of a correct answer, which is permitted when the parameter handle_impossible_answer is set to True.
Steps to reproduce the behavior:
In transformers\pipelines\question_answering.py, line 409: min_null_score = min(min_null_score, (start_[0] * end_[0]).item()) causes
ValueError: can only convert an array of size 1 to a Python scalar
Expected: it should return {'score': 0.9068416357040405, 'start': 0, 'end': 0, 'answer': ''}
The question or comment is about chapter:
After loading gpt2-xl I get this message prompting me to upgrade to Colab Pro:
Your session crashed after using all available RAM. If you are interested in access to high-RAM runtimes, you may want to check out [Colab Pro]
The problem arises in chapter:
These are misc minor bugs related to haystack >=1.0.0, since the book mentions the repo is updated. Edit: actually you mention 0.10.0, not 1.0.0, so most likely these are not issues after all.
- In pipeline you should use top_k, as topk is deprecated: pipe(question=question, context=context, top_k=3)
- ExtractiveQAPipeline is imported through pipelines instead of pipeline in newer haystack versions: from haystack.pipelines import ExtractiveQAPipeline
- The call
preds = pipe.run(query=query, top_k_retriever=3, top_k_reader=n_answers,
filters={"item_id": [item_id], "split":["train"]})
should be
preds = pipe.run(
query=query,
params={"Retriever": {"top_k": 3},
"Reader": {"top_k": n_answers},
"filters": {"item_id": [item_id], "split":["train"]}}
)
- This causes issues:
print(f"Answer {idx+1}: {preds['answers'][idx]['answer']}")
print(f"Review snippet: ...{preds['answers'][idx]['context']}...")
It should be:
print(f"Answer {idx+1}: {preds['answers'][idx].answer}")
print(f"Review snippet: ...{preds['answers'][idx].context}...")
- Label has different param expectations, such as answer being an Answer instance, question being renamed to query, and the document needing to be passed.
The question or comment is about chapter:
I would just like to know why we use just [CLS] to represent each tweet, and why we use the last hidden state: does it make sense as an embedding representation of the tweet?
When I run the command "conda env create -f environment.yml", it shows:
Solving environment: failed
ResolvePackageNotFound:
- libsndfile
Steps to reproduce the behavior:
The question or comment is about chapter:
In the section "Beam Search Decoding" of chapter 5, on page 132, the authors include the following function for calculating a sequence log-probability:
def sequence_logprob(model, labels, input_len=0):
with torch.no_grad():
output = model(labels)
log_probs = log_probs_from_logits(
output.logits[:, :-1, :], labels[:, 1:])
seq_log_prob = torch.sum(log_probs[:, input_len:])
return seq_log_prob.cpu().numpy()
Here, labels corresponds to output_greedy, which is calculated as:
max_length = 128
input_txt = """In a shocking finding, scientist discovered \
a herd of unicorns living in a remote, previously unexplored \
valley, in the Andes Mountains. Even more surprising to the \
researchers was the fact that the unicorns spoke perfect English.\n\n
"""
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
output_greedy = model.generate(input_ids, max_length=max_length,
do_sample=False)
print(tokenizer.decode(output_greedy.squeeze()))
While the alignment between logits and labels is clear (i.e., labels are shifted by 1), it is not clear to me why, when calculating the sequence log probability, we slice the log_probs tensor using input_len instead of input_len - 1 (see below):
seq_log_prob = torch.sum(log_probs[:, input_len:])
Let me walk you through the example in the book in detail:
- input_ids is a 47-token sequence
- output_greedy is a 128-token sequence (the original 47 input tokens plus 81 model-generated tokens; max_length was set to 128)
When doing a forward pass with the model, i.e., outputs = model(output_greedy) (output_greedy being what is then passed as labels to the sequence_logprob function), the output includes logits whose dimension 1 is 128 (our max_length). We know that the logits at index 0 actually refer to what would be the second token in our output sequence. Python 0-indexing is confusing here, but we can say that the logit at index 0 in outputs.logits corresponds to the logits of the second word in our sequence.
In other words, logit index >> (index + 2)th word in the sequence (where the sequence is 1-indexed).
Following the same reasoning, we know that the first truly model-generated token (i.e., a token not present in the initial prompt) is the 48th word in the sequence, i.e., the (48 - 2)th logit in outputs.logits. We know that the model-generated text starts with "The researchers, from the University of California" and we can verify it by running:
tokenizer.decode(torch.argmax(output.logits[0, 46:52], dim=-1))
In other words, the delta between logit indices and word positions in the output sequence is 2 because of: (1) the one-position shift between logits and labels (each logit predicts the next token), and (2) Python 0-indexing versus 1-indexed word positions.
As a consequence, the sequence_logprob
function should be changed as follows:
def sequence_logprob(model, labels, input_len=0):
with torch.no_grad():
output = model(labels)
log_probs = log_probs_from_logits(
output.logits[:, :-1, :], labels[:, 1:])
# CHANGE HERE
seq_log_prob = torch.sum(log_probs[:, input_len-1:])
return seq_log_prob.cpu().numpy()
Am I missing something?
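For readers following along, here is a small self-contained check of the index mapping (a toy sketch with random logits, not from the book; log_probs_from_logits is re-defined here to keep it runnable):
import torch
import torch.nn.functional as F

def log_probs_from_logits(logits, labels):
    logp = F.log_softmax(logits, dim=-1)
    return torch.gather(logp, 2, labels.unsqueeze(2)).squeeze(-1)

input_len = 3
labels = torch.tensor([[5, 7, 2, 9, 4]])   # 3 prompt tokens + 2 generated tokens
logits = torch.randn(1, 5, 11)             # batch of 1, seq len 5, vocab size 11
log_probs = log_probs_from_logits(logits[:, :-1, :], labels[:, 1:])
# log_probs[:, j] scores labels[:, j+1], so the first generated token
# labels[:, 3] is scored at index input_len - 1 == 2, not input_len
print(log_probs.shape)  # torch.Size([1, 4])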
The problem arises in chapter:
The NER predictions (in terms of tags) on the Jack Sparrow example are odd and don't match those from the book. They also change each time you re-run all the code above. The tokenization looks fine, so the issue seems to be coming from the model itself.
Bug encountered locally and in Google colab.
Steps to reproduce the behavior:
I was expecting to get the relevant tags found in the book for cell 115, i.e. [O I-LOC B-LOC B-LOC O I-LOC O O I-LOC B-LOC].
The problem arises in chapter:
from transformers import Trainer
trainer = Trainer(model=model, args=training_args,
compute_metrics=compute_metrics,
train_dataset=emotions_encoded["train"],
eval_dataset=emotions_encoded["validation"],
tokenizer=tokenizer)
trainer.train();
When running the code in a Jupyter notebook:
...
OSError: Looks like you do not have git-lfs installed, please install. You can install from [https://git-lfs.github.com/.](https://git-lfs.github.com/) Then run `git lfs install` (you only have to do this once).
I wonder if some other settings are necessary. Thank you.
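One possible remedy, assuming a Debian/Ubuntu-like environment such as Colab (package availability may vary):
# Install git-lfs at the OS level, then activate it once per machine
!sudo apt-get install git-lfs
!git lfs install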
The question or comment is about chapter:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
To recreate:
device = "cuda" if torch.cuda.is_available() else "cpu"
def chunks(list_of_elements, batch_size):
"""
Yield successive batch-sized chunks from list_of_elements.
"""
for i in range(0, len(list_of_elements), batch_size):
yield list_of_elements[i : i + batch_size]
def evaluate_summaries_pegasus(dataset, metric, model, tokenizer,
batch_size=16, device=device,
column_text="text",
column_summary="title"):
abstract_batches = list(chunks(dataset[column_text], batch_size))
target_batches = list(chunks(dataset[column_summary], batch_size))
for abstract_batch, target_batch in tqdm(
zip(abstract_batches, target_batches), total=len(abstract_batches)):
inputs = tokenizer(abstract_batch, max_length=1024, truncation=True,
padding="max_length", return_tensors="pt")
summaries = model.generate(input_ids=inputs["input_ids"].to(device),
attention_mask=inputs["attention_mask"].to(device),
length_penalty=0.8, num_beams=8, max_length=128)
decoded_summaries = [tokenizer.decode(s, skip_special_tokens=True,
clean_up_tokenization_spaces=True)
for s in summaries]
decoded_summaries = [d.replace("<n>", " ") for d in decoded_summaries]
metric.add_batch(predictions=decoded_summaries,
references=target_batch)
score = metric.compute()
return score
tokenizer = AutoTokenizer.from_pretrained("google/pegasus-xsum")
pegasus_xsum_model = AutoModelForSeq2SeqLM.from_pretrained("google/pegasus-xsum").to(device)
score_pegasus_xsum = evaluate_summaries_pegasus(test_sampled, rouge_metric,
pegasus_xsum_model, tokenizer, batch_size=4)
rouge_dict_pegasus_xsum = dict((rn, score_pegasus_xsum[rn].mid.fmeasure) for rn in rouge_names)
pegasus_xsum_sample = pd.DataFrame(rouge_dict_pegasus_xsum, index=["pegasus_xsum_sample"])
The problem arises in chapter:
huggingface-cli repo create --type dataset --organization transformersbook codeparrot-train
Instead, I got this error:
403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create (Request ID: hsKOgZ5w0izNyONWMl-wl) - You don't have the rights to create a dataset under this namespace {"error":"You don't have the rights to create a dataset under this namespace"}
Wondering if anyone else has encountered this issue.
Where can we download the PDF of the book? Thanks.
The problem arises in chapter:
The utility function tag_text references xlmr_tokenizer instead of tokenizer (the code does not break, as xlmr_tokenizer is a global variable in the notebook, but I believe it is not what the authors had in mind).
Permalink >>> here
Code should be:
def tag_text(text, tags, model, tokenizer):
# Get tokens with special characters
tokens = tokenizer(text).tokens()
### CHANGE HERE ###
# Encode the sequence into IDs
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
### CHANGE ENDS ###
# Get predictions as distribution over 7 possible classes
outputs = model(input_ids)[0]
# Take argmax to get most likely class per token
predictions = torch.argmax(outputs, dim=2)
# Convert to DataFrame
preds = [tags.names[p] for p in predictions[0].cpu().numpy()]
return pd.DataFrame([tokens, preds], index=["Tokens", "Tags"])
In the notebook context, behaviour would not change. However, if not corrected, tag_text would throw an error if no xlmr_tokenizer variable exists.
The problem arises in chapter:
TypeError: export() got an unexpected keyword argument 'use_external_data_format'
In the notebook from this repository, the same problem generated a warning, not a TypeError.
Steps to reproduce the behavior:
The problem arises in chapter:
After creating the conda env using environment.yml, the local GPU cannot be detected, as shown below.
No GPU was detected! This notebook can be very slow without a GPU 🐢
Using transformers v4.11.3
Using datasets v1.16.1
$ python
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
Steps to reproduce the behavior:
Expected behavior: torch.cuda.is_available() should return True.
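A quick check worth running first (plain PyTorch): a CPU-only build reports None for its CUDA version, which would point at environment.yml pinning a CPU wheel rather than at a driver problem.
import torch
print(torch.__version__)         # a "+cpu" suffix means a CPU-only build
print(torch.version.cuda)        # None on a CPU-only build
print(torch.cuda.is_available())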
The problem arises in chapter:
When running 02_classification.ipynb on Kaggle with a P100 GPU, I receive a RuntimeError: CUDA out of memory after running cell 58:
#hide_output
emotions_hidden = emotions_encoded.map(extract_hidden_states, batched=True)
Steps to reproduce the behavior:
Stack trace (partially):
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_34/3668832236.py in <module>
1 #hide_output
----> 2 emotions_hidden = emotions_encoded.map(extract_hidden_states, batched=True)
/opt/conda/lib/python3.7/site-packages/datasets/dataset_dict.py in map(self, function, with_indices, input_columns, batched, batch_size, remove_columns, keep_in_memory, load_from_cache_file, cache_file_names, writer_batch_size, features, disable_nullable, fn_kwargs, num_proc, desc)
502 desc=desc,
503 )
--> 504 for k, dataset in self.items()
505 }
506 )
...
...
...
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/sparse.py in forward(self, input)
158 return F.embedding(
159 input, self.weight, self.padding_idx, self.max_norm,
--> 160 self.norm_type, self.scale_grad_by_freq, self.sparse)
161
162 def extra_repr(self) -> str:
/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2041 # remove once script supports set_grad_enabled
2042 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2043 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
2044
2045
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.90 GiB total capacity; 512.05 MiB already allocated; 167.75 MiB free; 530.00 MiB reserved in total by PyTorch)
Complete stack trace:
Stacktrace_RuntimeError_ch2_NLP_Transformers.txt
GPU (nvidia-smi):
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Tue Mar 15 13:44:03 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04 Driver Version: 450.119.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:00:04.0 Off | 0 |
| N/A 41C P0 35W / 250W | 16113MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
As mentioned in the README.md, I would have expected the P100 with its 16 GB to have enough GPU memory to run this code without issues. I also tried to free up some cache with torch.cuda.empty_cache(), but it did not suffice.
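A hedged workaround: datasets' map pushes 1,000 rows through the model per batch by default, so shrinking batch_size bounds the memory used by each forward pass (names follow the chapter's notebook):
# Smaller batches through the GPU: slower, but with bounded memory
emotions_hidden = emotions_encoded.map(extract_hidden_states, batched=True,
                                       batch_size=64)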
The problem arises in chapter:
When fine-tuning the model in Google Colab it throws the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
Note that the notebook runs successfully in Kaggle.
Steps to reproduce the behavior:
RuntimeError Traceback (most recent call last)
<ipython-input-66-55916e0ed5b3> in <module>()
6 eval_dataset=emotions_encoded["validation"],
7 tokenizer=tokenizer)
----> 8 trainer.train();
11 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2181 # remove once script supports set_grad_enabled
2182 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2183 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
2184
2185
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
Expected behavior: the fine-tuning completes successfully.
The problem arises in chapter:
original_input_ids and masked_input_ids are not defined. In fact, they correspond respectively to inputs["input_ids"][0] and outputs["input_ids"][0].
Steps to reproduce the behavior:
pd.DataFrame({
"Original tokens": tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
"Masked tokens": tokenizer.convert_ids_to_tokens(outputs["input_ids"][0]),
"Original input_ids": inputs["input_ids"][0],
"Masked input_ids": outputs["input_ids"][0],
"Labels": outputs["labels"][0]}
).T
The problem arises in chapter:
This error appears when trying to initialize training_args:
AttributeError Traceback (most recent call last)
Input In [69], in <cell line: 6>()
4 logging_steps = len(emotions_encoded["train"]) // batch_size
5 model_name = f"{model_ckpt}-finetuned-emotion"
----> 6 training_args = TrainingArguments(output_dir=model_name,
7 num_train_epochs=2,
8 learning_rate=2e-5,
9 per_device_train_batch_size=batch_size,
10 per_device_eval_batch_size=batch_size,
11 weight_decay=0.01,
12 evaluation_strategy="epoch",
13 disable_tqdm=False,
14 logging_steps=logging_steps,
15 push_to_hub=False,
16 log_level="error")
File <string>:91, in __init__(self, output_dir, overwrite_output_dir, do_train, do_eval, do_predict, evaluation_strategy, prediction_loss_only, per_device_train_batch_size, per_device_eval_batch_size, per_gpu_train_batch_size, per_gpu_eval_batch_size, gradient_accumulation_steps, eval_accumulation_steps, eval_delay, learning_rate, weight_decay, adam_beta1, adam_beta2, adam_epsilon, max_grad_norm, num_train_epochs, max_steps, lr_scheduler_type, warmup_ratio, warmup_steps, log_level, log_level_replica, log_on_each_node, logging_dir, logging_strategy, logging_first_step, logging_steps, logging_nan_inf_filter, save_strategy, save_steps, save_total_limit, save_on_each_node, no_cuda, seed, data_seed, bf16, fp16, fp16_opt_level, half_precision_backend, bf16_full_eval, fp16_full_eval, tf32, local_rank, xpu_backend, tpu_num_cores, tpu_metrics_debug, debug, dataloader_drop_last, eval_steps, dataloader_num_workers, past_index, run_name, disable_tqdm, remove_unused_columns, label_names, load_best_model_at_end, metric_for_best_model, greater_is_better, ignore_data_skip, sharded_ddp, deepspeed, label_smoothing_factor, optim, adafactor, group_by_length, length_column_name, report_to, ddp_find_unused_parameters, ddp_bucket_cap_mb, dataloader_pin_memory, skip_memory_metrics, use_legacy_prediction_loop, push_to_hub, resume_from_checkpoint, hub_model_id, hub_strategy, hub_token, gradient_checkpointing, fp16_backend, push_to_hub_model_id, push_to_hub_organization, push_to_hub_token, mp_parameters)
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/training_args.py:865, in TrainingArguments.__post_init__(self)
857 warnings.warn(
858     "`--adafactor` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--optim adafactor` instead",
859     FutureWarning,
860 )
861 self.optim = OptimizerNames.ADAFACTOR
863 if (
864 is_torch_available()
--> 865 and (self.device.type != "cuda")
866 and not (self.device.type == "xla" and "GPU_NUM_DEVICES" in os.environ)
867 and (self.fp16 or self.fp16_full_eval or self.bf16 or self.bf16_full_eval)
868 ):
869 raise ValueError(
870     "Mixed precision training with AMP or APEX (`--fp16` or `--bf16`) and half precision evaluation (`--fp16_full_eval` or `--bf16_full_eval`) can only be used on CUDA devices."
871 )
873 if is_torch_available() and self.tf32 is not None:
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/utils/import_utils.py:781, in torch_required.<locals>.wrapper(*args, **kwargs)
778 @wraps(func)
779 def wrapper(*args, **kwargs):
780 if is_torch_available():
--> 781 return func(*args, **kwargs)
782 else:
783 raise ImportError(f"Method `{func.__name__}` requires PyTorch.")
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/training_args.py:1099, in TrainingArguments.device(self)
1093 @property
1094 @torch_required
1095 def device(self) -> "torch.device":
1096 """
1097 The device used by this process.
1098 """
-> 1099 return self._setup_devices
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/utils/generic.py:48, in cached_property.__get__(self, obj, objtype)
46 cached = getattr(obj, attr, None)
47 if cached is None:
---> 48 cached = self.fget(obj)
49 setattr(obj, attr, cached)
50 return cached
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/utils/import_utils.py:781, in torch_required.<locals>.wrapper(*args, **kwargs)
778 @wraps(func)
779 def wrapper(*args, **kwargs):
780 if is_torch_available():
--> 781 return func(*args, **kwargs)
782 else:
783 raise ImportError(f"Method `{func.__name__}` requires PyTorch.")
File ~/miniforge3/envs/TFM006/lib/python3.10/site-packages/transformers/training_args.py:1024, in TrainingArguments._setup_devices(self)
1020 @cached_property
1021 @torch_required
1022 def _setup_devices(self) -> "torch.device":
1023 logger.info("PyTorch: setting up devices")
-> 1024 if torch.distributed.is_initialized() and self.local_rank == -1:
1025 logger.warning(
1026 "torch.distributed process group is initialized, but local_rank == -1. "
1027 "In order to use Torch DDP, launch your script with `python -m torch.distributed.launch"
1028 )
1029 if self.no_cuda:
AttributeError: module 'torch.distributed' has no attribute 'is_initialized'
Steps to reproduce the behavior:
Note: the notebook is running on a Mac M1, on CPU.
from transformers import Trainer, TrainingArguments
batch_size = 64
logging_steps = len(emotions_encoded["train"]) // batch_size
model_name = f"{model_ckpt}-finetuned-emotion"
training_args = TrainingArguments(output_dir=model_name,
num_train_epochs=2,
learning_rate=2e-5,
per_device_train_batch_size=batch_size,
per_device_eval_batch_size=batch_size,
weight_decay=0.01,
evaluation_strategy="epoch",
disable_tqdm=False,
logging_steps=logging_steps,
push_to_hub=False,
log_level="error")
The problem arises in chapter:
Running the chapter 8 code on Colab, I got the following error when training DistilBERT:
distilbert_trainer.train()
Error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0
Steps to reproduce the behavior:
Expected behavior: no error.
Modifying the initial command to install_requirements(is_chapter2=True) solved the issue, so it seems to be related to the transformers library version (4.11 vs 4.13).
The problem arises in chapter:
Chapter 7 doesn't run on SM Notebook instance
Steps to reproduce the behavior:
07_question-answering.ipynb
The notebook runs successfully without errors.
Several minor fixes are required to get the notebook running on SM. I have fixed all of them, see this fork. However, I'm hesitant to create a PR for it as I'm not sure whether this will create issues for other environments (e.g. Colab, etc).
How would you like to incorporate these fixes into your repo, if at all?
The question or comment is about chapter:
Has anyone else noticed that nlpaug's naw.ContextualWordEmbsAug(model_path="distilbert-base-uncased", device="cuda", action="substitute") is extremely slow, even on a "cuda" device?
I have a decent computer (Xeon E5, 64 GB RAM, RTX Titan 24 GB), so I cannot understand why this module is so slow.
Thanks
Best regards
Jerome
The problem arises in chapter:
Located in 04_multilingual-ner.ipynb under the "Loading a Custom Model" subheading. Notebook code below:
preds = [tags.names[p] for p in predictions[0].cpu().numpy()]
pd.DataFrame([xlmr_tokens, preds], index=["Tokens", "Tags"])
My output:
Tags B-ORG I-ORG I-ORG I-ORG B-ORG I-ORG B-ORG B-ORG I-ORG I-PER
Steps to reproduce the behavior:
Expected: Tags O I-LOC B-LOC B-LOC O I-LOC O O I-LOC B-LOC
Seems to be OK on Kaggle.
The problem arises in chapter:
As flagged by Julian Risch from deepset, there's a small bug when creating the labels for the document store, because multiple labels (i.e. duplicates) are being initialised with the same ID:
from haystack import Label
labels = []
for i, row in dfs["test"].iterrows():
# Metadata used for filtering in the Retriever
meta = {"item_id": row["title"], "question_id": row["id"]}
# Populate labels for questions with answers
if len(row["answers.text"]):
for answer in row["answers.text"]:
label = Label(
question=row["question"], answer=answer, id=i, origin=row["id"],
meta=meta, is_correct_answer=True, is_correct_document=True,
no_answer=False)
labels.append(label)
# Populate labels for questions without answers
else:
label = Label(
question=row["question"], answer="", id=i, origin=row["id"],
meta=meta, is_correct_answer=True, is_correct_document=True,
no_answer=True)
labels.append(label)
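A possible fix, sketched only (answer_idx is our hypothetical enumerate index, not in the original code): make each label's ID unique instead of reusing the row index for every answer:
for answer_idx, answer in enumerate(row["answers.text"]):
    label = Label(
        question=row["question"], answer=answer,
        id=f"{i}-{answer_idx}",  # unique per (row, answer) pair
        origin=row["id"], meta=meta, is_correct_answer=True,
        is_correct_document=True, no_answer=False)
    labels.append(label)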
The problem arises in chapter:
Running the following code:
import tensorflow as tf
tf_model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=tf.metrics.SparseCategoricalAccuracy())
tf_model.fit(
x=tf_train_dataset,
y=None,
validation_data=tf_eval_dataset,
batch_size=batch_size,
epochs=1
)
I get this error
AttributeError: module 'keras.engine.data_adapter' has no attribute 'expand_1d'
and searching online suggests it should have been fixed in transformers:
huggingface/transformers#20750
Steps to reproduce the behavior:
full stack track
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[89], line 8
1 import tensorflow as tf
3 tf_model.compile(
4 optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
5 loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
6 metrics=tf.metrics.SparseCategoricalAccuracy())
----> 8 tf_model.fit(
9 x=tf_train_dataset,
10 y=None,
11 validation_data=tf_eval_dataset,
12 batch_size=batch_size,
13 epochs=1
14 )
File ~/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
File /var/folders/hd/csbqkrzd3s95b5c5hfv4cp8c0000gn/T/__autograph_generated_filerzz7novs.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
13 try:
14 do_return = True
---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
16 except:
17 do_return = False
File ~/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/transformers/modeling_tf_utils.py:1476, in TFPreTrainedModel.train_step(self, data)
1474 output_to_label = {val: key for key, val in label_to_output.items()}
1475 if not self._using_dummy_loss:
-> 1476 data = data_adapter.expand_1d(data)
1477 x, y, sample_weight = data_adapter.unpack_x_y_sample_weight(data)
1478 # If the inputs are mutable dictionaries, make a shallow copy of them because we will modify
1479 # them during input/label pre-processing. This avoids surprising the user by wrecking their data.
1480 # In addition, modifying mutable Python inputs makes XLA compilation impossible.
AttributeError: in user code:
File "/Users/samueljoseph/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function *
return step_function(self, iterator)
File "/Users/samueljoseph/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/Users/samueljoseph/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step **
outputs = model.train_step(data)
File "/Users/samueljoseph/.local/share/virtualenvs/nlp_transformers-keTZnTdD/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1476, in train_step
data = data_adapter.expand_1d(data)
AttributeError: module 'keras.engine.data_adapter' has no attribute 'expand_1d'
Expected behavior: the code example runs without errors.
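Since the linked PR has been merged upstream, one hedged remedy is simply upgrading transformers so its train_step no longer calls the removed Keras helper:
!pip install -U transformers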
The problem arises in chapter:
Steps to reproduce the behavior:
Hello,
There is a missing image in the Images folder for "Transformer Anatomy".
The problem arises in chapter:
The image given in the file path below does not exist in the Images folder.
from IPython.display import Image
Image(filename="images/chapter03_bertviz-neuron-light.png")
Please could you put this image into the Images folder?
Thanks.
The problem arises in chapter:
Cannot push datasets.
Steps to reproduce the behavior:
$ git push
batch response: Authorization error. B | 0 B/s
error: failed to push some refs to 'https://huggingface.co/datasets/Shuchen/codeparrot-valid'
Authorization error happens here, but I have logged in successfully.
$ huggingface-cli login
(Hugging Face ASCII art banner)
To login, `huggingface_hub` now requires a token generated from https://huggingface.co/settings/token.
(Deprecated, will be removed in v0.3.0) To login with username and password instead, interrupt with Ctrl+C.
Token:
Login successful
Your token has been saved to /home/shuchen/.huggingface/token
Anybody can help?
v11 of cudatoolkit is not available for osx-64. Possibly removed by Nvidia?
https://anaconda.org/anaconda/cudatoolkit
https://developer.nvidia.com/nvidia-cuda-toolkit-11_6_0-developer-tools-mac-hosts
I switched to v9 to get the environment to build, but the GPU is not detected and the kernel crashes when running the first command of the first introduction notebook.
#hide
from utils import *
setup_chapter()
Steps to reproduce the behavior:
>>> torch.cuda.is_available()
False
>>> exit()
Expected behavior: the environment file works as is, the GPU is available, and the kernel doesn't crash on the first command.
The question or comment is about chapter:
On p. 272, the authors say "we'll focus on using synonym replacement...", but on p. 273 the example uses naw.ContextualWordEmbsAug() instead of naw.SynonymAug(). Can the authors explain the difference?
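Not an authoritative answer, but for comparison here is a minimal sketch of the two nlpaug augmenters (the WordNet source is SynonymAug's usual default): SynonymAug swaps words for dictionary synonyms, while ContextualWordEmbsAug asks a masked language model for in-context replacements, which is presumably why the book's example differs from the prose.
import nlpaug.augmenter.word as naw

text = "The quick brown fox jumps over the lazy dog"
# Dictionary-based synonym replacement (WordNet)
syn_aug = naw.SynonymAug(aug_src="wordnet")
# Masked-language-model-based substitution, as in the book's example
ctx_aug = naw.ContextualWordEmbsAug(model_path="distilbert-base-uncased",
                                    action="substitute")
print(syn_aug.augment(text))
print(ctx_aug.augment(text))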
The problem arises in chapter:
Steps to reproduce the behavior:
Run the notebook on a CUDA/GPU enabled device- A100 card
trainer = Trainer(model=model,
args=training_args,
tokenizer=tokenizer, data_collator=seq2seq_data_collator,
train_dataset=dataset_samsum_pt["train"],
eval_dataset=dataset_samsum_pt["validation"])
Trainer() fails with the following error:
Traceback (most recent call last):
File "/home/shabnam/anaconda3/envs/rapids-22.08/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3378, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<string>", line 1, in <module>
trainer = Trainer(model=model, args=training_args,
File "/home/shabnam/anaconda3/envs/rapids-22.08/lib/python3.9/site-packages/transformers/trainer.py", line 450, in __init__
self._move_model_to_device(model, args.device)
File "/home/shabnam/anaconda3/envs/rapids-22.08/lib/python3.9/site-packages/transformers/trainer.py", line 722, in _move_model_to_device
model = model.to(device)
AttributeError: 'str' object has no attribute 'to'
Expected behavior:
training ...
The problem arises in chapter:
When loading the emotion dataset by calling load_dataset("emotion"), an exception is thrown. It seems the files were removed from Dropbox.
(screenshot omitted)
Expected behavior: load_dataset("emotion") runs successfully.
The problem arises in chapter:
RuntimeError: CUDA out of memory.
Steps to reproduce the behavior:
So do you have a solution to deal with the CUDA OOM problem in Jupyter notebook?
The TensorFlow version of the code is sometimes incompatible. If someone has already made the conversion, I would be grateful if they could share the link.
Thanks
Hi,
I am trying to run chapter 7 to learn about Haystack for QA. I am using a Jupyter notebook connected to my GCP VM (Debian GNU/Linux 9, Tesla V100). I installed the Debian version but am facing the following error: ConnectionError: Initial connection to Elasticsearch failed. Make sure you run an Elasticsearch instance at [{'host': 'localhost', 'port': 9200}] and that it has finished the initial ramp up (can take > 30s).
!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.2-linux-x86_64.tar.gz
!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.2-linux-x86_64.tar.gz.sha512
!shasum -a 512 -c elasticsearch-8.1.2-linux-x86_64.tar.gz.sha512
!tar -xzf elasticsearch-8.1.2-linux-x86_64.tar.gz
!cd elasticsearch-8.1.2/
!pip install pymilvus
import pymilvus
import os
from subprocess import Popen, PIPE, STDOUT
!chown -R daemon:daemon elasticsearch-8.1.2
es_server = Popen(args=['elasticsearch-8.1.2/bin/elasticsearch'])
!sleep 30
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
#document_store = ElasticsearchDocumentStore(host='localhost', port= 9201, username='', password='')
document_store = ElasticsearchDocumentStore(return_embedding=True)
I would appreciate your support on this.
Shabnam
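A hedged guess at the cause: the book targets the Elasticsearch 7.x series, while 8.x enables TLS and authentication by default, so a plain localhost:9200 connection is refused. One sketch of a local-testing workaround (the -E flag and the xpack.security.enabled setting are standard Elasticsearch options):
from subprocess import Popen, PIPE, STDOUT

# Disable Elasticsearch 8.x security for a throwaway local instance
# (fine for local testing only; never do this in production)
es_server = Popen(args=["elasticsearch-8.1.2/bin/elasticsearch",
                        "-E", "xpack.security.enabled=false"],
                  stdout=PIPE, stderr=STDOUT)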
The question or comment is about chapter:
The book shows a really interesting example of getting the loss returned along with the predicted class probability, in the "Error Analysis" section of chapter 2:
Before moving on, we should investigate our model’s predictions a little bit further. A simple yet powerful technique is to sort the validation samples by the model loss. When we pass the label during the forward pass, the loss is automatically calculated and returned. Here’s a function that returns the loss along with the predicted label:
from torch.nn.functional import cross_entropy
def forward_pass_with_label(batch):
# Place all input tensors on the same device as the model
inputs = {k:v.to(device) for k,v in batch.items()
if k in tokenizer.model_input_names}
with torch.no_grad():
output = model(**inputs)
pred_label = torch.argmax(output.logits, axis=-1)
loss = cross_entropy(output.logits, batch["label"].to(device),
reduction="none")
# Place outputs on CPU for compatibility with other dataset columns
return {"loss": loss.cpu().numpy(),
"predicted_label": pred_label.cpu().numpy()}
Does anyone have any idea how to do similar for a tensorflow based approach?
I've been reading the documentation for the TF model predict function (https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict) but can't immediately see anything that would correspond, although to be honest I'm not quite following what the example code is doing. Is it taking the validation dataset, re-predicting the output for each item, and then calculating the loss as the cross entropy of the output logits against the correct label? So to do that with TF I'd need to take the validation set and do something similar?
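A minimal sketch of one way to do this in TF/Keras, assuming tf_eval_dataset yields (features, labels) batches and tf_model returns logits (names follow the chapter's notebook; this is a sketch, not the book's method):
import numpy as np
import tensorflow as tf

# Per-sample loss, mirroring reduction="none" in the PyTorch version
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

losses, pred_labels = [], []
for features, labels in tf_eval_dataset:
    logits = tf_model(features, training=False).logits
    losses.append(loss_fn(labels, logits).numpy())
    pred_labels.append(tf.argmax(logits, axis=-1).numpy())

losses = np.concatenate(losses)
pred_labels = np.concatenate(pred_labels)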
The problem arises in chapter:
As per huggingface/datasets#3830, trying to load the dataset fails. The fix is still not in the latest release, so the notebook will likely need an update.
Steps to reproduce the behavior:
Just run
load_dataset("cnn_dailymail", '3.0.0')
For the exact error message, see the linked issue.
The problem arises in chapter:
Error when importing EvalDocuments
ImportError: cannot import name 'EvalDocuments' from 'haystack.modeling.evaluation.eval' (unknown location)
Steps to reproduce the behavior:
pip install farm-haystack
from haystack.eval import EvalDocuments
Expected behavior: EvalDocuments imports without error.