Comments (9)
'30k-clean.model' is 'spm_model_file', not 'vocab_file'.
The code changes are:
diff --git a/albert/run_classifier_with_tfhub.py b/albert/run_classifier_with_tfhub.py
index 92fef74..26f4339 100644
--- a/albert/run_classifier_with_tfhub.py
+++ b/albert/run_classifier_with_tfhub.py
@@ -156,6 +156,7 @@ def create_tokenizer_from_hub_module(albert_hub_module_handle):
     with tf.Session() as sess:
       vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                             tokenization_info["do_lower_case"]])
+      FLAGS.spm_model_file = vocab_file
   return tokenization.FullTokenizer(
       vocab_file=vocab_file, do_lower_case=do_lower_case,
       spm_model_file=FLAGS.spm_model_file)
I'm not sure what the Python script is doing, but the vocabulary is not simply a text file. It's a SentencePiece model file (essentially a protobuf-serialized file) and must be deserialized either with protobuf directly or with the SentencePiece Python API.
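For illustration, a minimal sketch (not part of the original comment) of reading the file through the SentencePiece Python API; it assumes the 30k-clean.model file mentioned above sits in the working directory:

import sentencepiece as spm

# 30k-clean.model is a serialized SentencePiece model, so it has to be loaded
# through the SentencePiece API rather than read as a plain-text vocab file.
sp = spm.SentencePieceProcessor()
sp.Load("30k-clean.model")

print(sp.GetPieceSize())                 # vocabulary size of the model
print(sp.EncodeAsPieces("hello world"))  # subword pieces, e.g. ['▁hello', '▁world']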
Upgrade your TensorFlow to at least 1.15.
I am getting the same errors on all the available TF Hub models.
It seems like this happens because of '\n' lines in the vocab file; a similar case is described here. But it is not clear how to handle this. Maybe the files are corrupted?
Thanks @eaplatanios @KodairaTomonori.
I have resolved the Index out of Range error, but now I get another error:
LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_23/layer_23/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
Do you have any ideas? Thanks!
> Thanks @eaplatanios @KodairaTomonori.
> I have resolved the Index out of Range error, but now I get another error:
> LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_23/layer_23/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
> Do you have any ideas? Thanks!

@YujiOshima Hi, I am still facing the same issue. Can you share how you resolved it?
> Thanks @eaplatanios @KodairaTomonori.
> I have resolved the Index out of Range error, but now I get another error:
> LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_23/layer_23/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
> Do you have any ideas? Thanks!
>
> @YujiOshima Hi, I am still facing the same issue. Can you share how you resolved it?

For version 1, using the latest TensorFlow 1 release fixes both problems. However, I have not found a sensible solution for version 2 yet.
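For anyone checking which TF build they are on before retrying, a small hedged sketch (nothing here comes from the thread beyond the 1.15 requirement mentioned above):

# Verify the installed TensorFlow version; per the comments above, builds older
# than 1.15 on the TF 1 line hit the missing AddV2 gradient.
from distutils.version import LooseVersion
import tensorflow as tf

print(tf.__version__)
if LooseVersion(tf.__version__) < LooseVersion("1.15"):
    print("Upgrade TensorFlow to 1.15 or later before fine-tuning.")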
We have fixed the issue with the v2 ALBERT modules on TF-Hub.
We also updated the README.md with some details about --spm_model_file.
Please let us know if you have other issues!
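As a hedged illustration of what --spm_model_file feeds into (this sketch is not from the README; it reuses the FullTokenizer signature visible in the diff above, and it assumes both that the repo is importable as the albert package and that vocab_file is ignored once a SentencePiece model is supplied):

from albert import tokenization

# Build the tokenizer directly from the SentencePiece model; the assumption is
# that spm_model_file takes precedence, so no plain-text vocab file is needed.
tokenizer = tokenization.FullTokenizer(
    vocab_file=None,
    do_lower_case=True,
    spm_model_file="30k-clean.model")

print(tokenizer.tokenize("ALBERT uses a SentencePiece vocabulary."))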
Related Issues (20)
- torch.nn.modules.module.ModuleAttributeError: 'AlbertEmbeddings' object has no attribute 'bias' HOT 1
- The exact English pretraining data and Chinese pretraining data that are exact same to the BERT paper's pretraining data.
- albert base fine-tuned on squad2.0 gets stuck in loop when predicting on new file HOT 1
- Wrong pieces for control symbols after loading SentencepieceProcessor from official model HOT 2
- fine tune on my own English dataset
- Discrepancy in tokenization results using albert's tokenizer and sentencepiece library
- which word segmentation tool is used for pretraining Chinese ALBERT
- Probable error on line 306 in `create_pretraining_data.py` for albert
- Default Tutorial Not Working - Can't download MRPC data HOT 2
- Prediction Fails on default Colab HOT 2
- How to get the test embeddings from output of fine-tuned model (tutorial)
- when training in Race , The eval_accuracy is flat , it only has three numbers which are 0.0, 0.33334, 0.66667, 1.0
- Difference between v1 and v2 for xxlarge
- Wrong evaluate result on Squad2.0
- The results can't be reproduced HOT 2
- Improvement to how the `app` and `pages` files conflict is shown.
- Load in Browser Tensorflow
- Why do I find inconsistencies between the output of my ALBERT model converted to ONNX format and tested with ONNX Runtime, compared to the original PyTorch format model?
- Albet
- Albert