
Comments (2)

kssteven418 commented on June 14, 2024

Hi,

Yes, pre-trained models like BERT and RoBERTa cannot be fine-tuned with sequence lengths longer than the maximum sequence length they were pre-trained on, as that would violate the pre-defined positional embedding size. This is why the Hugging Face implementation prevents it and errors out when you attempt to increase max-seq-length above 512. Since the current version of LTP is implemented on top of the RoBERTa model as a baseline, it has the same issue.
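To make the limitation concrete, here is a minimal toy sketch (pure Python, not the actual Hugging Face or LTP code) of why a checkpoint's fixed positional-embedding table cannot serve sequences longer than it was pre-trained for:

```python
# Toy sketch: a pre-trained positional embedding table has a fixed number
# of rows, so looking up positions beyond that length fails -- analogous
# to the 512-token limit baked into BERT/RoBERTa checkpoints.
MAX_POSITIONS = 512   # size fixed at pre-training time
HIDDEN = 4            # toy hidden dimension

# The "checkpoint" ships a table with exactly MAX_POSITIONS rows.
position_embeddings = [[0.0] * HIDDEN for _ in range(MAX_POSITIONS)]

def embed_positions(seq_len):
    """Look up one embedding per position; fails past the table's end."""
    if seq_len > MAX_POSITIONS:
        raise IndexError(
            f"sequence length {seq_len} exceeds the {MAX_POSITIONS} "
            "positions the table was pre-trained with"
        )
    return [position_embeddings[i] for i in range(seq_len)]

embed_positions(512)       # fine: within the pre-trained range
try:
    embed_positions(1024)  # fails, just as the library check does
except IndexError as e:
    print(e)
```

This is only an analogy for the shape mismatch; the real library check happens at the embedding-matrix level.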

The sequence length of 1024 you found in the paper (Section A.2, probably?) was used to demonstrate the effect of long sequence lengths on processing latency, and did not require a pre-trained checkpoint.

The possible workaround that I would suggest is to find a checkpoint that has been trained with longer sequence lengths (there might be some models specialized in processing long documents) and extend/migrate the LTP implementation to that model class.
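Another workaround people sometimes use (this is a hypothetical sketch, not part of the LTP codebase, and it requires further fine-tuning to work well) is to initialize a longer position table from the pre-trained one, e.g. by tiling its rows:

```python
# Hypothetical sketch: adapt a checkpoint to longer inputs by building a
# larger position table whose rows are copied cyclically from the
# pre-trained table. The resulting model must still be fine-tuned, since
# the copied rows are only an initialization, not learned long-range
# positions.
def extend_position_table(table, new_len):
    old_len = len(table)
    # copy pre-trained rows cyclically into the longer table
    return [list(table[i % old_len]) for i in range(new_len)]

old = [[float(i)] for i in range(4)]   # toy 4-position table
new = extend_position_table(old, 10)   # extended to 10 positions
print(len(new))  # 10
```

Whether this initialization works acceptably for LTP's pruning behavior is an open question; using a checkpoint natively pre-trained on long sequences is the safer route.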

Hope this helps answer your question.


XueqiYang commented on June 14, 2024

Hi Sehoon,
Thanks for your reply. It's very helpful! I recently came across several works that address the 512-token length limitation, e.g., Longformer and BigBird, which reduce the memory requirement by modifying the attention mechanism. Those models are pre-trained with a max_sequence_length over 512. I'm wondering whether the LTP implementation can be extended/migrated to those models without additional pre-training.
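For readers unfamiliar with those models, the memory savings come from sparse attention patterns. An illustrative sketch (not Longformer's actual code) of the sliding-window attention mask such models use:

```python
# Illustrative sketch: a sliding-window attention mask in the spirit of
# Longformer/BigBird local attention. Each token attends only to
# neighbors within +/- w positions, so the number of attended pairs
# grows as O(n*w) instead of O(n^2) for full self-attention.
def sliding_window_mask(n, w):
    """Return an n x n boolean mask; True means position i may attend to j."""
    return [[abs(i - j) <= w for j in range(n)] for i in range(n)]

mask = sliding_window_mask(6, 1)
# For fixed window w, allowed pairs grow linearly with n.
print(sum(sum(row) for row in mask))  # 16
```

(The real models also add a few global-attention tokens on top of the local window; this sketch shows only the local part.)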

And another quick question about your token-pruning implementation: is it possible to reconstruct the pruned tokens as demonstrated in Figure 2 of the paper, to make the final pruning result interpretable for the downstream task? Thanks!
