
Comments (2)

kssteven418 commented on June 14, 2024

Hi,

Yes, pre-trained models like BERT and RoBERTa cannot be fine-tuned with sequence lengths longer than the maximum sequence length they were pre-trained on, as that would violate the pre-defined positional embedding size. This is why the Hugging Face implementation prevents it and errors out when you attempt to increase max-seq-length above 512. Since the current version of LTP is implemented on top of the RoBERTa model as a baseline, it has the same issue.
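To make the limitation concrete, here is a minimal toy sketch (pure Python, not the actual Hugging Face or LTP code) of why a checkpoint's fixed positional-embedding table cannot serve sequences longer than it was pre-trained for:

```python
# Toy sketch: a pre-trained positional embedding table has a fixed number
# of rows, so looking up positions beyond that length fails -- analogous
# to the 512-token limit baked into BERT/RoBERTa checkpoints.
MAX_POSITIONS = 512   # size fixed at pre-training time
HIDDEN = 4            # toy hidden dimension

# The "checkpoint" ships a table with exactly MAX_POSITIONS rows.
position_embeddings = [[0.0] * HIDDEN for _ in range(MAX_POSITIONS)]

def embed_positions(seq_len):
    """Look up one embedding per position; fails past the table's end."""
    if seq_len > MAX_POSITIONS:
        raise IndexError(
            f"sequence length {seq_len} exceeds the {MAX_POSITIONS} "
            "positions the table was pre-trained with"
        )
    return [position_embeddings[i] for i in range(seq_len)]

embed_positions(512)       # fine: within the pre-trained range
try:
    embed_positions(1024)  # fails, just as the library check does
except IndexError as e:
    print(e)
```

This is only an analogy for the shape mismatch; the real library check happens at the embedding-matrix level.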

The sequence length of 1024 you found in the paper (Section A.2, probably?) was used to demonstrate the effect of long sequence lengths on processing latency, and did not require a pre-trained checkpoint.

The possible workaround that I would suggest is to find a checkpoint that has been trained with longer sequence lengths (there might be some models specialized in processing long documents) and extend/migrate the LTP implementation to that model class.
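Another workaround people sometimes use (this is a hypothetical sketch, not part of the LTP codebase, and it requires further fine-tuning to work well) is to initialize a longer position table from the pre-trained one, e.g. by tiling its rows:

```python
# Hypothetical sketch: adapt a checkpoint to longer inputs by building a
# larger position table whose rows are copied cyclically from the
# pre-trained table. The resulting model must still be fine-tuned, since
# the copied rows are only an initialization, not learned long-range
# positions.
def extend_position_table(table, new_len):
    old_len = len(table)
    # copy pre-trained rows cyclically into the longer table
    return [list(table[i % old_len]) for i in range(new_len)]

old = [[float(i)] for i in range(4)]   # toy 4-position table
new = extend_position_table(old, 10)   # extended to 10 positions
print(len(new))  # 10
```

Whether this initialization works acceptably for LTP's pruning behavior is an open question; using a checkpoint natively pre-trained on long sequences is the safer route.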

Hope this helps answer your question.


XueqiYang commented on June 14, 2024

Hi Sehoon,
Thanks for your reply. It's very helpful! I recently came across several works that address the 512-token length limitation, e.g., Longformer and BigBird, which reduce the memory requirement by modifying the attention mechanism. Those models are pre-trained with a max_sequence_length over 512. I'm wondering whether the LTP implementation can be extended/migrated to those models without additional pre-training.
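For readers unfamiliar with those models, the memory savings come from sparse attention patterns. An illustrative sketch (not Longformer's actual code) of the sliding-window attention mask such models use:

```python
# Illustrative sketch: a sliding-window attention mask in the spirit of
# Longformer/BigBird local attention. Each token attends only to
# neighbors within +/- w positions, so the number of attended pairs
# grows as O(n*w) instead of O(n^2) for full self-attention.
def sliding_window_mask(n, w):
    """Return an n x n boolean mask; True means position i may attend to j."""
    return [[abs(i - j) <= w for j in range(n)] for i in range(n)]

mask = sliding_window_mask(6, 1)
# For fixed window w, allowed pairs grow linearly with n.
print(sum(sum(row) for row in mask))  # 16
```

(The real models also add a few global-attention tokens on top of the local window; this sketch shows only the local part.)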

And another quick question about your token-pruning implementation: is it possible to reconstruct the pruned tokens as demonstrated in Figure 2 of the paper, to make the final pruning result interpretable for the downstream task? Thanks!
