microsoft / fastformers
FastFormers - highly efficient transformer models for NLU
License: Other
The model is bert_base.
When I prune, I set num_heads to 8.
Before pruning, the key (and value) weight has shape [768, 768].
After pruning, it becomes [512, 768],
so the saved weights can't be loaded by transformers.
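Here is a minimal sketch of the shape arithmetic behind this report. It assumes bert_base has 12 heads of size 64 and that PyTorch's `nn.Linear` stores weights as (out_features, in_features). It also assumes that the stock BERT attention code in transformers derives the per-head size as hidden_size // num_attention_heads, which would explain why writing num_heads = 8 into the config does not reproduce the pruned shape. The helper names are hypothetical.

```python
def qkv_weight_shape(hidden_size: int, num_heads: int, head_size: int) -> tuple:
    """Shape of a query/key/value projection weight, stored (out_features, in_features)."""
    return (num_heads * head_size, hidden_size)


def stock_head_size(hidden_size: int, num_attention_heads: int) -> int:
    """Per-head size as stock BERT attention derives it from the config (assumption)."""
    return hidden_size // num_attention_heads


HIDDEN_SIZE = 768     # bert_base hidden size
ORIGINAL_HEADS = 12   # bert_base attention heads per layer
KEPT_HEADS = 8        # num_heads after pruning, as in the report
HEAD_SIZE = stock_head_size(HIDDEN_SIZE, ORIGINAL_HEADS)  # 64

before = qkv_weight_shape(HIDDEN_SIZE, ORIGINAL_HEADS, HEAD_SIZE)
after = qkv_weight_shape(HIDDEN_SIZE, KEPT_HEADS, HEAD_SIZE)

# A config that only says num_attention_heads=8 makes the stock code
# recompute the head size as 768 // 8 = 96, so it still expects 768 rows.
expected_by_stock_config = qkv_weight_shape(
    HIDDEN_SIZE, KEPT_HEADS, stock_head_size(HIDDEN_SIZE, KEPT_HEADS)
)

print(before, after, expected_by_stock_config)  # (768, 768) (512, 768) (768, 768)
```

Under these assumptions, the 512-row checkpoint and the 768-row layer that a stock config builds cannot line up. A loader would need the per-head size (or the list of pruned heads) in addition to the head count.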
FastFormers is already impressive as is, but a new paper has just been released: ZORB.
It aims to become an alternative to... backpropagation (you read that right).
Such a breakthrough could allow a ~300X additional speedup for FastFormers (even more in theory), while reducing accuracy by only a few percent in many cases, and apparently even outperforming BP with a lower error count?
Anyway, it's a huge performance breakthrough and should be popularized by FastFormers and Hugging Face.
https://paperswithcode.com/paper/zorb-a-derivative-free-backpropagation
Notes:
Extensive testing of the variable accuracy loss would be welcome.
Do some activation functions need to be adapted (e.g. Mish)?
Hello!
I was reading your paper and looking at the HF repo, and I found huggingface/transformers#8083, where it appeared that you were discussing adding your functionality to their library. However, that never happened, so I am curious whether you discovered something that prevented adding this functionality. Thanks!
Model I am using (Bert, XLNet ...): BERT
Language I am using the model on (English, Chinese ...): English
The problem arises when using:
The tasks I am working on is:
Steps to reproduce the behavior:
python examples\fastformers\run_superglue.py --model_type bert --model_name_or_path ..\model\fastformers\teacher_model\teacher-bert-base --task_name BoolQ --output_dir .\out --do_eval --data_dir ..\dataset\fastformers\BoolQ --per_instance_eval_batch_size 1 --use_fixed_seq_length --do_lower_case --max_seq_length 512 --no_cuda
I get the error below:
FileNotFoundError: [Errno 2] No such file or directory: '..\dataset\fastformers\BoolQ\tensors_dev_..\model\fastformers\teacher_model\teacher-bert-base_512_boolq_True'
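The missing file name embeds the whole `--model_name_or_path` value, including its Windows separators. One plausible cause is that the cache name is built by splitting the path on "/" only, which leaves a backslash path intact. That cause is an assumption and has not been confirmed against the script. The sketch below contrasts that behavior with `pathlib.PureWindowsPath`, which accepts both separators. Both helper names are hypothetical.

```python
from pathlib import PureWindowsPath


def model_tag_slash_only(model_name_or_path: str) -> str:
    """Last path component when splitting on "/" only (assumed current behavior)."""
    return list(filter(None, model_name_or_path.split("/"))).pop()


def model_tag_portable(model_name_or_path: str) -> str:
    """Last path component, treating both "/" and "\\" as separators."""
    return PureWindowsPath(model_name_or_path).name


windows_path = r"..\model\fastformers\teacher_model\teacher-bert-base"
posix_path = "../model/fastformers/student_model/student-4L-312"

print(model_tag_slash_only(windows_path))  # the whole path survives
print(model_tag_portable(windows_path))    # teacher-bert-base
print(model_tag_portable(posix_path))      # student-4L-312
```

With a portable tag, the cache file would be named after `teacher-bert-base` rather than after the full relative path.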
transformers
version: 2.11.0
Congratulations on the work, it looks amazing!
If there is an already fine-tuned model from Hugging Face for, let's say, generating question-answer pairs, such as valhalla/t5-base-qa-qg-hl, how could it be further optimized for inference using your method? I'm a bit lost.
Thank you in advance!
The README says Intel CPUs are required because AVX256 support is necessary.
First of all, AMD CPUs have supported AVX 256 for a long time (since Jaguar, which predates Zen).
True AVX 256 support (not merely being compatible, but running twice as fast as AVX 128) arrived a year ago with Zen 2 CPUs.
Zen 3 CPUs are now being released and are the fastest CPUs in the world; any deep learning researcher should have them, and code should be compatible with them, period.
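Whether a given CPU, Intel or AMD, exposes 256-bit AVX instructions can be checked from the feature flags the OS reports, rather than inferred from the vendor. The minimal Linux sketch below assumes the `flags` lines of `/proc/cpuinfo`, where `avx` and `avx2` appear when the CPU supports them. The helper names are hypothetical.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx2(cpuinfo_text: str) -> bool:
    """True if the reported feature flags include avx2."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            print("avx2 supported:", supports_avx2(f.read()))
    except FileNotFoundError:
        print("/proc/cpuinfo not available (non-Linux system)")
```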
Thank you very much for your work. I would like to ask where the code for Fastformer is, since I intend to use it for vision tasks.
Model I am using (Bert, XLNet ...): student-4L-312
Language I am using the model on (English, Chinese ...): English
The problem arises when using:
The tasks I am working on is:
Steps to reproduce the behavior:
python examples/fastformers/run_superglue.py --model_type bert --model_name_or_path ../model/fastformers/student_model/student-4L-312 --task_name BoolQ --output_dir ./out --do_eval --data_dir ../dataset/fastformers/BoolQ --per_instance_eval_batch_size 1 --do_lower_case --max_seq_length 512 --use_onnxrt --no_cuda
error message:
Traceback (most recent call last):
File "examples/fastformers/run_superglue.py", line 1901, in <module>
main()
File "examples/fastformers/run_superglue.py", line 1840, in main
from onnxruntime import ExecutionMode, InferenceSession, SessionOptions
File "/home/username/.local/lib/python3.6/site-packages/onnxruntime/__init__.py", line 13, in <module>
from onnxruntime.capi._pybind_state import get_all_providers, get_available_providers, get_device, set_seed, \
ImportError: cannot import name 'get_all_providers'
transformers
version: transformers (2.11.0)
Hi,
Currently, this repo only supports fine-tuning on SuperGLUE tasks, am I right? Are you planning to enable fine-tuning for other tasks, such as a generic sequence classification problem?
I believe that supporting fine-tuning only on SuperGLUE tasks strongly limits the applicability of FastFormers.
There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the pr is merged this issue will be closed automatically.
Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.
Hi, I followed the example and successfully converted a GPT-2 model to int8,
but the generated model uses some custom ONNX ops. The op types in the model are:
Shape, Gather, Range, Unsqueeze, Concat, Reshape, Add, LayerNormalization, DynamicQuantizeLinear, Slice, Mul, MatMulInteger, Squeeze, Cast, Split, Sub, Transpose, MatMul, Pow, Div, Where, Softmax, FastGelu, SkipLayerNormalization
Given ops such as DynamicQuantizeLinear and FastGelu, how can I convert it to TensorRT?
The int8 model is 400M, compared with 1.2G for the original, so it is much smaller. If it could run inference via TensorRT, it could be massively accelerated.
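Before attempting a TensorRT conversion, it can help to separate the standard ONNX operators in the graph from ONNX Runtime's contrib operators. FastGelu and SkipLayerNormalization, for example, are ONNX Runtime fused ops registered in the `com.microsoft` domain. DynamicQuantizeLinear and MatMulInteger, by contrast, are standard ONNX operators. The minimal sketch below groups (domain, op_type) pairs by domain. `ops_by_domain` is a hypothetical helper, and the sample pairs are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ops_by_domain(nodes: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (domain, op_type) pairs; an empty domain means the default ONNX domain."""
    grouped = defaultdict(set)
    for domain, op_type in nodes:
        grouped[domain or "ai.onnx"].add(op_type)
    return {domain: sorted(ops) for domain, ops in grouped.items()}


# Illustrative pairs using op types from the report.
sample_nodes = [
    ("", "DynamicQuantizeLinear"),
    ("", "MatMulInteger"),
    ("com.microsoft", "FastGelu"),
    ("com.microsoft", "SkipLayerNormalization"),
    ("", "MatMulInteger"),
]
print(ops_by_domain(sample_nodes))
```

With the `onnx` package installed, the real pairs can be collected as `((n.domain, n.op_type) for n in onnx.load(path).graph.node)`. Op types listed under `com.microsoft` are presumably the ones a converter that only understands standard ONNX would need plugins or replacements for.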
Hello, and we're so happy to see you use PyTorch Lightning!
Just wondering if you have already heard about the new PyTorch Lightning (PL) ecosystem CI, which we would like to invite you to join. You can check out our blog post about it: Stay Ahead of Breaking Changes with the New Lightning Ecosystem CI
Since you use the PL framework for your cool project, we would like to enhance your experience and offer you safe updates to our future releases. At the moment, you run tests with a particular PL version, but the next version may accidentally turn out to be incompatible with your project. We do not intend to change anything on our project side, but we do have a solution: an ecosystem CI that tests both your project and our latest development head, so we can catch such problems very early and avoid releasing a bad version.
What needs to be done?
What will you get?
cc: @Borda
source: https://github.com/microsoft/fastformers/tree/main/examples/question-answering
python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-uncased \
--do_train \
--do_eval \
--do_lower_case \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--per_gpu_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/debug_squad/
gives
06/28/2021 15:44:54 - WARNING - transformers.tokenization_utils_base - Truncation was not explicitely activated but `max_length` is provided a specific value, please use `truncation=True` to explicitely truncate examples to max length. Defaulting to 'only_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you may want to check this is the right behavior.
06/28/2021 15:44:54 - WARNING - transformers.tokenization_utils_base - Truncation was not explicitely activated but `max_length` is provided a specific value, please use `truncation=True` to explicitely truncate examples to max length. Defaulting to 'only_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you may want to check this is the right behavior.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 142, in squad_convert_example_to_features
return_token_type_ids=True,
File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1521, in encode_plus
**kwargs,
File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 372, in _encode_plus
verbose=verbose,
File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 578, in _prepare_for_model
stride=stride,
File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 675, in truncate_sequences
assert len(ids) > num_tokens_to_remove
AssertionError
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "run_squad.py", line 827, in <module>
main()
File "run_squad.py", line 765, in main
train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False)
File "run_squad.py", line 459, in load_and_cache_examples
threads=args.threads,
File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 331, in squad_convert_examples_to_features
disable=not tqdm_enabled,
File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/tqdm/std.py", line 1178, in __iter__
for obj in iterable:
File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 325, in <genexpr>
return (item for chunk in result for item in chunk)
File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 748, in next
raise value
AssertionError
/cc @ykim362
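The warning above says truncation defaulted to the 'only_first' strategy. The assertion in `truncate_sequences` fires when the sequence chosen for truncation is not longer than the number of tokens that must be removed. For SQuAD features, the first sequence is the question, which is usually much shorter than the overflow. A plausible but unconfirmed reading is therefore that the example script and the installed transformers version disagree about which sequence should be truncated. The sketch below is a simplified model of that check, not the library's code. `truncate_pair` and the fixed count of three special tokens are assumptions.

```python
from typing import List, Tuple


def truncate_pair(
    ids: List[int],
    pair_ids: List[int],
    max_length: int,
    strategy: str,
    num_special_tokens: int = 3,  # assumed: [CLS] question [SEP] context [SEP]
) -> Tuple[List[int], List[int]]:
    """Simplified 'only_first' / 'only_second' truncation with the same sanity check."""
    total = len(ids) + len(pair_ids) + num_special_tokens
    num_tokens_to_remove = max(0, total - max_length)
    if num_tokens_to_remove == 0:
        return ids, pair_ids
    if strategy == "only_first":
        # Mirrors the failing assertion: the first sequence must be long enough.
        if len(ids) <= num_tokens_to_remove:
            raise AssertionError("first sequence too short to truncate")
        return ids[:-num_tokens_to_remove], pair_ids
    if strategy == "only_second":
        if len(pair_ids) <= num_tokens_to_remove:
            raise AssertionError("second sequence too short to truncate")
        return ids, pair_ids[:-num_tokens_to_remove]
    raise ValueError(f"unsupported strategy: {strategy}")


question = list(range(10))   # a short question
context = list(range(500))   # a long context span
try:
    truncate_pair(question, context, max_length=384, strategy="only_first")
except AssertionError as err:
    print("only_first fails:", err)

q, c = truncate_pair(question, context, max_length=384, strategy="only_second")
print(len(q) + len(c) + 3)  # 384
```

If that reading is right, there are two likely ways around the assertion: truncate the context (the second sequence) explicitly, or use the transformers version the example was written against.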
Hello,
First of all, thank you very much for open-sourcing this research - I expect it will have a large impact on helping bring Transformers to production!
I have a question about the results in Table 3 of your paper.
Is the distilled model with (4L, 312) obtained from task-agnostic or task-specific distillation? In Section 2 you state
Since we are experimenting with various NLU tasks, the capacity of the optimal student model that preserves accuracy may vary with varying level of task's difficulty. Therefore, we experiment with distilling various sized student models; then, we pick the smaller model among the distilled models that can offer higher accuracy than the original BERT model for each task.
and I could not tell from the codebase which approach you used to generate the numbers on Table 3.
Thank you!
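The selection rule quoted from Section 2 can be read as follows: among the distilled students that beat the original BERT model's accuracy on a task, keep the smallest. The minimal sketch below implements that reading. The candidate names and accuracies are hypothetical, used purely for illustration, and are not results from the paper. Breaking ties by higher accuracy is an added assumption.

```python
from typing import List, NamedTuple, Optional


class Student(NamedTuple):
    name: str
    num_params: int   # model size used to rank "smaller"
    accuracy: float   # task accuracy on the dev set


def pick_student(candidates: List[Student], teacher_accuracy: float) -> Optional[Student]:
    """Smallest student whose accuracy exceeds the original model's; None if none does."""
    eligible = [s for s in candidates if s.accuracy > teacher_accuracy]
    if not eligible:
        return None
    # Prefer fewer parameters; break ties by higher accuracy (an added assumption).
    return min(eligible, key=lambda s: (s.num_params, -s.accuracy))


# Hypothetical numbers, for illustration only.
candidates = [
    Student("student-A", 14_000_000, 0.70),
    Student("student-B", 30_000_000, 0.76),
    Student("student-C", 60_000_000, 0.78),
]
print(pick_student(candidates, teacher_accuracy=0.75))
```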
XLNet is supported by Transformers and, as of 2020, remains the pretrained language model with the most state-of-the-art results on paperswithcode.com (far ahead of BERT, ALBERT, etc.).
I am trying to convert the RoBERTa-large model to FastFormers, and I am facing this issue with the data files after preprocessing:
runcate_sequences
assert len(ids) > num_tokens_to_remove
AssertionError
What led to this error?
A link to the original question on Stack Overflow:
Hi,
Thanks for sharing this fantastic repository!
Could you provide the hyperparameters used when finetuning the superglue tasks?
For example, I noticed num_train_epochs=10 and a learning rate of 1e-5 in the README.
What about the batch_size?
Does every task share the same hyperparameters?
Hello again @ykim362,
I'm trying to reproduce your distillation results from Section 2 of the FastFormers paper and I have a few questions I was hoping you could help with:
General_TinyBERT(Nlayer-Ddim) or General_TinyBERT_v2(Nlayer-Ddim)?
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("huawei-noah/TinyBERT_General_4L_312D")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/lewtun/git/transformers/src/transformers/models/auto/tokenization_auto.py", line 345, in from_pretrained
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/Users/lewtun/git/transformers/src/transformers/models/auto/configuration_auto.py", line 360, in from_pretrained
raise ValueError(
ValueError: Unrecognized model in huawei-noah/TinyBERT_General_4L_312D. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: retribert, mt5, t5, mobilebert, distilbert, albert, bert-generation, camembert, xlm-roberta, pegasus, marian, mbart, mpnet, bart, blenderbot, reformer, longformer, roberta, deberta, flaubert, fsmt, squeezebert, bert, openai-gpt, gpt2, transfo-xl, xlnet, xlm-prophetnet, prophetnet, xlm, ctrl, electra, encoder-decoder, funnel, lxmert, dpr, layoutlm, rag, tapas
Did you have to do something special to load TinyBERT in your FastFormers experiments? Looking at your source code (link), it seems you use the standard from_pretrained methods of the Transformers library, so I'm curious whether you encountered the same problem.
4. Did you use the data augmentation technique from TinyBERT (i.e. combine BERT with GloVe word embeddings) in your experiments? Looking at your codebase, I could not see this being used, but just want to double-check since it appears to play an important role in the TinyBERT paper.
5. Finally, what values of state_loss_ratio and att_loss_ratio did you use to generate the distilled model in Table 3 of your paper?
For reference, I am not working directly from the fastformers repo, so I have the following dependencies:
- `transformers` version: 4.0.0-rc-1
- Platform: Linux-4.15.0-72-generic-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.6.9
- PyTorch version (GPU?): 1.6.0 (True)
- Tensorflow version (GPU?): 2.3.0 (True)
- Using GPU in script?: (True)
- Using distributed or parallel set-up in script?: None
Thank you!
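The ValueError in the TinyBERT question says `AutoConfig` needs a `model_type` key in `config.json`, or a recognizable string in the model name. That fix is not taken from the FastFormers code; it is an assumed workaround. One option is to download the checkpoint to a local directory and add `"model_type": "bert"` to its `config.json`. Another is to skip the Auto classes and call `BertTokenizer.from_pretrained` directly. The minimal sketch below covers the first option using only the standard library. The helper names are hypothetical.

```python
import json
from pathlib import Path


def with_model_type(config: dict, model_type: str = "bert") -> dict:
    """Return a copy of a config dict that carries a model_type key."""
    if "model_type" in config:
        return dict(config)
    return {**config, "model_type": model_type}


def patch_config_file(config_path: str, model_type: str = "bert") -> dict:
    """Add model_type to a local config.json in place if it is missing."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    patched = with_model_type(config, model_type)
    if patched != config:
        path.write_text(json.dumps(patched, indent=2), encoding="utf-8")
    return patched


# Illustrative subset of a config, not the real TinyBERT file.
example = {"hidden_size": 312, "num_hidden_layers": 4}
print(with_model_type(example))
```

After patching, `AutoTokenizer.from_pretrained` would presumably be pointed at the local directory rather than the hub name.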