Comments (8)
Ah, sorry about that; the code is a bit clumsy here. You see those error messages because the converter expects checkpoints to contain identifiers for compiled model weights, but this checkpoint was saved without them. If you run the `load_local_model.py` script with `impl.compile_torch=False`, it should work.
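For context, this kind of name mismatch usually comes from `torch.compile` wrapping the module and prefixing every parameter name with `_orig_mod.`. If re-running with the flag is not an option, a generic workaround (a hypothetical helper, not part of the cramming codebase) is to normalize the checkpoint's keys before loading:

```python
def strip_compile_prefix(state_dict, prefix="_orig_mod."):
    """Drop the prefix that torch.compile adds to every parameter name,
    so a compiled checkpoint can be loaded into an uncompiled module."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }
```

You would then call something like `model.load_state_dict(strip_compile_prefix(torch.load(path)))`; the same idea works in reverse (adding the prefix) if the converter expects compiled names.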
from cramming.
No problem - and I think you had already included those instructions somewhere; I just overlooked them, so that is on me.
I can close this issue for now, but one quick question: I presume that with the modified architecture it will not currently work with the Auto classes from the HF library? I mainly ask because the model card for your model (https://huggingface.co/JonasGeiping/crammed-bert) implies that it does. Was part of the model card autogenerated?
For now, to reload a model using the crammed-bert architecture, do we need to use the codebase provided here?
Thanks again for the help and the great repo!
You can (if everything works correctly), provided you import the cramming package first, as shown in the documentation. Importing it registers the model as an additional `AutoModelForMaskedLM`.
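For anyone curious what that registration does under the hood: transformers exposes `AutoConfig.register` and `AutoModelForMaskedLM.register` for custom architectures. A minimal toy sketch of the mechanism (`ToyConfig`/`ToyModel` are hypothetical stand-ins, not the actual crammed-bert classes):

```python
import torch
from transformers import (
    AutoConfig, AutoModelForMaskedLM, PretrainedConfig, PreTrainedModel,
)

class ToyConfig(PretrainedConfig):
    model_type = "toy-mlm"  # hypothetical model-type tag

class ToyModel(PreTrainedModel):
    config_class = ToyConfig

    def __init__(self, config):
        super().__init__(config)
        self.head = torch.nn.Linear(8, 8)

# This is roughly what `import cramming` does for crammed-bert:
AutoConfig.register("toy-mlm", ToyConfig)
AutoModelForMaskedLM.register(ToyConfig, ToyModel)

# After registration, the Auto* factory resolves the custom class:
model = AutoModelForMaskedLM.from_config(ToyConfig())
```

Once the real registration has run, `AutoModelForMaskedLM.from_pretrained("JonasGeiping/crammed-bert")` can dispatch to the crammed architecture.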
You are absolutely correct, and apologies for not noticing that and probably wasting your time :). Thanks again!
No problem
Sorry to come back again, but I have now run into one (I presume final) issue. When you use the `AutoModelForSequenceClassification` defined by the cramming library but load a model that was pre-trained with the MLM objective, it does not seem to allow adjusting `num_labels` via the usual argument passing, e.g.:

from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoModelForMaskedLM, AutoConfig
classifier_model = AutoModelForSequenceClassification.from_pretrained("JonasGeiping/crammed-bert", num_labels=2)

It ends up passing None to `torch.nn.Linear`, because the classification head looks for `num_labels` in the hydra config rather than in the arguments, which fails with:

TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:

Any thoughts? My main desire is to use the model in a more straightforward fashion.
########## UPDATE ##########

For now, my crude fix is to change the `num_labels` derivation inside `crammed_bert.py` from:

self.num_labels = self.cfg.num_labels

which uses the config created by the OmegaConf class, to:

self.num_labels = self.config.num_labels

which uses the config provided by the AutoModel class. It works, but it doesn't seem ideal.
Hm, that seems like a reasonable fix for now. Really, though, the whole translation between the hydra config that the model was originally trained with and the config that huggingface expects is not ideal in the long run.
Sure, it's no problem really; my use case is quite specific and I just need to move away from the hydra config. It's great work, and it generally meshes fine with huggingface.