Comments (8)
There are two variants of the V100: 16 GB and 32 GB. You need the 32 GB one. I can recommend the https://immers.cloud service for that (I'm not affiliated).
from ru-gpts.
I just don't understand why GPT-2 Large won't run while the original GPT-2 774M runs fine. Maybe I'm doing something wrong, or is there a way to compress the model to fit into 16 GB of VRAM?
Runs or trains? You need way less memory to run a model.
I meant training. GPT-2 Large trains fine, and actually fast enough, on a V100 with the gpt2-simple library, but the model of the same size from Sberbank can't be fine-tuned, oddly enough. Same number of weights... What's the difference if the architecture is basically identical and only the training sets differ?
I've taken a quick look at gpt2-simple. It seems they use SGD and gradient checkpointing to fit the large model. I'd suggest setting batch_size to 1, the optimizer to SGD, and fp16 with opt level O2, and seeing how it goes.
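To see why the optimizer choice matters, here is a back-of-the-envelope estimate of the static training memory (weights, gradients, optimizer state) for the 774M-parameter model. It is a sketch under loud assumptions: everything is counted in fp32, and activations, the CUDA context and fp16/O2 mixed precision (whose master-weight bookkeeping depends on the apex implementation) are ignored, so the numbers are lower bounds, not measurements.

```python
# Rough lower-bound estimate of static training memory for a 774M-parameter model.
# Assumption: every tensor is stored in fp32; activations, the CUDA context and
# allocator overhead are NOT counted, so real usage is higher.

N_PARAMS = 774_000_000  # GPT-2 774M, as mentioned in the thread

# Bytes per parameter: (weights, gradients, optimizer state)
CONFIGS = {
    "fp32 + Adam (m and v)": (4, 4, 8),     # Adam keeps two fp32 moment buffers
    "fp32 + SGD with momentum": (4, 4, 4),  # one momentum buffer
    "fp32 + plain SGD": (4, 4, 0),          # no optimizer state at all
}


def static_gb(n_params: int, weight_b: int, grad_b: int, state_b: int) -> float:
    """Return weights + gradients + optimizer state in decimal gigabytes."""
    return n_params * (weight_b + grad_b + state_b) / 1e9


if __name__ == "__main__":
    for name, (w, g, s) in CONFIGS.items():
        print(f"{name:28s} {static_gb(N_PARAMS, w, g, s):6.2f} GB")
```

Under these assumptions Adam's two moment buffers alone take about 6.2 GB, so switching to plain SGD roughly halves the static footprint. Gradient checkpointing, by contrast, targets the activation memory that this estimate leaves out.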
UPDATE: It worked! The O2 opt level for fp16 finally fixed the out-of-memory error. But now there's another issue, and I have no idea where it comes from. Honestly, the whole dataset loading seems faulty, since this is the second error I've hit with it.
@king-menin, could you help figure out what it is?
UPDATE 2: The generated features file seems to contain something very odd, with what looks like broken encoding. Maybe that's the problem?
Update #3: With line_by_line, fine-tuning starts and runs for a few iterations, but then it apparently stops on some line. Does that mean some lines from the dataset just can't be loaded?
Update #4: Yes, it seems so. A dataset of Russian horror stories worked, while a dataset of articles from the Foundation didn't. It looks like we'll need a function to preprocess the dataset...
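One possible shape for such a preprocessing function, aimed only at the broken-encoding symptom from Update 2, is sketched below. The function names are hypothetical and nothing here comes from ru-gpts itself: it decodes the file line by line, reports the line numbers that are not valid UTF-8, and drops blank lines. Note that the eventual fix in this thread turned out to be dataset formatting, so this check is a diagnostic aid, not the solution.

```python
from pathlib import Path
from typing import List, Tuple


def split_dataset_lines(raw: bytes) -> Tuple[List[str], List[int]]:
    """Decode a dataset line by line (hypothetical helper).

    Returns (good_lines, bad_line_numbers). Lines that are not valid UTF-8
    are reported by their 1-based number and dropped; blank lines are
    dropped silently.
    """
    good: List[str] = []
    bad: List[int] = []
    for number, chunk in enumerate(raw.splitlines(), start=1):
        try:
            text = chunk.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
            continue
        text = text.strip()
        if text:
            good.append(text)
    return good, bad


def clean_file(src: Path, dst: Path) -> List[int]:
    """Write only the decodable, non-empty lines of src to dst as UTF-8."""
    good, bad = split_dataset_lines(src.read_bytes())
    dst.write_text("\n".join(good) + "\n", encoding="utf-8")
    return bad
```

Running `clean_file` before fine-tuning and printing the returned line numbers shows which lines would have been rejected.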
PROBLEM SOLVED: the GPT-2 model needs formatted input. You can't just feed it raw text; you need to use the <|n|> delimiter in your dataset, otherwise it isn't recognized. Honestly, you should update the pretraining section of the README, because this cost me two days of headaches...
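The formatting step described above can be sketched as a small function that terminates every document with the delimiter before the text is handed to the training script. The delimiter string is copied verbatim from the comment; the exact token the ru-gpts scripts expect, and whether it should follow or separate documents, are assumptions here, so check the repository README before relying on it.

```python
from typing import Iterable, List

# Copied verbatim from the comment above; the exact token expected by the
# ru-gpts training scripts is an assumption, verify it against the README.
DELIMITER = "<|n|>"


def format_documents(docs: Iterable[str], delimiter: str = DELIMITER) -> str:
    """Terminate each non-empty document with `delimiter` (hypothetical helper).

    Surrounding whitespace is stripped and empty documents are skipped, so
    the output never contains two delimiters in a row.
    """
    parts: List[str] = []
    for doc in docs:
        text = doc.strip()
        if text:
            parts.append(text + delimiter)
    return ("\n".join(parts) + "\n") if parts else ""
```

The `delimiter` parameter is there so the token can be swapped without editing the function if the README specifies a different one.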
Related Issues (20)
- describe carbon emission
- ruGPT3XL_generation.ipynb not working
- News course
- AssertionError: model parallel group is not initialized
- The model requires `num_beams`, although it is not needed in the example
- Ru-gpts for chit-chat bot
- Live stream of apex legends
- Games
- Correct data format for fine-tuning RUGPT3 models
- A
- The XL Model and the latest DeepSpeed
- How to set it up for question/answer?
- Apackage missing
- Hello
- Are there hardware requirements to execute the script?
- Speeding up rugpt3-large inference
- How do I get the embeddings, and what is their length?
- Unable to use RuGPT3FinetuneHF.ipynb Colab notebook
- Link to code implementation is not available
- No "nvcc" utilite founded during environment installation