kojimano / megatron-deepspeed-abci
License: Other
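Target
・Preprocessed data to be transferred to ABCI
Timing
・Before the rehearsal
・Are we currently tokenizing during training, or tokenizing in advance?
・How much does performance drop if we tokenize during training?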
Which tokenizer will we use?
Measure the number of tokens in the CA data.
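As a rough illustration of the token-count measurement, here is a minimal sketch. It assumes the corpus is available as an iterable of text lines and that the tokenizer can be wrapped as a function from a string to a list of tokens. `count_tokens` and `whitespace_tokenize` are hypothetical names; the whitespace splitter is only a stand-in (it is not suitable for Japanese) and would be replaced by whichever tokenizer is chosen.

```python
from typing import Callable, Iterable, List


def count_tokens(lines: Iterable[str], tokenize: Callable[[str], List[str]]) -> int:
    """Return the total number of tokens over all non-empty lines."""
    total = 0
    for line in lines:
        text = line.strip()
        if text:  # skip blank lines so they do not reach the tokenizer
            total += len(tokenize(text))
    return total


def whitespace_tokenize(text: str) -> List[str]:
    # Placeholder only (assumption): plug in the real tokenizer here.
    return text.split()


sample = ["hello world", "", "a b c"]
print(count_tokens(sample, whitespace_tokenize))
```

Because the function takes any iterable, a file handle opened on a large corpus can be passed directly without loading the whole file into memory.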
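Perform instruction tuning after pre-training. (I recall hearing somewhere that it has a larger effect than RLHF.)
What data should we use if we do this in Japanese? Also, the data (tasks) for evaluation and the data (tasks) for instruction tuning will probably need to be kept separate.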
From the 04/27 meeting
Policy
Therefore, we will use the following.
Preprocessing priorities
Also related to #9.
#GPUs | #Layers | DP | MP | PP | MBS | GBS | SL | AC | Max Mem (allocated) | Max Mem (reserved) | Sec/it | TFLOPs | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 4 | 1 | 2 | 2 | 1 | 8 | 1024 | Yes | 8584 MiB | 9936 MiB | 0.5 | 45.63 | 4/28 |
4 | 4 | 1 | 2 | 2 | 1 | 8 | 1024 | No | 8585 MiB | 10278 MiB | 0.44 | 45.09 | 4/28 |
4 | 2 | 1 | 2 | 2 | 1 | 8 | 2048 | Yes | 4458 MiB | 5336 MiB | 0.6 | 47.8 | 4/28 |
4 | 4 | 1 | 2 | 2 | 1 | 8 | 2048 | Yes | 8525 MiB | 10142 MiB | 0.97 | 51.64 | 4/28 |
4 | 4 | 1 | 1 | 4 | 1 | 8 | 2048 | Yes | 6057 / 10970 MiB (OOM) | 7980 / 13278 MiB (OOM) | - | 43.7 | 4/28 |
4 | 2 | 1 | 4 | 1 | 1 | 8 | 2048 | Yes | 4458 MiB | 4458 MiB | 0.6 | 47.5 | 4/28 |
4 | 4 | 1 | 4 | 1 | 1 | 8 | 2048 | No | 7462 MiB | 9236 MiB | 0.8 | 44.6 | 4/28 |
4 | 4 | 1 | 4 | 1 | 1 | 8 | 2048 | Yes | 7463 MiB | 8134 MiB | 1.0 | 47.0 | 4/28 |
4 | 4 | 1 | 4 | 1 | 2 | 8 | 2048 | Yes | 7462 MiB | 8528 MiB | 0.8 | 60.9 | 4/28 |
4 | 4 | 1 | 4 | 1 | 4 | 8 | 2048 | Yes | 7479 MiB | 8890 MiB | 0.8 | 60.9 | 4/28 |
4 | 4 | 1 | 4 | 1 | 4 | 8 | 2048 | No | 11793 MiB | 13516 MiB | 0.6 | 57.9 | 4/28 |
4 | 6 | 4 | 1 | 1 | 1 | 8 | 2048 | Yes | 10467 MiB (OOM) | 11272 MiB (OOM) | - | - | 4/28 |
Memory usage seems to increase after logging?
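The parallelism columns in the table above can be sanity-checked with two relations that are commonly assumed for Megatron-style 3D parallelism: #GPUs = DP × MP × PP, and gradient-accumulation steps per iteration = GBS / (MBS × DP). The sketch below encodes both. `ParallelConfig` and its methods are hypothetical helpers written for illustration, not part of Megatron-DeepSpeed.

```python
from dataclasses import dataclass


@dataclass
class ParallelConfig:
    n_gpus: int  # total number of GPUs
    dp: int      # data-parallel size
    mp: int      # tensor (model) parallel size
    pp: int      # pipeline-parallel size
    mbs: int     # micro batch size
    gbs: int     # global batch size

    def layout_is_consistent(self) -> bool:
        """True when DP x MP x PP uses exactly the available GPUs."""
        return self.dp * self.mp * self.pp == self.n_gpus

    def grad_accum_steps(self) -> int:
        """Micro-batches each data-parallel replica runs per iteration."""
        per_step = self.mbs * self.dp
        if self.gbs % per_step != 0:
            raise ValueError(
                f"GBS {self.gbs} is not divisible by MBS x DP = {per_step}"
            )
        return self.gbs // per_step


# Values taken from the row with MP=4, PP=1, MBS=2, GBS=8.
cfg = ParallelConfig(n_gpus=4, dp=1, mp=4, pp=1, mbs=2, gbs=8)
print(cfg.layout_is_consistent(), cfg.grad_accum_steps())
```

Under these assumptions, a global batch size that is not a multiple of MBS × DP is rejected before a job is submitted.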
#GPUs | Size | DP | MP | PP | MBS | GBS | SL | AC | Zero | Max Mem (allocated) | Max Mem (reserved) | TFLOPs | Sec/it | Est. Aggr. PetaFLOPs | B tokens | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
32 | 10B | 1 | 4 | 8 | 1 | 90 | 1024 | No | 1 | OOM | OOM | - | - | - | - | 4/28 |
32 | 10B | 1 | 4 | 8 | 1 | 90 | 2048 | Yes | 1 | - MiB | - MiB | 39.3 | 12.4 | - | 152 | 4/28 |
32 | 10B | 1 | 4 | 8 | 2 | 90 | 2048 | Yes | 1 | 7875 MiB | 8892 MiB | 40.1 | 12.2 | - | 155 | 4/28 |
32 | 10B | 1 | 4 | 8 | 4 | 90 | 2048 | Yes | 1 | - MiB | - MiB | - | - | - | - | 4/28 |
32 | 13B | 1 | 4 | 8 | 1 | 8 | 2048 | Yes | 1 | 7568 MiB | 8586 MiB | 23.5 | 2.3 | - | - | 4/28 |
32 | 13B | 1 | 4 | 8 | 1 | 512 | 2048 | Yes | 1 | 8966 MiB | 10100 MiB | 42.7 | 83.5 | - | - | 4/28 |
32 | 13B | 1 | 4 | 8 | 1 | 90 | 1024 | No | 1 | OOM | OOM | - | - | - | - | 4/28 |
32 | 13B | 1 | 4 | 8 | 1 | 90 | 2048 | Yes | 1 | 8964 MiB | 10124 MiB | 40.0 | 15.4 | - | 123 | 4/28 |
32 | 13B | 1 | 4 | 8 | 2 | 90 | 2048 | Yes | 1 | 9303 MiB | 10648 MiB | 48.7 | 12.8 | - | 148 | 4/28 |
32 | 13B | 1 | 4 | 8 | 4 | 88 | 2048 | Yes | 1 | 12243 MiB (OOM) | 14108 MiB (OOM) | 44.2 | 13.8 | - | - | 4/28 |
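To compare rows of the table above on a common scale, the Sec/it column can be converted into token throughput: one iteration processes GBS × SL tokens. The sketch below does this conversion and estimates the wall-clock days needed for a given token budget. The function names are hypothetical, and this is not necessarily how the "B tokens" column was derived.

```python
def tokens_per_iteration(gbs: int, seq_len: int) -> int:
    """Tokens consumed by one optimizer step."""
    return gbs * seq_len


def tokens_per_second(gbs: int, seq_len: int, sec_per_it: float) -> float:
    """Training throughput in tokens per second."""
    return tokens_per_iteration(gbs, seq_len) / sec_per_it


def days_for_budget(token_budget: float, gbs: int, seq_len: int, sec_per_it: float) -> float:
    """Wall-clock days to process token_budget tokens at a constant speed."""
    seconds = token_budget / tokens_per_second(gbs, seq_len, sec_per_it)
    return seconds / 86400.0


# Values from the 10B row with MBS=2: GBS=90, SL=2048, 12.2 sec/it.
print(tokens_per_iteration(90, 2048))
print(round(tokens_per_second(90, 2048, 12.2)))
```

The estimate assumes a constant iteration time and ignores checkpointing, restarts, and evaluation overhead.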
#GPUs | #Layers | DP | MP | PP | MBS | GBS | AC | Zero | Max Mem (allocated) | Max Mem (reserved) | TFLOPs | Sec/it | B tokens | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
32 | 10 | 4 | 1 | 1 | 1 | 88 | Yes | None | 7540 MiB | 9116 MiB | 43.2 | 1.2 | - | 5/2 |
32 | 10 | 4 | 1 | 1 | 1 | 88 | Yes | 1 | 5050 MiB | - MiB | 43.1 | 1.2 | - | 5/2 |
32 | 10 | 4 | 1 | 1 | 1 | 88 | Yes | 2 | 5490 MiB | - MiB | 42.9 | 1.2 | - | 5/2 |
#GPUs | Size | DP | MP | PP | MBS | GBS | SL | AC | AC chunk | DAC | Max Mem (allocated) | Max Mem (reserved) | TFLOPs | Sec/it | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 10B (6 layers) | 1 | 4 | 1 | 1 | 88 | 2048 | No | - | No | 7758 MiB | 8732 MiB | 46.25 | 8.5 | 4/28 |
4 | 10B (6 layers) | 1 | 4 | 1 | 2 | 88 | 2048 | No | - | No | 10614 MiB (OOM) | 11858 MiB (OOM) | 49.75 | 7.9 | 4/28 |
4 | 10B (6 layers) | 1 | 4 | 1 | 1 | 88 | 2048 | Yes | 1 | No | 6931 MiB | 7162 MiB | 46.36 | 11.3 | 4/28 |
4 | 10B (6 layers) | 1 | 4 | 1 | 2 | 88 | 2048 | Yes | 1 | No | 6931 MiB | 7538 MiB | 50.33 | 10.4 | 4/28 |
4 | 10B (6 layers) | 1 | 4 | 1 | 2 | 88 | 2048 | Yes | 1 | Yes | 6979 MiB | 7242 MiB | 49.9 | 10.5 | 4/28 |
4 | 10B (6 layers) | 1 | 4 | 1 | 4 | 88 | 2048 | Yes | 1 | Yes | 7027 MiB | 8808 MiB | 53.05 | 9.9 | 4/28 |
4 | 10B (6 layers) | 1 | 4 | 1 | 8 | 88 | 2048 | Yes | 1 | Yes | 7124 MiB | 10592 MiB | 53.26 | 9.9 | 4/28 |
4 | 10B (6 layers) | 1 | 4 | 1 | 2 | 88 | 2048 | Yes | 2 | Yes | - MiB | - MiB | - | - | bug did not work ... |
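(Something we would like to do in the next project or later?)
[Discussion with Kojima-san about training our own tokenizer in the future]
What format should the data be saved in?
・Plain text with one sentence per line? jsonl?
・Compress it? (trade-off with disk capacity)
・Split it into multiple files? (trade-off with the file-loading implementation and memory)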
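One way the three options above could be combined is gzip-compressed JSON Lines split into fixed-size shards and read back lazily. The sketch below uses only the Python standard library. The `{"text": ...}` record layout, the shard file naming, and the function names are all assumptions made for illustration, not a format decided in the discussion.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, List


def write_shards(texts: Iterable[str], out_dir: Path, docs_per_shard: int) -> List[Path]:
    """Write texts as gzip-compressed jsonl, docs_per_shard records per file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    handle = None
    for i, text in enumerate(texts):
        if i % docs_per_shard == 0:  # start a new shard
            if handle is not None:
                handle.close()
            path = out_dir / f"shard-{len(paths):05d}.jsonl.gz"
            handle = gzip.open(path, "wt", encoding="utf-8")
            paths.append(path)
        # ensure_ascii=False keeps Japanese text readable in the file
        handle.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
    if handle is not None:
        handle.close()
    return paths


def read_shards(paths: Iterable[Path]) -> Iterator[str]:
    """Yield texts one at a time so only one line is held in memory."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)["text"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shards = write_shards(["一つ目の文。", "二つ目の文。", "third"], Path(tmp), 2)
        print(len(shards), list(read_shards(shards)))
```

Compared with one-sentence-per-line plain text, jsonl tolerates newlines inside a record and leaves room for metadata fields. Sharding lets a loader process files independently, at the cost of a slightly more involved reader.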