
megatron-deepspeed-abci's Issues

Things we want to ask

  • Are we currently tokenizing during training, or tokenizing in advance?
  • How much does performance drop if we tokenize during training?

About instruction tuning

Perform instruction tuning after pretraining. (I remember hearing somewhere that it is more effective than RLHF.)

How should we source the data when doing this in Japanese? Also, the evaluation data (tasks) and the instruction-tuning data (tasks) will probably need to be kept separate.
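One way to keep the two apart is to split at the task level rather than the example level, so that no evaluation task ever appears in the tuning data. A minimal sketch, assuming each example carries a task name; the function name and the task names below are hypothetical placeholders, not a chosen dataset:

```python
import random


def split_tasks_disjoint(task_names, eval_fraction=0.2, seed=0):
    """Split task names into disjoint instruction-tuning and evaluation sets.

    Splitting by task (not by example) keeps every example of an
    evaluation task out of the instruction-tuning data.
    """
    tasks = sorted(set(task_names))
    rng = random.Random(seed)
    rng.shuffle(tasks)
    n_eval = max(1, round(len(tasks) * eval_fraction))
    eval_tasks = set(tasks[:n_eval])
    tune_tasks = set(tasks[n_eval:])
    return tune_tasks, eval_tasks


# Hypothetical task names, for illustration only.
tasks = ["qa_ja", "summarization_ja", "translation_ja_en", "nli_ja", "sentiment_ja"]
tune_tasks, eval_tasks = split_tasks_disjoint(tasks, eval_fraction=0.4)
print("tuning:", sorted(tune_tasks))
print("evaluation:", sorted(eval_tasks))
```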

Debug data processing pipelines

  • Many empty “text” fields in the jsonl produced by the Wikipedia preprocessing script
  • Check that the Abeja tokenizer works as expected
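For the first item, a small audit script can count how many records actually have an empty “text” field before digging into the preprocessing script itself. A sketch, assuming one JSON object per line with a `text` key; `audit_jsonl_text` is a hypothetical helper name:

```python
import json
from collections import Counter


def audit_jsonl_text(lines, field="text"):
    """Classify each jsonl line by the state of its `field` value."""
    stats = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            stats["blank_line"] += 1
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            stats["bad_json"] += 1
            continue
        value = record.get(field) if isinstance(record, dict) else None
        if not isinstance(value, str):
            stats["missing"] += 1
        elif not value.strip():
            stats["empty"] += 1  # present but blank or whitespace-only
        else:
            stats["ok"] += 1
    return stats


sample = [
    '{"title": "東京", "text": "東京は日本の首都である。"}',
    '{"title": "空の記事", "text": ""}',
    '{"title": "空白のみ", "text": "   "}',
    '{"title": "textなし"}',
    "not json",
    "",
]
stats = audit_jsonl_text(sample)
print(dict(stats))
```

Since a file object iterates line by line, the same function can be pointed at the real script output by passing an open file handle (opened with `encoding="utf-8"`).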

Preparing datasets to use on SambaNova

From the 04/27 meeting

Policy

  • Avoid datasets such as mC4 whose preprocessing is likely to take a long time
  • This is positioned strictly as practice for the ABCI Grand Challenge

Therefore, we will use the following:

  • Wikipedia (not yet downloaded)
  • CC100 (already downloaded)
    • Because it is relatively clean

Preprocessing priorities

  • Unicode normalization (NFD or NFC)
  • Deduplication
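Both steps can be sketched with the standard library. This assumes exact-match deduplication on the normalized text; the notes do not say which dedup method is intended, and near-duplicate detection is out of scope here. The example shows why normalization should come first: a precomposed "ガ" and "カ" plus a combining dakuten only count as duplicates once normalized.

```python
import hashlib
import unicodedata


def normalize_and_dedup(docs, form="NFC"):
    """Unicode-normalize each document, then drop exact duplicates (first one wins)."""
    seen = set()
    kept = []
    for doc in docs:
        text = unicodedata.normalize(form, doc)
        # Store a hash of the normalized text so `seen` stays small for long documents.
        digest = hashlib.sha1(text.encode("utf-8")).digest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(text)
    return kept


# "ガス" written with precomposed ガ (U+30AC) vs. カ + combining dakuten (U+30AB U+3099)
docs = ["\u30ac\u30b9", "\u30ab\u3099\u30b9", "水"]
nfc = normalize_and_dedup(docs, "NFC")
nfd = normalize_and_dedup(docs, "NFD")
print(len(nfc), len(nfd))
```

Either form merges the two spellings; the choice between NFC and NFD only changes which representation is kept.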

ABCI GPU benchmarks

Model benchmarking results

Overview

Model hyperparameters

  • 13B OPT-13B
    • --num-layers 40
    • --hidden-size 5120
    • --num-attention-heads 40
  • 10B GLM-10B <== use this one
    • --num-layers 48
    • --hidden-size 4096
    • --num-attention-heads 64
  • 10B Megatron-10B
    • --num-layers 50
    • --hidden-size 4096
    • --num-attention-heads 32
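As a rough sanity check on these shapes, a decoder-only Transformer with a 4h-wide MLP has about 12·L·h² non-embedding parameters (4h² for the attention projections, 8h² for the MLP). A sketch under that assumption, ignoring biases and LayerNorm weights, and counting the embedding table only if a vocabulary size is passed; `approx_params` is a hypothetical helper:

```python
def approx_params(num_layers, hidden_size, vocab_size=0):
    """Rough decoder-only Transformer parameter count.

    Per layer: 4*h^2 for attention (Q, K, V, output projections) plus
    8*h^2 for an MLP with a 4*h intermediate size. Biases and LayerNorm
    weights are ignored; embeddings are counted only if vocab_size > 0.
    """
    return 12 * num_layers * hidden_size**2 + vocab_size * hidden_size


configs = {  # name: (--num-layers, --hidden-size, --num-attention-heads)
    "OPT-13B": (40, 5120, 40),
    "GLM-10B": (48, 4096, 64),
    "Megatron-10B": (50, 4096, 32),
}
for name, (layers, hidden, heads) in configs.items():
    assert hidden % heads == 0, "hidden size must split evenly across heads"
    n = approx_params(layers, hidden)
    print(f"{name}: ~{n / 1e9:.1f}B non-embedding params, head dim {hidden // heads}")
```

This prints roughly 12.6B, 9.7B, and 10.1B, with head dimensions 128, 64, and 128, so the GLM-10B shape trades depth and narrower heads for the same hidden size as Megatron-10B.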

Notations

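  • MBS = micro batch size
  • GBS = global batch size
  • DP / MP / PP = data-parallel / model (tensor)-parallel / pipeline-parallel degree
  • SL = sequence length
  • AC = activation checkpointing
  • Zero = DeepSpeed ZeRO stage
  • Sec/it = seconds per iteration
  • Est. Aggr. PetaFLOPs = TFLOPs * Nodes / 1024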
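The Est. Aggr. PetaFLOPs formula, together with the usual Megatron relation between GBS, MBS, and DP, can be written as small helpers. A sketch; the helper names are our own, and the divisibility check mirrors what Megatron-LM enforces for the global batch size, to our understanding:

```python
def est_aggr_petaflops(tflops, nodes):
    """Est. Aggr. PetaFLOPs exactly as defined in the Notations list."""
    return tflops * nodes / 1024


def num_micro_batches(gbs, mbs, dp):
    """Micro-batches accumulated per iteration: GBS = MBS * DP * num_micro_batches."""
    if gbs % (mbs * dp) != 0:
        raise ValueError(f"GBS={gbs} is not divisible by MBS*DP={mbs * dp}")
    return gbs // (mbs * dp)


def tokens_per_second(gbs, seq_len, sec_per_it):
    """Training throughput in tokens/s: each iteration consumes GBS * SL tokens."""
    return gbs * seq_len / sec_per_it


# 10B row from Experiments-1: DP=1, MBS=2, GBS=90, SL=2048, 12.2 s/it
print(num_micro_batches(90, 2, 1))  # 45
print(round(tokens_per_second(90, 2048, 12.2)))  # 15108
```

For instance, with DP = 1, a GBS of 90 is divisible by MBS 1 and 2 but not by MBS 4, whereas a GBS of 88 is divisible by 4.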

Preliminary Experiments

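| #GPUs | #Layers | DP | MP | PP | MBS | GBS | SL | AC | Max Mem (allocated) | Max Mem (reserved) | Sec/it | TFLOPs | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 4 | 1 | 2 | 2 | 1 | 8 | 1024 | Yes | 8584 MiB | 9936 MiB | 0.5 | 45.63 | 4/28 |
| 4 | 4 | 1 | 2 | 2 | 1 | 8 | 1024 | No | 8585 MiB | 10278 MiB | 0.44 | 45.09 | 4/28 |
| 4 | 2 | 1 | 2 | 2 | 1 | 8 | 2048 | Yes | 4458 MiB | 5336 MiB | 0.6 | 47.8 | 4/28 |
| 4 | 4 | 1 | 2 | 2 | 1 | 8 | 2048 | Yes | 8525 MiB | 10142 MiB | 0.97 | 51.64 | 4/28 |
| 4 | 4 | 1 | 1 | 4 | 1 | 8 | 2048 | Yes | 6057 / 10970 MiB (OOM) | 7980 / 13278 MiB (OOM) | - | 43.7 | 4/28 |
| 4 | 2 | 1 | 4 | 1 | 1 | 8 | 2048 | Yes | 4458 MiB | 4458 MiB | 0.6 | 47.5 | 4/28 |
| 4 | 4 | 1 | 4 | 1 | 1 | 8 | 2048 | No | 7462 MiB | 9236 MiB | 0.8 | 44.6 | 4/28 |
| 4 | 4 | 1 | 4 | 1 | 1 | 8 | 2048 | Yes | 7463 MiB | 8134 MiB | 1.0 | 47.0 | 4/28 |
| 4 | 4 | 1 | 4 | 1 | 2 | 8 | 2048 | Yes | 7462 MiB | 8528 MiB | 0.8 | 60.9 | 4/28 |
| 4 | 4 | 1 | 4 | 1 | 4 | 8 | 2048 | Yes | 7479 MiB | 8890 MiB | 0.8 | 60.9 | 4/28 |
| 4 | 4 | 1 | 4 | 1 | 4 | 8 | 2048 | No | 11793 MiB | 13516 MiB | 0.6 | 57.9 | 4/28 |
| 4 | 6 | 4 | 1 | 1 | 1 | 8 | 2048 | Yes | 10467 MiB (OOM) | 11272 MiB (OOM) | - | - | 4/28 |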

Memory usage seems to increase after logging?

Experiments-1

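| #GPUs | Size | DP | MP | PP | MBS | GBS | SL | AC | Zero | Max Mem (allocated) | Max Mem (reserved) | TFLOPs | Sec/it | Est. Aggr. PetaFLOPs | B tokens | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | 10B | 1 | 4 | 8 | 1 | 90 | 1024 | No | 1 | OOM | OOM | - | - | - | - | 4/28 |
| 32 | 10B | 1 | 4 | 8 | 1 | 90 | 2048 | Yes | 1 | - | - | 39.3 | 12.4 | - | 152 | 4/28 |
| 32 | 10B | 1 | 4 | 8 | 2 | 90 | 2048 | Yes | 1 | 7875 MiB | 8892 MiB | 40.1 | 12.2 | - | 155 | 4/28 |
| 32 | 10B | 1 | 4 | 8 | 4 | 90 | 2048 | Yes | 1 | - | - | - | - | - | - | 4/28 |
| 32 | 13B | 1 | 4 | 8 | 1 | 8 | 2048 | Yes | 1 | 7568 MiB | 8586 MiB | 23.5 | 2.3 | - | - | 4/28 |
| 32 | 13B | 1 | 4 | 8 | 1 | 512 | 2048 | Yes | 1 | 8966 MiB | 10100 MiB | 42.7 | 83.5 | - | - | 4/28 |
| 32 | 13B | 1 | 4 | 8 | 1 | 90 | 1024 | No | 1 | OOM | OOM | - | - | - | - | 4/28 |
| 32 | 13B | 1 | 4 | 8 | 1 | 90 | 2048 | Yes | 1 | 8964 MiB | 10124 MiB | 40.0 | 15.4 | - | 123 | 4/28 |
| 32 | 13B | 1 | 4 | 8 | 2 | 90 | 2048 | Yes | 1 | 9303 MiB | 10648 MiB | 48.7 | 12.8 | - | 148 | 4/28 |
| 32 | 13B | 1 | 4 | 8 | 4 | 88 | 2048 | Yes | 1 | 12243 MiB (OOM) | 14108 MiB (OOM) | 44.2 | 13.8 | - | - | 4/28 |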
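The TFLOPs column is presumably the per-GPU model throughput that Megatron-DeepSpeed logs. To our understanding, that figure follows the estimate from Narayanan et al. (2021): FLOPs per iteration = 24·k·B·s·l·h²·(1 + s/(6h) + V/(16·l·h)), with k = 4 when activations are recomputed (AC) and k = 3 otherwise. A sketch of that formula; the vocabulary size V is an assumed placeholder, since the notes do not state it, and the 10B rows are assumed to use the GLM-10B shape marked above:

```python
def model_tflops_per_gpu(gbs, seq_len, num_layers, hidden_size, vocab_size,
                         sec_per_it, num_gpus, activation_checkpointing):
    """Per-GPU model TFLOPs following Narayanan et al. (2021).

    Forward + backward cost about 3 forward passes' worth of matmul FLOPs;
    activation checkpointing (AC) recomputes the forward pass, making it 4.
    """
    passes = 4 if activation_checkpointing else 3
    flops_per_iter = (
        24 * passes * gbs * seq_len * num_layers * hidden_size**2
        * (1 + seq_len / (6 * hidden_size) + vocab_size / (16 * num_layers * hidden_size))
    )
    return flops_per_iter / (sec_per_it * num_gpus * 1e12)


# 10B / MBS 2 row of Experiments-1 with the GLM-10B shape (48 layers, hidden 4096).
# vocab_size=50_000 is an assumption; the notes do not give the vocabulary size.
tflops = model_tflops_per_gpu(gbs=90, seq_len=2048, num_layers=48, hidden_size=4096,
                              vocab_size=50_000, sec_per_it=12.2, num_gpus=32,
                              activation_checkpointing=True)
print(f"{tflops:.1f} TFLOPs per GPU")
```

With these assumptions it evaluates to about 40 TFLOPs, the same range as the TFLOPs column; how close it lands depends on the assumed vocabulary size. The formula also explains why toggling AC changes the reported TFLOPs even at similar wall-clock time: recomputed forward passes are counted as useful work.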

DeepSpeed (reduce PP bubble / disable activation checkpointing)

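| #GPUs | #Layers | DP | MP | PP | MBS | GBS | AC | Zero | Max Mem (allocated) | Max Mem (reserved) | TFLOPs | Sec/it | B tokens | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | 10 | 4 | 1 | 1 | 1 | 88 | Yes | None | 7540 MiB | 9116 MiB | 43.2 | 1.2 | - | 5/2 |
| 32 | 10 | 4 | 1 | 1 | 1 | 88 | Yes | 1 | 5050 MiB | - | 43.1 | 1.2 | - | 5/2 |
| 32 | 10 | 4 | 1 | 1 | 1 | 88 | Yes | 2 | 5490 MiB | - | 42.9 | 1.2 | - | 5/2 |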
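Each row of this table corresponds to a DeepSpeed config that differs only in the ZeRO stage. A minimal sketch of such a config as a Python dict; the keys are standard DeepSpeed config keys, but mapping "Zero: None" to stage 0 (ZeRO disabled) is our assumption, and a real run would also set precision and optimizer options, which are omitted here:

```python
import json


def make_ds_config(mbs, gbs, dp, zero_stage=None):
    """Minimal DeepSpeed config dict for one row of the table.

    DeepSpeed requires train_batch_size ==
    train_micro_batch_size_per_gpu * gradient_accumulation_steps * DP.
    zero_stage=None ("Zero: None" in the table) is mapped to stage 0 (ZeRO off).
    """
    if gbs % (mbs * dp) != 0:
        raise ValueError("GBS must be divisible by MBS * DP")
    return {
        "train_batch_size": gbs,
        "train_micro_batch_size_per_gpu": mbs,
        "gradient_accumulation_steps": gbs // (mbs * dp),
        "zero_optimization": {"stage": 0 if zero_stage is None else zero_stage},
    }


# The three rows above: DP=4, MBS=1, GBS=88 with ZeRO None / 1 / 2.
configs = [make_ds_config(mbs=1, gbs=88, dp=4, zero_stage=s) for s in (None, 1, 2)]
for cfg in configs:
    print(json.dumps(cfg))
```

Written out with `json.dump`, each dict becomes a standalone config file, which makes it easy to keep one file per benchmark row.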

Activation Partitioning and Activation Checkpointing Chunks

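| #GPUs | Size | DP | MP | PP | MBS | GBS | SL | AC | AC chunk | DAC | Max Mem (allocated) | Max Mem (reserved) | TFLOPs | Sec/it | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 10B (6 layers) | 1 | 4 | 1 | 1 | 88 | 2048 | No | - | No | 7758 MiB | 8732 MiB | 46.25 | 8.5 | 4/28 |
| 4 | 10B (6 layers) | 1 | 4 | 1 | 2 | 88 | 2048 | No | - | No | 10614 MiB (OOM) | 11858 MiB (OOM) | 49.75 | 7.9 | 4/28 |
| 4 | 10B (6 layers) | 1 | 4 | 1 | 1 | 88 | 2048 | Yes | 1 | No | 6931 MiB | 7162 MiB | 46.36 | 11.3 | 4/28 |
| 4 | 10B (6 layers) | 1 | 4 | 1 | 2 | 88 | 2048 | Yes | 1 | No | 6931 MiB | 7538 MiB | 50.33 | 10.4 | 4/28 |
| 4 | 10B (6 layers) | 1 | 4 | 1 | 2 | 88 | 2048 | Yes | 1 | Yes | 6979 MiB | 7242 MiB | 49.9 | 10.5 | 4/28 |
| 4 | 10B (6 layers) | 1 | 4 | 1 | 4 | 88 | 2048 | Yes | 1 | Yes | 7027 MiB | 8808 MiB | 53.05 | 9.9 | 4/28 |
| 4 | 10B (6 layers) | 1 | 4 | 1 | 8 | 88 | 2048 | Yes | 1 | Yes | 7124 MiB | 10592 MiB | 53.26 | 9.9 | 4/28 |
| 4 | 10B (6 layers) | 1 | 4 | 1 | 2 | 88 | 2048 | Yes | 2 | Yes | - | - | - | - | Bug; did not work ... |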

Notes

  • Activation partitioning appears to be a DeepSpeed feature that is used in combination with AC.
  • DAC stands for distribute-checkpointed-activations.
  • There is a bug when controlling the chunk size of activation checkpointing (the AC chunk = 2 run did not work).
    [Screenshot, 2023-05-04 11:16:48]
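The AC, AC chunk, and DAC columns presumably map onto Megatron training flags; to our knowledge, Megatron-LM describes `--checkpoint-num-layers` as the "chunk size (number of layers) for checkpointing". A sketch that turns one table row into command-line flags; the flag names are our reading of the Megatron-DeepSpeed arguments of that era and should be checked against the fork actually in use, and `activation_checkpoint_args` is a hypothetical helper:

```python
def activation_checkpoint_args(ac, ac_chunk=1, dac=False, partition=False):
    """Build Megatron-DeepSpeed CLI flags for one row of the table above.

    Flag names are our reading of the Megatron-DeepSpeed arguments of that
    era (an assumption); verify them against the fork actually in use.
    """
    if not ac:
        return []
    args = ["--checkpoint-activations", "--checkpoint-num-layers", str(ac_chunk)]
    if dac:
        args.append("--distribute-checkpointed-activations")
    if partition:
        # DeepSpeed's activation checkpointing, with activations partitioned across GPUs
        args += ["--deepspeed-activation-checkpointing", "--partition-activations"]
    return args


# Row with MBS 4: AC Yes, AC chunk 1, DAC Yes
print(" ".join(activation_checkpoint_args(ac=True, ac_chunk=1, dac=True)))
```

Generating the flags from the same fields as the table columns keeps each benchmark row reproducible from its recorded settings.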

About the tokenizer

(Something we would like to do in a future project?)

[Discussion with Kojima-san about training our own tokenizer in the future]

  • sample from diverse corpora
  • train sentencepiece
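The two steps can be sketched as: draw a weighted sample of lines from each corpus, write the mix to a one-sentence-per-line file, and train SentencePiece on it. A sketch; the corpus weights and helper names are hypothetical, and `character_coverage=0.9995` follows the SentencePiece README's suggestion for languages with large character sets such as Japanese:

```python
import random


def sample_lines(corpora, total, seed=0):
    """Draw about `total` lines across corpora, in proportion to each weight.

    `corpora` maps a corpus name to (list_of_lines, weight). Sampling is
    without replacement within each corpus and capped at the corpus size.
    """
    rng = random.Random(seed)
    weight_sum = sum(weight for _, weight in corpora.values())
    sample = []
    for lines, weight in corpora.values():
        k = min(len(lines), round(total * weight / weight_sum))
        sample.extend(rng.sample(lines, k))
    rng.shuffle(sample)  # interleave corpora in the training file
    return sample


def train_sentencepiece(input_path, model_prefix, vocab_size):
    """Train a SentencePiece model on a one-sentence-per-line text file."""
    import sentencepiece as spm  # needs the `sentencepiece` package

    spm.SentencePieceTrainer.train(
        input=input_path,
        model_prefix=model_prefix,
        vocab_size=vocab_size,
        character_coverage=0.9995,  # README suggestion for Japanese/Chinese
        model_type="unigram",
    )


# Toy corpora standing in for e.g. Wikipedia and CC100 (weights are hypothetical).
corpora = {
    "wikipedia": ([f"wiki {i}" for i in range(100)], 1),
    "cc100": ([f"cc100 {i}" for i in range(100)], 3),
}
mixed = sample_lines(corpora, total=40)
print(len(mixed), sum(line.startswith("wiki ") for line in mixed))
```

Sampling by weight rather than concatenating whole corpora keeps a very large source such as CC100 from dominating the vocabulary.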

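How to finalize the training data

What format should the data be saved in?
  • Plain text with one sentence per line? jsonl?
  • Compress it? (trade-off against disk space)
  • Split it into multiple files? (trade-off between the file-loading implementation and memory)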
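One option that covers all three questions is gzip-compressed jsonl split into fixed-size shards, which a loader can stream one shard at a time. A sketch using only the standard library; the shard size, file naming scheme, and helper names are assumptions, not a decision from the notes:

```python
import gzip
import json
import os
import tempfile


def write_sharded_jsonl(records, out_dir, lines_per_shard=100_000, prefix="train"):
    """Write records as gzip-compressed jsonl shards; return the shard paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths, shard = [], None
    try:
        for i, record in enumerate(records):
            if i % lines_per_shard == 0:  # start a new shard
                if shard is not None:
                    shard.close()
                path = os.path.join(out_dir, f"{prefix}-{len(paths):05d}.jsonl.gz")
                shard = gzip.open(path, "wt", encoding="utf-8")
                paths.append(path)
            # ensure_ascii=False keeps Japanese text readable instead of \uXXXX escapes
            shard.write(json.dumps(record, ensure_ascii=False) + "\n")
    finally:
        if shard is not None:
            shard.close()
    return paths


def read_jsonl_gz(path):
    """Stream records back one line at a time, without loading the whole shard."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    docs = [{"text": f"文書{i}"} for i in range(5)]
    shard_paths = write_sharded_jsonl(docs, tmp, lines_per_shard=2)
    restored = [r for p in shard_paths for r in read_jsonl_gz(p)]
    print([os.path.basename(p) for p in shard_paths])
```

Because gzip can only be read sequentially, the shard size is what bounds how much a single worker must scan; smaller shards also make it easier to spread files across data-loader workers.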
