Comments (5)
Current results on the develop branch:
2021-08-27:19:18:33,294 INFO [test_bert.py:236] ckp True fp16 True ps True: step elapse 5.9073121547698975 sec/iter, 16.184105474731595 Tflops
2021-08-27:19:18:33,294 INFO [test_bert.py:238] model 0.72940493
2021-08-27:19:18:33,295 INFO [global_timer.py:45] *********** PROFILE RESULTS *************
2021-08-27:19:18:33,295 INFO [global_timer.py:50] CHUNK_LIST_prepare_device, 0, 0.0 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] CHUNK_allocate_payload, 0, 0.0 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] CLIENT_access, 0.02212691307067871, 0.33153822024058927 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] CLIENT_release, 0.019531965255737305, 0.2926568644258493 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] chunk_cpu_gpu_move, 0, 0.0 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] CLIENT_access_dist, 0.04144024848937988, 0.6209192482752114 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] CLIENT_release_dist, 0.027328968048095703, 0.4094831212440634 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] chunk_gpu_cpu_move, 0, 0.0 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] CHUNK_LIST_chunk_move, 0, 0.0 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] FWD, 0.2834486961364746, 4.247048648242367 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] BWD, 3.0267438888549805, 45.351164838479654 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] ADAM_prepare_data_fp16_grad_to_fp32_grad_copy, 0.1033015251159668, 1.5478166193218406 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] ADAM_prepare_data, 0.1308901309967041, 1.9611900195517777 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] ADAM_compute, 1.1683895587921143, 17.50654479602667 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] ADAM_param_fp32_to_fp16, 0.17122220993041992, 2.5655050284088605 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] ADAM_release_data, 0.02155280113220215, 0.3229360239155347 %
2021-08-27:19:18:33,295 INFO [global_timer.py:50] ADAM, 1.658038854598999, 24.843196571867583 %
2021-08-27:19:18:33,295 INFO [global_timer.py:76] *********** DATA MOVE RESULTS *************
2021-08-27:19:18:33,295 INFO [global_timer.py:86] chunk_cpu_gpu_move: 0.0 MB
2021-08-27:19:18:33,295 INFO [global_timer.py:86] chunk_gpu_cpu_move: 0.0 MB
2021-08-27:19:18:33,295 INFO [global_timer.py:83] ADAM_prepare_data_fp16_grad_to_fp32_grad_copy: 1391.2294960021973 MB, 393 times, 13467.656885417677 MB/s
2021-08-27:19:18:33,295 INFO [global_timer.py:83] ADAM_param_fp32_to_fp16: 2782.4589920043945 MB, 393 times, 16250.572826592477 MB/s
from patrickstar.
CPU model: AMD Ryzen 7 3700X 8-Core Processor
The new ds_adam kernel seems much slower than DeepSpeed's.
I suggest adding a performance comparison to the unit tests.
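One possible shape for such a comparison is a small timing harness that runs each optimizer step several times and compares the medians. This is only a sketch: `median_step_time` and `slowdown` are hypothetical names, and the two stand-in callables would have to be replaced by real calls into the new ds_adam kernel and the DeepSpeed CPU Adam, which are not shown here.

```python
import statistics
import time
from typing import Callable


def median_step_time(step: Callable[[], None], warmup: int = 3, iters: int = 10) -> float:
    """Run `step` a few times untimed, then return the median wall time in seconds."""
    for _ in range(warmup):
        step()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown(candidate: Callable[[], None], baseline: Callable[[], None]) -> float:
    """Return candidate time / baseline time; values above 1.0 mean the candidate is slower."""
    return median_step_time(candidate) / median_step_time(baseline)


# Stand-ins for the two optimizer steps (assumptions for illustration only).
def baseline_step() -> None:
    sum(x * x for x in range(20_000))


def candidate_step() -> None:
    sum(x * x for x in range(40_000))


ratio = slowdown(candidate_step, baseline_step)
print(f"candidate / baseline = {ratio:.2f}")
```

A unit test could then fail when the ratio exceeds a chosen threshold, although a hard limit tends to be flaky on shared CI machines, so logging the ratio may be the safer first step.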
from patrickstar.
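I'm worried this is caused by loss scale, since it effectively computes a sum over all the parameters... You could remove loss scale and see whether that changes anything.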
from patrickstar.
If loss scale is the cause, it would be better to apply loss scale on the GPU, at the point where the backward pass produces the gradients.
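For context, a minimal sketch of the extra work that loss scaling typically adds to an optimizer step: every gradient element is divided by the scale, and an overflow check has to visit every element, which is presumably the full reduction mentioned above. Plain Python lists are used for illustration; `unscale_and_check` is a hypothetical name and this is not PatrickStar's implementation.

```python
import math
from typing import List, Tuple


def unscale_and_check(grads: List[float], loss_scale: float) -> Tuple[List[float], bool]:
    """Divide gradients by the loss scale and report whether any value overflowed."""
    inv_scale = 1.0 / loss_scale
    unscaled = [g * inv_scale for g in grads]
    # The check is a reduction over all elements, so its cost grows with model size.
    overflow = not all(math.isfinite(g) for g in unscaled)
    return unscaled, overflow


grads, overflow = unscale_and_check([1024.0, -2048.0, 512.0], loss_scale=1024.0)
print(grads, overflow)

_, overflow_inf = unscale_and_check([float("inf"), 1.0], loss_scale=1024.0)
print(overflow_inf)
```

Doing this pass on the GPU as gradients are produced would keep the reduction off the CPU optimizer path, which is the trade-off the comment above proposes.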
from patrickstar.
There is no significant difference in speed. The ADAM_compute time went up because it now includes the fp16->fp32 conversion time.
from patrickstar.