Comments (3)
To provide better context, I also tried the same code with the legacy autograd profiler.
Code to reproduce
import torch

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    for _ in range(100):
        y = torch.randn(1).cuda() + torch.randn(1).cuda()

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
Code Result
STAGE:2024-01-21 21:44:48 2136:2136 ActivityProfilerController.cpp:318] Completed Stage: Collection
STAGE:2024-01-21 21:44:48 2136:2136 ActivityProfilerController.cpp:322] Completed Stage: Post Processing
------------------------------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
------------------------------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::to 6.92% 2.056ms 44.29% 13.158ms 65.790us 2.962ms 9.35% 13.700ms 68.500us 200
aten::add 2.04% 605.000us 39.60% 11.764ms 117.640us 12.221ms 38.59% 12.221ms 122.210us 100
aten::_to_copy 13.34% 3.962ms 37.37% 11.102ms 55.510us 4.156ms 13.12% 10.738ms 53.690us 200
aten::randn 13.30% 3.952ms 14.89% 4.425ms 22.125us 3.134ms 9.90% 5.750ms 28.750us 200
aten::copy_ 2.40% 714.000us 21.46% 6.376ms 31.880us 4.853ms 15.32% 4.853ms 24.265us 200
aten::empty_strided 1.67% 496.000us 2.46% 730.000us 3.650us 1.729ms 5.46% 1.729ms 8.645us 200
aten::normal_ 1.35% 400.000us 1.35% 400.000us 2.000us 1.601ms 5.06% 1.601ms 8.005us 200
aten::empty 0.25% 73.000us 0.25% 73.000us 0.365us 1.015ms 3.20% 1.015ms 5.075us 200
cudaDeviceGetStreamPriorityRange 1.19% 354.000us 1.19% 354.000us 354.000us 0.000us 0.00% 0.000us 0.000us 1
cudaGetDeviceCount 0.00% 0.000us 0.00% 0.000us 0.000us 0.000us 0.00% 0.000us 0.000us 2
------------------------------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 29.709ms
Self CUDA time total: 31.671ms
from kineto.
I have the same problem.
CUDA: 12.3
PyTorch: 2.1.2
The ops ProfilerStep*, aten::empty, aten::to, aten::add, etc. are launched on the CPU, so the profiler is working as expected when ProfilerActivity.CPU is not added. The output of the profiler is the expected behavior and not a bug.
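A minimal sketch of what that means in practice with the newer torch.profiler API: include ProfilerActivity.CPU in the activity list so the CPU-side launches (aten::to, aten::add, ...) show up alongside any GPU kernels. The CPU-only fallback when no GPU is present is an assumption added here so the snippet runs anywhere; the loop mirrors the reproduction code above.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Always record CPU activity; ops like aten::add are launched on the CPU,
# so omitting ProfilerActivity.CPU hides them from the table.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

device = "cuda" if torch.cuda.is_available() else "cpu"
with profile(activities=activities) as prof:
    for _ in range(100):
        y = torch.randn(1, device=device) + torch.randn(1, device=device)

# CPU-side ops now appear in the summary table.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```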
Related Issues (20)
- TB_Plugin_CI failing with AttributeError: module 'mpmath' has no attribute 'rational'
- [Plugin-Bug]The Operators of baseline-run and exp-run are showed in a misaligned order HOT 1
- How to add customized metadata with on demand profiling ? HOT 7
- [RFC] Support XPU Backend With PTI-sdk in Kineto HOT 3
- [Discussion] Which clock should we be using for timestamps? HOT 2
- GPU traces fail when using PyTorch lightning due to square braces in traceName HOT 2
- Support memory profiling feature from on-demand path
- Roctracer crashes when number of samples too high
- TypeError: bad operand type for unary -: 'NoneType' HOT 4
- [Synchronization events] Missing StreamWait event in cases
- KeyError: <torch_tb_profiler.profiler.node.OperatorNode object at 0x7f4a45dc3e80> HOT 1
- Module View cannot show device time HOT 5
- CUDA time difference between print function and Profiler TensorBoard
- [Feature Request] Add Process Status Check Before Profiling to Handle Non-Running Training Tasks
- Upgrade to CUDA 12.4 is causing segfaults in 4 Range Profiler Tests
- [BUG] Number of communication kernels don't match between workers in run
- Train process is blocked when kineto is processing traceEvents HOT 1
- Update libfmt in kineto
- IN Build CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060, not found HOT 2
- [RFC][XPU profiler] Introduce XPU profiler by following kineto plugin design HOT 3