Comments (3)

chenlinchuang avatar chenlinchuang commented on August 24, 2024

To provide better context, I also tried the same code with the legacy autograd profiler.

Code to reproduce

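import torch

# Legacy autograd profiler; use_cuda=True also records CUDA timings.
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    for _ in range(100):
        # Two host tensors are created, copied to the GPU, then added there.
        y = torch.randn(1).cuda() + torch.randn(1).cuda()

# Show the ten ops with the largest total CUDA time.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))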

Code Result

STAGE:2024-01-21 21:44:48 2136:2136 ActivityProfilerController.cpp:318] Completed Stage: Collection
STAGE:2024-01-21 21:44:48 2136:2136 ActivityProfilerController.cpp:322] Completed Stage: Post Processing
------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                            aten::to         6.92%       2.056ms        44.29%      13.158ms      65.790us       2.962ms         9.35%      13.700ms      68.500us           200
                           aten::add         2.04%     605.000us        39.60%      11.764ms     117.640us      12.221ms        38.59%      12.221ms     122.210us           100
                      aten::_to_copy        13.34%       3.962ms        37.37%      11.102ms      55.510us       4.156ms        13.12%      10.738ms      53.690us           200
                         aten::randn        13.30%       3.952ms        14.89%       4.425ms      22.125us       3.134ms         9.90%       5.750ms      28.750us           200
                         aten::copy_         2.40%     714.000us        21.46%       6.376ms      31.880us       4.853ms        15.32%       4.853ms      24.265us           200
                 aten::empty_strided         1.67%     496.000us         2.46%     730.000us       3.650us       1.729ms         5.46%       1.729ms       8.645us           200
                       aten::normal_         1.35%     400.000us         1.35%     400.000us       2.000us       1.601ms         5.06%       1.601ms       8.005us           200
                         aten::empty         0.25%      73.000us         0.25%      73.000us       0.365us       1.015ms         3.20%       1.015ms       5.075us           200
    cudaDeviceGetStreamPriorityRange         1.19%     354.000us         1.19%     354.000us     354.000us       0.000us         0.00%       0.000us       0.000us             1
                  cudaGetDeviceCount         0.00%       0.000us         0.00%       0.000us       0.000us       0.000us         0.00%       0.000us       0.000us             2
------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 29.709ms
Self CUDA time total: 31.671ms

from kineto.

exitNA avatar exitNA commented on August 24, 2024

I have the same problem.

CUDA: 12.3
PyTorch: 2.1.2

anupambhatnagar avatar anupambhatnagar commented on August 24, 2024

Ops such as ProfilerStep*, aten::empty, aten::to, and aten::add are launched on the CPU, so they are CPU-side events. When ProfilerActivity.CPU is not included in the activities, the profiler is working as expected: the output is the expected behavior and not a bug.
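
For anyone who wants the CPU-side ops and the GPU timings in one table, here is a minimal sketch that requests both activities through torch.profiler. It assumes the original report used torch.profiler.profile with only ProfilerActivity.CUDA, which is not shown in this thread. The helper name profile_adds and its steps parameter are made up for illustration, and the script falls back to CPU-only profiling when no GPU is available.

```python
import torch
from torch.profiler import profile, ProfilerActivity


def profile_adds(activities, device, steps=10):
    """Hypothetical helper: profile a tiny add workload with the given activities."""
    with profile(activities=activities) as prof:
        for _ in range(steps):
            y = torch.randn(1, device=device) + torch.randn(1, device=device)
    return prof.key_averages()


device = "cuda" if torch.cuda.is_available() else "cpu"

# CPU activity is always requested so that aten::* ops are recorded;
# CUDA activity is added only when a GPU is actually present.
activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

averages = profile_adds(activities, device)
op_names = {evt.key for evt in averages}  # names of the recorded events
print(averages.table(row_limit=10))
```

On a GPU machine, removing ProfilerActivity.CPU from the activities list and rerunning is one way to compare against the CUDA-only behavior discussed above.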
