pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
License: Other
Update API Documentation https://pytorch.org/docs/stable/profiler.html
Feedback from Users:
Document the difference between the "wait" and "warmup" parameters in the profiler schedule. It would be great if their meaning were clarified in the documentation.
Explicitly state in the PyTorch docs that an additional plugin needs to be installed to leverage TensorBoard. There is a link to GitHub, where it's clearly stated, but in my opinion it should be more explicit, so I suggest putting that clarification in the main PyTorch documentation.
Document the profiler.step function; its documentation could be improved.
Document record_function. Is there any way to mark regions of the code to be grouped together in profiling?
Document the trace event lifecycle. Our current code has an infinite loop in a function, from the middle of which we return when some "exit condition" is satisfied (either a number of steps, or validation loss no longer improving). While we can certainly refactor that loop to exit "normally" and thus trigger the "exit" call of the profiler context manager, it would be good to have some helper functionality (or at least an example in the documentation; see the sketch below) on how best to handle such a training-loop structure. I think the tb events are still properly logged, but if I try to summarize p.key_averages() inside the "with torch.profiler" block (before the "return" statement), I get a "RuntimeError: can't export a trace that didn't finish running". Again, this is not a big deal, since we can refactor the code, but it would be nice if such a structure were generally supported (maybe it already is; then it would be good to have examples in the docs).
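A minimal sketch of one workable structure, assuming a hypothetical loader and exit condition (this is not official guidance): keep the early exit, but call key_averages() only after the with block has closed, so the trace has finished running.

import torch

def train(model, loader, max_steps):
    # A return from inside the with block would also run __exit__, but
    # summarizing must wait until after the context manager has closed.
    with torch.profiler.profile(
            activities=[torch.profiler.ProfilerActivity.CPU]) as p:
        for step, batch in enumerate(loader):
            ...  # forward / backward / optimizer.step()
            p.step()
            if step >= max_steps:  # the "exit condition"
                break
    # Safe here: the trace finished when the with block exited.
    print(p.key_averages().table(sort_by="cpu_time_total"))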
Hi, I have noticed the use_kineto argument in torch.autograd.profiler.profile's signature (PyTorch 1.8), but not in the docs you point to in the main readme (i.e., https://pytorch.org/docs/master/profiler.html), which make no mention of kineto [anymore]. Hence the question from the title: is the kineto project still alive and usable with PyTorch 1.8?
If so, are there any additional installs required beyond PyTorch itself for kineto to work?
I'm sorry, I probably shouldn't ask questions here, but I have tried asking on Stack Overflow and the PyTorch forums, and no one responded.
My problem is that the timestamps of the events reported by the PyTorch profiler are strange.
# %%
import torch
import torchvision.models as models
import time

print(torch.__version__)

model = models.resnet18().cuda()
inputs = torch.randn(5, 3, 224, 224).cuda()

with torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA]
) as p:
    outputs = model(inputs)

# %%
events = p.events()
print('begin ', min(events, key=lambda e: e.time_range.start).time_range.start / 1000000)
print('end ', max(events, key=lambda e: e.time_range.end).time_range.end / 1000000)
print('realtime ', time.time())
print('monotonic ', time.perf_counter())
The output is as follows:
1.8.1
begin 1618284730.230899
end 1618284733.091974
realtime 1618299047.4422061
monotonic 14317.211549314
My question is why the event times are so different from both clocks. What I would hope for is that all the timestamps are consistent with the monotonic clock.
Can I achieve the desired effect by modifying the source code, and if so, what should I do?
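Short of patching the source, one hedged workaround sketch is to estimate the offset between the profiler's event clock and the wall clock by bracketing a profiled run with time.time() samples (the workload below is illustrative):

import time
import torch

wall_before = time.time()
with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU]) as p:
    torch.randn(1000, 1000) @ torch.randn(1000, 1000)  # any small workload
wall_after = time.time()

events = p.events()
begin = min(e.time_range.start for e in events) / 1000000  # event clock, in seconds
offset = wall_before - begin  # approximate event-clock -> wall-clock offset
print('offset (s)     ', offset)
print('error bound (s)', wall_after - wall_before)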
What is the intended usage of this library? It doesn't look like there are any examples or blog posts up yet. Do I need to build a C++ program around this library? How do I then invoke it while running a Python PyTorch job?
For Chrome trace files in an S3 bucket, the tb_plugin UI gets stuck on a spinning wheel while loading the files and nothing gets displayed. Add support for visualizing traces from S3 URLs in the Profiler plugin, similar to how one can view the main TensorBoard runs in the rest of the TensorBoard UI.
The .json files are being saved by the profiler in the logdir, but when I open TensorBoard from JupyterHub, the pytorch_profiler tab doesn't seem to be rendered automatically. All the dependencies mentioned in the README are present in the hub environment.
When TensorBoard with the Profiler plugin is opened in VS Code, keyboard navigation with the arrow keys does not work, while it works in a regular browser.
https://anaconda.org/search?q=torch-tb-profiler doesn't return anything
https://github.com/pytorch/pytorch/blob/7d4e9bdba144e162882fb854324430c4b92fb267/torch/profiler/profiler.py#L75
Here's how tensorboard_trace_handler decides the filename:
file_name = "{}.{}.pt.trace.json".format(worker_name, int(time.time() * 1000))
And here's how the loader parses it:
When repeat > 1, there will be multiple trace files from each worker under different timestamps, and yet they will be identified as different workers by the loader.
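A hedged sketch of the grouping the loader could do instead: strip the trailing ".<timestamp>.pt.trace.json" suffix so every timestamped file maps back to its worker (the file name below is illustrative).

import re

def worker_of(file_name):
    # Matches "<worker_name>.<millisecond timestamp>.pt.trace.json".
    m = re.match(r'(?P<worker>.+)\.\d+\.pt\.trace\.json$', file_name)
    return m.group('worker') if m else file_name

print(worker_of('worker0.1618682202565.pt.trace.json'))  # -> worker0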
When using the PyTorch profiler with kineto enabled to profile a model such as torchvision.alexnet, the dumped Chrome tracing file can't be loaded by chrome://tracing.
It is caused by a string with unexpected encoding in the file:
The running environment:
OS version: Ubuntu 18.04; Python: 3.8.5; CUDA: 11.1
PyTorch install: https://download.pytorch.org/whl/test/cu111/torch-1.8.0%2Bcu111-cp38-cp38-linux_x86_64.whl
Torchvision install: https://download.pytorch.org/whl/test/cu111/torchvision-0.9.0%2Bcu111-cp38-cp38-linux_x86_64.whl
The corrupted string causes the dumped Chrome tracing file to fail to load, which in turn makes our TensorBoard plugin fail to show it.
It also makes the PyTorch profiler CLI output confusing to the user: the corrupted event is shown as an empty string.
Support exporting the Chrome tracing file in gzip format to reduce file size.
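Until that lands, a hedged workaround sketch: compress the exported trace after the fact ("trace.json" is a placeholder path; chrome://tracing can open gzipped traces).

import gzip
import shutil

with open('trace.json', 'rb') as src, gzip.open('trace.json.gz', 'wb') as dst:
    shutil.copyfileobj(src, dst)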
Through reviewing the code of the PyTorch profiler and kineto, I found that in the case of async tasks there is an inconsistency in assigning thread ids to FunctionEvent and ClientTraceActivity.
Two other tiny issues:
There are 3 thread ids:
2.1 In profiler_kineto.cpp, the thread id is obtained from at::RecordFunction::currentThreadId(start, end).
2.2 In the chrome tracing file, the "tid" is the real pthread id.
2.3 In the chrome tracing file, the id in "thread_name" is obtained from ChromeTraceLogger::renameThreadID.
Maybe 2.1 and 2.3 could be made more consistent?
Line 56 in profiler_kineto.cpp seems redundant; keeping only line 73 should be enough.
CUDA traces are not showing up for the ResNet sample in an AWS DLAMI with CUDA 10.2. The generated trace is attached:
trace.zip
The current tutorials point to the autograd profiler and need to be updated for the new profiler:
https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html
https://pytorch.org/tutorials/beginner/profiler.html
Similar Item: pytorch/tutorials#1451
Basic setup info
device: Tesla V100-SXM2-32GB
ram: 64GB
pytorch: 1.8
cudatoolkit: 10.2
python: 3.7.8
environment: conda 4.7.5
os: CentOS Linux release 7.9.200
Description
Hi, I tried to export a profiler trace for 1 epoch of training on a tutorial toy problem using examples of your new profiler API. Unfortunately, whenever training finishes and trace handling is called, either via the profiler or manually, it gets stuck indefinitely and never outputs anything. I observed that from the moment the trace handler is entered, RAM consumption of the host increases from 10 to 25 GB over a couple of minutes and stays there. Mindful of legacy profiler issues, I checked the impact of setting the DataLoader's num_workers to 0, but it didn't seem to play a role. Any help appreciated.
Conda's environment.yml
name: torch18
channels:
  - conda-forge
  - defaults
  - pytorch
dependencies:
  - matplotlib
  - cudatoolkit=10.2
  - pytorch=1.8.0
  - torchvision=0.9.0
  - requests
  - requests-oauthlib
  - pip:
      - tensorboard==1.15.0
      - tensorboard-plugin-wit==1.8.0
      - torch-tb-profiler
      - pynvml
Minimal Example
import sys
import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

device = torch.device("cuda:0")

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                        transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True,
                                          num_workers=2)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
tb_handler = torch.profiler.tensorboard_trace_handler('./log')

with torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA]) as p:
    for epoch in range(1):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data[0].to(device), data[1].to(device)
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0
            p.step()
    print("epochs done")
    tb_handler(p)
    print("export done")  # never gets here
print('Finished Training')
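For comparison, a hedged sketch of the incremental pattern from the profiler docs, using schedule and on_trace_ready so each active window is exported as it completes rather than handling the whole epoch's trace at the end (untested against this particular hang):

import torch

# Assumes `trainloader` from the example above.
with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU,
                    torch.profiler.ProfilerActivity.CUDA],
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log')) as p:
    for i, data in enumerate(trainloader, 0):
        ...  # one training step, as in the example above
        p.step()  # the handler fires at the end of each active window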
Should be changed to "with_stack": https://pytorch.org/blog/introducing-pytorch-profiler-the-new-and-improved-performance-tool/
Running BERT training:
python /home/azureuser/pyprofiler/bert_for_sequence_classification.py --train-steps 64 --epochs 2 --pytorch-only
https://github.com/lenisha/pyprofiler/blob/main/bert_for_sequence_classification.py
and producing the following trace:
https://github.com/lenisha/pyprofiler/blob/main/trace/bert_record_nt/worker0.pt.trace.json
The plugin does not load the trace; it just shows a spinner.
The cloud advocates team is working on the demo for Build and is looking to include a profiler part in the demo.
We walked them through the setup, API, etc., but we see weird behaviour. Here are the notebook and training loop:
tlaloc/explore.ipynb at main · sethjuarez/tlaloc (github.com) (cell 21)
When running, we see the profiler generates two trace files instead of one. See the logs directory: tlaloc/notebooks/logs at main · sethjuarez/tlaloc
A ProfilerAction.PAUSE could be used to skip some batches when switching from the train_dataloader to the validation_dataloader; a sketch of a present-day approximation follows.
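Until a dedicated PAUSE action exists, a hedged sketch: torch.profiler.profile accepts any callable mapping a step number to a ProfilerAction, so ProfilerAction.NONE can approximate pausing around the dataloader switch (the step numbers below are illustrative).

import torch
from torch.profiler import ProfilerAction

def pausing_schedule(step):
    # Skip (do not record) a few batches around the switch to the
    # validation_dataloader; record everything else.
    if 100 <= step < 103:
        return ProfilerAction.NONE
    return ProfilerAction.RECORD

# usage: torch.profiler.profile(schedule=pausing_schedule, ...)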
There is a warning from "security_validator.py" repeatedly appearing in the TensorBoard logs:
W0303 12:02:32.939834 139855689737984 security_validator.py:51] In 3.0, this warning will become an error
X-Content-Type-Options is required to be "nosniff"
Sometimes (this rarely happens), when opening this json file in chrome://tracing, we can see that "stream 7" has 2 lines and there is a small triangle in front of it.
I found it is caused by an extremely thin kernel with "dur" 0. This kernel overlaps with another event, so 2 lines are shown in stream 7.
It's worth further analysis of why 2 kernels overlap in the same stream.
Hi,
A key feature we are looking for, but which is currently missing, is showing scopes for operations (like the TensorFlow profiler). Example:
empty_
vs.
SequentialModel/layers[2]/Attention/empty_
We want the 2nd option. (With the 2nd option, getting combined statistics for all occurrences of empty_ could be done by grouping.)
According to the PyTorch API, using with_stack=True only records file and line numbers. Unfortunately, file and line numbers are still confusing, since we can have both
SequentialModel/layers[2]/Attention/empty_
SequentialModel/layers[3]/SomeModelWithAttention/Attention/empty_
which would point to the same place in the code.
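There is no built-in scope support we know of, but as a hedged sketch (the helper name is ours), forward hooks can push a record_function range per submodule so operator occurrences group under a module path:

import torch
import torch.nn as nn
from torch.autograd.profiler import record_function

def add_module_scopes(model):
    # Enter a record_function range named after each submodule on forward
    # entry, and exit it afterwards, emulating TF-style scopes.
    for name, module in model.named_modules():
        if not name:
            continue  # skip the root module
        def pre_hook(mod, inputs, _name=name):
            mod._scope = record_function(_name)
            mod._scope.__enter__()
        def post_hook(mod, inputs, output):
            mod._scope.__exit__(None, None, None)
        module.register_forward_pre_hook(pre_hook)
        module.register_forward_hook(post_hook)

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
add_module_scopes(model)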
Our code is structured in such a way that inside the main training loop we will sometimes run validation as well. Ideally, it should be logged separately from training. Is there any functionality to support this?
E.g., similar to what this issue is asking for: #86,
but ideally we should be able to spawn a second instance of the profiler (or provide some metadata saying that the current steps are not for training but for validation).
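We are not aware of second-instance support, but a hedged workaround sketch: label the validation phase with record_function inside the single profiler run, then filter or group on that name afterwards (the loop structure below is illustrative).

import torch
from torch.autograd.profiler import record_function

with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU]) as p:
    for step in range(10):
        ...  # training step
        if step % 5 == 4:
            with record_function("validation"):
                ...  # validation pass
        p.step()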
Add support for saving the generated Chrome trace files to S3 URLs. This works for the TensorBoard SummaryWriter but is not supported for the Profiler traces.
Current Behavior
on_trace_ready=torch.profiler.tensorboard_trace_handler('s3://tb-demo/pytorch/')
Creates local files like:
(base) ubuntu@ip-172-31-22-142:~$ ls
s3\:/tb-demo/pytorch/ip-172-31-22-142_14545.1618682202565.pt.trace.json
s3:/tb-demo/pytorch/ip-172-31-22-142_14545.1618682202565.pt.trace.json
Expected Behavior
The chrome trace files should be saved in the S3 bucket
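Until native support exists, a hedged workaround sketch (the helper is ours; it assumes boto3 is installed and AWS credentials are configured): export the trace locally, then upload it.

import os
import socket
import time

import boto3

def s3_trace_handler(bucket, prefix):
    s3 = boto3.client('s3')
    def handler(prof):
        # Mirror tensorboard_trace_handler's naming, but upload to S3.
        file_name = '{}_{}.{}.pt.trace.json'.format(
            socket.gethostname(), os.getpid(), int(time.time() * 1000))
        local_path = os.path.join('/tmp', file_name)
        prof.export_chrome_trace(local_path)
        s3.upload_file(local_path, bucket, '{}/{}'.format(prefix, file_name))
    return handler

# usage: on_trace_ready=s3_trace_handler('tb-demo', 'pytorch')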
This is sort of nitpicking. According to the README of libkineto and its CMakeLists.txt, if CUDA_SOURCE_DIR is not set, libkineto will fail to build.
kineto/libkineto/CMakeLists.txt, lines 80 to 83 at 3c77248
However, the build script of PyTorch uses CUDA_HOME.
Why did you choose CUDA_SOURCE_DIR instead of CUDA_HOME? Is there any reason for this?
A colleague of ours got a "Memcpy" duration of 0 when profiling with kineto. That makes the "memory bandwidth (GB/s)" value inf, and then the file can't be opened by chrome://tracing, and our TensorBoard plugin's json.load fails to load it.
Since inf is neither a string nor a number in JSON, the file can't be parsed as valid JSON.
A solution is to add a check for whether "dur" is 0 and, if so, write the string "inf" as the value instead.
The bug case:
{
  "schemaVersion": 1,
  "traceEvents": [
    {
      "ph": "X", "cat": "Memcpy",
      "name": "Memcpy HtoD (Pageable -> Device)", "pid": 0, "tid": "stream 7",
      "ts": 1614800519473220, "dur": 0,
      "args": {
        "device": 0, "context": 1,
        "stream": 7, "correlation": 20002, "external id": 3981,
        "bytes": 200, "memory bandwidth (GB/s)": inf
      }
    }
  ]
}
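Until the writer is fixed, a hedged consumer-side sketch: quote the bare inf tokens so the file parses ("trace.json" is a placeholder path).

import json
import re

with open('trace.json') as f:
    text = f.read()
# Bare `inf` is invalid JSON; replace it with the string "inf" before parsing.
events = json.loads(re.sub(r':\s*inf\b', ': "inf"', text))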
VS Code opens TensorBoard but it hangs forever (displaying a spinner). The same happens in the browser but is resolved by refreshing the page. The cause appears to be a race between TensorBoard loading/finding the trace files and the web page opening. Since VS Code loads the frame at the same time as starting TensorBoard, and there is no refresh for the frame in VS Code, there's no easy way for the user to get the page to load.
When I train a resnet50 model with a big batch size such as 128 or 256, the profiler dumps the following events into the chrome tracing file:
{
  "ph": "X", "cat": "Kernel",
  "name": "volta_sgemm_64x64_nt", "pid": 0, "tid": "stream 7",
  "ts": 0, "dur": 0,
  "args": {
    "queued": 0, "device": 0, "context": 1,
    "stream": 7, "correlation": 106123, "external id": 34735,
    "registers per thread": 126,
    "shared memory": 8192,
    "warps per SM": 14.4,
    "grid": [4, 4, 36],
    "block": [64, 1, 1]
  }
},
{
  "ph": "f", "id": 106123, "pid": 0, "tid": "stream 7", "ts": 0,
  "cat": "async", "name": "launch", "bp": "e"
},
{
  "ph": "X", "cat": "Runtime",
  "name": "cudaLaunchKernel", "pid": 29463, "tid": "3239786240",
  "ts": 1615207264505000, "dur": 6,
  "args": {
    "cbid": 211, "correlation": 106123,
    "external id": 34735, "external ts": 1615207264504915
  }
},
{
  "ph": "s", "id": 106123, "pid": 29463, "tid": 3239786240, "ts": 1615207264505000,
  "cat": "async", "name": "launch"
},
You can see that "ts" is 0 and "dur" is 0 in the above kernel, and there are more than a thousand of these events in the file.
The model code I used is the plugin's resnet50 example, with its "batch_size" changed from 32 to 128.
The GPU: NVIDIA V100.
The torch whl: https://download.pytorch.org/whl/nightly/cu111/torch-1.9.0.dev20210305%2Bcu111-cp38-cp38-linux_x86_64.whl
The torchvision whl: https://download.pytorch.org/whl/nightly/cu111/torchvision-0.9.0.dev20210305%2Bcu111-cp38-cp38-linux_x86_64.whl
Because these kernels with a ts of 0 are launched by runtime calls near the end of profiling, I guess it is because these kernels execute later than "profiler stop". This is my snapshot:
You can see that the kernels which should execute after "profiler stop" are not painted here (they are painted at time "0"). And the last operators don't have related kernels, even though they really did launch kernels according to the chrome tracing file.
It is also a bad experience: the user will be confused to see the following in this case (ts=0 makes these events start at time 0, so all other events are shown many years later).
BTW, the expected behavior is to correctly dump the kernels launched between the profiler's start and stop, rather than removing these kernels from the file, because we need the operators' kernel times to be correct.
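As a display-only stopgap (per the above, the real fix is to dump these kernels correctly, not drop them), a hedged sketch that filters the zero-timestamp events out of a copy of the trace ("trace.json" is a placeholder path):

import json

with open('trace.json') as f:
    trace = json.load(f)
trace['traceEvents'] = [
    e for e in trace['traceEvents']
    if e.get('ph') == 'M' or e.get('ts', 0) != 0  # keep metadata events
]
with open('trace.filtered.json', 'w') as f:
    json.dump(trace, f)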
/usr/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMAGMA_V2 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -I../cmake/../third_party/benchmark/include -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -I../third_party/kineto/libkineto/include -I../third_party/kineto/libkineto/src -I../third_party/fmt/include -I/usr/local/cuda/extras/CUPTI/include -I/usr/local/cuda/include -isystem third_party/gloo -isystem ../cmake/../third_party/gloo -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem /home/chester/miniconda3/envs/pytorch-build-py37/include -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party/XNNPACK/include -isystem ../third_party -isystem ../cmake/../third_party/eigen -isystem /home/chester/miniconda3/envs/pytorch-build-py37/include/python3.7m -isystem /home/chester/miniconda3/envs/pytorch-build-py37/lib/python3.7/site-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem /usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent -isystem /usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include -isystem /usr/lib/openmpi/include -isystem /usr/lib/openmpi/include/openmpi -isystem ../cmake/../third_party/cub -isystem ../third_party/ideep/mkl-dnn/include -isystem ../third_party/ideep/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -O3 -DNDEBUG -DNDEBUG -fPIC -fvisibility=hidden -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -DHAVE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD -DKINETO_NAMESPACE=libkineto -std=gnu++14 -DHAS_CUPTI -std=c++14 -MD -MT third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/cupti_strings.cpp.o -MF third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/cupti_strings.cpp.o.d -o third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/cupti_strings.cpp.o -c ../third_party/kineto/libkineto/src/cupti_strings.cpp
../third_party/kineto/libkineto/src/cupti_strings.cpp: In function ‘const char* libkineto::runtimeCbidName(CUpti_CallbackId)’:
../third_party/kineto/libkineto/src/cupti_strings.cpp:478:105: error: expected ‘,’ before ‘)’ token
static_assert(CUPTI_RUNTIME_TRACE_CBID_SIZE < (sizeof(runtimeCbidNames) / sizeof(runtimeCbidNames[0])));
^
../third_party/kineto/libkineto/src/cupti_strings.cpp:478:105: error: expected string-literal before ‘)’ token
(The single-argument form of static_assert is a C++17 feature; this file is compiled with -std=c++14, as the command line above shows, so a message argument is required here.)
Collecting environment information...
PyTorch version: 1.9.0a0+gitebfa927
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 16.04.7 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Clang version: Could not collect
CMake version: version 3.18.2
Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce GTX 1080
GPU 1: GeForce GTX 1080
Nvidia driver version: 440.33.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.3
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A