celtoys / remotery



Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer

License: Apache License 2.0

C 73.78% JavaScript 24.34% CSS 0.92% HTML 0.62% Objective-C++ 0.34%

profiler c gpu cpu d3d11 opengl cuda metal d3d12 vulkan

remotery's Introduction

Remotery

A realtime CPU/GPU profiler hosted in a single C file with a viewer that runs in a web browser.

Features:

Lightweight instrumentation of multiple threads running on the CPU and GPU.
Web viewer that runs in Chrome, Firefox and Safari, on desktops, mobiles or tablets.
GPU UI rendering, bypassing the DOM completely, for real-time 60Hz viewer updates at 10,000x the performance.
Automatic thread sampler that tells you what processor cores your threads are running on without requiring Administrator privileges.
Drop saved traces onto the Remotery window to load historical runs for inspection.
Console output for logging text.
Console input for sending commands to your game.
A Property API for recording named/typed values over time, alongside samples.
Profiles itself and shows how it's performing in the viewer.

Supported Profiling Platforms:

Windows 7/8/10/11/UWP (Hololens), Linux, OSX, iOS, Android, Xbox One/Series, FreeBSD.

Supported GPU Profiling APIs:

D3D 11/12, OpenGL, CUDA, Metal, Vulkan.

Compiling

Windows (MSVC) - add lib/Remotery.c and lib/Remotery.h to your program and add the Remotery/lib path to your include directories. The required libraries (ws2_32.lib and winmm.lib) should be picked up through the #pragma comment directives in Remotery.c.
Windows (MINGW-64) - add lib/Remotery.c and lib/Remotery.h to your program and add the Remotery/lib path to your include directories. You will need to link libws2_32.a and libwinmm.a yourself through your build system, as GCC (and therefore MINGW-64) does not support #pragma comment directives.
Mac OS X (Xcode) - simply add lib/Remotery.c, lib/Remotery.h and lib/Remotery.mm to your program.
Linux (GCC) - add the source in the lib folder. Compilation requires -pthread for library linkage. For example, to compile the sample, run: cc lib/Remotery.c sample/sample.c -I lib -pthread -lm
FreeBSD - the easiest way is to take a look at the official port (devel/remotery) and modify the port's Makefile if needed. There is also a package available via pkg install remotery.
Vulkan - Ensure your include directories are set such that the Vulkan headers can be included with the statement: #include <vulkan/vulkan.h>. Currently the Vulkan implementation requires either Vulkan 1.2+ with the hostQueryReset and timelineSemaphore features enabled, or Vulkan < 1.2 with the VK_EXT_host_query_reset and VK_KHR_timeline_semaphore extensions. The extension VK_EXT_calibrated_timestamps (or VK_KHR_calibrated_timestamps) is also always required.

You can define some extra macros to modify what features are compiled into Remotery:

Macro               Default     Description

RMT_ENABLED         1           Disable this to not include any bits of Remotery in your build
RMT_USE_TINYCRT     0           Used by the Celtoys TinyCRT library (not released yet)
RMT_USE_CUDA        0           Assuming CUDA headers/libs are set up, allow CUDA profiling
RMT_USE_D3D11       0           Assuming Direct3D 11 headers/libs are set up, allow D3D11 GPU profiling
RMT_USE_D3D12       0           Allow D3D12 GPU profiling
RMT_USE_OPENGL      0           Allow OpenGL GPU profiling (dynamically links OpenGL libraries on available platforms)
RMT_USE_METAL       0           Allow Metal profiling of command buffers
RMT_USE_VULKAN      0           Allow Vulkan GPU profiling

Basic Use

See the sample directory for further examples. A quick example:

#include "Remotery.h"

int main()
{
    // Create the main instance of Remotery.
    // You need only do this once per program.
    Remotery* rmt;
    if (rmt_CreateGlobalInstance(&rmt) != RMT_ERROR_NONE)
        return 1;

    // Explicit begin/end for C
    {
        rmt_BeginCPUSample(LogText, 0);
        rmt_LogText("Time me, please!");
        rmt_EndCPUSample();
    }

    // Scoped begin/end for C++ (this block only compiles as C++)
    {
        rmt_ScopedCPUSample(LogText, 0);
        rmt_LogText("Time me, too!");
    }

    // Destroy the main instance of Remotery.
    rmt_DestroyGlobalInstance(rmt);
}

Running the Viewer

Double-click vis/index.html, or open it from your browser.

Sampling CUDA GPU activity

Remotery allows for profiling multiple threads of CUDA execution using different asynchronous streams that must all share the same context. After initialising both Remotery and CUDA you need to bind the two together using the call:

rmtCUDABind bind;
bind.context = m_Context;
bind.CtxSetCurrent = &cuCtxSetCurrent;
bind.CtxGetCurrent = &cuCtxGetCurrent;
bind.EventCreate = &cuEventCreate;
bind.EventDestroy = &cuEventDestroy;
bind.EventRecord = &cuEventRecord;
bind.EventQuery = &cuEventQuery;
bind.EventElapsedTime = &cuEventElapsedTime;
rmt_BindCUDA(&bind);

Explicitly pointing to the CUDA interface allows Remotery to be included anywhere in your project without you needing to link against the CUDA libraries. After the bind completes you can safely sample any CUDA activity:

CUstream stream;

// Explicit begin/end for C
{
    rmt_BeginCUDASample(UnscopedSample, stream);
    // ... CUDA code ...
    rmt_EndCUDASample(stream);
}

// Scoped begin/end for C++
{
    rmt_ScopedCUDASample(ScopedSample, stream);
    // ... CUDA code ...
}

Remotery supports only one context for all threads and will use cuCtxGetCurrent and cuCtxSetCurrent to ensure the current thread has the context you specify in rmtCUDABind.context.

Sampling Direct3D 11 GPU activity

Remotery allows sampling of D3D11 GPU activity on multiple devices on multiple threads. After initialising Remotery, you need to bind it to D3D11 with a single call from the thread that owns the device context:

// Parameters are ID3D11Device* and ID3D11DeviceContext*
rmt_BindD3D11(d3d11_device, d3d11_context);

Sampling is then a simple case of:

// Explicit begin/end for C
{
    rmt_BeginD3D11Sample(UnscopedSample);
    // ... D3D code ...
    rmt_EndD3D11Sample();
}

// Scoped begin/end for C++
{
    rmt_ScopedD3D11Sample(ScopedSample);
    // ... D3D code ...
}

Subsequent sampling calls from the same thread will use that device/context combination. When you shut down your D3D11 device and context, ensure you notify Remotery before shutting down Remotery itself:

rmt_UnbindD3D11();

Sampling OpenGL GPU activity

Remotery allows sampling of GPU activity on your main OpenGL context. After initialising Remotery, you need to bind it to OpenGL with the single call:

rmt_BindOpenGL();

Sampling is then a simple case of:

// Explicit begin/end for C
{
    rmt_BeginOpenGLSample(UnscopedSample);
    // ... OpenGL code ...
    rmt_EndOpenGLSample();
}

// Scoped begin/end for C++
{
    rmt_ScopedOpenGLSample(ScopedSample);
    // ... OpenGL code ...
}

Support for multiple contexts can be added fairly easily if there is demand for the feature. When you shut down your OpenGL device and context, ensure you notify Remotery before shutting down Remotery itself:

rmt_UnbindOpenGL();

Sampling Metal GPU activity

Remotery can sample Metal command buffers issued to the GPU from multiple threads. As the Metal API does not support finer-grained profiling, samples will return only the timing of the bound command buffer, irrespective of how many you issue. As such, make sure you bind and sample the command buffer for each call site:

rmt_BindMetal(mtl_command_buffer);
rmt_ScopedMetalSample(command_buffer_name);

The C API supports begin/end also:

rmt_BindMetal(mtl_command_buffer);
rmt_BeginMetalSample(command_buffer_name);
...
rmt_EndMetalSample();

Sampling Vulkan GPU activity

Remotery can sample Vulkan command buffers issued to the GPU on multiple queues from multiple threads. Command buffers must be submitted to the same queue as the samples are issued to. Multiple queues can be profiled by creating multiple Vulkan bind objects.

rmtVulkanFunctions vulkan_funcs;
vulkan_funcs.vkGetPhysicalDeviceProperties = (void*)my_vulkan_instance_table->vkGetPhysicalDeviceProperties;
vulkan_funcs.vkQueueSubmit = (void*)my_vulkan_device_table->vkQueueSubmit;
// ... All other function pointers

// Parameters are VkInstance, VkPhysicalDevice, VkDevice, VkQueue, rmtVulkanFunctions*, rmtVulkanBind**
// NOTE: The Vulkan functions are copied internally and so do not have to be kept alive after this call.
rmtVulkanBind* vulkan_bind = NULL;
rmt_BindVulkan(instance, physical_device, device, queue, &vulkan_funcs, &vulkan_bind);

Sampling is then a simple case of:

// Explicit begin/end for C
{
    rmt_BeginVulkanSample(vulkan_bind, command_buffer, UnscopedSample);
    // ... Vulkan code ...
    rmt_EndVulkanSample();
}

// Scoped begin/end for C++
{
    rmt_ScopedVulkanSample(vulkan_bind, command_buffer, ScopedSample);
    // ... Vulkan code ...
}

NOTE: Vulkan sampling on Apple platforms via MoltenVK must be done with caution. Metal doesn't natively support timestamps inside of render or compute passes, so MoltenVK simply reports all timestamps inside those scopes as the begin/end time of the entire render pass!

Sampling calls that use the same vulkan_bind object use the device and queue specified when the bind was created. Once per frame you must call rmt_MarkFrame() to gather GPU timestamps on the CPU.

// End of frame, possibly after calling vkPresentKHR or at the very beginning of the frame
rmt_MarkFrame();

Before you destroy your Vulkan device and queue, you can manually clean up resources by calling rmt_UnbindVulkan, although rmt_DestroyGlobalInstance also does this automatically for all rmt_BindVulkan objects:

rmt_UnbindVulkan(vulkan_bind);

Applying Configuration Settings

Before creating your Remotery instance, you can configure its behaviour by retrieving its settings object:

rmtSettings* settings = rmt_Settings();

Some important settings are:

// Redirect any Remotery allocations to your own malloc/free, with an additional context pointer
// that gets passed to your callbacks.
settings->malloc;
settings->free;
settings->mm_context;

// Specify an input handler that receives text input from the Remotery console, with an additional
// context pointer that gets passed to your callback.
// The handler will be called from the Remotery thread so synchronization with a mutex or atomics
// might be needed to avoid race conditions with your threads.
settings->input_handler;
settings->input_handler_context;
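As an illustration of the synchronization mentioned above, here is a minimal sketch of an input handler that only copies each command into a mutex-guarded buffer for your game thread to drain later. It assumes the handler signature is void (*)(const char* text, void* context); check that against rmtInputHandlerPtr in your copy of Remotery.h. ConsoleInbox, console_input_handler and console_take_command are hypothetical names, not part of Remotery.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical single-slot inbox shared between the Remotery thread
   (producer) and the game thread (consumer). */
typedef struct ConsoleInbox
{
    pthread_mutex_t lock;
    char text[256];
    int has_text;
} ConsoleInbox;

/* Assumed to match Remotery's input handler shape. It runs on the
   Remotery thread, so it only copies the text while holding the lock.
   A newer command overwrites one that has not been taken yet. */
static void console_input_handler(const char* text, void* context)
{
    ConsoleInbox* inbox = (ConsoleInbox*)context;
    pthread_mutex_lock(&inbox->lock);
    snprintf(inbox->text, sizeof(inbox->text), "%s", text);
    inbox->has_text = 1;
    pthread_mutex_unlock(&inbox->lock);
}

/* Called from the game thread: copies out the pending command, if any.
   Returns 1 if a command was taken, 0 if the inbox was empty. */
static int console_take_command(ConsoleInbox* inbox, char* out, size_t out_size)
{
    int taken = 0;
    pthread_mutex_lock(&inbox->lock);
    if (inbox->has_text)
    {
        snprintf(out, out_size, "%s", inbox->text);
        inbox->has_text = 0;
        taken = 1;
    }
    pthread_mutex_unlock(&inbox->lock);
    return taken;
}
```

You would then set settings->input_handler = console_input_handler and settings->input_handler_context = &inbox (for a static ConsoleInbox initialised with PTHREAD_MUTEX_INITIALIZER) before creating the Remotery instance, and call console_take_command once per frame. The single slot keeps the sketch short; a small queue would avoid losing commands that arrive faster than you drain them.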

remotery's People


dougbinks sriravic sxdtxl fun4jimmy davidlee80 island-org beefmaster sopyer gamedevtech yodamaster nodrev barrettcolin bagobor galek ghiboz hugin84 gamedevforks hjanetzek trianglespct floooh danielgibson wzugang mcanthony hexuallyactive jkhoogland dtmoodie uikit0 andr3wmac jodithetigger shokeywind lalalaring v3n code-disaster demiguise claudiouzelac takaaptech simudream rajkosto isegal wubugui lucked paulecoyote mihaisebea foosforks sorinm nil-ableton psybrus tomjakubowski sandover jazzbre squirrel-republic casperdcl naortega thendrix pedronavf techdojo jafermarq tom-seddon naughtycode jonnyhopper glowmade 4144 gaoxiaojun marcclintdion sherief wuyakuma sjb3d rockhowse alainlompo njlr sorakun rlalance aluedke-microsoft njligames sprig nitsuj33 vk2gpu ylyking donboie darksylinc louandutoits3 michochan guozanhua sosojustdo 3irdparty sejarce pirater tommo viseengine intrigus harold-b valtolibraries vertexodessa choiday nshcat shelim telegrap majun8645 visse sattishv

remotery's Issues

Reduce sample/logging latency by using blocking

The message queue is polled every 10ms, in-between which the thread is put to sleep. Latency increases and messages have the potential to be discarded.

Add wakeup calls with blocking waits in the main thread (semaphores? events?) to process immediately.
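A minimal POSIX sketch of the blocking approach, using a condition variable in place of the semaphores or events suggested above: producers post after enqueuing a message, and the server thread sleeps until posted, keeping a timeout so any periodic work still runs. WakeSignal and its functions are hypothetical and not part of Remotery; a Windows build would need an equivalent built on Win32 events or condition variables.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

typedef struct WakeSignal
{
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int pending;   /* wakeups posted but not yet consumed */
} WakeSignal;

/* Producer side: call after pushing a message onto the queue. */
static void wake_signal_post(WakeSignal* s)
{
    pthread_mutex_lock(&s->lock);
    s->pending++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Consumer side: block until a producer posts or timeout_ms elapses.
   Returns 1 if there was pending work, 0 on timeout. */
static int wake_signal_wait(WakeSignal* s, long timeout_ms)
{
    struct timespec deadline;
    int woken = 0;

    /* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L)
    {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&s->lock);
    /* Loop to tolerate spurious wakeups. */
    while (s->pending == 0)
    {
        if (pthread_cond_timedwait(&s->cond, &s->lock, &deadline) == ETIMEDOUT)
            break;
    }
    if (s->pending > 0)
    {
        s->pending = 0;   /* one drain of the queue handles every posted message */
        woken = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return woken;
}
```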

Memory Leak

There seems to be a memory leak created when WebSocket_AcceptConnection is called. The client_socket passed in already contains a tcp_socket which is then overwritten inside the function.

I'm not sure about the desired behavior at this point. Is the previously allocated tcp_socket usable or should the old one be cleaned up before assigning the new one?

Asserts vs. errors

Separate asserts from errors, allowing asserts to be used for internal consistency only and turned off in release.

Calls to glGetError can be slow

See #68

Linux version not working

I've tried to run it on Linux, and with the latest version the visualisation doesn't even get a connection to the running app. I tried to debug it, but only by adding printfs, and got the following assertion:
lib/Remotery.c:3418: SampleTree_Pop: Assertion `sample != tree->root' failed.

I've tried to debug it a bit, but because in the case of an error you often return from a subroutine without a message or an assertion, it's hard to find the real source of the failure. It seems the extra thread's main loop runs exactly once and hangs afterwards.

OSX: 'sys/prctl.h' file not found

Should this work on OSX?

Wrong pointer indirection in NULL check

Please take a look at function WebSocket_Create, lines 2059-2061:

*web_socket = (WebSocket*)malloc(sizeof(WebSocket));
if (web_socket == NULL)
    return RMT_ERROR_MALLOC_FAIL;

It's obvious that the original intent was to check *web_socket, but in its current state the check is ineffective: web_socket has already been dereferenced (undefined behaviour if it were NULL), so an optimizing compiler may remove the check completely, and a failed malloc goes undetected.
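A self-contained sketch of the corrected check, testing the pointer that malloc returned rather than the out-parameter. rmtError and WebSocket are reduced stand-ins here, not the real Remotery definitions.

```c
#include <stdlib.h>

/* Stand-ins for the real Remotery types, for illustration only. */
typedef enum rmtError { RMT_ERROR_NONE, RMT_ERROR_MALLOC_FAIL } rmtError;
typedef struct WebSocket { int state; } WebSocket;

static rmtError WebSocket_Create(WebSocket** web_socket)
{
    /* Check the allocated pointer (*web_socket), not web_socket itself,
       which has already been dereferenced on the line above the check. */
    *web_socket = (WebSocket*)malloc(sizeof(WebSocket));
    if (*web_socket == NULL)
        return RMT_ERROR_MALLOC_FAIL;

    (*web_socket)->state = 0;
    return RMT_ERROR_NONE;
}
```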

Missing implementation of usTimer_Get in POSIX builds

Control reaches end of non-void function, return value is undefined.

Cannot compile the code in debian

Hi,
When I downloaded the whole code repository and tried to compile it on my Debian Linux server, I encountered the following error:

The pwd is now under the home folder of Remotery-master.
linux1:/uac/msc/yfxue/www> cd Remotery-master/
linux1:/uac/msc/yfxue/www/Remotery-master> cc lib/Remotery.c sample/sample.c -l lib -p thread -lm
sample/sample.c:3:22: fatal error: Remotery.h: No such file or directory
compilation terminated.
linux1:/uac/msc/yfxue/www/Remotery-master> pwd
/uac/msc/yfxue/www/Remotery-master

linux1:/uac/msc/yfxue/www/Remotery-master> ls
./  ../  lib/  LICENSE*  readme.md*  sample/  screenshot.png*  vis/
linux1:/uac/msc/yfxue/www/Remotery-master> ls ./lib/
./  ../  Remotery.c*  Remotery.h*

Then I tried to copy ./lib/Remotery.h to ./sample/, but it still failed as follows:

yfxue@linux1:~/www/Remotery-master$ cp lib/Remotery.h ./sample/
yfxue@linux1:~/www/Remotery-master$  cc ./lib/Remotery.c ./sample/sample.c -l ./lib -pthread -lm
/usr/bin/ld: cannot find -l./lib
collect2: error: ld returned 1 exit status

Then I modified my command as follows, and it errored out too:
yfxue@linux1:~/www/Remotery-master$  cc lib/Remotery.c sample/sample.c -l lib -pthread -lm
/usr/bin/ld: cannot find -llib
collect2: error: ld returned 1 exit status
yfxue@linux1:~/www/Remotery-master$  cc lib/Remotery.c sample/sample.c  -pthread -lm
/tmp/ccy20F2Q.o: In function `rmtLoadLibrary':
Remotery.c:(.text+0x103): undefined reference to `dlopen'
/tmp/ccy20F2Q.o: In function `rmtFreeLibrary':
Remotery.c:(.text+0x11d): undefined reference to `dlclose'
/tmp/ccy20F2Q.o: In function `rmtGetProcAddress':
Remotery.c:(.text+0x142): undefined reference to `dlsym'
/tmp/ccy20F2Q.o: In function `usTimer_Init':
Remotery.c:(.text+0x1a2): undefined reference to `clock_gettime'
/tmp/ccy20F2Q.o: In function `usTimer_Get':
Remotery.c:(.text+0x21d): undefined reference to `clock_gettime'
collect2: error: ld returned 1 exit status
yfxue@linux1:~/www/Remotery-master$

My Linux version as following:

yfxue@linux1:~/www/Remotery-master$ cat /proc/version
Linux version 3.2.0-4-amd64 ([email protected]) (gcc version 4.6.3 (Debian 4.6.3-14) ) 
#1 SMP Debian 3.2.63-2+deb7u2

I am not sure whether this Linux version is suitable for compiling the code, or whether it requires some additional setup.

Thanks,

D3D profiling

Clean up the CUDA stuff a bit and add GPU profiling for D3D.

Remotery initialisation struct

Add initialisation struct for stuff like custom memory allocators and control over how much memory gets allocated by Remotery.

OpenCL backend

Hi,

For a benchmark program I'm making, could you add support for an OpenCL backend?

Awesome work, man!

Advice on too many threads in Timeline window

Hi,

There doesn't seem to be a way to resize or scroll the content of the Timeline window (or is there?). We have a relatively high number of (IO) threads, more than fit into the Timeline window, which makes it impossible to view or select some thread timelines. My current solution would be to locally hack the visualizer webpage, but I'm wondering whether there is a better way to do this, or how much effort it would be to make the timeline window resizable or vertically scrollable.

This is what it looks like:

Cheers,
-Floh.

assertion failed

Assertion failed: web_socket != NULL, file ..\lib\Remotery.c, line 2192

any idea? :)

also, this happens to be hammering the console log as well:

[13:42:26] Disconnected
[13:42:28] Connecting to ws://127.0.0.1:17815/rmt
[13:42:28] Connected
[13:42:28] Connection Error 
[13:42:28] Disconnected
[13:42:30] Connecting to ws://127.0.0.1:17815/rmt
[13:42:30] Connected
[13:42:30] Connection Error 
[13:42:30] Disconnected
[13:42:32] Connecting to ws://127.0.0.1:17815/rmt
[13:42:32] Connected
[13:42:32] Connection Error 
[13:42:32] Disconnected
[13:42:34] Connecting to ws://127.0.0.1:17815/rmt
[13:42:34] Connected

Late-starting threads have offset time reporting.

The zero-base for sample starts is set when the first sample tree on a thread is created.

This can happen arbitrarily late (imagine spinning up a worker thread). It also introduces error even when you start threads at about the same time.

All threads should have useconds reported on the same timebase with zero having the same meaning on every thread.
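One way to get a shared zero point, sketched for POSIX with clock_gettime(CLOCK_MONOTONIC): capture the start time once per process (for example when the global instance is created, before any worker threads run) and have every thread report microseconds relative to it. The g_start_us variable and timebase_* functions are hypothetical; Remotery's own timer code may be structured differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Process-wide zero point: written once before any sampling threads
   start, then only read. */
static uint64_t g_start_us;

static uint64_t monotonic_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Call once at startup, e.g. alongside creating the global instance. */
static void timebase_init(void)
{
    g_start_us = monotonic_us();
}

/* Any thread: microseconds since the shared zero point, so a value of 0
   means the same instant on every thread regardless of when it started. */
static uint64_t timebase_now_us(void)
{
    return monotonic_us() - g_start_us;
}
```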

Timeline control features

When the web page is paused you have control over panning back in time, zooming and so on. If I may, I'd like to offer some ideas that I think would improve and speed up navigation between samples. It can be a bit time-consuming to scroll the timeline around manually when you are trying to analyze the timings of particular samples over time.

Add arrows or some other control (mousewheel when hovering over a sample, etc.) for each sample in its respective window (when paused), in order to jump forward and backward in the timeline to each instance of that particular sample.

Doesn't build on Linux with RMT_USE_OPENGL

It doesn't build out of the box, I had to do a few changes in Remotery.c:

Around line 5348 it'll explode because WINGDIAPI is unknown, so it should really be an #elif defined(_WIN32) case, plus a new #else case with #define GLAPI extern or something like that.
glXGetProcAddressARB couldn't be found. #include <GL/glx.h> creates conflicts with your own GLuint typedefs etc.; just adding extern void* glXGetProcAddressARB(const GLubyte*); made it work. Not pretty, though.

And sometimes I get segfaults on startup

Program received signal SIGSEGV, Segmentation fault.
Remotery_Destructor (rmt=0xe78750) at ../Libs/Remotery/Remotery.c:4182
4182            Delete(OpenGL, rmt->opengl);

can't reproduce that reliably though and last time I forgot to get a backtrace :-/

Anyway, thanks for this awesome tool, I really enjoy using it and it makes profiling performance problems so much easier :-)

Function order, timing units, percentage

I would find it very useful if the profiler breakdown was closer to other profilers I have used in the past, specifically:

It would be nice if the browser view could (either by default, or as an option) display like:

So:

An expandable tree view for the function stack
Functions ordered in descending percentage order
Overall percentage next to each function (so 87.2% of the frame is spent in PcEngine::Tick and 61.4% is spent in PcEngine::Render; looking further down, 23.8% is spent in OpenGLSprite::Render). Some profilers allow you to toggle whether the percentages are relative (to the parent) or absolute. I believe absolute is the more useful overall.
Specify (somewhere!) what the units are (microseconds?), perhaps milliseconds would be better (so 16565 might become 16.565) in games at least, we typically talk about timing in ms.

This is of course on top of:

Combining samples from the same function (which I have hacked in)
Displaying the hit count of each function

I appreciate there is quite a lot to this enhancement request, and perhaps parts of it would be better suited to subtasks, but I'll put it all here initially, maybe you'll be able to comment on whether the above fits with your vision for this project or not. From my perspective it is a neat little profiler, I'd like to be able to make more use of it (I would also like my dev team to use it) and if it worked somewhat similarly to existing projects it would help a lot.

Data only transmitted if top level sample end is called

The current design accumulates samples via a hierarchy, and then transmits the data if the top level sample has ended.

This seems reasonable, but I've been using a begin/end sample marker around my thread entry function, which causes the Remotery sampler to never send any data for that thread, with the additional problem that memory usage builds up.

Auto generate names

I wanted to be able to just drop in a single macro in a bunch of functions, and have the API auto fill-in the function names for me, i.e.

void func1( ) {
    rmt_ScopedCPUSampleAutoName( );
}
void func2( ) {
    rmt_ScopedCPUSampleAutoName( );
}
int main( int argc, char ** argv ) {
    rmt_ScopedCPUSampleAutoName( );
    func1( );
    func2( );
    return 0;
}

I didn't want to have to manually pass in a name to each call. My solution:

#define rmt_BeginCPUSampleAutoName( ) \
    RMT_OPTIONAL(RMT_ENABLED, { \
        static rmtU32 rmt_sample_hash_##__LINE__ = 0;   \
        _rmt_BeginCPUSample(__FUNCTION__, &rmt_sample_hash_##__LINE__); \
    })

#define rmt_ScopedCPUSampleAutoName( ) \
    RMT_OPTIONAL(RMT_ENABLED, rmt_BeginCPUSampleAutoName( )); \
    RMT_OPTIONAL(RMT_ENABLED, rmt_EndCPUSampleOnScopeExit rmt_ScopedCPUSample##__LINE__);

This generates a call tree like:

Sorry this isn't a proper full request, but I hoped this might be something you'd consider implementing (either as is, or in a better way if you know of one).
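A caveat with these macros: ## pastes its operands before __LINE__ is expanded, so every use produces the literal identifiers rmt_sample_hash___LINE__ and rmt_ScopedCPUSample__LINE__. The static hash lives inside its own braces and is harmless, but two scoped samples in one scope would collide. The usual fix is an extra level of macro indirection, sketched below with hypothetical CONCAT/STRINGIFY helpers that expose the generated names as strings.

```c
/* Two-level concatenation: CONCAT's arguments are fully macro-expanded
   first (so __LINE__ becomes a number), then CONCAT_INNER pastes them. */
#define CONCAT_INNER(a, b) a##b
#define CONCAT(a, b) CONCAT_INNER(a, b)

/* Stringify helpers, used here only to inspect the generated names. */
#define STRINGIFY_INNER(x) #x
#define STRINGIFY(x) STRINGIFY_INNER(x)

/* Single-level paste: __LINE__ is not expanded, giving "sample___LINE__". */
static const char* single_level_name(void)
{
    return STRINGIFY(CONCAT_INNER(sample_, __LINE__));
}

/* Two-level paste: gives "sample_" followed by this line's number. */
static const char* two_level_name(void)
{
    return STRINGIFY(CONCAT(sample_, __LINE__));
}
```

Applied to the macro above, CONCAT(rmt_ScopedCPUSample, __LINE__) would give each scoped sample a line-numbered variable name.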

Add a f

OSX: Missing NSGLGetProcAddress

I had to work around the missing NSGLGetProcAddress function in newer versions of OS X:
bkaradzic/bgfx@c3dd887

Getting "Alignment trap:" errors on arm.

As per
http://stackoverflow.com/questions/16548059/how-to-trap-unaligned-memory-access

enable signals on mis-aligned accesses

echo 4 > /proc/cpu/alignment

This gives a SIGBUS in MessageQueue_AllocMessage on the line

msg->thread_sampler = thread_sampler;

MessageQueue_AllocMessage needs to allocate lengths that are 4-byte aligned.

I fixed it by adding the lines:

    // needs to be 4 byte aligned on ARM
    payload_size = ((payload_size + 3) & ~3u);
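The same rounding works for any power-of-two alignment: add alignment - 1, then mask off the low bits. A small sketch with a hypothetical align_up helper (not a Remotery function); align_up(payload_size, 4) is equivalent to the fix above.

```c
#include <stdint.h>

/* Round size up to the next multiple of alignment, which must be a power
   of two. Adding (alignment - 1) carries into the next multiple unless
   size is already aligned; the mask then clears the remainder bits. */
static uint32_t align_up(uint32_t size, uint32_t alignment)
{
    return (size + (alignment - 1u)) & ~(alignment - 1u);
}
```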

Don't drop messages that overflow the message queue

Overflow behaviour is to currently just drop the message. This is a nice, non-fatal way of dealing with the problem but not ideal.

Give users the option of letting their threads block on overflow until more space becomes available.

Add a way to find or profile one off blocks

Currently each thread gets its own block and line in the HTML page that moves forward in time as it runs and completes samples. It's often useful to profile occasional areas of code that are hit for one frame but then get lost in the timeline.

It would be super useful if one could create or label a one off sample scope so that it can be found easily in the html page. It could be diverted to its own line and block, or perhaps a search feature could be used to find things easier.

I tried to create a temporary sample scope by changing the thread name around the scope, hoping it would end up in its own block in the output, but that didn't work.

pause button

hello again,

the button pauses both the graphics and the windows, but console text is still being displayed and logged.
Is this intended?

It would also be nice if the status color reflected the pause state, like this:

red = status off, pause off,
green = status on, pause off,
gray = status off, pause on,
yellow = status on, pause on,

btw, for a while now the green/red light has been blinking all the time, no matter what the connection status is. Any idea?

Simpler error handling for the user

Most runtime functions return an error code because they can create thread samplers on the fly. Should we use RegisterThread instead? That would increase API init burden but would prevent the need to return error codes for all API functions.

Alternatively, fold it into the rmt_SendThreadSamples function.

Add a Stats Graph Window

One thing that we've used in the past is a 'Stats' window that would show per-frame things like FPS, Active Entities, Draw Call Count, Memory Usage, etc. as just a simple list. It would be very cool if Remotery had a way to do this and show a small history graph (60 frames worth maybe) of selected stats. Something similar to the Windows 8 Task Manager "Performance" Tab (left side).

Is this something that sounds doable? We are just now starting to look over the source.

Add ability to enable/disable Remotery via project settings

By default Remotery is always enabled because there is #define RMT_ENABLED. There is no way to change it without modifying code.

I propose change to:

#ifndef RMT_ENABLED
#define RMT_ENABLED 1
#endif

With this, code should no longer use #ifdef, but rather #if, because the symbol will always be defined. This simplifies integration with other projects.
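A compilable sketch of the proposed pattern. The build system can then override the default with a compiler flag such as -DRMT_ENABLED=0, and code tests the value with #if. profiling_compiled_in is a hypothetical function used only to show the check.

```c
/* Default to enabled unless the build system already defined it,
   e.g. with -DRMT_ENABLED=0. */
#ifndef RMT_ENABLED
#define RMT_ENABLED 1
#endif

/* Hypothetical function whose body depends on the setting. Use #if, not
   #ifdef: with the guard above the symbol is always defined, so #ifdef
   would be true even when RMT_ENABLED is 0. */
static int profiling_compiled_in(void)
{
#if RMT_ENABLED
    return 1;
#else
    return 0;
#endif
}
```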

Thread-safety problem when there are socket errors

Now that all points of error for sending/receiving data on a socket have been trapped, there are some thread-safety issues with logging text (again).

If there is a failure to log text, the TCP socket for the WebSocket will be correctly shut down. However, the server thread might be using it at the same time.

The only ways around this seem to be putting the log text on a queue or opening a separate socket for it.

Remove error codes from public API

These distract from the interface and serve no real purpose other than showing what an error means in the debugger. To make use of them at runtime, the programmer has to add them all to a big switch statement manually, which will easily get out of sync between versions.

Given that the programmer can't really do anything different based on what the returned error codes are (beyond letting the user know what's going on at runtime), there only really needs to be two catchable error codes at runtime: ERROR and OK.

So, remove the error codes, perhaps just moving them into the C file so that error codes are still debuggable. Return an OK/ERROR state from the public API, plus a function that maps an error to a string for printing at runtime.
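A minimal sketch of that split, using hypothetical names rather than Remotery's actual rmtError values: the public result has only two states, while the detailed codes and their strings stay private to the .c file.

```c
/* Public API: the only two outcomes a caller can act on. */
typedef enum PublicResult { RESULT_OK = 0, RESULT_ERROR = 1 } PublicResult;

/* Private to the .c file: detailed codes kept for debugging. */
typedef enum InternalError
{
    INTERNAL_ERROR_NONE = 0,
    INTERNAL_ERROR_MALLOC_FAIL,
    INTERNAL_ERROR_SOCKET_SEND_FAIL,
    INTERNAL_ERROR_COUNT
} InternalError;

/* Maps a detailed code to text for printing. The table sits next to the
   enum so the two are updated together. */
static const char* internal_error_string(InternalError error)
{
    static const char* const names[INTERNAL_ERROR_COUNT] = {
        "No error",
        "Memory allocation failed",
        "Socket send failed",
    };
    if ((unsigned)error >= INTERNAL_ERROR_COUNT)
        return "Unknown error";
    return names[error];
}

/* Collapses any detailed code into the public OK/ERROR result. */
static PublicResult to_public_result(InternalError error)
{
    return error == INTERNAL_ERROR_NONE ? RESULT_OK : RESULT_ERROR;
}
```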

Integration proposal

Hi!

I would like to add Remotery support to this project: https://github.com/01org/IntelSEAPI/wiki
Please contact me to discuss details? alexander.a.raud at intel dot com

With respect, Alex.

glGetError and other OpenGL functions are statically linked

There are two issues with the use of glGetError in Remotery.

It's statically linked, while all other functions are dynamically loaded.
https://github.com/Celtoys/Remotery/blob/master/lib/Remotery.c#L5372
It's actually called. On some drivers/platforms glGetError is a very costly operation, and it should only be called optionally. The best way to deal with it is to wrap all GL calls with a GL(_call) macro that can turn glGetError checking on/off when necessary. This also simplifies GL code and makes it cleaner. Here is an example from my codebase: https://github.com/bkaradzic/bgfx/blob/master/src/renderer_gl.h#L839

This is my current workaround for this issue (it fixes both :):
https://github.com/bkaradzic/bgfx/blob/master/3rdparty/remotery/lib/Remotery.c#L5372

Remove memset from MessageQueue

The consumer cleans up messages it has just processed by filling them with zeroes. This is a thread-safe way of allowing multiple producers to allocate message queue memory and keep their own lock on that range of memory until the message content is complete.

This burns write bandwidth.

One potential solution would be to clear just the message ID to zero, making the consumer check for "anything other than the messages I know about." If these message IDs are kept below a small integer value, then each message can ensure it never stores those values itself. Tricksy, but well-defined and much more efficient.

Missing #ifdef for some CUDA code

Declaration of AreCUDASamplesReady/GetCUDASampleTimes functions and their usage have to be wrapped in #ifdef RMT_USE_CUDA to avoid compilation issues on Linux/OSX.

Visualizer not displaying all threads if thread names not set

If I don't call rmt_SetCurrentThreadName for each thread then the viewer only sees 2 of my threads, although debugging shows that all 8 threads are tracked by Remotery_GetThreadSampler correctly.

I can likely get a repro case up if needed, but was hoping you might have an idea.

Remotery does CPU work with no connected viewer

Samples are built and timed even if the viewer isn't connected.

We don't want to check for a connection on each sample submission, but we also don't want to start accepting samples halfway through a tree.

Better thread naming

Just a suggestion ...
Instead of hashing ThreadSampler pointers for thread names, it would make more sense to name threads by their IDs, something like:

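static rmtU32 Thread_GetId(void)
{
#if defined(RMT_PLATFORM_WINDOWS)

    return GetCurrentThreadId();

#elif defined(RMT_PLATFORM_POSIX)

    // pthread_t is opaque: this cast assumes an integer type (as on glibc) and may truncate
    return (rmtU32)pthread_self();

#else

    // Unknown platform: avoid falling off the end of a non-void function
    return 0;

#endif
}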

And inside ThreadSampler_Constructor, instead of the Base64_Encode line:

 snprintf(thread_sampler->name, sizeof(thread_sampler->name), "Thread #%u", Thread_GetId());

I don't know about Windows, but there is also a non-standard pthreads thread naming extension:
http://man7.org/linux/man-pages/man3/pthread_setname_np.3.html
You can use it to fetch the default thread name (if the user has set one). If the name is empty, use the thread ID as the name; otherwise use the name set by the user.

Use-after-free error

Found with clang's scan-build. Please consider the following sequence of actions in Remotery_Create:

Line 3527: initialize (*rmt)->thread to NULL.
Line 3548: call Thread_Create.
Line 455: allocate memory for thread structure and save the pointer.
Lines 476 or 484: functions CreateThread/pthread_create fail for some reason.
Lines 478 or 486: call Thread_Destroy with *thread as parameter. It will free the structure, but won't reset the pointer.
Line 535: function frees the memory, but doesn't update pointer.
Line 3549: detect the failure and call Remotery_Destroy.
Line 3574: rmt->thread still points to now free memory and hence non-NULL.
Line 3576: try to access freed memory.

To re-run scan-build use the following commands:

scan-build clang lib/Remotery.c sample/sample.c -I lib -pthread -lm
scan-view /tmp/<displayed-guid>
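A common remedy is for the destroy function to take the owner's pointer by address and reset it to NULL after freeing, so the later cleanup in Remotery_Destroy sees NULL and skips it. A self-contained sketch; Thread, Thread_Create and Thread_Destroy here are simplified stand-ins named after the functions in the report, and the hypothetical simulate_failure flag stands in for CreateThread/pthread_create failing.

```c
#include <stdlib.h>

/* Stand-in for the real thread structure. */
typedef struct Thread { int handle; } Thread;

/* Frees *thread and clears the caller's pointer, so a second call, or a
   later NULL check by the owner, is safe. */
static void Thread_Destroy(Thread** thread)
{
    if (thread == NULL || *thread == NULL)
        return;
    free(*thread);
    *thread = NULL;
}

/* Mirrors the failing path from the report: allocate, fail to start the
   OS thread, then destroy. Returns 0 on success, -1 on failure. */
static int Thread_Create(Thread** thread, int simulate_failure)
{
    *thread = (Thread*)malloc(sizeof(Thread));
    if (*thread == NULL)
        return -1;
    (*thread)->handle = 0;

    if (simulate_failure)
    {
        Thread_Destroy(thread);   /* leaves the owner's pointer NULL */
        return -1;
    }
    return 0;
}
```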

Samples not combined?

Are the samples supposed to be condensed into a single entry with a # calls sort of tracking?

It didn't seem right to me that all the samples called within a loop would be separate like this.

Web viewer does not update threads if there is no new data

Currently the web viewer does not update threads if they do not have any data - for example an asynchronous task which completes with the thread then waiting on an event for a new task.

This means that as other threads which continue to send data are updated, the timeline presented is not consistent.

Memory Usage

It appears that if you leave an application running for hours with Remotery enabled, memory will steadily increase and ultimately crash your program.

Does it retain sample data indefinitely?

OpenGL profiling

Clean up the CUDA stuff a bit and add GPU profiling for OpenGL.

Missing curly braces around JSON_ERROR_CHECK macro

Given this macro definition

#define JSON_ERROR_CHECK(stmt) error = stmt; if (error != RMT_ERROR_NONE) return error;

and usage like this

if (sample->next_sibling != NULL)
    JSON_ERROR_CHECK(json_Comma(buffer));

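leads to a very interesting execution path: the macro expands to two statements, so only error = json_Comma(buffer); is guarded by the if, while if (error != RMT_ERROR_NONE) return error; runs unconditionally and tests whatever value error already held.

I recommend either adding curly braces to the if statement or, preferably, to the macro definition itself.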
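The usual C idiom for fixing this inside the macro is do { ... } while (0). In the sketch below, rmtError, RMT_ERROR_WRITE, and the json_Comma stub (which takes a "fail" flag instead of a buffer) are hypothetical stand-ins; only the macro shape is the point.

```c
/* Hypothetical error type and JSON helper standing in for Remotery's. */
typedef enum { RMT_ERROR_NONE = 0, RMT_ERROR_WRITE = 1 } rmtError;

static rmtError json_Comma(int fail)
{
    return fail ? RMT_ERROR_WRITE : RMT_ERROR_NONE;
}

/* do { ... } while (0) makes the expansion a single statement, so an
   unbraced if guards the whole body and the caller's semicolon is still
   required. Like the original, it assigns to an `error` in scope. */
#define JSON_ERROR_CHECK(stmt)        \
    do                                \
    {                                 \
        error = (stmt);               \
        if (error != RMT_ERROR_NONE)  \
            return error;             \
    } while (0)

static rmtError json_WriteSeparator(int has_next_sibling, int fail)
{
    rmtError error;
    if (has_next_sibling)
        JSON_ERROR_CHECK(json_Comma(fail));
    return RMT_ERROR_NONE;
}
```

Wrapping the body in bare curly braces would also fix the case above, but breaks when the macro is followed by an else (the semicolon after the closing brace terminates the if); do/while (0) avoids that.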

Put sample queue into the new message queue

The SpSc queue for samples is quite fast and elegant but it suffers from latency issues:

10ms sleep needs replacing with blocking primitives so that the server thread wakes up the instant there are samples.
Threads are visited in unspecified order, meaning fast loops will queue up behind slower loops.

Both problems could be fixed, and the code simplified, by moving it over to the new message queue.
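As a sketch of the "blocking primitive" idea, assuming pthreads: producers signal a condition variable after queuing a sample, and the server thread blocks on it instead of sleeping for 10ms. SampleSignal and its functions are hypothetical names, and the queue itself is reduced to a pending counter.

```c
#include <pthread.h>

/* Hypothetical wake-up signal between sampling threads and the server thread. */
typedef struct SampleSignal
{
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    unsigned pending; /* samples queued but not yet consumed */
} SampleSignal;

static void SampleSignal_Init(SampleSignal* s)
{
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->pending = 0;
}

/* Called by a producer thread after pushing a sample. */
static void SampleSignal_Notify(SampleSignal* s)
{
    pthread_mutex_lock(&s->mutex);
    s->pending++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}

/* Called by the server thread instead of sleeping: blocks until at least one
   sample is pending, then takes the whole batch and returns its size. */
static unsigned SampleSignal_Wait(SampleSignal* s)
{
    pthread_mutex_lock(&s->mutex);
    while (s->pending == 0)
        pthread_cond_wait(&s->cond, &s->mutex);
    unsigned taken = s->pending;
    s->pending = 0;
    pthread_mutex_unlock(&s->mutex);
    return taken;
}
```

A real server thread also has to wake for pings, shutdown and incoming commands, so pthread_cond_timedwait with a deadline is probably a better fit there; on Windows, CONDITION_VARIABLE with SleepConditionVariableCS is the closest equivalent.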

Feature request: split into sampler and separate 'viewer' backends

Various viewer backends would be useful, so I propose separating the webserver from the sampler as a simple way to make the system extensible.

Examples of potentially useful backends:

Webserver (current backend)
Binary network server (similar to current but using binary for external C++ viewer)
In app graphics viewer (using https://github.com/dougbinks/nanovg for example)
Dump to file
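One possible shape for such a split is a table of function pointers that the sampler calls without knowing which backend sits behind it. This is only a sketch: ViewerBackend and the FileBackend functions are hypothetical, and the "dump to file" backend writes to an already-open FILE*.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical backend interface: the sampler only talks to this table. */
typedef struct ViewerBackend
{
    void* context;
    int (*send_samples)(void* context, const char* thread_name, const void* data, size_t size);
    int (*log_text)(void* context, const char* text);
    void (*destroy)(void* context);
} ViewerBackend;

/* "Dump to file" backend: context is the FILE* to write to. */
static int FileBackend_SendSamples(void* context, const char* thread_name, const void* data, size_t size)
{
    FILE* file = (FILE*)context;
    if (fprintf(file, "SAMPLES %s %zu\n", thread_name, size) < 0)
        return -1;
    return fwrite(data, 1, size, file) == size ? 0 : -1;
}

static int FileBackend_LogText(void* context, const char* text)
{
    return fprintf((FILE*)context, "LOG %s\n", text) < 0 ? -1 : 0;
}

static void FileBackend_Destroy(void* context)
{
    fflush((FILE*)context);
}

static ViewerBackend FileBackend_Create(FILE* file)
{
    ViewerBackend backend = { file, FileBackend_SendSamples, FileBackend_LogText, FileBackend_Destroy };
    return backend;
}
```

The webserver, binary network server and in-app viewer would each supply their own set of functions in the same way.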

Remotery interface cannot currently be used in a shared library

I'd like to be able to use Remotery in a shared library and am prepared to do the work to export the functions, but first I'd like to know whether you would accept a pull request for this.

Additionally, if you have any guidelines, let me know; otherwise I'll just try to follow the current code style.
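For reference, the usual pattern for exporting C functions from a shared library is a single macro that expands to __declspec(dllexport/dllimport) on Windows and to default symbol visibility on GCC/Clang. MYLIB_API, MYLIB_BUILDING_DLL and mylib_Add are hypothetical names used only for this sketch.

```c
/* This translation unit is part of the library, so it exports symbols.
   Consumers would leave MYLIB_BUILDING_DLL undefined. */
#define MYLIB_BUILDING_DLL

#if defined(_WIN32)
    #if defined(MYLIB_BUILDING_DLL)
        #define MYLIB_API __declspec(dllexport)
    #else
        #define MYLIB_API __declspec(dllimport)
    #endif
#elif defined(__GNUC__) || defined(__clang__)
    /* Pairs with -fvisibility=hidden to export only tagged functions. */
    #define MYLIB_API __attribute__((visibility("default")))
#else
    #define MYLIB_API
#endif

/* Every public declaration in the header is tagged with the macro. */
MYLIB_API int mylib_Add(int a, int b);

MYLIB_API int mylib_Add(int a, int b)
{
    return a + b;
}
```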

Disconnect/Reconnect every 2-5 seconds

There are a couple of calls to Server_Send with a 20ms timeout that frequently appear to fail inside TCPSocket_Send, in the timeout busy loop, forcing a disconnect. This happens every few seconds. I'm not sure why it would time out on a loopback connection.

If it matters, I'm running the web page in the latest Chrome.
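For comparison, a send with a timeout does not have to busy-wait: on POSIX it can block in poll() until the socket is writable. This is a hypothetical sketch (SendWithTimeout is not Remotery's TCPSocket_Send), and the timeout here applies to each wait rather than to the whole call; on Windows, WSAPoll plays the same role.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical: sends all `size` bytes, waiting up to timeout_ms in poll()
   for buffer space before each send. Returns 0 on success, -1 on error or
   timeout. */
static int SendWithTimeout(int sock, const char* data, size_t size, int timeout_ms)
{
    size_t sent = 0;
    while (sent < size)
    {
        struct pollfd pfd = { sock, POLLOUT, 0 };
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0)
            return -1; /* timed out waiting for the socket to become writable */
        if (ready < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }

        ssize_t n = send(sock, data + sent, size - sent, 0);
        if (n < 0)
        {
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
                continue;
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}
```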

Connection problems on Win32

About 25% of the times Remotery is launched, it starts fine, but after a few seconds the Remotery connection log starts showing connection errors every 2 seconds.

[11:08:11] Connecting to ws://127.0.0.1:17815/rmt
[11:08:12] Connection Error 
[11:08:12] Disconnected
[11:08:13] Connecting to ws://127.0.0.1:17815/rmt
[11:08:14] Connection Error 
[11:08:14] Disconnected
[11:08:15] Connecting to ws://127.0.0.1:17815/rmt
[11:08:16] Connection Error 
[11:08:16] Disconnected
[11:08:17] Connecting to ws://127.0.0.1:17815/rmt
[11:08:18] Connection Error 
[11:08:18] Disconnected
[11:08:19] Connecting to ws://127.0.0.1:17815/rmt
[11:08:20] Connection Error 
[11:08:20] Disconnected

Looking at the Chrome console, I can see this:

WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Invalid frame header
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Invalid frame header
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Invalid frame header
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
WebSocketConnection.js:89 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Invalid frame header
WebSocketConnection.js:89 OnOpen
WebSocketConnection.js:89 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8. (repeated 9 times)

Line 54 of WebSocketConnection.js is:
this.Socket = new WebSocket(address);

These errors are reported by Chrome:

WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Invalid frame header

Is there anything I could test to track down this issue?

I wonder whether I have too many marks, as I have changed my existing profiler marks to call Remotery...

sample crashes

Sorry to bother you: the sample runs for 4-8 seconds and then crashes every time on Win7 64-bit. It crashes here:
assertion failed: web_socket != NULL in remotery.c line 2139 (WebSocket_Send first assertion).

In the viewer I can see the connection constantly coming up and immediately going down again. From the Chrome debugger:
[19:17:00] Connecting to ws://127.0.0.1:17815/rmt
[19:17:00] Connected
[19:17:00] Connection Error
[19:17:00] Disconnected
[19:17:02] Connecting to ws://127.0.0.1:17815/rmt
[19:17:02] Connected
[19:17:02] Connection Error
[19:17:02] Disconnected
[19:17:04] Connecting to ws://127.0.0.1:17815/rmt
[19:17:04] Connected
[19:17:04] Connection Error
[19:17:04] Disconnected
[19:17:06] Connecting to ws://127.0.0.1:17815/rmt

[19:17:02] start profiling
[19:17:02] end profiling
[19:17:02] start profiling
[19:17:02] end profiling
[19:17:02] start profiling
[19:17:02] end profiling
[19:17:02] start profiling

First frame of network traffic (note also the corruption in "MainThread"!):

{"id":"SAMPLES","thread_name":"MainTh�~©{"id":"SAMPLES","thread_name":"MainThread","nb_samples":1,"sample_digest":593689054,"samples":[{"name":"delay","id":558789103,"cp
170
19:17:00
{"id":"SAMPLES","thread_name":"MainThread","nb_samples":2,"sample_digest":2560211960,"samples":[{"name":"delay","id":558789103,"cpu_us_start":2959071,"cpu_us_length":1}]}
170
19:17:00
{"id":"SAMPLES","thread_name":"MainThread","nb_samples":2,"sample_digest":2560211960,"samples":[{"name":"delay","id":558789103,"cpu_us_start":2959069,"cpu_us_length":2}]}
170
19:17:00
{"id":"SAMPLES","thread_name":"MainThread","nb_samples":1,"sample_digest":593689054,"samples":[{"name":"delay","id":558789103,"cpu_us_start":2959064,"cpu_us_length":4}]}
169
19:17:00
{ "id": "LOG", "text": "start profiling"}
41
19:17:00
{ "id": "LOG", "text": "end profiling"}
39
19:17:00
{ "id": "PING" }

And after each PING the connection is closed...

Here is another strange thing:
{"id":"SAMPLES","thread_name":"MainThre�'{ "id": "LOG", "text": "end profiling"}�~
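For comparing against the corrupted bytes above, here is a sketch of how RFC 6455 lays out the header of an unmasked server-to-client text frame: 0x81 (FIN plus text opcode), then a length byte, with 126 (0x7E, ASCII '~') or 127 announcing a 16-bit or 64-bit big-endian extended length. The apostrophe in "MainThre�'" is 0x27 = 39, the length of the { "id": "LOG", "text": "end profiling"} message, and the '~' in "MainTh�~©" is the 126 marker, which appears consistent with one frame's header being written into the middle of another frame's payload. That is an inference, not a confirmed diagnosis. WriteTextFrameHeader is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the header of a final, unmasked WebSocket text frame (RFC 6455)
   into `out`, which must hold at least 10 bytes; returns the header size. */
static size_t WriteTextFrameHeader(uint8_t* out, uint64_t payload_size)
{
    out[0] = 0x81; /* FIN bit set, opcode 0x1 = text */
    if (payload_size < 126)
    {
        out[1] = (uint8_t)payload_size;
        return 2;
    }
    if (payload_size <= 0xFFFF)
    {
        out[1] = 126; /* 16-bit big-endian length follows */
        out[2] = (uint8_t)(payload_size >> 8);
        out[3] = (uint8_t)(payload_size & 0xFF);
        return 4;
    }
    out[1] = 127; /* 64-bit big-endian length follows */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(payload_size >> (56 - 8 * i));
    return 10;
}
```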

I've added the thread name to the sample:
if (RMT_ERROR_NONE != rmt_CreateGlobalInstance(&rmt)) {
    return -1;
}

rmt_SetCurrentThreadName("MainThread");

for (;;) {
    rmt_LogText("start profiling");
    delay();
