Comments (11)
I did notice quite a bit of "ILT" symbols which I'm discarding, but I still have more symbols than before, which is good but surprising.
The ILT symbols are "incremental linking thunks" stored in the "* Linker *" module stream. They are produced when compiling with /INCREMENTAL, and are 5-byte jmp thunks. DIA will also return them during enumeration when using SymTagThunk. Are you missing those in your DIA implementation?
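For illustration, here is a minimal sketch of what "5-byte jmp thunk" means in practice, assuming an x86/x64 target where the thunk is encoded as `jmp rel32` (opcode `0xE9` followed by a 32-bit little-endian displacement). `ResolveJmpThunk` is a hypothetical helper and not part of raw_pdb or DIA; it only shows how the jump target could be derived from the thunk's bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Size of an x86/x64 "jmp rel32": 1 opcode byte (0xE9) + 4 displacement bytes.
constexpr std::uint32_t kJmpRel32Size = 5;

// Hypothetical helper: given the bytes located at 'thunkRva', compute the RVA the
// thunk jumps to. Returns false if the bytes are not a jmp rel32.
bool ResolveJmpThunk(const std::uint8_t* code, std::size_t size,
                     std::uint32_t thunkRva, std::uint32_t& targetRva)
{
    assert(code != nullptr);
    if (size < kJmpRel32Size || code[0] != 0xE9)
        return false;

    // Assemble the little-endian displacement byte by byte (host-endianness independent).
    const std::uint32_t displacement =
        static_cast<std::uint32_t>(code[1]) |
        (static_cast<std::uint32_t>(code[2]) << 8) |
        (static_cast<std::uint32_t>(code[3]) << 16) |
        (static_cast<std::uint32_t>(code[4]) << 24);

    // The displacement is relative to the instruction following the jmp.
    // Unsigned wrap-around handles negative (backward) displacements.
    targetRva = thunkRva + kJmpRel32Size + displacement;
    return true;
}
```

A profiler that discards ILT symbols could use something along these lines to attribute samples inside a thunk to the real function instead.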
It's probably the "public symbols that are not stored in any of the module streams".
Most likely not, because DIA will also return those symbols when enumerating with SymTagPublicSymbol. However, what you might be missing is the fact that with DIA, you have to recurse into returned IDiaSymbol* with ::findChildren. If you don't do that, you will certainly be missing symbols.
Internally, in the module streams, there are symbols which open a scope (e.g. S_LPROC32) and others which close a scope (e.g. S_END). DIA seems to follow this parent-child relationship when enumerating symbols, hence you have to ask for children of returned symbols as well.
In order to find all function symbols in a PDB using DIA, you have to:
- enumerate all SymTagPublicSymbol
- enumerate all SymTagFunction and SymTagBlock (!)
- for each SymTagFunction and SymTagBlock, use findChildren(), recursively, until there are no more symbols returned
Once you do that in DIA, you should be able to get the same number of symbols, but the performance gap between DIA and raw_pdb will become even bigger.
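The recursive part of the steps above can be sketched without the DIA SDK. Since DIA is a Windows COM API, this example models symbols with a plain struct: `Tag`, `Symbol`, and `CollectFunctions` are hypothetical stand-ins, and in real code the children would come from `IDiaSymbol::findChildren` and the returned enumerator. It covers only the function/block recursion, not the separate public symbol enumeration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the symbol tags discussed above (not DIA's SymTagEnum values).
enum class Tag { Function, Block, PublicSymbol, Data };

// Hypothetical stand-in for an IDiaSymbol: a tag, a name, and its child symbols.
struct Symbol
{
    Tag tag;
    std::string name;
    std::vector<Symbol> children;
};

// Collect the names of all functions reachable from 'symbol'.
// Blocks are not collected themselves, but they are still descended into,
// because functions may be nested below them.
void CollectFunctions(const Symbol& symbol, std::vector<std::string>& out, int depth = 0)
{
    // Guard against runaway recursion on malformed input (the limit is arbitrary).
    assert(depth < 256);

    if (symbol.tag == Tag::Function)
        out.push_back(symbol.name);

    if (symbol.tag == Tag::Function || symbol.tag == Tag::Block)
    {
        for (const Symbol& child : symbol.children)
            CollectFunctions(child, out, depth + 1);
    }
}
```

Stopping at the first level (that is, never visiting `children`) is the mistake described above: any function that only appears below another function or block would be missed.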
The raw_pdb implementation is much faster than its DIA counterpart.
It would be great if you could provide some numbers for comparison, once you figure out which symbols you are missing in DIA.
from raw_pdb.
I added an example that demonstrates how to do this.
Let me know if that works for you.
It's for Orbit :-)
Ah yes, I always mix up Orbit and Optick :).
I know Orbit from when you first presented it. It seems to have grown a lot over the last few years!
Since you specifically asked about function symbols, I guess this is for Optick, so I assume you are only interested in function symbols per se.
The fastest way I know of to do that is the following:
- Walk the module symbol streams first and fetch everything that is a function. Keep track of which RVAs you have already found; this is needed later. The size of the function is stored in the codeSize member of the respective S_*PROC record.
- At this point, you already know ~90% of all function symbols and are done. However, with stripped PDBs or certain PDBs from middleware providers, there will be public symbols that are not stored in any of the module streams.
- Walk the public symbol stream, ignoring anything that is not a function. This can be done with a simple bit-check against S_PUB32.flags.
- For each public symbol that is a function, consult the previously stored table of RVAs from step 1. If this is a new symbol, you still need to get its size. This can be done by computing the distance between this and the next function symbol.
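The four steps above could look roughly like the following standalone sketch. It is not the raw_pdb example and does not use raw_pdb types: `Function`, `PublicSymbol`, `MergeFunctions`, and the `endOfCodeRva` parameter (used to size the very last symbol) are hypothetical. The flag value `0x2` is, to my understanding, the function bit of the public symbol flags (`cvpsfFunction` in Microsoft's cvinfo.h); treat it as an assumption and check it against the headers you use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical, simplified records. A size of 0 marks "size not known yet".
struct Function { std::uint32_t rva; std::uint32_t size; };
struct PublicSymbol { std::uint32_t rva; std::uint32_t flags; };

// Assumed function bit of the public symbol flags (cvpsfFunction in cvinfo.h).
constexpr std::uint32_t kPublicFunctionFlag = 0x2u;

std::vector<Function> MergeFunctions(std::vector<Function> moduleFunctions,
                                     const std::vector<PublicSymbol>& publics,
                                     std::uint32_t endOfCodeRva)
{
    // Step 1: remember every RVA already found in the module streams.
    std::unordered_set<std::uint32_t> seen;
    for (const Function& f : moduleFunctions)
        seen.insert(f.rva);

    std::vector<Function> result = std::move(moduleFunctions);

    // Steps 3 and 4: add public symbols that are functions and not yet known.
    for (const PublicSymbol& p : publics)
    {
        if ((p.flags & kPublicFunctionFlag) == 0)
            continue;                       // not a function
        if (!seen.insert(p.rva).second)
            continue;                       // already known from step 1
        result.push_back({p.rva, 0});       // size is filled in below
    }

    std::sort(result.begin(), result.end(),
              [](const Function& a, const Function& b) { return a.rva < b.rva; });

    // Estimate missing sizes as the distance to the next function symbol.
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        if (result[i].size != 0)
            continue;
        const std::uint32_t next = (i + 1 < result.size()) ? result[i + 1].rva : endOfCodeRva;
        assert(next >= result[i].rva);
        result[i].size = next - result[i].rva;
    }
    return result;
}
```

Note that the distance-based size is only an upper bound: padding between functions is counted as part of the preceding symbol.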
Since function symbols & sizes are what most profilers are interested in, I will provide an example for that.
Thanks a lot for the detailed example! I'll try it out today and let you know how it goes.
Since you specifically asked about function symbols, I guess this is for Optick
It's for Orbit :-)
We already have two pdb parser implementations, one using LLVM and the other one using the DIA SDK. We're interested in seeing how raw_pdb compares.
Quick update on this. I integrated the sample code you provided and it seems to work as expected. The raw_pdb implementation is much faster than its DIA counterpart. It also found more symbols than both our DIA and LLVM versions; I need to dig a bit more to understand exactly what the difference is. I did notice quite a bit of "ILT" symbols which I'm discarding, but I still have more symbols than before, which is good but surprising. It's probably the "public symbols that are not stored in any of the module streams". Many thanks @MolecularMatters for your help and for the great project!
Again, thanks @MolecularMatters for the detailed answer. This is great information, I'll double-check our LLVM and DIA implementations.
It would be great if you could provide some numbers for comparison, once you figure out which symbols you are missing in DIA.
Absolutely!!
However, what you might be missing is the fact that with DIA, you have to recurse into returned IDiaSymbol* with ::findChildren. If you don't do that, you will certainly be missing symbols.
Is this only true when iterating over the different compilands/modules and their children (with the filter for SymTagFunction), or do I also need to take care of this when getting all children (that have SymTagFunction) from the global scope right away?
In my experiments, the results were the same.
enumerate all SymTagFunction and SymTagBlock (!)
Also, as far as I understand the documentation, blocks should usually not have a name, and there should be a function surrounding them, right? So for Orbit, it would be fine to ignore those.
Is this only true when iterating over the different compilands/modules and their children (with the filter for SymTagFunction), or do I also need to take care of this when getting all children (that have SymTagFunction) from the global scope right away?
In my experiments, the results were the same.
If I remember correctly, Clang likes to store data symbols as children of the global scope sometimes (e.g. function static variables), which MSVC never does.
Also, as far as I understand the documentation, blocks should usually not have a name, and there should be a function surrounding them, right? So for Orbit, it would be fine to ignore those.
I think that is mostly true, but I encountered blocks that don't seem to belong to any other function, and had to be matched against address ranges from other function symbols. That only seemed to be the case for certain kernel symbol PDBs though.
Maybe @rovarma from Superluminal can comment, since I believe he also ran into this.
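Matching such an orphaned block against the address ranges of known functions could be done with a binary search, as in the following sketch. `FunctionRange` and `FindEnclosingFunction` are hypothetical names, and the sketch assumes the function ranges are sorted by RVA and do not overlap; it does not reflect how any of the tools mentioned here actually implement this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical address range of a known function symbol.
struct FunctionRange { std::uint32_t rva; std::uint32_t size; };

// Returns the index of the function whose range contains 'rva', or -1 if none does.
// 'sorted' must be sorted by rva and contain non-overlapping ranges.
int FindEnclosingFunction(const std::vector<FunctionRange>& sorted, std::uint32_t rva)
{
    assert(std::is_sorted(sorted.begin(), sorted.end(),
        [](const FunctionRange& a, const FunctionRange& b) { return a.rva < b.rva; }));

    // First function that starts strictly after 'rva'.
    auto it = std::upper_bound(sorted.begin(), sorted.end(), rva,
        [](std::uint32_t value, const FunctionRange& f) { return value < f.rva; });

    if (it == sorted.begin())
        return -1;              // 'rva' lies before the first function

    --it;                       // last function starting at or before 'rva'
    if (rva - it->rva < it->size)
        return static_cast<int>(it - sorted.begin());

    return -1;                  // 'rva' falls into a gap between functions
}
```

A block whose start RVA yields -1 here would remain unmatched and would need separate handling.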
If I remember correctly, Clang likes to store data symbols as children of the global scope sometimes (e.g. function static variables), which MSVC never does.
As I was only looking into function symbols and not data symbols, that seems to be fine. Thanks for the explanation!
I think that is mostly true, but I encountered blocks that don't seem to belong to any other function, and had to be matched against address ranges from other function symbols. That only seemed to be the case for certain kernel symbol PDBs though.
Thanks for the clarification. If you remember the PDBs in question, that would be great.