Comments (11)

MolecularMatters commented on August 10, 2024

> I did notice quite a bit of "ILT" symbols which I'm discarding, but I still have more symbols than before, which is good but surprising.

The ILT symbols are "incremental linking thunks" stored in the "* Linker *" module stream. They are produced when compiling with /INCREMENTAL, and are 5-byte jmp thunks. DIA will also return them during enumeration when using SymTagThunk. Are you missing those in your DIA implementation?
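
As an illustration of what such a thunk looks like in the image, the sketch below resolves a 5-byte thunk to the address it forwards to. It assumes the thunk is an x86/x64 near jump (opcode `0xE9` followed by a 32-bit little-endian displacement); `ResolveJmpThunk` and its parameters are hypothetical names and not part of raw_pdb or DIA.

```cpp
#include <cstddef>
#include <cstdint>

// Resolves the target RVA of a 5-byte `jmp rel32` thunk (assumed encoding).
// `code` points at the thunk bytes, `thunkRva` is the RVA of its first byte.
// Returns false if the bytes do not look like a near jump.
bool ResolveJmpThunk(const uint8_t* code, size_t size, uint32_t thunkRva, uint32_t& targetRva)
{
    constexpr size_t kThunkSize = 5;
    constexpr uint8_t kJmpRel32 = 0xE9;
    if (size < kThunkSize || code[0] != kJmpRel32)
        return false;

    // Assemble the little-endian displacement byte by byte,
    // so no alignment or host-endianness assumptions are needed.
    const uint32_t rel = static_cast<uint32_t>(code[1])
                       | (static_cast<uint32_t>(code[2]) << 8)
                       | (static_cast<uint32_t>(code[3]) << 16)
                       | (static_cast<uint32_t>(code[4]) << 24);

    // The displacement is relative to the end of the instruction.
    // Unsigned wrap-around takes care of negative (backward) displacements.
    targetRva = thunkRva + static_cast<uint32_t>(kThunkSize) + rel;
    return true;
}
```

A profiler that discards ILT symbols could use something like this to attribute samples that land in a thunk to the real function instead.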

> It's probably the "public symbols that are not stored in any of the module streams".

Most likely not, because DIA will also return those symbols when enumerating with SymTagPublicSymbol. However, what you might be missing is the fact that with DIA, you have to recurse into returned IDiaSymbol* with ::findChildren. If you don't do that, you will certainly be missing symbols.

Internally, in the module streams, there are symbols which open a scope (e.g. S_LPROC32) and others which close a scope (e.g. S_END). DIA seems to follow this parent-child relationship when enumerating symbols, hence you have to ask for children of returned symbols as well.
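
The open/close pairing can be pictured as a depth counter over the flat record stream. The following is a simplified model, not raw_pdb or DIA code: `ScopedRecord`, `OpensScope` and `AssignDepths` are hypothetical, only a subset of the scope-opening record kinds is handled, and the numeric kind values are the ones defined in Microsoft's `cvinfo.h`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Record kinds as defined in Microsoft's cvinfo.h (subset).
constexpr uint16_t S_END     = 0x0006; // closes the innermost open scope
constexpr uint16_t S_BLOCK32 = 0x1103; // opens a block scope
constexpr uint16_t S_LDATA32 = 0x110c; // plain record, opens no scope
constexpr uint16_t S_LPROC32 = 0x110f; // opens a local function scope
constexpr uint16_t S_GPROC32 = 0x1110; // opens a global function scope

// Hypothetical result type: a record kind plus its nesting depth.
struct ScopedRecord { uint16_t kind; size_t depth; };

bool OpensScope(uint16_t kind)
{
    return kind == S_BLOCK32 || kind == S_LPROC32 || kind == S_GPROC32;
}

// Annotates every record with its nesting depth. S_END records only close
// a scope and are not emitted themselves.
std::vector<ScopedRecord> AssignDepths(const std::vector<uint16_t>& kinds)
{
    std::vector<ScopedRecord> out;
    size_t depth = 0;
    for (uint16_t kind : kinds) {
        if (kind == S_END) {
            if (depth > 0)
                --depth;
            continue;
        }
        out.push_back({kind, depth});
        if (OpensScope(kind))
            ++depth;
    }
    return out;
}
```

In this model, an enumeration that only looks at depth 0 sees the function records but none of the records nested inside them, which mirrors what happens when children are never requested.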

In order to find all function symbols in a PDB using DIA, you have to:

  • enumerate all SymTagPublicSymbol
  • enumerate all SymTagFunction and SymTagBlock (!)
  • for each SymTagFunction and SymTagBlock, use findChildren(), recursively, until there are no more symbols returned
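
The recursion in the last step can be sketched independently of COM. `Node` below is a hypothetical stand-in for an `IDiaSymbol`, and `node.children` stands in for whatever `findChildren` would return; with the real SDK, each level needs its own `findChildren` call and a loop over the returned enumerator.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified symbol tags (DIA has many more).
enum class Tag { Function, Block, Data, PublicSymbol };

// Hypothetical stand-in for an IDiaSymbol and its children.
struct Node {
    Tag tag;
    std::string name;
    std::vector<Node> children;
};

// Records `node` if it is a function, then descends into functions and
// blocks until no more children are returned.
void CollectFunctions(const Node& node, std::vector<std::string>& out)
{
    if (node.tag == Tag::Function)
        out.push_back(node.name);

    if (node.tag == Tag::Function || node.tag == Tag::Block) {
        for (const Node& child : node.children)
            CollectFunctions(child, out);
    }
}

// Entry point: walks every top-level symbol of an enumeration.
std::vector<std::string> CollectAll(const std::vector<Node>& topLevel)
{
    std::vector<std::string> out;
    for (const Node& node : topLevel)
        CollectFunctions(node, out);
    return out;
}
```

A non-recursive version that only inspects `topLevel` would miss every function that sits below a block or another function.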

Once you do that in DIA, you should be able to get the same number of symbols, but the performance gap between DIA and raw_pdb will become even bigger.

> The raw_pdb implementation is much faster than its DIA counterpart.

It would be great if you could provide some numbers for comparison, once you figure out which symbols you are missing in DIA.

from raw_pdb.

MolecularMatters commented on August 10, 2024

I added an example that demonstrates how to do this.
Let me know if that works for you.

MolecularMatters commented on August 10, 2024

> It's for Orbit :-)

Ah yes, I always mix up Orbit and Optick :).
I know Orbit from when you first presented it; it seems to have grown a lot over the last few years!

MolecularMatters commented on August 10, 2024

Since you specifically asked about function symbols, I guess this is for Optick, so I assume you are only interested in function symbols per se.

The fastest way I know to do that is the following:

  1. Walk the module symbol streams first and fetch everything that is a function. Keep track of which RVAs you have already found; this is needed later. The size of each function is stored in the codeSize member of the corresponding S_*PROC record.
  2. At this point, you already know ~90% of all function symbols and are done. However, with stripped PDBs or certain PDBs from middleware providers, there will be public symbols that are not stored in any of the module streams.
  3. Walk the public symbol stream, ignoring anything that is not a function. This can be done with a simple bit-check against S_PUB32.flags.
  4. For each public symbol that is a function, consult the previously stored table of RVAs from step 1. If this is a new symbol, you still need to get its size. This can be done by computing the distance between this and the next function symbol.
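
The four steps above can be sketched as follows. This is a minimal model under stated assumptions, not the raw_pdb example itself: `ModuleProc`, `PublicSymbol`, `FunctionSymbol` and `GatherFunctions` are hypothetical, the inputs are assumed to have been read from the module and public streams already with RVAs resolved, and the function bit is taken to be bit 1 of the public symbol flags (`fFunction` in `cvinfo.h`).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Assumed position of the "function" bit in the S_PUB32 flags.
constexpr uint32_t kPublicFunctionFlag = 0x2;

// Hypothetical, already-decoded inputs and output.
struct ModuleProc     { std::string name; uint32_t rva; uint32_t codeSize; }; // from S_*PROC records
struct PublicSymbol   { std::string name; uint32_t rva; uint32_t flags; };    // from S_PUB32 records
struct FunctionSymbol { std::string name; uint32_t rva; uint32_t size; };

std::vector<FunctionSymbol> GatherFunctions(const std::vector<ModuleProc>& procs,
                                            const std::vector<PublicSymbol>& publics)
{
    std::vector<FunctionSymbol> functions;
    std::unordered_set<uint32_t> seenRvas;

    // Step 1: functions from the module streams, remembering their RVAs.
    for (const ModuleProc& p : procs) {
        if (seenRvas.insert(p.rva).second)
            functions.push_back({p.name, p.rva, p.codeSize});
    }

    // Steps 3 and 4: public symbols that are functions and not yet known.
    // Their size is unknown at this point and stored as 0.
    for (const PublicSymbol& s : publics) {
        if ((s.flags & kPublicFunctionFlag) == 0)
            continue;
        if (seenRvas.insert(s.rva).second)
            functions.push_back({s.name, s.rva, 0});
    }

    // Step 4, continued: sort by RVA and derive each missing size from the
    // distance to the next function symbol.
    std::sort(functions.begin(), functions.end(),
              [](const FunctionSymbol& a, const FunctionSymbol& b) { return a.rva < b.rva; });
    for (size_t i = 0; i + 1 < functions.size(); ++i) {
        if (functions[i].size == 0)
            functions[i].size = functions[i + 1].rva - functions[i].rva;
    }
    return functions;
}
```

Sizes derived this way are upper bounds, since they include any padding between functions, and the last function in the list keeps a size of 0 unless it came from a module stream.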

Since function symbols & sizes are what most profilers are interested in, I will provide an example for that.

pierricgimmig commented on August 10, 2024

Thanks a lot for the detailed example! I'll try it out today and let you know how it goes.

> Since you specifically asked about function symbols, I guess this is for Optick

It's for Orbit :-)

We already have two PDB parser implementations, one using LLVM and the other using the DIA SDK. We're interested in seeing how raw_pdb compares.

pierricgimmig commented on August 10, 2024

Quick update on this. I integrated the sample code you provided and it seems to work as expected. The raw_pdb implementation is much faster than its DIA counterpart. It also found more symbols than both our DIA and LLVM versions; I need to dig a bit more to understand exactly what the difference is. I did notice quite a bit of "ILT" symbols which I'm discarding, but I still have more symbols than before, which is good but surprising. It's probably the "public symbols that are not stored in any of the module streams". Many thanks @MolecularMatters for your help and for the great project!

pierricgimmig commented on August 10, 2024

Again, thanks @MolecularMatters for the detailed answer. This is great information; I'll double-check our LLVM and DIA implementations.

pierricgimmig commented on August 10, 2024

> It would be great if you could provide some numbers for comparison, once you figure out which symbols you are missing in DIA.

Absolutely!!

florian-kuebler commented on August 10, 2024

> However, what you might be missing is the fact that with DIA, you have to recurse into returned IDiaSymbol* with ::findChildren. If you don't do that, you will certainly be missing symbols.

Is this only true when iterating over the different compilands/modules and their children (with the filter for SymTagFunction), or do I also need to take care of this when getting all children (that have SymTagFunction) from the global scope right away?

In my experiments, the results were the same.

> enumerate all SymTagFunction and SymTagBlock (!)

Also, as far as I understand the documentation, blocks should usually not have a name, and there should be a function surrounding them, right? So for Orbit, it would be fine to ignore those.

MolecularMatters commented on August 10, 2024

> Is this only true when iterating over the different compilands/modules and their children (with the filter for SymTagFunction), or do I also need to take care of this when getting all children (that have SymTagFunction) from the global scope right away?

> In my experiments, the results were the same.

If I remember correctly, Clang likes to store data symbols as children of the global scope sometimes (e.g. function static variables), which MSVC never does.

> Also, as far as I understand the documentation, blocks should usually not have a name, and there should be a function surrounding them, right? So for Orbit, it would be fine to ignore those.

I think that is mostly true, but I encountered blocks that don't seem to belong to any other function, and had to be matched against address ranges from other function symbols. That only seemed to be the case for certain kernel symbol PDBs though.
Maybe @rovarma from Superluminal can comment, since I believe he also ran into this.
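
One plausible way to do such matching is a binary search over function address ranges. The comment does not describe how it was actually done, so the sketch below is only an assumption; `FunctionRange` and `FindOwningFunction` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical address range of a function symbol: [rva, rva + size).
struct FunctionRange { uint32_t rva; uint32_t size; };

// Returns the index of the function whose range contains `blockRva`,
// or -1 if there is none. `functions` must be sorted by rva and the
// ranges must not overlap.
std::ptrdiff_t FindOwningFunction(const std::vector<FunctionRange>& functions, uint32_t blockRva)
{
    // Find the first function that starts after the block address...
    auto it = std::upper_bound(functions.begin(), functions.end(), blockRva,
        [](uint32_t rva, const FunctionRange& f) { return rva < f.rva; });
    if (it == functions.begin())
        return -1; // the block lies before the first function

    // ...so the candidate owner is the function just before it.
    --it;
    if (blockRva - it->rva < it->size)
        return it - functions.begin();
    return -1; // the block falls into a gap between functions
}
```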

florian-kuebler commented on August 10, 2024

> If I remember correctly, Clang likes to store data symbols as children of the global scope sometimes (e.g. function static variables), which MSVC never does.

As I was only looking into function symbols and not data symbols, that seems to be fine. Thanks for the explanation!

> I think that is mostly true, but I encountered blocks that don't seem to belong to any other function, and had to be matched against address ranges from other function symbols. That only seemed to be the case for certain kernel symbol PDBs though.

Thanks for the clarification. If you remember the PDBs in question, that would be great.
