Giter VIP home page Giter VIP logo

Comments (11)

MolecularMatters avatar MolecularMatters commented on August 10, 2024 2

Fixed in 66dccb0.

from raw_pdb.

MolecularMatters avatar MolecularMatters commented on August 10, 2024 1

I checked the PDB files I have to verify this, and they indeed store an array of indices in the SuperBlock, rather than a single index.
Thank you @JustasMasiulis for pointing that out, I will fix this accordingly.

from raw_pdb.

MolecularMatters avatar MolecularMatters commented on August 10, 2024

Thanks for bringing this to my attention!
I haven't seen any PDBs like this yet, it would be appreciated if you had a PDB to share.

In this case, I think the block indices have to be contiguous in memory:
The directory tells us which streams are made up of which blocks, which serves as an indirection and allows PDBs to store their data block-interleaved. In this case, we are reading the block directory, so there is no other indirection available that could tell us where blocks go if they were to be found at another offset in the file.

I'll fix this as soon as I get a PDB file to verify.

from raw_pdb.

pierricgimmig avatar pierricgimmig commented on August 10, 2024

Ok, thanks for the input! So I scanned over 9K pdb files, and it seems I only have 2 that exhibit the problem, and I can't share them. I'll keep searching. At least this is documented and might help anyone running into the same problem.

from raw_pdb.

MolecularMatters avatar MolecularMatters commented on August 10, 2024

Thanks to @rovarma who helped me find PDBs that exhibit this behaviour.
This issue has been fixed in 83516ec.

from raw_pdb.

JustasMasiulis avatar JustasMasiulis commented on August 10, 2024

Thanks to @rovarma who helped me find PDBs that exhibit this behaviour. This issue has been fixed in 83516ec.

IMO that's hardly a fix. If you don't want to implement the case where these blocks are non-contiguous at least do verification of that.

from raw_pdb.

MolecularMatters avatar MolecularMatters commented on August 10, 2024

IMO that's hardly a fix. If you don't want to implement the case where these blocks are non-contiguous at least do verification of that.

That case is implemented, because non-contiguous blocks are all over the place in PDB files.
In this particular case, it is being taken care of by the CoalescedMSFStream in the RawFile, see https://github.com/MolecularMatters/raw_pdb/blob/main/src/PDB_RawFile.cpp#L44.

The bug that needed fixing was the wrong assumption that the indices of all directory blocks will fit into one single block, which is not true for some PDBs.

from raw_pdb.

JustasMasiulis avatar JustasMasiulis commented on August 10, 2024

IMO that's hardly a fix. If you don't want to implement the case where these blocks are non-contiguous at least do verification of that.

That case is implemented, because non-contiguous blocks are all over the place in PDB files. In this particular case, it is being taken care of by the CoalescedMSFStream in the RawFile, see https://github.com/MolecularMatters/raw_pdb/blob/main/src/PDB_RawFile.cpp#L44.

The bug that needed fixing was the wrong assumption that the indices of all directory blocks will fit into one single block, which is not true for some PDBs.

I'm talking specifically about referenced commit and the block map. Unless I'm missing something it seems to me like you are assuming that the stream directory block map is now a contiguous array of indices that can span across multiple blocks.

uint32_t directoryIndicesBlockIndex; // index of the first block that contains an array of indices of directory blocks, where the array itself can span several blocks

However to me it would seem that this definition is not correct and should be as so:

uint32_t directoryIndicesBlockIndices[]; // indices of the blocks that contain array of indices of directory blocks, ...

Meaning it won't necessarily be contiguous.

from raw_pdb.

MolecularMatters avatar MolecularMatters commented on August 10, 2024

Ah, I see what you mean.
I am under the assumption that the MSF SuperBlock is a fixed-size thing, so the block index to the first block that stores the directory indices really is just a single uint32_t index, and not an array of uint32_t indices.

LLVM agrees with this definition: https://llvm.org/docs/PDB/MsfFile.html#msf-superblock
More specifically, it states "The number of ulittle32_t’s in this array is given by ceil(NumDirectoryBytes / BlockSize)." which is also what the current code assumes.

Of course, the LLVM format docs could be wrong, and I'd be happy to implement this in case the SuperBlock contains an array of indices.

from raw_pdb.

JustasMasiulis avatar JustasMasiulis commented on August 10, 2024

LLVM agrees with this definition: https://llvm.org/docs/PDB/MsfFile.html#msf-superblock More specifically, it states "The number of ulittle32_t’s in this array is given by ceil(NumDirectoryBytes / BlockSize)." which is also what the current code assumes.

LLVM has a lot of assumptions that may not hold up for PDBs not generated by LLVM itself.

I'd suggest either reading microsoft's horrendous code base and doing some reversing of DIA or using https://github.com/willglynn/pdb like a middle point of correctness between DIA and LLVM, as while it's probably the most accurate open source implementation it still falls short on what the actual implementation is supposed to be like (IE the address translation).

from raw_pdb.

pierricgimmig avatar pierricgimmig commented on August 10, 2024

Very nice! Many thanks, @MolecularMatters !

from raw_pdb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.