
dnfile's People

Contributors

coloursofnoise, dependabot[bot], doomedraven, malwarefrank, mike-hunhoff, mmaps, vandir, williballenthin


dnfile's Issues

push v0.11.2 to PyPI

Looks like you tagged v0.11.2, but PyPI still hosts v0.11.0. Could you please push v0.11.2 to PyPI? We would love to pull in your recent changes. Thank you!

consistent attribute access

Objects' attributes throughout the project should be accessed consistently to reduce cognitive load on developers and library users.

Right now some objects have attributes that are set to None when there is a parse error, while others are not set at all.

Request to add additional resource type support

Requesting that you add the ability to parse BMP images stored as entries within the .NET resources.

Sample:
https://www.virustotal.com/gui/file/0a5dc3b6669cf31e8536c59fe1315918eb4ecfd87998445e2eeb8fed64bd2f2c

dnfile properly identified the resource names and types, but the data property is None. Attached is the output from the following code:

pe = dnfile.dnPE(filepath)
for r in pe.net.resources:
    if r.name == "20a87df82283.Resources.resources":
        for entry in r.data.entries:
            print(f"{r.name}: {entry.name} - {type(entry.data)}")
            print(entry.__dict__)
            print(entry.struct.__dict__)


I know that the open-source project dnSpy does an excellent job of parsing this resource type from .NET executables so maybe some of that logic can be ported into this project.

https://github.com/dnSpyEx/dnSpy

https://github.com/dnSpyEx/dnSpy/blob/master/Extensions/dnSpy.BamlDecompiler/Baml/KnownTypes.cs

https://github.com/dnSpy/dnSpy/blob/2b6dcfaf602fb8ca6462b8b6237fdfc0c74ad994/dnSpy/dnSpy.Contracts.DnSpy/Documents/TreeView/Resources/SerializedImageListStreamerUtilities.cs#L45-L63

https://github.com/dnSpy/dnSpy/blob/2b6dcfaf602fb8ca6462b8b6237fdfc0c74ad994/dnSpy/dnSpy.Contracts.DnSpy/Documents/TreeView/Resources/SerializedImageListStreamerUtilities.cs#L73-L98

Could possibly use this code to dramatically increase support for other types at the same time.

Process strings, user_strings, GUIDs, etc. at time of load

Submitting a request to have things like strings, user_strings, and GUIDs processed when dnfile first loads an executable. Basically implementing the code provided in the following example into dnfile:

https://github.com/malwarefrank/dnfile/blob/b2a24c5eb46995a739c7bb5f626d6f4052ccb753/examples/dnstrings.py

It would be great if the extracted strings could then be simply referenced by the user via a property like dnfile.net.user_strings, which would return a set of extracted user strings.
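For illustration, a minimal sketch of what such a property could do under the hood: walk a raw #US heap buffer, decoding the ECMA-335 II.24.2.4 length prefix of each entry and stripping the trailing flag byte. This is a standalone sketch, not dnfile's internal code; the function names are invented here.

```python
def read_encoded_length(heap: bytes, offset: int):
    """Decode an ECMA-335 II.24.2.4 length prefix (1, 2, or 4 bytes)."""
    b = heap[offset]
    if b & 0x80 == 0:
        return b, 1
    if b & 0xC0 == 0x80:
        return ((b & 0x3F) << 8) | heap[offset + 1], 2
    return (((b & 0x1F) << 24) | (heap[offset + 1] << 16)
            | (heap[offset + 2] << 8) | heap[offset + 3]), 4


def user_strings(heap: bytes) -> set:
    """Walk a raw #US heap: each entry is a length prefix, then UTF-16LE
    characters, then one trailing flag byte (included in the length)."""
    found, offset = set(), 1  # offset 0 holds the mandatory empty entry
    while offset < len(heap):
        size, n = read_encoded_length(heap, offset)
        offset += n
        if size > 1:
            found.add(heap[offset:offset + size - 1].decode("utf-16-le", errors="replace"))
        offset += size
    return found
```

A cached property on the .NET metadata object could then run this walk once on first access and return the set thereafter.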

IndexError in read_compressed_int (utils.py)

I was wondering if you'd be interested in this error, caused by this file.
I found it using CAPA, with dnfile 0.14.1, but it also triggers on 0.15.0.

>>> import dnfile
>>> pe = dnfile.dnPE("e94f7c475e7db0691a2698b5dd349c2b412ffddafa7a3ff85785cbd5ac144fcb")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../dnfile/__init__.py", line 64, in __init__
    super().__init__(name, data, fast_load)
  File ".../pefile.py", line 2895, in __init__
    self.__parse__(name, data, fast_load)
  File ".../dnfile/__init__.py", line 132, in __parse__
    super().__parse__(fname, data, fast_load)
  File ".../pefile.py", line 3328, in __parse__
    self.full_load()
  File ".../pefile.py", line 3439, in full_load
    self.parse_data_directories()
  File ".../dnfile/__init__.py", line 178, in parse_data_directories
    value = entry[1](dir_entry.VirtualAddress, dir_entry.Size)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../dnfile/__init__.py", line 221, in parse_clr_structure
    return ClrData(self, rva, size, self.clr_lazy_load)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../dnfile/__init__.py", line 526, in __init__
    self._init_resources(pe)
  File ".../dnfile/__init__.py", line 574, in _init_resources
    rsrc.parse()
  File ".../dnfile/resource.py", line 289, in parse
    rs.parse()
  File ".../dnfile/resource.py", line 433, in parse
    rsrc_factory.read_rsrc_data_v1(self._data, e_data_offset, self.resource_types, e)
  File ".../dnfile/resource.py", line 113, in read_rsrc_data_v1
    d, v = self.type_str_to_type(entry.type_name, data, offset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../dnfile/resource.py", line 166, in type_str_to_type
    final_bytes, n = self.read_serialized_data(data, offset)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../dnfile/resource.py", line 72, in read_serialized_data
    x = utils.read_compressed_int(data[offset:offset + 4])
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../dnfile/utils.py", line 46, in read_compressed_int
    value |= data[1]
             ~~~~^^^
IndexError: index out of range

The file doesn't look to be too badly corrupted, but I may be wrong. 🙂
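For reference, a bounds-checked variant of the ECMA-335 II.23.2 compressed-integer read would turn the IndexError into a graceful failure on truncated input. This is a sketch of the fix, not dnfile's actual utils.py code:

```python
def read_compressed_int_safe(data: bytes):
    """Read an ECMA-335 II.23.2 compressed unsigned integer.
    Returns (value, size) or None when the buffer is truncated or the
    prefix is invalid, instead of raising IndexError."""
    if not data:
        return None
    first = data[0]
    if first & 0x80 == 0:                       # 1-byte form: 0bbbbbbb
        return first, 1
    if first & 0xC0 == 0x80:                    # 2-byte form: 10bbbbbb
        if len(data) < 2:
            return None                         # truncated: the crashing case
        return ((first & 0x3F) << 8) | data[1], 2
    if first & 0xE0 == 0xC0:                    # 4-byte form: 110bbbbb
        if len(data) < 4:
            return None
        return (((first & 0x1F) << 24) | (data[1] << 16)
                | (data[2] << 8) | data[3]), 4
    return None                                 # invalid prefix
```

Callers can treat None as "malformed resource entry" and skip the entry rather than aborting the whole parse.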

MethodList is empty if a type has exactly one method

We found an issue where, if a type has exactly one method (e.g., a cctor), the logic that fills the list exits prematurely due to a bug.

This line is responsible:

if (run_start_index != run_end_index) or (run_end_index == max_row):

For example, if the MethodList index is 122 and the next type's MethodList index is 123, the logic in the lines above the quoted line computes the end index as 122 because it subtracts 1.
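The off-by-one disappears if the run is computed as a half-open interval instead of an inclusive end. A small sketch with illustrative names (these are not dnfile's internal identifiers):

```python
def method_run(method_list_indices, type_index, max_row):
    """Return the MethodDef row indices owned by one type.
    method_list_indices[i] is the MethodList value of type row i.
    Using an exclusive end means a run of [122, 123) correctly yields
    the single method at row 122, rather than an empty list."""
    start = method_list_indices[type_index]
    if type_index + 1 < len(method_list_indices):
        end = method_list_indices[type_index + 1]  # next type's start, exclusive
    else:
        end = max_row + 1                          # last type owns the remainder
    return list(range(start, end))
```

With an inclusive end computed by subtracting 1, the single-method case degenerates to start == end and the equality check quoted above skips the run.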

Problems with strings extractions

Hello, I just hit this bug when I was trying to list the strings:

pip3 install git+https://github.com/malwarefrank/dnfile -U
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621
Collecting git+https://github.com/malwarefrank/dnfile
  Cloning https://github.com/malwarefrank/dnfile to /private/var/folders/yt/mbh1wxlj6fq0qqxbjnjfqr200000gn/T/pip-req-build-efg5499r
  Running command git clone --filter=blob:none --quiet https://github.com/malwarefrank/dnfile /private/var/folders/yt/mbh1wxlj6fq0qqxbjnjfqr200000gn/T/pip-req-build-efg5499r
  Resolved https://github.com/malwarefrank/dnfile to commit 92847841e6496453598947a74eb78fa7299ad579
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pefile>=2019.4.18 in /usr/local/lib/python3.9/site-packages (from dnfile==0.10.0) (2021.9.3)
Requirement already satisfied: future in /usr/local/lib/python3.9/site-packages (from pefile>=2019.4.18->dnfile==0.10.0) (0.18.2)
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621
➜ python3 dnstring.py b9efc289ffd8951a65f66ddf2649c1959ad1b94f1177002b20f05e8ae86853ae
reference to missing table: File
reference to missing table: File
Traceback (most recent call last):
  File "<censured>/dnstring.py", line 42, in <module>
    show_strings(fname)
  File "<censured>/dnstring.py", line 33, in show_strings
    s = dnfile.stream.UserString(buf)
  File "/usr/local/lib/python3.9/site-packages/dnfile/stream.py", line 116, in __init__
    self.value: str = data.decode(encoding)
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x00 in position 28: truncated data

I'm not sure how a string can be 29 bytes in wide (UTF-16) mode.

'utf-16-le' codec can't decode byte 0x00 in position 28: truncated data b'S\x00t\x00u\x00b\x00.\x00R\x00e\x00s\x00o\x00u\x00r\x00c\x00e\x00s\x00\x00'

Any ideas?

thank you
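A likely explanation for the odd length: per ECMA-335 II.24.2.4, each entry in the #US heap is UTF-16LE character data followed by one extra trailing byte, so raw blob lengths are odd by design. Decoding everything except the last byte recovers the string from the error message above:

```python
# the 29-byte blob from the UnicodeDecodeError message above
buf = b'S\x00t\x00u\x00b\x00.\x00R\x00e\x00s\x00o\x00u\x00r\x00c\x00e\x00s\x00\x00'

# ECMA-335 II.24.2.4: a #US entry is UTF-16LE data plus one trailing
# flag byte, hence 29 = 14 characters * 2 bytes + 1.
s = buf[:-1].decode("utf-16-le")
print(s)  # Stub.Resources
```

So the fix on the decoding side is to drop the final byte before calling decode("utf-16-le").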

question: tests and test files?

I'm interested in adding tests (using pytest, unless you have other preferences) that demonstrate functionality of dnfile.

  1. is pytest ok with you? And ok if I place tests under tests/test_*.py?
  2. where would you like test data, like .NET modules used in the tests?

For reference, in capa, we have a separate repository, capa-testfiles, that we use to hold all the files used during testing, which we reference as a submodule under tests/data/. This makes it possible to checkout in CI via --recurse-submodules but also easy to checkout the source code without pulling down MBs of test data. Of course, this introduces a bit more configuration and maintenance of two repos vs. one.

What would you like to do for dnfile?

AssertionError: assert hasattr(self, "_row_class") (EncLog)

encountered an unexpected exception when parsing the file 0033ca037e0496c5c33e3dc19714fb3e:

❯ python foo.py tests/data/0033ca037e0496c5c33e3dc19714fb3e
Traceback (most recent call last):
...
  File "scripts/print_cil.py", line 29, in main
    pe = dnfile.dnPE(args.path)
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 61, in __init__
    super().__init__(name, data, fast_load)
  File "/home/user/env/lib/python3.8/site-packages/pefile.py", line 2743, in __init__
    self.__parse__(name, data, fast_load)
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 129, in __parse__
    super().__parse__(fname, data, fast_load)
  File "/home/user/env/lib/python3.8/site-packages/pefile.py", line 3148, in __parse__
    self.full_load()
  File "/home/user/env/lib/python3.8/site-packages/pefile.py", line 3259, in full_load
    self.parse_data_directories()
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 174, in parse_data_directories
    value = entry[1](dir_entry.VirtualAddress, dir_entry.Size)
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 219, in parse_clr_structure
    return ClrData(self, rva, size)
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 481, in __init__
    self.metadata = ClrMetaData(pe, metadata_rva, metadata_size)
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 346, in __init__
    s.parse(self.streams_list)
  File "/home/user/code/dnfile/src/dnfile/stream.py", line 375, in parse
    table = mdtable.ClrMetaDataTableFactory.createTable(
  File "/home/user/code/dnfile/src/dnfile/mdtable.py", line 2031, in createTable
    table = cls._table_number_map[number](
  File "/home/user/code/dnfile/src/dnfile/base.py", line 542, in __init__
    assert hasattr(self, "_row_class")
AssertionError

Only load requested metadata tables

I only need to load the AssemblyRef mdtable, but currently there is no way to restrict the tables that get loaded.

Restricting dnfile to loading only the AssemblyRef and ManifestResource tables (the latter is required because it is used to parse the resources) results in a considerable speedup:

# before
debug: Parsed data directories in 7.221096945999989 seconds
mons show main --debug  7.39s user 0.27s system 97% cpu 7.891 total
# after
debug: Parsed data directories in 0.010396081999942908 seconds
mons show main --debug  0.29s user 0.03s system 99% cpu 0.326 total

My thoughts on how this could be implemented are either:

  1. Allow the user to provide a list of mdtables that should be loaded to dnPE.__init__
  2. Implement lazy-loading (as much as is possible) for mdtables

I've already set up the former to test this, but I will look into lazy-loading before opening a PR.
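Option 2 can be prototyped generically as a parse-on-first-access wrapper; this is a standalone sketch of the pattern, not dnfile's implementation, and `parse_fn` stands in for the real row-parsing work:

```python
class LazyTable:
    """Defer parsing a metadata table until it is first touched, so
    tables the caller never reads cost nothing at load time."""

    def __init__(self, name, parse_fn):
        self.name = name
        self._parse_fn = parse_fn
        self._rows = None

    @property
    def rows(self):
        if self._rows is None:        # parse exactly once, on first access
            self._rows = self._parse_fn()
        return self._rows
```

A dnPE-level flag could then register every table as a LazyTable instead of parsing all of them eagerly in parse_data_directories.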

AssertionError: assert hasattr(self, "_row_class") (FieldPtr)

when parsing a private sample, we encounter an exception like:

❯ python -m pdb -- ./examples/dndump.py <redacted>
Traceback (most recent call last):
  File "/usr/lib/python3.8/pdb.py", line 1705, in main
    pdb._runscript(mainpyfile)
  File "/usr/lib/python3.8/pdb.py", line 1573, in _runscript
    self.run(statement)
  File "/usr/lib/python3.8/bdb.py", line 580, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/home/user/code/dnfile/examples/dndump.py", line 2, in <module>
    '''
  File "/home/user/code/dnfile/examples/dndump.py", line 320, in main
    dn = dnfile.dnPE(args.input)
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 61, in __init__
    super().__init__(name, data, fast_load)
  File "/home/user/env/lib/python3.8/site-packages/pefile.py", line 2743, in __init__
    self.__parse__(name, data, fast_load)
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 129, in __parse__
    super().__parse__(fname, data, fast_load)
  File "/home/user/env/lib/python3.8/site-packages/pefile.py", line 3148, in __parse__
    self.full_load()
  File "/home/user/env/lib/python3.8/site-packages/pefile.py", line 3259, in full_load
    self.parse_data_directories()
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 174, in parse_data_directories
    value = entry[1](dir_entry.VirtualAddress, dir_entry.Size)
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 219, in parse_clr_structure
    return ClrData(self, rva, size)
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 481, in __init__
    self.metadata = ClrMetaData(pe, metadata_rva, metadata_size)
  File "/home/user/code/dnfile/src/dnfile/__init__.py", line 346, in __init__
    s.parse(self.streams_list)
  File "/home/user/code/dnfile/src/dnfile/stream.py", line 336, in parse
    table = mdtable.ClrMetaDataTableFactory.createTable(
  File "/home/user/code/dnfile/src/dnfile/mdtable.py", line 2091, in createTable
    table = cls._table_number_map[number](
  File "/home/user/code/dnfile/src/dnfile/base.py", line 542, in __init__
    assert hasattr(self, "_row_class")
AssertionError
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /home/user/code/dnfile/src/dnfile/base.py(542)__init__()
-> assert hasattr(self, "_row_class")
(Pdb) self
<dnfile.mdtable.FieldPtr object at 0x7fba4852a460>
(Pdb) self.name
'FieldPtr'
(Pdb) self.number
3
(Pdb)

guids object iterable, indexable

The #GUID stream is easily iterable since all items are exactly 16 bytes long and can only be referenced by index. So it should be easy to make the .net.guids stream object iterable, indexable, and len-able.

use ECMA standard names for MethodSpec (0x2B) meta table and columns

From ECMA 335 I.22.29 MethodSpec: 0x2B:

The MethodSpec table has the following columns:

  • Method (an index into the MethodDef or MemberRef table, specifying to which
    generic method this row refers; that is, which generic method this row is an
    instantiation of; more precisely, a MethodDefOrRef (§II.24.2.6) coded index)
  • Instantiation (an index into the Blob heap (§II.23.2.15), holding the signature of
    this instantiation)

The MethodSpec table records the signature of an instantiated generic method.
Each unique instantiation of a generic method (i.e., a combination of Method and Instantiation) shall be
represented by a single row in the table

Using ECMA's standard naming would help make it easier to read code that leverages dnfile to parse MethodSpec:

dnfile/src/dnfile/mdtable.py

Lines 1988 to 2025 in 498f6c6

class GenericMethodRowStruct(RowStruct):
    Unknown1_CodedIndex: int
    Unknown2_BlobIndex: int


class GenericMethodRow(MDTableRow):
    Unknown1: codedindex.MethodDefOrRef
    Unknown2: bytes

    _struct_class = GenericMethodRowStruct
    _struct_codedindexes = {
        "Unknown1_CodedIndex": ("Unknown1", codedindex.MethodDefOrRef),
    }
    _struct_blobs = {
        "Unknown2_BlobIndex": "Unknown2",
    }

    def _compute_format(self):
        unknown1_size = self._clr_coded_index_struct_size(
            codedindex.MethodDefOrRef.tag_bits,
            codedindex.MethodDefOrRef.table_names,
        )
        blob_ind_size = checked_offset_format(self._blob_offsz)
        return (
            "CLR_METADATA_TABLE_GENERICMETHOD",
            (
                unknown1_size + ",Unknown1_CodedIndex",
                blob_ind_size + ",Unknown2_BlobIndex",
            ),
        )


class GenericMethod(ClrMetaDataTable[GenericMethodRow]):
    name = "GenericMethod"
    number = 43
    _row_class = GenericMethodRow

File offset mismatch (get_file_offset())

Issue: The offset returned by get_file_offset() is wrong by 0x1E00.

Details:
I try to get the file offset for all the structs printed by dndump.py with struct.get_file_offset():

At:
dndump.py#L191C23-L192C1

Add:

                    ostream.writeln("[%d]:" % (i + 1))
                    ostream.writeln("File offset: " + str(row.struct.get_file_offset()))

Which gives with dotnet-test.dll for example:

 MethodDef:
    [1]:
    File offset: 8748
      Rva:        0x2048
      Name:       .ctor
      Signature:  200001
      ParamList: (empty)
      ImplFlags:
        miIL
        miManaged
      Flags:
        mdHideBySig
        mdPublic
        mdRTSpecialName
        mdReuseSlot
        mdSpecialName

The file offset is 8748, but the file is only 2023 bytes big.

The effective offset is 8748 - 7680:

$ hexdump -vC -s $((8748 - 7680)) -n 16  dotnet-test.dll
0000042c  48 20 00 00 00 00 86 18  20 02 06 00 01 00 50 20  |H ...... .....P |

The RVA=0x2048 is at the beginning of the MethodDef, as little endian: 48 20.

Note that 7680 is 0x1E00.

Any idea where this 0x1E00 offset is coming from? Is it stable, so that I can just subtract it? Is this a bug?
Probably the RVA-to-offset calculation is not done completely correctly. There also seems to be no header or section at offset 0x1E00 in .NET PE files.
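For reference, RVA-to-file-offset translation works by finding the containing section and rebasing against its raw data pointer; pefile exposes this as PE.get_offset_from_rva(). A generic sketch (the section tuple layout here is invented for the example):

```python
def rva_to_offset(rva, sections):
    """Translate an RVA to a raw file offset.
    sections: list of (VirtualAddress, SizeOfRawData, PointerToRawData)."""
    for va, raw_size, raw_ptr in sections:
        if va <= rva < va + raw_size:
            return rva - va + raw_ptr      # rebase into the raw file
    raise ValueError(f"rva {rva:#x} not in any section")
```

Assuming a typical small .NET binary with a .text section at VirtualAddress 0x2000 and PointerToRawData 0x200, that delta is exactly 0x2000 - 0x200 = 0x1E00, which matches the mismatch observed above. This suggests get_file_offset() here returned an offset relative to the memory-mapped image rather than the raw file, though that would need confirming against the sample.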

parse method (and field, and ...) signatures

Method (and field, and ...) signatures are represented by data in a custom binary format that is stored in the #Blob stream. The best references I've found for parsing this data are:

  • ECMA-335 6th Edition, II.23.1 and II.23.2, "Blobs and signatures"
  • dnlib SignatureReader.cs

parse .NET resources

The ManifestResource metadata table may contain rows for .NET resources, external and internal. These are different from PE resources and have their own format as far as I can tell.

dump_info exception

dump_info() is causing an exception. The ClrMetaDataTable class has lost its rva member, but that member is still being referenced in dump_info().

parse method header and sections

Parse the Method data (pointed to by RVA, see mdtable.MethodDefRow), as much as is needed to perform data-agnostic computation over the bytecode (cryptographic and fuzzy hashes, entropy, value distributions, etc).

See ECMA-335 6th Edition, Section II.25.4 Common Intermediate Language physical layout
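The header format from II.25.4 is small enough to sketch directly: the low two bits of the first byte select a tiny (1-byte) or fat (12-byte) header, after which the raw CIL bytes follow. This is an illustrative sketch, not dnfile's implementation; it ignores the extra data sections that can follow a fat-format body.

```python
import struct

CORILMETHOD_TINYFORMAT = 0x2
CORILMETHOD_FATFORMAT = 0x3


def read_method_body(data: bytes, offset: int) -> bytes:
    """Extract the raw CIL bytecode of one method per ECMA-335 II.25.4."""
    flags = data[offset]
    if flags & 0x3 == CORILMETHOD_TINYFORMAT:
        code_size = flags >> 2            # tiny: upper 6 bits hold the code size
        return data[offset + 1 : offset + 1 + code_size]
    if flags & 0x3 == CORILMETHOD_FATFORMAT:
        # fat: 12-byte header; CodeSize is the little-endian DWORD at offset 4
        code_size = struct.unpack_from("<I", data, offset + 4)[0]
        return data[offset + 12 : offset + 12 + code_size]
    raise ValueError("not a valid method header")
```

Once the raw bytes are isolated, data-agnostic measures (hashes, entropy, byte histograms) can be computed without decoding individual instructions.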
