
speakeasy's Introduction

Speakeasy

Speakeasy is a portable, modular, binary emulator designed to emulate Windows kernel and user mode malware.

Check out the overview in the first Speakeasy blog post.

Instead of attempting to perform dynamic analysis using an entire virtualized operating system, Speakeasy emulates specific components of Windows. By emulating operating system APIs, objects, running processes/threads, filesystems, and networks, it should be possible to present an environment where samples can fully "execute". Samples can be easily emulated in a container or in cloud services, which makes it practical to analyze many samples simultaneously. Currently, Speakeasy supports both user mode and kernel mode Windows applications.

Before emulating, entry points are identified within the binary. For example, exported functions are all identified and emulated sequentially. Additionally, dynamic entry points (e.g. new threads, registered callbacks, IRP handlers) that are discovered at runtime are also emulated. The goal here is to have as much code coverage as possible during emulation. Events are logged on a per-entry-point basis so that functionality can be attributed to specific functions or exports.

Speakeasy is currently written entirely in Python 3 and relies on the Unicorn emulation engine in order to emulate CPU instructions. The CPU emulation engine can be swapped out and there are plans to support other engines in the future.

APIs are emulated in Python code that handles their expected inputs and outputs in order to keep malware on its "happy path". These APIs and their structures should be consistent with the API documentation provided by Microsoft.


Installation

Speakeasy can be executed in a docker container, as a stand-alone script, or in cloud services. The easiest method of installation is by first installing the required package dependencies, and then running the included setup.py script (replace "python3" with your current Python3 interpreter):

cd <repo_base_dir>
python3 -m pip install -r requirements.txt
python3 setup.py install

A Dockerfile is also included for building a docker image; alternatively, Speakeasy's dependencies can be installed on the local system and Speakeasy can be run from Python directly.


Running within a docker container

The included Dockerfile can be used to generate a docker image.


Building the docker image

  1. Build the Docker image; the following commands will create an image tagged "my_tag":
cd <repo_base_dir>
docker build -t "my_tag" .
  2. Run the Docker image and mount a local directory at /sandbox in the container:
docker run -v <path_containing_malware>:/sandbox -it "my_tag"

Usage


As a library

Speakeasy can be imported and used as a general purpose Windows emulation library. The main public interface named Speakeasy should be used when interacting with the framework. The lower level emulator objects can also be used, however their interfaces may change in the future and may lack documentation.

Below is a quick example of how to emulate a Windows DLL:

    import speakeasy

    # Get a speakeasy object
    se = speakeasy.Speakeasy()

    # Load a DLL into the emulation space
    module = se.load_module("myfile.dll")

    # Emulate the DLL's entry point (i.e. DllMain)
    se.run_module(module)

    # Set up some args for the export
    arg0 = 0x0
    arg1 = 0x1
    # Walk the DLLs exports
    for exp in module.get_exports():
        if exp.name == 'myexport':
            # Call an export named 'myexport' and emulate it
            se.call(exp.address, [arg0, arg1])

    # Get the emulation report
    report = se.get_report()

    # Do something with the report; parse it or save it off for post-processing

For more examples, see the examples directory.
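
As a rough illustration of post-processing a report, the sketch below prints one summary line per entry point. It assumes the report is a plain dict and uses only field names that appear in the report excerpt quoted in the Issues section of this page (`entry_points`, `ep_type`, `start_addr`, `apis`, `error`); `summarize_entry_points` and `sample_report` are hypothetical names, and the sample dict is hand-written rather than real emulator output.

```python
def summarize_entry_points(report):
    """Return one summary line per entry point in a Speakeasy-style report dict."""
    lines = []
    for ep in report.get("entry_points", []):
        # Entry points that finished cleanly are assumed to carry no "error" key
        error = ep.get("error") or {}
        status = error.get("type", "ok")
        lines.append(
            "{} @ {}: {} API call(s), status={}".format(
                ep.get("ep_type", "unknown"),
                ep.get("start_addr", "?"),
                len(ep.get("apis", [])),
                status,
            )
        )
    return lines


# Hand-written stand-in for the dict returned by se.get_report()
sample_report = {
    "entry_points": [
        {
            "ep_type": "shellcode",
            "start_addr": "0x1000",
            "apis": [],
            "error": {"type": "unhandled_interrupt", "pc": "0x2019"},
        }
    ]
}

print("\n".join(summarize_entry_points(sample_report)))
```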


As a standalone command line tool

For users who don't wish to programmatically interact with the speakeasy framework as a library, a standalone script is provided to automatically emulate Windows binaries. Speakeasy can be invoked by running the command speakeasy. This command will parse a specified PE and invoke the appropriate emulator (kernel mode or user mode). The script's parameters are shown below.

usage: speakeasy [-h] [-t TARGET] [-o OUTPUT] [-p [PARAMS ...]] [-c CONFIG] [-m] [-r] [--raw_offset RAW_OFFSET]
                        [-a ARCH] [-d DUMP_PATH] [-q TIMEOUT] [-z DROP_FILES_PATH] [-l MODULE_DIR] [-k] [--no-mp]

Emulate a Windows binary with speakeasy

optional arguments:
  -h, --help            show this help message and exit
  -t TARGET, --target TARGET
                        Path to input file to emulate
  -o OUTPUT, --output OUTPUT
                        Path to output file to save report
  -p [PARAMS ...], --params [PARAMS ...]
                        Commandline parameters to supply to emulated process (e.g. main(argv))
  -c CONFIG, --config CONFIG
                        Path to emulator config file
  -m, --mem-tracing     Enables memory tracing. This will log all memory access by the sample but will impact speed
  -r, --raw             Attempt to emulate file as-is with no parsing (e.g. shellcode)
  --raw_offset RAW_OFFSET
                        When in raw mode, offset (hex) to start emulating
  -a ARCH, --arch ARCH  Force architecture to use during emulation (for multi-architecture files or shellcode). Supported
                        archs: [ x86 | amd64 ]
  -d DUMP_PATH, --dump DUMP_PATH
                        Path to store compressed memory dump package
  -q TIMEOUT, --timeout TIMEOUT
                        Emulation timeout in seconds (default 60 sec)
  -z DROP_FILES_PATH, --dropped-files DROP_FILES_PATH
                        Path to store files created during emulation
  -l MODULE_DIR, --module-dir MODULE_DIR
                        Path to directory containing loadable PE modules. When modules are parsed or loaded by samples, PEs
                        from this directory will be loaded into the emulated address space
  -k, --emulate-children
                        Emulate any processes created with the CreateProcess APIs after the input file finishes emulating
  --no-mp               Run emulation in the current process to assist instead of a child process. Useful when
                        debugging speakeasy itself (using pdb.set_trace()).

Examples

Emulating a Windows driver:

user@mybox:~/speakeasy$ speakeasy -t ~/drivers/MyDriver.sys

Emulating 32-bit Windows shellcode:

user@mybox:~/speakeasy$ speakeasy -t ~/sc.bin  -r -a x86

Emulating 64-bit Windows shellcode and creating a full memory dump:

user@mybox:~/speakeasy$ speakeasy -t ~/sc.bin  -r -a x64 -d memdump.zip

Configuration

Speakeasy uses configuration files that describe the environment that is presented to the emulated binaries. For a full description of these fields see the README here.


Memory Management

Speakeasy implements a lightweight memory manager on top of the emulator engine’s memory management. Each chunk of memory allocated by malware is tracked and tagged so that meaningful memory dumps can be acquired. Being able to attribute activity to specific chunks of memory can prove to be extremely useful for analysts. Logging memory reads and writes to sensitive data structures can reveal the true intent of malware that API call logging alone does not reveal, which is particularly useful for samples such as rootkits.
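
To make the idea of tagged allocations concrete, here is a toy model that is not Speakeasy's memory manager: each region is recorded with a tag, and any address can later be attributed to the region that contains it. The class name `TaggedMemoryMap` and its methods are hypothetical, and overlap checking is deliberately left out.

```python
from bisect import bisect_right


class TaggedMemoryMap:
    """Toy tracker that maps addresses back to the tag of their region."""

    def __init__(self):
        self._bases = []    # sorted region base addresses
        self._regions = {}  # base -> (size, tag)

    def alloc(self, base, size, tag):
        # Record a region; a real manager would also reject overlaps
        self._bases.insert(bisect_right(self._bases, base), base)
        self._regions[base] = (size, tag)

    def find(self, addr):
        # Locate the last region starting at or before addr
        idx = bisect_right(self._bases, addr) - 1
        if idx < 0:
            return None
        base = self._bases[idx]
        size, tag = self._regions[base]
        return tag if addr < base + size else None


mm = TaggedMemoryMap()
mm.alloc(0x41420000, 0x1000, "emu.shellcode_arg_0")
mm.alloc(0x7000, 0x1000, "emu.struct.ETHREAD")
print(mm.find(0x41420010))
```

With a lookup like this, a logged read or write can be reported together with the tag of the chunk it touched instead of a bare address.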


Speed

Because Speakeasy is written in Python, speed is an obvious concern. Transitioning between native code and Python is extremely expensive and should be done as little as possible. Therefore, the goal is to only execute Python code when it is absolutely necessary. By default, the only events handled in Python are memory access exceptions or Windows API calls. In order to catch Windows API calls and emulate them in Python, import tables are doped with invalid memory addresses so that Python code is only executed when import tables are accessed. Similar techniques are used for when shellcode accesses the export tables of DLLs loaded within the emulated address space of shellcode. By executing as little Python code as possible, reasonable speeds can be achieved while still allowing users to rapidly develop capabilities for the framework.
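
The import table trick described above can be modelled in a few lines. This is a simplified sketch and not Speakeasy's implementation: every emulated API is assigned a sentinel address in a range that is assumed to be unmapped, and the handler runs only when execution faults on that address. `SENTINEL_BASE`, `ImportDispatcher`, and `on_invalid_fetch` are hypothetical names.

```python
SENTINEL_BASE = 0xFEEE0000  # hypothetical start of a never-mapped range


class ImportDispatcher:
    """Toy model: sentinel import addresses routed to Python handlers."""

    def __init__(self):
        self._handlers = {}
        self._next_addr = SENTINEL_BASE

    def register(self, name, handler):
        # The returned value is what would be written into the import table slot
        addr = self._next_addr
        self._next_addr += 1
        self._handlers[addr] = (name, handler)
        return addr

    def on_invalid_fetch(self, addr, args):
        # Called by the (imaginary) engine when code jumps to an unmapped address
        entry = self._handlers.get(addr)
        if entry is None:
            raise LookupError("genuine invalid fetch at 0x%x" % addr)
        name, handler = entry
        return name, handler(*args)


dispatcher = ImportDispatcher()
slot = dispatcher.register("kernel32.GetTickCount", lambda: 0x5265CC8)
print(dispatcher.on_invalid_fetch(slot, ()))
```

Native code that never touches an import slot never leaves the CPU emulator, which is the point of the design: Python runs only on the fault path.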


Limitations

Since we do not rely on a physical OS to handle API calls, object and memory allocation, and I/O operations, these responsibilities fall to the emulator. Upon emulating multiple samples, users are likely to encounter samples that do not fully emulate. This can most likely be attributed to missing API handlers, specific OS implementation details, or environmental factors. For more details see doc/limitations.


Module export parsing

Many malware samples such as shellcode will attempt to manually parse the export tables of PE modules in order to resolve API function pointers. Speakeasy attempts to build "decoy" export tables using the emulated function names currently supported, but this may not be enough for some samples. The configuration files support two fields named module_directory_x86 and module_directory_x64. These fields are directories that can contain DLLs or other modules that are loaded into the virtual address space of the emulated sample. There is also a command line option (-l) that can specify this directory at runtime. This can be useful for samples that do deep parsing of PE modules that are expected to be loaded within memory.
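
A small sketch of overriding these two fields in a loaded configuration follows. It assumes the configuration is JSON; where the fields sit inside the file is not stated here, so the helper searches the whole structure for the two key names instead of hard-coding a path. `set_module_dirs` is a hypothetical helper, and the nested `"modules"` key in the example is an assumption made only for illustration.

```python
import json

MODULE_DIR_KEYS = ("module_directory_x86", "module_directory_x64")


def set_module_dirs(config, x86_dir, x64_dir):
    """Update both module directory fields wherever they occur; return the count."""
    new_values = dict(zip(MODULE_DIR_KEYS, (x86_dir, x64_dir)))
    updated = 0
    stack = [config]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            for key, value in node.items():
                if key in new_values:
                    node[key] = new_values[key]  # existing key, so safe while iterating
                    updated += 1
                else:
                    stack.append(value)
        elif isinstance(node, list):
            stack.extend(node)
    return updated


# Hand-written example config; the real layout may differ
config = json.loads(
    '{"modules": {"module_directory_x86": "old32", "module_directory_x64": "old64"}}'
)
changed = set_module_dirs(config, "/sandbox/dlls/x86", "/sandbox/dlls/x64")
print(changed, json.dumps(config, indent=2))
```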


Adding API handlers

Like most emulators, API calls made to the OS are handled by the framework. Emulated API handlers can be added by simply defining a function with the correct name in its corresponding emulated module. Depending on the outputs expected by the API, it may be sufficient to simply return a success code. The argument count must be specified in order for the stack to be cleaned up correctly. If no calling convention is specified, stdcall is assumed. The argument list is passed to the emulated function as raw integers.

Below is an example of an API handler for the HeapAlloc function in the kernel32 module.

    @apihook('HeapAlloc', argc=3)
    def HeapAlloc(self, emu, argv, ctx={}):
        '''
        DECLSPEC_ALLOCATOR LPVOID HeapAlloc(
          HANDLE hHeap,
          DWORD  dwFlags,
          SIZE_T dwBytes
        );
        '''

        hHeap, dwFlags, dwBytes = argv

        chunk = self.heap_alloc(dwBytes, heap='HeapAlloc')
        if chunk:
            emu.set_last_error(windefs.ERROR_SUCCESS)

        return chunk
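
Why the argument count matters for stack cleanup can be shown with a little arithmetic. The sketch below covers 32-bit x86 only and is an illustration, not framework code: under stdcall the callee removes its own arguments on return, while under cdecl the caller does. `esp_after_return` is a hypothetical helper.

```python
def esp_after_return(esp_at_entry, argc, ptr_size=4, conv="stdcall"):
    """Stack pointer after an emulated API returns on 32-bit x86.

    esp_at_entry points at the return address pushed by the call instruction.
    """
    esp = esp_at_entry + ptr_size  # the return address is always popped
    if conv == "stdcall":
        esp += argc * ptr_size     # callee also removes its arguments
    elif conv != "cdecl":
        raise ValueError("unsupported calling convention: %r" % conv)
    return esp


# HeapAlloc takes three arguments (argc=3 in the handler above)
print(hex(esp_after_return(0x1203FE8, argc=3)))
print(hex(esp_after_return(0x1203FE8, argc=3, conv="cdecl")))
```

If argc were declared as 2 for a three-argument stdcall API, the stack pointer would end up four bytes short after every call and the sample would soon read the wrong stack slots.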


speakeasy's Issues

invalid_read on several different shellcode

Really neat project thus far; I just have a small issue with it. When using the -r flag on several different variations of shellcode (mainly from https://github.com/qilingframework/qiling/tree/master/examples/shellcodes), I keep getting invalid_read. I'm not really sure what's going on with it. Also, is there any reason the urlmon LoadLibrary call below failed?

Example execution:

 python .\run_speakeasy.py -r -a x86 -t ..\win32_urldownload.bin -o report.json
0x104b: 'kernel32.LoadLibraryA("urlmon")' -> 0x0
Caught error: invalid_read
* Finished emulating
* Saving emulation report to report.json

Execute sample with administrator privilege

Hi there!
Does Speakeasy support the possibility to execute the sample with administrator privilege?
A lot of samples change their execution based on that privilege, so I thought it would be a good "mode" to allow in the profile definition.

Question around memory permissions

I was building support for Kernel32.dll, IsBadStringPtr and seeking guidance.

This is largely a copy of IsBadReadPtr with adjustments made for string width.

    @apihook('IsBadStringPtr', argc=2)
    def IsBadStringPtr(self, emu, argv, ctx={}):
        '''
        BOOL IsBadStringPtrW(
            LPCWSTR  lpsz,
            UINT_PTR ucchMax
        );
        '''
        lp, ucchMax = argv
        cw = self.get_char_width(ctx)
        rv = True
        if lp and ucchMax:
            v1 = emu.is_address_valid(lp)
            v2 = emu.is_address_valid(lp + (cw*ucchMax - cw))
            if v1 and v2:
                rv = False
        return rv

The question I have: if I wanted to explicitly check for R permissions on the memory section, what's the best way to do that? I'm assuming that I would call get_address_map and then test READ_PERMISSION & mm.get_prot()?

mm = emu.get_address_map(lp)
if READ_PERM & mm.get_prot():
   <do something>
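
For reference, the bitmask test being asked about can be sketched without the framework. The constants below mirror Unicorn's UC_PROT_READ, UC_PROT_WRITE, and UC_PROT_EXEC values (1, 2, 4); whether Speakeasy exposes them under the names used in the question is not confirmed here. `is_range_readable` is a hypothetical helper that takes plain `(base, size, prot)` tuples rather than the framework's memory map objects.

```python
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4  # same values as Unicorn's UC_PROT_*


def is_range_readable(regions, start, length):
    """True if every byte of [start, start + length) lies in a readable region.

    regions is an iterable of (base, size, prot) tuples in any order.
    """
    if length <= 0:
        return False
    end = start + length
    cursor = start
    for base, size, prot in sorted(regions):
        if base > cursor:
            break            # gap before the next region
        if base + size <= cursor:
            continue         # region ends before the range begins
        if not prot & PROT_READ:
            return False
        cursor = base + size
        if cursor >= end:
            return True
    return cursor >= end


regions = [
    (0x1000, 0x1000, PROT_READ | PROT_WRITE),
    (0x2000, 0x1000, PROT_WRITE),
]
print(is_range_readable(regions, 0x1FF0, 0x10))
print(is_range_readable(regions, 0x1FF0, 0x20))
```

Checking the whole range rather than only the first and last byte also catches a string that crosses into an unreadable or unmapped page in the middle.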

Installing with setup.py on Windows does not copy all files

Running the command python setup.py install on Windows 10 (running Python 3.8) produces the following warnings.

warning: manifest_maker: MANIFEST.in, line 3: path 'speakeasy/winenv/decoys/' cannot end with '/'

warning: manifest_maker: MANIFEST.in, line 4: path 'speakeasy/resources/' cannot end with '/'

As a result, these subdirectories are missing from Lib\site-packages\speakeasy_emulator-1.4.8-py3.8.egg

  • speakeasy/winenv/decoys/x86
  • speakeasy/winenv/decoys/x64
  • speakeasy/resources/files
  • speakeasy/resources/web
C:.
├───EGG-INFO
├───speakeasy
│   ├───configs
│   ├───engines
│   │   └───__pycache__
│   ├───windows
│   │   ├───kernel_mods
│   │   │   └───__pycache__
│   │   └───__pycache__
│   ├───winenv
│   │   ├───api
│   │   │   ├───kernelmode
│   │   │   │   └───__pycache__
│   │   │   ├───usermode
│   │   │   │   └───__pycache__
│   │   │   └───__pycache__
│   │   ├───decoys
│   │   │   └───__pycache__
│   │   ├───defs
│   │   │   ├───ndis
│   │   │   │   └───__pycache__
│   │   │   ├───nt
│   │   │   │   └───__pycache__
│   │   │   ├───registry
│   │   │   │   └───__pycache__
│   │   │   ├───wfp
│   │   │   │   └───__pycache__
│   │   │   ├───windows
│   │   │   │   └───__pycache__
│   │   │   ├───winsock
│   │   │   │   └───__pycache__
│   │   │   └───__pycache__
│   │   └───__pycache__
│   └───__pycache__
└───tests
    └───__pycache__

This is due to the trailing slashes found in the paths in these 2 lines of MANIFEST.in.

https://github.com/fireeye/speakeasy/blob/47f9d3602a6b627e928f4f9a05794e4344541a7d/MANIFEST.in#L3#L4

dropped file data is None

file data is None for file when executing log_dropped_files in profiler.py after a run, causes a TypeError

reproduce with 46c3fde520bbfd6acbc2b945db8ae526

Adding a new API hook

When emulating an EXE file, I would like to add another API hook because of this:

module_entry: Caught error: unsupported_api
Invalid memory read (UC_ERR_READ_UNMAPPED)
Unsupported API: kernel32.InitializeSRWLock (ret: 0x4d41b6)

According to MSDN, this function seems pretty simple, so as a learning experience I would like to define a new API hook in the framework. Can you point me to how to approach this task? I would start from the provided example emu_exe.py.

void InitializeSRWLock( PSRWLOCK SRWLock );

Thanks!

Add stack size to json config file

Some shellcode and malware allocate a large stack, for example 1 MB.
In /windows/win32.py, line 191, the stack size is hardcoded.
Could you add a "stack_size" key to the JSON config file,
so that users can change the stack size according to their needs?
Thanks

when emulation fails due to unsupported API, describe which one and where

in the output generated by run_speakeasy i see the following:

0x18001fd13: 'KERNEL32.GetProcAddress(0x77000000, "SleepConditionVariableCS")' -> 0xfeee001c
0x18001fd26: 'KERNEL32.GetProcAddress(0x77000000, "WakeAllConditionVariable")' -> 0xfeee001b
dll_entry.DLL_PROCESS_ATTACH: Caught error: unsupported_api

but which API is not implemented? where did the emulation fail (so i can review it in IDA)?

exception when calling speakeasy's "call" api

File "speakeasy/speakeasy/binemu.py", line 324, in set_func_args
self.mem_write(curr_sp, r)
File "speakeasy/speakeasy/memmgr.py", line 194, in mem_write
self.emu_eng.mem_write(addr, data)
File "speakeasy/speakeasy/engines/unicorn_eng.py", line 196, in mem_write
return self.emu.mem_write(addr, data)
File "python3.7/site-packages/unicorn/unicorn.py", line 442, in mem_write
raise UcError(status)

GetProcAddress issue

Hi!
I encountered problems with get proc address, looks like the emu doesn't want to write to a data section, is this a question of config?

| push crackme.412CE8                                     | 412CE8:"FlsAlloc"
| call esi                                                | esi:GetProcAddress
| xor eax,dword ptr ds:[418480]                           |
| mov dword ptr ds:[41AE20],eax                           | < this is where it fails with access deny err

Also, speaking of emulation being slow, in GetProcAddress handler, looks like some cycles are wasted there:

def GetProcAddress(self, emu, argv, ctx={}):

***

	mods = emu.get_user_modules()
	for mod in mods:
		if mod.get_base() == hmod:
			bn = mod.get_base_name()
			mname, _ = os.path.splitext(bn)
			rv = emu.get_proc(mname, proc)
			# maybe break the loop and stop iterating once the rv of the foo is found?

	return rv
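
The early exit suggested in the comment above could look roughly like this. It is a standalone sketch rather than a patch to the handler: `FakeModule` is a hypothetical stand-in exposing only the two methods used in the snippet, and `get_proc` is passed in as a callable in place of `emu.get_proc`.

```python
import os


class FakeModule:
    """Hypothetical stand-in for a loaded module object."""

    def __init__(self, base, name):
        self._base, self._name = base, name

    def get_base(self):
        return self._base

    def get_base_name(self):
        return self._name


def resolve_proc(modules, hmod, proc, get_proc):
    """Return the address for proc in the module whose base equals hmod, else 0."""
    for mod in modules:
        if mod.get_base() == hmod:
            mname, _ = os.path.splitext(mod.get_base_name())
            return get_proc(mname, proc)  # stop at the first matching module
    return 0


mods = [FakeModule(0x77000000, "kernel32.dll"), FakeModule(0x78000000, "ntdll.dll")]
print(hex(resolve_proc(mods, 0x77000000, "FlsAlloc", lambda m, p: 0xFEEE001C)))
```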

I mean this emulator is great for reversing a packer, but I run into constant problems when trying to run through it anything else.

Run from a file offset or RVA

Could you add a command line option to run_speakeasy.py to begin emulating code at a specified file offset or RVA?
Thanks

How to debug unhandled_interrupt

While emulating, it gets stuck with this message:

0x50008: Unhandled interrupt: intnum=0x3
shellcode: Caught error: unhandled_interrupt

How can I find out where it is failing and, if possible, fix it so that emulation continues?

TypeError in create_file_archive

Working on a small speakeasy loader, I hit a bug when trying to dump files.

  File "speakeasy_loader.py", line 52, in emulate_binary
    data = se.create_file_archive()
  File "speakeasy/speakeasy.py", line 612, in create_file_archive
    zf.writestr(file_name, f.get_data())
  File "/usr/lib64/python3.9/zipfile.py", line 1800, in writestr
    zinfo.file_size = len(data)            # Uncompressed size
TypeError: object of type 'NoneType' has no len()

Looks like f.get_data() is returning a None and zipfile is running a len() on it. Triggering the error. This could probably be patched out by having get_data() return a b''

The file used for testing was 23d263b6f55ac81f64c3c3cf628dd169d745e0f2b264581305f2f46efc879587 and the (relevant) code is

se = Speakeasy()
module = se.load_module(filename)
se.run_module(module, all_entrypoints=True)
data = se.create_file_archive()
with open('files.zip', 'wb') as fp:
    fp.write(data)
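
Until the library itself guards against this, a caller-side workaround is to substitute empty bytes when a file has no data. The sketch below uses only the standard library and takes plain `(name, data)` pairs; `build_file_archive` is a hypothetical helper, not part of Speakeasy.

```python
import io
import zipfile


def build_file_archive(files):
    """Zip (name, data) pairs in memory, treating data of None as an empty file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files:
            # writestr() calls len() on its data, so None must be replaced first
            zf.writestr(name, data if data is not None else b"")
    return buf.getvalue()


archive = build_file_archive([("dropped.txt", b"hello"), ("empty.bin", None)])
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(zf.namelist())
```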

run_speakeasy parameter parsing

The run_speakeasy.py argument --params doesn't work as intended for an example like --params -log -path <file_path> because argparse treats -log as -l og. A possible solution is to treat the --params argument as a string and perform additional parsing, which could also account for items like <file_path> that may have additional spacing considerations. The updated --params argument would look something like --params="-log -path '<file_path>'".
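
The proposed approach can be sketched with the standard library alone. This is an illustration of the idea, not the script's actual parser: `--params` is taken as one string and split with `shlex`, so quoted values keep their spaces. The `-l` option is declared only to show that `-log` no longer collides with it; `parse_params` is a hypothetical function name.

```python
import argparse
import shlex


def parse_params(argv):
    """Parse argv and return the emulated process parameters as a list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-l", "--module-dir")
    # One string instead of nargs="*", so values starting with "-" are safe
    parser.add_argument("-p", "--params", default="")
    args = parser.parse_args(argv)
    return shlex.split(args.params)


print(parse_params(["--params=-log -path '/tmp/my file.txt'"]))
```

Note that `shlex.split` defaults to POSIX rules, where a backslash is an escape character, so Windows-style paths would need doubled backslashes or `posix=False`.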

Debug mode not print out all registers

winemu._hook_code() prints only:

print('0x%x: %s, edi=0x%x : esi=0x%x : ebp=0x%x : eax=0x%x' % (addr, x, self.reg_read('edi'), self.reg_read('esi'), self.reg_read('ebp'), self.reg_read('eax'))) # noqa
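
One way to print every general purpose register is to build the line from a list of names instead of a fixed format string. The sketch assumes only that registers can be read by name through a callable, as `self.reg_read('edi')` does above; whether every name in `X86_REGS` (for example `eip`) is accepted by the real `reg_read` is an assumption. `format_regs` is a hypothetical helper.

```python
X86_REGS = ("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp", "eip")


def format_regs(reg_read, regs=X86_REGS):
    """Format registers as 'name=0xvalue' pairs using a reg_read(name) callable."""
    return " : ".join("%s=0x%x" % (name, reg_read(name)) for name in regs)


# Stand-in register file with values taken from the report excerpt on this page
fake_regs = {
    "eax": 0x0, "ebx": 0x0, "ecx": 0x1418, "edx": 0x0, "esi": 0xFEEDF000,
    "edi": 0x0, "ebp": 0x1204000, "esp": 0x1203FE8, "eip": 0x2019,
}
print(format_regs(fake_regs.__getitem__))
```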

when emulation fails due to invalid_read, explain where

in the output produced by run_speakeasy, i see a line like the following, which seems to indicate the end of emulation:

0x18000abef: 'kernel32.GetTickCount()' -> 0x5265cc8
export.Foo: Caught error: invalid_read

where is the faulting instruction? what led up to it? what address could not be read? this would help me triage issues with the emulator and the sample i'm analyzing.

Running a DLL without PE file

Hi,

I am trying to run a function in a DLL using a script similar to this one but I encounter an issue when running GetModuleFileName :

0x1000111: Error while calling API handler for KERNEL32.GetModuleFileNameW:
Traceback (most recent call last):
  File "speakeasy/speakeasy/windows/winemu.py", line 1126, in handle_import_func
    rv = self.api.call_api_func(mod, func, argv, ctx=default_ctx)
  File "speakeasy/speakeasy/winenv/api/winapi.py", line 77, in call_api_func
    return func(mod, self.emu, argv, ctx)
  File "speakeasy/speakeasy/winenv/api/usermode/kernel32.py", line 2852, in GetModuleFileName
    filename = emu.get_process_path()
AttributeError: 'Win32Emulator' object has no attribute 'get_process_path'
0xfeedf02c: call_0x10001650: Caught error: 'Win32Emulator' object has no attribute 'get_process_path'
Invalid memory read (UC_ERR_READ_UNMAPPED)

The get_current_process function does not return anything :

    def GetModuleFileName(self, emu, argv, ctx={}):
        [...]
        if hModule == 0:
            proc = emu.get_current_process()
            filename = proc.get_process_path()

Which actually make sense because I am running a DLL not loaded properly by a PE file, but I wonder if there is an easy way to load a DLL and avoid that issue (I can hook the function and overwrite it but it is quite annoying for something that is really common). Maybe having a script available to fully reproduce rundll32? Or am I missing something?

CreateThread and WaitForXXX APi

Some shellcode and malware use CreateThread to download, upload, and so on.
They then call WaitForXXXObject to wait until those threads have run and finished.
The Python emulation of the WaitForXXX APIs returns success immediately, so those threads are never emulated.
Could you wait for and emulate all threads that the malware/shellcode creates?
Thanks

Bug in get_data and get_hash

File: windows/fileman.py, line: 89
Sometimes get_data returns None, so get_hash fails.
I think we should always check the get_data return value, or have get_data return b"".
Thanks

Unsupported API stats to help contributors

First off, thanks for releasing this tool.

I ran speakeasy against the Malpedia corpus (https://malpedia.caad.fkie.fraunhofer.de/) to get a rough estimate of how many samples successfully emulate with/without tossing errors. While running this test I gathered up a list of the count of unsupported API functions that were causing emulation to halt.

I understand a github issue may not be the best place to store this information. It may be useful for someone looking to contribute by going after the highly used API's first.

The result of ~4k samples (truncating results at 15 - the data has a long tail of one-off's)
(Updated: 2022-02-17 running against c94bb62)

    150 advapi32.CryptImportKey
    128 advapi32.ConvertStringSecurityDescriptorToSecurityDescriptorA
    122 msvbvm60.ordinal_100
    102 user32.OpenInputDesktop
    100 kernel32.LocalFileTimeToFileTime
     99 msvcrt._wgetenv
     96 advapi32.EventRegister
     75 comctl32.ordinal_17
     71 gdi32.GetSystemPaletteEntries
     64 mfc42.ordinal_1576
     60 kernel32.HeapValidate
     56 advapi32.RegCreateKeyExA
     55 kernel32.GetThreadPreferredUILanguages
     54 advapi32.InitializeSecurityDescriptor
     52 shell32.SHGetSpecialFolderPathA
     45 advapi32.RegCreateKeyExW
     41 kernel32.GetTimeZoneInformation
     40 msvcrt.__p___initenv
     31 shlwapi.PathFileExistsW
     30 userenv.GetUserProfileDirectoryW
     30 kernel32.GetTempFileNameA
     29 user32.GetWindowRect
     27 kernel32.SetFileAttributesW
     27 kernel32.SetFileAttributesA
     27 iphlpapi.GetAdaptersInfo
     26 user32.MapVirtualKeyW
     25 oleaut32.SysAllocStringLen
     24 kernel32.InitializeSRWLock
     23 user32.RegisterClipboardFormatA
     22 shell32.SHGetSpecialFolderPathW
     22 ntdll.VerSetConditionMask
     21 user32.GetCursorInfo
     21 urlmon.ObtainUserAgentString
     20 user32.RegisterClassA
     20 kernel32.GetProcessAffinityMask
     19 wininet.HttpAddRequestHeadersA
     19 oleaut32.ordinal_2
     19 kernel32.RtlPcToFileHeader
     19 kernel32.GetSystemWow64DirectoryA
     19 advapi32.RegSetValueExA
     18 kernel32.SetFilePointerEx
     18 advapi32.SetEntriesInAclA
     17 msvcrt.atexit
     17 advapi32.RegisterServiceCtrlHandlerExW
     16 oleaut32.SysReAllocStringLen
     15 ntdll.RtlAdjustPrivilege
     15 msvcrt._ismbblead
     15 kernel32.SetProcessShutdownParameters
     15 kernel32.GlobalMemoryStatusEx
     15 kernel32.FreeResource
     15 gdiplus.GdiplusStartup

In addition to the above data, Malpedia publicly posts an API frequency graph: https://malpedia.caad.fkie.fraunhofer.de/stats/api_dll_frequencies
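
A tally like the one above can be reproduced from console logs with a few lines of Python. The sketch assumes the log lines contain the text `Unsupported API: <module>.<function>`, as in the output quoted in the "Adding a new API hook" issue; `count_unsupported` is a hypothetical helper and the sample lines are copied from this page, not produced by a new run.

```python
import re
from collections import Counter

UNSUPPORTED_RE = re.compile(r"Unsupported API: ([\w.]+)")


def count_unsupported(lines):
    """Count how often each unsupported API name appears in log lines."""
    counts = Counter()
    for line in lines:
        match = UNSUPPORTED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


log_lines = [
    "module_entry: Caught error: unsupported_api",
    "Invalid memory read (UC_ERR_READ_UNMAPPED)",
    "Unsupported API: kernel32.InitializeSRWLock (ret: 0x4d41b6)",
    "Unsupported API: kernel32.InitializeSRWLock (ret: 0x4d41b6)",
]
for name, count in count_unsupported(log_lines).most_common():
    print("%7d %s" % (count, name))
```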

_initterm function table

_initterm walks a table of function pointers. Attempting to implement this using setup_callback() results in the first callback returning to _initterm's return address. Looking for a solution to walk the table and properly return from _initterm.

Debugging Exception

When emulating an executable (exe) with run_speakeasy.py, I get stuck at this exception after a while:

0x408d14: Exception caught: code:0xc0000005, handler=0x527558, instr="mov ecx, dword ptr [eax + ecx*4]"
module_entry: Caught error: Invalid memory read (UC_ERR_READ_UNMAPPED)

How do you suggest approaching this kind of troubleshooting?

Thanks.

upx_unpack.py error

Thank you for your creative thinking.
I have an EXE packed with the latest version of UPX.
I tested your code and it cannot handle it. Do you have any suggestions?

C:\Python>upx_unpack.py -f 1.exe -o 2.exe
[*] Unpacking module with section hop
Traceback (most recent call last):
File "C:\Python\upx_unpack.py", line 67, in
main(args)
File "C:\Python\upx_unpack.py", line 48, in main
start = base + upx0.VirtualAddress
AttributeError: 'NoneType' object has no attribute 'VirtualAddress'

Shellcode emulation issue

While attempting to build Speakeasy support in Thug [1] I spotted a potential shellcode emulation issue. Still had no time to investigate it (will do soon) but just wanted to point it out.

While analyzing a local sample I got these results

$ thug -l samples/exploits/22196.html
[2020-09-10 17:06:24] <object classid="clsid:77829F14-D911-40FF-A2F0-D11DB8D6D0BC" id="pwnage">
</object>
[2020-09-10 17:06:24] ActiveXObject: 77829F14-D911-40FF-A2F0-D11DB8D6D0BC
[2020-09-10 17:06:24] [NCTAudioFile2 ActiveX] Overflow in SetFormatLikeSample
[2020-09-10 17:06:24] [EXPLOIT Classifier] URL: samples/exploits/22196.html (Rule: CVE-2007-0018, Classification: )
[2020-09-10 17:06:24] [Shellcode Profile] 
UINT WINAPI WinExec (
     LPCSTR lpCmdLine = 0x4181a1 =>
           = "calc.exe";
     UINT uCmdShow = 0;
) =  0x20;
void ExitThread (
     DWORD dwExitCode = 0;
) =  0x0;

The shellcode profile is generated by libemu/pylibemu in this case. When attempting to analyze the exact same shellcode with Speakeasy I get

{'arch': 'x86',
 'emu_version': '1.4.5',
 'emulation_total_runtime': 0.008,
 'entry_points': [{'apihash': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855',
                   'apis': [],
                   'dynamic_code_segments': [],
                   'ep_args': ['0x41420000',
                               '0x41421000',
                               '0x41422000',
                               '0x41423000'],
                   'ep_type': 'shellcode',
                   'error': {'address': '0x2019',
                             'instr': 'retf 0x7cff',
                             'interrupt_num': 13,
                             'pc': '0x2019',
                             'regs': {'eax': '0x00000000',
                                      'ebp': '0x01204000',
                                      'ebx': '0x00000000',
                                      'ecx': '0x00001418',
                                      'edi': '0x00000000',
                                      'edx': '0x00000000',
                                      'eip': '0x00002019',
                                      'esi': '0xfeedf000',
                                      'esp': '0x01203fe8'},
                             'stack': ['sp+0x00: 0x41420000 -> '
                                       'emu.shellcode_arg_0.0x41420000',
                                       'sp+0x04: 0x41421000 -> '
                                       'emu.shellcode_arg_1.0x41421000',
                                       'sp+0x08: 0x41422000 -> '
                                       'emu.shellcode_arg_2.0x41422000',
                                       'sp+0x0c: 0x41423000 -> '
                                       'emu.shellcode_arg_3.0x41423000',
                                       'sp+0x10: 0xfeedf000',
                                       'sp+0x14: 0x00007000 -> '
                                       'emu.struct.ETHREAD.0x7000'],
                             'type': 'unhandled_interrupt'},
                   'ret_val': '0x0',
                   'start_addr': '0x1000'}],
 'mem_tag': 'emu.shellcode.4d546f0ac5350b72622f4bb0a39920e735935d92dccc83fde5393ce8b6ec6e51',
 'os_run': 'windows.6_1',
 'path': None,
 'report_version': '1.1.0',
 'sha256': '4d546f0ac5350b72622f4bb0a39920e735935d92dccc83fde5393ce8b6ec6e51',
 'size': 4662,
 'strings': {'in_memory': {'ansi': [], 'unicode': []},
             'static': {'ansi': ['AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^',
                                 'IIIIIIIIIIIIIIIIIQZ7jJXP0B1ABkBAZB2BA2AA0AAX8BBPuzIYlm81T7pePUPLKG55lLKQlC5RXs1jOLKBoUHnkaOQ0TAzKsyLKUdNkwqZN4qiPLYnLK4o044VgjajjFmdAO2ZKl4Uk1D4dFd0uKUNkaOEtEQzKpfnkvlbkNkSo5LuQjKNkeLnkVaXkk9QLDdc4iS7AIPu4nkQPDpk5YPrXdLNkaPflNkPpELnMLKCXwxjKEYlKmPLpS0S0uPLK3XElcofQHvu0QFlIL8ncO0akRpbHXoxNm0u0bHNxinNjDNpWkOKWU3rAPl0cFNCUT8e5C0J'],
                        'unicode': []}},
 'timestamp': 1599750384}

Note that this does not happen for every Thug local exploit sample, only for a few of them.

[1] https://github.com/buffer/thug

Add an entry point for run_speakeasy

I think it would be very useful to have an entry point in the speakeasy setup, in order to call run_speakeasy.py easily.

All it would require is moving run_speakeasy into the speakeasy package and adding something like this to setup.py:

    entry_points= {
        'console_scripts': [ 'speakeasy=speakeasy.cli:main' ]
    },

Are there any arguments against that? Is there any case where this would be less convenient than a standalone script?
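
For the `speakeasy=speakeasy.cli:main` entry point above to resolve, the package would need a `cli` module exposing a `main` callable. A minimal sketch of what that module might look like follows; the module layout, the `build_parser` helper, and the `-t`/`-o` options are assumptions made for illustration, and the emulation call itself is left as a placeholder.

```python
# Hypothetical speakeasy/cli.py: names and options are illustrative only.
import argparse


def build_parser():
    """Build the command-line parser for the console script."""
    parser = argparse.ArgumentParser(
        prog="speakeasy",
        description="Emulate a Windows binary (sketch of a console entry point).",
    )
    parser.add_argument("-t", "--target", required=True,
                        help="path to the sample to emulate")
    parser.add_argument("-o", "--output", default=None,
                        help="optional path for the JSON report")
    return parser


def main(argv=None):
    """Entry point referenced by console_scripts; returns a process exit code."""
    args = build_parser().parse_args(argv)
    # Placeholder: the logic currently in run_speakeasy.py would be called here.
    print(f"would emulate {args.target}")
    return 0
```

Accepting `argv` as a parameter keeps `main` callable from tests as well as from the generated `speakeasy` command, which calls it with no arguments.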

Support multiple hooks for the same API

Hi everyone!
First of all, I want to thank you all for your work, the state of this emulator is amazing.

This issue is a feature request.

As it stands, Speakeasy does not support multiple hooks for the same Windows API (if I'm not mistaken).
A useful case would be a user who hooks every API with the wildcard parameter to log the state of the emulation up to that API, and who also wants a second hook on one particular API.
Currently the second hook is not registered, and only the first hook (the logging one) executes.
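
The requested behaviour can be sketched as a registry that keeps every hook and runs all matching ones in registration order. This is not Speakeasy's implementation; `HookRegistry`, `register`, and `dispatch` are hypothetical names, and the callback signature is an assumption.

```python
from fnmatch import fnmatchcase


class HookRegistry:
    """Keeps every registered hook instead of one hook per API."""

    def __init__(self):
        # (module_pattern, api_pattern, callback) tuples in registration order
        self._hooks = []

    def register(self, callback, module="*", api_name="*"):
        """Add a hook; '*' patterns match any module or API name."""
        self._hooks.append((module.lower(), api_name.lower(), callback))

    def dispatch(self, module, api_name, args):
        """Call every hook whose patterns match, returning their results."""
        results = []
        for mod_pat, api_pat, callback in self._hooks:
            if (fnmatchcase(module.lower(), mod_pat)
                    and fnmatchcase(api_name.lower(), api_pat)):
                results.append(callback(module, api_name, args))
        return results
```

With a wildcard logging hook and a specific hook both registered, a call to the specific API triggers both, while any other API triggers only the logger. How the results of several hooks should be combined into a single return value for the emulated API is a design question this sketch leaves open.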

How to correctly close an emulation

Hey,

first of all let me say a huge thank you for this awesome tool!

My question:
I'm trying to run two emulations in the same script:

se = speakeasy.Speakeasy(logger=get_logger())
sc_addr = se.load_shellcode('/tmp/shell1', arch)
se.reg_write('RSI',sc_addr)
se.emu.mem_map(sc_addr - 0x100, 0x100)
se.run_shellcode(sc_addr, offset=0x0)

se = speakeasy.Speakeasy(logger=get_logger())
sc_addr = se.load_shellcode('/tmp/shell2', arch)
se.reg_write('RSI',sc_addr)
se.emu.mem_map(sc_addr - 0x100, 0x100)
se.run_shellcode(sc_addr, offset=0x0)

With this sequence of instructions, shell1 runs, but shell2 fails with an invalid read at the very beginning. If I invert the order (running shell2 before shell1), the first emulation again runs fine and the second fails with the same invalid read error.
So I think I need to "clean up" after the first emulation, but it is not clear to me how. I tried to use

se.emu.mem_purge()

but no luck. From an initial investigation, the invalid read occurs after the access to the allocated PEB. Let me know if you want me to dig deeper into this. Maybe I'm just closing the emulation the wrong way (actually, I'm not closing it at all :-)).

Thanks in the meantime.

IoCreateDevice doesn't allocate memory for DeviceExtension

When IoCreateDevice is called with a nonzero DeviceExtension size, no memory is allocated for the extension.
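
The expected behaviour can be sketched as follows: reserve room for the extension together with the device object and point DeviceExtension at it. The write to address 0x8 in the report below looks consistent with DeviceExtension being left NULL, though that reading is an inference from the trace. Everything in this sketch is hypothetical: `BumpAllocator`, `create_device`, and the `DEVICE_OBJECT_SIZE` placeholder are illustrative and not Speakeasy code.

```python
DEVICE_OBJECT_SIZE = 0x150  # placeholder size, assumed for illustration


class BumpAllocator:
    """Toy allocator handing out 16-byte aligned addresses."""

    def __init__(self, base=0x10000):
        self.next_addr = base

    def alloc(self, size):
        addr = self.next_addr
        self.next_addr += (size + 0xF) & ~0xF
        return addr


def create_device(allocator, extension_size):
    """Allocate a device object with its extension placed directly after it."""
    obj_addr = allocator.alloc(DEVICE_OBJECT_SIZE + extension_size)
    # A zero-sized extension leaves the pointer NULL; otherwise it must be valid.
    ext_addr = obj_addr + DEVICE_OBJECT_SIZE if extension_size else 0
    return {"device_object": obj_addr, "device_extension": ext_addr}
```

With this layout, a driver that dereferences DeviceObject->DeviceExtension right after a successful call lands in mapped memory rather than near address zero.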

status = IoCreateDevice(
             DriverObject,
             sizeof(DeviceExtension),
             0i64,
             FILE_DEVICE_UNKNOWN,
             FILE_DEVICE_SECURE_OPEN,
             0,
             &DeviceObject);
  if ( status >= 0 )
  {
    DeviceExtension = (DeviceExtension *)DeviceObject->DeviceExtension;
    ExInitializeFastMutex(&DeviceExtension->lock);

"error": {
                "type": "invalid_write",
                "pc": "0x140004cc2",
                "address": "0x8",
                "instr": "mov dword ptr [rax], ecx",

Relax unicorn requirements from 1.0.2rc4 to 1.0.2?

The requirements.txt specifies that the required version of unicorn is 1.0.2rc4. However, the latest unicorn version is 1.0.2 (it's no longer a release candidate). As a result, running speakeasy produces an error:

$ emu_exe.py -f file.exe
....
....
AssertionError: Requires unicorn version: 1.0.2rc4

Would you consider relaxing the requirements to support unicorn 1.0.2 or higher?
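
A relaxed check could compare versions rather than require an exact string match. The sketch below is one way to do that under stated assumptions: `parse_version` and `is_supported` are hypothetical helpers, and only the `X.Y.Z` and `X.Y.ZrcN` forms mentioned in this issue are handled.

```python
import re


def parse_version(text):
    """Turn '1.0.2' or '1.0.2rc4' into a tuple that sorts correctly."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$", text)
    if not match:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, patch, rc = match.groups()
    # A final release sorts after all of its release candidates.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch), rc_rank)


def is_supported(installed, minimum="1.0.2rc4"):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

With this comparison, 1.0.2 and later releases pass while older release candidates are still rejected.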

Wildcard search not finding directories

A call like FindFirstFileW("C:\*", X) doesn't find a result despite the default config specifying the file path c:\programdata\mydir\myfile.bin. Improve the search capability to find directories specified in configuration file paths.
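
One possible approach is to derive the intermediate directories from the configured file paths and match the wildcard against the immediate children of the searched directory. The sketch below assumes backslash-separated, case-insensitive paths; `find_entries` is a hypothetical helper, not part of Speakeasy, and it uses Python's `fnmatch` rules rather than exact Windows wildcard semantics.

```python
import fnmatch
import ntpath


def find_entries(pattern, configured_paths):
    """Return the names directly under the searched directory that match."""
    directory, leaf = ntpath.split(pattern.lower())
    dir_parts = directory.rstrip("\\").split("\\")
    results = []
    for path in configured_paths:
        parts = path.lower().split("\\")
        # Skip paths that do not live below the searched directory.
        if len(parts) <= len(dir_parts) or parts[:len(dir_parts)] != dir_parts:
            continue
        # The next path component is a file or an implied directory.
        child = parts[len(dir_parts)]
        if fnmatch.fnmatchcase(child, leaf) and child not in results:
            results.append(child)
    return results
```

For the example in this issue, searching `C:\*` against the configured path `c:\programdata\mydir\myfile.bin` would yield the implied directory `programdata`.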

Log the exit conditions of an emulation

It is not always clear why Speakeasy has stopped emulation. Logging the reason for stopping (even if emulation simply ended naturally) would make it easier to understand what happened. The log should include the address of the last instruction emulated and the context.
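
As an illustration of what such a log entry might carry, here is a minimal sketch of an exit record. The `ExitRecord` class, its field names, and the reason strings are all hypothetical; the actual report format would be up to the maintainers.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExitRecord:
    """Why emulation stopped, where, and with what register context."""
    reason: str                      # e.g. "completed", "invalid_write" (assumed values)
    last_pc: int                     # address of the last instruction emulated
    registers: dict = field(default_factory=dict)

    def to_json(self):
        """Serialize with addresses rendered as hex strings."""
        data = asdict(self)
        data["last_pc"] = hex(self.last_pc)
        data["registers"] = {name: hex(value)
                             for name, value in self.registers.items()}
        return json.dumps(data, sort_keys=True)
```

Emitting one such record per entry point would let a natural exit be distinguished from a fault or a timeout when reading the report.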

Load dependencies files (especially ntoskrnl.exe) to memory

Hi. Some malware drivers read the memory of the mapped ntoskrnl.exe to find the kernel routines they need. Is there any way to load my decoy ntoskrnl.exe, hal.dll, etc. into memory before the malware starts? This is how it looks in the miasm sandbox with the -i parameter:

[INFO]: Loading module name 'win32_dll/ntoskrnl.exe'
[ERROR]: Cannot open win32_dll/bootvid.dll
[ERROR]: Cannot open win32_dll/kdcom.dll
[ERROR]: Cannot open win32_dll/ci.dll
[ERROR]: Cannot open win32_dll/clfs.sys
[ERROR]: Cannot open win32_dll/pshed.dll
[ERROR]: Cannot open win32_dll/hal.dll
[WARNING]: Create dummy entry for 'pshed.dll'
[WARNING]: Create dummy entry for 'hal.dll'
[WARNING]: Create dummy entry for 'bootvid.dll'
[WARNING]: Create dummy entry for 'kdcom.dll'
[WARNING]: Create dummy entry for 'clfs.sys'
[WARNING]: Create dummy entry for 'ci.dll'

# sb.run()
# got my handled exception
# running ipython

In [1]: sb.jitter.vm
Out[1]:
Addr               Size               Access Comment
0x130000           0x10000            RW_    Stack
0x400000           0x1000             RW_    'driver': PE Header
0x401000           0x63000            R__    'driver': b'.text\x00\x00\x00'
0x464000           0x5000             R__    'driver': b'.rdata\x00\x00'
0x469000           0xC000             RW_    'driver': b'.data\x00\x00\x00'
0x476000           0x2000             R__    'driver': b'.reloc\x00\x00'
0x800000           0x34               RW_
0xA0000000         0x1000             RW_    'win32_dll/ntoskrnl.exe': PE Header
0xA0001000         0x115000           R__    'win32_dll/ntoskrnl.exe': b'.text\x00\x00\x00'
0xA0116000         0x1000             R__    'win32_dll/ntoskrnl.exe': b'_PAGELK\x00'
0xA0117000         0x1000             R__    'win32_dll/ntoskrnl.exe': b'POOLMI\x00\x00'
0xA0118000         0x2000             R__    'win32_dll/ntoskrnl.exe': b'POOLCODE'
0xA011A000         0x45000            RW_    'win32_dll/ntoskrnl.exe': b'.data\x00\x00\x00'
0xA015F000         0x1000             RW_    'win32_dll/ntoskrnl.exe': b'ALMOSTRO'
0xA0160000         0x2000             RW_    'win32_dll/ntoskrnl.exe': b'SPINLOCK'
0xA0162000         0x1AD000           R__    'win32_dll/ntoskrnl.exe': b'PAGE\x00\x00\x00\x00'
0xA030F000         0x12000            R__    'win32_dll/ntoskrnl.exe': b'PAGELK\x00\x00'
0xA0321000         0x5000             R__    'win32_dll/ntoskrnl.exe': b'PAGEKD\x00\x00'
0xA0326000         0x18000            R__    'win32_dll/ntoskrnl.exe': b'PAGEVRFY'
0xA033E000         0x2000             R__    'win32_dll/ntoskrnl.exe': b'PAGEHDLS'
0xA0340000         0x5000             R__    'win32_dll/ntoskrnl.exe': b'PAGEBGFX'
0xA0345000         0x3000             RW_    'win32_dll/ntoskrnl.exe': b'PAGEVRFB'
0xA0348000         0x12000            R__    'win32_dll/ntoskrnl.exe': b'.edata\x00\x00'
0xA035A000         0x9000             RW_    'win32_dll/ntoskrnl.exe': b'PAGEDATA'
0xA0363000         0xD000             RW_    'win32_dll/ntoskrnl.exe': b'PAGEKDD\x00'
0xA0370000         0x3000             R__    'win32_dll/ntoskrnl.exe': b'PAGEVRFC'
0xA0373000         0x1000             RW_    'win32_dll/ntoskrnl.exe': b'PAGEVRFD'
0xA0374000         0x42000            RW_    'win32_dll/ntoskrnl.exe': b'INIT\x00\x00\x00\x00'
0xA03B6000         0x35000            R__    'win32_dll/ntoskrnl.exe': b'.rsrc\x00\x00\x00'
0xA03EB000         0x1A000            R__    'win32_dll/ntoskrnl.exe': b'.reloc\x00\x00'
