  • Language
    C++
  • License
    MIT License

Repository Details

Thread Stack Spoofing - PoC for an advanced in-memory evasion technique that helps hide an injected shellcode's memory allocation from scanners and analysts.

Thread Stack Spoofing / Call Stack Spoofing PoC

A PoC implementation of an advanced in-memory evasion technique that spoofs a thread's call stack. The technique helps bypass thread-based memory examination rules and better hides shellcode while it resides in process memory.

Intro

This is an example implementation of the Thread Stack Spoofing technique, which aims to evade malware analysts, AVs and EDRs that look for references to shellcode frames in an examined thread's call stack. The idea is to hide references to the shellcode on the thread's call stack, thus masquerading the allocations that contain the malware's code.

This implementation, along with my ShellcodeFluctuation, gives the Offensive Security community sample implementations to catch up with what commercial C2 products offer, so that we can do no worse in our Red Team toolings. 💪

Implementation has changed

The current implementation differs heavily from what was originally published. This is because I realised there is a much simpler way to terminate the walk of a thread's call stack and hide the shellcode's frames: simply write 0 to the return address of the first frame we control:

void WINAPI MySleep(DWORD _dwMilliseconds)
{
    [...]
    auto overwrite = (PULONG_PTR)_AddressOfReturnAddress();
    const auto origReturnAddress = *overwrite;
    *overwrite = 0;

    [...]
    *overwrite = origReturnAddress;
}

The previous implementation, which utilised StackWalk64, can be accessed in this commit c250724.

The current implementation is much more stable and works nicely in both Debug and Release builds on two architectures - x64 and x86.

Demo

This is what a call stack may look like when it is NOT spoofed:

[Screenshot: not-spoofed call stack]

And this is what it looks like when thread stack spoofing is enabled:

[Screenshot: spoofed call stack]

Above we can see that the last frame on our call stack is our MySleep callback. Does this immediately open up opportunities for new IOCs? Hunting rules can look for threads whose call stacks do not unwind into the following expected thread entry points located within system libraries:

kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21
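The hunting rule described above can be sketched as a small check over an already symbolised call stack. This is only an illustration under stated assumptions: the function names startsWith and unwindsToExpectedEntry are hypothetical, frames are assumed to be listed innermost first (as debuggers print them), and symbols are compared case-sensitively on the name only, because offsets such as +0x14 vary between Windows builds.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true when `s` begins with `prefix`.
bool startsWith(const std::string& s, const std::string& prefix)
{
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical hunting check: does the call stack end in the usual
// thread entry points? `frames` is ordered innermost frame first, so
// the last two entries are the outermost frames.
bool unwindsToExpectedEntry(const std::vector<std::string>& frames)
{
    if (frames.size() < 2)
        return false;

    const std::string& outermost = frames[frames.size() - 1];
    const std::string& previous  = frames[frames.size() - 2];

    // Offsets are ignored on purpose: only the symbol name is compared.
    return startsWith(outermost, "ntdll!RtlUserThreadStart")
        && startsWith(previous, "kernel32!BaseThreadInitThunk");
}
```

A stack that stops at an application frame fails this check, which is exactly the kind of thread such a rule would flag.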

Although the call stack of the spoofed thread may look rather odd at first, a brief examination of my system showed that there are other threads that do not unwind to the above entry points either:

[Screenshot: legit call stack]

The above screenshot shows a thread of unmodified Total Commander x64. As we can see, its call stack pretty much resembles our own in terms of initial call stack frames.

Why should we care about carefully faking our call stack when there are processes exhibiting traits that we can simply mimic?

How does it work?

The rough algorithm is as follows:

  1. Read shellcode's contents from file.
  2. Acquire all the necessary function pointers from dbghelp.dll and call SymInitialize.
  3. Hook kernel32!Sleep so that it points back to our callback.
  4. Inject and launch the shellcode via VirtualAlloc + memcpy + CreateThread. The thread should start from our runShellcode function to avoid having the thread's StartAddress point somewhere unexpected and anomalous (such as ntdll!RtlUserThreadStart+0x21).
  5. As soon as the Beacon attempts to sleep, our MySleep callback gets invoked.
  6. We then overwrite the last return address on the stack with 0, which should effectively finish the call stack.
  7. Finally, a call to ::SleepEx is made to let the Beacon sleep while it waits for further communication.
  8. After the sleep is finished, we restore the previously saved original return address and execution resumes.

Function return addresses are scattered all around the thread's stack memory area, in frames pointed to by the RBP/EBP register. In order to find them on the stack, we first need to collect the frame pointers and then dereference them for overwriting:

[Image: stack frame layout]

(the above image was borrowed from Eli Bendersky's post named Stack frame layout on x86-64)

	*(PULONG_PTR)(frameAddr + sizeof(void*)) = Fake_Return_Address;

The initial implementation of ThreadStackSpoofer did that in the walkCallStack and spoofCallStack functions; however, the current implementation shows that these efforts are not required to maintain a stealthy call stack.

Example run

Usage:

C:\> ThreadStackSpoofer.exe <shellcode> <spoof>

Where:

  • <shellcode> is a path to the shellcode file
  • <spoof> enables thread stack spoofing when set to 1 or true; any other value disables it.
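The handling of the <spoof> argument described above could look roughly like the sketch below. The function name parseSpoofFlag is hypothetical and not taken from the project; an exact, case-sensitive match on "1" and "true" is an assumption, since the text does not say how the comparison is done.

```cpp
#include <string>

// Hypothetical helper: interprets the <spoof> command-line argument.
// Only the exact strings "1" and "true" enable spoofing (assumed to be
// case-sensitive); anything else disables it.
bool parseSpoofFlag(const std::string& arg)
{
    return arg == "1" || arg == "true";
}
```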

Example run that spoofs beacon's thread call stack:

PS D:\dev2\ThreadStackSpoofer> .\x64\Release\ThreadStackSpoofer.exe .\tests\beacon64.bin 1
[.] Reading shellcode bytes...
[.] Hooking kernel32!Sleep...
[.] Injecting shellcode...
[+] Shellcode is now running.
[>] Original return address: 0x1926747bd51. Finishing call stack...

===> MySleep(5000)

[<] Restoring original return address...
[>] Original return address: 0x1926747bd51. Finishing call stack...

===> MySleep(5000)

[<] Restoring original return address...
[>] Original return address: 0x1926747bd51. Finishing call stack...

How do I use it?

Look at the code and its implementation, understand the concept and re-implement it within the Shellcode Loaders that you use to deliver your Red Team engagements. This is yet another technique for advanced in-memory evasion that increases your team's chances of not getting caught by Anti-Viruses, EDRs and Malware Analysts taking a look at your implants.

While developing your advanced shellcode loader, you might also want to implement:

  • Process Heap Encryption - take inspiration from this blog post: Hook Heaps and Live Free - which can let you evade Beacon configuration extractors like BeaconEye
  • Change your Beacon's memory page protection to RW (from RX/RWX) and encrypt the pages' contents - using the Shellcode Fluctuation technique - right before sleeping (that could evade scanners such as Moneta or pe-sieve)
  • Clear out any leftovers from the Reflective Loader to avoid signature-based in-memory detections
  • Unhook everything you might have hooked (such as AMSI, ETW, WLDP) before sleeping and then re-hook afterwards.

Actually this is not (yet) a true stack spoofing

As has been pointed out to me, the technique here does not yet truly live up to its name as a stack spoofer. Since we're merely overwriting return addresses on the thread's stack, we're not spoofing the remaining areas of the stack itself. Moreover, we leave our call stack impossible to unwind fully, which makes it look anomalous, since the system will not be able to properly walk the entire chain of call stack frames.

Although I'm aware of these shortcomings, for the moment I've left it as is, since I cared mostly about evading automated scanners that iterate over processes, enumerate their threads, walk those threads' stacks and pick up on any return address pointing back to non-image memory (such as MEM_PRIVATE - the kind allocated dynamically by VirtualAlloc and friends). A focused malware analyst would immediately spot the oddity and consider the thread rather unusual, hunting down our implant. I'm more than sure about that. Yet I don't believe that today's automated scanners such as AV/EDR implement heuristics that would actually walk each thread's stack to verify whether it is unwindable ¯\_(ツ)_/¯ .
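To make the scanner behaviour described above concrete, here is a toy, platform-independent model of it, written from the defender's point of view. It is only a sketch: Range, inAnyImage and firstNonImageReturn are hypothetical names, the "image ranges" stand in for the address ranges of loaded modules, and the list of return addresses is assumed to have been collected by a stack walker that stops at a zero return address.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical half-open address range [begin, end) of a loaded image.
struct Range
{
    std::uint64_t begin;
    std::uint64_t end;
};

// True when `addr` falls inside any of the known image ranges.
bool inAnyImage(std::uint64_t addr, const std::vector<Range>& images)
{
    for (const auto& r : images)
        if (addr >= r.begin && addr < r.end)
            return true;
    return false;
}

// Walks return addresses from the innermost frame outwards and returns
// the first one that lies outside every image range, or 0 when none is
// found. The walk ends at a zero return address, as a stack walker would.
std::uint64_t firstNonImageReturn(const std::vector<std::uint64_t>& returnAddrs,
                                  const std::vector<Range>& images)
{
    for (const auto addr : returnAddrs)
    {
        if (addr == 0)
            break;                      // call stack finished here
        if (!inAnyImage(addr, images))
            return addr;                // candidate for non-image memory
    }
    return 0;
}
```

In this model a walk that ends early never reaches the frames behind the zero, which is why a scanner would also need to check whether the stack unwinds to an expected entry point.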

Surely this project (and commercial implementation found in C2 frameworks) gives AV & EDR vendors arguments to consider implementing appropriate heuristics covering such a novel evasion technique.

In order to improve this technique, one can aim for a true Thread Stack Spoofer by inserting carefully crafted fake stack frames established in a reverse-unwinding process. Read more on this idea below.

Implementing a true Thread Stack Spoofer

An hours-long conversation with namazso taught me that, in order to aim for a proper thread stack spoofer, we would need to reverse the x64 call stack unwinding process. First, one needs to carefully study the stack unwinding process explained in (a) linked below. When the system traverses a thread's call stack on the x64 architecture, it does not simply rely on return addresses scattered around the thread's stack, but rather it:

  1. takes a return address
  2. attempts to identify the function containing that address (with RtlLookupFunctionEntry)
  3. That lookup returns a RUNTIME_FUNCTION entry, which leads to the UNWIND_INFO and UNWIND_CODE structures. These structures describe the function's beginning address, its ending address, and all the code sequences that modify RBP or RSP.
  4. The system needs to know about all the stack & frame pointer modifications that happened in each function across the call stack, so that it can virtually roll back these changes and restore the call stack pointers to their values at the moment the processed frame was called (this is implemented in RtlVirtualUnwind)
  5. The system processes all the UNWIND_CODEs that the examined function exhibits to precisely compute the location of that frame's return address and the stack pointer value.
  6. Through this emulation, the system is able to walk down the chain of call stack frames and effectively "unwind" the call stack.
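The steps above can be illustrated with a deliberately simplified model of unwinding a single frame. This is a toy and not the behaviour of RtlVirtualUnwind itself: the stack is a vector of 8-byte slots indexed from the current stack pointer upwards, only three operations loosely modelled on UWOP_ALLOC_SMALL, UWOP_PUSH_NONVOL (for RBP only) and UWOP_SET_FPREG (with a zero offset) are supported, and every name (Op, UnwindCode, Context, virtualUnwind) is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy unwind operations, loosely modelled on three real UWOP codes.
enum class Op { PushNonvol, Alloc, SetFpreg };

struct UnwindCode
{
    Op op;
    std::size_t slots;   // only used by Op::Alloc
};

// Toy register context: rsp is a slot index into the simulated stack.
struct Context
{
    std::size_t rsp;
    std::uint64_t rbp;
    std::uint64_t rip;
};

// Undoes one function's prologue and pops its return address.
// `codes` is given in unwind order (the reverse of prologue order).
// Returns false if the simulated stack would be read out of bounds.
bool virtualUnwind(const std::vector<std::uint64_t>& stack,
                   const std::vector<UnwindCode>& codes,
                   Context& ctx)
{
    for (const auto& code : codes)
    {
        switch (code.op)
        {
        case Op::Alloc:        // undo "sub rsp, N"
            ctx.rsp += code.slots;
            break;
        case Op::PushNonvol:   // undo "push rbp"
            if (ctx.rsp >= stack.size())
                return false;
            ctx.rbp = stack[ctx.rsp];
            ctx.rsp += 1;
            break;
        case Op::SetFpreg:     // undo "mov rbp, rsp" (rbp holds a slot index here)
            ctx.rsp = static_cast<std::size_t>(ctx.rbp);
            break;
        }
    }

    // With the prologue undone, rsp points at the return address.
    if (ctx.rsp >= stack.size())
        return false;
    ctx.rip = stack[ctx.rsp];
    ctx.rsp += 1;
    return true;
}
```

For a function whose prologue is "push rbp; sub rsp, 16", the unwind order is Alloc of 2 slots followed by PushNonvol, after which the next slot is read as the caller's return address.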

In order to interfere with this process we would need to invert it by having our own reversed form of RtlVirtualUnwind. We would need to iterate over the functions defined in a module (let it be kernel32), scan each function's UNWIND_CODE codes and closely emulate them backwards (as compared to RtlVirtualUnwind and, precisely, RtlpUnwindPrologue) in order to find the locations on the stack where our fake return addresses should be put.

namazso mentions the necessity of introducing 3 fake stack frames to nicely stitch the call stack:

  1. A "desync" frame (consider it a gadget frame) that unwinds differently from the caller of our MySleep (having a different UWOP - Unwind Operation code). We do this by looking through all the functions in a module, looking through their UWOPs and calculating how big the fake frame should be. This frame must have UWOPs different from those of our MySleep's caller.
  2. The next frame that we want to find is a function that unwinds by popping into RBP from the stack - basically through the UWOP_PUSH_NONVOL code.
  3. For the third frame we need a function that restores RSP from RBP through the UWOP_SET_FPREG code.

The restored RSP must be set with the RSP taken from wherever control flow entered into our MySleep so that all our frames become hidden, as a result of third gadget unwinding there.

In order to begin the process, one can iterate over an executable's .pdata by dereferencing the IMAGE_DIRECTORY_ENTRY_EXCEPTION data directory entry. Consider the example below:

    ULONG_PTR imageBase = (ULONG_PTR)GetModuleHandleA("kernel32");
    PIMAGE_NT_HEADERS64 pNthdrs = PIMAGE_NT_HEADERS64(imageBase + PIMAGE_DOS_HEADER(imageBase)->e_lfanew);

    auto excdir = pNthdrs->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXCEPTION];
    if (excdir.Size == 0 || excdir.VirtualAddress == 0)
        return;

    auto begin = PRUNTIME_FUNCTION(excdir.VirtualAddress + imageBase);
    auto end = PRUNTIME_FUNCTION(excdir.VirtualAddress + imageBase + excdir.Size);

    UNWIND_HISTORY_TABLE mshist = { 0 };
    DWORD64 imageBase2 = 0;

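    // RtlLookupFunctionEntry returns NULL when the address has no registered
    // unwind data (for example, code in dynamically allocated memory).
    PRUNTIME_FUNCTION currFrame = RtlLookupFunctionEntry(
        (DWORD64)caller,
        &imageBase2,
        &mshist
    );
    if (currFrame == NULL)
        return;

    // UnwindData is relative to the base of the module containing `caller`,
    // which the lookup stored in imageBase2.
    UNWIND_INFO *mySleep = (UNWIND_INFO*)(currFrame->UnwindData + imageBase2);
    UNWIND_CODE myFrameUwop = (UNWIND_CODE)(mySleep->UnwindCodes[0]);

    log("1. MySleep RIP UWOP: ", myFrameUwop.UnwindOpcode);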

    for (PRUNTIME_FUNCTION it = begin; it < end; ++it)
    {
        UNWIND_INFO* unwindData = (UNWIND_INFO*)(it->UnwindData + imageBase);
        UNWIND_CODE frameUwop = (UNWIND_CODE)(unwindData->UnwindCodes[0]);

        if (frameUwop.UnwindOpcode != myFrameUwop.UnwindOpcode)
        {
            // Found candidate function for a desynch gadget frame

        }
    }

The process is a bit convoluted, yet it boils down to reversing the thread's call stack unwinding process by substituting arbitrary stack frames with other, carefully selected ones, in a ROP-like approach.

This PoC does not replicate this algorithm, because my current understanding allows me to accept a call stack finishing on an EXE-based stack frame, and I don't want to overcomplicate either my shellcode loaders or this PoC. I leave the exercise of implementing this and sharing it publicly to a keen reader. Or maybe I'll sit down and have a try at doing this myself given some more spare time :)

More information:


Word of caution

If you plan on adding this functionality to your own shellcode loaders / toolings, be sure to AVOID unhooking kernel32.dll. An attempt to unhook kernel32 will restore the original Sleep functionality, preventing our callback from being called. If our callback is not called, the thread will be unable to spoof its own call stack by itself.

If that's what you want to have, then you might need to run another, watchdog thread that makes sure the Beacon's thread gets spoofed whenever it sleeps.

If you're using Cobalt Strike and the unhook-bof BOF by Raphael Mudge, be sure to check out my Pull Request that adds an optional parameter to the BOF specifying libraries that should not be unhooked.

This way you can maintain your hooks in kernel32:

beacon> unhook kernel32
[*] Running unhook.
    Will skip these modules: wmp.dll, kernel32.dll
[+] host called home, sent: 9475 bytes
[+] received output:
ntdll.dll            <.text>
Unhook is done.

Modified unhook-bof with option to ignore specified modules


Final remark

This PoC was designed to work with Cobalt Strike's Beacon shellcodes. The Beacon is known to call out to kernel32!Sleep to await further instructions from its C2. This loader leverages that fact by hooking Sleep in order to perform its housekeeping.

This implementation might not work with other shellcodes on the market (such as Meterpreter) if they don't use Sleep to cool down. Since this is merely a Proof of Concept showing the technique, I don't intend to add support for any other C2 framework.

Once you understand the concept, you'll surely be able to translate it to your shellcode's requirements and adapt the solution to your advantage.

Please do not open Github issues related to "this code doesn't work with XYZ shellcode", they'll be closed immediately.


☕ Show Support ☕

This and other projects are the outcome of sleepless nights and plenty of hard work. If you like what I do and appreciate that I always give back to the community, consider buying me a coffee (or better, a beer) just to say thank you! 💪


Author

   Mariusz Banach / mgeeky, 21
   <mb [at] binary-offensive.com>
   (https://github.com/mgeeky)
