EDRs & Shellcode Loaders
This post covers the basics of how EDR products work on Windows and techniques for getting around them (some source code included).
- Windows API Call Flow
- API Hooking & Unhooking
- Kernel Callbacks & User Land ETW
- Parent Child Process Relationships
- Static & Dynamic Analysis
- Execution Methods
- Red Teaming in the EDR Age
- Reversing & Bypassing EDRs
- Remaining Invisible in the Age of EDR
- A tale of EDR bypass methods
- Adventures in Dynamic Evasion
- Blinding EDR on Windows
Shellcode Loaders 101
Creating a basic shellcode loader requires three main steps: allocating memory, copying shellcode into that memory, and then executing the shellcode. To achieve these steps we can use Win32 APIs, which allow us to interact with the Windows OS.
Typically the shellcode we execute is an initial access payload that establishes a connection to our C2 infrastructure. Most C2 frameworks can generate shellcode for us, usually in the form of a .bin file, e.g. Cobalt Strike’s “Raw” payload. However, we can also use tools such as Donut to convert .NET assemblies into shellcode for use with our loaders.
Lastly, we can choose to build a loader which either executes shellcode within its own process space (shellcode execution) or forces a remote process to execute shellcode (shellcode/process injection). When performing injection it is ideal to inject into a process which normally produces network activity, such as svchost or a web browser. Injecting into a process such as notepad will likely result in the process being killed immediately, as notepad does not normally produce network traffic.
API Call Flow
Before diving into how defensive products work, it is important to understand what happens when we call Win32 APIs.
Windows separates memory into user space (aka user land), where applications run, and kernel space (aka kernel land), which is reserved for the kernel and drivers. Applications cannot directly access or modify memory in kernel space, and since the introduction of Kernel Patch Protection (PatchGuard) on 64-bit Windows, drivers can no longer patch critical kernel structures either.
In order to interact with the kernel, applications call “high-level” Win32 APIs, which are usually found in Kernel32.dll or Kernelbase.dll. These APIs then call “lower-level” Native APIs found in ntdll.dll. These functions are thin stubs that issue the actual system call, which is why they are commonly referred to as “syscalls”. They are largely undocumented by Microsoft, although unofficial documentation exists.
In the diagram below we can see that when the high-level API WriteFile, found in Kernel32.dll, is called, it calls the low-level API NtWriteFile in ntdll.dll, which then issues a system call into NtWriteFile in Ntoskrnl.exe (the kernel).
Going back to PatchGuard, it is important to note that it also applies to AV/EDR products: their drivers can no longer hook the kernel directly, which forces them to rely on events in user land to gather telemetry. The exception is kernel callbacks, which Microsoft provides as a supported way for AV/EDR vendors to retain some visibility into the kernel and which are discussed later.
AV & EDRs
EDRs are often considered “next-gen” antivirus products. They collect telemetry from the following sources, each of which is discussed below:
- API Hooking
- Kernel Callbacks & ETW
- Parent Child Process Relations
- Static & Dynamic Analysis
API Hooking & Unhooking
EDR products load a DLL into newly spawned processes and then “hook” APIs which are commonly abused by malware. The most common type of hook is a jmp instruction which overwrites the first few bytes of the API. This jmp instruction redirects the flow of the API call to the EDR’s DLL, which determines whether the call is safe and then either kills the process or allows it to continue.
Below we can see what this looks like in a debugger, and following the jmp instruction would lead us to the EDR’s DLL.
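As a rough illustration of what such a hook looks like at the byte level, the sketch below classifies the first bytes of an Nt* function. The byte patterns are assumptions about a typical x64 ntdll.dll syscall stub (mov r10, rcx followed by mov eax, <syscall number>) and about a jmp rel32 style hook; real EDR hooks may use other instruction sequences. The function names are hypothetical, and the code works on plain byte strings rather than live process memory.

```python
# Assumed byte patterns for a typical x64 ntdll.dll syscall stub and an inline hook.
CLEAN_STUB_PREFIX = bytes([0x4C, 0x8B, 0xD1, 0xB8])  # mov r10, rcx ; mov eax, imm32
JMP_REL32 = 0xE9  # opcode of jmp rel32


def classify_prologue(prologue: bytes) -> str:
    """Classify the first bytes of an Nt* function as 'clean', 'hooked', or 'unknown'."""
    if len(prologue) < 5:
        return "unknown"
    if prologue[0] == JMP_REL32:
        return "hooked"
    if prologue.startswith(CLEAN_STUB_PREFIX):
        return "clean"
    return "unknown"


def jmp_target(address: int, prologue: bytes) -> int:
    """Return the destination of a jmp rel32 instruction located at `address`."""
    if len(prologue) < 5 or prologue[0] != JMP_REL32:
        raise ValueError("prologue does not start with jmp rel32")
    # The 32-bit displacement is relative to the end of the 5-byte instruction.
    displacement = int.from_bytes(prologue[1:5], "little", signed=True)
    return address + 5 + displacement
```

Following the computed jmp target is what leads into the EDR’s DLL when stepping through a hooked function in a debugger.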
To defeat these API hooks we can either unhook the APIs or use custom syscalls.
EDRs typically place their hooks in memory when a process is created, so the copies of the DLLs stored on disk in C:\Windows\System32\ remain free of hooks. Therefore, we can read these DLLs from disk (specifically the .text section) and then overwrite the DLLs in our process to remove the hooks. One thing to be aware of is that the APIs used to unhook DLLs may also be hooked, so unhooking should be performed using custom syscalls. An unofficial list of APIs hooked by each EDR product can be found here.
An example of a program which unhooks ntdll.dll is shown below.
- A copy of ntdll.dll is mapped from disk into memory
- We locate the .text section in both the hooked and mapped copies
- We change the memory permissions of the hooked .text section to RWX, then write the clean .text section over the hooked .text section to remove the hooks, and finally restore the original memory permissions.
The above program unhooks ntdll.dll in a local process. If our goal is to unhook ntdll.dll in a remote process, only a simple change is needed. Because ntdll.dll is mapped at the same base address in every process of the same architecture for a given boot, the address of its .text section in our local process will be the same in a remote process. To unhook ntdll.dll in a remote process we only need to replace the first argument in the NtProtectVirtualMemory and NtWriteVirtualMemory API calls above with a handle to the remote process.
Currently the parameter is set to the handle variable “curproc” which is a handle to the local process via the GetCurrentProcess API call. We can replace this with a handle to a remote process by instead calling the NtOpenProcess API.
Instead of calling NT APIs (syscalls) found in ntdll.dll, we can create our own syscalls in assembly which mimic the stubs found in ntdll.dll and then call them directly. Because these syscalls are custom made, they do not pass through ntdll.dll when called and are not intercepted by the EDR product’s user land hooks. Essentially, the custom syscall transitions directly into the equivalent kernel routine in Ntoskrnl.exe without passing through kernel32.dll, ntdll.dll, or any other DLL.
One thing to be aware of is that not all syscall generation frameworks are created equal. Embedding syscalls leaves syscall instructions in your executable, which can be easily identified through basic static analysis. Additionally, the syscall instruction itself will be executed from outside of ntdll.dll, which is a red flag to any EDR product. An improved version of this technique still uses the assembly code provided by SysWhispers to set up the registers, but instead of executing the syscall instruction itself, it locates a syscall instruction within ntdll.dll and jumps to it. (This technique is included in the SysWhispers3 project linked below.)
Note: Most syscall generation frameworks produce functions with exactly the same names as the real ntdll.dll APIs. This is simply for ease of use, and when called they will not pass through ntdll.dll. Additionally, we should change these function names to evade any detection rules based on the presence of function names, e.g. changing NtAllocateVirtualMemory to NotNtAVM.
Kernel Callbacks
Although PatchGuard prevents EDR drivers from hooking the kernel directly, kernel callbacks allow EDR products to gather some telemetry from the kernel. Kernel callbacks are often discussed alongside the kernel land side of ETW (covered in the next section), although the two are separate mechanisms.
This is done by installing a driver, which can then request notifications for specific events and react accordingly. Some examples of events are below:
- Image Loads (PsSetLoadImageNotifyRoutine)
- Thread Creations (PsSetCreateThreadNotifyRoutine)
- Process Creations (PsSetCreateProcessNotifyRoutine)
One example of how an EDR product may use these callbacks is by registering a callback via PsSetCreateProcessNotifyRoutine so that the EDR driver is notified any time a process is created. When the driver receives a notification it can then tell the EDR application to inject its hooking DLL into the newly created process.
Bypassing kernel callbacks is beyond the scope of this post, but generally speaking it involves loading a vulnerable driver and using it to tamper with these callbacks.
Note: From a user land perspective, we can add time delays between API calls which may help to de-correlate kernel level telemetry. An example of this is demonstrated in the DripLoader project which introduces time delays between the NtAllocateVirtualMemory, NtWriteVirtualMemory, and NtProtectVirtualMemory API calls.
User Land ETW
Event Tracing for Windows (ETW) is described by Microsoft as “an efficient kernel-level tracing facility that lets you log kernel or application-defined events to a log file”.
ETW has components in both kernel land and user land. The kernel land providers sit alongside the kernel callbacks described earlier, while this section focuses on the user land component contained within ntdll.dll.
Essentially ETW is extremely verbose logging of everything an application does.
When monitoring a .NET shellcode runner with ETW, as shown below, we can see events for the allocation of memory (CreateSegment) and the creation of a thread (ThreadCreated).
Additionally, as shown by Adam Chester in this blog, ETW can also detect malicious .NET code by analyzing assembly names, namespaces, class names, and method names. All of this telemetry can be used by the EDR product in its decision making to identify malicious code.
Fortunately for us, user land ETW events are generated within ntdll.dll by the process itself via various API calls, and because we control the process in user land we can tamper with this telemetry.
The easiest way to tamper with this form of telemetry is by patching ETW APIs with a “ret” instruction, so that any time they are called they return immediately to the caller instead of generating an event.
Below is an example of a program which will patch the EtwEventWrite API found in ntdll.dll in a local x64 process.
- We obtain a handle to the current process via the GetCurrentProcess API
- We find the address of the EtwEventWrite API via the GetProcAddress API
- We change the memory permissions of the EtwEventWrite API to RW via the NtProtectVirtualMemory API
- We overwrite the first few bytes of the EtwEventWrite API with the hex opcode stored in the patch variable via the NtWriteVirtualMemory API. The hex opcode will zero the rax register via xor rax, rax and then follow it with a ret instruction.
- Lastly, we change the memory permissions of the EtwEventWrite API back to RX.
Before patching EtwEventWrite:
After patching EtwEventWrite:
Because ntdll.dll is mapped at the same base address in every process of the same architecture for a given boot, the EtwEventWrite API will be found at the same address in a local process and a remote process. To patch EtwEventWrite in a remote process we can reuse the GetProcAddress API call to locate the memory address of EtwEventWrite, and then replace the first argument in the NtProtectVirtualMemory and NtWriteVirtualMemory API calls above with a handle to the remote process.
This can be done by calling the NtOpenProcess API instead of GetCurrentProcess, which will return a handle to a remote process instead of the local process.
Below is an implementation of this and an example of a program which will patch the EtwEventWrite API in a remote x64 process.
Before patching EtwEventWrite in a remote process:
After patching EtwEventWrite in a remote process:
Parent Child Process Relationships
Parent and child process relationships can be analyzed to detect potentially malicious behavior. For example, Word or Excel trying to spawn a PowerShell process would likely be blocked even if no malicious activity was being performed.
Additionally, if we injected Cobalt Strike shellcode into svchost, and then began trying to execute shell commands from our beacon, command prompt would spawn as a child of svchost which would be suspicious.
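To make the defender’s view of this concrete, here is a minimal sketch of a parent/child rule check. The rule table and the function name are hypothetical and only mirror the examples in the text; real EDR products use far richer context (command lines, users, signatures) than a simple lookup.

```python
# Hypothetical rule table: parents that rarely have a legitimate reason
# to spawn a command interpreter, based on the examples in the text.
SUSPICIOUS_CHILDREN = {
    "winword.exe": {"powershell.exe", "cmd.exe"},
    "excel.exe": {"powershell.exe", "cmd.exe"},
    "svchost.exe": {"powershell.exe", "cmd.exe"},
}


def is_suspicious(parent: str, child: str) -> bool:
    """Return True if this parent/child image name pair matches a rule."""
    blocked = SUSPICIOUS_CHILDREN.get(parent.lower(), set())
    return child.lower() in blocked


for parent, child in [("WINWORD.EXE", "powershell.exe"), ("explorer.exe", "cmd.exe")]:
    verdict = "suspicious" if is_suspicious(parent, child) else "ok"
    print(f"{parent} -> {child}: {verdict}")
```

A lookup like this only sees the parent recorded for the process, which is why the recorded parent matters so much to detection.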
Fortunately, PPID spoofing exists and when creating a process we can specify which parent process it should spawn as a child of (this is also what Cobalt Strike’s “ppid” command does under the hood).
Below is an example of a program which will spawn a wermgr process as a child of chrome in a suspended state.
For more information on PPID spoofing I recommend checking out ired.team’s blog.
Static & Dynamic Analysis
At a high level, most antivirus and EDR products make use of some form of static or dynamic analysis to try to identify whether a program is malicious.
Static analysis involves analyzing files for known bad checksums, strings, byte sequences, etc.
We can evade static analysis by changing function names, removing comments and ASCII art, and ensuring any shellcode is encrypted. One thing to note is that encrypting shellcode will increase the entropy of our loader. If the entropy of our loader, measured on a scale of 0 to 8 bits per byte, exceeds some threshold set by the EDR product (usually around 6.7–7.0), it may be blocked from executing.
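For reference, the entropy figure mentioned above is normally the Shannon entropy of the file’s bytes. The sketch below is a minimal implementation of that formula; the function name is my own, and how a given EDR product actually measures entropy (whole file, per section, etc.) is an assumption.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum -p * log2(p) over every byte value that occurs in the data.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(shannon_entropy(b"A" * 64))          # a single repeated byte carries no information
print(shannon_entropy(bytes(range(256))))  # every byte value equally likely
```

Encrypted or compressed data sits close to the upper bound of 8, while plain code and text score noticeably lower, which is why a large encrypted blob raises the overall figure.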
Dynamic analysis may involve executing a program in a sandbox to monitor the actions it performs, such as modifying a registry key or creating a connection to a remote server. Sandbox environments usually cannot fully emulate a real environment, though, so we can implement logic checks to determine whether our program is being run in a sandbox and, if so, exit or perform non-malicious activity.
Some checks we can perform are:
- Is the system domain joined?
- Does the system have less than 4 GB of RAM?
- Does the system have fewer than 2 processors?
- Is the time zone set to UTC?
Sample code snippets on how to implement these checks can be found here.
Additionally, if the loader is written in a language that integrates with Microsoft’s Antimalware Scan Interface (AMSI), such as PowerShell, VBScript, JScript, VBA, or .NET (Framework 4.8 and later), AMSI can also be used to detect malicious activity in memory.
When such a program is run, amsi.dll is loaded into the process, and the AmsiScanBuffer and AmsiScanString functions are used to detect malware.
Similarly to how we patched EtwEventWrite, the AMSI functions can also be patched as amsi.dll is loaded into our user land process which we control.
Spoofing Code Signing Certificates
Some EDR products do not check whether a code signing certificate is valid, or they allowlist certain vendor names, so we can apply a spoofed certificate to our loader to make it appear more legitimate.
Note: While some EDR products do not validate code signing certificates, others do. In that case a spoofed code signing certificate can draw more scrutiny from the EDR product, especially if it imitates the EDR vendor’s own signature, e.g. applying a spoofed Microsoft certificate when the EDR is Defender ATP.
Execution Methods
Generally speaking, the setup before execution tends to be the same for most shellcode loaders:
- Anti-sandbox checks are run
- DLLs are unhooked
- ETW and/or AMSI are patched
- Memory is allocated via NtAllocateVirtualMemory with RW permissions or a DLL is loaded for module stomping (overwriting a DLL).
- Shellcode is written into memory via NtWriteVirtualMemory
- Memory permissions where the shellcode resides are changed to RX via NtProtectVirtualMemory
At this point everything is set up and the only thing which must take place is the execution of the shellcode. The execution method that the shellcode loader will use is entirely up to the creator, however, I have had success with the following methods listed from most to least successful.
- Module Stomping & Thread Creation - This involves loading a DLL into a process, overwriting it with shellcode, and then creating a thread via NtCreateThreadEx so that the thread appears to be backed by a file on disk.
- Thread Hijacking - This involves suspending a thread via NtSuspendThread, modifying its RIP to point to the shellcode via Nt[Get/Set]ContextThread, and then resuming the thread via NtResumeThread.
- Windows Callback Functions - I genuinely have no idea what a callback function is or how they work, however, tons of them exist and how to utilize them to execute shellcode is documented here.
- Asynchronous Procedure Calls (APCs) - This involves queuing an APC to a thread via NtQueueApcThread and then when the thread enters into an alertable state (if it is not already) it will execute the APC. In a local process we can force threads into an alertable state via the NtTestAlert API.
If you would like to see the source code for some of these execution methods, I will refer you to the following projects:
- TymSpecial - Created by myself, written in C++ & wrapped in Python
- ScareCrow - Created by Matt Eidelberg, written in Go
- TikiTorch - Created by RastaMouse, written in C#
Hopefully you have gained something from this article and now have a better understanding of how EDR products work and some techniques to get around them.