Loaders & Bypassing Windows EDRs

12 min read · Mar 30, 2022

In this post I cover the basics of how EDR products work on Windows and techniques to get around them (some source code included).

Topics Covered

Windows API Call Flow
API Hooking & Unhooking
Syscalls
Kernel Callbacks & User Land ETW
Parent Child Process Relationships
Static & Dynamic Analysis
Execution Methods

Credits

Shellcode Loaders 101

Creating a basic shellcode loader requires three main steps: allocating memory, moving shellcode into that memory, and then executing the shellcode. To achieve these steps we can use Win32 APIs, which allow us to interact with the Windows OS.

Typically the shellcode we execute is used for initial access, establishing a connection to our C2 infrastructure. Most C2 frameworks can generate shellcode for us, usually in the form of a .bin file, e.g. Cobalt Strike’s “Raw” payload. However, we can also use tools such as Donut to convert .NET assemblies into shellcode for use with our loaders.

Lastly, we can choose to build a loader which either executes shellcode within its own process space (shellcode execution) or forces a remote process to execute shellcode (shellcode/process injection). When performing injection it is ideal to inject into a process which normally produces network activity, such as svchost or a web browser. Injecting into a process such as notepad will likely draw attention or get the process killed, as notepad does not normally produce network traffic.

API Call Flow

Before diving into how defensive products work it is important to understand what happens when we call Win32 APIs.

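Applications on Windows run in user space (aka user land) and cannot directly access or modify memory in kernel space (aka kernel land), which is reserved for the kernel and drivers. Since the introduction of Windows PatchGuard (Kernel Patch Protection) on 64-bit Windows, third-party drivers are additionally prevented from patching core kernel structures such as the system call table.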

In order to interact with the kernel, applications call “high-level” Win32 APIs which will usually be found in Kernel32.dll or Kernelbase.dll. These APIs then call “lower-level” Native API functions found in ntdll.dll. These functions are often loosely called “syscalls” because they are thin stubs that issue the syscall instruction. Most of them are not officially documented by Microsoft, although unofficial documentation exists.

For example, when the high-level API WriteFile in Kernel32.dll is called, it calls the low-level API NtWriteFile in ntdll.dll, which then transitions into the kernel, where NtWriteFile in Ntoskrnl.exe does the actual work.

Going back to the previously mentioned PatchGuard, it is important to note that it also applies to AV/EDR products: their drivers cannot hook the kernel directly, so they rely largely on events in user land to gather telemetry. The exception is kernel callbacks, which Microsoft created as a compromise for limiting AV/EDR vendors' visibility into the kernel and which will be discussed later.

AV & EDRs

EDRs are considered to be “next-gen” antivirus products. They collect telemetry from the following sources, each of which is discussed below:

API Hooking
Kernel Callbacks & ETW
Parent Child Process Relations
Static & Dynamic Analysis

API Hooking

EDR products will load a DLL into spawned processes and then “hook” APIs which are commonly abused by malware. The most common type of “hook” is a jmp instruction which overwrites the first few bytes of the API. This jmp instruction redirects the flow of the API call to the EDR where it determines if the API call is safe or not and chooses to either kill the process or allow it to continue.

Below we can see what this looks like in a debugger, and following the jmp instruction would lead us to the EDR’s DLL.
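To make the hook pattern concrete, here is a minimal Python sketch that classifies the first bytes of a function as a likely inline hook or a clean x64 syscall stub. The helper name classify_prologue and the returned dictionary layout are assumptions made for illustration, and the byte patterns cover only the most common cases (jmp rel32, jmp through a pointer, and the usual `mov r10, rcx; mov eax, imm32` stub start). Real EDR hooks may use other instruction sequences.

```python
import struct

# Unhooked x64 ntdll syscall stubs normally begin with:
#   4C 8B D1    mov r10, rcx
#   B8 imm32    mov eax, <syscall number>
CLEAN_STUB_PREFIX = bytes([0x4C, 0x8B, 0xD1, 0xB8])


def classify_prologue(prologue: bytes, address: int = 0) -> dict:
    """Classify the first bytes of an exported function (hypothetical helper)."""
    if len(prologue) >= 5 and prologue[0] == 0xE9:
        # E9 rel32: relative jump; target = address of next instruction + offset
        (rel32,) = struct.unpack_from("<i", prologue, 1)
        return {"hooked": True, "kind": "jmp rel32", "target": address + 5 + rel32}
    if len(prologue) >= 6 and prologue[:2] == b"\xFF\x25":
        # FF 25 disp32: indirect jump through a RIP-relative pointer
        return {"hooked": True, "kind": "jmp [rip+disp32]", "target": None}
    if prologue.startswith(CLEAN_STUB_PREFIX):
        return {"hooked": False, "kind": "clean syscall stub", "target": None}
    # Anything else is left undecided rather than guessed
    return {"hooked": None, "kind": "unknown", "target": None}
```

The function works on a plain byte buffer, so the same check can be run against bytes read from a debugger, a memory dump, or the on-disk DLL for comparison.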

To defeat these API hooks we can either unhook the APIs or use custom syscalls.

API Unhooking

EDRs typically apply their hooks in memory when a process is created, so the copies of DLLs stored on disk in C:\Windows\System32\ remain free of hooks. Therefore, we can read these DLLs from disk (specifically the .text section) and then overwrite the DLLs in our process to remove the hooks. One thing to be aware of is that the APIs used to unhook DLLs may themselves be hooked, so unhooking should be performed using custom syscalls. An unofficial list of APIs hooked by each EDR product can be found here.

An example of a program which unhooks ntdll.dll is shown below.

A copy of ntdll.dll is mapped from disk into memory
We locate the .text section in both the hooked and mapped copies
We change the memory permissions of the hooked .text section to RWX, then write the clean .text section over the hooked .text section to remove the hooks, and finally restore the original memory permissions.
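The "locate the .text section" step relies only on the documented PE header layout. As a small illustration, the Python sketch below parses a PE image held in a byte buffer and returns where a named section lives. The function name find_section and the order of the returned tuple are assumptions made for this example. It is not taken from the program described above, and it only reads headers.

```python
import struct


def find_section(pe: bytes, name: bytes = b".text"):
    """Return (virtual_address, virtual_size, raw_offset, raw_size) or None."""
    if pe[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    coff = e_lfanew + 4  # start of the 20-byte COFF file header
    (num_sections,) = struct.unpack_from("<H", pe, coff + 2)
    (opt_size,) = struct.unpack_from("<H", pe, coff + 16)
    table = coff + 20 + opt_size  # section table follows the optional header
    for i in range(num_sections):
        off = table + i * 40  # each section header is 40 bytes
        sec_name = pe[off:off + 8].rstrip(b"\x00")
        vsize, vaddr, raw_size, raw_off = struct.unpack_from("<IIII", pe, off + 8)
        if sec_name == name:
            return vaddr, vsize, raw_off, raw_size
    return None
```

Running this against both the on-disk file and an in-memory copy gives the two .text ranges that can then be compared byte for byte.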

Before unhooking:

After unhooking:

The above program unhooks ntdll.dll in a local process. If our goal is to unhook ntdll.dll in a remote process, only a simple change is needed. Because ntdll.dll is mapped at the same base address in every process of the same architecture for a given boot (ASLR randomizes it per boot, not per process), the .text section address of ntdll.dll in our local process will be the same as in a remote process. To unhook ntdll.dll in a remote process we only need to replace the first argument in the NtProtectVirtualMemory and NtWriteVirtualMemory API calls above with a handle to the remote process.

Currently the parameter is set to the handle variable “curproc” which is a handle to the local process via the GetCurrentProcess API call. We can replace this with a handle to a remote process by instead calling the NtOpenProcess API.

Syscalls

Instead of calling NT APIs (syscalls) found in ntdll.dll, we can create our own syscalls using assembly which mimic the APIs found in ntdll.dll and then call them directly. Because these syscalls are custom made they will not pass through ntdll.dll when called and will not be intercepted by the EDR's user land hooks. Essentially the custom syscall will directly call the kernel syscall equivalent API in Ntoskrnl.exe and will not pass through kernel32.dll, ntdll.dll, or any other DLL for that matter.

One thing to be aware of is that not all syscall generation frameworks are created equal. Using embedded syscalls will result in your executable containing syscall instructions, which can be easily identified through basic static analysis. Additionally, the syscall instruction itself will be executed from outside of ntdll.dll, which is a red flag to any EDR product. An improved version of this technique still uses the assembly code provided by SysWhispers to set up the registers, but instead of executing its own syscall instruction it locates a syscall instruction within ntdll.dll and jumps to it. (This technique is included in the SysWhispers3 project linked below).

Corresponding assembly code for embedded syscall

Syscall instructions identified from static analysis

Corresponding assembly code of jumping to a syscall within ntdll
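The static analysis point can be illustrated with a few lines of Python. On x64 the syscall instruction encodes as the two bytes 0F 05, so a scanner only has to search a code section for that pair. The function name find_syscall_offsets is an assumption for this sketch, and plain byte matching like this can produce false positives when 0F 05 happens to appear inside data or an immediate operand.

```python
def find_syscall_offsets(code: bytes) -> list:
    """Return the offset of every 0F 05 (x64 `syscall`) byte pair in the buffer."""
    offsets = []
    pos = code.find(b"\x0F\x05")
    while pos != -1:
        offsets.append(pos)
        pos = code.find(b"\x0F\x05", pos + 1)
    return offsets
```

A non-empty result for a binary that is not ntdll.dll is the kind of cheap indicator that embedded syscalls leave behind.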

SysWhispers is a project which can generate assembly code for most NT APIs and is compatible with C++ loaders. Alternatively, D/Invoke is available for C# based loaders.

Note: Most syscall generation frameworks will produce functions with the exact same names as ntdll.dll APIs. This is simply for ease of use, and when called they will not pass through ntdll.dll. We should change these function names to evade any static detection rules based on the presence of these strings, e.g. changing NtAllocateVirtualMemory to NotNtAVM. For this you can leverage a crap Python script I wrote here.

Kernel Callbacks

Although the previously mentioned PatchGuard prevents EDR drivers from hooking the kernel directly, kernel callbacks allow EDR products to gather some telemetry from the kernel. Kernel-level telemetry is also available through ETW, which will be discussed in the next section.

The EDR installs a driver that registers callback routines. The kernel then invokes them when specific events occur, so the EDR can react accordingly. Some examples of events are below:

Image Loads (PsSetLoadImageNotifyRoutine)
Thread Creations (PsSetCreateThreadNotifyRoutine)
Process Creations (PsSetCreateProcessNotifyRoutine)

One example of how an EDR product may use these callbacks is by registering a callback with PsSetCreateProcessNotifyRoutine so that the driver is notified any time a process is created. When the driver receives a notification it can then tell the EDR application to inject its hooking DLL into the newly created process.

Bypassing kernel callbacks is beyond the scope of this post, but generally speaking it involves loading a vulnerable driver and then using it to tamper with these callbacks. EDRSandblast is a project which is dedicated to this form of tampering.

If you are up against an EDR which makes heavy use of kernel telemetry, such as Microsoft Defender ATP (now Microsoft Defender for Endpoint), the following may help:

From a user land perspective, we can add time delays between API calls which may help to de-correlate kernel level telemetry. An example of this is demonstrated in the DripLoader project which introduces time delays between the NtAllocateVirtualMemory, NtWriteVirtualMemory, and NtProtectVirtualMemory API calls.
Implement thread stack spoofing into your loader. An example of this technique can be found here.

ETW

Event Tracing for Windows (ETW) is described by Microsoft as “an efficient kernel-level tracing facility that lets you log kernel or application-defined events to a log file”.

ETW has components in both kernel land and user land. Kernel callbacks were covered in the previous section, and this section focuses on the user land component contained within ntdll.dll.

Essentially ETW is extremely verbose logging of everything an application does.

When we monitor a .NET shellcode runner with ETW, we can see events for the allocation of memory (CreateSegment) and the creation of a thread (ThreadCreated), as shown below.

ETW generating an event for allocation of memory (CreateSegment)

ETW generating an event for the creation of a thread (ThreadCreated)

Additionally, as shown by Adam Chester in this blog, ETW can also detect malicious .NET code by analyzing assembly names, namespaces, class names, and method names. All of this telemetry can be used by the EDR product in its decision making to identify malicious code.

Patching ETW

Fortunately for us, user land ETW events are generated by the process itself, through API calls that run inside ntdll.dll. Because we control the process in user land, we can tamper with this telemetry.

The easiest way to tamper with this form of telemetry is by patching ETW APIs with a “ret” instruction, so that anytime they are called instead of generating an event the program returns to the next instruction.

Below is an example of a program which will patch the EtwEventWrite API found in ntdll.dll in a local x64 process.

We obtain a handle to the current process via the GetCurrentProcess API
We find the address of the EtwEventWrite API via the GetProcAddress API
We change the memory permissions of the EtwEventWrite API to RW via the NtProtectVirtualMemory API
We overwrite the first few bytes of the EtwEventWrite API with the hex opcode stored in the patch variable via the NtWriteVirtualMemory API. The hex opcode zeroes the rax register via xor rax, rax and follows it with a ret instruction, so the function returns 0 (success) immediately without generating an event.
Lastly, we change the memory permissions of the EtwEventWrite API back to RX.

Code to patch EtwEventWrite in a local process

Before patching EtwEventWrite:

After patching EtwEventWrite:

Because ntdll.dll is mapped at the same base address in every process of the same architecture for a given boot, the EtwEventWrite API will be found at the same address in a local process and a remote process. To patch EtwEventWrite in a remote process we can reuse the GetProcAddress API call to locate the memory address of EtwEventWrite, and then replace the first argument in the NtProtectVirtualMemory and NtWriteVirtualMemory API calls above with a handle to a remote process.

This can be done by calling the NtOpenProcess API instead of GetCurrentProcess, which will return a handle to a remote process instead of the local process.

Below is an example of a program which implements this and patches the EtwEventWrite API in a remote x64 process.

Code to patch EtwEventWrite in a remote process

Parent Child Process Relationships

Parent and child process relationships can be analyzed to potentially detect malicious behavior. For example, Word or Excel trying to spawn a PowerShell process would likely be blocked even if no malicious activity was being performed.

Additionally, if we injected Cobalt Strike shellcode into svchost, and then began trying to execute shell commands from our beacon, command prompt would spawn as a child of svchost which would be suspicious.
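From the defender's side, this kind of check can be as simple as a lookup table of parent and child image names. The sketch below is purely illustrative: the SUSPICIOUS_CHILDREN table and the is_suspicious function are made up for this example and do not reflect any vendor's actual rules, which typically also weigh command lines, users, and signer information.

```python
# Hypothetical rule table: parent image name -> child image names worth flagging
SUSPICIOUS_CHILDREN = {
    "winword.exe": {"powershell.exe", "cmd.exe", "wscript.exe"},
    "excel.exe": {"powershell.exe", "cmd.exe", "wscript.exe"},
    "svchost.exe": {"cmd.exe", "powershell.exe"},
}


def is_suspicious(parent: str, child: str) -> bool:
    """Return True if the parent/child pair matches the illustrative rule table."""
    # Windows image names are case-insensitive, so normalize before the lookup
    return child.lower() in SUSPICIOUS_CHILDREN.get(parent.lower(), set())
```

Under these example rules, is_suspicious("WINWORD.EXE", "powershell.exe") is flagged while a pairing such as explorer.exe spawning notepad.exe is not.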

Fortunately, PPID spoofing exists and when creating a process we can specify which parent process it should spawn as a child of (this is also what Cobalt Strike’s “ppid” command does under the hood).

Below is an example of a program which will spawn a wermgr process as a child of chrome in a suspended state.

Code to spawn a suspended wermgr process as a child of chrome

wermgr spawned in a suspended state as a child of chrome

For more information on PPID spoofing I recommend checking out ired.team’s blog.

Static Analysis

At a high level most antivirus and EDR products will make use of some form of static or dynamic analysis to try to identify if a program is malicious.

Static analysis involves analyzing files for known bad checksums (hashes), strings, byte sequences, etc.

We can evade static analysis by changing function names, removing comments and ASCII art, and ensuring any shellcode is encrypted. One thing to note is that encrypting shellcode will increase the entropy of our loader. If the entropy of our loader exceeds some threshold set by the EDR product (usually around 6.7–7.0 on a scale of 0 to 8 bits per byte) it may be blocked from executing.
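For reference, the entropy figure in question is the Shannon entropy of the file's bytes, measured in bits per byte (0 for a single repeated byte, 8 for perfectly uniform data). The following minimal Python sketch shows the calculation. The function name shannon_entropy is my own, and actual products may compute entropy per section or over sliding windows rather than over the whole file.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Plain text and ordinary compiled code usually score noticeably lower than compressed or encrypted data, which is why a large encrypted blob raises the overall score of a file.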

Dynamic Analysis

Dynamic analysis may involve executing a program in a sandbox to monitor the actions it performs, such as modifying a registry key or creating a connection to a remote server. Sandbox environments usually cannot fully emulate a real environment though, so we can implement logic checks to determine if our program is being run in a sandbox and, if so, exit or perform only non-malicious activity.

Some checks we can perform are:

Is the system domain joined?
Does the system have less than 4GB of RAM?
Does the system have fewer than 2 processors?
Is the time zone set to UTC?

Sample code snippets on how to implement these checks can be found here.

Additionally, if the loader is written in any of the following languages Microsoft’s Antimalware Scan Interface (AMSI) can also be utilized to detect malicious activity in memory.

PowerShell
JScript
VBScript
C#
VBA

When a program is run, amsi.dll is loaded into the process and uses the AmsiScanBuffer and AmsiScanString functions to detect malware.

Similar to how we patched EtwEventWrite, the AMSI functions can also be patched, as amsi.dll is loaded into our user land process which we control.

Rasta Mouse created a C# patch which can be found here, and for PowerShell we can leverage the amsi.fail project.

Generally, if your code is being detected by AMSI it is better to alter your code than to patch AMSI, as AMSI patching is highly signatured and almost guaranteed to generate an alert.

Spoofing Code Signing Certificates

Some EDR products do not check whether a code signing certificate is valid, and some whitelist binaries based on the vendor name in the certificate, so we can apply a spoofed certificate to our loader to make it appear more legitimate.

Some tools we can use to perform this are CarbonCopy, SigThief, and LimeLighter.

Note: While some EDR products do not validate code signing certificates, others do. In that case a spoofed code signing certificate can draw more scrutiny from the EDR product, especially if it imitates the signature of the EDR vendor itself, e.g. applying a spoofed Microsoft certificate when the EDR is Defender ATP.