How to debug corruption in the managed heap

Recently, I ran into a managed heap corruption, which was new to me. It was very frustrating, and I had to learn a lot to be able to debug it. I want to thank Seva Titov, who pointed me in the right direction; his answer was concise and very helpful. I am logging the steps I took to debug the problem for my own reference, and they will probably be helpful for others who are new to this.

Debug Heap Corruption in .NET 4:

How to recognize heap corruption?

Briefly:

  1. The application crashes randomly regardless of the exception handling in place, and the crash even gets past catch-all blocks like catch(Exception), which are supposed to catch all exceptions.

  2. Examining the CLR stack in the application crash dumps shows the garbage collector on the top of the stack:

    000000001dabd8c8 000007feea129a1d [HelperMethodFrame: 000000001dabd8c8]
    000000001dabda00 000007fee90cfce8 System.Text.StringBuilder.ExpandByABlock(Int32)
    000000001dabda40 000007fee90cfba4 System.Text.StringBuilder.Append(Char*, Int32)
    ...
    
    EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
    ExceptionAddress: 000007feea129a1d (clr!WKS::gc_heap::find_first_object+0x0000000000000092)
       ExceptionCode: c0000005 (Access violation)
      ExceptionFlags: 00000000
    NumberParameters: 2
       Parameter[0]: 0000000000000000
       Parameter[1]: 0000000000003d80
    ...
    
  3. The CLR stack shows a different crash location each time, or the code it points to is clearly irrelevant, like the StringBuilder method shown above as the apparent cause of the exception.

For more details refer to .NET Crash: Managed Heap Corruption calling unmanaged code.

Work through the steps below in order. Move on to the next step only if the previous one doesn’t help.

Step 1. Check the code.

Check the code for unsafe or native code usages:

  1. Review the code for unsafe blocks and DllImport declarations.
  2. Download .NET Reflector and use it to analyze the application assemblies for PInvoke. In the same way, analyze the third-party assemblies which are used by the application.
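
As a rough complement to Reflector, the P/Invoke declarations in an assembly can also be listed with reflection. The sketch below is only an illustration and not part of the procedure I followed: the PInvokeScanner class name is hypothetical, and the assembly path is assumed to arrive as the first command-line argument.

```csharp
using System;
using System.Reflection;

static class PInvokeScanner
{
    static int Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.Error.WriteLine("Usage: PInvokeScanner <assembly path>");
            return 1;
        }

        Assembly assembly = Assembly.LoadFrom(args[0]);

        Type[] types;
        try
        {
            types = assembly.GetTypes();
        }
        catch (ReflectionTypeLoadException ex)
        {
            // Some types could not be loaded (for example, a missing dependency).
            // The array still holds the types that did load; the rest are null.
            types = ex.Types;
        }

        const BindingFlags all = BindingFlags.Public | BindingFlags.NonPublic |
                                 BindingFlags.Static | BindingFlags.Instance |
                                 BindingFlags.DeclaredOnly;

        foreach (Type type in types)
        {
            if (type == null) continue;

            foreach (MethodInfo method in type.GetMethods(all))
            {
                // Methods declared with DllImport carry the PinvokeImpl flag.
                if ((method.Attributes & MethodAttributes.PinvokeImpl) != 0)
                {
                    Console.WriteLine(type.FullName + "." + method.Name);
                }
            }
        }
        return 0;
    }
}
```

This only finds static P/Invoke declarations. It does not detect unsafe blocks or native calls made through other interop mechanisms, so it does not replace a manual review.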

If unsafe or native code usage is found, pay extra attention to it. The most common cause of heap corruption in such cases is a buffer overflow or an argument type mismatch. Ensure that any buffer supplied to the native code to fill is big enough and that all arguments passed to the native code are of the expected type.
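
To illustrate the buffer-size point, here is a minimal sketch of a P/Invoke call that tells the native side exactly how large the buffer is and retries if it was too small. It uses the Win32 GetWindowsDirectory function purely as an example; the NativeMethods and BufferSizeExample names are hypothetical, and the initial capacity of 260 is an arbitrary starting guess.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;

static class NativeMethods
{
    // The native function writes at most uSize characters into lpBuffer.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    internal static extern uint GetWindowsDirectory(StringBuilder lpBuffer, uint uSize);
}

static class BufferSizeExample
{
    static void Main()
    {
        var buffer = new StringBuilder(260);

        // Pass the real capacity so the native code never writes past the buffer.
        uint length = NativeMethods.GetWindowsDirectory(buffer, (uint)buffer.Capacity);
        if (length == 0)
        {
            // Picks up the error code saved by SetLastError = true.
            throw new Win32Exception();
        }

        if (length > buffer.Capacity)
        {
            // The buffer was too small: the return value is the required size.
            buffer = new StringBuilder((int)length);
            length = NativeMethods.GetWindowsDirectory(buffer, (uint)buffer.Capacity);
            if (length == 0)
            {
                throw new Win32Exception();
            }
        }

        Console.WriteLine(buffer.ToString());
    }
}
```

The risky variant is the one where the size argument and the actual buffer capacity disagree, or where a parameter is declared as int while the native function expects a pointer-sized value. Either mistake can overwrite managed memory silently and surface later as a crash inside the GC.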

Step 2. Check if this corrupted state exception can be caught.

To handle such exceptions, you need to decorate the method that contains the catch(Exception) statement with the [HandleProcessCorruptedStateExceptions] attribute, or apply the following in the app.config file:

<configuration>
    <runtime>
        <legacyCorruptedStateExceptionsPolicy enabled="true" />
    </runtime>
</configuration>

If the exception is caught successfully, you can log and examine it. It also means that the problem is not a corrupted heap.
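
The attribute approach can be sketched as follows. This is a minimal example under assumptions: RunApplication is a hypothetical placeholder for the real entry point, and the handler only logs the exception and then terminates, since continuing after a corrupted state exception is not safe.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Security;

static class Program
{
    // Without this attribute, .NET 4 does not deliver corrupted state
    // exceptions (such as access violations) to managed catch blocks.
    [HandleProcessCorruptedStateExceptions]
    [SecurityCritical]
    static void Main()
    {
        try
        {
            RunApplication();
        }
        catch (Exception ex)
        {
            // Log as much as possible, then stop the process.
            Console.Error.WriteLine(ex);
            Environment.FailFast("Corrupted state exception caught", ex);
        }
    }

    // Hypothetical placeholder for the application's real work.
    static void RunApplication()
    {
    }
}
```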

Exceptions caused by a corrupted heap cannot be handled at all: HandleProcessCorruptedStateExceptions doesn’t seem to work for them.

For more information on corrupted state exceptions, see All about Corrupted State Exceptions in .NET4.

Step 3. Live debugging.

In this step, we debug the crashing application live in the production environment (or where we can reproduce the crash).

Download Debugging Tools for Windows as part of the Microsoft Windows SDK for Windows 7 and .NET Framework 4 (the download is a web installer that lets you select which components to install; mark all components). It installs both the 32-bit and, if your system is x64, the 64-bit versions of the required debugging tools.

For this step you need to know how to attach WinDbg to a live process, how to take crash dumps and examine them, and how to load the SOS extension in WinDbg (search online for details).

Enable debugging helpers:

  1. Launch Application Verifier (C:\Program Files\Application Verifier – use the required edition, either x86 or x64, depending on your executable compilation mode), add your executable there in the left pane and in the right pane check one node “Basics / Heaps”. Save the changes.

  2. Launch the Global Flags helper (C:\Program Files\Debugging Tools for Windows\gflags.exe; again, select the correct edition, x86 or x64). Once Global Flags is started, go to the “Image File” tab and, in the text box at the top, enter the name of your executable file without any path (for example, “MyProgram.exe”). Then press the Tab key and check the following boxes:

    • Enable heap tail checking
    • Enable heap free checking
    • Enable heap parameter checking
    • Enable heap validation on call
    • Disable heap coalesce on free
    • Enable page heap
    • Enable heap tagging
    • Enable application verifier
    • Debugger (type the path to the installed WinDbg in the text box to the right, for example, C:\Program Files\Debugging Tools for Windows (x64)\windbg.exe -g).

    For more details, refer to Heap Corruption, Part 2.

  3. Go to “Control Panel/System and Security/System” (or right-click “Computer” in the Start menu and select “Properties”). There, click “Advanced system settings”; in the displayed dialog, go to the “Advanced” tab and click the “Environment Variables” button. In the next dialog, add a new System variable if you are a system administrator, or a User variable otherwise (in that case you need to log out and log back in). The required variable is “COMPLUS_HeapVerify” with a value of “1”. More details can be found in the Stack Overflow question .NET/C#: How to set debugging environment variable COMPLUS_HeapVerify?.
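
If changing a system-wide or user-wide variable is not desirable, the variable can also be set for a single process by starting the application from a small launcher. This is an alternative sketch, not the procedure described above: the HeapVerifyLauncher name is hypothetical, the executable path is assumed to arrive as the first command-line argument, and I am assuming the debugger configured in Global Flags passes its environment on to the application.

```csharp
using System;
using System.Diagnostics;

static class HeapVerifyLauncher
{
    static int Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.Error.WriteLine("Usage: HeapVerifyLauncher <path to executable>");
            return 1;
        }

        var startInfo = new ProcessStartInfo(args[0])
        {
            // Required: custom environment variables are only applied when
            // the process is created directly rather than through the shell.
            UseShellExecute = false
        };

        // Visible only to the child process; the launcher's own environment
        // and the system settings stay untouched.
        startInfo.EnvironmentVariables["COMPLUS_HeapVerify"] = "1";

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```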

Now we are ready to start debugging. Start the application. WinDbg should start automatically for it. Leave the application running until it crashes into WinDbg and then examine the dump.

TIP: To quickly remove Global Flags, Application Verifier and the debugger attachment settings, delete the following key in the registry:
x64 – HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\*YourAppName*

Step 4. Enable MDAs.

Try to use the Managed Debugging Assistants. Details are in Stack Overflow question What MDAs are useful to track a heap corruption?.

MDAs must be used along with WinDbg. I used them even along with Global Flags and Application Verifier.

Step 5. Enable GCStress.

Using GCStress is an extreme option, because the application becomes almost unusable, but it is still worth trying when nothing else helps. More details are in GCStress: How to turn on in Windows 7?.

Step 6. Compile for x86.

If your application is currently compiled for the “Any CPU” or “x64” platform, try compiling it for “x86” if it makes no difference to you which platform is used. I have seen this reported to solve the problem for at least one person.

Step 7. Disable concurrent GC – this is what worked for me

There is a known issue in .NET 4, reported in the thread Access Violation in .NET 4 Runtime in gc_heap::garbage_collect with no unmanaged modules. The problem can be solved by disabling the concurrent GC in the app.config file:

<?xml version="1.0"?>
<configuration>
    <runtime>
        <gcConcurrent enabled="false" />
    </runtime>
</configuration>
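
To confirm that the setting was actually picked up, the GC mode can be printed at startup. This is a minimal sketch with a hypothetical GcModeCheck class. It relies on my understanding that, for workstation GC, the default latency mode is reported as Batch when concurrent GC is disabled and as Interactive when it is enabled.

```csharp
using System;
using System.Runtime;

static class GcModeCheck
{
    static void Main()
    {
        // False is expected for an ordinary desktop application (workstation GC).
        Console.WriteLine("Server GC: " + GCSettings.IsServerGC);

        // Batch suggests that concurrent GC is off; Interactive that it is on.
        Console.WriteLine("Latency mode: " + GCSettings.LatencyMode);
    }
}
```

The check has to run inside the application that owns the app.config file, because the gcConcurrent element is read per process at startup.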
