Overhead of a .NET array?

Question

Here’s a slightly neater (IMO) short but complete program to demonstrate the same thing:

```
using System;

class Test
{
    const int Size = 100000;

    static void Main()
    {
        object[] array = new object[Size];
        long initialMemory = GC.GetTotalMemory(true);
        for (int i = 0; i < Size; i++)
        {
            array[i] = new string[0];
        }
        long finalMemory = GC.GetTotalMemory(true);
        GC.KeepAlive(array);

        long total = finalMemory - initialMemory;

        Console.WriteLine("Size of each element: {0:0.000} bytes",
                          ((double)total) / Size);
    }
}
```

But I get the same results – the overhead for any reference type array is 16 bytes, whereas the overhead for any value type array is 12 bytes. I’m still trying to work out why that is, with the help of the CLI spec. Don’t forget that reference type arrays are covariant, which may be relevant…
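To compare the two cases side by side, the same measuring technique can be wrapped in a helper. This is only a sketch: the `OverheadComparison` class and its `Measure` method are names I've made up for illustration, and the figures printed will depend on the runtime version and on whether the process is 32-bit or 64-bit, so no expected output is shown.

```csharp
using System;

class OverheadComparison
{
    const int Count = 100000;

    // Hypothetical helper: allocates Count objects via the factory and
    // returns the average number of bytes each one added to the heap.
    public static double Measure(Func<object> factory)
    {
        object[] holder = new object[Count];
        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < Count; i++)
        {
            holder[i] = factory();
        }
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(holder);
        return (double)(after - before) / Count;
    }

    static void Main()
    {
        // Empty arrays, so everything measured is overhead.
        Console.WriteLine("int[0]:    {0:0.000} bytes", Measure(() => new int[0]));
        Console.WriteLine("string[0]: {0:0.000} bytes", Measure(() => new string[0]));
    }
}
```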

EDIT: With the help of cordbg, I can confirm Brian’s answer – the type pointer of a reference-type array is the same regardless of the actual element type. Presumably there’s some funkiness in object.GetType() (which is non-virtual, remember) to account for this.
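Whatever the object header looks like, GetType() still has to report the real array type. A small check of that observable behaviour (it says nothing about how the runtime gets there):

```csharp
using System;

class ArrayTypeIdentity
{
    static void Main()
    {
        object[] x = new object[1];
        object[] y = new string[1];   // covariance: a string[] seen as object[]

        Console.WriteLine(x.GetType());                   // System.Object[]
        Console.WriteLine(y.GetType());                   // System.String[]
        Console.WriteLine(x.GetType() == y.GetType());    // False
        Console.WriteLine(y.GetType().GetElementType());  // System.String
    }
}
```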

So, with code of:

```
object[] x = new object[1];
string[] y = new string[1];
int[] z = new int[1];
z[0] = 0x12345678;
lock(z) {}
```

We end up with something like the following:

```
Variables:
x=(0x1f228c8) <System.Object[]>
y=(0x1f228dc) <System.String[]>
z=(0x1f228f0) <System.Int32[]>

Memory:
0x1f228c4: 00000000 003284dc 00000001 00326d54 00000000 // Data for x
0x1f228d8: 00000000 003284dc 00000001 00329134 00000000 // Data for y
0x1f228ec: 00000000 00d443fc 00000001 12345678 // Data for z
```

Note that I’ve dumped the memory 1 word before the value of the variable itself.

For x and y, the values are:

The sync block, used for locking and the hash code (or a thin lock – see Brian’s comment)
Type pointer
Size of array
Element type pointer
Null reference (first element)

For z, the values are:

Sync block
Type pointer
Size of array
0x12345678 (first element)

Different value type arrays (byte[], int[] etc) end up with different type pointers, whereas all reference type arrays use the same type pointer, but have a different element type pointer. The element type pointer is the same value as you’d find as the type pointer for an object of that type. So if we looked at a string object’s memory in the above run, it would have a type pointer of 0x00329134.
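The layouts above line up with the measured overheads if you count the 4-byte words that come before the first element in the dump. A quick worked check, assuming the 32-bit layout shown here (the class and variable names are just for illustration):

```csharp
using System;

class HeaderArithmetic
{
    const int WordSize = 4; // bytes per word in the 32-bit dump

    static void Main()
    {
        // Words that precede the first element in each layout.
        int valueTypeWords = 3;     // sync block, type pointer, length
        int referenceTypeWords = 4; // the same three plus the element type pointer

        Console.WriteLine(valueTypeWords * WordSize);     // 12
        Console.WriteLine(referenceTypeWords * WordSize); // 16
    }
}
```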

The word before the type pointer certainly has something to do with either the monitor or the hash code: calling GetHashCode() populates that bit of memory, and I believe the default object.GetHashCode() obtains a sync block to ensure hash code uniqueness for the lifetime of the object. However, just doing lock(x){} didn’t do anything, which surprised me…

All of this is only valid for “vector” types, by the way – in the CLR, a “vector” type is a single-dimensional array with a lower-bound of 0. Other arrays will have a different layout – for one thing, they’d need the lower bound stored…
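To see the distinction from C#, you can create a single-dimensional array with a non-zero lower bound via Array.CreateInstance. This sketch only shows that the runtime treats it as a different type from int[]; it doesn't inspect the memory layout.

```csharp
using System;

class VectorVersusArray
{
    static void Main()
    {
        int[] vector = new int[3];

        // One dimension, three elements, lower bound 1 rather than 0.
        Array nonVector = Array.CreateInstance(typeof(int), new[] { 3 }, new[] { 1 });

        Console.WriteLine(vector.GetLowerBound(0));               // 0
        Console.WriteLine(nonVector.GetLowerBound(0));            // 1
        Console.WriteLine(nonVector.GetType() == typeof(int[]));  // False
    }
}
```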

So far this has been experimentation, but here’s the guesswork – the reason for the system being implemented the way it has been. From here on, I really am just guessing.

All object[] arrays can share the same JIT code. They’re going to behave the same way in terms of memory allocation, array access, Length property and (importantly) the layout of references for the GC. Compare that with value type arrays, where different value types may have different GC “footprints” (e.g. one might have a byte and then a reference, others will have no references at all, etc).
Every time you assign a value within an object[] the runtime needs to check that it’s valid. It needs to check that the type of the object whose reference you’re using for the new element value is compatible with the element type of the array. For instance:
```
object[] x = new object[1];
object[] y = new string[1];
x[0] = new object(); // Valid
y[0] = new object(); // Invalid - throws ArrayTypeMismatchException
```
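As a complete program, the failing store looks like this. The class name is arbitrary; the exception type is the one the runtime throws for an incompatible array store.

```csharp
using System;

class CovarianceCheck
{
    static void Main()
    {
        object[] y = new string[1];
        try
        {
            y[0] = new object(); // compiles, but fails the runtime check
        }
        catch (ArrayTypeMismatchException e)
        {
            Console.WriteLine("Caught: " + e.GetType().Name); // Caught: ArrayTypeMismatchException
        }

        y[0] = "fine"; // a string matches the real element type
        y[0] = null;   // null is always allowed
    }
}
```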

This is the covariance I mentioned earlier. Now given that this is going to happen for every single assignment, it makes sense to reduce the number of indirections. In particular, I suspect you don’t really want to blow the cache by having to go to the type object for each assignment to get the element type. I suspect (and my x86 assembly isn’t good enough to verify this) that the test is something like:

1. Is the value to be copied a null reference? If so, that’s fine. (Done.)
2. Fetch the type pointer of the object the reference points at.
3. Is that type pointer the same as the element type pointer (simple binary equality check)? If so, that’s fine. (Done.)
4. Is that type pointer assignment-compatible with the element type pointer? (Much more complicated check, with inheritance and interfaces involved.) If so, that’s fine – otherwise, throw an exception.
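The steps above can be modelled in C# with reflection. To be clear, this is only a sketch of the guessed logic, not the runtime's actual code: `StoreCheckModel` and `CheckStore` are hypothetical names, and going through Type objects is exactly the kind of indirection the real check would presumably avoid.

```csharp
using System;

static class StoreCheckModel
{
    // Hypothetical model of the guessed store check.
    public static void CheckStore(Array array, object value)
    {
        Type elementType = array.GetType().GetElementType();

        // Step 1: null can be stored in any reference type array.
        if (value == null) return;

        // Step 2: fetch the type of the object being stored.
        Type valueType = value.GetType();

        // Step 3: cheap exact-match test.
        if (valueType == elementType) return;

        // Step 4: full compatibility test (inheritance, interfaces).
        if (elementType.IsAssignableFrom(valueType)) return;

        throw new ArrayTypeMismatchException();
    }

    static void Main()
    {
        object[] y = new string[1];
        CheckStore(y, null);   // passes at step 1
        CheckStore(y, "text"); // passes at step 3
        try { CheckStore(y, new object()); }
        catch (ArrayTypeMismatchException) { Console.WriteLine("rejected"); } // rejected
    }
}
```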

If we can terminate the search in the first three steps, there’s not a lot of indirection – which is good for something that’s going to happen as often as array assignments. None of this needs to happen for value type assignments, because that’s statically verifiable.

So, that’s why I believe reference type arrays are slightly bigger than value type arrays.

Great question – really interesting to delve into it 🙂
