top of page

Array iteration performance in C#

Implementing the sum of the items in an array is very simple. I think most developers would implement it this way:

static int Sum(int[] array)
{
    var sum = 0;
    for (var index=0; index<array.Length; index++)
        sum += array[index];
    return sum;
}

There’s actually a simpler alternative in C#:

static int Sum(int[] array)
{
    var sum = 0;
    foreach (var item in array)
        sum += item;
    return sum;
}

Another alternative is to use the Sum() operation provided by LINQ. It can be applied to any enumerable, including arrays.


So, how do these fair in terms of performance?


You can see that, comparing to using a for loop, using a foreach loop is almost 1.4 times faster, while using LINQ is almost 7 times slower and allocates on the heap. Why is that?

NOTE: All benchmarks were performed on .NET 6.0 Preview 5. The results may vary on other target frameworks.


LINQ


If you check the source code for the Sum() in System.Linq, you’ll find that it uses a foreach loop. So, if using a foreach is faster than a for, why is it so slow in this case?


Notice that Sum() is an extension method for the type IEnumerable<int>. Unlike the Where() operation, Sum() does not have a special case for when the source in an array.

ReSharper and Roslyn analyzers may suggest changing source.Where(predicate).First() to source.First(predicate). IGNORE THEM when source is an array! First(predicate) doesn’t have a special case for when the source in an array, making it slower.

Sum() implementation in LINQ expands to something like this:

static int Sum(IEnumerable<int> source)
{
    var sum = 0;
    IEnumerator<int> enumerator = source.GetEnumerator();
    try    
    {
        while(enumerator.MoveNext())
            sum += enumerator.Current;    
    }
    finally    
    {
        enumerator.Dispose()    
    }
    return sum;
}

There are several performance issues with this code.


GetEnumerator(), for an interface, allocates an enumerator on the heap. It requires a try/finally to dispose the enumerator, making it impossible to inline this method.


For each item in the collection, it calls the MoveNext() method and the Current property, which are virtual calls in the case of interfaces. This makes the enumeration of the array almost 7 times slower.

ReSharper and Roslyn analyzers may suggest changing parameters of type array to an interface, IGNORE THEM! Create an overload if you need to support other enumerable types.


Foreach

How can foreach be faster than a for loop?


The foreach in C# usually generates code very similar to the one expanded for the LINQ case. In case of arrays, it’s handled differently. It expands to code equivalent to the one generated for a for loop.


So, how can it still be faster than in the first example?


The code generated for the foreach is slightly different. It uses a temporary variable for the array. It then uses the Length property and the indexer of this variable. This way the JIT compiler can remove bounds checking throughout the iteration.

static int Sum(int[] array)
{
    var source = array;
    var sum = 0;
    for (var index = 0; index<source.Length; index++)  
    {
        var item = source[index];
        sum += item;  
    }
    return sum;
}


NOTE: The temporary variable should only be needed for non-local variables. It’s been pointed out in the comments that array is a parameter, so it’s already a local variable. For some reason RyuJIT misses the optimization in this case.

So, how do these fair in terms of performance?


As you can see, the optimized for version has the same performance as foreach. Having temporary variables may actually improve performance.

ReSharper and Roslyn analyzers may suggest to inline the temporary variable, IGNORE THEM!


Slice

Sometimes we may want to iterate just a portion of the array. I think most developers would implement the following:

static int Sum(int[] source, int offset, int count)
{
    var sum = 0;
    for (var index = offset; index<offset+count; index++)
        sum += source[index];
    return sum;        
}

Once again, there’s a simpler alternative:

static int Sum(ReadOnlySpan<int> source)
{
    var sum = 0;
    foreach (var item in source)
        sum += item;
    return
    sum;
}

Notice that the parameter now is of type ReadOnlySpan<T>. It represents a slice/portion of contiguous memory. This way, a portion of an array can be passed into the method. The foreach will only iterate through that portion.


So, how do these fair in terms of performance?


Both implementations have similar performance but around 1.33 time slower than when iterating the full array. If you know you’re iterating the full array, prefer the first implementation.


In this case using temporary variables doesn’t make much difference. The compiler cannot infer if offset and count are always within bounds.


SIMD

All the iteration above were sequencial, meaning that the items in the array were handled one element at the time.


CPUs may allow the simultaneous handling of multiple items using SIMD. This is available in .NET through the System.Numerics namespace, or the more recent System.Runtime.Intrinsics namespace.


Here is an implementation of Sum() using System.Numerics:

static int Sum(ReadOnlySpan<int> source)
{
    var sum = 0;
    if (Vector.IsHardwareAccelerated && 
                                    source.Length>Vector<int>.Count*2) 
    // use SIMD    
    {
        var vectors = MemoryMarshal.Cast<int, Vector<int>>(source);
        var vectorSum = Vector<int>.Zero;
        
        foreach (var vector in vectors)
            vectorSum += vector;
        
        for (var index = 0; index<Vector<int>.Count; index++)
            sum += vectorSum[index];
        
        var count = source.Length % Vector<int>.Count;
        source = source.Slice(source.Length-count, count);    
    }
    foreach (var item in source)
        sum += item;
    return sum;
}

It performs the following:

  1. Checks if SIMD is available and if it’s worth using it, otherwise go to 7.

  2. Casts the array of int to an array of Vector<int>.

  3. Creates a Vector<int> initialized to zeros (sumVector).

  4. Sums all the vectors from the array of Vector<int> into sumVector.

  5. Each sumVector item contains a partial sum. Sum them all.

  6. The length of the original array may not be a multiple of the size of Vector<int>. Calculate the number of items not handled and slice the original ReadOnlySpan<int>.

  7. Sum all the not handled items.

So, how do these fair in terms of performance?



It’s almost 5 times faster than the fastest until now.


List<T>

List<T> can also be iterated in two different ways. Using a for loop, which will use its indexer, or using a foreach or LINQ, which will use an enumerator.


To guarantee performance, we can implement one more overload of Sum() using a for loop:

static int Sum(List<int> source)
{
    var sum = 0;
    for (var index = 0; index<source.Count; index++)
        sum += source[index];
    return sum;
}

ReSharper and Roslyn analyzers may suggest changing parameters of type List<T>to an interface, IGNORE THEM! List<T> uses a value type enumerator to avoid heap allocations and virtual calls. Using the indexer will still be slower. Create an overload if you need to support other enumerable types.

Starting from .NET 5.0, there’s a method CollectionsMarshal.AsSpan<T>(List<>) that returns a reference to the List<T> inner array as a Span<T>.


We can then implement a Sum(List<int>) that calls the SIMD version of Sum(ReadOnlySpan<int>) declared above:

public int Sum(List<int> source)
    => Sum(CollectionsMarshal.AsSpan(source));

Using foreach makes it 1.28 times slower, while using AsSpan() makes it 5.6 times faster.


Conclusions

Iteration of an array is a special case for the compiler. It may optimize the iteration but small changes may confuse it. The compiler does not optimize the iteration on other types of collections.


ReSharper and Roslyn analyzer are not always to be trusted in terms of performance.


System.Linq is slow for arrays in most cases. Avoid its use when the source is an array or use one of the many alternatives.

List<T> can be iterated as a Span<T>.

Sequential processing is inherently slow. Favor the use of SIMD.



Source: Medium; Andao Almada


The Tech Platform


0 comments

Comments


bottom of page