IEnumerable, It's Not a Collection.

·

4 min read

IEnumerable<T> has become one of the most popular ways not only to manipulate collections, but to pass them around as a read only structure. The power of LINQ and the safety of read-only immutable structures make it an excellent choice when modeling your domain.

But with great power, comes great responsibility, and IEnumerable<T> has it's fair share of quirks to trip up unsuspecting developers.

It's an Enumerable, not a Collection

While we often use IEnumerable<T> as a way to pass around collections, especially if we aren't picky about the underlying data structure. Maybe it's an T[], maybe it's a List<T>. We don't know and we (allegedly) don't care.

But there are lots of objects that implement the IEnumerable<T> interface and they aren't collections, and thus don't behave how we might expect them to. Often times developers will be unaware that these concrete types are in their codebases, assuming that whatever IEnumerable object they are passing around represents an actual collection. Even if you've always specified a list or array, something as simple as using LINQ can open the door to some of these other types.

These types can get us in to trouble by assuming more than the IEnumerable<T> interface promises.

Some Enumerables are lazy

Consider the following code:

public class Batter
{
  public string Name { get; }
  public bool IsAtBat { get; private set; }
  public bool IsOnDeck { get; private set; }

  public Batter(string name)
  {
    Name = name;
    IsOnDeck = false;
  }

  public void MarkAsOnDeck()
  {
    IsOnDeck = true;
  }

  public void MarkAsAtBat()
  {
    IsOnDeck = false;
    IsAtBat = true;
  }
}

public class BattingOrder
{
  public IEnumerable<Batter> BattingOrder { get; }

  public BattingOrder(IEnumerable<string> battingOrderNames)
  {
    BattingOrder = CreateBatters(battingOrderNames);
    BattingOrder.First().MarkAsAtBat();
    BattingOrder.Skip(1).First().MarkAsOnDeck();
  }

  public IEnumerable<Batter> CreateBatters(IEnumerable<string> battingOrderNames)
    => battingOrderNames.Select(name => new Batter(name));

  public Batter GetCurrentBatter()
    => BattingOrder.First(batter => batter.IsAtBat);
}

This code contains a bug, but not in any of the domain logic that's expressed.

The first Batter in the batting order should be at bat. The second Batter in the batting order should be on deck.

But when GetCurrentBatter() is called, the function fails. It can't find a batter that is currently at bat.

How can that be? We created the IEnumerable<Batter> instance, took the first one and marked it as at bat, and took the second one and marked it as on deck.

The key is right here

  public IEnumerable<Batter> CreateBatters(IEnumerable<string> battingOrderNames)
    => battingOrderNames.Select(name => new Batter(name));

In this method we use the Select() method to convert an enumerable of string, to an enumerable of Batter, and that's just it. We have an enumerable of batters which makes no guarantees about being a collection. In this specific case, the returned enumerable is lazy. Meaning every time we iterate through it (or call some specific LINQ methods on it) we're going to re-execute the mapping, which means we're constantly getting new references to Batter objects.

When we call this method here

  public BattingOrder(IEnumerable<string> battingOrderNames)
  {
    BattingOrder = CreateBatters(battingOrderNames);
    BattingOrder.First().MarkAsAtBat();
    BattingOrder.Skip(1).First().MarkAsOnDeck();
  }

We're creating a lazy enumerable, then enumerating it twice. We're grabbing the first enumeration, then the second time, we're creating a new enumerable that skips the first enumeration, and then enumerating it to grab the second one, but remember each time we enumerate the BattingOrder enumerable, we create new instances of Batter objects. Meaning the "first" Batter we skip over in the last line, isn't the same Batter instance we marked as at bat on the previous line.

So finally when we call this method

public Batter GetCurrentBatter()
    => BattingOrder.First(batter => batter.IsAtBat);

We're enumerating the enumerable again, creating a whole new set of Batter instances, none of which have been marked as at bat. This seemingly straight forward and readable code, contains a very hard to spot bug, that causes it to always fail.

What to do instead

What's the solution to avoiding these kinds of errors? If we know we're going to mutate an object (such as marking a Batter as at bat), then we must necessarily have a constant reference to the object being mutated. To ensure that any enumerable we obtain will preserve these references, it's a good idea to copy the enumerable into a private collection, which we know will preserve instances. In our case, we need just a single method call to fix our error.

  public BattingOrder(IEnumerable<string> battingOrderNames)
  {
                                                    //Copy enumerable into a list
                                                    //to preserve references
    BattingOrder = CreateBatters(battingOrderNames).ToList();
    BattingOrder.First().MarkAsAtBat();
    BattingOrder.Skip(1).First().MarkAsOnDeck(); 
  }

With this change, even though the BattingOrder variable is still types as an IEnumerable<Batter> we know that the object itself is an instance of List<Batter> which means we can be confident that calling .First() will always return the same object instance, that is, the first item in the list.

If you get passed an IEnumerable<T> as a parameter, it's almost always a good idea to call .ToList() or .ToArray() on it before assigning to variable, property or field. Not only does this ensure our reference isn't mutated by someone else, we also know that any mutations we might perform will persist.