Refactoring with MoreLINQ - Thinking in LINQ: Harnessing the power of functional programing in .NET applications (2014)

Thinking in LINQ: Harnessing the power of functional programing in .NET applications (2014)

Chapter 5. Refactoring with MoreLINQ

In the preceding chapter, you saw how LINQ can help replace existing loops. Loop constructs can sometimes range from difficult to impossible to comprehend, especially when nested. This chapter extends the loop-to-LINQ replacement concept by showing how the open source LINQ API called MoreLINQ can help you refactor legacy code. By going beyond the core LINQ operators, the MoreLINQ API offers a wide range of operators that you can readily use to replace looping or looping/branching logic. After reading this chapter, you should be able to rewrite such code by using methods found in the MoreLINQ API.

5-1. Getting MoreLINQ

MoreLINQ is a library written by Jon Skeet (see https://twitter.com/jonskeet), which you can find on Google Code (http://code.google.com/p/morelinq.files/source/browse/).

The following sections show how looping constructs might have been written to solve specific problems, and how the MoreLINQ operators can help you refactor those loops.

5-2. Using the Scan Operator

The Scan operator applies a function cumulatively on a sequence, yielding the result at each stage.

Problem

Find the cumulative sum of a given integer sequence.

Solution

Use the Scan operator instead of using loops. The following table shows a loop solution in the left column, and the equivalent MoreLINQ solution, using Scan, on the right.

Loop

MoreLINQ

int[] numbers = {1,2,3,4};
int[] sums = new int[numbers.Length];
for(int i=0;i<numbers.Length;i++)
{
for(int j = 0;j<=i;j++)
sums[i]+=numbers[j];
}
sums.Dump();

int[] numbers = {1,2,3,4};

numbers.Scan((a,b)=>a + b).Dump();

How It Works

The output of this code is the cumulative sum of the sequence 1,2,3,4, as shown in Figure 5-1.

image

Figure 5-1. Result of the cumulative sum operation

At first glance, Scan might look similar to Aggregate. But Aggregate runs the folding function provided on every element of the collection and returns a final value. In contrast, Scan returns the sum of elements up to the current number, for each number, until it reaches the end of the list. So the difference is that the last element of the result obtained from a Scan call is the same as the result of an Aggregate call.

5-3. Using the Slice Operator

The Slice operator extracts a slice from a sequence. The first argument to Slice is the starting index, and the second is the run length of the sequence segment to be extracted.

Problem

Extract four elements, starting at the fourth element.

Solution

Use the Slice operator rather than using loops and doing the bookkeeping yourself.

Loop

MoreLINQ

int[] values = {1,2,3,4,5,6,7,8,9,10};

int k = 4;

int start = 3;

int[] slice = new int[k];

for(int i = start,j = 0 ; i< start + k ;i++,j++)

slice[j] = values[i];

slice.Dump();

int[] values = {1,2,3,4,5,6,7,8,9,10};

int[] slice = values.Slice(3,4).ToArray();

How It Works

You can think of this method as a shortcut to the Skip(m).Take(n) idiom, where you simply pass m and n as the arguments to Slice.

5-4. Using the Interleave Operator

The Interleave operator joins two sequences, taking elements from each sequence alternately.

Problem

Join two integer sequences such that each element in the resultant sequence is taken in turn from one of the source sequences. You can imagine that these integer sequences are formed by the contents of packets arriving over the network, and the processor must pick one element from each sequence, alternating between sequences. Another way to imagine this is as a kind of simple load-balancing strategy.

Solution

Use the Interleave operator instead of loops.

Loop

MoreLINQ

int[] n1 = {1,3,4,5};
int[] n2 = { 4,6};

int[] total = new

int[n1.Length + n2.Length];
int first = 0; int second = 0;
int index = 0; for(;index< n1.Length+n2.Length;index++)
{
if(index % 2 == 0 )
{
//Pick element from the first
if(first!=n1.Length)
{
total[index] = n1[first];
first++;
}
else {
if (second != n2.Length)
{
total[index] = n2[second];
second++;
}
}
}
else
{
//Pick element from the second
if(second!=n2.Length)
{
total[index] = n2[second];
second++;
}
else
{
if (first != n1.Length)
{
total[index] = n1[first];
first++;
}
}
}
}
total.Dump();

int[] n1 = {1,3,4,5};
int[] n2 = { 4,6};
int[] total = n1.Interleave(n2)
.ToArray();

How It Works

In Figure 5-2, the first sequence is longer than the second sequence. The first sequence consists of dark circles, and the second sequence consists of light circles. The resulting sequence created by Interleave takes one element from each source sequence in turn, so the result has one dark circle followed by a light one, and so on. This goes on until the operation reaches the end of the shorter sequence (Sequence #2, in this case), after which the code simply appends elements from the longer sequence until it reaches the end of that sequence as well.

image

Figure 5-2. How Interleave() works to generate the interleaved sequence

5-5. Using the Windowed Operator

The Windowed operator generates content as a collection of collections, creating several intermediate collections for a sliding window of a given size from any given collection.

Problem

Find the moving average from a given sequence in which the moving window size is 2.

Solution

Use the Windowed operator to generate a sequence of sequences instead of using a nested looping construct.

Loop

MoreLINQ

int[] values = {1,2,3,4,5,6,7,8,9,10,11};
List<List<int>> windowVals = new List<List<int>>();
for(int i=0;i<values.Length-1;i++)
{
List<int> inner = new List<int>();
for(int j=i;j<i+2;j++)
inner.Add(values[j]);
windowVals.Add(inner);
}
List<double> movingAvgs = new List<double>();
for(int i=0;i<windowVals.Count;i++)
{
double avg = 0;
for(int j = 0;j<windowVals[i].Count;j++)
avg += windowVals[i][j];
movingAvgs.Add(avg/windowVals[i].Count);
}
movingAvgs.Dump("Moving Averages using loops");

int[] values = {1,2,3,4,5,6,7,8,9,10,11};

values.Windowed(2)

.Select(list => list.Average())

.Dump("Moving Averages");

How It Works

Both the looping example and the Windowed example generate the output shown in Figure 5-3.

image

Figure 5-3. Moving average of numbers 1 to 11 with two numbers taken at a time

Windowed generates an enumerable of enumerables, in which each enumerable holds the values that result from sliding the window. Figure 5-4 shows how Windowed works in more detail. The underline denotes the sliding window.

image

Figure 5-4. How the sliding window moves and generates intermediate collections

The call to Select (list => list.Average()) projects the average of each list generated by Windowed.

5-6. Using the Cartesian Operator

The Cartesian operator finds the Cartesian product of a series of collections. As in the other examples in this chapter, this can help you replace vanilla nested loops with more-readable code.

Problem

Find the volumes of all parallelepipeds from a given set of lengths, breadths, and heights.

Solution

Use the Cartesian operator instead of three nested loops.

Loop

MoreLINQ

int[] lengths = {1,2,3,4,5,6,7};

int[] breadths = {1,1,2,3,1,3};

int[] heights = {2,1,3,1,4};

List<int> volumes = new List<int>();

for(int r = 0;r<lengths.Length; r++)

{

for(int c = 0; c< breadths.Length; c++)

{

for(int z=0;z<heights.Length;z++)

{

volumes.Add(lengths[r]*breadths[c]*heights[z]);

}

}

}

int[] lengths = {1,2,3,4,5,6,7};

int[] breadths = {1,1,2,3,1,3};

int[] heights = {2,1,3,1,4};

List<int> volumesLINQ = lengths

.Cartesian(breadths, (b,l)=> b * l)

.Cartesian(heights, (a,b)=> a * b)

.ToList();

volumesLINQ.Take(10).Dump();

How It Works

Nested loops are sometimes referred to as bow-and-arrow patterns, because they assume a shape similar to a bow and arrow. As shown in the preceding code, you can replace such a bow and arrow pattern with a call to Cartesian. The example on the left in the Solution section above shows applying Cartesian with three nested loops; however, you can use the MoreLINQ Cartesian command to generate Cartesian products nested as deeply as required. Cartesian takes two arguments. The first is an enumerable over which the command will iterate, while the second is a lambda expression that represents the inner calculation. If you have single statements inside the innermost loop, you can translate those loops into a sequence of calls to Cartesian. Figure 5-5 shows the output.

image

Figure 5-5. The first 10 calculated volumes of the set of parallelepipeds

5-7. Using the Partition Operator

The Partition operator partitions a sequence according to a given number of items.

Problem

Let’s say that you have a 10-element array and you want partition the array into three sections such that the first part holds 30 percent of the elements (three, in this case), the middle section holds 60 percent of the elements (six, in this case), and the last part holds 10 percent of the elements (one, in this case).

Solution

Loop

MoreLINQ

int[] values = {1,2,3,4,5,6,7,8,9,10};

//We want a List<List<int>>

//with values distributed according to the

//given percentages.

//This means there will need to be three lists

//where the first one will have 30% elements

//second one 60% and the last one 10%.

int[] percentages = {30,60,10};

int[] numbersOfItems = new int[percentages.Length];

for(int i = 0;i<percentages.Length;i++)

numbersOfItems[i] = (int) Math.Floor( (double)values.Length*percentages[i]/100);

List<List<int>> distributions = new List<List<int>>();

for(int i = 0;i<numbersOfItems.Length;i++)

{

List<int> innerList = new List<int>();

if(i==0)

{

for(int j=0;j<numbersOfItems[i];j++)

innerList.Add(values[j]);

}

else

{

int index = 0;

for(int k = 0;k<i;k++)

index+= numbersOfItems[k];

for (int j = index; j < index +

numbersOfItems[i];j++ )

innerList.Add(values[j]);

}

distributions.Add(innerList);

}

distributions.Dump("Partitioned as per percentage");

int[] values = {1,2,3,4,5,6,7,8,9,10};

int[] percentages = {30,60,10};

List<IEnumerable<int>> distributions = new List<IEnumerable<int>>();

int[] numbersOfItems = percentages

.Select (n=> (int) Math.Floor( (double)values.Length*n/100)).ToArray();

distributions = values.Partition(numbersOfItems)

.ToList();

distributions.Dump("Partitioned as per percentage");

How It Works

Both versions generate the output shown in Figure 5-6.

image

Figure 5-6. The result of partitioning an array with the given distribution percentage

Partition has two overloaded versions. The first version takes a single integer that represents a single size for all the partitions. The other overloaded version takes an IEnumerable<int> whose values represent different sizes for the required partitions. The example first obtains the number of elements required in each partition by calculating that from the given percentages, and stores those values in the numbersOfItems array, which is passed to the second overload of Partition.

5-8. Using the Index Operator

The Index operator returns an enumeration with the index and the value at each index for the given collection.

Problem

Keep track of the index for each character in a given string.

Solution

Loop

MoreLINQ

string input = "LINQ";
List<KeyValuePair<int,char>> indices = new
List<KeyValuePair<int,char>>();
for(int i = 0; i< input.Length; i++)
{
indices.Add(new KeyValuePair<int,char>(i,input[i]));
}
indices.Dump();

char[] chars = "LINQ".ToCharArray(); chars.Index().Dump();

Both of these examples produce the result shown in Figure 5-7. For the looping construct, however, the result is strongly typed to be a List<KeyValuePair<int, char>>.

image

Figure 5-7. The key/value pairs for the word LINQ. The Key columnn shows the index of the letters, while the Value column represents the character values at those indices

How It Works

Index returns an IEnumerable<int,T> when used on a collection of type T. This operator might not seem so useful, but it can be. For example, in the next section, you will see how Index can help write generic code to remove consecutive duplicates from a given collection, which is a rather common programming task.

5-9. Using the PairWise Operator

The PairWise operator performs an action over two neighboring/consecutive elements of a given collection and returns an enumerable with the result of all such actions performed across the collection. In other words, if you need to perform an operation on each pair of elements, starting from the leftmost element, then PairWise is the operator you need. I won’t show the looping code to do this, but you’ve probably written similar loops before.

Problem

Remove consecutive repeated characters from a string. If you want to experiment further, see if you can use the same method to remove consecutive identical dates.

Solution

The logic for this can be wrapped inside a LINQ extension, as shown here:

public static class MyLinqEx
{
public static IEnumerable<T> RemoveConsecutiveDuplicates<T>(this IEnumerable<T> input)
where T:IComparable
{
var conditions = input.Pairwise((a,b)=>a.Equals(b));
var dontPickIndices = conditions.Index()
.Where (c => c.Value==true)
.Select(k => k.Key);
return Enumerable.Range(0,input.Count())
.Where (e => !dontPickIndices.Contains(e))
.Select(k => input.ElementAt(k));
}
}

Here is how you can use this operator:

void Main()
{
"LLIIIINNQQ".ToCharArray().RemoveConsecutiveDuplicates().Dump();
}

How It Works

The extension is built around two basic building blocks, Index and PairWise operator. The PairWise operator generates the output in Figure 5-8 by comparing each pair of letters in the input to see whether they’re the same.

image

Figure 5-8. The state of neighborhood similarity of two characters

Next, the Index operator generates the output shown in Figure 5-9.

image

Figure 5-9. The indices of neighborhood similarity check result

By selecting the keys for which the value is true, you would get the indices that have the same characters as those of their next consecutive neighbor. Conversely, by avoiding these indices, you get a list of consecutive nonrepeating characters. The variable dontPickIndices holds the true indices.

5-10. The ForEach Operator

The ForEach operator performs an action on all elements of a given sequence. Without this, the result would need to be projected to a strongly typed list (List<T>) to run an operation on each member, which can lead to performance issues when the list is large.

Problem

Perform an operation on each element in an enumerable.

Solution

Use the ForEach operator instead of looping or converting the enumerable to a strongly typed list.

Loop

MoreLINQ

int[] numbers = {1,2,8,7,5,6,4,3};

Action<int> ack = a =>
Console.WriteLine(

DateTime.Today.AddDays(a)

.DayOfWeek);

foreach (var integer in numbers)

ack.Invoke(integer);

int[] numbers = {1,2,8,7,5,6,4,3}; Action<int> ack = a => Console.WriteLine(DateTime.Today.AddDays(a).DayOfWeek); numbers.ForEach(ack);

How It Works

The ForEach operator takes an action that it then invokes on each member of the collection on which ForEach is being invoked.

5-11. Using the MinBy/MaxBy Operator

The MinBy operator finds the value from the given source collection that results in the minimum value of a given function. The LINQ standard query operator Min finds the minimum value of the generated values from the given collection by the given formula.

Problem

Find the value from a set of values that minimizes the given function. Here’s an example: Assume you have a list of distances from various cities to a zoo stored in an array—and a tiger has escaped from the zoo. The zoo authorities want to notify people in the city closest to a 10 kilometer radius around the zoo.

Solution

Loop

MoreLINQ

int[] distances = {23,41,11,34,45};

int x = distances[0];

for(int i = 0;i<distances.Length;i++)

if(distances[i]-10<x-10)

x = distances[i];

x.Dump();

int[] distances = {23,41,11,34,45};
//The value that minimizes the given function f(x) = x - 10
//in this case distances.
MinBy(a => a - 10).Dump();
//The minimum value of the values projected by the given formula distances.Min (a => a - 10).Dump();

The output of the loop code is 11, while the output of the LINQ code is 11 and then 1. The outputs differ because I added a call to the LINQ Min() operator at the end of the LINQ code to help make the difference between these two operators clear. The extra 1 is the result of the Min()call.

How It Works

The LINQ code generates 11 and then 1 as output. The value 11 minimizes the function f(x) = x – 10, denoted by the lambda expression a => a – 10.

The final call to the LINQ operator Min() generates 1 because that’s the minimum value obtained from the projection of all values in the array distance by the function a – 10. By virtue of this internal projection, the values will be {13, 31, 1, 24, 35}. Because 1 is the minimum value in that set, it’s the value that Min() returns.

Similar to MinBy(), MoreLINQ offers another operator called MaxBy(), which finds the value that maximizes a given function.

Summary

There are several other operators available in MoreLINQ. However, the ones covered here are the most interesting, and for the most part, are not easily reproducible by combining the standard LINQ operators. However, I recommend you explore the library documentation for yourself. The project has a very good unit test suite, which can help you figure out what the code does.