Thinking in LINQ: Harnessing the power of functional programing in .NET applications (2014)
Chapter 5. Refactoring with MoreLINQ
In the preceding chapter, you saw how LINQ can help replace existing loops. Loop constructs can sometimes range from difficult to impossible to comprehend, especially when nested. This chapter extends the loop-to-LINQ replacement concept by showing how the open source LINQ API called MoreLINQ can help you refactor legacy code. By going beyond the core LINQ operators, the MoreLINQ API offers a wide range of operators that you can readily use to replace looping or looping/branching logic. After reading this chapter, you should be able to rewrite such code by using methods found in the MoreLINQ API.
5-1. Getting MoreLINQ
MoreLINQ is a library written by Jon Skeet (see https://twitter.com/jonskeet), which you can find on Google Code (http://code.google.com/p/morelinq.files/source/browse/).
The following sections show how looping constructs might have been written to solve specific problems, and how the MoreLINQ operators can help you refactor those loops.
5-2. Using the Scan Operator
The Scan operator applies a function cumulatively on a sequence, yielding the result at each stage.
Problem
Find the cumulative sum of a given integer sequence.
Solution
Use the Scan operator instead of using loops. The following table shows a loop solution in the left column, and the equivalent MoreLINQ solution, using Scan, on the right.
Loop |
MoreLINQ |
int[] numbers = {1,2,3,4}; |
int[] numbers = {1,2,3,4}; numbers.Scan((a,b)=>a + b).Dump(); |
How It Works
The output of this code is the cumulative sum of the sequence 1,2,3,4, as shown in Figure 5-1.
Figure 5-1. Result of the cumulative sum operation
At first glance, Scan might look similar to Aggregate. But Aggregate runs the folding function provided on every element of the collection and returns a final value. In contrast, Scan returns the sum of elements up to the current number, for each number, until it reaches the end of the list. So the difference is that the last element of the result obtained from a Scan call is the same as the result of an Aggregate call.
5-3. Using the Slice Operator
The Slice operator extracts a slice from a sequence. The first argument to Slice is the starting index, and the second is the run length of the sequence segment to be extracted.
Problem
Extract four elements, starting at the fourth element.
Solution
Use the Slice operator rather than using loops and doing the bookkeeping yourself.
Loop |
MoreLINQ |
int[] values = {1,2,3,4,5,6,7,8,9,10}; int k = 4; int start = 3; int[] slice = new int[k]; for(int i = start,j = 0 ; i< start + k ;i++,j++) slice[j] = values[i]; slice.Dump(); |
int[] values = {1,2,3,4,5,6,7,8,9,10}; int[] slice = values.Slice(3,4).ToArray(); |
How It Works
You can think of this method as a shortcut to the Skip(m).Take(n) idiom, where you simply pass m and n as the arguments to Slice.
5-4. Using the Interleave Operator
The Interleave operator joins two sequences, taking elements from each sequence alternately.
Problem
Join two integer sequences such that each element in the resultant sequence is taken in turn from one of the source sequences. You can imagine that these integer sequences are formed by the contents of packets arriving over the network, and the processor must pick one element from each sequence, alternating between sequences. Another way to imagine this is as a kind of simple load-balancing strategy.
Solution
Use the Interleave operator instead of loops.
Loop |
MoreLINQ |
int[] n1 = {1,3,4,5}; |
int[] n1 = {1,3,4,5}; |
How It Works
In Figure 5-2, the first sequence is longer than the second sequence. The first sequence consists of dark circles, and the second sequence consists of light circles. The resulting sequence created by Interleave takes one element from each source sequence in turn, so the result has one dark circle followed by a light one, and so on. This goes on until the operation reaches the end of the shorter sequence (Sequence #2, in this case), after which the code simply appends elements from the longer sequence until it reaches the end of that sequence as well.
Figure 5-2. How Interleave() works to generate the interleaved sequence
5-5. Using the Windowed Operator
The Windowed operator generates content as a collection of collections, creating several intermediate collections for a sliding window of a given size from any given collection.
Problem
Find the moving average from a given sequence in which the moving window size is 2.
Solution
Use the Windowed operator to generate a sequence of sequences instead of using a nested looping construct.
Loop |
MoreLINQ |
int[] values = {1,2,3,4,5,6,7,8,9,10,11}; |
int[] values = {1,2,3,4,5,6,7,8,9,10,11}; values.Windowed(2) .Select(list => list.Average()) .Dump("Moving Averages"); |
How It Works
Both the looping example and the Windowed example generate the output shown in Figure 5-3.
Figure 5-3. Moving average of numbers 1 to 11 with two numbers taken at a time
Windowed generates an enumerable of enumerables, in which each enumerable holds the values that result from sliding the window. Figure 5-4 shows how Windowed works in more detail. The underline denotes the sliding window.
Figure 5-4. How the sliding window moves and generates intermediate collections
The call to Select (list => list.Average()) projects the average of each list generated by Windowed.
5-6. Using the Cartesian Operator
The Cartesian operator finds the Cartesian product of a series of collections. As in the other examples in this chapter, this can help you replace vanilla nested loops with more-readable code.
Problem
Find the volumes of all parallelepipeds from a given set of lengths, breadths, and heights.
Solution
Use the Cartesian operator instead of three nested loops.
Loop |
MoreLINQ |
int[] lengths = {1,2,3,4,5,6,7}; int[] breadths = {1,1,2,3,1,3}; int[] heights = {2,1,3,1,4}; List<int> volumes = new List<int>(); for(int r = 0;r<lengths.Length; r++) { for(int c = 0; c< breadths.Length; c++) { for(int z=0;z<heights.Length;z++) { volumes.Add(lengths[r]*breadths[c]*heights[z]); } } } |
int[] lengths = {1,2,3,4,5,6,7}; int[] breadths = {1,1,2,3,1,3}; int[] heights = {2,1,3,1,4}; List<int> volumesLINQ = lengths .Cartesian(breadths, (b,l)=> b * l) .Cartesian(heights, (a,b)=> a * b) .ToList(); volumesLINQ.Take(10).Dump(); |
How It Works
Nested loops are sometimes referred to as bow-and-arrow patterns, because they assume a shape similar to a bow and arrow. As shown in the preceding code, you can replace such a bow and arrow pattern with a call to Cartesian. The example on the left in the Solution section above shows applying Cartesian with three nested loops; however, you can use the MoreLINQ Cartesian command to generate Cartesian products nested as deeply as required. Cartesian takes two arguments. The first is an enumerable over which the command will iterate, while the second is a lambda expression that represents the inner calculation. If you have single statements inside the innermost loop, you can translate those loops into a sequence of calls to Cartesian. Figure 5-5 shows the output.
Figure 5-5. The first 10 calculated volumes of the set of parallelepipeds
5-7. Using the Partition Operator
The Partition operator partitions a sequence according to a given number of items.
Problem
Let’s say that you have a 10-element array and you want partition the array into three sections such that the first part holds 30 percent of the elements (three, in this case), the middle section holds 60 percent of the elements (six, in this case), and the last part holds 10 percent of the elements (one, in this case).
Solution
Loop |
MoreLINQ |
int[] values = {1,2,3,4,5,6,7,8,9,10}; //We want a List<List<int>> //with values distributed according to the //given percentages. //This means there will need to be three lists //where the first one will have 30% elements //second one 60% and the last one 10%. int[] percentages = {30,60,10}; int[] numbersOfItems = new int[percentages.Length]; for(int i = 0;i<percentages.Length;i++) numbersOfItems[i] = (int) Math.Floor( (double)values.Length*percentages[i]/100); List<List<int>> distributions = new List<List<int>>(); for(int i = 0;i<numbersOfItems.Length;i++) { List<int> innerList = new List<int>(); if(i==0) { for(int j=0;j<numbersOfItems[i];j++) innerList.Add(values[j]); } else { int index = 0; for(int k = 0;k<i;k++) index+= numbersOfItems[k]; for (int j = index; j < index + numbersOfItems[i];j++ ) innerList.Add(values[j]); } distributions.Add(innerList); } distributions.Dump("Partitioned as per percentage"); |
int[] values = {1,2,3,4,5,6,7,8,9,10}; int[] percentages = {30,60,10}; List<IEnumerable<int>> distributions = new List<IEnumerable<int>>(); int[] numbersOfItems = percentages .Select (n=> (int) Math.Floor( (double)values.Length*n/100)).ToArray(); distributions = values.Partition(numbersOfItems) .ToList(); distributions.Dump("Partitioned as per percentage"); |
How It Works
Both versions generate the output shown in Figure 5-6.
Figure 5-6. The result of partitioning an array with the given distribution percentage
Partition has two overloaded versions. The first version takes a single integer that represents a single size for all the partitions. The other overloaded version takes an IEnumerable<int> whose values represent different sizes for the required partitions. The example first obtains the number of elements required in each partition by calculating that from the given percentages, and stores those values in the numbersOfItems array, which is passed to the second overload of Partition.
5-8. Using the Index Operator
The Index operator returns an enumeration with the index and the value at each index for the given collection.
Problem
Keep track of the index for each character in a given string.
Solution
Loop |
MoreLINQ |
string input = "LINQ"; |
char[] chars = "LINQ".ToCharArray(); chars.Index().Dump(); |
Both of these examples produce the result shown in Figure 5-7. For the looping construct, however, the result is strongly typed to be a List<KeyValuePair<int, char>>.
Figure 5-7. The key/value pairs for the word LINQ. The Key columnn shows the index of the letters, while the Value column represents the character values at those indices
How It Works
Index returns an IEnumerable<int,T> when used on a collection of type T. This operator might not seem so useful, but it can be. For example, in the next section, you will see how Index can help write generic code to remove consecutive duplicates from a given collection, which is a rather common programming task.
5-9. Using the PairWise Operator
The PairWise operator performs an action over two neighboring/consecutive elements of a given collection and returns an enumerable with the result of all such actions performed across the collection. In other words, if you need to perform an operation on each pair of elements, starting from the leftmost element, then PairWise is the operator you need. I won’t show the looping code to do this, but you’ve probably written similar loops before.
Problem
Remove consecutive repeated characters from a string. If you want to experiment further, see if you can use the same method to remove consecutive identical dates.
Solution
The logic for this can be wrapped inside a LINQ extension, as shown here:
public static class MyLinqEx
{
public static IEnumerable<T> RemoveConsecutiveDuplicates<T>(this IEnumerable<T> input)
where T:IComparable
{
var conditions = input.Pairwise((a,b)=>a.Equals(b));
var dontPickIndices = conditions.Index()
.Where (c => c.Value==true)
.Select(k => k.Key);
return Enumerable.Range(0,input.Count())
.Where (e => !dontPickIndices.Contains(e))
.Select(k => input.ElementAt(k));
}
}
Here is how you can use this operator:
void Main()
{
"LLIIIINNQQ".ToCharArray().RemoveConsecutiveDuplicates().Dump();
}
How It Works
The extension is built around two basic building blocks, Index and PairWise operator. The PairWise operator generates the output in Figure 5-8 by comparing each pair of letters in the input to see whether they’re the same.
Figure 5-8. The state of neighborhood similarity of two characters
Next, the Index operator generates the output shown in Figure 5-9.
Figure 5-9. The indices of neighborhood similarity check result
By selecting the keys for which the value is true, you would get the indices that have the same characters as those of their next consecutive neighbor. Conversely, by avoiding these indices, you get a list of consecutive nonrepeating characters. The variable dontPickIndices holds the true indices.
5-10. The ForEach Operator
The ForEach operator performs an action on all elements of a given sequence. Without this, the result would need to be projected to a strongly typed list (List<T>) to run an operation on each member, which can lead to performance issues when the list is large.
Problem
Perform an operation on each element in an enumerable.
Solution
Use the ForEach operator instead of looping or converting the enumerable to a strongly typed list.
Loop |
MoreLINQ |
int[] numbers = {1,2,8,7,5,6,4,3}; Action<int> ack = a => DateTime.Today.AddDays(a) .DayOfWeek); foreach (var integer in numbers) ack.Invoke(integer); |
int[] numbers = {1,2,8,7,5,6,4,3}; Action<int> ack = a => Console.WriteLine(DateTime.Today.AddDays(a).DayOfWeek); numbers.ForEach(ack); |
How It Works
The ForEach operator takes an action that it then invokes on each member of the collection on which ForEach is being invoked.
5-11. Using the MinBy/MaxBy Operator
The MinBy operator finds the value from the given source collection that results in the minimum value of a given function. The LINQ standard query operator Min finds the minimum value of the generated values from the given collection by the given formula.
Problem
Find the value from a set of values that minimizes the given function. Here’s an example: Assume you have a list of distances from various cities to a zoo stored in an array—and a tiger has escaped from the zoo. The zoo authorities want to notify people in the city closest to a 10 kilometer radius around the zoo.
Solution
Loop |
MoreLINQ |
int[] distances = {23,41,11,34,45}; int x = distances[0]; for(int i = 0;i<distances.Length;i++) if(distances[i]-10<x-10) x = distances[i]; x.Dump(); |
int[] distances = {23,41,11,34,45}; |
The output of the loop code is 11, while the output of the LINQ code is 11 and then 1. The outputs differ because I added a call to the LINQ Min() operator at the end of the LINQ code to help make the difference between these two operators clear. The extra 1 is the result of the Min()call.
How It Works
The LINQ code generates 11 and then 1 as output. The value 11 minimizes the function f(x) = x – 10, denoted by the lambda expression a => a – 10.
The final call to the LINQ operator Min() generates 1 because that’s the minimum value obtained from the projection of all values in the array distance by the function a – 10. By virtue of this internal projection, the values will be {13, 31, 1, 24, 35}. Because 1 is the minimum value in that set, it’s the value that Min() returns.
Similar to MinBy(), MoreLINQ offers another operator called MaxBy(), which finds the value that maximizes a given function.
Summary
There are several other operators available in MoreLINQ. However, the ones covered here are the most interesting, and for the most part, are not easily reproducible by combining the standard LINQ operators. However, I recommend you explore the library documentation for yourself. The project has a very good unit test suite, which can help you figure out what the code does.