Thinking in LINQ: Harnessing the power of functional programming in .NET applications (2014)
Chapter 4. Refactoring with LINQ
When I help my colleagues refactor their loops by using LINQ, they always ask me, “How do you know what LINQ operator to use?” I am sure my colleagues are not alone. This chapter is dedicated to providing detailed examples to help answer that question.
After reading this chapter, you should be able to look at code snippets and know which ones can be replaced with a LINQ query. Think of LINQ operators as similar to Lego blocks. After you know how to use them, you can spot a repetitive pattern in your code and replace it by gluing together LINQ operators, leading to cleaner, more intuitive, and thus more maintainable code. Apart from elegance, there is another good reason to transform good old loops into LINQ queries: once the logic is expressed as a query, you can make it run in parallel on all the cores of the machine by using Parallel LINQ (PLINQ). Parallel queries often run much faster, but remember that LINQ queries aren’t inherently faster than loops; any speedup comes from parallelism, and even that is not guaranteed.
4-1. Replacing Loops by Using LINQ Operators
Looping is a basic construct in programming. When someone learns a new programming language, they have to learn the syntax of how to loop through a collection; otherwise, they can’t do anything useful. C# has four looping constructs: the for loop, the do-while loop, the while loop, and the foreach loop. C# programmers are familiar with these looping constructs. However, except for the simplest loops, it is often very difficult—if not impossible—to discern a loop’s intention simply by looking at it. That’s true even of single loops, let alone nested ones. If you have been programming for a while, you will know what I mean.
Several looping constructs appear more often than others in code. Replacing these repeating looping constructs with standard LINQ operators usually results in shorter, more intuitive code. This section shows how you can replace traditional looping constructs (which sometimes can become ugly quickly) with simpler, smaller, and more intuitive LINQ queries. The biggest advantage of using LINQ over looping constructs is that you get to move the code one step closer to the concept. For example, consider the sentence “Check whether any element in the collection matches a given condition.” A looping construct doesn’t visually reflect the intent of that sentence. But LINQ operators and LINQ queries do.
The recipes in this chapter show looping constructs and the equivalent code using LINQ operators side by side. Each section begins with a LINQ query operator that you can use to simplify the code, followed by the problem statement and a side-by-side comparison of the loop-based and LINQ-based approaches.
A General Strategy to Transform a Loop to a LINQ Query
A loop has three parts: initialization, a condition, and an update of the loop variable. The transformation becomes simpler if you first rewrite each loop, at every level of nesting, as a foreach loop. Then follow this three-step process:
1. Identify the range of the loop.
2. Identify the conditional block.
3. Find the appropriate LINQ operator to replace the conditional block.
You’ll follow this procedure in the following example.
Suppose you have a loop like this:
for (int k = 0; k < numbers.Length; k++)
    if (numbers[k] > threshold)
        goodNumbers.Add(numbers[k]);
In this case, the code loops through an array called numbers, and if the element at a given index is greater than a predefined threshold, it adds that element to goodNumbers.
You could easily rewrite this by using a foreach loop:
foreach (var n in numbers)
    if (n > threshold)
        goodNumbers.Add(n);
You translate the for loop to foreach because doing so gets rid of the temporary index variable k. The next step is to identify the LINQ operator that can help you transform the conditional block. In this case, the code simply applies a filter, so the filter operator Where fits the bill. The range of the loop is the numbers array itself.
Now reorder these statements. This is closer to the equivalent LINQ statement:
goodNumbers.Add(n)
foreach(var n in numbers)
if(n>threshold)
Then do the following:
· Replace goodNumbers.Add(n) with goodNumbers =.
· Replace foreach(var n in numbers) with numbers, the range of the loop.
· Replace if(n > threshold) with .Where(n => n > threshold).ToList();
After making those substitutions, you will have this LINQ statement:
goodNumbers = numbers.Where(n => n > threshold).ToList();
This strategy can be applied at every level of a nested loop that you want to refactor using LINQ.
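The following compilable sketch puts the original for loop and the resulting query side by side. It assumes an int[] input and an int threshold. The class and method names (FilterSketch, WithLoop, WithLinq) are illustrative and not from the original.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FilterSketch
{
    // Before: index variable, bounds check, and explicit Add calls.
    public static List<int> WithLoop(int[] numbers, int threshold)
    {
        var goodNumbers = new List<int>();
        for (int k = 0; k < numbers.Length; k++)
            if (numbers[k] > threshold)
                goodNumbers.Add(numbers[k]);
        return goodNumbers;
    }

    // After: the range is the array itself, the conditional block becomes Where,
    // and the Add calls are replaced by materializing the result with ToList.
    public static List<int> WithLinq(int[] numbers, int threshold) =>
        numbers.Where(n => n > threshold).ToList();
}
```

For the input { 5, 60, 12, 90 } and a threshold of 10, both methods return 60, 12, and 90 in their original order. The range has to be the values themselves. Starting from Enumerable.Range(0, numbers.Length) would filter the indexes rather than the numbers.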
4-2. The Any Operator
The Any operator returns true if there is at least a single element in a collection that matches a given condition.
Problem
Find out whether any number in a collection is greater than 150.
Solution
Use the Any operator to replace a for loop.
How It Works
The loop version uses a conditional statement in each iteration to test each item in the nums collection, and you have to read the code carefully to discover that intent. In contrast, the refactored LINQ code makes it immediately obvious that the code checks whether any value in the collection is greater than 150.
Loop (Imperative Paradigm):
int[] nums = {14,21,24,51,131,1,11,54};
LINQ (Functional Paradigm):
int[] nums = {14,21,24,51,131,1,11,54};
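The loop body for this recipe did not survive, so here is a hypothetical sketch of both versions. It assumes the loop returns as soon as it finds a match. AnySketch and its method names are illustrative.

```csharp
using System.Linq;

public static class AnySketch
{
    // Loop: test each item and return on the first match.
    public static bool AnyGreaterThanLoop(int[] nums, int limit)
    {
        for (int i = 0; i < nums.Length; i++)
            if (nums[i] > limit)
                return true;
        return false;
    }

    // LINQ: Any also stops enumerating at the first element that matches.
    public static bool AnyGreaterThanLinq(int[] nums, int limit) =>
        nums.Any(n => n > limit);
}
```

With the nums array above, both methods return false for a limit of 150, because the largest value is 131.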
4-3. The All Operator
The All operator is useful when you want to check whether all elements in a collection match a given condition.
Problem
Determine whether all elements in a collection are less than 150.
Solution
Use the All operator to replace the for loop.
How It Works
The loop version uses a conditional statement in each iteration to test each item in the nums collection, and you have to read the code carefully to discover that intent. In contrast, the refactored LINQ code makes it immediately obvious that the code checks whether all values in the collection are less than 150.
Loop:
int[] nums = {14,21,24,51,131,1,11,54};
LINQ:
int[] nums = {14,21,24,51,131,1,11,54};
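Here is a similar hypothetical sketch for All. The loop returns false at the first counterexample, and All short-circuits in the same way. The names are illustrative.

```csharp
using System.Linq;

public static class AllSketch
{
    // Loop: a single counterexample is enough to answer false.
    public static bool AllLessThanLoop(int[] nums, int limit)
    {
        foreach (int n in nums)
            if (n >= limit)
                return false;
        return true;
    }

    // LINQ: All stops at the first element that fails the predicate.
    public static bool AllLessThanLinq(int[] nums, int limit) =>
        nums.All(n => n < limit);
}
```

Both methods return true for the nums array with a limit of 150. Both also return true for an empty array, because no element violates the condition.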
4-4. The Take Operator
The Take operator returns the specified number of elements from the start of a collection.
Problem
Extract the first four elements.
Solution
Use the Take operator to replace a for loop that iterates over the first four elements.
How It Works
The for loop copies the first four elements into a different array. The LINQ version, by contrast, makes it evident that we want to extract the first four elements.
Loop:
int[] nums = {14,21,24,51,131,1,11,54};
LINQ:
int[] nums = {14,21,24,51,131,1,11,54};
int[] first4 = nums.Take(4).ToArray();
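The sketch below places a copying loop next to Take. The names are hypothetical, since the original loop is not shown. It also highlights a practical difference: the two versions behave differently when the source is shorter than the requested count.

```csharp
using System.Linq;

public static class TakeSketch
{
    // Loop: allocate the target array and copy element by element.
    // Throws IndexOutOfRangeException if nums has fewer than count items.
    public static int[] FirstLoop(int[] nums, int count)
    {
        int[] result = new int[count];
        for (int i = 0; i < count; i++)
            result[i] = nums[i];
        return result;
    }

    // LINQ: Take simply returns fewer elements when the source is shorter.
    public static int[] FirstLinq(int[] nums, int count) =>
        nums.Take(count).ToArray();
}
```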
4-5. The Skip Operator
The Skip operator returns all but the first k elements of a collection.
Problem
Pick all elements of a given integer array except the first four elements.
Solution
Use the Skip operator to replace a for loop.
How It Works
The for loop uses two loop counters to keep track of elements being iterated. However, the LINQ implementation reads like plain English. This also eliminates the need to maintain looping counters.
Loop:
int[] nums = {14,21,24,51,131,1,11,54};
LINQ:
int[] nums = {14,21,24,51,131,1,11,54};
int[] skip4 = nums.Skip(4).ToArray();
skip4.Dump();
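Here is a sketch of the two-counter loop described above, next to Skip. The names are hypothetical, since the original loop is not shown.

```csharp
using System.Linq;

public static class SkipSketch
{
    // Loop: i walks the source from position count, j walks the target from 0.
    // Fails if count is larger than nums.Length (negative array size).
    public static int[] SkipLoop(int[] nums, int count)
    {
        int[] result = new int[nums.Length - count];
        for (int i = count, j = 0; i < nums.Length; i++, j++)
            result[j] = nums[i];
        return result;
    }

    // LINQ: no counters; Skip yields nothing if count exceeds the length.
    public static int[] SkipLinq(int[] nums, int count) =>
        nums.Skip(count).ToArray();
}
```

For the nums array, both return 131, 1, 11, 54.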
4-6. The TakeWhile Operator
The TakeWhile operator enables you to take elements from a collection as long as a given condition is true.
Problem
Pick elements from the start of an unsorted integer array as long as the given condition (the number is less than 50, in this case) is true.
Solution
Use the TakeWhile operator to replace the for loop and branching statement.
How It Works
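Think of TakeWhile as shorthand for a loop that breaks out as soon as the condition of its nested if statement fails. Here that condition is expressed in the lambda expression n => n < 50. The final call to ToList() returns the leading run of integers less than 50. TakeWhile stops at the first element that fails the test (51), so the later values 1 and 11 are not included, even though they are less than 50.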
Loop:
int[] nums = {14,21,24,51,131,1,11,54};
LINQ:
int[] nums = {14,21,24,51,131,1,11,54};
List<int> until50 = nums.TakeWhile(n => n < 50).ToList();
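This hypothetical sketch shows the loop shape that TakeWhile replaces: collect items, then break at the first failure. The names are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TakeWhileSketch
{
    // Loop: collect items until the first one that fails, then stop.
    public static List<int> LeadingLessThanLoop(int[] nums, int limit)
    {
        var result = new List<int>();
        foreach (int n in nums)
        {
            if (n >= limit)
                break;
            result.Add(n);
        }
        return result;
    }

    // LINQ: TakeWhile expresses the same "stop at the first failure" rule.
    public static List<int> LeadingLessThanLinq(int[] nums, int limit) =>
        nums.TakeWhile(n => n < limit).ToList();
}
```

For the nums array and a limit of 50, both return 14, 21, 24. By contrast, nums.Where(n => n < 50) would also return 1 and 11.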
4-7. The SkipWhile Operator
SkipWhile skips elements as long as a given condition is true. As soon as the condition becomes false for an element, the operator returns that element and all remaining elements, without testing the condition again.
Problem
Skip elements from the start of a given integer array as long as they are evenly divisible by 7, and pick all the remaining elements.
Solution
Use the SkipWhile operator to replace a for loop and branching.
How It Works
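The condition inside the loop becomes the lambda expression n => n % 7 == 0. Note that SkipWhile is not a filter. Once it reaches the first element that is not divisible by 7 (24), it returns every remaining element, including any later multiples of 7. To drop every multiple of 7 wherever it appears, use Where(n => n % 7 != 0) instead.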
Loop:
int[] nums = {14,21,24,51,131,1,11,54};
LINQ:
int[] nums = {14,21,24,51,131,1,11,54};
List<int> skipWhileDivisibleBy7 = nums.SkipWhile(n => n % 7 == 0).ToList();
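Here is a hypothetical loop equivalent of SkipWhile. A flag records whether the leading run is still being skipped. The names are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SkipWhileSketch
{
    // Loop: a flag remembers whether we are still inside the leading run.
    public static List<int> SkipLeadingMultiplesOf7Loop(int[] nums)
    {
        var result = new List<int>();
        bool skipping = true;
        foreach (int n in nums)
        {
            if (skipping && n % 7 == 0)
                continue;      // still in the leading run of multiples of 7
            skipping = false;  // the condition failed once: keep everything from here on
            result.Add(n);
        }
        return result;
    }

    public static List<int> SkipLeadingMultiplesOf7Linq(int[] nums) =>
        nums.SkipWhile(n => n % 7 == 0).ToList();
}
```

With the input { 7, 3, 14 }, both return 3 and 14. The 14 is kept because skipping already stopped at 3.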
In the next chapter, you will learn about the TakeUntil() and SkipUntil() operators available in the MoreLINQ project. They are mirrors of the TakeWhile() and SkipWhile() operators.
4-8. The Where Operator
The Where operator finds elements that match a given condition. Think of it as the looping and branching construct all in one. This is one of the most used operators.
Problem
Pick all elements of a given integer array that are greater than 50.
Solution
Use the Where operator to replace a for loop and branching inside the loop.
How It Works
The for loop uses two loop counters to keep track of elements being iterated. However, the LINQ implementation reads like plain English. This eliminates the need to maintain looping counters, and the intent of the code becomes immediately evident.
Loop:
int[] nums = {14,21,24,51,131,1,11,54};
LINQ:
int[] nums = {14,21,24,51,131,1,11,54};
int[] above50 = nums.Where(n => n > 50).ToArray();
4-9. The Zip Operator
The Zip operator applies a specified function to the corresponding elements of two sequences to generate a result sequence.
Problem
Print the full names of all family members, including the salutation, first name, and last name.
Solution
Use the Zip operator to replace a for loop.
How It Works
The for loop uses a loop counter to keep track of the current index and combines the values from both arrays (salutation and name, in this case) for as long as there are salutations. The LINQ statement does the same thing, and it stops at the end of the shorter sequence, so it cannot run past the end of either array. The lambda function (salutation, name) => salutation + " " + name + " Smith" does the work of concatenating the parts of each person’s name.
Loop:
string[] salutations = { … };
string[] names = {"Patrick","Nancy","Jon","Jane"};
List<string> allNames = new List<string>();
for (int i = 0; i < salutations.Length; i++)
    allNames.Add(salutations[i] + " " + …
LINQ:
string[] salutations = { … };
string[] names = {"Patrick","Nancy","Jon","Jane"};
salutations.Zip(names, (salutation, name) => salutation + " " + name + " Smith").Dump();
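The salutation values are not shown, so this sketch takes both arrays and the last name as parameters. The method names and the sample values in the usage note are hypothetical. The loop bounds itself by the shorter array, which Zip does implicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipSketch
{
    // Loop: one index walks both arrays; Math.Min avoids running past the shorter one.
    public static List<string> FullNamesLoop(string[] salutations, string[] names, string lastName)
    {
        var allNames = new List<string>();
        int count = Math.Min(salutations.Length, names.Length);
        for (int i = 0; i < count; i++)
            allNames.Add(salutations[i] + " " + names[i] + " " + lastName);
        return allNames;
    }

    // LINQ: Zip pairs elements by position and stops at the end of the shorter sequence.
    public static List<string> FullNamesLinq(string[] salutations, string[] names, string lastName) =>
        salutations.Zip(names, (salutation, name) => salutation + " " + name + " " + lastName).ToList();
}
```

For example, FullNamesLinq(new[] { "Mr.", "Ms." }, new[] { "Patrick", "Nancy", "Jon" }, "Smith") returns "Mr. Patrick Smith" and "Ms. Nancy Smith". "Jon" is dropped because there is no third salutation.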
4-10. OrderBy and OrderByDescending Operators
Sorting shouldn’t hurt. Use OrderBy and OrderByDescending to sort in ascending and descending order, respectively.
Problem
Sort an array of strings based on their length.
Solution
Use the OrderBy operator instead of a custom comparer.
How It Works
The default sorting for string values is alphabetical. So to sort strings by their lengths with a method such as Array.Sort, the loop-based approach implements a custom comparer. With LINQ, there is no need to create a custom comparer. The key to use for sorting is passed in the form of a lambda expression: in this case, item => item.Length.
Loop:
public class StringLengthComparer : …
void Main()
LINQ:
string[] codes = {"abc","bc","a","d","abcd"};
List<string> codesAsList = codes.OrderBy(item => item.Length).ToList();
To sort the strings in reverse order of their lengths, just change the comparer logic from x.Length.CompareTo(y.Length) to y.Length.CompareTo(x.Length). In the LINQ version, using OrderByDescending() will do the trick.
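The comparer-based version is only partly recoverable, so here is one plausible shape for it. It assumes the comparer implements IComparer<string> using the x.Length.CompareTo(y.Length) logic mentioned above. The OrderBySketch names are hypothetical. One design difference matters here: Array.Sort performs an unstable sort, while OrderBy and OrderByDescending are stable. As a result, strings of equal length are guaranteed to keep their original relative order only in the LINQ versions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compares strings by length only.
public class StringLengthComparer : IComparer<string>
{
    public int Compare(string x, string y) => x.Length.CompareTo(y.Length);
}

public static class OrderBySketch
{
    // Array.Sort sorts in place, so sort a copy to leave the input untouched.
    public static string[] SortWithComparer(string[] codes)
    {
        var copy = (string[])codes.Clone();
        Array.Sort(copy, new StringLengthComparer());
        return copy;
    }

    public static List<string> SortWithLinq(string[] codes) =>
        codes.OrderBy(item => item.Length).ToList();

    public static List<string> SortDescendingWithLinq(string[] codes) =>
        codes.OrderByDescending(item => item.Length).ToList();
}
```

For the codes array, SortWithLinq returns a, d, bc, abc, abcd, and SortDescendingWithLinq returns abcd, abc, bc, a, d.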
4-11. The Distinct Operator
The Distinct operator finds unique elements from a given collection.
Problem
Find the unique names in a given list of names.
Solution
Use the Distinct operator.
How It Works
The Distinct operator has two overloaded versions. The first one uses the default comparer for the type of the collection. The second one expects a custom comparer. In the next chapter, you will learn about the DistinctBy() operator, which lets you pass a lambda expression instead of a comparer.
Loop:
string[] names = {"Sam","David","Sam","Eric","Daniel","Sam"};
LINQ:
string[] names = {"Sam","David","Sam","Eric","Daniel","Sam"};
List<string> distinctNames = names.Distinct().ToList();
distinctNames.Dump("Unique names");
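The original loop is not shown, so this sketch uses a HashSet<string> to remember names already seen. That is a swapped-in approach, not necessarily the author's. The sketch also shows the second Distinct overload, with StringComparer.OrdinalIgnoreCase as the equality comparer. The names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctSketch
{
    // Loop: HashSet<T>.Add returns false when the item is already present.
    public static List<string> UniqueLoop(string[] names)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (string name in names)
            if (seen.Add(name))
                result.Add(name);
        return result;
    }

    // LINQ, default equality comparer (case-sensitive for strings).
    public static List<string> UniqueLinq(string[] names) =>
        names.Distinct().ToList();

    // LINQ, custom equality comparer: "sam" and "Sam" count as the same name.
    public static List<string> UniqueIgnoreCase(string[] names) =>
        names.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
}
```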
4-12. The Union Operator
The Union operator finds the union of two given collections.
Problem
Find the union of a couple of string arrays.
Solution
Use the Union operator.
How It Works
The Union operator has two overloaded versions. The first one uses the default comparer for the type of the collection. The second one expects a custom comparer. For the current example, the default comparer is fine. However, if you need some other custom comparer logic to determine uniqueness, you have to implement a custom comparer.
static void Main(string[] args)
{
string[] names1 = { "Sam", "David", "Sam", "Eric", "Daniel", "Sam" };
string[] names2 = { "David", "Eric", "Samuel" };
string[] names = new string[names1.Length + names2.Length];
for (int i = 0; i < names1.Length; i++)
names[i] = names1[i];
for (int i = 0, j = names1.Length; i < names2.Length; i++, j++)
names[j] = names2[i];
List<string> unionNames = new List<string>();
Array.Sort(names);
for (int i = 0; i < names.Length - 1 ; i++)
{
if (names[i] != names[i + 1])
{
if (unionNames.Count > 0)
{
if (unionNames[unionNames.Count - 1] != names[i])
unionNames.Add(names[i]);
}
else
unionNames.Add(names[i]);
}
else
{
if (unionNames.Count == 0 || unionNames[unionNames.Count - 1] != names[i])
unionNames.Add(names[i]);
}
}
if (names[names.Length - 1] != names[names.Length - 2])
unionNames.Add(names[names.Length - 1]);
}
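The preceding implementation is a straightforward loop-based approach. You could also use a Dictionary to perform the union operation, storing the elements as the keys of the dictionary and later producing a list of all keys. But you can skip all that and let LINQ handle it by using the LINQ operator Union(). Note that Union() returns the distinct elements in the order they are first encountered (names1 first, then names2), whereas the loop above returns them sorted: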
unionNames = names1.Union(names2).ToList();
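The text above mentions a Dictionary-based alternative. This sketch uses a HashSet<string> instead, which fits a keys-only use case. The UnionSketch names are hypothetical. Like Union(), the sketch keeps each name at the position where it is first seen.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UnionSketch
{
    // Loop: walk both arrays and keep each name the first time it is seen.
    public static List<string> UnionLoop(string[] first, string[] second)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (string name in first)
            if (seen.Add(name)) result.Add(name);
        foreach (string name in second)
            if (seen.Add(name)) result.Add(name);
        return result;
    }

    public static List<string> UnionLinq(string[] first, string[] second) =>
        first.Union(second).ToList();
}
```

With names1 and names2 from the listing above, both methods return Sam, David, Eric, Daniel, Samuel.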
4-13. The Intersect Operator
The Intersect operator finds the intersection of two collections.
Problem
Find the intersection of a couple of string arrays.
Solution
Use the Intersect operator.
How It Works
The Intersect operator has two overloaded versions. The first one uses the default comparer for the type of the collection. The second one expects a custom comparer. For the current example, the default comparer is fine. However, if you need some other custom comparer logic to determine uniqueness, you have to implement a custom comparer.
Loop:
string[] names1 = {"Sam","David","Sam","Eric","Daniel","Sam"};
LINQ:
string[] names1 = {"Sam","David","Sam","Eric","Daniel","Sam"};
commonNames = names1.Intersect(names2).ToList();
commonNames.Dump();
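names2 is not shown for this recipe. The sketch therefore assumes the names2 array from the Union recipe, { "David", "Eric", "Samuel" }. The method names are hypothetical. The loop mirrors what Intersect does: it yields each name from the first array that also occurs in the second, exactly once, in the order of the first array.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IntersectSketch
{
    public static List<string> CommonLoop(string[] first, string[] second)
    {
        var inSecond = new HashSet<string>(second);
        var result = new List<string>();
        foreach (string name in first)
            // Remove returns true only the first time, so each common name is added once.
            if (inSecond.Remove(name))
                result.Add(name);
        return result;
    }

    public static List<string> CommonLinq(string[] first, string[] second) =>
        first.Intersect(second).ToList();
}
```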
4-14. The Except Operator
The Except operator returns the elements of the first collection that do not appear in the second (the set difference).
Problem
Find the names that appear in one collection but not in another.
Solution
Use the Except operator.
How It Works
The Except operator has two overloaded versions. The first one uses the default comparer for the type of the collection. The second one expects a custom comparer. For the current example, the default comparer is fine. However, if you need some other custom comparer logic to determine uniqueness, you have to implement a custom equality comparer.
Loop:
string[] names1 = {"Sam","David","Eric","Daniel"};
LINQ:
string[] names1 = {"Sam","David","Eric","Daniel"};
exclusiveNames = names1.Except(names2).ToList();
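This sketch again assumes names2 = { "David", "Eric", "Samuel" }, since names2 is not shown in the recipe. It contrasts Except, which works in one direction only, with the symmetric difference obtained by combining two Except calls with Union. The names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExceptSketch
{
    // Loop: keep names from first that never occur in second (each name once).
    public static List<string> OnlyInFirstLoop(string[] first, string[] second)
    {
        var excluded = new HashSet<string>(second);
        var result = new List<string>();
        foreach (string name in first)
            if (excluded.Add(name))   // false if in second or already added
                result.Add(name);
        return result;
    }

    public static List<string> OnlyInFirstLinq(string[] first, string[] second) =>
        first.Except(second).ToList();

    // Names that appear in exactly one of the two arrays.
    public static List<string> InExactlyOne(string[] first, string[] second) =>
        first.Except(second).Union(second.Except(first)).ToList();
}
```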
4-15. The Concat Operator
The Concat operator concatenates two sequences together, back to back.
Problem
Generate a list of all names (including duplicates, if any) by concatenating two lists of names.
Solution
Use the Concat operator.
How It Works
Concat is useful because it saves you from having to keep track of array sizes and indexes. Using Concat(), you avoid off-by-one errors.
Loop:
string[] names1 = {"Sam","David","Erik","Daniel"};
LINQ:
string[] names1 = {"Sam","David","Erik","Daniel"};
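names2 is not shown here, so this sketch takes both arrays as parameters. The names are hypothetical. The loop has to size the target array and offset the second index, which is where off-by-one errors tend to creep in.

```csharp
using System.Linq;

public static class ConcatSketch
{
    public static string[] AllNamesLoop(string[] first, string[] second)
    {
        var all = new string[first.Length + second.Length];
        for (int i = 0; i < first.Length; i++)
            all[i] = first[i];
        for (int i = 0; i < second.Length; i++)
            all[first.Length + i] = second[i];   // offset by the length of first
        return all;
    }

    // LINQ: duplicates are kept, and no index arithmetic is needed.
    public static string[] AllNamesLinq(string[] first, string[] second) =>
        first.Concat(second).ToArray();
}
```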
4-16. The SequenceEqual Operator
The SequenceEqual operator checks whether two sequences have the same length and equal elements at each position, comparing them in order starting from index 0.
Problem
Check whether two integer arrays are equal.
Solution
Use the SequenceEqual operator.
How It Works
The SequenceEqual operator has two overloaded versions. The first one uses the default equality comparer for the type of the collection. The second one expects a custom equality comparer. For the current example, the default comparer is fine. However, if you need custom logic to decide whether two elements are equal, you have to implement a custom equality comparer.
Loop:
public bool IsSequenceEqual(int[] first, int[] second)
LINQ:
int[] codes = {343,2132,12,32143,234};
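Only the signature of the loop version survives, so this is a sketch of one way to fill in IsSequenceEqual, next to the LINQ call. The class name and the test data are assumptions.

```csharp
using System.Linq;

public static class SequenceEqualSketch
{
    public static bool IsSequenceEqual(int[] first, int[] second)
    {
        if (first.Length != second.Length)
            return false;                 // different lengths can never be equal
        for (int i = 0; i < first.Length; i++)
            if (first[i] != second[i])
                return false;             // same position, different value
        return true;
    }

    public static bool IsSequenceEqualLinq(int[] first, int[] second) =>
        first.SequenceEqual(second);
}
```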
A different situation arises when we need to check whether every element of a source collection is present in another collection, regardless of the order in which they occur. SequenceEqual() works only when the elements in both collections appear in the same order. One solution is to apply OrderBy() to both sequences and then call SequenceEqual(). However, the following approach, using the All() operator, solves the problem without sorting. Keep in mind that it checks containment in one direction only and ignores how many times each element occurs.
Loop:
int[] codes = {343,2132,12,32143,234};
LINQ:
int[] codes = {343,2132,12,32143,234};
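Here is a sketch of the two order-insensitive checks discussed above. The names are hypothetical. ContainsAll is the All() plus Contains() approach: it checks containment in one direction only and ignores duplicates. SameElements is the OrderBy() plus SequenceEqual() alternative, which does compare counts.

```csharp
using System.Linq;

public static class UnorderedSketch
{
    // True if every element of source occurs somewhere in other (O(n*m) for arrays).
    public static bool ContainsAll(int[] source, int[] other) =>
        source.All(x => other.Contains(x));

    // True if both arrays hold the same values with the same counts, in any order.
    public static bool SameElements(int[] first, int[] second) =>
        first.OrderBy(x => x).SequenceEqual(second.OrderBy(x => x));
}
```

For example, ContainsAll(new[] { 1, 1 }, new[] { 1, 2 }) is true, while SameElements for the same arrays is false.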
4-17. The OfType Operator
The OfType operator returns only the elements of a given type (including types derived from it) from a collection that holds elements of several types.
Problem
Extract only the string values from an object array that has other types of elements apart from strings.
Solution
Use the OfType operator instead of looping and branching.
How It Works
OfType can be used for sanity checking. For example, let’s say you have an object array that is meant to hold only strings. Before doing anything with the content of the array, it is good to verify that all of its elements actually are strings. To do so, it is enough to check whether the number of elements returned by OfType<string>() equals the length of the array.
Loop:
object[] things = {"Sam",1,DateTime.Today,"Eric"};
foreach (var v in things)
    if (v.GetType() == typeof(string))
        v.Dump();
LINQ:
object[] things = {"Sam",1,DateTime.Today,"Eric"};
things.OfType<string>().Dump();
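The sanity check described above fits in one line. Here is a sketch with a hypothetical helper name.

```csharp
using System.Linq;

public static class OfTypeSketch
{
    // True only if every element is a string. OfType<string>() skips null
    // entries as well as non-strings, so an array containing null fails too.
    public static bool ContainsOnlyStrings(object[] things) =>
        things.OfType<string>().Count() == things.Length;
}
```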
4-18. The Cast Operator
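Safe casting isn’t hard and shouldn’t hurt. The Cast<T>() operator converts a loosely typed collection (such as an object array or a non-generic IEnumerable) to a strongly typed IEnumerable<T>. Be aware that Cast<T>() throws an InvalidCastException when it reaches an element that is not a T, so filter out incompatible elements first, or use OfType<T>(), which skips them.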
Problem
Create a strongly typed collection from a loosely typed one.
Solution
Use the Cast operator.
How It Works
In the following code snippet, the LINQ code creates an IEnumerable<string> from an object array. Where() keeps only the elements that are strings, and Cast<string>() then converts the filtered IEnumerable<object> to an IEnumerable<string>.
Loop:
object[] things = {"Sam","Dave","Greg","Travis","Dan",2};
List<string> allStrings = new List<string>();
foreach (var v in things)
{
    string z = v as string;
    if (z != null)
        allStrings.Add(z);
}
LINQ:
object[] things = {"Sam","Dave","Greg","Travis","Dan",2};
things.Where(t => t is string)
      .Cast<string>()
      .Dump();
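This sketch contrasts Cast<T>() and OfType<T>() on the things array. The helper names are hypothetical. Cast<T>() is evaluated lazily, so the InvalidCastException only surfaces when the sequence is enumerated, here by ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CastSketch
{
    // OfType filters: non-strings are silently dropped.
    public static List<string> StringsOnly(object[] things) =>
        things.OfType<string>().ToList();

    // Cast converts every element and throws on the first one that is not a string.
    public static bool TryCastAll(object[] things, out List<string> strings)
    {
        try
        {
            strings = things.Cast<string>().ToList();
            return true;
        }
        catch (InvalidCastException)
        {
            strings = new List<string>();
            return false;
        }
    }
}
```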
4-19. The Aggregate Operator
The Aggregate operator combines the elements of a collection into a single value by repeatedly applying a provided accumulator function. Joining strings is one common use.
Problem
Create a comma-separated list using the names given in a string array.
Solution
Use the Aggregate operator.
How It Works
This works the same way as the comma-quibbling problem code in Chapter 3. The lambda function (f,s) => f + "," + s is used to generate the comma-separated list: it appends a comma and the next name to the text accumulated so far.
Loop:
string[] names = {"Greg","Travis","Dan"};
for (int k = 0; k < names.Length - 1; k++)
    Console.Write(names[k] + ",");
//Printing the last name separately avoids a trailing comma
Console.Write(names[names.Length - 1]);
LINQ:
string[] names = {"Greg","Travis","Dan"};
names.Aggregate((f,s) => f + "," + s).Dump();
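This sketch compares the Aggregate call with string.Join, a framework method swapped in here for comparison. The helper names are hypothetical. Aggregate without a seed throws InvalidOperationException for an empty array, whereas string.Join returns an empty string.

```csharp
using System.Linq;

public static class AggregateSketch
{
    // Folds left to right: (("Greg" + "," + "Travis") + "," + "Dan").
    public static string JoinWithAggregate(string[] names) =>
        names.Aggregate((f, s) => f + "," + s);

    // Same output for non-empty input, and "" for an empty array.
    public static string JoinWithStringJoin(string[] names) =>
        string.Join(",", names);
}
```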
So far, you have seen how to use several LINQ operators to replace traditional loop-based logic, leading to cleaner, more intuitive code. The remaining sections show how to replace nested loops and how to run queries in parallel. The next chapter shows how operators from a community LINQ project (MoreLINQ) can be used to refactor loops.
4-20. Replacing Nested Loops
Be warned! Replacing nested loops with LINQ standard query operators makes the code look flat, but the computational complexity doesn’t change. The point is that by using LINQ operators, the code reads more intuitively.
The most common form of nested loops is one loop inside another. The strategy for replacing them with LINQ is to use a projection with SelectMany().
The SelectMany Operator
If we want to collect all the characters of all the words in a given array, we can use nested loops, or we can replace the nested loops with SelectMany(), as shown next. Although this example is trivial, it is deliberately chosen so that you can relate it to one of your own one-to-many situations. You can also use this operator to flatten dictionary-like collections.
string[] words = {"dog", "elephant", "fox", "bear"};
List<char> allChars = new List<char>();
foreach(string word in words)
{
allChars.AddRange(word.ToCharArray());
}
words.SelectMany (w => w.ToCharArray()).Dump();
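Flattening a dictionary-like collection can be sketched as follows. The sketch assumes a hypothetical Dictionary<string, List<string>> that maps animals to tags. The SelectMany overload with a result selector sees both the outer entry and each inner item, which replaces the inner loop variable.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SelectManySketch
{
    // Nested loops: outer over entries, inner over each entry's list.
    public static List<string> FlattenLoop(Dictionary<string, List<string>> tags)
    {
        var result = new List<string>();
        foreach (var entry in tags)
            foreach (string tag in entry.Value)
                result.Add(entry.Key + ":" + tag);
        return result;
    }

    // SelectMany: collection selector plus result selector.
    public static List<string> FlattenLinq(Dictionary<string, List<string>> tags) =>
        tags.SelectMany(entry => entry.Value, (entry, tag) => entry.Key + ":" + tag).ToList();
}
```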
Removing Nested Loops by Using SelectMany
Let’s say we have the following nested loop, which adds every pair of integers from 0 to 9:
List<int> fromLoop = new List<int>();
for(int i = 0;i<10;i++)
for(int j = 0 ; j < 10 ; j ++ )
fromLoop.Add( i + j);
Here is the same loop implemented by using LINQ operators:
int[] initialValues = Enumerable.Range(0,10).ToArray();
List<int> fromLINQ = Enumerable.Range(0,10)
.SelectMany (e => initialValues.Select (v => v + e )).ToList();
//Finally check whether you have the same values or not.
fromLoop.SequenceEqual(fromLINQ).Dump();
This returns true as both the sequences are equal.
Replacing If-Else Blocks Inside a Loop
The philosophy behind replacing a loop-if-else-end-loop block with a set of LINQ statements is that flat is better than nested. Let’s say you have a loop like this:
for(int i = 0;i<4; i++)
{
if (i%2==0)
{
someThings.Insert(0,i);
}
else if((2*i+1)%2==0)
{
someThings.Add(i);
}
else //everything else falls here
{
someThings.Add(i);
someThings.Add(i+1);
}
}
This can be replaced with the following three LINQ statements. Enumerable has no ForEach method, so each filtered range is first materialized with ToList() and then List<T>.ForEach is called:
List<int> someThings = new List<int>();
Enumerable.Range(0,4).Where(i => i%2==0).ToList().ForEach(a => someThings.Insert(0,a));
Enumerable.Range(0,4).Where(i => (2*i+1)%2==0).ToList().ForEach(a => someThings.Add(a));
Enumerable.Range(0,4).Where(i => (2*i+1)%2!=0 && i%2!=0).ToList().ForEach(a =>
someThings.AddRange(new int[]{a,a + 1}));
The strategy for this approach is simple and can be summarized in three steps:
· Range (using the Range operator)
· Filter (using the Where operator)
· Perform the action (using the List<T>.ForEach method)
The idea is to split the loop into separate Range-Filter-Action blocks and give each block a single responsibility. This way, it will be simpler to refactor when needed.
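Here is a runnable version of the loop and its three-statement replacement. The method names are hypothetical, and each filtered range is materialized with ToList() before List<T>.ForEach is called. The two versions agree here for two reasons: even values only ever go to the front, and the only branch that appends (the second branch never runs) appends in loop order. In general, running branches as separate passes changes the interleaving of their side effects, so check that the final result does not depend on it.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IfElseSketch
{
    public static List<int> WithLoop()
    {
        var someThings = new List<int>();
        for (int i = 0; i < 4; i++)
        {
            if (i % 2 == 0)
                someThings.Insert(0, i);
            else if ((2 * i + 1) % 2 == 0)   // never true: 2*i+1 is always odd
                someThings.Add(i);
            else
                someThings.AddRange(new[] { i, i + 1 });
        }
        return someThings;
    }

    public static List<int> WithLinq()
    {
        var someThings = new List<int>();
        Enumerable.Range(0, 4).Where(i => i % 2 == 0).ToList()
            .ForEach(a => someThings.Insert(0, a));
        Enumerable.Range(0, 4).Where(i => (2 * i + 1) % 2 == 0).ToList()
            .ForEach(a => someThings.Add(a));
        Enumerable.Range(0, 4).Where(i => (2 * i + 1) % 2 != 0 && i % 2 != 0).ToList()
            .ForEach(a => someThings.AddRange(new[] { a, a + 1 }));
        return someThings;
    }
}
```

Both methods produce 2, 0, 1, 2, 3, 4.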
4-21. Running Code in Parallel Using AsParallel() and AsOrdered() Operators
Making use of all your computing power is simple with LINQ. By using the AsParallel() operator, you can “automagically” run a query on multiple cores, which often makes it faster. But be warned: plugging in AsParallel() doesn’t guarantee faster execution. Partitioning the work and merging the results has a cost, so for small inputs or cheap per-element work, the parallel version can take longer than the sequential one. AsParallel() splits the input data into multiple partitions, so the order of the elements in the result is not guaranteed to match the input. If you care about the order of the elements in the result, plug in AsOrdered() right after the AsParallel() call.
Problem
Create a program that finds all the prime numbers from 1 to 10,000 as fast as possible.
Solution
The solution is as follows:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace RefactoringWithAsParallel
{
class Program
{
static void Main(string[] args)
{
Stopwatch w = new Stopwatch();
w.Start();
List<int> Qs = new List<int>();
List<int> Qsp = new List<int>();
for (int i = 0; i < 2; i++)
Qs = Enumerable.Range(2, 9999).Where(d => Enumerable.Range(2, d / 2 - 1)
.All(e => d % e != 0)).ToList();
w.Stop();
double timeWithoutParallelization = w.Elapsed.TotalMilliseconds;
Stopwatch w2 = new Stopwatch();
w2.Start();
for (int i = 0; i < 2; i++)
Qsp = Enumerable.Range(2, 9999).AsParallel().Where(d =>
Enumerable.Range(2, d / 2 - 1)
.All(e => d % e != 0)).ToList();
w2.Stop();
double timeWithParallelization = w2.Elapsed.TotalMilliseconds;
double percentageGainInPerformance = (timeWithoutParallelization -
timeWithParallelization) /
timeWithoutParallelization;
bool isSame = Qs.SequenceEqual(Qsp);
}
}
}
How It Works
Although the algorithm used to check whether a number is prime is naïve (the query starts at 2, the smallest prime, and tries every divisor from 2 to d/2), that’s not the point. The point is that adding AsParallel() can make the code faster. I recommend that you run the program multiple times and check the value of percentageGainInPerformance (it holds a fraction, so 0.3 means a 30% gain). For me, that value was roughly between 29% and 45%. However, isSame will most likely be false, because without AsOrdered(), PLINQ does not guarantee that the results come out in the order of the input. If you want to guarantee the order, add AsOrdered() right after AsParallel(), as shown next.
Qsp = Enumerable.Range(2, 9999).AsParallel().AsOrdered().Where(d => Enumerable.Range(2, d / 2 - 1)
.All(e => d % e != 0)).ToList();
Note that adding AsOrdered() reduces the performance gain a little. That is intuitive: after the results are computed, PLINQ has to put them back in the order of the elements in the input collection.
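Here is a smaller sketch of the ordering behavior, without the timing code. It uses the naive trial-division test that starts at 2 and checks divisors from 2 to d/2. The PrimeSketch names are hypothetical. With AsOrdered(), the parallel result can be compared directly with the sequential one. Without it, sorting the result before comparing is the safe option.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PrimeSketch
{
    // Naive trial division: d (>= 2) is prime if no e in [2, d/2] divides it.
    static bool IsPrime(int d) => Enumerable.Range(2, d / 2 - 1).All(e => d % e != 0);

    public static List<int> Sequential(int max) =>
        Enumerable.Range(2, max - 1).Where(IsPrime).ToList();

    public static List<int> ParallelOrdered(int max) =>
        Enumerable.Range(2, max - 1).AsParallel().AsOrdered().Where(IsPrime).ToList();

    public static List<int> ParallelUnordered(int max) =>
        Enumerable.Range(2, max - 1).AsParallel().Where(IsPrime).ToList();
}
```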
Summary
This chapter provided some strategies for refactoring loops with LINQ queries, resulting in cleaner, more intuitive code. You can make a query run in parallel just by calling the AsParallel() operator on the collection, and you can keep the results in input order by calling AsOrdered() if the order is important to you. A later chapter takes this concept further, exploring how LINQ can improve readability, maintainability, and even code performance by implementing embedded domain-specific languages (DSLs) for several practical purposes.