Static Code Analysis - Thinking in LINQ: Harnessing the power of functional programming in .NET applications (2014)

Chapter 7. Static Code Analysis

Programmers tend to think of code and data as separate. However, for a general-purpose framework such as LINQ, code is also data. By taking advantage of LINQ and .NET Reflection, you can perform a great deal of static code analysis and gain a lot of insight into code. This chapter presents several LINQ scripts that will help you accrue knowledge about your code base.

7-1. Finding Verbose Type Names in the .NET 3.5 Framework

Naming is personal, and naming conventions and name lengths vary between programmers and teams. However, the first step in enforcing naming conventions is knowing what names have been used. For example, if you want to find the longest and shortest names that Microsoft gives to types in a .NET assembly, you can do that easily by using LINQ.

Problem

Find the most verbose type names in .NET 3.5.

Solution

Enter the following LINQ code in a new LINQPad query. Set the Language drop-down to C# Statement(s). Make sure the path in the first line appears on a single line:

Directory.GetFiles(@"C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5","*.dll")
.SelectMany (d => Assembly.LoadFrom(d).GetTypes()
.Where (a => a.IsClass && a.IsPublic)
.Select (a =>new { Namespace = a.Namespace,
Name = a.Name,
Length = a.Name.Length}))
.ToLookup (d => d.Length)
.OrderByDescending (d => d.Key)
.Select (d => d.ElementAt(0) )
.Take(20)
.Dump("Top 20 most verbose types in .NET 3.5");

This code produces the output shown in Figure 7-1.

Figure 7-1. The top 20 longest type names in the .NET 3.5 framework

Note: This example and others in this chapter assume that Windows is installed on the root of your C:\ drive; if not, you will need to modify the path appropriately.

How It Works

The SelectMany() call flattens the types of every assembly in the folder into a single sequence. Inside it, the Where() clause keeps only public classes, and the Select() call projects each class into an anonymous type with three properties: Namespace, Name, and Length (the length of the type name). ToLookup() then groups these projections by Length. After the groups are sorted by key in descending order, Select (d => d.ElementAt(0)) takes the first entry of each group, so the result contains one representative type per distinct name length. Take(20) keeps the 20 longest.
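The grouping step is easier to follow on a small in-memory list. The following is a minimal sketch, not the recipe itself: the class name VerboseNames, the method LongestPerLength, and the sample names are made up for illustration, and Console.WriteLine stands in for LINQPad's Dump().

```csharp
using System;
using System.Linq;

public static class VerboseNames
{
    // Returns one representative name for each distinct length, longest first.
    public static string[] LongestPerLength(string[] names, int count)
    {
        return names
            .ToLookup(n => n.Length)          // key: length, values: names of that length
            .OrderByDescending(g => g.Key)    // longest lengths first
            .Select(g => g.First())           // one representative per length
            .Take(count)
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical sample names; the recipe reads them from assemblies instead.
        string[] names = { "List", "Dictionary", "Stack", "Queue", "SortedDictionary", "HashSet" };

        // Prints SortedDictionary (16), Dictionary (10), HashSet (7).
        foreach (string name in LongestPerLength(names, 3))
            Console.WriteLine("{0} ({1})", name, name.Length);
    }
}
```

Note that Stack and Queue share a length of 5, so only the first of them can ever appear in the result. The same holds for the recipe: types whose names have equal length are represented by a single entry.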

7-2. Finding the Number of Overloads for a Method

Sometimes you can refactor function overloading by using .NET generics. Other times, function overloads are exactly what you need. But the decision depends on the function algorithm. Therefore, before deciding to refactor, knowing how many overloads a method has can be crucial.

Problem

Find the number of overloads that each LINQ standard operator has.

Solution

Enter the following code into a new LINQPad query, selecting C# Statement(s) from the Language drop-down. Note that you will need to change the path in the first line if the framework is installed in a nonstandard location on your computer.

Note: Make sure that the entire path C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5 in the following code appears on a single line; otherwise, the example won’t work.

Directory.GetFiles(@"C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5","*.dll")
.SelectMany (d => Assembly.LoadFrom(d).GetTypes()
.SelectMany (a => a.GetMethods()))
.Where (d => d.IsPublic
&& d.DeclaringType.Namespace=="System.Linq"
&& !d.Name.StartsWith("get_")
&& !d.Name.StartsWith("set_"))
.ToLookup (d => d.Name)
.Select (d => new { MethodName = d.Key,
Overloads = d.Count ()})
//Overloads = 1 doesn't make sense.
.Where (d => d.Overloads>=2)
.OrderByDescending (d => d.Overloads)
.Take(10)//Show only the top 10 entries
.Dump();

This code produces the output shown in Figure 7-2.

Figure 7-2. Partial result of the number of overloads for all methods in System.Linq

How It Works

This example uses nested SelectMany() calls to collect the methods of every type in every assembly in the folder.

GetMethods() returns only public methods by default, so the nested SelectMany() calls produce a flat list of the public methods of all the types. The Where() clause then filters out methods that don’t belong to the System.Linq namespace or that are getter/setter functions for properties, leaving only public methods from the System.Linq namespace.

Next, ToLookup() builds a lookup table keyed by method name. The number of entries under each key is the number of overloads of that name. Because the key is the method name alone, same-named methods declared on different types in the namespace are counted together. These results are projected using the following call to Select:

.Select (d => new { MethodName = d.Key, Overloads = d.Count ()})

When the value of Overloads is 1, the method doesn’t have any overloads. The Where() clause filters out these values. Finally, the results are sorted by the number of overloads in descending order.

To save space, I have limited the result to just the top ten values. You can see the complete results by commenting out the Take() call. The result is quite interesting. Who would have thought that Sum()—the method to perform summation on a given collection of items—would have 60 overloads?
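If you would rather count overloads for one type at a time, so that same-named methods on different classes are not merged, the same lookup idea can be applied to a single Type. This is a sketch under that assumption; OverloadCounter and CountOverloads are hypothetical names, and the counts you see depend on the framework version you run against.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OverloadCounter
{
    // Counts public methods per name on a single type, keeping only overloaded names.
    public static Dictionary<string, int> CountOverloads(Type type)
    {
        return type.GetMethods()                      // public methods only, by default
            .Where(m => !m.Name.StartsWith("get_")
                     && !m.Name.StartsWith("set_"))   // skip property accessors
            .ToLookup(m => m.Name)                    // key: method name
            .Where(g => g.Count() >= 2)               // a single entry is not an overload
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        // Show the five most heavily overloaded methods of System.Linq.Enumerable.
        foreach (var pair in CountOverloads(typeof(Enumerable))
                                 .OrderByDescending(p => p.Value)
                                 .Take(5))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```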

7-3. Finding the Size of a Namespace

The size of a namespace is defined by the number of types it contains. The greater the number of types a namespace includes, the greater its conceptual load. In other words, it will take longer to discover what a namespace is useful for if it contains a lot of types. During refactoring, such information can be crucial.

Problem

Find the number of types in a namespace.

Solution

Enter the following code in LINQPad. Set the Language drop-down to C# Statement(s).

//Find conceptual load for all namespaces in .NET 3.5
//Conceptual load is the total number of public classes in the namespace

Directory.GetFiles(@"C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5","*.dll")
.SelectMany (d => Assembly.LoadFrom(d).GetTypes()
.Where (a => a.IsClass && a.IsPublic))
.ToLookup (d => d.Namespace)
.ToDictionary (d => d.Key, d => d.Count ())
.OrderByDescending (d => d.Value )
.Take(10)//Only the first 10 elements are shown
.Dump();

The preceding code produces the output shown in Figure 7-3.

Figure 7-3. The number of types available in various namespaces

How It Works

The SelectMany() call returns all the public classes available in the .NET 3.5 framework. Don’t be surprised if the list contains types and namespaces you have never seen; few people have ever seen the entire list of .NET types.

The call to ToLookup() creates a lookup table with the keys as the namespaces. Figure 7-4 shows a partial view of that lookup table. Here’s the code I used to get that partial view:

Directory.GetFiles(@"C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5","*.dll")
.SelectMany (d => Assembly.LoadFrom(d).GetTypes()
.Where (a => a.IsClass && a.IsPublic))
.ToLookup (d => d.Namespace)
.OrderBy (d => d.Count () )
.Take(4)
.Dump();

Figure 7-4. Showing a partial view of the lookup table

As you can see, each entry of the lookup table is an object of type IGrouping<string, Type>. So .ToDictionary (d => d.Key, d => d.Count ()) creates a dictionary whose keys are the same as those of the lookup table and whose values are the number of types in each group.

Finally, the code sorts the dictionary entries by the number of types they contain, in descending order. This example shows only the first ten such entries. To show the complete results, remove the Take() call.
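The lookup-to-dictionary step can also be tried on a single assembly that is already loaded, without touching the file system. The sketch below makes two assumptions of its own: it inspects the assembly that contains System.Linq.Enumerable, and it substitutes the placeholder key "(no namespace)" for types whose Namespace is null, because a dictionary cannot hold a null key. NamespaceSize and CountPublicClasses are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class NamespaceSize
{
    // Maps each namespace in the assembly to its number of public classes.
    public static Dictionary<string, int> CountPublicClasses(Assembly assembly)
    {
        return assembly.GetTypes()
            .Where(t => t.IsClass && t.IsPublic)
            .ToLookup(t => t.Namespace ?? "(no namespace)")   // avoid a null dictionary key
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        Assembly assembly = typeof(Enumerable).Assembly;

        // List the ten largest namespaces of that assembly.
        foreach (var pair in CountPublicClasses(assembly)
                                 .OrderByDescending(p => p.Value)
                                 .Take(10))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```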

7-4. Finding the Code-to-Comment (C# Style) Ratio

Commenting code is necessary because even the original authors of programs can find it difficult to understand what a particular portion of code does after some time has passed. While refactoring, it is beneficial to know the code-to-comment ratio for the code to be refactored. The ratio helps identify code that isn’t sufficiently commented.

Problem

Write a LINQ script to find the code-to-comment ratio of C# code. Assume that there are no C-style comments (/* ... */) in the code.

Solution

Enter the following code in a LINQPad query tab. Set the Language drop-down to C# Statement(s).

string code = @"//This is a test
int x = 10;//set x to 10
//increase x by one
x++;
var rad = Radius(x);//Find radius";

var lookup = code.Split(new string[]{Environment.NewLine,";"}
,StringSplitOptions.RemoveEmptyEntries)
.Select (line => line.Trim())
.Select (line =>
new
{
Line = line,
IsComment = line.StartsWith("//")
})
.ToLookup (line => line.IsComment);

lookup.Select (entry =>
new
{
Component = entry.Key==true?"Comment":"Code",
Percentage = 100*Math.Round((double)entry.Count()/
(double)lookup.SelectMany (l => l).Count(),2)
})
.Dump("Code to Comment Ratio");

This produces the output shown in Figure 7-5.

Figure 7-5. Code-to-comment ratio for a sample code snippet

How It Works

As the first step, this script splits the code snippet on line breaks and semicolons, so a trailing comment becomes a separate entry from the statement it follows. Each entry that starts with // is assumed to be a comment; otherwise, the code assumes it’s a code entry. The second Select() call, shown here

.Select (line =>
new
{
Line = line,
IsComment = line.StartsWith("//")
})

creates a projection of anonymous type with two attributes: Line and IsComment. A lookup table is created from this projection in which the key is the value of IsComment. Because IsComment is a Boolean field, there will be only two entries in the lookup table. Figure 7-6 shows the lookup table for this example.

Figure 7-6. Lookup table showing code vs. comment splits

As you can see in Figure 7-6, there are four comment entries and three code entries, making a total of seven entries. So the percentage of comment entries is 400/7, or roughly 57 percent, and the percentage of code entries is 300/7, or roughly 43 percent.

The code (double)lookup.SelectMany (l => l).Count() finds the total number of lines in the code snippet.
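To check the arithmetic outside LINQPad, the same idea can be wrapped in a small function. This is a sketch, not the recipe’s exact code: CommentRatio and CommentPercentage are hypothetical names, and the sketch splits on both "\r\n" and "\n" so that the result does not depend on how the snippet’s line endings were saved.

```csharp
using System;
using System.Linq;

public static class CommentRatio
{
    // Returns the percentage of fragments that start with "//", rounded to two decimals.
    public static double CommentPercentage(string code)
    {
        var fragments = code
            .Split(new[] { "\r\n", "\n", ";" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(f => f.Trim())
            .Where(f => f.Length > 0)       // drop fragments that were only whitespace
            .ToList();

        if (fragments.Count == 0)
            return 0;

        int comments = fragments.Count(f => f.StartsWith("//", StringComparison.Ordinal));
        return Math.Round(100.0 * comments / fragments.Count, 2);
    }

    public static void Main()
    {
        string code = "//This is a test\n"
                    + "int x = 10;//set x to 10\n"
                    + "//increase x by one\n"
                    + "x++;\n"
                    + "var rad = Radius(x);//Find radius";

        // Four of the seven fragments are comments, so this prints 57.14.
        Console.WriteLine(CommentPercentage(code));
    }
}
```

Like the recipe, this sketch treats every semicolon as a separator, so a semicolon inside a comment or a string literal would be counted as a break.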

7-5. Finding the Size of Types

The size of a type can be expressed as the number of public methods it exposes. The greater the number of public methods, the greater the size. Generally, best practice is to avoid types with a large number of methods. Therefore, being able to determine the size of public types in a framework is a good starting point for refactoring.

Problem

Write a LINQ script to find the size of all public types in .NET 3.5.

Solution

Enter the following LINQ script into a new LINQPad query:

Directory.GetFiles(@"C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5","*.dll")
.SelectMany (d => Assembly.LoadFrom(d).GetTypes()
.Where (a => a.IsClass && a.IsPublic)
.Select ( s =>
new
{
TypeName = s.FullName,
MethodCount = s.GetMethods()
.Count(m => m.IsPublic
&& !m.Name.StartsWith("get_")
&& !m.Name.StartsWith("set_"))}))
.OrderByDescending (d => d.MethodCount)
.Take(10)
.Dump();

The preceding code produces the output shown in Figure 7-7.

Figure 7-7. Size of public types in .NET 3.5

How It Works

The call to SelectMany() flattens the public classes of every assembly into a single sequence. The nested call to Select() projects each class into an anonymous type that has two properties: the type name, and the number of public methods that aren’t property getters or setters. Note that the names of property getter and setter methods start with get_ and set_, respectively.

Finally, the code sorts the projected list in descending order based on the number of methods (MethodCount). For brevity, I have used the Take() operator to pick only the first ten elements.
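Matching on the get_ and set_ prefixes is one way to skip accessors. Reflection also exposes the IsSpecialName property, which is set for property accessors as well as for operators and event accessors. The following sketch is a variation on the recipe rather than a restatement of it: it uses IsSpecialName, and it adds BindingFlags.DeclaredOnly so that methods inherited from Object are not counted. The Sample class, TypeSize, and CountMethods are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical type used only to demonstrate the count.
public class Sample
{
    public int Value { get; set; }   // compiles to get_Value and set_Value
    public void Run() { }
    public void Stop() { }
}

public static class TypeSize
{
    // Counts public methods declared on the type itself, ignoring
    // compiler-generated special-name methods such as property accessors.
    public static int CountMethods(Type type)
    {
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance
                           | BindingFlags.Static | BindingFlags.DeclaredOnly;
        return type.GetMethods(flags).Count(m => !m.IsSpecialName);
    }

    public static void Main()
    {
        // Only Run and Stop are counted, so this prints 2.
        Console.WriteLine(CountMethods(typeof(Sample)));
    }
}
```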

7-6. Generating Documentation Automatically

Sometimes you get to use libraries that don’t come with explicit documentation. LINQ can help you generate documentation on-the-fly.

Problem

Write a LINQ script to generate documentation automatically from the DLL and the corresponding XML file.

Solution

Write the following query in a LINQPad query tab. Because the script defines a Main() method, set the Language drop-down to C# Program:

Note: You need to add the MoreLINQ DLL and namespace to LINQPad to run this script.

public string GetSummary(string total, string methodName)
{
string search = methodName;
string summary = total.Substring(
total.IndexOf(search)+search.Length);
summary = summary.Substring(
summary.IndexOf("<summary>")+"<summary>".Length);
summary = summary.Substring(0,summary.IndexOf("</summary"));
return summary;
}
void Main()
{
string moreLINQdll = @"C:\MoreLINQ\MoreLINQ.dll";
string xmlFilePath = @"C:\MoreLINQ\MoreLinq.xml";
StreamReader sr = new StreamReader (xmlFilePath);
string total = sr.ReadToEnd();
sr.Close();
total = total
.Replace("<c>",string.Empty).Replace("</c>",string.Empty)
.Replace("&lt;","<").Replace("&gt;",">");
var allMethods = Assembly
.LoadFrom(moreLINQdll)
.GetTypes()
.Where (a => a.IsPublic )
.ToList()
.Select(t => new KeyValuePair<string,
List<KeyValuePair<string,string>>>
(t.Name,t.GetMethods()
.Where (x => x.IsPublic
&& (!x.Name.StartsWith("get_")
&& !x.Name.StartsWith("set_")
&& !x.Name.StartsWith("GetHashCode")
&& !x.Name.StartsWith("ToString")
&& !x.Name.StartsWith("Equals")
&& !x.Name.StartsWith("CompareTo")
&& !x.Name.StartsWith("GetType")))
.Select (x => new
KeyValuePair<string,string>
(x.Name, GetSummary(
total,t.Name+"."+x.Name)))
.DistinctBy(z => z.Key)
.ToList()))
.First()
.Dump();
}

This generates the output shown in Figure 7-8.

Figure 7-8. Partial documentation of MoreLINQ methods

How It Works

Because every class inherits the methods of the Object class, those methods would only add noise to the documentation, so the script filters them out, along with CompareTo. It also ignores property getter and setter methods.

The call to Where() does that:

.Where (x => x.IsPublic &&
(!x.Name.StartsWith("get_")
&& !x.Name.StartsWith("set_")
&& !x.Name.StartsWith("GetHashCode")
&& !x.Name.StartsWith("ToString")
&& !x.Name.StartsWith("Equals")
&& !x.Name.StartsWith("CompareTo")
&& !x.Name.StartsWith("GetType")))

At the heart of this script is the following data structure:

KeyValuePair<string,List<KeyValuePair<string,string>>>

This nested KeyValuePair structure holds the methods of one public type in the explicitly loaded assembly; the query produces one such pair per type. The key of the outer KeyValuePair denotes the type name, while the keys of the inner key/value pairs represent the names of the methods. The values of the inner key/value pairs hold the summary of each method. The summary is extracted from the XML documentation that was written by the library developers.

An overloaded method produces several entries with the same name. This script uses the DistinctBy operator from MoreLINQ to remove these duplicates by method name, so each method name is listed once.

For this example, I chose to show only the documentation for the first type in the library. To get the documentation for all the types in a library (which is generally what you will want), remove the call to First().
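The GetSummary() helper locates summaries with string searches. As an alternative sketch, the XML documentation file can be parsed with LINQ to XML, assuming the usual layout in which each member element carries a name attribute and a summary child. DocReader, ReadSummaries, and the inline XML sample (including the Demo.Calc.Add member) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DocReader
{
    // Maps each documented member name to the trimmed text of its <summary> element.
    public static Dictionary<string, string> ReadSummaries(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("member")
            .Where(m => m.Attribute("name") != null && m.Element("summary") != null)
            .ToDictionary(
                m => m.Attribute("name").Value,
                m => m.Element("summary").Value.Trim());   // Value also flattens nested tags such as <c>
    }

    public static void Main()
    {
        // Hypothetical documentation content; a real script would read the .xml file instead.
        string xml = @"<doc><members>
  <member name=""M:Demo.Calc.Add(System.Int32,System.Int32)"">
    <summary>Adds <c>two</c> numbers.</summary>
  </member>
</members></doc>";

        foreach (var pair in ReadSummaries(xml))
            Console.WriteLine("{0} => {1}", pair.Key, pair.Value);
    }
}
```

Because XElement.Value concatenates the text of nested elements, this approach does not need the Replace() calls that strip the c tags before searching.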

7-7. Finding Inheritance Relationships

One best practice guideline is to avoid classes with deep inheritance relationships. Therefore, it’s useful to be able to explore the inheritance relationships within a given framework.

Problem

Write a LINQ script to find out the inheritance relationship between several classes in the given framework.

Solution

Write the following code in a new LINQPad query tab. As usual, the path must appear on a single line:

Directory.GetFiles(@"C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5","*.dll")
.SelectMany (d => Assembly.LoadFrom(d).GetTypes().Where
(a => a.IsPublic && a.IsClass)
.Select (a => new { Parent = a.BaseType, Name = a.Name}))
.Where (d => d.Parent!=null)
.Select (a => new { Parent = a.Parent.Name , Name = a.Name})
.ToLookup (a => a.Parent )
.Take(10)
.Dump();

The preceding code generates the output shown in Figure 7-9.

Figure 7-9. Showing inheritance relationships between several types in the .NET framework

How It Works

BaseType returns the type from which the current type inherits. Therefore, the Name property of BaseType returns the name of the parent class. The code creates the lookup table by using the parent class name as the key and its children as the values.
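Since the motivation is to spot deep hierarchies, it can also help to measure how deep a given type sits. The sketch below walks the BaseType chain until it reaches null; InheritanceDepth and Depth are hypothetical names, and the exception types are used only because their hierarchy is well known.

```csharp
using System;

public static class InheritanceDepth
{
    // Number of ancestors between the type and the root of its hierarchy.
    // System.Object has depth 0; interfaces also report 0 because their BaseType is null.
    public static int Depth(Type type)
    {
        int depth = 0;
        for (Type t = type.BaseType; t != null; t = t.BaseType)
            depth++;
        return depth;
    }

    public static void Main()
    {
        // ArgumentNullException -> ArgumentException -> SystemException -> Exception -> Object
        Console.WriteLine(Depth(typeof(ArgumentNullException)));   // prints 4
        Console.WriteLine(Depth(typeof(object)));                  // prints 0
    }
}
```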

7-8. Locating Complex Methods

Creating methods that require lots of parameters is generally a bad idea. The rule of thumb is that methods with seven parameters (plus or minus two) are generally too complex to use and understand easily. Such methods scream for refactoring.

Problem

Write a LINQ script to discover methods that require a large number of input parameters.

Solution

Write the following code in a new LINQPad query:

//Locate highly complex methods with lots of arguments
Directory.GetFiles(@"C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5","*.dll")
.SelectMany (d => Assembly.LoadFrom(d).GetTypes()
.SelectMany (a => a.GetMethods()))
.Where (d => !d.Name.StartsWith("get_")
&& !d.Name.StartsWith("set_"))
.Select (d => new { MethodName = d.Name,
NameSpace = d.DeclaringType.Namespace,
Class = d.DeclaringType.FullName,
NumberOfParameters = d.GetParameters().Count()} )
.Where (d => d.NameSpace=="System.Linq")
.OrderByDescending (d => d.NumberOfParameters )
.Take(20)
.Dump();

The preceding code generates the output shown in Figure 7-10.

Figure 7-10. The top 20 methods sorted by the number of arguments they take

How It Works

The explanation for this example is similar to its predecessors. The code first makes a projection, keeps only the methods declared in the System.Linq namespace, and then sorts them in descending order based on the number of parameters, providing a list of the most complex methods in that namespace. To save space, I have limited the output to only 20 methods. To see the full list, remove the Take(20) call.
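The same projection can be pointed at your own types with a threshold instead of a fixed Take(). The following sketch assumes a made-up Shapes class and a hypothetical helper named MethodsWithAtLeast; it reports every declared public method whose parameter count reaches the threshold, most parameters first.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical type used only to demonstrate the query.
public class Shapes
{
    public void Draw(int x, int y, int width, int height, string color) { }
    public void Clear() { }
    public void Move(int dx, int dy) { }
}

public static class ParameterCounter
{
    // Lists declared public instance methods that take at least minParameters
    // arguments, formatted as "Name(count)" and sorted by count, descending.
    public static string[] MethodsWithAtLeast(Type type, int minParameters)
    {
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance
                           | BindingFlags.DeclaredOnly;
        return type.GetMethods(flags)
            .Where(m => !m.IsSpecialName)                       // skip property accessors
            .Select(m => new { m.Name, Count = m.GetParameters().Length })
            .Where(m => m.Count >= minParameters)
            .OrderByDescending(m => m.Count)
            .Select(m => m.Name + "(" + m.Count + ")")
            .ToArray();
    }

    public static void Main()
    {
        // Prints Draw(5) and then Move(2).
        foreach (string entry in MethodsWithAtLeast(typeof(Shapes), 2))
            Console.WriteLine(entry);
    }
}
```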

Summary

In this chapter, you’ve seen several examples of how you can use LINQ to Reflection to quickly find details and gain insights into a code base. These examples should help illustrate that you can use LINQ to query essentially any data. Code is usually considered separate from data, but by using LINQ, you can treat code itself as data. Besides showing how to use LINQ to Reflection, the examples in this chapter exemplify several idiomatic LINQ usages—for example, projecting followed by creation of a lookup. The next chapter follows up on the idea of LINQ as a general-purpose tool to perform scripting-like exploratory data analysis.