Advanced Modeling and Querying Techniques - Code-First Development with Entity Framework (2015)

Code-First Development with Entity Framework (2015)

Chapter 5. Advanced Modeling and Querying Techniques

In this chapter, you will learn how to use advanced modeling techniques to create the database structure. We will learn how to use complex types to create data structures that are reusable in multiple entity types. We will learn how to use enumerations to create a range of distinct values for a column or property. We will understand how to split an entity across multiple tables. We will learn how to support existing databases, while using names for classes and properties that do not match tables and columns in our database. We will also look at additional querying techniques, including aggregation, paging, grouping, and projections.

In this chapter, we will cover how to:

· Create complex types, reusable in many entities

· Define an enumeration and use it in a query

· Create an entity that is stored in multiple tables

· Use explicit column and table names in entity to table mappings

· Create queries that use projections with anonymous and explicit types

· Summarize data, using aggregate functions

· Create windowed queries

· Use explicit joins in queries

· Use set operations

Advanced modeling techniques

So far, we have covered many straightforward scenarios that one can easily model with Entity Framework to create database structures. All of them are mapped to one table with scalar values to a class with a matching set of properties. There are use cases when this approach does not work quite as well, and we will walk through a functionality in Entity Framework that supports more complex modeling techniques.

Complex types

Complex types are classes that map to a subset of columns in a table in the database. They are similar to entity classes, except that they do not contain key fields and do not directly map to an entire table. Complex types are helpful when we have the same set of properties that are common to multiple entities. Another use case is when we want to group some properties, in order to provide a clear semantic meaning to such a group of properties. By introducing a complex type into our modeling workflow, we provide more consistency for database structures across multiple tables. This occurs because we define the common attributes for such tables in a single place, which will be our complex type. The prototypical example is address fields. Given the examples we have seen in prior chapters, let's add addresses to both person and company classes. Here is how the address class, referred to as complex type, looks:

public class Address

{

public string Street { get; set; }

public string City { get; set; }

public string State { get; set; }

public string Zip { get; set; }

}

We are looking at a simple class here with a set of properties that define an address. Here is the same code in VB.NET:

Public Class Address

Public Property Street() As String

Public Property City() As String

Public Property State() As String

Public Property Zip() As String

End Class

The second step is to make this class part of a larger picture by introducing the Address type property in both the Company and Person classes. Here is, for example, how the Company class looks after this change:

public class Company

{

public Company()

{

Persons = new HashSet<Person>();

Address = new Address();

}

public int CompanyId { get; set; }

public string CompanyName { get; set; }

public Address Address { get; set; }

public ICollection<Person> Persons { get; set; }

}

There is an important step we need to take now. We need to initialize an instance of the Address class in the Company class's constructor. Without this simple step, we can easily encounter a null reference exception any time we create a new instance of Company and try to set a street on the address. When data is retrieved from the database via a query, Entity Framework automatically initializes the instance of the Address class during the materialization process. An additional initialization that we just added manually exists to cover the creation of new entity scenarios. Here is how the code looks in VB.NET:

Public Class Company

Public Sub New()

Persons = New HashSet(Of Person)

Address = new Address()

End Sub

Property CompanyId() As Integer

Property CompanyName() As String

Property Address() As Address

Overridable Property Persons() As ICollection(Of Person)

End Class

The next step is to provide the configuration for our complex type. We can do so in a way that is virtually identical to entity classes, by providing a configuration class for the complex type. The only difference is that we use a different base class,ComplexTypeConfiguration, not EntityTypeConfiguration. The code inside this configuration class is identical to the code in entity configuration classes, using the exact same property configuration methods.

For example, consider this code snippet:

public class AddressMap : ComplexTypeConfiguration<Address>

{

public AddressMap()

{

Property(p => p.Street)

.HasMaxLength(40)

.IsRequired();

}

}

The preceding example only shows one property being configured, but in the code provided with this book, all properties are configured in the same fashion as the Street property. Finally, we must remember to add an instance of AddressMap to the collection of configurations of the context. For example, consider this code snippet:

protected override void OnModelCreating(DbModelBuilder modelBuilder)

{

modelBuilder.Configurations.Add(new CompanyMap());

modelBuilder.Configurations.Add(new AddressMap());

}

Here is how the same code looks in VB.NET:

Public Class AddressMap

Inherits ComplexTypeConfiguration(Of Address)

Public Sub New()

Me.Property(Function(p)p.Street).HasMaxLength(40).IsRequired()

End Sub

End Class

Protected Overrides Sub OnModelCreating(ByVal modelBuilder As DbModelBuilder)

modelBuilder.Configurations.Add(New CompanyMap)

modelBuilder.Configurations.Add(New AddressMap)

End Sub

If we were to run this code and look at the created database structure, we would see that the Address columns names in the Company table are prefixed with the complex type's name. So, a column to store the name of the street is called Address_Street. This is typically not something that we want, which leads us to the next discussion about supporting explicit column and table names.

Using an explicit table and column mappings

There are many use cases that require us to explicitly specify a column or a table name. We just saw one, but there are more. For example, we can add Entity Framework on top of an existing database that uses a naming convention developers of data access layer do not like. Explicit names solve this problem.

In order to specify a column name that is different from a matching property name, we can use the HasColumnName method available for primitive property configurations, as shown in the following code snippet:

public class AddressMap : ComplexTypeConfiguration<Address>

{

public AddressMap()

{

Property(p => p.Street)

.HasMaxLength(40)

.IsRequired()

.HasColumnName("Street");

One can configure properties on entity types in the exact same way we just configured our complex type. We can see more examples in the code that accompanies this book. Here is how this looks in VB.NET:

Public Class AddressMap

Inherits ComplexTypeConfiguration(Of Address)

Public Sub New()

Me.Property(Function(p) p.Street) _

.HasMaxLength(40) _

.IsRequired() _

.HasColumnName("Street")

In order to specify the table name for an entity, we have to use the ToTable method of the EntityTypeConfiguration class. For example, here is how we can specify the table name for a person type entity:

public class PersonTypeMap : EntityTypeConfiguration<PersonType>

{

public PersonTypeMap()

{

ToTable("TypeOfPerson");

This example is a bit contrived, as we could have just as easily changed the class name. Here is how we can specify a name for a table in VB.NET:

Public Class PersonTypeMap

Inherits EntityTypeConfiguration(Of PersonType)

Public Sub New()

ToTable("TypeOfPerson")

Adding supporting columns

In addition to changing column names, sometimes we want to add a property to an entity that we do not want to store in the database. In other words, we want to add some business logic into an entity to help us work with it outside of Entity Framework. I am by no means advocating embedding business logic inside our entity classes, but merely providing an alternative to computed columns in SQL Server. For example, let's add the FullName property to our Person class and return a concatenation of LastName and FirstName:

public string FullName

{

get

{

return string.Format("{0} {1}", FirstName, LastName);

}

set

{

var names = value.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);

FirstName = names[0];

LastName = names[1];

}

}

Here is how the same property looks in VB.NET:

Public Property FullName() As String

Get

Return String.Format("{0} {1}", FirstName, LastName)

End Get

Set(value As String)

Dim names = value.Split(New String() {" "}, StringSplitOptions.RemoveEmptyEntries)

FirstName = names(0)

LastName = names(1)

End Set

End Property

If we run this code, we will see that a new column called FullName was added to the People table. This is not what we want; there is no reason for us to persist the full name. To fix the problem, we just need to use the Ignore method of the EntityTypeConfigurationclass. This is shown in the following example:

public class PersonMap : EntityTypeConfiguration<Person>

{

public PersonMap()

{

Ignore(p => p.FullName);

This approach of ignoring certain properties in a persistence layer could prove useful when developers are dealing with legacy databases. One thing we must remember is that we cannot query based on ignored properties, since they do not exist in the backend. Here is how this code looks in VB.NET:

Public Class PersonMap

Inherits EntityTypeConfiguration(Of Person)

Public Sub New()

Ignore(Function(p) p.FullName)

Enumerations

We are all familiar with the use of enumerations. They make our code much more readable, since we can use descriptive names instead of magic numbers. For example, let's say that each type Person can be in one of three states: Active, Inactive, or Unknown. This is a prototypical scenario that calls for the use of enumerations. Entity Framework now has full support for enumerations. First of all, we need to define enumeration itself. For example, consider this code snippet:

public enum PersonState

{

Active,

Inactive,

Unknown

}

This can also be shown in VB.NET, like the following code:

Public Enum PersonState

Active

Inactive

Unknown

End Enum

The next step is to simply add a property of the type PersonState to the Person class. For instance, consider this code fragment:

public class Person

{

public PersonState PersonState { get; set; }

Here is how this new property is defined in VB.NET:

Public Class Person

Property PersonState() As PersonState

Technically, there is nothing else we need to do. We can just run this code to create our database structure. Once this is done, queries like the following one would work:

var people = context.People

.Where(p=>p.PersonState == PersonState.Inactive);

Here is the same query in VB.NET:

Dim people = context.People _

.Where(Function(p) p.PersonState = PersonState.Inactive)

Writing readable, easy to understand code is very important, and native support for enumerations in Entity Framework is very useful for such situations.

Using multiple tables for a single entity

The ability to split an entity across multiple tables plays an important role in scenarios where we have to store Binary Large OBjects (BLOBs) in the database, which is a commonly occurring situation. Some database administrators like to see BLOBs in a separate table, especially if they are not frequently accessed, in order to optimize a database's physical storage. As we recall, we represent BLOBs (the varbinary(MAX) column type in the SQL Server case) as byte arrays in .NET. Entity Framework aims to abstract a developer from storage details, so ideally we do not want our entities to reflect storage specific details. This is where the entity splitting feature comes in, which allows us to store one entity in multiple tables, with a subset of properties persisted in each table. Let's say that we want to store a person's photo, which can be a large object, in a separate table. We can use the Map method of the EntityTypeConfiguration class in order to configure such properties. We will demonstrate how to configure multiple properties, because the syntax is slightly different for two or more properties versus just a single property.

First of all, here is how our Person class looks with new properties:

public class Person

{

public byte[] Photo { get; set; }

public byte[] FamilyPicture { get; set; }

The same code in VB.NET looks as follows:

Public Class Person

Property Photo() As Byte()

Property FamilyPicture() As Byte()

In order to split an entity, Person in our case, we need to specify the table for each subset of columns, using explicit table names, similarly to what we did previously in this chapter. We will use anonymous types in order to provide property groupings. This code belongs in the configuration class for the Person class; EntityTypeConfiguration of type Person in our case. For example, consider this code snippet:

public class PersonMap : EntityTypeConfiguration<Person>

{

public PersonMap()

{

Map(p =>

{

p.Properties(ph =>

new

{

ph.Photo,

ph.FamilyPicture

});

p.ToTable("PersonBlob");

});

Map(p =>

{

p.Properties(ph =>

new

{

ph.Address,

ph.BirthDate,

ph.FirstName,

ph.HeightInFeet,

ph.IsActive,

ph.LastName,

ph.MiddleName,

ph.PersonState,

ph.PersonTypeId

});

p.ToTable("Person");

});

}

}

We need to omit the actual primary key property, person's identifier, from both mappings because it actually belongs in both tables. We move binary columns to the PersonBlob table, keeping the rest of the columns in the Person table. We also mapped the complex type property as part of the same approach. Here is how code looks in VB.NET:

Public Class PersonMap

Inherits EntityTypeConfiguration(Of Person)

Public Sub New()

Map(Sub(p)

p.Properties(Function(m) _

New With {

m.Photo,

m.FamilyPicture})

p.ToTable("PersonBlob")

End Sub)

Map(Sub(p)

p.Properties(Function(m) _

New With {

m.Address,

m.BirthDate,

m.FirstName,

m.HeightInFeet,

m.IsActive,

m.LastName,

m.MiddleName,

m.PersonState,

m.PersonTypeId})

p.ToTable("Person")

End Sub)

End Sub

End Class

We can also do the opposite, that is, map multiple entity types to a single table. This process is called table splitting versus entity splitting in the preceding example. The use case for this scenario is exactly the same; we want to separate infrequently accessed properties into its own class, but then relate the two classes together. The data is stored in a single table, but only frequently used properties will be accessed by the main entity. The code is exactly the same as we saw previously with any two related entities, except we map both of them to the same table, using the explicit table mapping for both entities with the exact same table name. As a result, we can retrieve one entity, but omit related entities with large column data from the query. When we need to load related large object data, simply use the Include method covered previously.

Advanced querying techniques

We talked about querying techniques in previous chapters. The topic of data retrieval is very extensive and requires developers to be thoroughly familiar with everything LINQ has to offer, in order to address scenarios that come up on a daily basis. In this chapter, we will cover many advanced topics. We want to make sure that developers are prepared for the vast majority of tasks they do daily, but some rarely encountered scenarios may not be addressed in this book.

Projections

Projections refer to a process of retrieving a subset of columns from one or more tables in a single query instead of all the columns, as we saw earlier in this book. Projections are very important from the perspective of efficiency and performance. If we only need to present the first and last names of a person to a user of our software, we have no reason to get all the columns from the Person table. Now we are faced with a question. How do we represent this data from the object perspective? We can still use the Person class, but if we do not populate all the properties, then we will potentially mislead ourselves or other developers. Instead, we have two better options to use for a class to read the data into:

· Use anonymous types

· Use an explicitly defined class that matches our query data

If we comsume a projection query's results in the same method as where the query itself is defined, we use anonymous types. Anonymous types in .NET are classes that are not explicitly named and their structure is derived from usage. However, if the data is passed around from one method to another, it is much more convenient to define a type for projection query. We will take a look at both approaches.

Another question we want to address is how to include data from multiple tables in a single projection, which is also possible with Entity Framework. Moreover, we need to remember that projections can be combined with sorting and filtering techniques, as well as other querying concepts. Let's demonstrate these concepts with the following problem. We want to select active persons, sorted by their last and first names, but only show the first name, last name, and a person type's name in the resulting set. The Person type's name will come from an entity related to a person. Properties that expose the related data are referred to as association properties. Let's first use the anonymous type. In order to create a projection using LINQ's method syntax, we need to use the Select method, as shown in the following example:

var people = context.People

.Where(p => p.PersonState == PersonState.Active)

.OrderBy(p => p.LastName)

.ThenBy(p => p.FirstName)

.Select(p => new

{

p.LastName,

p.FirstName,

p.PersonType.TypeName

});

foreach (var person in people)

{

Console.WriteLine("{0} {1} {2}",

person.FirstName, person.LastName, person.TypeName);

}

A lot of the preceding code looks familiar already. The Select method itself takes one parameter, an expression of a function, where the function parameter is an instance of the source of our query, a person instance in our case. The function returns an instance of the result of the selection. In this case, we use an anonymous type with three properties. We do not specify the property names explicitly, so the source's property names are used. We can clearly see this as we look at the code that loops through the results. Here is how the code looks in VB.NET:

Dim people = context.People _

.Where(Function(p) p.PersonState = PersonState.Active) _

.OrderBy(Function(p) p.LastName) _

.ThenBy(Function(p) p.FirstName) _

.Select(Function(p) New With { _

p.LastName, _

p.FirstName, _

p.PersonType.TypeName _

})

For Each person In people

Console.WriteLine("{0} {1} {2}",

person.FirstName,

person.LastName,

person.TypeName)

Next

This is how the code looks using the query syntax:

var query = from onePerson in context.People

where onePerson.PersonState == PersonState.Active

orderby onePerson.LastName, onePerson.FirstName

select new

{

Last = onePerson.LastName,

First = onePerson.FirstName,

onePerson.PersonType.TypeName

};

Even though we did not define the type that the query returns, we still have the full IntelliSense support and strong typing, due to the fact that .NET creates an actual type for our anonymous type declaration. We can also rename the columns in the resulting anonymous type, which is what we did in the preceding query syntax example. Of course, we can rename properties in the same way in statements that use the method syntax. Here is the same code in VB.NET:

Dim query = From onePerson In context.People

Where onePerson.PersonState = PersonState.Active

Order By onePerson.LastName, onePerson.FirstName

Select New With { _

.Last = onePerson.LastName, _

.First = onePerson.FirstName, _

onePerson.PersonType.TypeName _

}

Now, let's see how the code changes when an explicit type is used. The only change in both approaches is that in addition to the New keyword, we need to specify the actual type. For example, if we convert the query syntax example to use the explicit type, the code will look as follows:

var explicitQuery =

from onePerson in context.People

where onePerson.PersonState == PersonState.Active

orderby onePerson.LastName, onePerson.FirstName

select new PersonInfo

{

LastName = onePerson.LastName,

FirstName = onePerson.FirstName,

PersonType = onePerson.PersonType.TypeName,

PersonId = onePerson.PersonId

};

PersonInfo is the type we use to capture the query results into, and we can easily see that it just has a handful of properties that mostly match the Person class with the exception of the type name property, PersonType, which is a string-based property; for example, consider this code snippet:

public class PersonInfo

{

public int PersonId { get; set; }

public string PersonType { get; set; }

public string FirstName { get; set; }

public string LastName { get; set; }

}

Here is how the code looks in VB.NET:

Dim explicitQuery = _

From onePerson In context.People

Where onePerson.PersonState = PersonState.Active

Order By onePerson.LastName, onePerson.FirstName

Select New PersonInfo With { _

.LastName = onePerson.LastName, _

.FirstName = onePerson.FirstName, _

.PersonType = onePerson.PersonType.TypeName, _

.PersonId = onePerson.PersonId

}

We did not have to do anything special to get the data from a related table, and this is a fact worth noting. We simply walked through the relationship chain to get to the correct data, then included only the data we needed from the related entity, which is thePersonType name. We can also select multiple entities into a property, assuming that this property is a collection type. For example, we can include a list of Phones with our data by adding IEnumerable of the string Phones property to our PersonInfo class and selecting the phone numbers as follows:

var explicitQuery =

from onePerson in context.People

where onePerson.PersonState == PersonState.Active

orderby onePerson.LastName, onePerson.FirstName

select new PersonInfo

{

LastName = onePerson.LastName,

FirstName = onePerson.FirstName,

PersonType = onePerson.PersonType.TypeName,

PersonId = onePerson.PersonId,

Phones = onePerson.Phones.Select(ph=>ph.PhoneNumber)

};

foreach (var person in explicitQuery)

{

Console.WriteLine("{0} {1} {2} {3}",

person.FirstName, person.LastName,

person.PersonType, person.PersonId);

foreach (var phone in person.Phones)

{

Console.WriteLine(" " + phone);

}

}

What we see in the preceding code is two nested projection queries. The second one selects only one column from a set of related entities, Phones, for each type person. This query will result in a complex select statement sent to the backend, but it will only be a single query to execute, although it results in a set of entities, each with another set of related entities. This demonstrates the true power of Entity Framework in solving very complex problems, which would require a significant amount of code if a stored procedure were to be used. Here is how this code looks in VB.NET:

Dim explicitQuery = _

From onePerson In context.People

Where onePerson.PersonState = PersonState.Active

Order By onePerson.LastName, onePerson.FirstName

Select New PersonInfo With { _

.LastName = onePerson.LastName, _

.FirstName = onePerson.FirstName, _

.PersonType = onePerson.PersonType.TypeName, _

.PersonId = onePerson.PersonId, _

.Phones = onePerson.Phones.Select( _

Function(ph) ph.PhoneNumber)

}

For Each person In explicitQuery

Console.WriteLine("{0} {1} {2} {3}",

person.FirstName,

person.LastName,

person.PersonType,

person.PersonId)

For Each phone In person.Phones

Console.WriteLine(" " + phone)

Next

Next

If you like, try to write the preceding query using the method syntax, as an exercise.

Aggregations and grouping

Next, we will discuss how to aggregate the data. Aggregation is a process of getting a summary view of data, and converting a set of data values into a single value. There are many functions that LINQ supports that allow us to aggregate the data. Of course, by extension, Entity Framework supports them as well. It is important to note that aggregation occurs in the backend, not in memory inside .NET runtime, which would be inefficient. It is possible to accidentally aggregate in memory. This occurs when we get the results first in a .NET collection, then aggregate against this collection. To avoid the problem, make sure that you construct a query with aggregations in it and then execute it once. Here are common aggregation methods we use:

· Count: This counts the number of entities, typically based on a condition

· Sum: This creates a sum, typically of numeric values

· Min: This determines a minimum value

· Max: This determines a maximum value

· Average: This computes an average value

All of these functions can, of course, be combined with other queries. For example, let's find the minimum BirthDate of all the people that come in our sample database:

var min = context.People.Min(p => p.BirthDate);

Let's try to determine the maximum BirthDate in VB.NET:

Dim min = context.People.Max(Function(p) p.BirthDate)

These examples are pretty simple, getting a scalar value aggregate using the DateTime type in our example.

The Count function optionally accepts a condition that specifies what needs to evaluated to true in order to be counted, for example, consider this code snippet:

var count = context.People.Count(p =>

p.PersonState == PersonState.Active);

Here is how this count looks in VB.NET:

Dim count = context.People.Count( _

Function(p) p.PersonState = PersonState.Active)

This code computes the total number of people in the table that are in the active state, where the state is stored in the PersonState column inside the People table.

We can use aggregations inside other queries. For example, let's count the number of phone numbers for each person, modifying the projection query we used in the beginning of the chapter:

var people = context.People

.Select(p => new

{

p.LastName,

p.FirstName,

p.Phones.Count

});

We used an anonymous type and aggregate function inline to set an additional property on the resulting object. Here is the same code in VB.NET:

Dim people = context.People _

.Select(Function(p) New With { _

p.LastName, _

p.FirstName, _

p.Phones.Count

})

We use the method syntax in the preceding examples. We can combine aggregation method calls with the code created, with the query syntax as well.

Advanced query construction

We can chain LINQ methods, calling one after another, as we saw. For example, we can combine ordering and filtering in a single query, by chaining the OrderBy and Where methods together. All of them return the IQueryable interface. It also has a subinterface,IOrderedQueryable, that is used to represent results of ordered queries. If we write any query with Entity Framework that returns a set of objects, it can abstractly be thought of as an implementation of IQueryable. As a result, we can modify the query, creating a new query, and our changes are not executed until this final query is enumerated. Let's create a query by adding a condition and an order separately, and then applying an aggregate function to the result.

We will compute the sum of persons' heights in our example, as shown in the following code:

var query = from onePerson in context.People

where onePerson.PersonState == PersonState.Active

select new

{

onePerson.HeightInFeet,

onePerson.PersonId

};

query = query.OrderBy(p => p.HeightInFeet);

var sum = query.Sum(p => p.HeightInFeet);

Console.WriteLine(sum);

Let's walk through this code, as we want to make sure it is clear as to what is going in. Firstly, we create a basic query, filtering in only the active people and selecting a person identifier and height from the People collection, which corresponds to the People table in the database. What is important to point out now is that no SQL query has executed against the database so far, since we did not yet enumerate the results of our query. Next, we sort the query, using the method syntax. Finally, we access the query result by using aggregate functions. Aggregate functions will cause the final query to execute, the only query that results in SQL statements to be sent to the database. This query will sum up the height of all the active people. Here is how this code looks in VB.NET:

Dim query = From onePerson In context.People _

Where onePerson.PersonState = PersonState.Active

Select New With { _

onePerson.HeightInFeet, _

onePerson.PersonId

}

query = query.OrderBy(Function(p) p.HeightInFeet)

Dim sum = query.Sum(Function(p) p.HeightInFeet)

Console.WriteLine(sum)

We talked about deferred query execution before, but we used to create a query in a single line of code. There is no such limitation in Entity Framework. There is a specific use case to create queries this way. For example, we get a complex criteria object, and we need to check a specific property on our criteria before deciding whether we need to add a filter condition. The ability to create a query in multiple steps may come in handy in such scenarios.

On the other hand, we can also embed such criteria data directly in our query, as shown in the following code snippet:

query = from onePerson in context.People

where !criteria.FilterActive ||

(criteria.FilterActive &&

onePerson.PersonState == PersonState.Active)

select new

{

onePerson.HeightInFeet,

onePerson.PersonId

};

In the preceding example, we are working with a criteria object with the FilterActive property. We directly embed checks to see whether this property is set to true in our query code and execute the conditional code as a result. This is a very powerful concept that allows us to write code that goes far beyond the simple queries we worked on in previous chapters. Here is how this query looks in VB.NET:

query = From onePerson In context.People _

Where Not criteria.FilterActive Or _

(criteria.FilterActive And

onePerson.PersonState = PersonState.Active)

Select New With { _

onePerson.HeightInFeet, _

onePerson.PersonId

}

Entity Framework will interpret the .NET code and will convert it to appropriate SQL syntax. Not all .NET code will work for us in such a fashion, but many commonly used functions will work. For example, many string functions such as ToUpper, ToLower, and Containswill translate into appropriate SQL queries. Boolean and number comparison will work fine as well. Our default approach should be to use .NET code as part of our queries. .NET functions are translated properly via a backend provider as part of SQL query execution and interpretation. Entity Framework also has additional useful functions that are located in the System.Data.Entity.DbFunctions class. It exposes many DateTime and string operations that come in handy in a number of scenarios. For example, this is how we can add days to a particular date as part of our query:

query = from onePerson in context.People

where DbFunctions.AddDays(onePerson.BirthDate, 2) >

new DateTime(1970,1,1)

select new

{

onePerson.HeightInFeet,

onePerson.PersonId

};

Here is what this code looks like in VB.NET:

query = From onePerson In context.People _

Where DbFunctions.AddDays(onePerson.BirthDate, 2) >

New DateTime(1970, 1, 1)

Select New With { _

onePerson.HeightInFeet, _

onePerson.PersonId

}

To summarize, we can often find a way to embed .NET native functions inside our queries. As a result, it looks like we are ordinarily writing C# or VB.NET code, but it results in appropriate backend queries.

Paging data with windowing functions

Paging through our data is an extremely common scenario. It allows us to write applications that are efficient and do not consume large amounts of memory. Just imagine what would happen if we try to show in our user interface all the data from a table with millions of rows in it. Entity Framework via LINQ comes with an easy way to accomplish paging. Windowing functions, also known as paging functions, have this name because they provide us with a small window into large amounts of data. When we page through data, we deal with just two numbers:

· The page number that we want to show

· The number of rows per page

Entity Framework has two methods, Skip and Take, to allow developers to page through the data. The Skip method allows us to skip some number of rows, essentially rows from previous pages. The Take method allows us to retrieve some number of rows. Here is how we can retrieve a page of data from our table. Our criteria object contains the current page number and number of rows per page or page size; for example, consider this code snippet:

var people = context.People

.OrderBy(p => p.LastName)

.Skip((criteria.PageNumber - 1) * criteria.PageSize)

.Take(criteria.PageSize);

We perform simple math to figure out how many rows we need to skip, based on the page number and page size from criteria object.

Note

We must sort the data before using the Skip method, or an exception will be thrown.

Here is how the same code looks in VB.NET:

Dim query = context.People _

.OrderBy(Function(p) p.LastName) _

.Skip((criteria.PageNumber - 1) * criteria.PageSize) _

.Take(criteria.PageSize)

Note

We can easily combine paging with other functions, such as filtering.

Using joins

Joins allow developers to combine data from multiple tables into a single query, based on a condition that relates rows from two tables together. We already performed this function in previous examples when we dealt with related tables. Relationships simply abstract out the underlying database joins from developers. Joins come into play when Entity Framework's overall logical model does not specify relationships, for some reason. This may occur when data is not fully normalized or because we are working with a legacy database that we may not be able to change. At that point, we must join unrelated tables based on a condition. The join keyword in LINQ translates into the INNER JOIN operation in SQL, thus eliminating unmatched rows from the result set. This is how INNER JOIN works; we must have at least one matching row in the table on the right-hand side of the join, in order to see data from the table on the left-hand side of the join. Let's examine a simple use case where we manually join a person and person type, retrieving the first and last names of a person along with the type name using joins. Just like in SQL, we will see the same components to construct joins: the left-hand table or collection, right table, join condition, and selection of data based on data available via joins. Here ishow this looks using the LINQ query syntax:

var people = from person in context.People

join personType in context.PersonTypes

on person.PersonTypeId equals personType.PersonTypeId

select new

{

person.LastName,

person.FirstName,

personType.TypeName

};

We need to declare row-level variables for both the joined tables, person, and personType in the preceding example. We see the new usage of the equals keyword that is now used to support join conditions inside LINQ. It is used to specify the relationship condition between the left-hand table, People, and the right table, PersonTypes. The PersonTypeId value must be the same in rows from both tables in order to be included in the result set. Finally, we use the projection to select a subset of columns from both tables using variables for each row. The same code in VB.NET looks very similar, as shown in the following code:

Dim people =

From person In context.People

Join personType In context.PersonTypes

On person.PersonTypeId Equals personType.PersonTypeId

Select New With

{

person.LastName,

person.FirstName,

personType.TypeName

}

You undoubtedly noticed similarities with SQL syntax. If we switch to the method syntax for LINQ, here is the code we will see:

people = context.People

.Join(

context.PersonTypes,

person => person.PersonTypeId,

personType => personType.PersonTypeId,

(person, type) => new

{

Person = person,

PersonType = type

})

.Select(p => new

{

p.Person.LastName,

p.Person.FirstName,

p.PersonType.TypeName

});

The Join method in LINQ takes the same parameters as any join would: the right-hand table and condition. The left-hand table is what Join is called on. The final parameter specifies what needs to be selected out of the join; in our case, we just select all the data values from both the tables. We can use an anonymous type or explicit types, and select a subset of data. Even though we appear to select every column, this is not how SQL is constructed, only data specified in the final Select query is retrieved from the database, that is, the person's last and first names and matching type name. Here is the same code in VB.NET:

people = context.People _

.Join(context.PersonTypes, _

Function(person) person.PersonTypeId, _

Function(personType) personType.PersonTypeId, _

Function(person, personType) New With {

.Person = person,

.PersonType = personType}) _

.Select(Function(p) New With {

p.Person.LastName,

p.Person.FirstName,

p.PersonType.TypeName

})

We will see the LEFT OUTER join implementation in the next section. The LEFT OUTER join allow us to see the data from the table on the left-hand side of the join, even when we have to match rows in the table on the right.

Groupings and left outer joins

So far, we saw simple groupings that aggregate based on a single column and produced a single result. There are many other scenarios where we need to provide more than one column in a grouped or aggregated result set. For example, we want to see how many people we have, based on a month they were born in. We are not starting with a simple example, but this example will give us a thorough understanding of how to group and select data at the same time, as shown in the following code snippet:

var query =

from onePerson in context.People

group onePerson by new { onePerson.BirthDate.Value.Month }

into monthGroup

select new

{

Month = monthGroup.Key.Month,

Count = monthGroup.Count()

};

The query is based on the Person collection from the People property of our context. We loop through the data based on the person's data, specified in the onePerson row variable. We group based on the month of BirthDate. We use an anonymous type to demonstrate that we can group based on more than one field, just by adding more properties to this anonymous type, which specifies the group key(s). We also name the group so that we can access its data in final select. We select an anonymous type as well, grabbing key values from the group as well as our aggregate value, using the Count function. You can use other aggregate functions in a grouping. Here is how the same looks in VB.NET:

Dim query =

From onePerson In context.People

Group onePerson By personWithBirthday =

New With {.Month = onePerson.BirthDate.Value.Month}

Into monthGroup = Group

Select New With

{

.Count = monthGroup.Count(),

.Month = personWithBirthday.Month

}

There are some differences from C#. We have to name our group variable (personWithBirthday) and use the Group keyword when defining the monthGroup variable. Let's also take a look at the same grouping functions, using the method syntax:

var methodQuery =

context.People

.GroupBy(

onePerson => new { onePerson.BirthDate.Value.Month },

monthGroup => monthGroup)

.Select(monthGroup => new

{

Month = monthGroup.Key.Month,

Count = monthGroup.Count()

});

The GroupBy method in LINQ typically takes two parameters; that is, the group definition or key fields and the selector that will be used when retrieving group data. We are quite familiar with the Select method now, and we can use our group variable to select not only the group's key fields, but also additional aggregates based on the group definition. Here is how the same query looks in VB.NET:

Dim methodQuery =

context.People _

.GroupBy(Function(person) New With { .Month = person.BirthDate.Value.Month}, _

Function(monthGroup) monthGroup) _

.Select(Function(monthGroup) New With

{

.Month = monthGroup.Key.Month,

.Count = monthGroup.Count()

})

Let's now take a look at LEFT OUTER joins with LINQ. Unfortunately, there is no keyword in LINQ that supports LEFT OUTER joins out of the box. Instead, we have to use grouping with a new keyword: DefaultIfEmpty. This function returns the default for the right table's selectable data if it is missing, typically just the null value. Another new keyword we will see is GroupJoin, which correlates two sets of data based on a condition, thus returning the results of the join. Let's take a look at an example. Our Person object supports null for the person type value. So, let's select a few columns from the Person table and person type, using the "Unknown" string when the person type is null, as shown in this code snippet:

var query =

from person in context.People

join personType in context.PersonTypes

on person.PersonTypeId equals personType.PersonTypeId into finalGroup

from groupedData in finalGroup.DefaultIfEmpty()

select new

{

person.LastName,

person.FirstName,

TypeName = groupedData.TypeName ?? "Unknown"

};

Let's read this statement. We are starting with the People dataset. We join the person types based on the person type identifying, grouping the data based on person. In the next line, we specify that we want to see the default value for the person type inside the group, if the joined data from the right-hand table (the person type) is missing. This is what the DefaultIfEmpty method does. Finally, we select from the grouped data and primary table's data, coalescing the missing person type with the "Unknown" word. Here is how the same code looks in VB.NET:

Dim query =

From person In context.People

Group Join personType In context.PersonTypes

On person.PersonTypeId Equals personType.PersonTypeId Into finalGroup = Group

From groupedData In finalGroup.DefaultIfEmpty()

Select New With

{

.LastName = person.LastName,

.FirstName = person.FirstName,

.TypeName = If(groupedData.TypeName Is Nothing, "Unknown", groupedData.TypeName)

}

VB.NET code is slightly different. We still start with the people set. We join with the person type while grouping the data based on person. We have to use the Group keyword to specify that we want everything from the Group type. In the next line, we see the secondfrom keyword, selecting our grouped data. In the next few lines, we select the person data and coalesce the missing type with the word "Unknown".

We have not seen two from keywords in the same query before. This is possible, and in terms of SQL, similar to the older, pre-JOIN syntax, where we can specify multiple tables in the From portion of our SQL statement and specify the join condition inside the WHEREportion.

Let's take a look at the same statement using the method syntax. In this case, both C# and VB.NET code use the GroupJoin method:

var methodQuery = context.People

.GroupJoin(

context.PersonTypes,

person => person.PersonTypeId,

personType => personType.PersonTypeId,

(person, type) => new

{

Person = person,

PersonType = type

})

.SelectMany(groupedData =>

groupedData.PersonType.DefaultIfEmpty(),

(group, personType) => new

{

group.Person.LastName,

group.Person.FirstName,

TypeName = personType.TypeName ?? "Unknown"

});

We see something new in this code, so let's walk through it. We start with person again, and then we join with person types, based on the person type identifier. We then select both person and personType data in the grouped results, based on person. Next, we see theSelectMany keyword. We project a specified function, DefaultIfEmpty, onto the grouped results, and then select some data from person and person type, substituting null with the word "Unknown", where personType is missing.

Here is how this code looks in VB.NET:

Dim methodQuery =

context.People _

.GroupJoin(

context.PersonTypes,

Function(person) person.PersonTypeId,

Function(personType) personType.PersonTypeId,

Function(person, type) New With

{

.Person = person,

.PersonType = type

}) _

.SelectMany(Function(groupedData) _

groupedData.PersonType.DefaultIfEmpty(),

Function(group, personType) New With

{

.LastName = group.Person.LastName,

.FirstName = group.Person.FirstName,

.TypeName = If(personType.TypeName Is Nothing, "Unknown", personType.TypeName)

})

SelectMany has another usage in LINQ. In addition to LEFT OUTER joins, it is also used in selecting many child and parent records into a single result set. Previously, we saw that if we select person and Phones, we see one person object with a collection of phones in the Phones property. We could unroll this data with SelectMany and select a result where person data is repeated with each child that belongs to the parent in question; for example, seeing the first name of a person and their phone number, where the person's first name is repeated for each phone. This is shown in the following code snippet:

var query =

from onePerson in context.People

from onePhone in onePerson.Phones

orderby onePerson.LastName, onePhone.PhoneNumber

select new

{

onePerson.LastName,

onePerson.FirstName,

onePhone.PhoneNumber

};

We see familiar code here. We use the from keyword twice to bring two sets of records together. We sort them. Then, we select some data from both sets using a projection. Here is the same query in VB.NET:

Dim query =

From onePerson In context.People

From onePhone In onePerson.Phones

Order By onePerson.LastName, onePhone.PhoneNumber

Select New With

{

onePerson.LastName,

onePerson.FirstName,

onePhone.PhoneNumber

}

We can also write the same statement using the method syntax. This time, use the SelectMany function. This function takes an expression with related data, Phones in our case, which is applied to another set of data, the person set in our case. The second parameter to the function is the selector that is the output of the final result; for example, consider this code snippet:

var methodQuery =

context.People

.SelectMany(person => person.Phones, (person, phone) => new

{

person.LastName,

person.FirstName,

phone.PhoneNumber

})

.OrderBy(p => p.LastName)

.ThenBy(p => p.PhoneNumber);

Here is how the code looks in VB.NET:

Dim methodQuery =

context.People _

.SelectMany( _

Function(person) person.Phones, _

Function(person, phone) New With

{

person.LastName,

person.FirstName,

phone.PhoneNumber

}) _

.OrderBy(Function(p) p.LastName) _

.ThenBy(Function(p) p.PhoneNumber)

Set operations

LINQ supports the following set operators:

· Distinct

· Union

· Intersect

· Except

Distinct, just as the same implies, returns a set of unique values across a set of data. For example, let's determine all unique person types across all of the people in the table:

var uniqueQuery = context.People

.Select(p => p.PersonType.TypeName)

.Distinct();

It is important to note that the distinct selection is not limited to a single field as in the preceding example. We can apply the Distinct operator to results that contain many columns or properties, both anonymous and explicit types. Here is how the same code looks in VB.NET:

Dim uniqueQuery = context.People _

.Select(Function(p) p.PersonType.TypeName) _

.Distinct()

We can also perform unions of multiple queries. Union combines results of two queries into a single result set. For example, if we want to select all names of both People and Companies into a single results set, we can do something like the following:

var unionQuery = context.People

.Select(p => new

{

Name = p.LastName + " " + p.FirstName,

RowType = "Person"

})

.Union(context.Companies.Select(c => new

{

Name = c.CompanyName,

RowType = "Company"

}))

.OrderBy(result => result.RowType)

.ThenBy(result => result.Name);

We are using an anonymous type in this case with two properties: name and record type. We see a new interesting capability, that is, the ability to concatenate strings as part of a query. This code is executed by the backend, as is the case in all the examples we covered so far. Entity Framework is smart enough to handle such use cases. We also apply a sort order to the result set. Here is how the code looks in VB.NET:

Dim unionQuery = context.People _

.Select(Function(p) New With

{

.Name = p.LastName + " " + p.FirstName,

.RowType = "Person"

}) _

.Union(context.Companies.Select(Function(c) New With

{

.Name = c.CompanyName,

.RowType = "Company"

})) _

.OrderBy(Function(result) result.RowType) _

.ThenBy(Function(result) result.Name)

Intersect looks for common data between two queries. Except looks for differences between two query input sets. Syntactically, there is no difference between Union and Intersect. So, to exercise this idea, try to replace the Union keyword with the Intersect or Exceptkeyword and, examine the results.

Self-test questions

Q1. Which base class do you use to configure a class used to contain a number of properties that are common to multiple entities?

1. EntityTypeConfiguration

2. ComplexTypeConfiguration

3. CommonTypeConfiguration

Q2. You must have table names and class names that always match in Entity Framework, true or false?

Q3. Every property in an entity is always persisted to the database, true or false?

Q4. What is the name of the technique that involves selecting a subset of columns from a table via a query in Entity Framework?

1. Projection

2. Subquery

3. You cannot do this in Entity Framework

Q5. You must declare a result type to dynamically select columns from multiple tables via a single query, true or false?

Q6. You must use the Join keyword to select data from related tables in a single query, true or false?

Q7. Which function can be used in a method syntax query to repeat parent data along with unique child data in a single query result?

1. Select

2. GroupJoin

3. SelectMany

Q8. Which LINQ method can be used to find all unique values for one column in a table?

1. Distinct

2. Unique

3. Union

Q9. Which single LINQ method can be used to accomplish the selection of data from multiple tables, that is similar to LEFT OUTER JOIN in SQL?

1. Join

2. LeftJoin

3. RightJoin

4. You have to use a combination of methods, as no single method exists

Q10. Which two methods can you use to page through the data?

1. Miss and Yield

2. Skip and Take

3. Group and Take

Q11. You cannot create grouping queries based on multiple fields from a table, true or false?

Summary

We covered a lot of ground in this chapter. Since practice makes perfect, I always recommend that everyone tries out the covered concepts by writing code and experimenting. We looked at some new modeling techniques. We saw that we can create additional classes, called complex types, to group properties common to multiple entities to ensure consistency in our models. We saw that using enumerations in models can lead to more readable code, and were convinced that Entity Framework has first-class support for enumerations. We saw that we do not have to have the names of our classes and properties match database structure exactly. We could use the HasColumnName and ToTable methods to provide alterative names at the database level.

We looked at many advanced query techniques. Most concepts can be used with both the query and method syntax of LINQ. We used projections to select subsets of columns from one or more tables in a single query. We saw how we can accomplish projection queries with anonymous and explicit types. We saw how we can aggregate the data to compute maximum, minimum, and a sum of data to get single values. We were also able to group the data in more advanced ways, providing grouping by many properties, and aggregates inside the same result set, using the GroupBy method. We looked at how we can use the same technique in order to accomplish the LEFT OUTER join functionality, and the ability to retrieve the data from the left-hand table, even though there are no matching rows in the table on the right. We saw how SelectMany can be used to create a functionality similar to JOIN in SQL, where some of the data from one table is repeated in matching child rows in the result set. Finally, we took a look at set operations that allow us to find distinct data as well as combine data from multiple queries via union operations.

In the next chapter, we will look at working with stored procedures and views, and database artifacts that do not map directly to entities, as tables do. We will also see how to handle concurrency, a circumstance where multiple users try to update the same data at the same time. We will understand the advantages of using Entity Framework's asynchronous API.