Synchronization - Concurrency in C# Cookbook (2014)

Concurrency in C# Cookbook (2014)

Chapter 11. Synchronization

When your application makes use of concurrency (as practically all .NET applications do), then you need to watch out for situations where one piece of code needs to update data while other code needs to access the same data. Whenever this happens, you need to synchronize access to the data. The recipes in this chapter cover the most common types used to synchronize access. However, if you use the other recipes in this book appropriately, you’ll find that a lot of the more common synchronization is already done for you by the appropriate libraries. Before diving into the synchronization recipes, let’s take a closer look at some common situations where synchronization may or may not be required.

TIP

The synchronization explanations in this section are slightly simplified, but the conclusions are all correct.

There are two major types of synchronization: communication and data protection. Communication is used when one piece of code needs to notify another piece of code of some condition (e.g., a new message has arrived). We’ll cover communication more thoroughly in the actual recipes; the remainder of this intro will discuss data protection.

We need to use synchronization to protect shared data when all three of these conditions are true:

§ Multiple pieces of code are running concurrently.

§ These pieces are accessing (reading or writing) the same data.

§ At least one piece of code is updating (writing) the data.

The reason for the first condition should be obvious; if your entire code just runs from top to bottom and nothing ever happens concurrently, then you never have to worry about synchronization. This is the case for some simple Console applications, but the vast majority of .NET applications do use some kind of concurrency. The second condition means that if each piece of code has its own local data that it doesn’t share, then there’s no need for synchronization; the local data is independent from any other pieces of code. There’s also no need for synchronization if there is shared data but the data never changes; the third condition covers scenarios like configuration values and the like that are set at the beginning of the application and then never change. If the shared data is only read, then it doesn’t need synchronization.

The purpose of data protection is to provide each piece of code with a consistent view of the data. If one piece of code is updating the data, then we use synchronization to make those updates appear atomic to the rest of the system.

It takes some practice to learn when synchronization is necessary, so we’ll walk through a few examples before actually starting the recipes in this chapter. As our first example, consider the following code:

async Task MyMethodAsync()

{

int val = 10;

await Task.Delay(TimeSpan.FromSeconds(1));

val = val + 1;

await Task.Delay(TimeSpan.FromSeconds(1));

val = val - 1;

await Task.Delay(TimeSpan.FromSeconds(1));

Trace.WriteLine(val);

}

If this method is called from a thread-pool thread (e.g., from within Task.Run), then the lines of code accessing val may run on separate thread-pool threads. But does it need synchronization? No, because none of them can be running at the same time. The method is asynchronous, but it is also sequential (meaning it progresses one part at a time).

OK, let’s complicate the example a bit. This time we’ll run concurrent asynchronous code:

class SharedData

{

public int Value { get; set; }

}

async Task ModifyValueAsync(SharedData data)

{

await Task.Delay(TimeSpan.FromSeconds(1));

data.Value = data.Value + 1;

}

// WARNING: may require synchronization; see discussion below.

async Task<int> ModifyValueConcurrentlyAsync()

{

var data = new SharedData();

// Start three concurrent modifications.

var task1 = ModifyValueAsync(data);

var task2 = ModifyValueAsync(data);

var task3 = ModifyValueAsync(data);

await Task.WhenAll(task1, task2, task3);

return data.Value;

}

In this case, we’re starting three modifications that run concurrently. Do we need synchronization? The answer is, “it depends.” If we know that the method is called from a GUI or ASP.NET context (or any context that only allows one piece of code to run at a time), then there is no synchronization necessary because when the actual data modification code runs, it runs at a different time than the other two data modifications. For example, if this is run in a GUI context, there is only one UI thread that will execute each of the data modifications, so it must do them one at a time. So, if we know the context is a one-at-a-time context, then there is no synchronization needed. However, if this same method is called from a thread-pool thread (e.g., from Task.Run), then synchronization would be necessary. In that case, the three data modifications could run on separate thread-pool threads and update data.Value simultaneously, so we would need to synchronize access to data.Value.

Now let’s make our data a private field instead of something we pass around, and consider one more wrinkle:

private int value;

async Task ModifyValueAsync()

{

await Task.Delay(TimeSpan.FromSeconds(1));

value = value + 1;

}

// WARNING: may require synchronization; see discussion below.

async Task<int> ModifyValueConcurrentlyAsync()

{

// Start three concurrent modifications.

var task1 = ModifyValueAsync();

var task2 = ModifyValueAsync();

var task3 = ModifyValueAsync();

await Task.WhenAll(task1, task2, task3);

return value;

}

The same discussion also applies to this code; if the context may be a thread pool context, then synchronization is definitely necessary. But there’s an additional wrinkle here. Previously, we created a SharedData instance that was shared among the three modifying methods; this time, the shared data is an actual private field. This means that if the calling code calls ModifyValueConcurrentlyAsync multiple times, each of those separate calls shares the same value. We may want to apply synchronization even in a one-at-a-time context if we want to avoid that kind of sharing. To put that another way, if we want to make it so that each call to ModifyValueConcurrentlyAsync waits until all previous calls have completed, then we will need to add synchronization. This is true even if the context ensures that only one thread is used for all the code (i.e., the UI thread). Synchronization in this scenario is actually a kind of throttling for asynchronous methods (see Recipe 11.2).

Let’s look at one more async example. You can use Task.Run to do what I call “simple parallelism”—a basic kind of parallel processing that doesn’t provide the efficiency and configurability that the true parallelism of Parallel/PLINQ does. The following code updates a shared value using simple parallelism:

// BAD CODE!!

async Task<int> SimpleParallelismAsync()

{

int val = 0;

var task1 = Task.Run(() => { val = val + 1; });

var task2 = Task.Run(() => { val = val + 1; });

var task3 = Task.Run(() => { val = val + 1; });

await Task.WhenAll(task1, task2, task3);

return val;

}

In this case, we have three separate tasks running on the thread pool (via Task.Run), all modifying the same val. So, our conditions apply, and we certainly do need synchronization here. Note that we do need synchronization even though val is a local variable; it is still shared between threads even though it is local to this one method.

Moving on to true parallel code, let’s consider an example that uses the Parallel type:

void IndependentParallelism(IEnumerable<int> values)

{

Parallel.ForEach(values, item => Trace.WriteLine(item));

}

Since this code uses Parallel, we must assume we’re running on multiple threads. However, the body of the parallel loop (item => Trace.WriteLine(item)) only reads from its own data; there’s no data sharing between threads here. The Parallel class divides the data among threads so that none of them has to share its data. Each thread running its loop body is independent from all the other threads running the same loop body. So, no synchronization of this code is necessary.

Let’s look at an aggregation example similar to the one covered in Recipe 3.2:

// BAD CODE!!

int ParallelSum(IEnumerable<int> values)

{

int result = 0;

Parallel.ForEach(source: values,

localInit: () => 0,

body: (item, state, localValue) => localValue + item,

localFinally: localValue => { result += localValue; });

return result;

}

In this example, we are again using multiple threads; this time, each thread starts with its local value initialized to 0 (() => 0), and for each input value processed by that thread, it adds the input value to its local value ((item, state, localValue) => localValue + item). Finally, all the local values are added to the return value (localValue => { result += localValue; }). The first two steps aren’t problematic because there’s nothing shared between threads; each thread’s local and input values are independent from all other threads’ local and input values. However, the final step is problematic; when each thread’s local value is added to the return value, we have a situation where there is a shared variable (result) that is accessed by multiple threads and updated by all of them. So, we need to use synchronization in that final step (seeRecipe 11.1).

The PLINQ, dataflow, and reactive libraries are very similar to the Parallel examples: as long as your code is just dealing with its own input, it doesn’t have to worry about synchronization. I find that if I use these libraries appropriately, there is very little need for me to add synchronization to most of my code.

Lastly, let’s discuss collections for a bit. Remember that the three conditions requiring synchronization are multiple pieces of code, shared data, and data updates.

Immutable types are naturally threadsafe because they cannot change; it’s not possible to update an immutable collection, so no synchronization is necessary. For example, this code does not require synchronization because when each separate thread-pool thread pushes a value onto the stack, it is actually creating a new immutable stack with that value, leaving the original stack unchanged:

async Task<bool> PlayWithStackAsync()

{

var stack = ImmutableStack<int>.Empty;

var task1 = Task.Run(() => Trace.WriteLine(stack.Push(3).Peek()));

var task2 = Task.Run(() => Trace.WriteLine(stack.Push(5).Peek()));

var task3 = Task.Run(() => Trace.WriteLine(stack.Push(7).Peek()));

await Task.WhenAll(task1, task2, task3);

return stack.IsEmpty; // Always returns true.

}

However, when your code uses immutable collections, it’s common to have a shared “root” variable that is not itself immutable. In that case, you do have to use synchronization. In the following code, each thread pushes a value onto the stack (creating a new immutable stack) and then updates the shared root variable. In this example, we do need synchronization to update the stack variable:

// BAD CODE!!

async Task<bool> PlayWithStackAsync()

{

var stack = ImmutableStack<int>.Empty;

var task1 = Task.Run(() => { stack = stack.Push(3); });

var task2 = Task.Run(() => { stack = stack.Push(5); });

var task3 = Task.Run(() => { stack = stack.Push(7); });

await Task.WhenAll(task1, task2, task3);

return stack.IsEmpty;

}

Threadsafe collections (e.g., ConcurrentDictionary) are quite different. Unlike immutable collections, threadsafe collections can be updated. However, they have all the synchronization they need built in, so you don’t have to worry about it. If the following code updated a Dictionaryinstead of a ConcurrentDictionary, it would need synchronization; but since it is updating a ConcurrentDictionary, it does not need synchronization:

async Task<int> ThreadsafeCollectionsAsync()

{

var dictionary = new ConcurrentDictionary<int, int>();

var task1 = Task.Run(() => { dictionary.TryAdd(2, 3); });

var task2 = Task.Run(() => { dictionary.TryAdd(3, 5); });

var task3 = Task.Run(() => { dictionary.TryAdd(5, 7); });

await Task.WhenAll(task1, task2, task3);

return dictionary.Count; // Always returns 3.

}

11.1. Blocking Locks

Problem

You have some shared data and need to safely read and write it from multiple threads.

Solution

The best solution for this situation is to use the lock statement. When a thread enters a lock, it will prevent any other threads from entering that lock until the lock is released:

class MyClass

{

// This lock protects the _value field.

private readonly object _mutex = new object();

private int _value;

public void Increment()

{

lock (_mutex)

{

_value = _value + 1;

}

}

}

Discussion

There are many other kinds of locks in the .NET framework, such as Monitor, SpinLock, and ReaderWriterLockSlim. These lock types should almost never be used in most applications. In particular, it is natural for developers to jump to ReaderWriterLockSlim when there is no need for that level of complexity. The basic lock statement handles 99% of cases quite well.

There are four important guidlines when using locks:

§ Restrict lock visibility.

§ Document what the lock protects.

§ Minimize code under lock.

§ Never execute arbitrary code while holding a lock.

First, you should strive to restrict lock visibility. The object used in the lock statement should be a private field and never should be exposed to any method outside the class. There is usually at most one lock per type; if you have more than one, consider refactoring that type into separate types. You can lock on any reference type, but I prefer to have a field specifically for use with the lock statement, as in the last example. In particular, you should never lock(this) or lock on any instance of Type or string; these locks can cause deadlocks because they are accessible from other code.

Second, document what the lock protects. This is easy to overlook when initially writing the code but becomes more important as the code grows in complexity.

Third, do your best to minimize the code that is executed while holding a lock. One thing to watch for is blocking calls; it is not ideal to block at all while holding a lock.

Finally, do not ever call arbitrary code under lock. Arbitrary code can include raising events, invoking virtual methods, or invoking delegates. If you must execute arbitrary code, do so after the lock is released.

See Also

Recipe 11.2 covers async-compatible locks. The lock statement is not compatible with await.

Recipe 11.3 covers signaling between threads. The lock statement is intended to protect shared data, not send signals between threads.

Recipe 11.5 covers throttling, which is a generalization of locking. A lock can be thought of as throttling to one at a time.

11.2. Async Locks

Problem

You have some shared data and need to safely read and write it from multiple code blocks, which may be using await.

Solution

The .NET framework SemaphoreSlim type has been updated in version 4.5 to be compatible with async. It can be used as such:

class MyClass

{

// This lock protects the _value field.

private readonly SemaphoreSlim _mutex = new SemaphoreSlim(1);

private int _value;

public async Task DelayAndIncrementAsync()

{

await _mutex.WaitAsync();

try

{

var oldValue = _value;

await Task.Delay(TimeSpan.FromSeconds(oldValue));

_value = oldValue + 1;

}

finally

{

_mutex.Release();

}

}

}

However, SemaphoreSlim can only be used this way on .NET 4.5 and other newer platforms. If you are on an older platform or writing a portable class library, you can use the AsyncLock type from the Nito.AsyncEx library:

class MyClass

{

// This lock protects the _value field.

private readonly AsyncLock _mutex = new AsyncLock();

private int _value;

public async Task DelayAndIncrementAsync()

{

using (await _mutex.LockAsync())

{

var oldValue = _value;

await Task.Delay(TimeSpan.FromSeconds(oldValue));

_value = oldValue + 1;

}

}

}

Discussion

The same guidelines from Recipe 11.1 also apply here, specifically:

§ Restrict lock visibility.

§ Document what the lock protects.

§ Minimize code under lock.

§ Never execute arbitrary code while holding a lock.

Keep your lock instances private; do not expose them outside the class. Be sure to clearly document (and carefully think through) exactly what a lock instance protects. Minimize code that is executed while holding a lock. In particular, do not call arbitrary code; this includes raising events, invoking virtual methods, and invoking delegates. Platform support for asynchronous locks is detailed in Table 11-1.

Table 11-1. Platform support for asynchronous locks

Platform

SemaphoreSlim.WaitAsync

AsyncLock

.NET 4.5

Yes

Yes

.NET 4.0

No

Yes

Mono iOS/Droid

Yes

Yes

Windows Store

Yes

Yes

Windows Phone Apps 8.1

Yes

Yes

Windows Phone SL 8.0

Yes

Yes

Windows Phone SL 7.1

No

Yes

Silverlight 5

No

Yes

TIP

The AsyncLock type is in the Nito.AsyncEx NuGet package.

See Also

Recipe 11.4 covers async-compatible signaling. Locks are intended to protect shared data, not act as signals.

Recipe 11.5 covers throttling, which is a generalization of locking. A lock can be thought of as throttling to one at a time.

11.3. Blocking Signals

Problem

You have to send a notification from one thread to another.

Solution

The most common and general-purpose cross-thread signal is ManualResetEventSlim. A manual-reset event can be in one of two states: signaled or unsignaled. Any thread may set the event to a signaled state or reset the event to an unsignaled state. A thread may also wait for the event to be signaled.

The following two methods are invoked by separate threads; one thread waits for a signal from the other:

class MyClass

{

private readonly ManualResetEventSlim _initialized =

new ManualResetEventSlim();

private int _value;

public int WaitForInitialization()

{

_initialized.Wait();

return _value;

}

public void InitializeFromAnotherThread()

{

_value = 13;

_initialized.Set();

}

}

Discussion

ManualResetEventSlim is a great general-purpose signal from one thread to another, but you should only use it when appropriate. If the “signal” is actually a message sending some piece of data across threads, then consider using a producer/consumer queue. On the other hand, if the signals are just used to coordinate access to shared data, then you should use a lock instead.

There are other thread synchronization signal types in the .NET framework that are less commonly used. If ManualResetEventSlim doesn’t suit your needs, consider AutoResetEvent, CountdownEvent, or Barrier.

See Also

Recipe 8.6 covers blocking producer/consumer queues.

Recipe 11.1 covers blocking locks.

Recipe 11.4 covers async-compatible signals.

11.4. Async Signals

Problem

You need to send a notification from one part of the code to another, and the receiver of the notification must wait for it asynchronously.

Solution

If the notification only needs to be sent once, then you can use TaskCompletionSource<T> to send the notification asynchronously. The sending code calls TrySetResult, and the receiving code awaits its Task property:

class MyClass

{

private readonly TaskCompletionSource<object> _initialized =

new TaskCompletionSource<object>();

private int _value1;

private int _value2;

public async Task<int> WaitForInitializationAsync()

{

await _initialized.Task;

return _value1 + _value2;

}

public void Initialize()

{

_value1 = 13;

_value2 = 17;

_initialized.TrySetResult(null);

}

}

The TaskCompletionSource<T> type can be used to asynchronously wait for any kind of situation—in this case, a notification from another part of the code. This works well if the signal is only sent once, but does not work as well if you need to turn the signal off as well as on.

The Nito.AsyncEx library contains a type AsyncManualResetEvent, which is an approximate equivalent of ManualResetEvent for asynchronous code. The following example is fabricated, but it shows how to use the AsyncManualResetEvent type:

class MyClass

{

private readonly AsyncManualResetEvent _connected =

new AsyncManualResetEvent();

public async Task WaitForConnectedAsync()

{

await _connected.WaitAsync();

}

public void ConnectedChanged(bool connected)

{

if (connected)

_connected.Set();

else

_connected.Reset();

}

}

Discussion

Signals are a general-purpose notification mechanism. But if that “signal” is a message, used to send data from one piece of code to another, then consider using a producer/consumer queue. Similarly, do not use general-purpose signals just to coordinate access to shared data; in that situation, use a lock.

TIP

The AsyncManualResetEvent type is in the Nito.AsyncEx NuGet package.

See Also

Recipe 8.8 covers asynchronous producer/consumer queues.

Recipe 11.2 covers asynchronous locks.

Recipe 11.3 covers blocking signals, which can be used for notifications across threads.

11.5. Throttling

Problem

You have highly concurrent code that is actually too concurrent, and you need some way to throttle the concurrency.

Code is too concurrent when parts of the application are unable to keep up with other parts, causing data items to build up and consume memory. In this scenario, throttling parts of the code can prevent memory issues.

Solution

The solution varies based on the type of concurrency your code is doing. These solutions all restrict concurrency to a specific value. Reactive Extensions has more powerful options, such as sliding time windows; throttling Rx is covered more thoroughly in Recipe 5.4.

Dataflow and parallel code all have built-in options for throttling concurrency:

IPropagatorBlock<int, int> DataflowMultiplyBy2()

{

var options = new ExecutionDataflowBlockOptions

{

MaxDegreeOfParallelism = 10

};

return new TransformBlock<int, int>(data => data * 2, options);

}

// Using Parallel LINQ (PLINQ)

IEnumerable<int> ParallelMultiplyBy2(IEnumerable<int> values)

{

return values.AsParallel()

.WithDegreeOfParallelism(10)

.Select(item => item * 2);

}

// Using the Parallel class

void ParallelRotateMatrices(IEnumerable<Matrix> matrices, float degrees)

{

var options = new ParallelOptions

{

MaxDegreeOfParallelism = 10

};

Parallel.ForEach(matrices, options, matrix => matrix.Rotate(degrees));

}

Concurrent asynchronous code can be throttled by using SemaphoreSlim:

async Task<string[]> DownloadUrlsAsync(IEnumerable<string> urls)

{

var httpClient = new HttpClient();

var semaphore = new SemaphoreSlim(10);

var tasks = urls.Select(async url =>

{

await semaphore.WaitAsync();

try

{

return await httpClient.GetStringAsync(url);

}

finally

{

semaphore.Release();

}

}).ToArray();

return await Task.WhenAll(tasks);

}

Discussion

Throttling may be necessary when you find your code is using too many resources (for example, CPU or network connections). Bear in mind that end users usually have less powerful machines than developers, so it is better to throttle by a little too much than not enough.

See Also

Recipe 5.4 covers throttling for reactive code.