Chapter 9. Wrapping Up

Congratulations on making it through all seven weeks!

We’ve covered a great deal of ground, from the fine-grained parallelism supported by a data-parallel GPU to the massive scale of a MapReduce cluster. Along the way, we’ve seen not only how concurrency and parallelism allow us to exploit the power of modern multicore CPUs, but also many other benefits of moving beyond conventional, sequential code:

· We saw how Elixir, Hadoop, and Storm all distribute computation across a cluster of independent machines, allowing us to create solutions that can recover when hardware fails.

· When looking at core.async, we saw how concurrency could rescue us from the “callback hell” commonly associated with event handling.

· In the chapter on functional programming, we saw how a concurrent solution could be both simpler and easier to understand than its sequential equivalent.

Let’s take a look at what this means for the future.

Where Are We Going?

More than two decades ago, I predicted that parallel and distributed programming were about to go mainstream, so I don’t have a fantastic track record as a pundit. Nevertheless, I believe that the increasing importance of concurrency and parallelism has clear implications for the future of programming.

The Future Is Immutable

To my mind, one lesson shines through all others—immutability is going to play a much larger part in the code we write in the future than it has in the past.

Immutability is most obviously relevant to functional programming—avoiding mutable state is what makes parallelism and concurrency so easy in functional code. But we don’t have to write functional programs for immutability to be beneficial. Let’s look at the evidence from the last few weeks:

· Although Clojure isn’t a pure functional language, its core data structures are immutable and therefore persistent (as we saw in Persistent Data Structures). And persistent data structures allow Clojure to support mutable references that separate identity from state, avoiding the problems normally associated with mutable state (see the Java sketch after this list).

· Although it’s not typically constructed using functional code at the lowest level, immutability lies at the heart of the Lambda Architecture—by restricting the batch layer to eternally true (immutable) raw data, we can safely distribute that data across a cluster, process it in parallel, and recover from both technical and human faults.

· Although Elixir is not a pure functional language, its lack of mutable variables is a key enabler for the impressive efficiency and reliability of the Erlang virtual machine upon which it runs.

· The messages sent by both actor and CSP applications are immutable.

· Immutability is even helpful when writing threads-and-locks-based programs: the more data that’s immutable, the fewer locks we need and the less we need to worry about memory visibility.
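To make the separation of identity and state concrete outside Clojure, here’s a minimal Java sketch (my own construction, not code from earlier chapters; it assumes Java 16 or later for the record syntax). The state is an immutable value; the identity is an AtomicReference that always points at the current state, playing a role similar to a Clojure atom:

    import java.math.BigDecimal;
    import java.util.concurrent.atomic.AtomicReference;

    public class Account {
        // The *state* is an immutable value; "updating" it creates a new value.
        record Balance(BigDecimal amount) {
            Balance deposit(BigDecimal x) { return new Balance(amount.add(x)); }
        }

        // The *identity* is a reference to the current immutable state,
        // much like a Clojure atom.
        private final AtomicReference<Balance> balance =
                new AtomicReference<>(new Balance(BigDecimal.ZERO));

        public void deposit(BigDecimal x) {
            // Atomically swap in a new state derived from the old one
            // (compare Clojure's swap!). No locks are needed, and readers
            // always see a complete, consistent value.
            balance.updateAndGet(b -> b.deposit(x));
        }

        public Balance current() { return balance.get(); }

        public static void main(String[] args) {
            Account a = new Account();
            a.deposit(new BigDecimal("10.00"));
            System.out.println(a.current()); // Balance[amount=10.00]
        }
    }

Because Balance is immutable, any number of threads can read current() without locking; only the single reference ever changes.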

It seems clear that, even if you’re not using a functional language, the frameworks you use and the code you write are going to be increasingly influenced by functional principles. This is great news—not only will it make it easier for us to exploit parallelism and concurrency, but it will make our code simpler, easier to understand, and more reliable.

The Future Is Distributed

The primary reason for the current resurgence of interest in parallelism and concurrency is the multicore crisis. Instead of individual cores becoming faster, we’re seeing CPUs with more and more cores. The good news is that we can exploit those cores by using the techniques we’ve seen over the last few weeks.

But there’s another crisis coming our way—memory bandwidth. Current-generation machines with two, four, or eight cores can communicate effectively via shared memory. But what about when we have sixteen, thirty-two, or sixty-four cores?

If the number of cores continues to increase at the current rate, shared memory is going to become the bottleneck, which means that we’re going to have to worry about distributed memory. The computer of the future may be contained within a single box, but from the programmer’s point of view it’s likely to look more like a cluster of independent computers.

This makes it inevitable, I think, that techniques based on message passing, like actors and CSP, will become more important over time.

You won’t be surprised to hear that the last seven weeks haven’t been an exhaustive exploration of your options when it comes to concurrent and parallel development. So what didn’t we cover?

Roads Not Taken

One of the hardest decisions we had to make when creating this book was what to leave out. Here’s a quick summary of the roads we didn’t take, as well as some pointers if you want to investigate them yourself.

Fork/Join and Work-Stealing

Fork/Join is an approach to parallelism popularized by the Cilk language,[81] a parallel variant of C/C++, but implementations are now available for many environments, including Java.[82] Fork/Join is particularly suited to divide-and-conquer algorithms, such as those we saw in Divide and Conquer (indeed, Clojure’s reducers make use of Java’s Fork/Join framework under the hood).

Fork/Join implementations typically make use of work-stealing to share tasks across a thread pool, an approach very similar to Clojure’s go blocks (see Go Blocks).
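Here’s what the divide-and-conquer pattern looks like with the Java Fork/Join framework cited in footnote 82 (a standard usage sketch rather than code from the book; the threshold value is purely illustrative):

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Divide-and-conquer sum over an array. Idle worker threads in the
    // pool "steal" queued subtasks from busy ones.
    public class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000; // illustrative; tune for real workloads
        private final long[] xs;
        private final int lo, hi;

        SumTask(long[] xs, int lo, int hi) { this.xs = xs; this.lo = lo; this.hi = hi; }

        @Override protected Long compute() {
            if (hi - lo <= THRESHOLD) {               // small enough: sum directly
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += xs[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;                // otherwise: divide...
            SumTask left = new SumTask(xs, lo, mid);
            left.fork();                              // ...run one half asynchronously,
            long rightSum = new SumTask(xs, mid, hi).compute(); // compute the other here,
            return rightSum + left.join();            // then combine the results
        }

        public static void main(String[] args) {
            long[] xs = new long[1_000_000];
            java.util.Arrays.fill(xs, 1L);
            long total = ForkJoinPool.commonPool().invoke(new SumTask(xs, 0, xs.length));
            System.out.println(total); // 1000000
        }
    }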

Dataflow

We briefly touched on dataflow in Dataflow, but the subject really deserves more discussion. The primary reason why we didn’t cover it further is that none of the attempts to create a general-purpose dataflow language have been particularly compelling. The best example is probably the multiparadigm programming language Oz (part of the Mozart Programming System).[83]

This doesn’t mean dataflow isn’t important, though—quite the opposite: dataflow-based parallelism is used extensively in hardware design, and both VHDL and Verilog are dataflow languages.[84][85]

Reactive Programming

Closely related to dataflow is reactive programming, in which programs automatically react to the propagation of changes. Interest in reactive programming has increased recently thanks to Microsoft’s Rx (Reactive Extensions) library and others.[86][87]

In this form, reactive programming has significant parallels with several of the technologies we’ve covered, including Storm’s topologies and the message-passing approaches, actors and CSP.
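As a taste of the style, here’s a trivial pipeline using the RxJava library from footnote 87 (assuming the 1.x API; a toy example of mine, not code from the book). Each value propagates through the chain as it’s produced, much as tuples flow through a Storm topology:

    import rx.Observable; // RxJava 1.x

    public class Squares {
        public static void main(String[] args) {
            Observable.just(1, 2, 3, 4, 5)
                      .map(x -> x * x)                 // react to each value as it arrives
                      .filter(x -> x % 2 == 1)         // keep only the odd squares
                      .subscribe(System.out::println); // prints 1, 9, 25
        }
    }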

Functional Reactive Programming

Functional reactive programming is a type of reactive programming that extends functional programming by explicitly modeling time. Elm runs in the browser and implements a concurrent version of FRP.[88] Like core.async, it provides a means to avoid the callback hell associated with handling events. Elm is one of the languages covered in the next book in this series, Seven More Languages in Seven Weeks [TDMD14].

Grid Computing

Grid computing is a very loosely coupled approach to building a distributed cluster. Elements of a grid are typically very heterogeneous and geographically distributed, potentially even joining and leaving the grid on an ad hoc basis.

The best-known example of grid computing is probably the SETI@home project, which allows anyone to donate their computer’s idle time to the search for extraterrestrial intelligence.[89]

Tuple Spaces

A tuple space is a form of distributed associative memory that can be used to implement interprocess communication. Tuple spaces were first introduced in the Linda coordination language (which, incidentally, was the subject of my PhD thesis back in the early 1990s), and there are several tuple-space-based systems under active development.[90][91][92]
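To give a flavor of the model, here’s a deliberately naive, single-process Java sketch of Linda’s three core operations (my own toy construction; a real tuple space distributes the store across machines). out writes a tuple, rd reads a matching tuple, and in removes one; nulls in a template act as wildcards, and readers block until a match appears:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class TupleSpace {
        private final List<Object[]> tuples = new ArrayList<>();

        // out: write a tuple into the space.
        public synchronized void out(Object... tuple) {
            tuples.add(tuple);
            notifyAll(); // wake any thread blocked in in() or rd()
        }

        // in: remove and return a matching tuple, blocking until one exists.
        public synchronized Object[] in(Object... template) throws InterruptedException {
            Object[] match;
            while ((match = find(template)) == null) wait();
            tuples.remove(match);
            return match;
        }

        // rd: like in(), but leaves the tuple in the space.
        public synchronized Object[] rd(Object... template) throws InterruptedException {
            Object[] match;
            while ((match = find(template)) == null) wait();
            return match;
        }

        // Return the first tuple matching the template (null fields match anything).
        private Object[] find(Object[] template) {
            for (Object[] t : tuples) {
                if (t.length != template.length) continue;
                boolean ok = true;
                for (int i = 0; i < t.length; i++)
                    if (template[i] != null && !template[i].equals(t[i])) { ok = false; break; }
                if (ok) return t;
            }
            return null;
        }

        public static void main(String[] args) throws InterruptedException {
            TupleSpace space = new TupleSpace();
            space.out("point", 3, 4);
            System.out.println(Arrays.toString(space.in("point", null, null))); // [point, 3, 4]
        }
    }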

Over to You

I’m a car nut, so the metaphors I’ve used at the start of each chapter have all been automotive. Like vehicles, programming problems come in a huge range of shapes and sizes. Whether you work on the computing equivalent of a lightweight bespoke racer, a mass-produced family sedan, or a heavy truck, the one thing I can say with confidence is that parallelism and concurrency will be increasingly important.

It’s my sincere hope that, whether or not you use any of them directly, the different approaches and technologies we’ve seen over the last seven weeks will inspire you to tackle your future projects with confidence. Drive (thread-)safely!

Footnotes

[81] http://www.cilkplus.org

[82] http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html

[83] http://mozart.github.io

[84] http://en.wikipedia.org/wiki/VHDL

[85] http://en.wikipedia.org/wiki/Verilog

[86] https://rx.codeplex.com

[87] https://github.com/Netflix/RxJava

[88] http://elm-lang.org

[89] http://setiathome.ssl.berkeley.edu

[90] http://en.wikipedia.org/wiki/Linda_(coordination_language)

[91] http://river.apache.org/

[92] https://github.com/vjoel/tupelo