
Part 1. Introduction

In chapter 1, we looked at a number of business problems that people regularly face in the industry, and we discussed how F# can help you overcome those problems. But when professional software developers look at functional-first programming for the first time, they sometimes say that it looks difficult. In the first part of the book, we’re going to look at a couple of examples that demonstrate that nothing could be further from the truth.

Functional-first programming may appear unfamiliar at first, but the ideas behind it are simple. The following two chapters demonstrate this well. In chapter 2, Kit Eason explains how to implement one mathematical task that his team faced when building a life-expectancy modeling system. You’ll see that you can often solve problems by composing a couple of functions together in an easy way. At the same time, we’ll begin diving into solid software engineering and unit testing.

Chapter 3 also doesn’t require deep understanding of F#. We’ll look at implementing a parser for the Markdown format in an extensible way. You’ll see that by following a few basic principles, you can write code that is reusable and extensible. In this chapter, you’ll learn about pattern matching in F#: one of the fundamental pieces that make it easy to express complex logic with simple code.

As mentioned in the preface, this book isn’t an introduction to F#, but chapters 2 and 3, together with the appendix, cover some basic F# language features that may be useful if you haven’t written much F# before. At the same time, these two chapters describe solutions to real-world problems. The takeaway is that you can achieve interesting things with just a little F# knowledge. Take this as encouragement to start playing with F# and do your next project with it!

Chapter 2. Calculating cumulative binomial distributions

Kit Eason

As you learned in chapter 1, many development projects involve taking financial or statistical models that are specified in some other way (such as Microsoft Excel or MathWorks’ MATLAB) and translating them into robust, scalable systems. This task can be arduous. This chapter is a detailed case study of how my colleagues and I used F# to solve a problem we encountered during this process. I could’ve chosen from many other examples from the same project, which involved a substantial F# calculation engine. But we went up several blind alleys to solve this one, and that gives me the opportunity to show you how a feature can be brought to market promptly using an experiment-and-fail-fast approach—something at which F# excels.

The case study involves applying statistical tests to a life-expectancy model. Actuaries often use such mathematical models to evaluate the future liabilities of a pension scheme (see, for example, figure 1). If the model suggests that the life expectancies of the members will be long, then the liabilities of the scheme will be higher, because it will be paying out pensions for a longer period. Conversely, if the model suggests that life expectancies will be relatively short, the liabilities are lower. Therefore, the accuracy of the model is crucial to the projected solvency of the scheme: too few assets and long projected lifespans mean disaster! For this reason, actuaries periodically examine the fit between the models they’ve been using and the actuarial experience of pension scheme members. Have members lived longer than we predicted over the past few years, or not as long?

Figure 1. Pension scheme life expectancy modeling process

One way to measure the quality of fit between the model and the experience, as it’s called, is to calculate a cumulative binomial distribution. Because this is a book about F# and not actuarial mathematics, we won’t delve any further into the significance of the test. As a developer, you need to ensure that the calculation is performed correctly.

Our case study starts at the point where the binomial distribution test (and many other statistical tests) had already been implemented in a fairly large-scale project. We’d used the Math.NET Binomial.CumulativeDistribution method, and it worked perfectly for small datasets. But for any substantial dataset, we were getting a numerical overflow. So the task at hand, and the subject of this chapter, is to reimplement the calculation of this distribution in our own code.

Implementing the formula

In this section, you’ll take the first steps in implementing the formula for cumulative binomial distribution. In doing so, you’ll learn general lessons about coding calculations in F#, including how to perform iterative accumulation without loops, and how and when to use aliases and custom operators to simplify code.

The formula for cumulative binomial distribution

Figure 2 shows the mathematical formula for calculating cumulative binomial distribution. Although you don’t need to understand the statistical subtleties, you do need to understand the notation used in the formula, and the next subsections explore this topic. If you’re already familiar with the notation involved, skip to the section “Coding the formula.”

Figure 2. Formula for cumulative binomial distribution
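For reference while the notation is unpacked below, the formula in figure 2 is the standard cumulative binomial distribution:

F(x; n, p) = \sum_{k=0}^{x} \frac{n!}{k!\,(n-k)!} \; p^{k} (1-p)^{n-k}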

Factorials

The exclamation marks in the formula mean factorial—that is, take all the numbers between 1 and the number in question (inclusive), and multiply them together. For instance, 2! is 2, 3! is 6 (1 × 2 × 3), and 4! is 24 (1 × 2 × 3 × 4). The interesting thing about factorials is that they get bigger rather quickly as n increases. For example, 10! is 3,628,800.

Calculating factorials is a great F# 101 example. The following listing shows the same calculation done in three different styles.

Listing 1. Three approaches to calculating the factorial of n

let FactWithMutation n =
    let mutable result = 1
    for i in 1..n do
        result <- result * i
    result

let rec FactWithRecursion n =
    match n with
    | 0 -> 1
    | _ -> n * FactWithRecursion (n-1)

let FactWithFold n =
    [1..n] |> Seq.reduce (*)

Naturally, being an F# developer, I consider the third approach to be the most elegant, but that’s a matter of personal style. All three work fine for modest values of n. (For clarity, I’ve omitted checks for unusual inputs like n < 1, which you’d need to handle correctly in practice.) In the spirit of “not optimizing prematurely,” we won’t worry at this point about which is fastest.
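A quick check in F# Interactive shows all three versions agreeing for a small input:

> FactWithMutation 5;;
val it : int = 120
> FactWithRecursion 5;;
val it : int = 120
> FactWithFold 5;;
val it : int = 120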

Capital Sigma

Going back to the formula, you have the construct shown in figure 3. As an imperative programmer (for example, in “classic” C#), you’d mentally translate this construct into a for loop and an accumulator variable. As a LINQ or F# programmer, you might be more inclined to interpret this as a .Sum() (LINQ) or Seq.sumBy (F#) operation.

Figure 3. Capital sigma
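As a minimal sketch of that translation (g here is just an illustrative placeholder for the summand):

let sigma x g = [0..x] |> Seq.sumBy g
// sigma 3 (fun k -> k * k) evaluates to 0 + 1 + 4 + 9 = 14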

Superscripts and multiplication

The superscripts (as in p^k) mean that the first term is raised to the power of the second. Raising to a power can be expressed in F# using the ** operator. Finally, terms that are joined together without symbols between them are multiplied.
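Note that ** expects floating-point operands on both sides, which is one reason the float casts appear in the code later in this section:

2.0 ** 10.0   // 1024.0
0.5 ** 3.0    // 0.125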

Coding the formula

You have all the mathematics you need to understand the formula. On to the code. I felt that a good way for you to get started would be to code a naïve version of the formula in F#, get some passing tests working for modest input values, and then investigate what happened for larger inputs.

Listing 2. First attempt at coding the cumulative binomial distribution

In this listing you nest the Seq.reduce version of the Fact function into the Binomial function and then improve it a bit to cope with inputs less than 1.
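As a sketch, reconstructed from that description and from the two lines of summation code quoted later in this section, the listing looks something like this:

let Binomial x n p =
    let Fact n =                          // Nested Seq.reduce version of Fact,
        if n < 1 then 1                   // improved to cope with inputs
        else [1..n] |> Seq.reduce (*)     // less than 1
    let (!) n = Fact n                    // Alias: ! nearly replicates the n! notation
    let f = float                         // Alias: f abbreviates the float cast
    [0..x]
    |> Seq.sumBy (fun k -> (f(!n) /
        (f(!k * !(n - k))) * (p**(f(k)) * ((1. - p)**f(n-k)))))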

I have mixed feelings about nesting inner functions like this. On the plus side, you can simplify the logic of an algorithm with little helper functions without polluting the wider namespace with items that are only locally relevant. On the minus side, unit tests can’t see these inner functions. For this reason, it’s wise to keep them simple; and if they get complicated, you need to refactor them out to a location where they can be seen by unit-testing frameworks. I find that in practice I often use nested functions, but I always keep them simple. Or if I don’t keep them simple, I soon come to regret it.

You’ll also notice that you’re creating a couple of simple aliases. The first of these is an alias for the Fact function that we just talked about. The alias is !, which allows you to nearly replicate the n! syntax used in the original formula, albeit as a prefix.

User-defined operators are a construct that F# developers view with considerable caution. I see one big advantage and one big disadvantage:

· Advantage: As in the example we’re developing here, custom operators can help you keep your code concise and perhaps reflect some other notation system, such as the ! operator.

· Disadvantage: They can easily end up obfuscating your code. This problem has been relieved somewhat for users of Visual Studio 2013, which supports tooltips and “navigate to definition” for custom operators.

I tend to keep custom operators local to the point at which they’re used so it’s obvious from the context what’s going on. I don’t define many custom operators that are widely scoped across a substantial codebase. Having said that, there are examples, most notably WebSharper (www.websharper.com), that use custom operators extensively and to great effect. The exclamation point (!) is already used in F# to express dereferencing, so perhaps in this case it might have been better to use a different symbol for the factorial operator—one that doesn’t mean anything else, such as ~~.
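A sketch of what that might look like (in F#, operators beginning with ~ are defined and applied as prefix operators):

let (~~) n = if n < 1 then 1 else [1..n] |> Seq.reduce (*)
// ~~4 evaluates to 24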

The next alias is f for float. It seems like a trivial alias, but it reflects one of the few real annoyances of using F# in a numerical and mathematical context: the fact that you can’t freely intermix floating-point and integer values in operations such as addition and multiplication. Although you can understand why the language is designed this way (it prevents many bugs and simplifies the type inference process), in practical terms it’s an impediment when coding calculations such as the one we’re dealing with here. Aliasing the float cast operator to f mitigates this somewhat. For fairly obvious reasons related to naming, it’s best to do this only locally.

An alternative to casting to float in the main calculation would have been to get the Fact function to return a floating-point value by casting at the end of its body. This approach would have worked, but it feels a little wrong to return a floating-point value from what’s essentially an integer operation.

With the functions, aliases, and operators in place, the final two lines of the code end up reflecting, reasonably closely, the logic and even some of the notation of the original formula:

[0..x]
|> Seq.sumBy (fun k -> (f(!n) /
    (f(!k * !(n - k))) * (p**(f(k)) * ((1. - p)**f(n-k)))))

Figure 4 shows how the original formula maps to the code.

Figure 4. How the formula maps to the code

At this point, you’ve coded the formula without using loops, instead using higher-order functions such as Seq.reduce and Seq.sumBy. You’ve also used aliases and custom operators to make the code more concise—although you should keep in mind that custom operators in particular are to be used with caution and restraint.

You could now integrate this code into the wider project, replacing the Math.NET Binomial.CumulativeDistribution method that was failing. But because you’re dealing with code you wrote yourself, it’s probably time to think about adding some unit tests.

Adding tests

All significant code should have unit tests. Fortunately, unit testing in F# is a straightforward, even pleasant process. In this section you’ll add unit tests to the Binomial function to prove that it produces correct results across the range of values that the life expectancy project will be processing. In doing so, you’ll learn how to use the open source libraries NUnit and FsUnit to simplify the unit-testing process, and you’ll see how to use Excel to generate test cases. You can also find more about testing F# code in chapter 12.

Adding NUnit and FsUnit

NUnit is a widely used unit-testing framework for Microsoft .NET. FsUnit builds on NUnit to help you write unit tests in the most F#-idiomatic way possible.

If you’re using Visual Studio, you can use NuGet to add FsUnit to your project. In Visual Studio, select Project > Manage NuGet Packages. You can search online for fsunit and install it (figure 5). NuGet is aware that FsUnit requires NUnit and will install it as well.

Figure 5. Finding and installing FsUnit using NuGet

You’ll also need some kind of test runner. This requirement isn’t specific to F#—you’ll need the same thing to run NUnit tests for a C# project. I favor NCrunch (www.ncrunch.net) or TestDriven.Net (www.testdriven.net). Another option is the NUnit Test Adapter (installable via the Visual Studio Gallery).

To use the attributes and methods that FsUnit and NUnit provide, you must open the appropriate namespace above your tests:

open NUnit.Framework
open FsUnit

Having set up your test frameworks, the next step is to generate some test cases.

Generating test cases in Excel

For numerical programming, I find it useful to generate test cases in Excel. This approach has several advantages. Excel is an extensively used and trusted application. Also, when coding a mathematical or numerical system, you’ll often find that either your product is replacing a system that originally consisted of Excel spreadsheets, or whoever tests it uses Excel results as a comparison. Finally, you can make Excel do the work of getting your test case syntax right.

Excel already implements a BINOM.DIST function, which replicates the calculation you’re trying to use here. You just need to set the final parameter of the function to TRUE because the distribution you’re calculating is the cumulative version. So, start off your Excel spreadsheet with a few possible values of x, n, and p:

x    n     p
1    10    0.5
2    10    0.5
3    10    0.5
4    10    0.5
5    10    0.5

Then add a column for the BINOM.DIST formula, and enter the formula itself:

=BINOM.DIST(A2,B2,C2,TRUE)

Be sure to copy the formula for all your permutations.

As I mentioned, you can have Excel write the syntax of the test case attributes. This is a matter of concatenating some string literals and the relevant cell values together. Here’s the formula for doing this:

="[<TestCase("&A2&","&B2&","&C2&","&D2&")>]"

This formula generates nicely formatted test case attributes like the following, which you can paste straight into your F# code:

[<TestCase(1,10,0.5,0.0107421875)>]

Now that you have a test case, you need a test! The following one is nice and minimal—and the code passes.

Listing 3. Unit test
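A minimal sketch of such a test; the fixture and member names here are assumptions, echoing the names that show up in the test-runner output later in the chapter:

open NUnit.Framework
open FsUnit

[<TestFixture>]
type ``Given the Binomial function``() =

    [<TestCase(1,10,0.5,0.0107421875)>]
    member t.``the result is calculated correctly``(x, n, p, expected) =
        let actual = Binomial x n p
        actual |> should equal expected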

Flushed with your early success, add a few more cases to Excel and paste them into the code:

[<TestCase(1,10,0.5,0.0107421875)>]
[<TestCase(2,10,0.5,0.0546875)>]
[<TestCase(3,10,0.5,0.171875)>]
[<TestCase(4,10,0.5,0.376953125)>]
[<TestCase(5,10,0.5,0.623046875)>]

Four more tests, four more passes!

But when you start to permute the second parameter, you meet with disaster: the new tests fail.

Just when you were beginning to get confident! And these are modest-size inputs. In the real system that uses this code, the n input could easily be in the tens of thousands and beyond.

I’m sure at this point anyone who has had anything to do with factorial calculations will be screaming at the page—and sure enough, when you investigate a bit further, the simple Fact inner function is the culprit. Here are the values it produces for inputs from 0 through 15, together with the results produced by Wolfram|Alpha (www.wolframalpha.com):

n     Fact n            Wolfram|Alpha
0     1                 1
1     1                 1
2     2                 2
3     6                 6
4     24                24
5     120               120
6     720               720
7     5,040             5,040
8     40,320            40,320
9     362,880           362,880
10    3,628,800         3,628,800
11    39,916,800        39,916,800
12    479,001,600       479,001,600
13    1,932,053,504     6,227,020,800
14    1,278,945,280     87,178,291,200
15    2,004,310,016     1,307,674,368,000

The Fact function runs out of steam for inputs above 12. Not a great performance given the size of inputs it will be facing in real life. How are you to address this?

Exposing the Fact function to unit testing

Your next step is something I hinted at much earlier. If there’s a nested function that has any risk of failure, you should consider moving it out so it’s accessible to unit testing. Having de-nested Fact, you can generate the tests from the Wolfram|Alpha results earlier.

Listing 4. Unit tests for larger inputs, from Wolfram|Alpha
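Sketched from the Wolfram|Alpha values in the table (the fixture name is an assumption):

[<TestFixture>]
type ``Given the Fact function``() =

    [<TestCase(10, 3628800)>]
    [<TestCase(11, 39916800)>]
    [<TestCase(12, 479001600)>]
    [<TestCase(13, 6227020800)>]      // These last three test cases don't
    [<TestCase(14, 87178291200)>]     // compile: the literals exceed the
    [<TestCase(15, 1307674368000)>]   // 32-bit signed integer range
    member t.``the factorial is calculated correctly``(n, expected) =
        let actual = Fact n
        actual |> should equal expected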

The results immediately give you a clue about what’s going wrong—the test cases for 13, 14, and 15 don’t even compile:

Error 1 This number is outside the allowable range for 32-bit signed integers
C:\Code\VS2013\BookFactorial\BookFactorial\Library1.fs 56 19 BookFactorial

Clearly you’re dealing with values that are too big to be held as integers.

Returning large integers from the Fact function

What happens if you alter the Fact function to return a large integer, as embodied in .NET’s BigInteger type? You need to tweak the tests a bit because big integers can’t be represented directly in [<TestCase()>] attributes. You’ll represent them instead as strings and parse them in the body of the test.

Listing 5. Expressing big integers as strings
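A sketch of the reworked test cases, with expected values held as strings and parsed into System.Numerics.BigInteger in the test body:

[<TestCase(13, "6227020800")>]
[<TestCase(14, "87178291200")>]
[<TestCase(15, "1307674368000")>]
member t.``the factorial is calculated correctly``(n, expected) =
    let expected' = System.Numerics.BigInteger.Parse(expected)
    let actual = Fact n
    actual |> should equal expected'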

You can also amend the Fact function to return a big integer.

Listing 6. Version of the Fact function that returns a big integer

let Fact n =
    match n with
    | _ when n <= 1 -> 1I
    | _ -> [1I..System.Numerics.BigInteger(n)]
           |> Seq.reduce (*)

Note that although the range 1..n doesn’t need to consist of big integers (n won’t be that large), making it do so using the I suffix and the BigInteger converter forces the BigInteger implementation of the multiplication operator to be used. This in turn causes Seq.reduce to return a BigInteger result.

With that change in place, the extra tests pass, so at least you’ve extended the reach of Fact a little. Although its result rapidly gets too large to represent in a test case even as a string, you can use F# Interactive to prove that it runs:

val Fact : n:int -> System.Numerics.BigInteger

> #time;;
--> Timing now on

> Fact 100000;;
Real: 00:00:16.844, CPU: 00:00:18.953, GC gen0: 1373, gen1: 161, gen2: 160
val it : System.Numerics.BigInteger =
  2824229407960347874293421578024535518477494926091224850578918086542977950901063017872551771...
    {IsEven = true;
     IsOne = false;
     IsPowerOfTwo = false;
     IsZero = false;
     Sign = 1;}
>

For the moment let’s gloss over the fact that this takes nearly 17 seconds, and a lot of garbage-collection activity, to run. It’ll be faster in production, right?

Now let’s return to the Binomial function and adjust it to use the new Fact function. At the moment, Binomial doesn’t compile, because Fact is producing big integers that can’t be freely multiplied, divided, and so forth with other types.

Processing large integers in the Binomial function

You’re going to need a fair bit more casting to even begin to make the big integer approach work, which means reluctantly moving away from the style where you try to calculate the whole formula in a single line. Instead, you’ll calculate sections of the formula separately. The following listing is my rough-and-ready attempt to do this.

Listing 7. Version of the Binomial function that supports the BigInteger type
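A rough sketch of the shape this attempt took (the name term1over2 comes from the discussion below); as that discussion explains, the final line doesn't compile:

let Binomial x n p =
    let Fact n =
        match n with
        | _ when n <= 1 -> 1I
        | _ -> [1I..System.Numerics.BigInteger(n)] |> Seq.reduce (*)
    [0..x]
    |> Seq.sumBy (fun k ->
        let term1 = Fact n                   // n! as a big integer
        let term2 = Fact k * Fact (n - k)    // k!(n-k)! as a big integer
        let term1over2 = term1 / term2       // big-integer division: still a BigInteger
        let term3 = p**(float k) * (1. - p)**(float (n - k))
        term1over2 * term3)                  // Error: the type float is not compatible
                                             // with System.Numerics.BigInteger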

If you paste this code into your editor, you’ll see that it doesn’t compile. The message you get relates to the final line (“The type float is not compatible with the type System.Numerics.BigInteger”), but if you hover around a bit you’ll find that the real problem occurs earlier: the value term1over2 is a big integer. The mathematics of the situation require you to have a floating-point value here, but the framework doesn’t have a way of directly dividing one big integer by another to produce a floating-point value. The nearest you can get is BigRational (available as part of Math.NET Numerics), which will give you a genuine result for the division of two big integers. But this is put in rational form, and there’s no straightforward way to recover a floating-point number from the rational.

We could go down a few other paths here. One of these, suggested by a colleague, is to add up the logarithms of 1..n and take the exponential of the resulting sum (see the next listing). This code works well and is fast—but for n greater than 170, it produces a result of infinity.

Listing 8. Calculating factorials using logarithms

let LogFact n =
    if (n < 1) then
        1.0
    else
        [|1..n|] |> Seq.map float |> Seq.map log |> Seq.sum |> exp
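A quick sanity check, assuming the listing above: 170! is roughly 7.26 × 10^306, which still fits in a 64-bit float, but 171! is roughly 1.24 × 10^309, beyond Double.MaxValue (about 1.8 × 10^308), so the final exp overflows:

LogFact 170   // a large but finite float
LogFact 171   // infinity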

Time for a rethink

Up to now, we’ve taken the approach of blindly translating the original, canonical formula into code. Although the final results are modest numbers, following the formula means calculating intermediate results that are somewhat colossal. For example, if your calculation requires you to calculate 100! (9.332621544e+157), this is a number that greatly exceeds the number of protons in the universe: a mere 10^80. You can play around with the mechanics all you like, but calculations involving numbers this big are always slow and unmanageable.

At this point I often find myself having an imaginary conversation with that great scourge of the software developer, the Astute User. Here’s how the conversation might go:

Me: “I’m not sure this is even possible. The intermediate results we have to calculate are gigantic.”

Astute user: “Then how is Excel doing this? And that’s presumably coded in C++. You’re telling me your fancy new language can’t do something that Excel has been doing (fast) for years.”

Me: “Umm.”

An alternative approach

After I had that conversation with myself, I returned to my favorite search engine and looked for “cumulative binomial distribution Excel overflow.” I got lucky—there’s a detailed blog post on that topic at http://support.microsoft.com/kb/827459.

This blog post outlines how earlier versions of Excel hit this overflow problem when calculating cumulative binomial distributions—and, more important, how it was solved. Don’t worry too much about the mathematical justification for the technique; instead let’s focus on the pseudocode that Microsoft was good enough to publish. Here it is in full.

Listing 9. Excel’s algorithm
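In outline, the algorithm sidesteps factorials entirely. It starts at the most probable outcome, m = ⌊np⌋, and gives that term an unscaled weight of 1. It then walks outward from m in both directions, deriving each term from its neighbor with a simple multiplicative ratio and stopping once the terms become negligibly small. Finally, it divides the accumulated weight of the terms for k ≤ x by the total weight to rescale. Every intermediate value is of modest size, which is why the overflow problem disappears. This is the shape you’ll see in the F# translations that follow.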

Implementing the Excel algorithm

I’ve sent you up some blind alleys already, so you need to create a systematic plan for implementing this algorithm:

1. Extend the range of your unit tests to reflect the kind of population and sample sizes your code will be faced with in practice.

2. Blindly translate the published pseudocode, warts and all, into F#, and get the tests passing. I say this because the pseudocode plainly has considerable room for refactoring. But if you start with the literal-minded implementation, you won’t saddle yourself with the dual tasks of both interpreting the pseudocode and refactoring it in one pass.

3. Refactor the implementation into reasonably elegant F# code, ensuring that the tests continue to pass.

For step 1 of this plan, the following listing contains a somewhat extended set of tests that probe the response of the algorithm to some high values of n.

Listing 10. Test cases for some larger inputs

And for step 2, the next listing presents a deliberately simple-minded implementation of the pseudocode.

Listing 11. Literal implementation of the Excel pseudocode
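As a sketch, working backward from listing 12 plus the cumulative flag that the next section removes, the literal translation looks something like this:

let Binomial x n p cumulative =
    let f = float
    let mutable totalUnscaledP = 0.
    let essentiallyZero = 10.E-12
    let m = (f(n) * p) |> truncate |> int
    totalUnscaledP <- totalUnscaledP + 1.
    let mutable unscaledResult =
        if (cumulative && m <= x) || (not cumulative && m = x) then 1. else 0.
    let mutable previousValue = 1.
    let mutable isDone = false
    let mutable k = m + 1
    while not isDone && k <= n do
        let currentValue = previousValue * f(n - k + 1) * p / (f(k) * (1. - p))
        totalUnscaledP <- totalUnscaledP + currentValue
        if (cumulative && k <= x) || (not cumulative && k = x) then
            unscaledResult <- unscaledResult + currentValue
        if currentValue <= essentiallyZero then isDone <- true
        previousValue <- currentValue
        k <- k + 1
    previousValue <- 1.
    isDone <- false
    k <- m - 1
    while not isDone && k >= 0 do
        let currentValue = previousValue * f(k + 1) * (1. - p) / (f(n - k) * p)
        totalUnscaledP <- totalUnscaledP + currentValue
        if (cumulative && k <= x) || (not cumulative && k = x) then
            unscaledResult <- unscaledResult + currentValue
        if currentValue <= essentiallyZero then isDone <- true
        previousValue <- currentValue
        k <- k - 1
    unscaledResult / totalUnscaledP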

Well-written pseudocode often translates directly into F#, and there’s something to be said for leaving the code at that. But before you make any refactoring decisions, you’d better check your tests.

They don’t all quite pass, but they’re close:

Test 'ExcelAlgorithm+Given the Binomial function.the result is calculated correctly<Double>(4,10,0.5d,0.376953125d)' failed:
  Expected: 0.376953125d
  But was:  0.37695312500000006d
  at FsUnit.TopLevelOperators.should[a,a](FSharpFunc`2 f, a x, Object y)

Let’s address that slight floating-point accuracy error. It’s common in this kind of coding to accept a degree of inaccuracy; and certainly in the actuarial domain, where so much else is approximated, an error in the 17th decimal place is acceptable. Using FsUnit and NUnit, it’s easy to change the test to accept a degree of approximation, using the equalWithin function:

member t.``the result is calculated correctly``(x, n, p, expected) =
    let errorMargin = 10E-9
    let actual = Binomial x n p
    actual |> should (equalWithin errorMargin) expected

At this point, you could argue, you’re done! You’ve got an algorithm that works, that’s quick, that’s based on a published and reputable source, that passes all tests, and that will integrate seamlessly into the project codebase. I’d definitely check in my code when it reached this stage!

But it’s not polished yet. In the next section, you’ll do some refactoring work to make your code idiomatic and eliminate such nasties as repetition.

Refactoring

Any astute reviewer of your code as it stands would want to address at least the following issues:

· You should always calculate the cumulative binomial distribution, so the cumulative flag and associated logic should be eliminated.

· The first and the second while loops have a suspicious amount in common. Would it be possible to eliminate this repetition?

· There are a lot of mutable values. There’s nothing inherently wrong with this, especially if, as in this case, the mutation is all local. But it would at least be interesting to see what happens to the logic when these are eliminated. (Doing so can often lead to a deeper insight into the algorithm you’re implementing.)

· In a similar vein, functional programmers often want to explore the replacement of loops with recursion, so you should see how that plays out.

Let’s take these issues one at a time.

Eliminating the cumulative flag

The code includes the cumulative flag because it’s a literal-minded implementation of the pseudocode. You don’t need it, because the requirement is always for the cumulative version of the binomial calculation. Removing it is a fairly simple exercise, but it still manages to eliminate several lines.

Listing 12. Removing the cumulative flag

let Binomial x n p =
    let f = float
    let mutable totalUnscaledP = 0.
    let essentiallyZero = 10.E-12
    let m = (f(n) * p) |> truncate |> int
    totalUnscaledP <- totalUnscaledP + 1.
    let mutable unscaledResult =
        if m <= x then 1.
        else 0.
    let mutable previousValue = 1.
    let mutable isDone = false
    let mutable k = m + 1
    while not isDone && k <= n do
        let currentValue = previousValue * f(n - k + 1) * p / (f(k) * (1. - p))
        totalUnscaledP <- totalUnscaledP + currentValue
        if k <= x then unscaledResult <- unscaledResult + currentValue
        if currentValue <= essentiallyZero then isDone <- true
        previousValue <- currentValue
        k <- k + 1
    previousValue <- 1.
    isDone <- false
    k <- m - 1
    while not isDone && k >= 0 do
        let currentValue = previousValue * f(k + 1) * (1. - p) / (f(n - k) * p)
        totalUnscaledP <- totalUnscaledP + currentValue
        if k <= x then unscaledResult <- unscaledResult + currentValue
        if currentValue <= essentiallyZero then isDone <- true
        previousValue <- currentValue
        k <- k - 1
    unscaledResult / totalUnscaledP

Identifying common functions between the two while loops

The two while loops in the code have a lot in common. Perhaps this repetition can be eliminated so the code is closer to the ideal of Don’t Repeat Yourself (DRY).

To begin, four points in the two while loops can be pulled out into separate functions. Currently the refactoring is something of a leap of faith, because you may end up making the code longer. But by extracting these lines as functions, you pave the way for later, more concise refactoring.

Here are the four functions you can extract. This line

let currentValue = previousValue * f(n - k + 1) * p / (f(k) * (1. - p))

and this line

let currentValue = previousValue * f(k + 1) * (1. - p) / (f(n - k) * p)

become this function:

let CalcCurrent value k =
    if k > m then
        value * float(n - k + 1) * p / (float(k) * (1. - p))
    else
        value * float(k + 1) * (1. - p) / (float(n - k) * p)

This line

if k <= x then unscaledResult <- unscaledResult + currentValue

becomes this function:

let CalcUnscaled x k acc increment =
    if k <= x then acc + increment
    else acc

This line

if currentValue <= essentiallyZero then isDone <- true

becomes this function:

let Done current = current <= essentiallyZero

And finally this line

k <- k + 1

and this line

k <- k - 1

become this function:

let NextK k = if k > m then k + 1 else k - 1

This leaves the loops looking like the next listing.

Listing 13. Pulling out some common functions

while not isDone && k <= n do
    let currentValue = CalcCurrent previousValue k
    totalUnscaledP <- totalUnscaledP + currentValue
    unscaledResult <- CalcUnscaled x k unscaledResult currentValue
    isDone <- Done currentValue
    previousValue <- currentValue
    k <- NextK k

while not isDone && k >= 0 do
    let currentValue = CalcCurrent previousValue k
    totalUnscaledP <- totalUnscaledP + currentValue
    unscaledResult <- CalcUnscaled x k unscaledResult currentValue
    isDone <- Done currentValue
    previousValue <- currentValue
    k <- NextK k

It’s important not to get carried away and try to unify the two loops at this point. It’s better to run tests (and possibly even check in or at least shelve the changes) so that you can have confidence in the newly extracted functions before changing anything else.

Eliminating duplication, mutables, and loops

The tests still pass, so let’s move on to trying to unify the loops. At this stage you can’t avoid tackling three problems at once, because these are different faces of the same issue:

· The code duplication represented by the two while loops

· The use of mutable values

· The use of loops rather than recursion

It’s probably worth reiterating at this point that there is nothing inherently wrong with the use of (local) mutable values or loops, but their presence is often an indicator that a more fundamental manifestation of the algorithm is available. Whether you listen to this cue is an interesting balance of performance, personal style, and readability. There’s no general right answer.

What still differs between the two while loops? Not much: they run for different ranges of k, a couple of flags and values are reset between the first and second loops, and two values set in the first loop are updated further in the second. Let’s write a function that’s equivalent to both loops and worry about how to invoke it for both ranges later. The following listing shows a recursive function that embodies the logic from both manifestations of the loop.

Listing 14. Calculate function that replaces both loops

let rec Calculate k totalUnscaledProbability previous unscaled =
    let current = CalcCurrent previous k
    let totalUnscaledProbability' = totalUnscaledProbability + current
    let unscaled' = CalcUnscaled x k unscaled current
    if Done current then
        unscaled', totalUnscaledProbability'
    else
        Calculate (NextK k) totalUnscaledProbability' current unscaled'

Listing 14 still uses the four functions you pulled out of the loops—these functions remain unchanged—and passes those values that need to change between iterations as parameters.

Unfortunately, you do need to accept that the new recursive function must be called twice. This is because the function has to be calculated for two different ranges, with the inputs of the second call derived from the outputs of the first. Here’s how you can tie it together.

Listing 15. Calling Calculate for both ranges of k

let InitialUnscaled = if (m <= x) then 1. else 0.

let UnscaledResultAboveM, TotalUnscaledProbabilityAboveM =
    Calculate (m+1) 1. 1. InitialUnscaled

let UnscaledResult, TotalUnscaledProbability =
    Calculate (m-1) TotalUnscaledProbabilityAboveM 1. UnscaledResultAboveM

UnscaledResult / TotalUnscaledProbability
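Putting it all together, the complete refactored Binomial function plausibly reads as follows. This is a sketch assembled from the fragments above (the helper functions, Calculate, and the two calls), so the exact arrangement is a judgment call:

let Binomial x n p =
    let essentiallyZero = 10.E-12
    let m = (float(n) * p) |> truncate |> int

    let CalcCurrent value k =
        if k > m then
            value * float(n - k + 1) * p / (float(k) * (1. - p))
        else
            value * float(k + 1) * (1. - p) / (float(n - k) * p)

    let CalcUnscaled x k acc increment =
        if k <= x then acc + increment
        else acc

    let Done current = current <= essentiallyZero

    let NextK k = if k > m then k + 1 else k - 1

    let rec Calculate k totalUnscaledProbability previous unscaled =
        let current = CalcCurrent previous k
        let totalUnscaledProbability' = totalUnscaledProbability + current
        let unscaled' = CalcUnscaled x k unscaled current
        if Done current then
            unscaled', totalUnscaledProbability'
        else
            Calculate (NextK k) totalUnscaledProbability' current unscaled'

    let InitialUnscaled = if (m <= x) then 1. else 0.
    let UnscaledResultAboveM, TotalUnscaledProbabilityAboveM =
        Calculate (m+1) 1. 1. InitialUnscaled
    let UnscaledResult, TotalUnscaledProbability =
        Calculate (m-1) TotalUnscaledProbabilityAboveM 1. UnscaledResultAboveM
    UnscaledResult / TotalUnscaledProbability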

When you run the unit tests against this logic, they pass! At this point I’d regard the code as production ready and would check it in and move on to the next problem.

Summary

Cumulative binomial distribution is a statistical formula that at first sight seems comparatively straightforward to code. But the intermediate factorial values you need to calculate using the canonical formula make a straightforward algorithm impractical. Fortunately, F# makes it easy for you to “fail fast” in your attempt to code the algorithm in a naïve way, and it also allows you to express a more appropriate algorithm easily and concisely.

In this chapter you learned a little about creating custom operators—which should be used only with care and restraint—and about aliasing functions locally for greater conciseness. You installed FsUnit and NUnit to facilitate unit testing, and you used FsUnit’s equalWithin function to allow a margin of approximation for your floating-point tests. You saw how unit test cases can be generated easily in Excel and pasted straight into code. If you’re interested in testing, chapter 12 discusses the topic in depth.

F# is a great language for implementing calculations. But as F# exponents are fond of saying, it’s not a magic bullet. You must combine exploratory coding with the greater rigor of unit testing to ensure production-ready results.

About the author

Kit Eason is a senior software developer for Adbrain, a data intelligence company. He has developed software for automotive engineering companies, universities, energy suppliers, banks, and financial services corporations. Kit has also run many educational and training courses, including evening classes, women’s electronics classes, and CNC training for engineers. Kit is a regular speaker at Skills Matter in London and at various conferences and meet-ups. He works mainly in F# and C# and speaks on F# and related matters. You can find him on Twitter @kitlovesfsharp and via his blog: www.kiteason.com.