Data Push Apps with HTML5 SSE (2014)
Chapter 3. A Delightfully Realistic Data Push Application
This chapter will build upon the code we created in the previous chapter to implement a realistic (warts and all) data push application (see the next section for the problem domain that has been chosen). For this chapter and the following two, the code we build will still only work in browsers with SSE support; then in Chapters 6 and 7, I will show how you can adapt both the frontend and backend to work with older browsers.
NOTE
Because this chapter is SSE only, if you are testing on an Android device you need to install either Firefox for Android or Chrome for Android. If you are testing on Windows, install Firefox, Chrome, Safari, or Opera. C’mon, I’m sure you already have at least one of those installed—you told me you were a professional developer!
This chapter contains a bit of backend PHP code that may not feel relevant to your own application. I suggest you at least skim it, because you will see it built upon in later chapters and it shows, step-by-step, one approach for unit testing and functional testing of data push systems.
Our Problem Domain
The problem domain I will cover in this and the next few chapters is from the finance industry. It has its own jargon—almost as bad as the software industry—so I will introduce some of the terminology you will meet, and just enough background information to help you understand some of the design decisions.
The job of our application is to broadcast FX bid/ask quotes from a bank or broker to traders. The first bit of jargon is FX. This stands for Foreign eXchange; in other words, the buying and selling of currencies. It is a global decentralized market. Yikes, more jargon. A decentralized market means there is no single place where currencies are traded. Compare this to a stock exchange, where there is a single place to buy and sell shares in a company. (That is not strictly true; large companies might list their shares on two or three stock exchanges.)
The broker is a business. But it doesn’t try to make money off of speculating about currency movements the way the traders do. Instead, brokers make their money off of the spread (and sometimes a commission as well). The spread is the difference between the bid and the ask price. The bid price is the lower of the two prices: it is how much the broker is willing to buy the currency for. It is how much you get if you choose to sell. The ask price is slightly higher and is how much the broker is willing to sell for. It is how much you have to pay if you want to buy.
The FX market is global. The New York stock market is just open during business hours in the New York time zone. But people want to buy and sell currencies all throughout the day, all around the world. It is a 24/5 market. By convention it opens at 5 p.m. on Sunday, New York local time (which is the start of the business week in New Zealand), and closes at 5 p.m. on Friday, again New York time.
The major currencies that are traded, with their abbreviations, are US dollar (USD), the euro (EUR), Japanese yen (JPY), British pound (GBP), Australian dollar (AUD), Canadian dollar (CAD), and the Swiss franc (CHF). Typically, an FX broker will be listing between 6 and 40 FX pairs (also called symbols).
What does all this mean to us?
§ We have to send two prices from the server to the client, along with a timestamp.
§ We need to do this for more than one currency pair.
§ We have to do it with minimal latency (sudden movements and stale prices will cost our traders money).
§ Our application will be running for 120 hours in a row, then will have nothing to do for 48 hours, before the cycle repeats.
The Backend
The backend demonstrated in this chapter is more complicated than the one shown in Chapter 2. We want multiple data feeds (aka symbols); call it multiplexing when you need to impress your boss. We want it to be used for repeatable tests, we want realistic-looking data, and we want it to be in sync for each client that connects. All without using a database. Those are quite a few demands! But it can be done. We will use a few techniques:
§ Use a one-line JSON protocol.
§ Use a random seed. A given random seed will always give the same stream of data. In our case it will give a completely predictable set of ticks for each symbol.
§ Allow the random seed to be specified by the client. This allows a client to request the same test data over and over.
§ Add together cycles of different periods, with a bit of random noise added on. This makes the data look realistic. (This book is not the place for a discussion of random walks and efficient market theory. Find a passing economist if you are interested in that subject.)
§ Measure clock drift and adjust for it.
DESIGN FOR TESTABILITY
There are two ways to design any system, with regard to testing. The first is with no consideration for testability. The second is to make it easy to test; but this does not usually come for free, because it often requires adding extra variables and extra functions.
However, a system that has been designed for testability is not just easier to test, it is faster to test. In extreme cases it can be the difference between calling a getter (completing a test in a matter of milliseconds), and a horribly complicated solution involving screen scraping and OCR that takes seconds to run. That has a knock-on effect: tests that complete quickly are run more often, bugs are found sooner and in less time, so your product is delivered sooner and is of better quality. If your test suite can be run every 5 minutes, then when it breaks, you instantly know which line of code broke it. Contrast this with a test suite that is so slow it can only be run on the weekend. You come in Monday morning and it might take you until Tuesday to work out which of your changes last week introduced the problem. (The complex testing solutions also tend to be fragile—sensitive to minor changes in layout, for instance.)
In our case, our system spits out random (okay, pseudorandom) data. Design for Testability here means taking control of the random sequence, so it can be exactly repeated if the need arises. This is a testing design pattern called Parameter Injection.
To complicate things, there might not just be memory and CPU involved, but also a network—so runtime could vary quite a lot from test run to test run, and we put timestamps to millisecond accuracy in the JSON we send back. Therefore, we need to find a way to make sure the timestamps are repeatable. How we tackle this is covered in the main text. (If we didn’t do this, our choice would just be to range-check the fields in the data we get: make sure each timestamp is formatted correctly and is later than the previous timestamp, make sure the prices are between 95.00 and 105.00, etc. This is better than nothing, but could lead to missing subtle bugs and regressions.)
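The range-checking fallback mentioned in the parenthesis above could be sketched in JavaScript roughly as follows. The function name checkTick, the idea of returning a list of problems, and passing the price bounds as parameters are all assumptions made for illustration; none of this is code from the book.

```javascript
// Hypothetical range check for one decoded tick.
// prevTimestamp is the previous tick's timestamp string, or null for the first tick.
// minPrice/maxPrice are whatever bounds make sense for the symbol being tested.
function checkTick(tick, prevTimestamp, minPrice, maxPrice) {
  var problems = [];
  // Expect "YYYY-MM-DD HH:MM:SS", optionally followed by ".mmm"
  var format = /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d{3})?$/;
  if (typeof tick.timestamp !== "string" || !format.test(tick.timestamp)) {
    problems.push("timestamp format");
  } else if (prevTimestamp !== null && tick.timestamp <= prevTimestamp) {
    // Timestamps in this layout sort correctly as plain strings
    problems.push("timestamp order");
  }
  var bid = Number(tick.bid);
  var ask = Number(tick.ask);
  // Comparisons with NaN are false, so missing or garbled prices are flagged too
  if (!(bid >= minPrice && bid <= maxPrice)) problems.push("bid range");
  if (!(ask >= minPrice && ask <= maxPrice)) problems.push("ask range");
  if (!(ask > bid)) problems.push("ask not above bid");
  return problems;
}
```

An empty array means the tick passed every check. As the sidebar says, this only tells you the data is plausible, not that it is the data you expected.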
The first design decision we will make is to pass JSON strings as the message. We’ll send back exactly one JSON string per line, and one per message. This is a reasonable design decision anyway, because JSON is flexible and allows hierarchical data, but as you will see in later chapters the one-line-per-message decision makes adapting our code to non-supporting browsers easier.
NOTE
If you read Our Problem Domain on the FX industry, you will know we are broadcasting both bid and ask quotes. I chose to do this deliberately, rather than just send a single price, because it makes things harder. If the server just has a single price we’d be tempted to make simpler design decisions. Then we would need to do lots of refactoring if we decided to add a second value. By using two pieces of data, it will be easy to change our code to support N pieces of data; and it will still work fine even if we only have a single value.
Figure 3-1 shows the high-level view of what the backend’s main loop (a deliberate infinite loop, just as in Chapter 2) will be doing.
Figure 3-1. Backend’s main loop
Before we enter that loop we have some initialization steps: define a class, create our test symbols, process client input parameters, and set the Content-Type header. Here is our first draft of the script, using hardcoded prices (where the only initialization step we need at this stage is setting the header):
<?php
header("Content-Type: text/event-stream");
while(true){
    $sleepSecs = mt_rand(250,500)/1000.0;
    usleep( $sleepSecs * 1000000 );
    $d=array(
        "timestamp" => gmdate("Y-m-d H:i:s"),
        "symbol" => "EUR/USD",
        "bid" => 1.303,
        "ask" => 1.304,
    );
    echo "data:".json_encode($d)."\n\n";
    @ob_flush();@flush();
}
Rather than try to debug it over an SSE connection, I suggest you first run it from the command line:
php fx_server.hardcoded.php
That is one of the beauties of the SSE protocol: it is a simple text protocol. Press Ctrl-C to stop it. You should have seen output like this:
data:{"timestamp":"2014-02-28 06:09:03","symbol":"EUR\/USD","bid":1.303, ↵
"ask":1.304}
data:{"timestamp":"2014-02-28 06:09:04","symbol":"EUR\/USD","bid":1.303, ↵
"ask":1.304}
data:{"timestamp":"2014-02-28 06:09:08","symbol":"EUR\/USD","bid":1.303, ↵
"ask":1.304}
Note that the forward slash in EUR/USD gets escaped in the JSON. Also, because of the call to gmdate those are GMT timestamps we see there. This is a good habit: always store and broadcast your data in GMT, and then adjust on the client if you want it shown in the user’s local time zone.
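Adjusting on the client might look something like this sketch. The helper name parseGmtTimestamp is an assumption, and it relies on the server's "YYYY-MM-DD HH:MM:SS" layout (with or without a fractional part) being one small edit away from an ISO 8601 string.

```javascript
// Turn "2014-02-28 06:09:03" (GMT, as sent by the server) into a Date object.
function parseGmtTimestamp(s) {
  // "YYYY-MM-DD HH:MM:SS" becomes "YYYY-MM-DDTHH:MM:SSZ"; the trailing Z marks it as UTC
  return new Date(s.replace(" ", "T") + "Z");
}

var when = parseGmtTimestamp("2014-02-28 06:09:03");
// when.toLocaleString() would then show the time in the user's own time zone
```

Appending the Z matters: without it, browsers may treat the string as local time, or refuse to parse it at all.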
JSON/SSE PROTOCOL OVERHEAD
How much wastage is there in choosing JSON for all data transmission? For instance, how does the use of JSON compare with sending our data using a minimalist CSV encoding (data:2014-02-28 03:15:24,EUR/USD,1.303,1.304)? And how much wastage is there in the SSE protocol itself?
The last question is easy: the SSE overhead is 6 bytes per message, made up of the “data:” prefix (5 bytes) and the extra line break (1 byte). That figure is relative to the fallback approaches we will look at in Chapters 6 and 7.
Our JSON string is longer than it needs to be; to make it readable I have chosen verbose names, but the JSON message could instead have looked like this:
data:{"t":"2014-02-28 06:09:03","s":"EUR\/USD","b":1.303,"a":1.304}
What about a binary protocol? Well, neither JavaScript nor SSE gets on well with binary, but ignoring that, let’s have 4 bytes for the timestamp (though if you need milliseconds, or want it to work past January 2038, when a signed 32-bit timestamp overflows, you will end up using 8 bytes), 7 bytes plus a zero-terminator for the symbol, and 8 bytes each for bid/ask as doubles. That gives us 28 bytes (assuming end-of-record is implicit). Table 3-1 summarizes all that.
NOTE
Because we flush data immediately (to get minimal latency), you might want to also include the overhead of a TCP/IP packet and Ethernet frame around each message. That might be fair if you are comparing to a polling approach. For instance, if the pushed data averages one message per second, there will be 59 times more TCP/IP packets compared to a once-every-60-second-poll. Possibly even more if WiFi and mobile networks are involved. But if polling (and especially if long-polling, see Chapter 6), don’t forget to allow for the HTTP headers, in each direction, on each request. Remember cookies and auth headers get sent with every request, too.
As I mentioned in Chapter 1, if you want to make a useful comparison of two alternatives, in my opinion the best way is to build both approaches, and then benchmark each, under the most realistic load you can manage. Unless you are building an intranet application, realistic also means the server and the test clients should be in different data centers.
Table 3-1. Byte comparison of different data formats
Format | Using SSE (bytes) | Using Fallbacks (bytes)
Binary | 34 | 28
CSV | 46 | 40
JSON-short | 69 | 63
JSON-readable | 86 | 80
Before you make decisions based on those numbers though, remember that SSE communication can, and should, be gzipped, and you can expect that the more compact your format, the less compression gzip can do.
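The text-format rows of Table 3-1 can be reproduced with a few lines of JavaScript. This is a sketch under one assumption that the text does not state outright: that the fallback column counts the payload plus a single line break, which is my reading of the numbers. The function names are made up, and because every character here is ASCII, string length equals byte count.

```javascript
// The example payloads from this sidebar. Note the escaped slash (\/) that PHP's
// json_encode produces; JavaScript's own JSON.stringify would not escape it.
var payloads = {
  "CSV": "2014-02-28 03:15:24,EUR/USD,1.303,1.304",
  "JSON-short": '{"t":"2014-02-28 06:09:03","s":"EUR\\/USD","b":1.303,"a":1.304}',
  "JSON-readable": '{"timestamp":"2014-02-28 06:09:03","symbol":"EUR\\/USD","bid":1.303,"ask":1.304}'
};

// Assumed fallback framing: the payload followed by one line break
function fallbackBytes(payload) {
  return (payload + "\n").length;
}

// SSE framing: "data:" in front, and a blank line to end the message
function sseBytes(payload) {
  return ("data:" + payload + "\n\n").length;
}
```

Calling both functions on each payload gives 40 and 46 for CSV, 63 and 69 for JSON-short, and 80 and 86 for JSON-readable.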
Our FX data will be nice and regular, so you might be tempted to go with CSV instead of JSON. I am going to continue to use JSON because in other applications your data might not be so simple (JSON can cope with nested data structures) and because it makes development easier if we need to add another field. In fact, you will see a more complicated data structure being used as this application evolves. And I will stick with readable field names, to help us keep our sanity.
Our first draft, fx_server.hardcoded.php, implements two of the three parts of our high-level algorithm: it sleeps and it sends the data to the client. In the next section we will implement choosing the symbol and price instead of hardcoding them.
The Frontend
We are going to develop the backend a lot more, but now that we have the simplest possible server-side script, let’s create the simplest possible HTML page to go with it:
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
<title>FX Client: latest prices</title>
</head>
<body>
<table border="1" cellpadding="4" cellspacing="0">
<tr><th>USD/JPY</th><th>EUR/USD</th><th>AUD/GBP</th></tr>
<tr><td id="USD/JPY"></td><td id="EUR/USD"></td><td id="AUD/GBP"></td></tr>
</table>
<script>
var es = new EventSource("fx_server.hardcoded.php");
es.addEventListener("message", function(e){
    var d = JSON.parse(e.data);
    document.getElementById(d.symbol).innerHTML = d.bid;
},false);
</script>
</body>
</html>
When you load that in a browser you will see a three-cell table, and the middle cell, labelled EUR/USD, will appear as 1.303. Then nothing. It looks as dull as dishwater, doesn’t it? But, behind the scenes, the server is actually sending the 1.303 over and over again. This frontend, basic though it is, will work with each of the improvements we are about to make to the backend.
If you followed along in Chapter 2, the first two lines of the JavaScript should look familiar. Create an EventSource object, specifying the server to connect to. Then assign a message event handler. e.data contains a string in JSON format, so the first line of our event handler is var d=JSON.parse(e.data);[13] to turn that into a JavaScript object.
NOTE
If the JSON data is bad, it will throw an exception. Starting in Chapter 5, we will wrap it in try and catch, as part of making the code production quality.
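As a rough idea of what that wrapping involves (this is a minimal sketch, not the code Chapter 5 arrives at, and the helper name safeParse is an assumption):

```javascript
// Returns the decoded value, or null if the text is not valid JSON.
// (The literal JSON text "null" also yields null, which is fine for our ticks.)
function safeParse(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null;
  }
}

// Inside the message handler it would be used like this:
//   var d = safeParse(e.data);
//   if (d === null) return;   // skip the bad message instead of dying
```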
The other line of our event handler starts with document.getElementById(d.symbol), which finds the HTML table cell that has been marked with one of id="USD/JPY", id="EUR/USD", and id="AUD/GBP".[14] Then the second half of that line fills it with the bid price: .innerHTML = d.bid;.
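Footnote 14 suggests sanitizing symbol names when older browsers matter. One way that could be done is sketched here; the function name toDomId and the choice of "s" as a prefix are assumptions, not something the book prescribes.

```javascript
// Convert a symbol such as "USD/JPY" into an ID that is safe in HTML4-era browsers.
function toDomId(symbol) {
  // Replace every non-alphanumeric character with an underscore
  var id = String(symbol).replace(/[^A-Za-z0-9]/g, "_");
  // Make sure the first character is a letter (not a digit or an underscore)
  if (!/^[A-Za-z]/.test(id)) {
    id = "s" + id;
  }
  return id;
}
```

The table cells would then be given id="USD_JPY" and so on, and the handler would look up toDomId(d.symbol) instead of d.symbol.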
We will come back and do more on the frontend, but now let’s go back and work on the backend some more.
Realistic, Repeatable, Random Data
Earlier we created a script that does repeatable data; now we have to make it random and realistic. The first problem with fx_server.hardcoded.php is that there is only a single symbol (currency pair); I want different symbols. Because each symbol has a lot in common and only the numbers will be different, I have created a class, FXPair, as shown in the following code. If PHP classes look unfamiliar, see Classes in PHP in Appendix C.
<?php
class FXPair{
    /** The name of the FX pair */
    private $symbol;
    /** The mean bid price */
    private $bid;
    /** The spread. Add to $bid to get "ask" */
    private $spread;
    /** Accuracy to quote prices to */
    private $decimalPlaces;
    /** Number of seconds for one bigger cycle */
    private $longCycle;
    /** Number of seconds for the small cycle */
    private $shortCycle;
    /** Constructor */
    public function __construct($symbol,$b,$s,$d,$c1,$c2){
        $this->symbol = $symbol;
        $this->bid = $b;
        $this->spread = $s;
        $this->decimalPlaces = $d;
        $this->longCycle = $c1;
        $this->shortCycle = $c2;
    }
    /** @param int $t Seconds since 1970 */
    public function generate($t){
        $bid = $this->bid;
        $bid+= $this->spread * 100 *
            sin( (360 / $this->longCycle) * (deg2rad($t % $this->longCycle)) );
        $bid+= $this->spread * 30 *
            sin( (360 / $this->shortCycle) *(deg2rad($t % $this->shortCycle)) );
        $bid += (mt_rand(-1000,1000)/1000.0) * 10 * $this->spread;
        $ask = $bid + $this->spread;
        return array(
            "timestamp"=>gmdate("Y-m-d H:i:s",$t),
            "symbol"=>$this->symbol,
            "bid"=>number_format($bid,$this->decimalPlaces),
            "ask"=>number_format($ask,$this->decimalPlaces),
        );
    }
}
We have member values for bid, spread, and decimal places. For our purposes, bid stores the mean price: our values will fluctuate around this price. spread is the difference between the bid and ask prices (see Our Problem Domain). Why do we have a value to store the number of decimal places? By convention, currencies involving JPY (Japanese yen) are shown to three decimal places; others are shown to five decimal places.
We then have two more member variables: longCycle and shortCycle. If you look at generate you will see these control the speed at which the price rises and falls. We use two cycles to make the cyclical behavior more interesting; the first, slower cycle has a weight of 100, and the second, shorter cycle has a relative weight of 30. In addition, we add in some random noise, with a weight of 10. Are you wondering about (mt_rand(-1000,1000)/1000.0)? mt_rand only returns integers. So we create a random integer between -1000 and +1000 (inclusive) and then divide by 1000 to turn it into a -1.000 to +1.000 random float. In each case, we multiply by the spread and by the weight.
NOTE
See Random Functions in Appendix C for why we use mt_rand, and how the random seed is set.
Finally, generate returns an associative array (aka an object in JavaScript, a dictionary in .NET, a map in C++) of the values. We use number_format to round to the required number of decimal places. So, 98.1234545984 gets turned into 98.123. (number_format returns a string, which is why the prices appear in quotes in the JSON output.)
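To see the arithmetic inside generate() on its own, here it is re-expressed as a JavaScript sketch. This is a port for illustration, not the book's PHP: the function name generateBid and the plain-object pair are assumptions, and the noise value is passed in by the caller (rather than drawn inside the function) so the result is fully determined by its inputs.

```javascript
// pair: {bid, spread, longCycle, shortCycle}, mirroring the FXPair members
// t: seconds since 1970
// noise: a number from -1 to +1, standing in for mt_rand(-1000,1000)/1000.0
function generateBid(pair, t, noise) {
  var whole = Math.floor(t);   // PHP's % operator works on integers
  // (360 / cycle) * deg2rad(t % cycle) is the same angle as 2*PI*(t % cycle)/cycle
  var longPhase = 2 * Math.PI * (whole % pair.longCycle) / pair.longCycle;
  var shortPhase = 2 * Math.PI * (whole % pair.shortCycle) / pair.shortCycle;
  return pair.bid +
    pair.spread * 100 * Math.sin(longPhase) +    // slow cycle, weight 100
    pair.spread * 30 * Math.sin(shortPhase) +    // fast cycle, weight 30
    noise * 10 * pair.spread;                    // random noise, weight 10
}

var eurusd = { bid: 1.3030, spread: 0.0001, longCycle: 360, shortCycle: 47 };
var bid = generateBid(eurusd, 0, 0);   // both cycles start at zero, so this is the mean
var ask = bid + eurusd.spread;
```

Adding the three weights shows the price can never stray more than 140 spreads from the mean, which for EUR/USD is 0.014 either side of 1.3030.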
Now how do we use this class? At the top of fx_server.seconds.php we create one object for each FX pair (EUR/USD appears twice because we want it to update twice as often):
$symbols = array(
    new FXPair("EUR/USD", 1.3030, 0.0001, 5, 360, 47),
    new FXPair("EUR/USD", 1.3030, 0.0001, 5, 360, 47),
    new FXPair("USD/JPY", 95.10, 0.01, 3, 341, 55),
    new FXPair("AUD/GBP", 1.455, 0.0002, 5, 319, 39),
);
Next, in our main loop we randomly choose which symbol to update:
$ix = mt_rand(0,count($symbols)-1);
And then the hardcoded $d array in fx_server.hardcoded.php can be replaced with a call to generate:
$d = $symbols[$ix]->generate($t);
The full fx_server.seconds.php is shown here:
<?php
include_once("fxpair.seconds.php");
header("Content-Type: text/event-stream");
$symbols = array(
    new FXPair("EUR/USD", 1.3030, 0.0001, 5, 360, 47),
    new FXPair("EUR/USD", 1.3030, 0.0001, 5, 360, 47),
    new FXPair("USD/JPY", 95.10, 0.01, 3, 341, 55),
    new FXPair("AUD/GBP", 1.455, 0.0002, 5, 319, 39),
);
while(true){
    $sleepSecs = mt_rand(250,500)/1000.0;
    usleep( $sleepSecs * 1000000 );
    $t = time();
    $ix = mt_rand(0,count($symbols)-1);
    $d = $symbols[$ix]->generate($t);
    echo "data:".json_encode($d)."\n\n";
    @ob_flush();@flush();
}
Note a few things about this code. The price we generate is solely based on the current time. We never store a previous value, which we then increase/decrease randomly; this might have been your first idea for implementing random prices. As well as being nice and clean and enabling repeatable, reliable testing, this also brings with it a little bonus: we can put two entries for EUR/USD in our array to get twice as many prices generated for it.
See Falling Asleep in Appendix C for why I use usleep() instead of sleep().
Do you wonder why we assign $t in the main loop, when all we do is pass it to generate()? Why not put the $t = time(); inside of generate()? This comes back to Design for Testability: by using a parameter we can pass in a certain value and always get back the same output from generate(). So we can easily create a unit test of generate(). If we don’t do this, the global function time() becomes a dependency of the generate() function. And that sucks. (“That sucks” summarizes about 100 pages from xUnit Test Patterns by Gerard Meszaros (Addison-Wesley); refer to that book if you want to understand this in more depth.)
Fine-Grained Timestamps
When you run fx_server.seconds.php from the command line, you will see something like this:
data:{"timestamp":"2014-02-28 06:49:55","symbol":"AUD\/GBP","bid":"1.47219", ↵
"ask":"1.47239"}
data:{"timestamp":"2014-02-28 06:49:56","symbol":"USD\/JPY","bid":"94.956", ↵
"ask":"94.966"}
data:{"timestamp":"2014-02-28 06:49:56","symbol":"EUR\/USD","bid":"1.30931", ↵
"ask":"1.30941"}
data:{"timestamp":"2014-02-28 06:49:57","symbol":"EUR\/USD","bid":"1.30983", ↵
"ask":"1.30993"}
data:{"timestamp":"2014-02-28 06:49:57","symbol":"EUR\/USD","bid":"1.30975", ↵
"ask":"1.30985"}
data:{"timestamp":"2014-02-28 06:49:57","symbol":"AUD\/GBP","bid":"1.47235", ↵
"ask":"1.47255"}
data:{"timestamp":"2014-02-28 06:49:58","symbol":"AUD\/GBP","bid":"1.47129", ↵
"ask":"1.47149"}
This data looks nice and random, doesn’t it? But if you watch it for long enough you will spot the long and short cycles we programmed in. Notice that EUR/USD has two entries with the same timestamp. What we will do next is incorporate milliseconds into our timestamps.
We only need to make these changes to our code:
1. In our main loop, use microtime(true) instead of time().
2. In generate(), include milliseconds in our formatted timestamp.
microtime(true) returns a float: the current timestamp in seconds since 1970 (just like time() did) but to microsecond accuracy.
What about formatting our timestamp? What we currently have is:
'timestamp'=>gmdate("Y-m-d H:i:s",$t),
This still works. Even though $t is a floating point number, it is still seconds since 1970 and PHP will implicitly convert it to an int for the gmdate() function. So we just need to paste on the number of milliseconds.
We can get that number with ($t*1000)%1000 (multiply by 1,000 to turn $t into milliseconds since 1970, then just get the last three digits), and then use sprintf to format it so it is always three digits, and preceded by a decimal point:
'timestamp'=>gmdate("Y-m-d H:i:s",$t).
sprintf(".%03d",($t*1000)%1000),
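The same formatting can be sketched in JavaScript, which is handy if you ever need to produce matching timestamps in a client-side test. The function name formatTimestamp is an assumption, and there is one deliberate difference from the PHP above: this version rounds to the nearest millisecond rather than truncating, so that floating-point representation error cannot drop a millisecond.

```javascript
// t: seconds since 1970, possibly with a fractional part
// Returns "YYYY-MM-DD HH:MM:SS.mmm" in GMT
function formatTimestamp(t) {
  var ms = Math.round(t * 1000);   // milliseconds since 1970
  // toISOString() gives "2009-02-13T23:31:30.325Z"; reshape it to the server's layout
  return new Date(ms).toISOString().replace("T", " ").replace("Z", "");
}
```

For example, formatTimestamp(1234567890.325) gives "2009-02-13 23:31:30.325".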
Here is the full version of the new FXPair class:
<?php
class FXPair{
    /** The name of the FX pair */
    private $symbol;
    /** The mean bid price */
    private $bid;
    /** The spread. Add to $bid to get "ask" */
    private $spread;
    /** Accuracy to quote prices to */
    private $decimalPlaces;
    /** Number of seconds for one bigger cycle */
    private $longCycle;
    /** Number of seconds for the small cycle */
    private $shortCycle;
    /** Constructor */
    public function __construct($symbol,$b,$s,$d,$c1,$c2){
        $this->symbol = $symbol;
        $this->bid = $b;
        $this->spread = $s;
        $this->decimalPlaces = $d;
        $this->longCycle = $c1;
        $this->shortCycle = $c2;
    }
    /** @param float $t Seconds since 1970, to microsecond accuracy */
    public function generate($t){
        $bid = $this->bid;
        $bid += $this->spread * 100 *
            sin( (360 / $this->longCycle) * (deg2rad($t % $this->longCycle)) );
        $bid += $this->spread * 30 *
            sin( (360 / $this->shortCycle) *(deg2rad($t % $this->shortCycle)) );
        $bid += (mt_rand(-1000,1000)/1000.0) * 10 * $this->spread;
        $ask = $bid + $this->spread;
        return array(
            "timestamp" => gmdate("Y-m-d H:i:s",$t).
                sprintf(".%03d", ($t*1000)%1000),
            "symbol" => $this->symbol,
            "bid" => number_format($bid, $this->decimalPlaces),
            "ask" => number_format($ask, $this->decimalPlaces),
        );
    }
}
And here is the fx_server.milliseconds.php script that uses it:
<?php
include_once("fxpair.milliseconds.php");
header("Content-Type: text/event-stream");
$symbols = array(
    new FXPair("EUR/USD", 1.3030, 0.0001, 5, 360, 47),
    new FXPair("EUR/USD", 1.3030, 0.0001, 5, 360, 47),
    new FXPair("USD/JPY", 95.10, 0.01, 3, 341, 55),
    new FXPair("AUD/GBP", 1.455, 0.0002, 5, 319, 39),
);
while(true){
    $sleepSecs = mt_rand(250,500)/1000.0;
    usleep( $sleepSecs * 1000000 );
    $t = microtime(true);
    $ix = mt_rand(0,count($symbols)-1);
    $d = $symbols[$ix]->generate($t);
    echo "data:".json_encode($d)."\n\n";
    @ob_flush();@flush();
}
When we run fx_server.milliseconds.php, we now see something like this:
data:{"timestamp":"2014-02-28 06:49:55.081","symbol":"AUD\/GBP", ↵
"bid":"1.47219","ask":"1.47239"}
data:{"timestamp":"2014-02-28 06:49:56.222","symbol":"USD\/JPY", ↵
"bid":"94.956","ask":"94.966"}
data:{"timestamp":"2014-02-28 06:49:56.790","symbol":"EUR\/USD", ↵
"bid":"1.30931","ask":"1.30941"}
data:{"timestamp":"2014-02-28 06:49:57.002","symbol":"EUR\/USD", ↵
"bid":"1.30983","ask":"1.30993"}
data:{"timestamp":"2014-02-28 06:49:57.450","symbol":"EUR\/USD", ↵
"bid":"1.30972","ask":"1.30982"}
data:{"timestamp":"2014-02-28 06:49:57.987","symbol":"AUD\/GBP", ↵
"bid":"1.47235","ask":"1.47255"}
data:{"timestamp":"2014-02-28 06:49:58.345","symbol":"AUD\/GBP", ↵
"bid":"1.47129","ask":"1.47149"}
In the book’s source code, there is a file called fx_client.basic.milliseconds.html that allows you to view this in the browser (Figure 3-2). Each time you run the script you will see the three currencies going up and down, and if watching paint dry is one of your hobbies you will probably quite enjoy this. And as long as you don’t mind watching it for at least six minutes (the length of the long cycle), this is also good enough for manual testing. But each time you run the script, the exact prices, the order in which the symbols appear, and of course the timestamps, are different. Refer back to Design for Testability for why we want to do something about this.
Figure 3-2. fx_client with milliseconds, after running for a few seconds
Taking Control of the Randomness
NOTE
The rest of this chapter is only backend enhancements; if you are more interested in the frontend, you could skip ahead to Chapter 4 now.
As an experiment, take your fx_server.milliseconds.php script and at the top add this one line: mt_srand(123);. This sets the random seed to a value of your choosing.
Stop it. Run it again. What do you notice? If you thought setting the seed would give you repeatable results, that must have come as a nasty shock. Everything is different. But look closely, and you’ll see the order of the ticking symbols is consistent: EUR/USD three times, then USD/JPY, then AUD/GBP, then USD/JPY three times.[15] That makes sense because the code to control the next symbol is simple randomness: $ix = mt_rand(0,count($symbols)-1);.
If you look really closely, you’ll also see that the difference between timestamps is almost the same. For example, I see a gap of 431ms on one run, 430ms on another run, and 431ms on a third try. This also makes sense because the time between ticks is also simple randomness: $sleepSecs = mt_rand(250,500)/1000.0;. The small differences in timing are due to CPU speed, how busy the machine is at the time, and the flapping of the wings of a butterfly on the other side of Earth.
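The effect of seeding can be demonstrated away from PHP too. JavaScript's Math.random cannot be seeded, so this sketch uses a tiny linear congruential generator as a stand-in for mt_srand()/mt_rand(); it will not produce the same numbers as PHP, and the names makeRandom and firstTicks are assumptions for illustration.

```javascript
// A small seedable generator. Returns a function giving integers from min to max inclusive.
function makeRandom(seed) {
  var state = seed >>> 0;   // keep the state as an unsigned 32-bit integer
  return function (min, max) {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return min + Math.floor((state / 4294967296) * (max - min + 1));
  };
}

// Replay the two random decisions the main loop makes for each tick:
// how long to sleep, and which of the four $symbols entries to use.
function firstTicks(seed, count) {
  var rand = makeRandom(seed);
  var ticks = [];
  for (var i = 0; i < count; i++) {
    var sleepSecs = rand(250, 500) / 1000;
    var ix = rand(0, 3);
    ticks.push({ sleepSecs: sleepSecs, ix: ix });
  }
  return ticks;
}
```

Calling firstTicks(123, 8) twice gives identical arrays: the symbol order and the intended sleep times depend only on the seed, while the wall-clock gaps you actually observe still wobble by a millisecond or so.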
But why are the prices different? Because they are based on $t (the current time on the server), with just a little random noise added in. So we need to take control of $t. Now, was your first thought, “Let’s change the system clock, just before running each unit test”? I like your style. You are a useful person to have around when we have a wall to get through and the only tool we have is a sledgehammer. To be honest, I thought of it too.
But in this case there is an easier way to get through this wall—there is a door. And it was us who put it there earlier. I am talking about the way we pass $t to generate(), rather than having generate() call microtime(true) itself.
Just to get a feel for this, replace the $t = microtime(true); line with $t=1234567890.0;. Now it outputs:
data:{"timestamp":"2009-02-13 23:31:30.000","symbol":"EUR\/USD",↵
"bid":"1.31103","ask":"1.31113"}
And it is that exact same line every time you run the script, regardless of the CPU, load, or insect behavior.
Obviously we do not want it to be February 13, 2009 forever. Here is the next version of our code, which gives us the option to take control of $t:
<?php
include_once("fxpair.milliseconds.php");
header("Content-Type: text/event-stream");
$symbols = array(
    new FXPair("EUR/USD", 1.3030, 0.0001, 5, 360, 47),
    new FXPair("EUR/USD", 1.3030, 0.0001, 5, 360, 47),
    new FXPair("USD/JPY", 95.10, 0.01, 3, 341, 55),
    new FXPair("AUD/GBP", 1.455, 0.0002, 5, 319, 39),
);
if(isset($argc) && $argc>=2)
    $t = $argv[1];
elseif(array_key_exists("seed",$_REQUEST))
    $t = $_REQUEST["seed"];
else{
    $t = microtime(true);
    echo "data:{\"seed\":$t}\n\n";
}
mt_srand($t*1000);
while(true){
    $sleepSecs = mt_rand(250,500)/1000.0;
    usleep( $sleepSecs * 1000000 );
    $t += $sleepSecs;
    $ix = mt_rand(0,count($symbols)-1);
    $d = $symbols[$ix]->generate($t);
    echo "data:".json_encode($d)."\n\n";
    @ob_flush();@flush();
}
Compared to fx_server.milliseconds.php, the main change is the block of code just before the main loop. But, in fact, the code is quite mundane. If run from the command line (if(isset($argc)...), it gets the seed from the first command-line parameter; if run from a web server, it looks for input[16] called seed and uses that ($_REQUEST['seed'];). And when neither is set, it initializes from the current time, and then it outputs a line to say what seed it is using. This last point is so that if something goes wrong you have the seed to reproduce the stream of data. Whichever of those three sources the seed came from, we then pass it to mt_srand. We multiply $t by 1,000; mt_srand will truncate it to an int, so this is our way of saying we care about millisecond accuracy, but not microsecond accuracy.
In our main loop, the changes are simple. $t=microtime(true); has been removed, and in its place, straight after the sleep, $t is incremented by the number of seconds we slept. In other words, if $t is 1234567890.0, meaning we are pretending it is 2009-02-13 23:31:30.000, and then we sleep for 0.325 seconds, we update $t such that we now pretend the current time is 2009-02-13 23:31:30.325.
Making Allowance for the Real Passage of Time
What a fun section title! As far as unit testing goes, the code at the end of the previous section is good enough. But did you try using it without a random seed? To make what is happening clear, I added this[17] just above the line that starts echo "data:"...:
$now=microtime(true);
echo ":".
gmdate("Y-m-d H:i:s",$now).
sprintf(".%03d",($now*1000)%1000).
"\n";
Starting a line with a colon is a way to enter a comment in SSE. You cannot access comments from a browser, so run this from the command line. At the start, you will see $now and $t are in sync. But after a few ticks, $t might have fallen a few milliseconds behind $now. Go put the kettle on, and when you come back the gap will be in the hundreds of milliseconds. Run it for 24 hours and it will be minutes wrong. (By the way, the problem exists when you give a seed too; it is just harder to spot.)
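To make the comment rule concrete, here is a deliberately simplified sketch of how an SSE stream is split into messages. It is nowhere near a full implementation of the protocol: it understands only data: lines and comment lines, and assumes plain \n line endings. The function name parseSse is an assumption.

```javascript
// Returns an array containing the data payload of each message in the stream text.
function parseSse(text) {
  var messages = [];
  var dataLines = [];
  text.split("\n").forEach(function (line) {
    if (line === "") {
      // A blank line ends the current message (if it has any data)
      if (dataLines.length > 0) messages.push(dataLines.join("\n"));
      dataLines = [];
    } else if (line.charAt(0) === ":") {
      // A comment: silently ignored, which is why the browser never sees it
    } else if (line.indexOf("data:") === 0) {
      var value = line.slice(5);
      if (value.charAt(0) === " ") value = value.slice(1);   // one optional space after the colon
      dataLines.push(value);
    }
  });
  return messages;
}
```

Fed a comment line followed by a data: line and a blank line, it returns just the one JSON string; the datestamp comment has vanished.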
Well, it is just test data, it doesn’t really matter. But adjusting sleep to match the passage of time is a tool you might need in your toolbox, so let’s quickly do it.
We will use a variable, $clock, to store the server clock time. That is initialized to the current time at the start of our script. But the real action is in the main loop, just before we sleep. $now=microtime(true); is back! Then we calculate the time slip with $adjustment = $now - $clock;. The key concept is that when we go to sleep, we sleep for a bit less than we thought we wanted to:
usleep( ($sleepSecs - $adjustment) * 1000000);
$t is updated as before, i.e., by $sleepSecs, without using $adjustment. But then we also update $clock in exactly the same way. $clock represents the time we would expect the server clock to show if we were running on an infinitely fast processor.
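The benefit of the adjustment is easiest to see in a simulation, where no real sleeping is needed. The following JavaScript sketch runs the loop on a virtual clock; the function names and the fixed per-tick overhead are assumptions chosen for illustration, and the Math.max guard is my addition for the case where we have fallen further behind than the sleep we wanted.

```javascript
// How long to really sleep: the sleep we wanted, less however far the real
// clock (now) has slipped ahead of where we expected it to be (clock).
function adjustedSleep(sleepSecs, now, clock) {
  var adjustment = now - clock;
  return Math.max(0, sleepSecs - adjustment);
}

// Run the main loop on a virtual clock and report the final drift in seconds.
// overheadSecs is a made-up cost for everything in the loop that is not sleeping.
function simulateDrift(sleeps, overheadSecs, adjust) {
  var now = 0;     // virtual "real" time
  var clock = 0;   // where real time would be on an infinitely fast processor
  for (var i = 0; i < sleeps.length; i++) {
    var actual = adjust ? adjustedSleep(sleeps[i], now, clock) : sleeps[i];
    now += actual + overheadSecs;
    clock += sleeps[i];
  }
  return now - clock;
}

// 1,000 ticks of 0.3 seconds each, with 2ms of overhead per tick
var sleeps = [];
for (var n = 0; n < 1000; n++) sleeps.push(0.3);
var driftWithout = simulateDrift(sleeps, 0.002, false);   // grows with every tick
var driftWith = simulateDrift(sleeps, 0.002, true);       // stays at one tick's overhead
```

Without the adjustment the drift after 1,000 ticks is about 2 seconds; with it, the drift never exceeds the 2ms overhead of a single tick.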
The full code for fx_server.adjusting.php is shown in the following code block, and you can find fx_server.adjusting_with_datestamp.php in the book’s source code, which uses SSE comments again to show that the artificial data is spit out at exactly the same pace as the real passage of time. You will also find fx_client.basic.adjusting.html, which connects to it (this version displays the seed that was chosen), and fx_client.basic.adjusting123.html, which sets an explicit seed, and thus shows repeatable data each time you reload.
<?php
include_once("fxpair.milliseconds.php");
header("Content-Type: text/event-stream");
$symbols = array(
    new FXPair('EUR/USD', 1.3030, 0.0001, 5, 360, 47),
    new FXPair('EUR/USD', 1.3030, 0.0001, 5, 360, 47),
    new FXPair('USD/JPY', 95.10, 0.01, 3, 341, 55),
    new FXPair('AUD/GBP', 1.455, 0.0002, 5, 319, 39),
);
$clock = microtime(true);
if(isset($argc) && $argc>=2)
    $t = $argv[1];
elseif(array_key_exists('seed',$_REQUEST))
    $t = $_REQUEST['seed'];
else{
    $t = $clock;
    echo "data:{\"seed\":$t}\n\n";
}
mt_srand($t*1000);
while(true){
    $sleepSecs = mt_rand(250,500)/1000.0;
    $now = microtime(true);
    $adjustment = $now - $clock;  // How far real time has slipped ahead
    // max() guards against a negative sleep if we have fallen a long way behind
    usleep( max(0, ($sleepSecs - $adjustment) * 1000000) );
    $t += $sleepSecs;
    $clock += $sleepSecs;
    $ix = mt_rand(0,count($symbols)-1);
    $d = $symbols[$ix]->generate($t);
    echo "data:".json_encode($d)."\n\n";
    @ob_flush();@flush();
}
Taking Stock
We covered a lot of ground in this chapter. Step by step, we designed a random data backend that incorporates Design for Testability principles (while learning a little about how FX markets work), then pushed that data to clients using SSE. But our development was quite rapid, so the next chapter will start with some refactoring, and then it will add some data storage features.
[13] Every browser that supports SSE has JSON.parse. However, when we talk about fallbacks for older browsers we will find JSON.parse is not available in really old browsers, most notably IE6/IE7. There is a simple way to patch it, though.
[14] DOM IDs in HTML5 can contain just about anything except whitespace. However, if you need this code to run on HTML4 browsers such as IE7 or IE8, you will need to sanitize the symbol names that the data feed gives you. For example, convert all nonalphanumerics to “_”, and make the DOM IDs "USD_JPY", "EUR_USD", etc. (Also make sure a digit is not the first character, and for IE6 (!!) support, make sure an underscore is also not the first character.)
[15] The exact random sequence, for a given seed, might change between PHP versions, and possibly between OSes. I used PHP 5.3 on 64-bit Linux when writing this.
[16] Yes, I’m using $_REQUEST deliberately, so it can come from GET, POST, or even cookie data. In this particular case, being able to set the random seed from a cookie is a feature, not a bug! See Superglobals in Appendix C for more on PHP superglobals.
[17] You’ll find this in the book’s source code as fx_server.repeatable_with_datestamp.php.