
Chapter 5. No More Ivory Tower: Making Our Application Production-Quality

In the previous couple of chapters we created a backend that pushes out FX prices for multiple symbols, and a frontend that displays them in any browser that supports SSE. One way we need to improve this application is by getting it to work in older desktop and mobile browsers that do not have SSE support. But there is another axis we need to improve in, because at the moment I still regard this as a toy example. It is not production-quality yet.

What do I mean by production-quality? Quite a few things. I mean that when things go wrong the system will recover automatically. I mean it works with real-world constraints (the one we will show in this chapter is dealing with the FX market shutting down on weekends). And I mean dealing with the case where we sent out bad data by mistake and now need to fix it.

Error Handling

In Chapter 4, we attached an event handler for the error event. We named that function handleError, and now we have to decide what is going to go into it. By the end of this chapter we will be auto-reconnecting whenever the backend server goes down. We will also keep trying to connect if it is not available. But we will be doing these things with or without an error callback. The error callback is just informative: only of interest to programmers, not to end users. So we might as well make it as simple as:

function handleError(e){

console.log(e);

}

I said “informative.” I was exaggerating. The object has no message, no error code. The only slightly useful thing is the target property. This is your EventSource object. Inside it you will find the URL you connected to, and a property called readyState (or, in full, e.target.readyState). If readyState is 2, it means “CLOSED.” This means your URL is bad: a connection could not be made. If readyState is 0, it means “CONNECTING,” and that means you had a connection, but it got closed and the browser is trying to auto-reconnect. And if readyState is 1, or “OPEN,” by the time you look at it, it means the reconnect already happened.
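If you want the console output to be a little more readable, here is a sketch (still purely informational; the wiring to addEventListener stays the same as in Chapter 4):

function handleError(e){

// The event carries no message or error code; the EventSource

// itself (e.target) is the only thing worth inspecting.

var states = ["CONNECTING", "OPEN", "CLOSED"];

console.log("SSE error on " + e.target.url

+ " (readyState=" + states[e.target.readyState] + ")");

}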

Bad JSON

If the server sends a string that isn’t valid JSON, or is badly formatted (even by as little as a stray comma or line feed), JSON.parse will throw an exception. This will stop everything from working. And that is bad. So, instead of just writing var d = JSON.parse(s);, this is the production-quality approach:

try{

var d = JSON.parse(s);

}catch(e){

console.log("BAD JSON:" + s + "\n" + e);

return;

}

Adding Keep-Alive

I can sometimes go weeks, even months, without any really important news I need to tell Mum. But how can she tell the difference between me having no news, me forgetting to pay my ISP and phone company, and me having been hit by a fiery comet and lying in a hospital? So, every now and again I email Mum to tell her: “Bills paid, no fiery comets.” Or more simply: “I’m alive.”

In network terms, keep-alive is a packet of data that is sent every N seconds, or after N seconds of inactivity on a socket, just to let the other end of the socket know that everything is OK and that there simply hasn’t been anything to communicate. (You will also see this concept referred to as a heartbeat.) Some browsers might kill a connection and auto-reconnect after so many seconds of socket inactivity. In addition, proxy servers might kill a connection if it goes quiet. So we’re going to send keep-alive messages every 15 seconds to prevent this from happening. Why 15? The SSE draft proposal mentions that number. It is probably more frequent than is really needed, but on the other hand, it is not frequent enough to ever likely be the bottleneck of your system.

So, that decides N. The other design decision we have is whether we send the keep-alive every 15 seconds, or only after 15 seconds of quiet. It is not a very important decision, so I suggest you do whichever is easiest to code on the server side for you.

Be aware that keep-alives could affect TCP/IP bandwidth shaping, and in particular its Slow-Start Restart mechanism. My advice: don’t worry about it. If the difference between a 15-second, 30-second, or 90-second SSE keep-alive is having a significant effect on your network load, there could be bigger problems elsewhere. (For starters, why do you have so many SSE connections that are not sending any real data?) Alternatively, you could configure your server to not use slow-start (it is an OS-level setting).

With mobile, keep-alive brings other considerations. For instance, the keep-alive might be stopping the application from going to sleep, thus draining the battery rapidly. If your data is naturally infrequent, but is also predictable, consider using setTimeout to fetch data, instead of streaming it.
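For instance, if new data only appears once an hour, a polling sketch along these lines lets the device sleep between requests (fx_poll.php is hypothetical: a plain request/response URL returning one JSON message, not one of the book’s scripts):

function pollForData(){

var xhr = new XMLHttpRequest();

xhr.open("GET", "fx_poll.php");

xhr.onload = function(){ processOneLine(xhr.responseText); };

xhr.send();

// Ask again in an hour; in between, no socket is held open,

// so the radio and CPU can power down.

setTimeout(pollForData, 3600 * 1000);

}

pollForData();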

Server Side

The keep-alive can be as simple as sending a blank SSE comment line. How do we do that? It is just a line that starts with a colon. You may remember we used SSE comments to add some troubleshooting output in the section Making Allowance for the Real Passage of Time. Here’s an example:

echo ":\n\n";@flush();@ob_flush();

Alternatively, we could send a blank data message:

echo "data:\n\n";@flush();@ob_flush();

What is the difference between sending an SSE comment line and sending an SSE data line? On the server side it is four bytes, but on the client side there is all the difference in the world. The latter triggers the EventSource message handler, and the former does not.[19] We want the latter, for reasons we will cover later when talking about client-side handling.

So, if we are going to send a real data packet, let’s also include a timestamp (this can then be used to identify clocks that are out of sync or suspiciously large latency between server and client). Because we are using JSON for our messages, it is no trouble to also identify the message as a keep-alive, as follows:

sendData( array(

"action" => "keep-alive",

"timestamp" => gmdate("Y-m-d H:i:s")

) );

You can find an example using this in the sample code for the book: fx_server.keepalive.php. Because our application is constantly sending data, there would never be 15 seconds of quiet, and so we would never get a chance to send a keep-alive. (Basically, the keep-alive concept is pointless for our particular application.) But to allow us to test it on the frontend, we send the keep-alive at regular 15-second intervals, rather than only after quiet periods. We do this by initializing $nextKeepalive = time() + 15; just before entering the main infinite loop. Simply put, that line says: send the next keep-alive message 15 seconds from now. Then the start of the main loop, just after the sleep, now looks like:

while(true){

...

if(time() > $nextKeepalive){

sendData( array(

"action" => "keep-alive",

"timestamp" => gmdate("Y-m-d H:i:s")

) );

$nextKeepalive = time() + 15;

}

$ix = mt_rand(0, count($symbols)-1 );

...

To change it to only send after quiet periods, simply run $nextKeepalive = time() + 15; after sending real data too.

Client Side

SSE already has a reconnect function built into the protocol. So, in that sense, we don’t strictly need keep-alive handling. Just sending an SSE comment (see the previous section) will be enough to keep the TCP/IP socket alive. And if that socket dies, the browser should automatically reconnect. There are two reasons we choose not to rely on that functionality. The first is that the browser will only reconnect if the socket dies nice and cleanly and immediately, like an extra in an action movie. However, sometimes sockets can die like the hero in an action movie. And just like in the movie, it might be 30, 60, or even 120 seconds from when the socket stops working to the browser being sure it is dead. Bugs in your backend code can cause a similar problem. The second reason is nice and simple: we want our code to work with the fallbacks, too. That is why we send a keep-alive as a proper message that can be processed in our JavaScript.

First we need to define a couple of global variables in our JavaScript:

var keepaliveSecs = 20;

var keepaliveTimer = null;

The first decides how sensitive we want to be. The correct value should match the pace of the keep-alive messages that the backend sends. We chose 15 seconds for our backend; I like to add a bit of a buffer, to allow for network latency and other delays on both the frontend and backend, which is why I have chosen 20 seconds.

keepaliveTimer is the handle of a setTimeout() call. We create this timer when we do the initial connect. Then, whenever data comes through from the server we kill the old timer and create a new one. So, as long as data (whether real data or a keep-alive message) keeps coming through regularly, the timer will always be killed before it gets a chance to trigger. Only when no data comes through for a period of 20 seconds will the timer finally get a chance to trigger. And that can only mean there is a problem somewhere, because keep-alive messages are supposed to be received every 15 seconds.

In code that looks like this:

function gotActivity(){

if(keepaliveTimer != null)clearTimeout(keepaliveTimer);

keepaliveTimer = setTimeout(connect, keepaliveSecs * 1000);

}

The second parameter to setTimeout is given in milliseconds, hence the multiply by 1,000. The first parameter is the function to call after the timeout, so it will call connect() if no keep-alive is received. You remember that we have a function called connect that previously looked like this:

function connect(){

if(window.EventSource)startEventSource();

//else handle fallbacks here

}

To start things going we just add one line at the top, so it looks like:

function connect(){

gotActivity();

if(window.EventSource)startEventSource();

//else handle fallbacks here

}

This is very important: without it, a connection that went wrong (that never sent any data) would go unnoticed. It goes at the start just in case startEventSource() throws an exception, and the end of the function is not reached.

Are you concerned we don’t kill the old connection in connect()? We leave that job to startEventSource() (and we already handle it: see Refactoring the JavaScript). The way to kill the connection varies depending on the fallback we are using.

There is one final piece to add. At the very top of processOneLine(s) we add a call to gotActivity():

function processOneLine(s){

gotActivity();

var d = JSON.parse(s);

...

It does not matter if it is a keep-alive, regular data, or anything else. Ending up in processOneLine(s) is a sign of getting a message. The fallbacks we look at in the next two chapters will also use connect() and processOneLine(s), so there will be no changes to this code for them to support keep-alives. Try out fx_client.keepalive.html to see it in action; Figure 5-1 shows how it will look after a couple of keep-alives have come through.


Figure 5-1. fx_client.keepalive.html after running for about 35 seconds; two keep-alives have come through

ANOTHER WAY TO DO KEEP-ALIVES

Our current keep-alive solution kills and re-creates a timer every single time we get new data. An alternative approach is to just record the timestamp of the latest data, and then run a timer that triggers (for instance) once every four seconds. Each time that timer triggers, it checks how long it has been since we got real data; when that is over 20 seconds, it assumes that the server has died, at which point it can try to reconnect.

This approach has some downsides. It needs a couple more globals (var keepaliveTimerSecs=4;, var lastTimestamp=null;). It needs about double the number of lines of code. And it becomes less precise: when the server goes down it will be between 20 and 24 seconds before we notice. The way it was shown previously, we will notice exactly 20 seconds after the last received message.
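A minimal sketch of that alternative, using the two extra globals just mentioned (connect() and keepaliveSecs are as in the main text, and connect() still calls gotActivity() at its top):

var keepaliveTimerSecs = 4;

var lastTimestamp = null;

function gotActivity(){

// Much cheaper than clearTimeout/setTimeout: just record the time.

lastTimestamp = new Date().getTime();

}

setInterval(function(){

// Every 4 seconds, see if over 20 seconds have passed in silence.

if(lastTimestamp != null

&& new Date().getTime() - lastTimestamp > keepaliveSecs * 1000)

connect();

}, keepaliveTimerSecs * 1000);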

There must be some advantage to doing it this way, right? Yes, updating a timestamp each time we get data is quicker than killing and starting a timer. This extra CPU load comes just when we want it least: when we are getting a burst of very rapid data, and already had more than enough to do, thank you very much for asking.

The first draft of this chapter did it this way in the main text, and the simpler version was in this sidebar. However, I got suspicious and went away to benchmark the difference. In Chrome (well, actually WebKit/PhantomJS), the stop-start of the timers took 14 to 17 times longer than just assigning the current timestamp. In Firefox, the difference was even bigger: about 250 to 350 times slower! Aha, so my hunch was right! Very gratifying. Then I took a step back. I had got caught up in a micro-optimization. Say we receive 100 messages/second, which can be considered a very busy feed. My benchmarking told me that 100 stop-starts of a timer takes about 6ms, so with 100 messages/second that equates to about 0.6% of one CPU core.

Conclusion: the simpler keep-alive processing is never going to be the bottleneck.

SSE Retry

SSE has its own stay-connected functionality. How does that work, and how does our code interact with it? The built-in reconnect of SSE is working at the socket level. If the socket is closed by the server, the browser will perform these steps:

§ Set readyState (an element of our EventSource object) to CONNECTING.

§ Call our error handler (see Error Handling).

§ Wait retry seconds, and then connect again.

Who decides how long that retry delay is? The default is decided by your browser,[20] and is going to be about 3–5 seconds. What that means is that if the connection goes down due to a closed socket, the restart will happen before our keep-alive code gets a chance to notice. So there will not be any clash. The SSE restart will handle everything; our keep-alive restart will never be used.

Well, if SSE has this retry code built in, why did we bother writing our own keep-alive system? Good question. First, we will need it for the fallbacks that we will be looking at in the next two chapters. But even if we control the ecosystem and know all our browsers have EventSource support, we still need this code. The SSE reconnect only handles one of the things that can go wrong: the socket gets closed. There are other ways a data feed can stop working. Sockets can die in a way that is not noticed. The backend script could crash in a way that does not cleanly close the socket. It might enter an infinite loop. There could be a browser bug or a server bug. Luckily, our explicit keep-alive system takes care of all of these. But the most important one it takes care of is the case where the server is offline. When the web server cannot be reached, or sends back a 404, or is not configured for CORS correctly, SSE changes the readyState to CLOSED, and does not try again. Ever. Our explicit keep-alive system will retry every 20 seconds.

Going back to SSE retry, the default wait of 3–5 seconds is quite short. If your server is likely to close the connection a lot and does not want frequent reconnections, it can set the retry time higher. Conversely, if you want it shorter, to reduce downtime when something goes wrong, you can set a lower number.[21] The way you do this is by sending a special SSE line:

retry: 10000

It is in milliseconds, so 10,000 means 10 seconds. I suggest you never set this higher than the rate at which you send keep-alives. For instance, if you set retry to 20000, then I suggest you have your server send keep-alives at 25- or 30-second intervals. (But don’t go much higher than that or the browser and intermediaries, such as proxies, load balancers, etc., might start interpreting quietness as a lost connection.) And remember if you increase that interval on the server, you must increase keepaliveSecs in the JavaScript, too.

Perhaps we should take inspiration from the SSE retry, and implement our own protocol so that the server can also state the rate at which it is sending keep-alive messages, and we adjust keepaliveSecs automatically to match? It is a great idea for when the server is getting overloaded: dynamically tell clients to back off a bit. In reality, if you space the keep-alive messages out too far, the browser (or intermediaries) will think there is a problem, kill the socket, and will try to reconnect, thus creating more overall load. So you are only left with a range of about 15 to 40 seconds within which you can adjust. That makes little difference and is not worth the extra complexity involved.

In the book’s source repository, there is a file called fx_server.retry.php. All it does is add one line at the top of the script, as shown here:

header("Content-Type: text/event-stream");

echo "retry: 10000\n\n";@flush();@ob_flush();

...

How do we test this? Well, the script contains a self-destruct clause! Just inside the top of the infinite loop, I’ve added this:

while(true){

if(time() % 20 == 0)break;

...

At the start of each minute, and at 20 seconds past and 40 seconds past the start of each minute, the script will quietly exit. This is a nice clean exit, so the browser should learn about it immediately.

WAYS TO KILL

Another way to test SSE recovery of a connection is to kill the server. For instance, if I’m using Apache on Ubuntu I can just type sudo service apache2 restart. As with breaking out of the infinite loop shown previously, this is a clean kill: the browser should recognize the socket has died almost immediately.

Incidentally, sudo service apache2 graceful is exactly what you don’t want: it restarts all the Apache instances that are not doing anything, but your SSE process is doing something, so it keeps your SSE socket open.

What about ways to do a dirty kill?

If we run the server and client on different machines, we can simply pull out the network cable between them. Similarly, we could shut down the network interface on the server. The browser won’t detect the socket has failed and our keep-alive process will get to do its work.

Another approach, when using Apache, is to work out which of the Apache processes is servicing our request, get its pid, then do sudo kill -s STOP 12345, where 12345 is the pid. This works like pulling out the cable: the browser won’t detect the problem and instead our keep-alive will. The STOP signal means go to sleep. Use sudo kill -s CONT 12345 to start it up again (equivalent to plugging the cable back in).

Why doesn’t the browser detect the problem in the last two cases? Imagine if you pull out a cable for a second and then put it back in. Or you are connecting over mobile or WiFi and briefly go out of range, then come back in range again. By design, TCP/IP deals with these brief outages. The client cannot tell the difference between going quiet, a temporary problem, or a fatal problem. This is why we need a keep-alive system.

Point one of our client scripts at fx_server.retry.php to try it out. (fx_client.retry.html is supplied for this purpose; all that has changed is the URL to which it connects. Note that it has the keep-alive logic, so you can play with the interaction between the browser keep-alive and our own keep-alive.) With retry:10000, you will see 1–20 seconds of activity, then it will go quiet for 10 seconds. If you have the JavaScript console open, you will see an error appear: this is when the browser noticed that the socket disappeared. Then you will see it alternate between 10 seconds of activity (the seed of the new connection will be printed to screen) and 10 seconds of quiet. Try commenting out the retry header in fx_server.retry.php. With Firefox (which uses a 5-second default for retry), you will see 15 seconds of activity alternate with a 5-second quiet period. Now try setting the retry header to 500 (i.e., half a second), and you will see the errors appear in the console log, but almost no interruption of service.

Finally, try setting the retry to 21,000. This is higher than our own keep-alive check of 20 seconds. So our own code does the reconnect, not the browser SSE implementation. And now something fascinating happens: we end up connecting at 0, 20, and 40 seconds past each minute. This exactly matches the self-destruct times, and no data gets sent ever again! What fun. To be clear, this is just a chance interaction between the self-destruct time and the keep-alive timeout time. Try changing the self-destruct timing, or keepaliveSecs in your JavaScript, to get a feel for this. Or better still, keep retry to less than the keepaliveSecs and don’t put self-destructs in your code. Ah, but, hang on, self-destruction is also the theme of the next section.

Adding Scheduled Shutdowns/Reconnects

In the real-world FX markets, there is no data to send on weekends.[22] All those sockets are being kept open, but all that is sent down them are keep-alive messages. Especially nowadays, with the cloud allowing us to change our computing capacity hour to hour, this is a waste. So what we want is for the server to be able to broadcast a message saying: “That’s all folks. Tune in Monday morning for the next exciting episode.”

On the backend, we could add this code:[23]

$when = strtotime("next Sunday 17:00 EST");

$until = date("Y-m-d H:i:s T", $when);

$untilSecs = $when - time();

sendData( array(

"action" => "scheduled_shutdown",

"until" => $until,

"until_secs" => $untilSecs

) );

This code sends the timestamp when the clients should reconnect, in the "until" field. We also send it as a Unix timestamp, the "until_secs" field, to make it easier for clients to work with. (It also means the client does not need to worry about differing time zones, or slow clocks: the server said come back in 100,000 seconds, so that is what we will do.)

Here we choose 5 p.m. Sunday afternoon, in EST (New York winter time), the traditional weekly start for FX trading. Our calculation of $when is a bit crude. First, if it is already Sunday, then “next Sunday” will go horribly wrong. Second, New York switches from EST (UTC-05) to EDT (UTC-04) for the summer. Or, in plainer language, we want to use “EDT” from the second weekend in March through to the first weekend in November. PHP can do these calculations automatically for you, but that is starting to get outside the scope of this book. In a real application you will also want to consider public holidays, so you should consider getting all shutdown and reconnect times from a database rather than calculating them.

And in fact we will do something similar now (see fx_server.shutdown.php). The main loop now looks for the presence of a file on disk called shutdown.txt. It expects to find a datestamp in that file that strtotime can interpret.

NOTE

This is the first time we’re using strtotime in the book; see Date Handling in Appendix C if it is unfamiliar to you.

It will then send a shutdown to the clients, giving them that timestamp. This code has been added near the start of the main infinite loop:

$s = @file_get_contents("shutdown.txt");

if($s){

$when = strtotime($s);

$untilSecs = $when - time();

if($when > 0 && $untilSecs > 0){

$until = date("Y-m-d H:i:s T",$when);

sendData( array(

"action" => "scheduled_shutdown",

"until" => $until,

"until_secs" => $untilSecs

) );

break;

}

}

The first line uses @ to suppress error messages. Effectively, it does a check for existence of the file, then loads it. If the file does not exist, $s will be false. The rest of the code is basically the example we saw earlier, with a bit of error-checking for bad timestamps (because we get it from a file that could contain anything).

So, because it is summertime as I write this, at Friday 5 p.m. EDT I will create a file with these contents: “next Sunday 17:00 EDT.” I must make sure the file gets deleted by midnight on Saturday. (If I really didn’t want clients connecting in the daytime on Sunday, I could replace it with a file that just read “17:00 EDT” for the first 17 hours of Sunday.)

Let’s take a look at how we handle this on the frontend. There are two tasks: recognizing we got a shutdown message, and acting on it. For the first of those, we will add this to the end of our main loop:

...

else if(d.action=="scheduled_shutdown"){

document.getElementById("msg").innerHTML +=

"Scheduled shutdown from now. Come back at :" +

d.until + "(in " + d.until_secs + " secs)<br/>";

temporarilyDisconnect(d.until_secs);

}

A first stab at the temporarilyDisconnect() function looks like this:

function temporarilyDisconnect(secs){

var millisecs = secs * 1000;

if(keepaliveTimer){

clearTimeout(keepaliveTimer);

keepaliveTimer = null;

}

if(es){

es.close();

es = null;

}

setTimeout(connect, millisecs);

}

Stop the keep-alive timer (we don’t want that triggering while we’re supposed to be sleeping!), close the SSE connection (when we add our fallbacks we need to put an entry in here for each to shut them down, too), and then call connect() at exactly the time we are told to.

I said “first stab,” so you already know there is something wrong here…but it actually works perfectly. Test it (fx_client.shutdown.html). Start it running in your browser and then on the server, in the same directory as fx_server.shutdown.php, create a file called shutdown.txt and put a timestamp in that is about 30 seconds in the future. Use a 24-hour clock, and I recommend giving the time zone explicitly. For example, if you are in London, in summer, and it is currently 3:30:00 p.m., then try “15:30:30 BST” (remember how strtotime() works; if you do not give a date, it defaults to the current day). It converts that to GMT, so in your browser you’ll see a message something like “Scheduled shutdown from now. Come back at 2014-02-28 14:30:30 UTC (in 29 secs).” Wait those 29 seconds and it comes back to life. Just. Like. Magic.

It works perfectly. What on earth could be wrong? Here’s a clue: it works perfectly and we call connect() at exactly the time we are told to. Have you spotted the hidden danger? Go back to the FX markets, and imagine you have 2,000 clients. Think what is going to happen on Sunday, at 17:00:00, New York time. They will all try to connect at that exact same moment and you have the mother of all traffic spikes.

How can we avoid this? An observation: it does not really matter if some clients come back a little earlier. So how about we add the couple of lines highlighted here:

function temporarilyDisconnect(secs){

var millisecs = secs * 1000;

millisecs -= Math.random() * 60000;

if(millisecs < 0)return;

if(keepaliveTimer){

clearTimeout(keepaliveTimer);

keepaliveTimer = null;

}

if(es){

es.close();

es = null;

}

setTimeout(connect, millisecs);

}

Try this out (fx_client.jitter.html). It randomly spreads the client connection attempts out over a 60-second period before the connect time we told them. The second line I added just says if that means there is no sleep needed at all, then don’t even disconnect. By the way, you should delete shutdown.txt at least 60 seconds before the reconnect time. Otherwise, those early reconnecting clients just get told to go away again.

Sending Last-Event-ID

When we lose the connection and then reconnect, for whatever reason, we will get the new latest data when we reconnect. That is wonderful, but for any reasonably active data feed it means we will have a gap in our data. In Chapter 4 we went to the trouble of keeping a history of all the data we downloaded, but if it is not kept accurate and complete, it has much less value.

Fortunately, the designers of the SSE protocol gave this some thought. At connection time an HTTP header, Last-Event-ID, can be sent that specifies where the feed should start from. The ID is a string; it does not have to be a number.

The good news is that our fallbacks, using XMLHttpRequest and ActiveXObject, can use the setRequestHeader() function to simulate this behavior. The bad news is that we cannot set it manually when using EventSource. So with SSE we can only use the value that the server has previously sent to us, and that means that with a fresh connection we cannot specify it at all. There is no setRequestHeader() function on the EventSource object (yet). This is one of those (rare) cases where our fallbacks are better than SSE.

NOTE

If you are thinking this restriction on not being able to send our own Last-Event-ID header must have to do with security, perhaps so the server can stop us from trying to access older data, I should point out you could get around it with any HTTP client library in any major computing language. Such security would be an illusion.

Imagine the case where we are reconnecting because our own keep-alive triggered. Or where the user has reloaded the page, and we are storing her history in an HTML5 LocalStorage object (therefore, we know the last data event she received, including its ID). For these cases, we will have to send the ID in the URL. So, the server has to look at both the URL and the Last-Event-ID header. The header should always get precedence (because that would mean it is one of the EventSource auto-reconnects, meaning the ID embedded in the URL is now out of date).

The next thing to consider is that there is only a single ID for a given SSE connection. If we are sending different data feeds (e.g., different FX exchange rates) down the same connection, what should we do? There is the easy way, and the hard way. The hard way is shown in the next section. The easy way, and the one we will use here, is to use the current time. Specifically, we will use the time on the server, and it will be in milliseconds since 1970. We use milliseconds since 1970 because that is the internal JavaScript format, so no conversion will be needed. How does it look on the server side? Just before each data line we send, we will send the time in the id field. So the client might receive a sequence of data like this:

id:1387946750885

data:{"symbol":"USD/JPY","timestamp":"2013-12-25 13:45:51",↵

"rows":[{"id":1387946750112,"timestamp":"2013-12-25 13:45:50.112",↵

"value":98.995},{"id":1387946750885,"timestamp":"2013-12-25 13:45:50.885",↵

"value":98.980}]}

id:1387946751610

data:{"symbol":"USD/JPY","timestamp":"2013-12-25 13:45:51",↵

"rows":[{"id":1387946751610,"timestamp":"2013-12-25 13:45:51.610",↵

"value":98.985}]}

NOTE

The id: line can just as well come directly after the data: line, as long as both are before the blank line that marks the end of the SSE message.

Keen-eyed readers will notice that the data: lines have a different format than what we have been sending up to now. In each row we now send an id field, in addition to the timestamp. This is needed because in SSE (but ironically not in our fallbacks) we cannot get hold of the id: line. Notice that the id is encoded in JSON as an integer, not a string.

I will start with the changes in the FXPair class. Relative to fxpair.structured.php, you will find just a couple of lines have changed in the generate() function of fxpair.id.php. The new version of generate() looks like this; the additions are the $ms calculation and the id field in the returned row:

public function generate($t){

$bid = $this->bid;

$bid += $this->spread * 100 *

sin( (360 / $this->long_cycle) *

(deg2rad($t % $this->long_cycle)) );

$bid += $this->spread * 30 *

sin( (360 / $this->short_cycle) *

(deg2rad($t % $this->short_cycle)) );

$bid += (mt_rand(-1000,1000)/1000.0) * 10 * $this->spread;

$ask = $bid + $this->spread;

$ms = (int)($t * 1000);

$ts = gmdate("Y-m-d H:i:s",$t).sprintf(".%03d",$ms % 1000);

return array(

"symbol" => $this->symbol,

"timestamp" => $ts,

"rows" => array(

array(

"id" => $ms,

"timestamp" => $ts,

"bid" => number_format($bid, $this->decimal_places),

"ask" => number_format($ask, $this->decimal_places),

)

)

);

}

$t is the number of seconds since 1970, with a fractional part. To get milliseconds since 1970, $ms, multiply by 1,000 (and because $t is to microsecond accuracy, then use (int) to truncate the microsecond part away). Sending this number back to the client is as easy as adding the "id" => $ms, line.

To have fx_server.id.php send the id back to SSE clients, just a couple of additions are needed relative to fx_server.shutdown.php. At the top, add include_once("fxpair.id.php");, the new FXPair class. Then just below the definition of sendData(), add another helper function:

function sendIdAndData($data){

$id = $data["rows"][0]["id"];

echo "id:".json_encode($id)."\n";

sendData($data);

}

So it outputs the id: row, then relies on sendData() to output the data: row and do the flush.

NOTE

In the sendIdAndData() function just shown, we have this line: echo "id:".json_encode($id)."\n";. This could equally well have been echo "id:$id\n"; because $id is an integer and therefore needs no special JSON encoding. Try changing it to see that the application behavior is identical. I’ve chosen to use json_encode() explicitly so that this code is ready to go if $id is a string or even something more complicated (see the following section).

Then in the middle of the main loop, change:

sendData( $symbols[$ix]->generate($t) );

into:

sendIdAndData( $symbols[$ix]->generate($t) );

To use it from the browser, just change the URL to connect to; there is nothing else to be done on the frontend, because SSE’s use of id: is taken care of by the browser, behind the scenes.

NOTE

Now that there is an integer ID in the row data, we could go back and change our history store to use that as the key, instead of the timestamp string. It should mean the key is 8 bytes instead of 24 bytes for the string. It might mean lookups are quicker. There is a catch, though: we also use that timestamp string in our interface. So either we still need to store it (more memory usage overall), or we need to make the timestamp from the milliseconds value, using JavaScript’s Date functions (more CPU usage, though it does give us the flexibility to use different date formatting). I chose not to change anything.

Something to be careful of is that the ID should be the timestamp of the (most recent) data, not the current time. This is shown in the preceding code (I don’t expect you to know the ID number, but the last three digits match the fraction of the second, and the fourth digit from the end is the last digit of the seconds). Of course, the timestamp on your new data and the current time are going to be close, but consider the case when the data being fed to the clients is coming to your server from another server, and to that server from another. The latency from all those hops could start to add up. We need to know the ID of the data, because when reconnecting, we use that ID to tell the server the last piece of data we saw, so that it can restart the feed from the very next item.

ID for Multiple Feeds

What if we have data feeds that are not being indexed by time? For instance, what if we are sending messages that we get by polling an SQL database, which uses an autoincrement primary key? Being given a timestamp in the Last-Event-ID header would require a search on the timestamp column, which is either slow or requires us to add another index to our database (and that slows down database writes). What we really want in Last-Event-ID is the last value of the primary key that has been seen.

But then what if we are polling multiple tables? For instance, consider a chat application or a social network where we push out all kinds of notifications: chat messages, requests to chat, friends available for chat, friends who have logged off, new friend requests, etc. We need Last-Event-ID to tell us the latest ID seen in each of those tables.

Sounds hard, doesn’t it? But I have some good news. The ID the server sends in the id: field, and that is sent back in Last-Event-ID, can be a string of any characters (to be precise, anything in Unicode except LF or CR). We settled on using JSON for the message we put in the data: lines, so why not do the same for the data we put in the id: line? It might look like this:

id:{"chatmessages":18304,"chatrequests":1048,"friendevents":8202}

The need to use this technique is not as contrived as it may sound. It is quite common in the finance industry to sell delayed data at a different price. For instance, a live feed of stock market share prices can be expensive, but Yahoo! and Google can show us 20-minute delayed prices for free. If you buy live data for just two symbols, and get the delayed prices for all the other symbols, your lastId variable is going to be continuously jumping back and forth by 20 minutes. You are guaranteed that whatever its value when you need to do a reconnect, it is going to be wrong for some symbols. One solution is to send id like this:

id:{"live":1234123412,"delayed":1234123018}

One last thing to be careful of with using a JSON object for the id: field: if it gets past a hundred bytes or so, you cannot use GET, and will need to use cookies (see HTTP POST with SSE in Chapter 9 to learn why HTTP POST is not a choice). Then if it gets past a few thousand bytes, it is going to cause problems when sent as an HTTP header (remember SSE sends this header for you; you cannot control it). Specifically, most web servers complain (send a 413 status code) if the total header size (request line, all headers, including the user agent and all cookies) exceeds 8KB.

NOTE

Older versions of nginx have a 4KB limit, but the current default is 8KB, and it can be configured: http://wiki.nginx.org/HttpCoreModule#large_client_header_buffers. Apache can be similarly configured: http://httpd.apache.org/docs/2.2/mod/core.html#limitrequestfieldsize.

So, if your id: field is over a hundred bytes, stop and think if there is a better way. For instance, can the current position in each data source be stored server side, in a user session, and have that session referenced with a cookie?

Using Last-Event-ID

Back in the server-side script, how do we use the Last-Event-ID header? When using PHP, with Apache, headers sent by the browser get:

1. Changed to uppercase, with hyphens becoming underscores

2. Prefixed with HTTP_

3. Put in $_SERVER

In fx_server.id.php, here is what we do:

if(array_key_exists("HTTP_LAST_EVENT_ID", $_SERVER)){

$lastId = $_SERVER["HTTP_LAST_EVENT_ID"];

}

elseif(array_key_exists("lastId", $_POST)){

$lastId = $_POST["lastId"];

}

elseif(array_key_exists("lastId", $_GET)){

$lastId = $_GET["lastId"];

}

else $lastId = null;

(The lines mentioning $_POST and $_GET are explained in the next section.) Then I put the following code between setting $t based on finding seed in the HTTP request and using $t to set the random seed:

if($lastId)$t = $lastId / 1000.0;

In other words, because this is only a test application, I’m basically using the Last-Event-ID header as a synonym for seed. For the sake of testing and understanding how Last-Event-ID works, this is good enough; in a real application, this is where you would do the history request, then send a patch of the missing data.

WARNING

Security Alert! lastId is pure, unadulterated user input. It can theoretically contain anything; never assume it will only ever contain what your frontend JavaScript puts in it. A hacker could put whatever he wants in it.

The preceding code is actually secure, but in this case the security check is very subtle: I expect $lastId to be a number. When I divide by 1000.0, if it is anything but a number, PHP will first implicitly convert it to a number. If a hacker has set lastId to be {"hello":"tell me your password"}, that ends up as 0 before being divided by 1000.0. $t gets set to January 1, 1970. The worst thing that a hacker can do is put $t as a date in the far past or far future.

When lastId contains other types of data, you have more work to do to sanitize it and understand the potential risks. This isn’t a book on web security, so I suggest you read up on the sanitization techniques available for the backend language you are using.

How to test? The same way we tested the retry: header earlier (see Ways to Kill). So, try cutting the connection (have your JavaScript console open so you can see when the disappearance of the SSE socket is noticed by the browser) and after a few seconds the browser reconnects and continues with the quotes from the timestamp it had reached.

NOTE

You will notice the quotes start falling behind the wall clock! This is just a side effect of it being artificial test data. If it really bothers you, find a good therapist.

GETTING LAST-EVENT-ID IN NODE.JS

The code shown in this section was using some features quite specific to PHP, so how does it look in Node.js? The following example has been grafted onto basic_sse_node_servers.js, which we saw back in Chapter 2:

var url = require("url");

...

http.createServer(function(request, response){

var urlParts = url.parse(request.url, true);

if(urlParts.pathname != "/sse"){

...

}

var lastId = null;

if(request.headers["last-event-id"]){

lastId = request.headers["last-event-id"];

}

else if(urlParts.query["lastId"])lastId = urlParts.query["lastId"];

console.log("Last-Event-ID:" + lastId);

//The SSE data is streamed here

}).listen(port);

(You can find this code as basic_sse_node_server.headers.js in the book’s source code.)

HTTP headers are found in request.headers. Nice and easy. Just remember they have been lowercased.

To get the GET data requires parsing request.url, which is done at the top of the function. After that, the GET data can be found in urlParts.query.

You will notice I do not show how to get POST data. It is a bit more involved, though only to the tune of six lines or so.[24] But the real complication is that the parsing is asynchronous, so the code would need to be refactored to use callbacks, which is getting a bit involved for this sidebar!
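For the curious, here is a minimal sketch of that POST parsing (not in the book’s sample code); note that lastId only becomes available once the end callback fires, which is why the refactoring would be needed:

var querystring = require("querystring");

var body = "";

request.on("data", function(chunk){ body += chunk; });

request.on("end", function(){

var post = querystring.parse(body);

if(post["lastId"])lastId = post["lastId"];

// Only here, inside the callback, can the SSE streaming begin.

});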

Passing the ID at Reconnection Time

During the previous section I showed how to get $lastId from the Last-Event-ID header. But I also included code to look in the POST and GET data for a variable named lastId. This allows us to specify the ID ourselves on a fresh connection, not rely on the underlying SSE protocol. Why do we need this? Because EventSource currently has no way to allow us to send our own HTTP headers. No, but why do we need this? It is needed in two cases:

§ When our keep-alive has triggered and it is our JavaScript doing the reconnect, not the browser’s implementation of SSE

§ When reloading the page in the browser, and a cookie or LocalStorage object knows the last ID we’ve seen

Take note of the order of the code: it is first come, first served. If the header is present, then that is used. Otherwise, we look in the POST data (this is actually for the fallbacks in the next two chapters; native SSE does not support POST-ing data). If neither the Last-Event-ID header nor lastId in the POST data is present, then and only then will it look for lastId in the URL. This is important because when SSE sends the Last-Event-ID header to reconnect, it will be using the same URL. If we let the GET data take precedence over the Last-Event-ID header, it would be using an old ID and not the latest one.

Over on the client side, what changes do we need to make? Start with a global:

var lastId = null;

If you are saving permanent state in a LocalStorage object, you would initialize lastId from that.
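For instance, something like this sketch (assuming the localStorage.lastId key that is used at the end of this section):

// Restore the last seen ID from a previous visit, if any.

var lastId = (window.localStorage && localStorage.lastId) || null;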

We only want lastId pasted into the URL for SSE, not for the other fallbacks (because we can set HTTP headers for them). So instead of connect(), we change startEventSource(), which currently looks like this:

function startEventSource(){

if(es)es.close();

es = new EventSource(url);

es.addEventListener("message",

function(e){processOneLine(e.data);}, false);

es.addEventListener("error", handleError, false);

}

to the following (the new lines are the ones that build u, the URL with lastId appended):

function startEventSource(){

if(es)es.close();

var u = url;

if(lastId)u += "lastId="

+ encodeURIComponent(lastId) + "&";

es = new EventSource(u);

es.addEventListener("message",

function(e){processOneLine(e.data);}, false);

es.addEventListener("error", handleError, false);

}

The last step is where the refactoring in Chapter 4 (adding an id field to each row of data) can finally bear fruit. In function processOneLine(s), there is currently this loop:

for(var ix in d.rows){

var r = d.rows[ix];

x.innerHTML = d.rows[ix].bid;

full_history[d.symbol][r.timestamp] = [r.bid, r.ask];

}

Now add one line to the end of the loop, so that the lastId global always contains the highest ID received so far:

for(var ix in d.rows){

var r = d.rows[ix];

x.innerHTML = d.rows[ix].bid;

full_history[d.symbol][r.timestamp] = [r.bid, r.ask];

lastId = r.id;

}

Again, if using Web Storage to persist data even after the browser is shut, you would update that here (e.g., localStorage.lastId = r.id;) too.

IDS AND MULTIPLE UPSTREAM DATA SOURCES

Remember that the approach described in the main text (a single global for lastId) only works when all symbols (aka multiplexed data feeds) share the same ID system. In our case, all the symbols use id to mean the time of the quote (in milliseconds since 1970).

But even when using a timestamp as the unique ID, there is need for care. If the system is broadcasting share prices from two or more exchanges, then those two exchanges could be slightly out of sync, or one might have experienced a temporary delay. As an example, you have received prices up to 14:30:27.450 for the New York Stock Exchange, but the last NASDAQ price seen was 14:30:22.120 and 5 seconds’ worth of prices are currently delayed, and then you lost your own connection. When you reconnect, if you say the last price seen was at 14:30:27.450, you would miss out on those 5 seconds of NASDAQ prices. If instead you request prices since 14:30:22.120, you get 5 seconds of duplicate NYSE information.

So when dealing with two upstream data sources, maintain the last ID for each (see ID for Multiple Feeds).

To test this we need to force a keep-alive timer to time out, meaning that our script has to go quiet, but not die (if the socket dies cleanly, the SSE reconnect will kick in first). One way to do that is to put the following code at the top of our infinite main loop over in fx_server.id.php:

if($t % 10 == 0){sleep(45);break;}

In other words, go to sleep for a long time and then exit, every 10 seconds. (The client will have disconnected before sleep() returns, causing the PHP process to be shut down, so the break is not really needed.) If you use that method, it will hit a problem when it reconnects, because $t will be divisible by 10, so it will immediately fail. Ad infinitum. A workaround for that is to place this line just before entering the infinite main loop. It just fast-forwards the clock to get past the divisible-by-10 point:

while($t % 10 == 0 || $t % 10 == 9)$t += 0.25;

NOTE

If you want a ready-made version, see fx_server.die_slowly.php in the book’s source code, which is paired with fx_client.die_slowly.html (the only change from fx_client.id.html is the URL it connects to).

When you test this, you should see a few seconds of quotes come through. Then it stops. After 20 seconds (the keep-alive timer length), it connects again and you should see quotes pick up from where it left off. (See Ways to Kill for ways to dirty-kill a socket so you can test how this code copes with that.)

Don’t Act Globally, Think Locally

Up until now all our code has used a bunch of global variables. Appendix B explains why this is bad, and what we can do about it, but the bottom line is that using those globals is stopping code reuse: we cannot have more than one SSE connection on a page. The following listing is based on fx_client.id.html, but wraps everything in a new SSE constructor function, as shown here:

var url = "fx_server.id.php?";

function SSE(url,options){

if(!options)options={};

var defaultOptions={

keepaliveSecs: 20

};

for(var key in defaultOptions)

if(!options.hasOwnProperty(key))

options[key]=defaultOptions[key];

var es = null;

var fullHistory = {};

var keepaliveTimer = null;

var lastId = null;

function gotActivity(){

if(keepaliveTimer != null)

clearTimeout(keepaliveTimer);

keepaliveTimer = setTimeout(

connect, options.keepaliveSecs * 1000);

}

.

. (all other functions untouched)

.

connect();

}

setTimeout(function(){new SSE(url);}, 100);

There are two main changes:

§ Encapsulate all the variables and functions, so that SSE is the only thing in the global scope, and multiple instances could be created.

§ Introduce an options parameter, where everything is optional. keepaliveSecs was moved in here.

I said multiple instances could be created, but it would be silly to just create two instances here without thinking about your data and how it is to be displayed. Currently, the code is hardwired to use the static HTML found in fx_client.closure.html. So two instances would end up competing for control of the HTML. What to do? If you want a single HTML table to show merged data from two data feeds (e.g., USD/JPY comes from one feed, EUR/USD comes from another feed) then fullHistory should be pulled out of the SSE constructor and returned to being a global, along with updateHistoryTable() and makeHistoryTbody(). On the other hand, if you want two sets of data to appear in the browser, you should wrap each block of HTML in a div, and give the ID of that div as a parameter to the SSE object. (See Tea for Two, and Two for Tea for an example of the latter approach.)
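As a sketch of that second approach, usage might look like this (the container option is hypothetical; the SSE constructor would have to look it up in options and direct all its DOM updates into that div):

// Two independent feeds, each rendering into its own div.

new SSE("fx_server.id.php?", {container: "fx-main"});

new SSE("fx_server.retry.php?", {container: "fx-other"});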

Cache Prevention

The browser would be silly to cache streaming data. However, it never hurts to get a bit explicit. So, near the top of your script (near where you set the Content-Type header), add these lines:

header("Cache-Control: no-cache, must-revalidate");

header("Expires: Sun, 31 Dec 2000 05:00:00 GMT");

The first one is for HTTP/1.1, and should really be the only thing you need, given that HTTP/1.1 was defined in 1999. But there are still some old proxies around, so that is what the second line is for; it just has to be any date in the past. You could also add header('Pragma: no-cache'); as a third line, but it should be completely redundant in both old and new browsers, servers, and proxies.

Death Prevention

This code is PHP-specific, and is more important on Windows than Linux; if your script keeps dying after 30 seconds, this might well be the fix. I’ve left the explanation of why to Falling Asleep. Just throw this line at the top of your script (right after the date_default_timezone_set('UTC'); line is just perfect):

set_time_limit(0);

The Easy Way to Lose Weight

Shed those pounds the easy way! Just take this tasty pill and watch your dreams come true:

AddOutputFilterByType DEFLATE text/html text/plain text/xml text/event-stream

If inserted in the correct place (i.e., your Apache server configuration), this will run gzip compression on the data sent back. But first look for a line similar to that in your existing configuration; perhaps you just need to add text/event-stream to the list. For instance, in Ubuntu there is a file called deflate.conf (under /etc/apache2/mods-enabled/) and I added text/event-stream to the end of the line that mentions text/plain.

Another way to configure Apache is to DEFLATE everything except a few image formats. That might look like this (and if this is what you already use, there is nothing to add for SSE):

<Location />

SetOutputFilter DEFLATE

SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary

Header append Vary User-Agent env=!dont-vary

</Location>

Learn more about Apache configuration of compression at http://httpd.apache.org/docs/2.4/mod/mod_deflate.html. The Vary header is added to avoid some issues with proxies.

If using IIS as your web server, this article explains how to configure compression for dynamic content: http://technet.microsoft.com/en-us/library/cc753681.aspx.

If using nginx, see http://nginx.org/en/docs/http/ngx_http_gzip_module.html. Note that you might want gzip_min_length to be set to 0, or some low number, to make sure it works for streaming content, too.

Looking Back

In this chapter we have tried to improve the quality of our application by reporting errors, sending keep-alives, avoiding caching problems, and reconnecting when there is a problem. For reconnecting we use both SSE’s built-in retry mechanism and our own, both relying on the server sending us an ID that tells us the latest data seen so far. We also looked at scheduled shutdowns and supporting multiple connections.

The next couple of chapters work on the coverage of this application, instead of the quality, allowing browsers without SSE support to receive the same data while keeping all the production-quality features introduced here.


[19] At the time of writing, all browsers quietly swallow SSE comments, so you cannot even see them in the developer tools.

[20] At the time of writing, it is 3 seconds in Chrome and Safari (see core/page/EventSource.cpp in the WebKit or Blink source code) and 5 seconds in Firefox (see content/base/src/EventSource.cpp in the Mozilla source code).

[21] Firefox enforces a minimum reconnect time of 0.5 seconds.

[22] We could have done this for the simulation server that was created in the earlier chapters: regularly look at the time and go to sleep for 48*3600 seconds at Friday, 5 p.m., New York time. But I left it as working 24/7 because chances are that you will want to try using the demo scripts on weekends. There is such a thing as too much realism!

[23] Note: this assumes the script is running in the UTC (GMT) time zone. If your server is not set for UTC, then at the top of your PHP script use date_default_timezone_set('UTC');. Or if you write the timestamp you give to strtotime in the local server timestamp, it will work (but creates more work on the client).

[24] You can find a good discussion of this at http://stackoverflow.com/q/4295782/841830.