Case Studies - Systems Programming: Designing and Developing Distributed Applications, FIRST EDITION (2016)


Chapter 7. Case Studies

Putting It All Together

Abstract

A main theme running throughout the book is to bring the technical content to life through a variety of practical activities, programming exercises and case studies, numerous examples, and analogies. In particular, a single case study runs through the first four “viewpoint” chapters, putting the material into an application context and cross-linking the many themes within and across the chapters.

The multiplayer network game case study was selected because it has a useful set of characteristics that make it ideal as a basis to discuss structure, function, and behavior. However, the documentation for that case study has been dispersed across several chapters with the primary goal of reinforcing the material of those chapters and placing it into perspective; the focus has not been on the presentation of the case study itself.

This chapter provides two additional self-contained case studies. These have been chosen such that they are well differentiated in terms of their structure and behavior and therefore are collectively representative of a wide space of distributed applications. Here, the focus is on the case studies in their entirety. The case studies are presented with detailed design documentation, from requirements analysis through to complete working applications, with annotated sections of code presented and discussed.

Also included is a discussion of good design and development practices for distributed applications. Aspects discussed include requirements analysis, architectural considerations, communication, code reuse and code libraries, and testing.

Keywords

Case studies

Time service

Network Time Protocol (NTP)

gethostbyname

Event notification service

Publish-subscribe

Message serialization

Requirements analysis

Protocol Definition Unit (PDU)

Designing distributed applications

Testing distributed applications

Interoperability

Heterogeneity

Transparency

DNS

Directory service

7.1 Rationale and Overview

The purpose of this chapter is to present two complete and detailed case studies, which place the various sections of theory and practice covered in the earlier parts of the book into application perspectives. The new case studies, in addition to the multiplayer distributed game case study, which has been used as a common thread throughout the earlier parts of the book, serve to integrate the four viewpoints (process, communication, resource, and architecture) to give a complete big-picture view of distributed applications animated through a series of diverse examples.

The case studies are (1) a time service client and (2) an event notification service. These have been chosen so as to collectively provide wide coverage of the concepts discussed in the earlier chapters; each case study provides unique opportunities to demonstrate specific structural and behavioral characteristics, as well as techniques to overcome the challenges of distributed applications design and development identified earlier.

For each use case, the design and development process is documented in detail with annotations to help the reader follow the various steps and understand the choices available and decisions taken. The full program code is provided.

7.2 Introduction to the Use Cases

The first case study is a client-side application for the Network Time Protocol (NTP) service provided by the National Institute of Standards and Technology (NIST). NTP has been described in Chapter 6.

This case study provides an interesting example of a situation where a developer must design an application that integrates with a preexisting service with a predefined interface. In such cases, the developer must ensure strict conformance to the prepublished interface specification. In the case of NTP, the communication protocol operates on a request-reply basis with a fixed message format. This request-reply nature of the protocol implicitly defines the interaction mechanism.

The case study also demonstrates the use of a DNS resolver to locate the IP address of the required NTP server instance, an example of refactoring, the creation of a software library to modularize the logic and facilitate code reuse, and the creation of two different user front-end components, which change the user interface but use the same library methods and thus have the same underlying functionality.

The second case study is an event notification service (ENS), complete with a set of application components, which serve as publisher clients and consumer clients of the ENS. The ENS decouples the application components (the publishers and consumers) such that they never communicate directly. The ENS does not need to know which events will be published or the meaning of those events; it simply stores event type-value pairs. When a new event type or new value of an existing event type is published, the ENS updates its tables. Event values are pushed out to all clients that have subscribed to (registered an interest in) the respective event type. The ENS is a service that can be employed by multiple different applications simultaneously; it does not need to know which applications are using it or what their purpose is.

The use of an ENS has design-time benefits in that a consumer component does not need to know the identity of a publisher that will generate the event data and vice versa. Component design is thus simpler, faster, and safer in the sense that it will not have built-in dependencies on specific other components. The ENS provides runtime flexibility and facilitates dynamic logical configuration in terms of publisher-consumer relationships between components.

The second case study also demonstrates the use of a directory service (DS) to facilitate dynamic binding of components and thus provides an example of a hierarchical service model (the ENS server self-registers with the DS, and ENS clients then contact the DS to get the address of the ENS server), interoperability between heterogeneous components developed in multiple different languages, and serialization of language-specific representations of data structures into implementation-neutral (byte array) representations of the same data structure.

7.3 Case Study #1: Time Service Client (with Library)

This case study is concerned with the design and development of a client-side application, which obtains a current timestamp value, on demand, from a trusted time service hosted on the Internet.

7.3.1 Learning Outcomes Associated with the Case Study

This case study encompasses several important aspects of distributed applications design, structure, and behavior. Subject matter includes the following:

• A practical example of the client-server architecture

• A practical example of a request-reply protocol

• A practical example of refactoring to separate business logic and user interface logic

• The development of a code library

• Incorporation of library code into application projects

• An example of the use of a DNS resolver to resolve domain names into IP addresses

• A detailed example of documentation of the design and development processes

• An example of the use of the UDP communication protocol at the transport layer

• An example of the use of the NTP communication protocol at the application layer

Note that this case study has been implemented in three phases and that the documentation presented relates to the third phase, in which the business logic for the client-side interaction with the time service is separated from the user interface front end and placed into a software library. The three phases of development are described later, in Section 7.3.7.

7.3.2 Requirements Analysis

This section identifies the functional and nonfunctional requirements of the time service client application.

• Trustable, reliable, and future-proofed. The application must use a standard, popular, and well-supported time service. NIST provides the Internet Time Service (ITS), which comprises several different time services. Some of these, such as TIME and DAYTIME, are outdated and to some extent deprecated; the currently recommended network time protocol is NTP. The case study example will thus use NTP.

• Flexible and loosely coupled. The application must not be dependent on any specific NIST time server. It must be able to use different time server instances if any particular one is unavailable or has had an IP address change. The case study example will thus embed a DNS resolver so that NIST servers can be selected by their domain names.

• Responsive. End-to-end latency, from when the timestamp is requested to when it is displayed to the user, must be low, because the data (the current time value) are highly delay-sensitive. The client's operation, and especially the way the library component manages communication, must be designed accordingly.

• Robust. The NTP client side should be able to continue functioning despite a number of server-side problems, as well as communication problems. Examples of problems that could occur are as follows: the NTP server crashes, the service domain name used is incorrect and cannot be resolved to the IP address of an NIST server, the UDP-transported NTP request message is lost or corrupted, and the NTP server response message is lost or corrupted.

• Modular architecture. There is a need to support the integration of the time service client behavior into a variety of applications that require accurate timestamps. This suggests that the core NTP client functionality should be deployed in the form of a library. It is also necessary to be able to build simple self-contained NTP client applications for testing and evaluation of the library functionality.

• Efficient. As the NTP client side will be developed as a library module, it is very important that the NTP client logic be implemented with the simplest possible code, containing only the required functionality and using the least possible resources in terms of both processing time and network bandwidth. The library module interface must be as simple as possible, exposing only the minimal set of methods required to configure the module and to instigate NTP timestamp requests.

7.3.3 Architecture and Code Structure

The application is designed around a library component, which deals with the core behavior. This comprises setting up the local communication resources (i.e., creating a socket and configuring it to operate in nonblocking mode), resolving the domain name of an NIST time server to its IP address (this requires the use of a DNS resolver), and the actual communication with the NTP server instance located at the resolved IP address. The communication methods within the library that perform sending an NTP request and receiving an NTP response must always return a result within a short time frame; even when the NTP server does not respond, the library module must continue to operate and must not crash or freeze. The library logic has been kept as simple as possible and comprises a single class as shown in Figure 7.1.


FIGURE 7.1 The CNTP_Client class (the core of the NTP client library).

Figure 7.1 shows the CNTP_Client class template. The NTP client library component is initialized when the constructor CNTP_Client() is called; this initializes the socket that will be used to communicate with the NTP server. There are two internal state variables: (1) the IP address of a specific NTP server, which is set when the application component calls the ResolveURLtoIPaddress() method with the URL of an NIST server as the passed-in parameter, and (2) the timestamp data structure NTP_Timestamp_Data (see Figure 7.2), which the library creates from received NTP timestamps; these are encoded in a raw format as the number of seconds since the beginning of the year 1900. This structure is returned from the library as the result of an application component making a timestamp request by calling the Get_NTP_Timestamp() method. The Send_TimeService_Request() and Receive() methods are used internally to structure the communication logic and handle the NTP server connectivity side of the library. These two methods are not exposed on the application component interface of the library.


FIGURE 7.2 The NTP_Timestamp_Data structure returned by the Get_NTP_Timestamp() method of the library.
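The conversion from raw NTP seconds (counted from 1 January 1900) to human-readable date and time fields can be sketched as follows. This is a minimal C++ sketch, not the library's code. TimestampFields is a hypothetical stand-in for NTP_Timestamp_Data, whose actual fields appear only in Figure 7.2. The conversion goes via the Unix epoch using the standard std::gmtime function. The constant 2,208,988,800 is the number of seconds between 1900-01-01 and 1970-01-01, that is, 70 years including 17 leap days.

```cpp
#include <cstdint>
#include <ctime>

// Seconds from the NTP epoch (1900-01-01 00:00:00 UTC) to the
// Unix epoch (1970-01-01 00:00:00 UTC): (70 * 365 + 17) * 86400.
constexpr std::uint32_t kNtpToUnixOffset = 2208988800u;

// Hypothetical stand-in for NTP_Timestamp_Data (Figure 7.2);
// the real structure may hold different fields.
struct TimestampFields {
    int year;
    int month;   // 1-12
    int day;     // 1-31
    int hour;
    int minute;
    int second;
};

// Convert the whole-seconds part of an NTP timestamp into UTC calendar
// fields. Assumes NTP era 0, i.e., a timestamp taken before the 32-bit
// seconds counter rolls over in 2036.
TimestampFields ToTimestampFields(std::uint32_t ntpSeconds) {
    const std::time_t unixSeconds =
        static_cast<std::time_t>(ntpSeconds) - static_cast<std::time_t>(kNtpToUnixOffset);
    std::tm utc{};
    // std::gmtime returns a pointer to shared static storage (not thread-safe);
    // gmtime_r (POSIX) or gmtime_s (Windows) are reentrant alternatives.
    if (const std::tm* p = std::gmtime(&unixSeconds)) {
        utc = *p;
    }
    return {utc.tm_year + 1900, utc.tm_mon + 1, utc.tm_mday,
            utc.tm_hour, utc.tm_min, utc.tm_sec};
}
```

For example, ToTimestampFields(3155673600u) yields 2000-01-01 00:00:00, because 3,155,673,600 minus the offset is the Unix time 946,684,800.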

The case study includes two application components, either of which can be combined with the NTP client library component to build a specific time service client application program. The application components differ in terms of the user interface they provide but use the library functionality internally in the same way. The external connectivity to the NTP service is managed by the library component and not the application components, and therefore, this aspect is identical in both of the resulting application programs.

The first of the two application components provides a graphical user interface (GUI); see Figure 7.3. The template of the core class of this application component is shown in Figure 7.4.


FIGURE 7.3 The GUI time service client user interface.


FIGURE 7.4 The CCaseStudy_TimeServiceClient_GUIDlg class.

Figure 7.4 shows the template of the CCaseStudy_TimeServiceClient_GUIDlg class. This is the core class of the GUI application component. Only the key detail that relates to the integration with the library component is included; user interface-specific details such as controls (buttons, list boxes, etc.) are omitted. The m_pNTP_Client variable is a pointer to an instance of the library module, which is initialized when the class constructor CCaseStudy_TimeServiceClient_GUIDlg() runs. Whenever the user selects an NIST server domain name from the provided list of URLs (see Figure 7.3), the OnLbnSelchangeNtpServerUrlList() event handler method is called, which passes the domain name to the library to be resolved to its respective IP address. NTP time requests are made (via the NTP client library) at 5 s intervals in the OnTimer event handler method, which is controlled by a programmable timer.

The second application component provides a textual user interface; see Figure 7.5. This procedural application component is represented in a class template-like format in Figure 7.6.


FIGURE 7.5 The console (text-based) time service client user interface.


FIGURE 7.6 The console version of the time service client.

Figure 7.6 shows the key aspects of the console time service client user interface component, in terms of its interface with the NTP client library. The m_pNTP_Client variable is a pointer to an instance of the library module, which is initialized in the InitialiseLibrary() function. The ResolveTimeServiceDomainNameTOIPAddress() function passes the URL of the user-chosen NIST time server to the library. NTP time requests are made (via the NTP client library) at 5 s intervals from the _tmain() function.

Figure 7.7 shows the sequence diagram for the GUI application, which comprises the GUI front-end component and the NTP client library. Compare this with Figure 7.8, which shows the sequence diagram for the textual interface-based application, which comprises the console front-end component and the NTP client library.


FIGURE 7.7 Sequence diagram for the GUI front-end interaction with the NTP client library.


FIGURE 7.8 Sequence diagram for the console front-end interaction with the NTP client library.

Figures 7.7 and 7.8 show the interactions between each of the two client application front ends and the NTP client library. The front-end applications are different in terms of their code structure and their internal behavior. The most significant difference is that the GUI-based front end is event-driven (i.e., its behavior is driven by events such as the user's mouse clicks or keyboard entries, as well as by timer-driven events), while the console application is procedural (i.e., it begins with a main function, from which other functions are called in a sequence to carry out the logic of the program). Despite the differences in the two front ends, it is clear from these diagrams that the library is used in the same way in each case:

• The library is first initialized (its constructor method is invoked). This involves setting up the communication socket, which will be used to send requests to, and receive responses from, the NTP service. This socket is set to work in nonblocking mode.

• The library resolves the domain name of the NIST service, within the ResolveURLtoIPaddress() method. It uses gethostbyname (which is a DNS resolver) internally and contacts the external DNS service automatically (see Chapter 6 for details of how DNS works).

• The library makes NTP timestamp requests, from the Get_NTP_Timestamp() method to the external NTP service.

The sequence diagrams also illustrate an important aspect of transparency; the two external services that are used, DNS and NTP, are represented as single objects that respond to requests. This is the external view that the developer is presented with although each of these services is internally quite complex. A component diagram is provided in Figure 7.9 to further reinforce this point.


FIGURE 7.9 Component diagram showing the process—internal and external connectivities.

Figure 7.9 (left-hand side) shows how the client application is composed of two software components: one of the front-end user interfaces and the NTP client library, which deals with all of the external connectivity and communication with the two external services used (DNS and NTP). The figure (right-hand side) also depicts the external services as pseudocomponents; essentially, the entire DNS service is abstracted as a single method on a single component (and likewise for NTP). In reality, these two services have high complexity in terms of the number of server entities and the way they interconnect and synchronize. However, the whole point of transparency provision is to make services such as these accessible without knowing any of the details of their internal workings.

7.3.4 Separation of Concerns

Careful separation of concerns along component boundaries contributes benefits in terms of clear design and ease of implementation (especially in terms of reducing the extent of coupling and interaction between software components). It also facilitates modular reuse of functionality and can simplify testing.

The NTP service provides a clearly defined function, and in the use case scenario presented here, the client side is a user interface front end, the functionality of which is limited to displaying the timestamp value provided by the NTP service. However, the client-side application functionality can be broken down at a finer level. There are two software components, one of which provides the user interface and the other (the NTP_Client library) provides the business logic necessary to locate and connect to the NTP service, to make NTP requests to the service, and to receive and interpret the responses from that service. Figure 7.10 puts this two-level view of the separation of concerns into perspective.


FIGURE 7.10 Mapping the strands of functionality at the system level and the software component level.

Figure 7.10 shows that the mapping of the different functional strands is dependent on the level at which it is applied. At the highest level, the NTP client application only provides user interface functionality, but internally, the logic is split between two components, and at this finer level, we can separate out the purely user interface-related logic from the business logic of managing the communication with external services and managing the data resource, which in this case is the timestamp.

7.3.5 Coupling and Binding Between Components

The coupling between the NTP client and NTP service can be described as loose (because the client can connect to any instance of the NTP service, based on a runtime-provided NIST service domain name) and direct (because once the service instance is decided, the communication takes place directly between the client and server components).

7.3.5.1 Client Binding to Time Servers

NIST has a number of time servers deployed with well-known domain names, but this does not mean that the IP addresses of servers are fixed, and thus, a chosen domain name must be resolved to an IP address, rather than hard coding or long-term caching. Binding can be considered to occur in two phases. Firstly, a user-selected URL is resolved to the IP address of the respective NTP time server, courtesy of the DNS service, and then the IP address is used to send point-to-point UDP segments containing NTP request messages to the NTP server.

7.3.6 Communication Aspects of the Design

NTP uses a request-reply protocol. A fixed well-known port number 123 is used to identify the NTP server process on a particular computer. The host computer is identified by its IP address (which is in turn derived from a NIST-publicized domain name; see preceding section).

7.3.6.1 Message Types and Semantics

There are two message types: an NTP request and an NTP response. The NTP request is only valid when sent to an NTP server, and an NTP response is only valid when received from an NTP server as a response to an NTP request that was sent a short time earlier. The communication semantics are thus simple.

7.3.6.2 NTP Protocol Definition Unit (PDU)

Both the NTP request message and the NTP response message are encoded as fixed-size arrays of 48 bytes (the optional authenticator, when used, extends the message beyond this). The NTP PDU format overlays the linear byte array with a structure that collects the bytes into various fixed-length fields and thus maps out the application meaning of the message content. Figure 7.11 shows the NTP PDU, which defines the format of NTP request and response messages.


FIGURE 7.11 The PDU format of the NTP protocol. The fields are as follows:

• LI (Leap Indicator): A 2-bit code, which warns that a leap second will be inserted or deleted in the last minute of the current day. This field is significant only in an NTP server response message.

• VN (Version Number): A 3-bit code that indicates the NTP version number.

• Mode: A 3-bit code indicating the protocol mode. When sending an NTP request, the NTP client sets this field to 3 (which signifies that the message originated on the client side). When responding to a client request, the NTP server sets the mode value to 4 (which signifies that the message originated on the server side). When operating in broadcast mode, the NTP server sets the mode value to 5 (which signifies broadcast).

• Stratum: An 8-bit value indicating the type of reference clock (1 means a primary reference, such as one synchronized by a radio clock, and values 2-15 mean a secondary reference, which is synchronized by NTP).

• Poll: An 8-bit value expressed as an exponent of two, indicating the maximum interval between successive messages. Values range from 4 to 17, meaning maximum intervals of 16, 32, 64, 128 … seconds, up to 131,072 s (approximately 36 h).

• Precision: An 8-bit value representing the clock's precision, expressed as an exponent of two. Values range from −6 to −20, meaning precision values of one sixty-fourth of a second or better.

• Root Delay: A 32-bit value indicating the round-trip delay to the primary reference source, in seconds.

• Root Dispersion: A 32-bit unsigned value indicating the maximum error, which can be up to several hundred milliseconds.

• Reference Identifier: A 32-bit value identifying the particular reference source. For stratum 1 (primary server), the value is a four-character code, and for secondary servers, the value is the IPv4 address of the synchronization source.

• Reference Timestamp: The time the system clock was last set or corrected.

• Originate Timestamp: The time at which the request message was sent from the client (to the server).

• Receive Timestamp: The time the request arrived at the server (or the time the reply arrived at the client, depending on the message direction).

• Transmit Timestamp: The time the reply message was sent from the server (to the client).

• Authenticator: An optional value used with NTP authentication.
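The overlay idea described above can be expressed directly as a C++ struct whose fields follow Figure 7.11. This is a sketch under stated assumptions: the name NtpPacket, its member names, and the three accessor functions are illustrative, not taken from the case study code. The static assertions check that the compiler inserts no padding, so that the struct matches the 48-byte wire layout and the Transmit Timestamp starts at byte 40. Multi-byte fields travel in network byte order, so values read through the struct still need converting (for example, with ntohl).

```cpp
#include <cstddef>
#include <cstdint>

// Fixed 48-byte NTP header (the optional authenticator is not included).
// Multi-byte fields are big-endian on the wire.
struct NtpPacket {
    std::uint8_t  li_vn_mode;        // LI (2 bits) | VN (3 bits) | Mode (3 bits)
    std::uint8_t  stratum;
    std::int8_t   poll;              // exponent of two, seconds
    std::int8_t   precision;         // exponent of two, seconds
    std::uint32_t root_delay;
    std::uint32_t root_dispersion;
    std::uint32_t reference_id;
    std::uint32_t ref_ts_sec,  ref_ts_frac;   // Reference Timestamp
    std::uint32_t orig_ts_sec, orig_ts_frac;  // Originate Timestamp
    std::uint32_t recv_ts_sec, recv_ts_frac;  // Receive Timestamp
    std::uint32_t tx_ts_sec,   tx_ts_frac;    // Transmit Timestamp (bytes 40-47)
};

static_assert(sizeof(NtpPacket) == 48, "NTP header must be exactly 48 bytes");
static_assert(offsetof(NtpPacket, tx_ts_sec) == 40, "Transmit Timestamp starts at byte 40");

// Unpack the three subfields of the first byte.
inline unsigned LeapIndicator(std::uint8_t b) { return (b >> 6) & 0x3u; }
inline unsigned VersionNumber(std::uint8_t b) { return (b >> 3) & 0x7u; }
inline unsigned Mode(std::uint8_t b)          { return b & 0x7u; }
```

For example, a first byte of 0x24 (binary 00 100 100) decodes as LI 0, version 4, Mode 4, which is a server response.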

The NTP request message is populated as follows: The entire 48-byte array is zeroed out, and then the LI, Version, and Mode fields are set to the values 3, 4, and 3, respectively; this indicates that the message is an NTP version 4 message sent from the client (i.e., a request), with a currently unsynchronized clock (i.e., the timestamp values in the message are not meaningful). See Figure 7.12.


FIGURE 7.12 The Send_TimeService_Request method of the CNTP_Client class in the library.

Figure 7.12 shows the program code that sets the NTP request message content and sends the message, in the time service client library component. Only the first byte is configured to inform the recipient that the type of message is a request (from a client) and is conformant to NTP version 4.
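The request construction can be sketched in a few lines of C++. The names MakeNtpRequest and PackFirstByte are hypothetical, and the socket send call is omitted. The sketch shows how LI = 3, VN = 4, and Mode = 3 pack into a first byte of binary 11 100 011, that is, 0xE3, with the remaining 47 bytes left at zero.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNtpPacketSize = 48;

// Pack LI (2 bits), VN (3 bits), and Mode (3 bits) into the first header byte.
constexpr std::uint8_t PackFirstByte(unsigned li, unsigned vn, unsigned mode) {
    return static_cast<std::uint8_t>(((li & 0x3u) << 6) | ((vn & 0x7u) << 3) | (mode & 0x7u));
}

// Build a request: all 48 bytes zeroed, then LI = 3 (clock unsynchronized),
// VN = 4, and Mode = 3 (client).
std::array<std::uint8_t, kNtpPacketSize> MakeNtpRequest() {
    std::array<std::uint8_t, kNtpPacketSize> packet{};  // value-initialized: all zero
    packet[0] = PackFirstByte(3, 4, 3);                  // binary 11 100 011 == 0xE3
    return packet;
}
```

The resulting array would then be passed to sendto, addressed to the resolved server IP address on port 123.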

Figure 7.13 shows the receive method of the library component. This method is called when an NTP request message has been sent (over UDP) to the NTP time server and an NTP response is expected (also over UDP). The combined use of a short time delay and a single call to recvfrom (i.e., not repeated periodically in a loop), with the socket configured in nonblocking mode, strikes a useful compromise between three potentially conflicting requirements: low-latency responsiveness, robustness, and simple design. The nonblocking socket mode ensures robustness in the sense that the call will return regardless of whether the NTP server responds. This is essential to prevent the NTP client library code from blocking indefinitely if the NTP server crashes or if either the request or the response message is lost or corrupted in the network.


FIGURE 7.13 The receive method of the CNTP_Client class in the library.

The 500 ms delay allows for the round-trip time (RTT) of sending the request message and receiving the response. Network delay changes continuously, and therefore, there can never be a perfect statically decided time-out value for long-haul network transmissions (as in the case of contacting NTP time servers). The 500 ms delay value was found experimentally to be a good compromise between waiting long enough that NTP responses are caught in almost all cases and, on the other hand, not inserting too much additional latency. Even if the RTT were near-instantaneous, this approach would insert only half a second of latency.
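The fixed-delay receive strategy can be sketched with POSIX sockets as follows. The case study itself targets Winsock, where ioctlsocket with FIONBIO sets nonblocking mode. This version instead uses fcntl with O_NONBLOCK, and the function names are hypothetical. The key property is that recvfrom is called exactly once after the delay, so the call returns promptly whether or not a reply has arrived.

```cpp
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <thread>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Put a socket into nonblocking mode (POSIX; the Winsock equivalent is
// ioctlsocket() with FIONBIO).
bool SetNonBlocking(int sock) {
    const int flags = fcntl(sock, F_GETFL, 0);
    return flags != -1 && fcntl(sock, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Wait a fixed delay, then make exactly one nonblocking recvfrom() call.
// Returns the number of bytes received, 0 if no datagram had arrived,
// or -1 on a genuine socket error.
ssize_t ReceiveAfterFixedDelay(int sock, unsigned char* buf, std::size_t len,
                               std::chrono::milliseconds delay) {
    std::this_thread::sleep_for(delay);
    const ssize_t n = recvfrom(sock, buf, len, 0, nullptr, nullptr);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return 0;  // nothing yet: the caller treats this as "no timestamp this round"
    }
    return n;
}
```

A caller would send the request and then call ReceiveAfterFixedDelay(sock, buf, 48, std::chrono::milliseconds(500)). Mapping "no data" to 0 is safe here because a valid NTP reply is never empty.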

The timestamp is held in bytes 40-47 of the response message (the Transmit Timestamp field). The timestamp value is 64 bits wide, the most significant 32 bits representing the number of seconds and the least significant 32 bits representing the fraction of seconds. For the case study application, it was deemed adequate to only consider the whole seconds part of the timestamp (hence, the values of bytes 40-43 are used as can be seen in the code in Figure 7.13). In applications where greater precision is needed, the fractional part of the timestamp can also be taken into account.
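Extracting the Transmit Timestamp can be sketched as below. The byte offsets (40-43 for seconds, 44-47 for the fraction) come from the text. The validity checks on Mode and Stratum go beyond what the case study describes, and the names ParseTransmitTimestamp and FractionToMillis are hypothetical. Reading the bytes individually avoids alignment and byte-order pitfalls. std::optional requires C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Read a big-endian (network byte order) 32-bit value starting at p[0].
inline std::uint32_t ReadBE32(const std::uint8_t* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8)  |  std::uint32_t{p[3]};
}

struct NtpTime {
    std::uint32_t seconds;   // bytes 40-43: whole seconds since 1900-01-01
    std::uint32_t fraction;  // bytes 44-47: units of 2^-32 s
};

// Extract the Transmit Timestamp after basic sanity checks: a full 48-byte
// header, Mode 4 (server), and Stratum 1-15 (the values Figure 7.11 describes;
// RFC 4330 uses Stratum 0 for "kiss-o'-death" replies).
std::optional<NtpTime> ParseTransmitTimestamp(const std::uint8_t* msg, std::size_t len) {
    if (len < 48) return std::nullopt;
    const unsigned mode = msg[0] & 0x7u;
    const unsigned stratum = msg[1];
    if (mode != 4 || stratum < 1 || stratum > 15) return std::nullopt;
    return NtpTime{ReadBE32(msg + 40), ReadBE32(msg + 44)};
}

// Convert the fractional part to whole milliseconds (truncating).
inline std::uint32_t FractionToMillis(std::uint32_t fraction) {
    return static_cast<std::uint32_t>((std::uint64_t{fraction} * 1000u) >> 32);
}
```

For instance, a fraction of 0x80000000 is exactly half of 2^32 and therefore converts to 500 ms.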

7.3.6.3 URL Resolution with DNS

The gethostbyname DNS resolver has been embedded into the library code such that the DNS system is automatically contacted to resolve an NIST time server domain name into an IP address; see Figure 7.14.


FIGURE 7.14 The ResolveURLtoIPaddress method of the CNTP_Client class in the library.
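A resolver step similar in purpose to ResolveURLtoIPaddress can be sketched with getaddrinfo, the more modern replacement for gethostbyname, which also supports IPv6. This is not the library's code. The function name ResolveToIPv4 is hypothetical. The sketch restricts results to IPv4, as the case study does, and returns the first address as text. On Windows, getaddrinfo is declared in ws2tcpip.h and requires WSAStartup to have been called first.

```cpp
#include <optional>
#include <string>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

// Resolve a host name to an IPv4 address for sending NTP requests over UDP.
// Returns the dotted-quad string, or std::nullopt if resolution fails.
std::optional<std::string> ResolveToIPv4(const std::string& host) {
    addrinfo hints{};
    hints.ai_family = AF_INET;       // IPv4 only, matching the case study
    hints.ai_socktype = SOCK_DGRAM;  // NTP runs over UDP
    addrinfo* results = nullptr;
    if (getaddrinfo(host.c_str(), "123", &hints, &results) != 0 || results == nullptr) {
        return std::nullopt;
    }
    // Take the first address; a fuller client might fall back to the others.
    const auto* sin = reinterpret_cast<const sockaddr_in*>(results->ai_addr);
    char text[INET_ADDRSTRLEN] = {};
    const char* ok = inet_ntop(AF_INET, &sin->sin_addr, text, sizeof text);
    freeaddrinfo(results);
    if (ok == nullptr) return std::nullopt;
    return std::string(text);
}
```

Calling ResolveToIPv4("time.nist.gov") would query DNS. Because DNS records can change, the result should be used for the current session rather than cached for the long term, in line with the binding discussion in Section 7.3.5.1.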

7.3.6.4 Rationale for the Chosen Communication Design

The fixed-latency receive mechanism was chosen to keep the code simple while also ensuring a predictable response and preventing the call from blocking if certain faults occur. The mechanism is simple in design and operation: a short time delay is used in combination with a nonblocking socket call.

There are, however, additional approaches that could be used if the fixed 500 ms latency were problematic for some applications.

A self-tuning system could be built in which a short delay of 50 ms is tried first; if no response has arrived when the delay expires, the delay is doubled and the request message is sent again. This doubling would be repeated until the NTP response is received within the delay (thus auto-adjusting the RTT wait time on the client side). There are three issues with this approach. (1) It increases complexity. (2) NTP clients are not supposed to make NTP requests more frequently than once every 4 s (callers making requests at a higher rate may be interpreted as performing a denial-of-service attack on the NTP service). To conform with this requirement, the self-tuning approach would have to wait 4 s between attempts and thus could introduce a significant delay before a timestamp value is received (although the latency in the timestamp itself will potentially be less than the currently fixed 500 ms). (3) Network delay varies continuously, so the RTT can change even after the self-tuning has completed.
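The cost of the self-tuning alternative, issue (2) above, can be estimated with a small model. This is a hypothetical sketch, not part of the case study. It assumes a constant RTT, no message loss, an initial wait of 50 ms doubled after each failed attempt, and the 4 s minimum spacing between requests.

```cpp
#include <cstdint>

// Outcome of the hypothetical self-tuning scheme.
struct SelfTuningOutcome {
    int attempts;            // number of requests sent
    std::int64_t elapsedMs;  // time from the first request to the timestamp
};

// Model: attempt k (k = 0, 1, ...) waits initialDelayMs * 2^k for the reply
// and succeeds once that wait covers the RTT. Requests must be at least
// minSpacingMs apart, so attempt k is sent at k * minSpacingMs. The model
// assumes every wait stays below minSpacingMs (RTT of roughly 3 s or less).
SelfTuningOutcome SimulateSelfTuning(std::int64_t rttMs,
                                     std::int64_t initialDelayMs = 50,
                                     std::int64_t minSpacingMs = 4000) {
    if (initialDelayMs < 1) initialDelayMs = 1;  // guard against an endless loop
    int k = 0;
    std::int64_t wait = initialDelayMs;
    while (wait < rttMs) {  // this attempt's wait was too short: double and retry
        wait *= 2;
        ++k;
    }
    return {k + 1, k * minSpacingMs + wait};
}
```

For an RTT of 300 ms, the waits tried are 50, 100, 200, and 400 ms, so the fourth request succeeds and the first timestamp arrives about 12.4 s after the first request. The fixed-delay design delivers its first timestamp after 500 ms.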

Another way to increase responsiveness of the client side is to invoke the nonblocking recvfrom call at shorter intervals (such as 50 ms), driven by a programmable timer mechanism. This requires a stopping condition in case the NTP server never responds; a cutoff point has to be decided, perhaps after 10 invocations (i.e., keeping the upper limit latency to 500 ms). This approach increases complexity and uses more runtime resource than the current design.
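The short-interval polling alternative can be sketched as a loop with a cutoff. For simplicity, this hypothetical version sleeps between polls instead of being driven by a programmable timer as the text suggests; a GUI front end would instead make one attempt per timer event. The callable tryReceive stands for a single nonblocking recvfrom call and is an assumption of the sketch. With the defaults of 50 ms and 10 polls, the worst-case wait matches the 500 ms of the fixed design, but a fast reply is picked up after as little as 50 ms.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Poll up to maxPolls times, pausing `interval` before each attempt.
// tryReceive returns true once a response has been read. Returns the
// 1-based poll on which the response arrived, or 0 if the cutoff was reached.
int PollForResponse(const std::function<bool()>& tryReceive,
                    std::chrono::milliseconds interval = std::chrono::milliseconds(50),
                    int maxPolls = 10) {
    for (int poll = 1; poll <= maxPolls; ++poll) {
        std::this_thread::sleep_for(interval);
        if (tryReceive()) {
            return poll;  // latency is roughly poll * interval
        }
    }
    return 0;  // no response within maxPolls * interval (500 ms by default)
}
```

The extra cost relative to the fixed design is visible in the loop: up to ten system calls and wake-ups per request instead of one.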

7.3.7 Implementation

The time service client application has been implemented in three phases, all of which are available as full sample code examples.

The first iteration was to develop a monolithic application, which contained the full NTP client functionality, integrated with the user interface logic. This approach represents a form of rapid prototyping-based development and is only really suitable in applications with quite narrow functionality, as in this case. The phase 1 monolithic project is CaseStudy_TimeServiceClient_GUI Phase1 Monolithic.

The second phase was to refactor the code to place the NTP client-side business logic into a separate class from the user interface-related functionality. The phase 2 refactored project is CaseStudy_TimeServiceClient_GUI Phase2 Refactored.

The third phase was to create a library that contains the NTP client-side functionality and enables the reuse of this functionality by embedding the library into applications. The phase 3 library project is CaseStudy_TimeServiceClient Phase3 Library.

To demonstrate the benefit of the library approach, two separate front-end applications are developed. The first of these provides the same GUI as the phase 1 and phase 2 projects. The second front end provides a text-based interface; since it uses the library, it has exactly the same time service-related functionality, but with a totally different user interface. The phase 3 (part a) GUI application project is CaseStudy_TimeServiceClient_GUI Phase3 App-side uses library. The phase 3 (part b) text-based application project is CaseStudy_TimeServClnt_Console AppSide uses lib.

This implementation route is mapped out in Figure 7.15.


FIGURE 7.15 The implementation roadmap.

7.3.8 Testing

Testing is a continual process of checking the correctness of the requirements and the design, ensuring that the design actually reflects the requirements, and ensuring that the implementation actually reflects the design. In the case of the NTP client application, it was decided to implement the core functionality in the form of a library. The most important aspects of behavior in this particular application are that the NTP protocol is used correctly, that the correct results are returned from the NTP service, and that the results are properly interpreted in the application code. Various faults can occur, and one particular concern was that the application must not crash or freeze if a problem occurs with the NTP service itself or with the communication with the service. The approach taken was to build an application that served as a test-bed for getting the NTP protocol interaction correct, and then to refactor the code and extract the library in subsequent phases. The user interface was not the primary concern of the first phase, but since it turned out to be quite fit for purpose, it was not changed for the sake of it when moving through phases 2 and 3.

The formal test plan is used as a final sign-off that the application meets its functional, behavioral, and correctness requirements, but, as noted above, it does not reveal the true extent of the testing that actually occurs, because testing is continuous. The test plan and the test outcomes are provided in Table 7.1.

Table 7.1

The Test Plan and Outcomes for the Time Service Client Application


7.3.9 Transparency Aspects of the Use Case

Transparency to the user. The NTP client-side library deals with the connectivity and communication with the NTP service, thus hiding the communication aspect from the user regardless of the front-end user interface. Depending on how it is designed, the user interface may hide or reveal different levels of detail; for example, the user may be asked to select an NTP service instance by its domain name and/or IP address. The library could be embedded into an application such that the NTP connection is handled silently from the user's perspective, and thus there is no need for the user to be aware that the NTP service is used. The library provides location transparency, distribution transparency, and access transparency (the time value is a resource, which is accessed via the NTP service, regardless of which service instance is used or its underlying platform).

Transparency to the developer. The internal clock synchronization of the NTP service is hidden from developers. Applications need only resolve one of the published NIST service domain names to get a server IP address and then send an NTP request message to well-known port 123 at that address. All server instances should provide the same time value, thus hiding the distributed nature of the service as well as the structure and organization including details of the clock strata and the synchronization that goes on between the servers.
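To make the request/reply step concrete, the following is a minimal C++ sketch (not the case study's library code) of the NTP message handling, assuming the standard 48-byte NTP header layout defined in RFC 5905. It builds a client request and converts the seconds part of the server's Transmit Timestamp into Unix time. The function names are hypothetical, and the domain-name resolution and the UDP exchange with port 123 are omitted.

```cpp
#include <array>
#include <cstdint>
#include <ctime>

// Seconds between the NTP epoch (1 Jan 1900) and the Unix epoch (1 Jan 1970).
constexpr std::uint32_t kNtpToUnixOffset = 2208988800u;

// Build a minimal 48-byte NTP client request.
// First byte: LI = 0 (no warning), VN = 3, Mode = 3 (client) -> 00 011 011 = 0x1B.
// All remaining header fields are left as zero.
std::array<std::uint8_t, 48> buildNtpRequest() {
    std::array<std::uint8_t, 48> packet{};
    packet[0] = 0x1B;
    return packet;
}

// Read the seconds part of the Transmit Timestamp (bytes 40..43, big-endian)
// from a 48-byte NTP reply and convert it to Unix time.
std::time_t transmitTimeAsUnix(const std::array<std::uint8_t, 48>& reply) {
    const std::uint32_t ntpSeconds = (static_cast<std::uint32_t>(reply[40]) << 24) |
                                     (static_cast<std::uint32_t>(reply[41]) << 16) |
                                     (static_cast<std::uint32_t>(reply[42]) << 8) |
                                     static_cast<std::uint32_t>(reply[43]);
    return static_cast<std::time_t>(ntpSeconds - kNtpToUnixOffset);
}
```

In a complete client, the request buffer would be sent with sendto() to port 123 of the resolved server address, and the 48-byte reply read with recvfrom() before being passed to transmitTimeAsUnix().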

7.3.10 Case Study Resources

The full source code and project files for each of the three phases of the case study implementation, as well as both of the user interface variants used in phase 3 (the library version), are provided as part of the support resources that accompany the book (five complete code projects).

7.4 Case Study #2: Event Notification Service

This case study focuses on the design and development of an ENS. In addition, it provides a demonstration of the use of a DS to support dynamic binding between clients of the ENS and the ENS server itself. The various components have been developed in a variety of languages to illustrate interoperability within a heterogeneous distributed application.

Applications that use the ENS comprise two types of components: publisher components, which publish events via the ENS, and consumer components, which use the event values received from the ENS.

The case study uses the DS that is built into the Distributed Systems Workbench to facilitate dynamic binding between the application components (which are clients of the ENS) and the ENS. To facilitate this, the ENS server auto-registers its name and address details with the DS, and the client components send resolve request messages to obtain the IP address and port of the ENS server from the DS.

The ENS is a form of middleware that supports applications that operate on a publish-subscribe basis. The ENS provides logical connectivity between application components while they remain fully decoupled from each other. Figure 7.16 provides a simplified view of the interaction that occurs between the ENS and the application components.


FIGURE 7.16 High-level illustration of the role of the event notification service.

Figure 7.16 shows how the ENS serves as a form of middleware between the application components. Events published by components are automatically passed to other components that have registered an interest in those event types.

7.4.1 Learning Outcomes Associated with the Case Study

This case study covers a wide range of topics, which include the following:

• A practical example of an ENS

• A practical example of a publish-subscribe application

• A practical example of loosely coupled components and runtime configuration

• Dynamic binding of components using a DS

• An example of the combined use of TCP and UDP transport layer protocols in the same application

• An example of the use of UDP in broadcast mode to perform service location

• An example of TCP connections and a server component maintaining an array of connected clients

• Interoperability across heterogeneous components developed in different languages

• Serialization and deserialization of language-specific data representations into an implementation-neutral byte array format

Note that the application components (the event publishers and event consumers) have each been implemented in three different languages (C++, C#, and Java), and therefore, there are six different sample client component types, which can connect to the ENS and interact through event publishing and consumption.

The differently implemented components have different user interfaces, with a different “look and feel” but with the same underlying behavior. In particular, they all interact with the ENS through the same, single interface it provides. This means that through the ENS, the components can interoperate regardless of the language they are written in; for example, a C++ publisher can create events, which are consumed by a Java consumer (any of the combinations of the six component types will work). For this to be possible, all messages are serialized into a specific ENS PDU, which comprises four fixed-length fields, and transmitted as a flattened byte array (a contiguous block of bytes).

All messages have the same standard format when they reach the ENS; thus, it can correctly extract the message content from these messages and construct replies as appropriate, without needing to know the implementation of the sender.

7.4.2 Requirements Analysis

This section identifies the functional and nonfunctional requirements of the ENS:

• Automatic location and connection. The ENS server should automatically register with the DS so that other components can query the DS to obtain the address of the ENS server on demand. When a client component starts, it should automatically discover the ENS address details by querying the DS. Clients should then automatically establish a TCP connection with the ENS.

• Support for publish-subscribe-based applications. Such applications comprise some components that publish events and others that consume events. The ENS should maintain a database of event types and their values and the address details of the most recent publisher for each particular event.

• Facilitate dynamic logical association between components based only on named event types. Consumer clients will subscribe to events that they are interested in. Publisher components will publish events that they generate. The ENS will make implicit associations by automatically forwarding to each client updates of the events the specific client has subscribed to.

• Application-independent and extensible. The ENS server must create new event type categories when they are either subscribed to or published, if they don't already exist.

• Low-latency. The ENS server must pass the current value of an event (if known) to a subscriber, immediately upon receiving a subscription request. That is, it sends the instantaneously available cached value; it must not contact the publisher requesting an update, which would add latency and increase the overall communication intensity. Event updates are propagated on a state-change basis to the set of registered subscriber components thereafter.

• Scalable and efficient. The ENS must use resources effectively. In particular, the communication intensity must be low; event updates should only be sent to the subset of clients that have subscribed to them.

• Availability. The use model assumes that the ENS server is long-lived and runs continually. This requires that it is robust.
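The requirements above (on-demand creation of event types, immediate return of the cached value on subscribe, and forwarding of updates only to subscribers) can be captured in a small in-memory model. This C++ sketch is illustrative only: the class and method names are hypothetical, networking is omitted, and clients are identified by an integer index standing in for a connection. It is not the case study's ENS server code.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the ENS event database.
class EventRegistry {
public:
    // Create the event type if unknown, record the subscriber, and return the cached
    // value (if any) so it can be sent immediately, without contacting the publisher.
    std::optional<std::string> subscribe(const std::string& eventType, int client) {
        Entry& e = events_[eventType];            // operator[] creates the type on demand
        e.subscribers.insert(client);
        return e.value;
    }

    void unsubscribe(const std::string& eventType, int client) {
        auto it = events_.find(eventType);
        if (it != events_.end()) it->second.subscribers.erase(client);
    }

    // Create the event type if unknown, cache the new value and its publisher, and
    // return the clients that should be sent an event update message.
    std::vector<int> publish(const std::string& eventType, const std::string& value,
                             int publisher) {
        Entry& e = events_[eventType];
        e.value = value;
        e.lastPublisher = publisher;
        return std::vector<int>(e.subscribers.begin(), e.subscribers.end());
    }

private:
    struct Entry {
        std::optional<std::string> value;        // empty until the first publish
        int lastPublisher = -1;                  // most recent publisher's index
        std::set<int> subscribers;               // only these receive updates
    };
    std::map<std::string, Entry> events_;
};
```

Replaying the interaction of Figure 7.17 against this model: publishing “E” = “27” returns no recipients, a subsequent subscribe returns the cached “27”, publishing “33” returns the consumer, and after the consumer unsubscribes, publishing “47” again returns no recipients. Subscribing to a type that has never been published simply returns an empty value rather than an error.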

7.4.3 Architecture and Code Structure

The ENS and DS fall into the category of common services discussed in Chapter 6. The combined use of these services facilitates full decoupling between application components, as shown in the interaction diagram in Figure 7.17; the application components do not communicate directly.


FIGURE 7.17 Sequence diagram representation of ENS interaction.

Figure 7.17 shows a typical interaction involving the ENS, the DS, and a pair of application components: one publisher and one consumer. In this sequence, the ENS first registers with the DS. The publisher component is then started, obtains the address details of the ENS from the DS, and uses this information to connect to the ENS. The publisher then publishes a new event type “E” with value “27.” The consumer component is then started; it too obtains the address of the ENS from the DS and establishes a connection with the ENS. The consumer subscribes to event type “E,” and the ENS sends back its cached value for this event type, which is “27.” The publisher then publishes a new value of “33” for event type “E.” The ENS responds by pushing the new value for the event type to the consumer. The consumer now unsubscribes from event type “E.” Subsequently, the publisher publishes a new value of “47” for event type “E.” The ENS does not push the new value for the event type to the consumer.

7.4.4 Separation of Concerns

The ENS is designed to support distributed applications that consist of two categories of components (publishers and consumers); both categories are seen as clients from the ENS system perspective. Within the application itself, publishers may effectively be the server side and consumers effectively the client side. These have quite specific application-dependent behavior with respect to a particular type of event, and in this regard, the functional and behavioral concerns are naturally separated. However, the design can be more complex when one component is a publisher of some event types and a consumer of others, or where a component consumes an event, modifies the underlying data values, and then publishes a new version of the same event. The ENS is stateless with respect to the ordering of interactions; that is, its design assumes that publish and subscribe activities are independent, asynchronous, and unordered. It is possible that there are multiple subscribers to a particular event type; this situation is not problematic. However, where a particular consumer subscribes to multiple event types, the relative ordering of updates may be an issue in some applications (and should be resolved in the application logic). Where there are multiple publishers of a specific event type, application-specific restrictions may be needed, for example, creating variations of event types with additional naming information, which differentiates between them without creating a dependency on a specific publisher component. For example, one publisher may publish an event type temperature_01, which could indicate a temperature value with an accuracy of 1°, while another publisher could publish an event type temperature_03, which could indicate a temperature value with an accuracy of 3°. A consumer could subscribe to both event types but use the value of temperature_01 in preference if it is available.
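The temperature_01/temperature_03 convention described above leaves the choice of value to the consumer's application logic. A minimal sketch, assuming the consumer keeps the latest value received for each subscribed event type in a map (the type alias and function name are hypothetical):

```cpp
#include <initializer_list>
#include <map>
#include <optional>
#include <string>

// Hypothetical consumer-side cache: event type name -> latest value received.
using LatestValues = std::map<std::string, std::string>;

// Prefer the 1-degree-accuracy reading (temperature_01), fall back to the
// 3-degree reading (temperature_03), and report nothing if neither has arrived.
std::optional<std::string> bestTemperature(const LatestValues& latest) {
    for (const char* type : {"temperature_01", "temperature_03"}) {
        auto it = latest.find(type);
        if (it != latest.end()) return it->second;
    }
    return std::nullopt;
}
```

Because the preference lives entirely in the consumer, either publisher can be started, stopped, or replaced without any change to the other publisher or to the ENS.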

7.4.5 Coupling and Binding Between Components

The coupling between the ENS server and its clients (an application's publisher or consumer components) is loose (because the DS is used to find the address details of the ENS server dynamically at runtime) and direct (because once the ENS server address details are known, a TCP connection is established directly between the client component and the ENS server).

The coupling between the application components themselves, that is, the coupling between the event publishers and event consumers, is loose (because the ENS provides a logical connectivity based on event types and not addresses) and indirect (because there is no direct connection or messaging between these components; all communication is brokered by the ENS server).

The avoidance of tight coupling ensures that the entire system comprising the ENS service and the applications that use it can be flexibly configured at runtime, and no components need any prior knowledge of the location of others. This also contributes to robustness, since publisher components can be substituted without consumers being affected, and consumers can be added or removed without publishers being affected.

7.4.5.1 Application-Level Coupling

Application components are logically coupled solely by the event relationships they share. An event type is dynamically created whenever a publisher publishes a new event type or a consumer subscribes to a new event type not already in the ENS database. This aspect is totally runtime-configured.

Application components therefore have no direct dependencies on other components; a publisher can publish a new event type regardless of whether there is any consumer for that event type, and similarly, a consumer can subscribe to an event type regardless of whether a publisher for that event type is present.

This lack of direct dependency is important for scalability and also extensibility because new features of applications, or even new applications, can be supported by the ENS without it having to know what event types exist a priori. The complete application-level decoupling allows publish and subscribe actions to occur asynchronously and in any order, for example, it is not an error to subscribe to an event type that has not yet been published. The loose coupling achieved is illustrated in Figure 7.18.


FIGURE 7.18 Loose coupling of application components, facilitated by the ENS.

Figure 7.18 shows how the ENS decouples application components such that they do not have any direct dependencies and do not communicate directly. This allows systems to scale up without having to redesign components. For example, new consumers can be added for a particular event type without having to modify the publisher component, which generates the events. The lack of direct dependencies is also very powerful in terms of fault tolerance; a consumer can fail without the publisher needing to know about it. A publisher can fail without crashing the consumer (although the events will no longer be published by the particular failed publisher, another publisher may be started or the consumer will wait but will meanwhile still be capable of responding to other event types it has subscribed to).

Figure 7.18 also shows the separation of concerns at two levels. All components shown below the ENS in the figure are clients from the point of view of the ENS service. However, when viewed from the perspective of specific applications, the components on the left are essentially servers and those on the right are essentially clients.

7.4.6 Communication Aspects of the Design

Figure 7.19 shows the port assignments used in the ENS and the DS.


FIGURE 7.19 ENS and DS ports.

All components must interact with the DS interface using the ports shown in Figure 7.19. However, the client components do not need to know the ENS port number a priori; the purpose of using the DS is to allow the clients to find the ENS address and port details at runtime.

The ENS uses a combination of UDP and TCP transport layer communications. UDP is used for communication with the DS. UDP broadcasts are used to send register messages to the DS (in the case of the ENS server) and to send resolve request messages to the DS (in the case of clients of the ENS, in order to obtain the address of the ENS server). The DS responds to resolve requests with a directed UDP message to the specific client component. TCP is used for communication between the ENS server and its clients (the publishers and consumers of events). Each client, whether publisher or consumer, establishes a dedicated TCP connection with the ENS server, over which the ENS application messages are passed. There are four ENS message types, which are discussed later. Figure 7.20 maps out the ENS connectivity in the form of a socket-level view of the communication.
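At the socket level, a component that sends a register or resolve request as a UDP broadcast must first enable broadcasting on its socket. The following POSIX/C++ sketch shows this step; the DS port (given in Figure 7.19) and the request text (whose format is given in Figure 7.27) are passed in as parameters rather than assumed, and the function names are hypothetical. The case study's own components may use a different socket API (for example, Winsock on Windows).

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string>

// Open a UDP socket that is permitted to send broadcasts; returns -1 on failure.
int openBroadcastSocket() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    int enable = 1;
    // Without SO_BROADCAST, sendto() to a broadcast address fails (EACCES on Linux).
    if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &enable, sizeof(enable)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Broadcast an ASCIIZ request (including its terminating '\0') to dsPort.
bool broadcastRequest(int fd, const std::string& request, unsigned short dsPort) {
    sockaddr_in dest{};
    dest.sin_family = AF_INET;
    dest.sin_port = htons(dsPort);
    dest.sin_addr.s_addr = htonl(INADDR_BROADCAST);   // 255.255.255.255
    const ssize_t sent = sendto(fd, request.c_str(), request.size() + 1, 0,
                                reinterpret_cast<const sockaddr*>(&dest), sizeof(dest));
    return sent == static_cast<ssize_t>(request.size() + 1);
}
```

After broadcasting a resolve request, a client would call recvfrom() on the same socket to collect the DS's directed reply, and then open a separate TCP socket to the ENS address it contains.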


FIGURE 7.20 Socket-level view of ENS communication.

Figure 7.20 provides a socket-level view of the communication that occurs between the various components. The diagram illustrates several important points:

• The ENS is a service that makes use of another service (the DS) to facilitate connectivity. This is a useful example of the use of common services as building blocks to achieve higher functionality while at the same time limiting the complexity of components. In this case, the clients of the ENS can automatically locate the ENS server and connect to it at runtime. Communication with the DS is implemented using UDP because it supports broadcasts.

• The ENS is a form of middleware. The user-level applications comprise the application event publisher components and the application event consumer components. The ENS is not actually part of the user-level applications; its role is to provide a loosely coupled connectivity between the application-level components such that they never communicate directly. This facilitates runtime configuration and leads to flexible and robust outcomes.

• The ENS supports multiple concurrent TCP connections with clients. These can be either consumers or publishers (in fact, a single component can play both roles, as it can subscribe to some event types and also publish some event types). The ENS therefore maintains an array of all client sockets (and hence, connections); it does not differentiate between the roles of clients at the connection level.

Figure 7.21 shows the TCP connection structure used in the ENS. An array of these structures is maintained. The bInUse flag is set to true for entries in the array that relate to currently connected client components.


FIGURE 7.21 The ENS TCP connection structure.

Figure 7.22 shows the structure that the ENS uses to hold details of event types. The ENS maintains an array of these structures.


FIGURE 7.22 The Event_Type structure.

For each event type, there can be many subscribers, and it cannot be predicted how popular the various event types are. Using a large array to hold the subscriber details for each event type would be inefficient in cases where there are only a small number of actual subscribers, and no matter how large the array is, there is no certainty that it is large enough in all cases. Therefore, a linked list of subscribers is attached to each event type (as can be seen in Figure 7.22); this can grow and shrink dynamically as necessary. Figure 7.23 shows the format of a subscriber linked list entry. Note that the linked list entry contains only a single piece of data relating to the subscriber: the index into the connection array. This approach is efficient because it is concise and avoids duplication of data.


FIGURE 7.23 The subscriber linked list entry structure.
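The data structures in Figures 7.21 to 7.23 can be sketched as follows. Apart from bInUse and Event_Type, which appear in the text, the field names, the array capacity, and the helper functions are assumptions for illustration, and std::array and std::string stand in for whatever the original C++ code uses.

```cpp
#include <array>
#include <cstddef>
#include <string>

constexpr int kMaxConnections = 16;        // assumed capacity of the connection array

// Cf. Figure 7.21: one slot per potential client connection.
struct Connection {
    bool bInUse = false;                   // true while a client occupies this slot
    int socketFd = -1;                     // the connected TCP socket (hypothetical field)
};
using ConnectionArray = std::array<Connection, kMaxConnections>;

// Cf. Figure 7.23: an entry holds only the index into the connection array.
struct SubscriberEntry {
    int connectionIndex;
    SubscriberEntry* next;
};

// Cf. Figure 7.22: an event type with its own dynamically sized subscriber list.
struct Event_Type {
    std::string name;
    std::string value;
    SubscriberEntry* subscribers = nullptr;   // head of a singly linked list
};

// Return the first free slot in the connection array, or -1 if all are in use.
int findFreeSlot(const ConnectionArray& conns) {
    for (int i = 0; i < kMaxConnections; ++i)
        if (!conns[i].bInUse) return i;
    return -1;
}

// Prepend a new entry: O(1), and the list grows only as subscribers arrive.
void addSubscriber(Event_Type& ev, int connectionIndex) {
    ev.subscribers = new SubscriberEntry{connectionIndex, ev.subscribers};
}

// Unlink and free the entry for connectionIndex; returns true if one was removed.
bool removeSubscriber(Event_Type& ev, int connectionIndex) {
    for (SubscriberEntry** link = &ev.subscribers; *link; link = &(*link)->next) {
        if ((*link)->connectionIndex == connectionIndex) {
            SubscriberEntry* doomed = *link;
            *link = doomed->next;
            delete doomed;
            return true;
        }
    }
    return false;
}

std::size_t countSubscribers(const Event_Type& ev) {
    std::size_t n = 0;
    for (const SubscriberEntry* p = ev.subscribers; p; p = p->next) ++n;
    return n;
}
```

With this layout, when a client disconnects, a server would also need to remove that connection index from every event type's subscriber list before clearing bInUse, so that a reused slot does not inherit stale subscriptions.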

7.4.6.1 ENS Message Types

The ENS application interface comprises four message types, which are defined in the form of an enumerated type in all of the components, regardless of the component's role and the language it is developed in (see Figure 7.24). This is a key aspect of achieving universal interoperability.


FIGURE 7.24 The ENS message type codes enumeration.

The various ENS message types share a single PDU format, shown in Figure 7.25, which simplifies design and development. Some of the fields are not used in certain message types, which represents a slight inefficiency (e.g., the cEventValue field is not valid in subscribe messages), but this is not considered significant enough to warrant a more complex design in which message serialization and deserialization would depend on the message type.


FIGURE 7.25 The ENS PDU.

Figure 7.25 shows the ENS PDU (the representation as defined in C++). This PDU format is an invariant across all the component implementations, although the structure cannot be defined in exactly the same way in each of the three languages that were used in the example application. The equivalent flattened byte array format that is achieved when this structure is serialized is shown in Figure 7.26. This forms a linear buffer of 117 bytes and is the format in which the message is actually transmitted within the TCP segment and thus is the format that must be achieved when serializing from the internal representations in the other languages.


FIGURE 7.26 The flattened byte array representation of the ENS PDU.
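The following C++ sketch shows the general technique of flattening a message into a fixed-length, zero-padded byte array and rebuilding it on receipt. The message type codes and the field widths (1 + 50 + 50 + 16 bytes) are assumptions chosen only so that the buffer totals 117 bytes; the authoritative layout is the one in Figures 7.25 and 7.26, and the fourth field is shown generically as sender.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical codes for the four ENS message types (the real enumeration is in Figure 7.24).
enum class EnsMessageType : std::uint8_t { Subscribe = 0, Unsubscribe = 1, Publish = 2, EventUpdate = 3 };

// Assumed field widths, chosen so that the flattened PDU is 117 bytes long.
constexpr std::size_t kTypeLen = 1, kEventTypeLen = 50, kEventValueLen = 50, kSenderLen = 16;
constexpr std::size_t kPduLen = kTypeLen + kEventTypeLen + kEventValueLen + kSenderLen;
static_assert(kPduLen == 117, "flattened ENS PDU is 117 bytes");
using EnsPdu = std::array<std::uint8_t, kPduLen>;

struct EnsMessage {
    EnsMessageType type;
    std::string eventType;
    std::string eventValue;   // ignored for subscribe/unsubscribe messages
    std::string sender;       // hypothetical fourth field
};

// Copy s into a zero-padded ASCIIZ field, truncating so a '\0' always fits.
void putField(std::uint8_t* dst, std::size_t width, const std::string& s) {
    std::memset(dst, 0, width);
    std::memcpy(dst, s.data(), std::min(s.size(), width - 1));
}

// Read an ASCIIZ field, stopping at the first '\0' or at the field boundary.
std::string getField(const std::uint8_t* src, std::size_t width) {
    const char* p = reinterpret_cast<const char*>(src);
    return std::string(p, std::find(p, p + width, '\0'));
}

EnsPdu serialize(const EnsMessage& m) {
    EnsPdu buf{};
    std::uint8_t* p = buf.data();
    *p = static_cast<std::uint8_t>(m.type);       p += kTypeLen;
    putField(p, kEventTypeLen, m.eventType);      p += kEventTypeLen;
    putField(p, kEventValueLen, m.eventValue);    p += kEventValueLen;
    putField(p, kSenderLen, m.sender);
    return buf;
}

EnsMessage deserialize(const EnsPdu& buf) {
    const std::uint8_t* p = buf.data();
    EnsMessage m;
    m.type = static_cast<EnsMessageType>(*p);     p += kTypeLen;
    m.eventType = getField(p, kEventTypeLen);     p += kEventTypeLen;
    m.eventValue = getField(p, kEventValueLen);   p += kEventValueLen;
    m.sender = getField(p, kSenderLen);
    return m;
}
```

Encoding the type as a single byte sidesteps byte-order differences between platforms; a multibyte integer field would need an agreed byte order (for example, network byte order via htonl/ntohl) in every language implementation.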

7.4.6.2 ENS Communication Semantics

Communication with the ENS operates on a one-way basis with regard to each of the four message types. No acknowledgment is necessary at the application protocol level because the reliable TCP transport layer protocol is used, which automatically takes care of retransmissions if a message is lost or corrupted. Conceptually, subscribe might be considered to operate on a request-reply basis (as it is answered by zero or more event update messages), but since these are asynchronous, and may never occur, it is more accurate to consider the subscribe and update messages as separate activities from a communication semantics viewpoint.
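Because TCP delivers a byte stream rather than discrete messages, a single recv() call is not guaranteed to return a whole 117-byte PDU; it may return only part of one. A receiver therefore typically loops until the complete fixed-length PDU has arrived. A minimal POSIX sketch (the function name is hypothetical):

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <cerrno>
#include <cstddef>

// Read exactly len bytes from a connected stream socket.
// Returns true on success, false if the peer closed the connection or an error occurred.
bool recvAll(int fd, void* buf, std::size_t len) {
    char* p = static_cast<char*>(buf);
    std::size_t got = 0;
    while (got < len) {
        const ssize_t n = recv(fd, p + got, len - got, 0);
        if (n == 0) return false;                 // orderly shutdown by the peer
        if (n < 0) {
            if (errno == EINTR) continue;         // interrupted by a signal: retry
            return false;
        }
        got += static_cast<std::size_t>(n);
    }
    return true;
}
```

Since every ENS message has the same fixed length, the receiver never needs a length prefix: it reads one full PDU, deserializes it, and repeats.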

7.4.6.3 DS Message Types

The DS has a simple interface comprising two request messages, which are each encoded as a string of 8-bit ASCII character values, in ASCIIZ format (i.e., terminated with a “\0” character) and a single reply PDU format. See Figures 7.27 and 7.28.


FIGURE 7.27 The DS request message formats with ENS usage examples.


FIGURE 7.28 The DS reply message PDU.

Figure 7.27 shows the two DS request message types. Note that in the case of register request messages, the DS automatically determines the IP address of the sender from the recvfrom socket API call, which extracts this information from the header of the received packet (the source address field of the IP header). Hence, there is no need to explicitly encode the address in the register request message; its omission represents a slight efficiency improvement in terms of processing time and communication bandwidth usage. The port must be stated explicitly, however, because it cannot be assumed that the registering service wants to receive application messages on the same port that it used to send the register request to the DS.
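A DS-side sketch of how the sender's address is obtained: recvfrom() fills in a sockaddr_in with the datagram's source address, which is then converted to dotted-decimal text. This is a POSIX/C++ illustration, not the Distributed Systems Workbench code; the structure and function names and the 512-byte buffer size are assumptions.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <string>

// Hypothetical record of one received register request.
struct RegisterRequest {
    std::string text;       // ASCIIZ request body (format as in Figure 7.27)
    std::string senderIp;   // taken from the datagram's source address, not the message
};

// Receive one datagram and record who sent it.
bool receiveRegister(int fd, RegisterRequest& out) {
    char buf[512];
    sockaddr_in from{};
    socklen_t fromLen = sizeof(from);
    const ssize_t n = recvfrom(fd, buf, sizeof(buf) - 1, 0,
                               reinterpret_cast<sockaddr*>(&from), &fromLen);
    if (n < 0) return false;
    buf[n] = '\0';                                   // guarantee termination
    out.text = buf;                                  // stops at the first '\0'
    char ip[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &from.sin_addr, ip, sizeof(ip)) == nullptr) return false;
    out.senderIp = ip;
    return true;
}
```

The port, by contrast, has to be parsed from the request text itself, because the source port that recvfrom() reports is only the port the request was sent from.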

Figure 7.28 shows the DS reply message PDU, which contains the IP address and port number of the requested service.

7.4.6.4 DS Communication Semantics

The DS has been designed to be lightweight with respect to communication overhead and also to operate with low latency. As such, it has a simple communication interface as discussed above and also simple communication semantics. DS register messages operate on a one-way basis without acknowledgment. DS resolve messages operate on a request-reply basis: When the requested service name is matched by the DS, an appropriately populated DirectoryServiceReply PDU is returned (see Figure 7.28); if the requested service name is not found in the DS database, the DS returns a PDU containing the address:port values 0.0.0.0:0.
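The resolve semantics can be modeled with a simple name table. In this C++ sketch, the reply structure's fields and the helper names are hypothetical (the real PDU is shown in Figure 7.28). The point illustrated is that an unknown name produces 0.0.0.0:0, which a client must check for before attempting a TCP connection.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory form of the DS reply.
struct DirectoryServiceReply {
    std::uint32_t ipAddress;   // IPv4 address (host byte order in this sketch); 0 = 0.0.0.0
    std::uint16_t port;        // 0 when the name is unknown
};

class Directory {
public:
    // Register (or refresh) a service; a repeated registration overwrites the entry.
    void registerService(const std::string& name, std::uint32_t ip, std::uint16_t port) {
        table_[name] = {ip, port};
    }
    // Resolve a name; an unknown name yields 0.0.0.0:0 rather than an error message.
    DirectoryServiceReply resolve(const std::string& name) const {
        auto it = table_.find(name);
        return it != table_.end() ? it->second : DirectoryServiceReply{0, 0};
    }

private:
    std::map<std::string, DirectoryServiceReply> table_;
};

// Client-side check: treat 0.0.0.0:0 as "service not (yet) registered".
bool isResolved(const DirectoryServiceReply& r) { return r.ipAddress != 0 || r.port != 0; }
```

Signaling "not found" with an ordinary reply keeps resolve a strict one-request, one-reply exchange, so the client never has to distinguish between several reply formats.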

7.4.7 An Application Use Scenario to Illustrate the Use of the ENS

An application that comprises temperature and humidity monitoring components (which are effectively application servers) and display components (which are effectively application clients) has been created to illustrate the use and behavior of the ENS in a realistic application setting. The environmental monitoring components need to send messages containing temperature and humidity values to the display components. The application has been designed to make use of the ENS to achieve a decoupled and runtime configurable architecture. In this scheme, the application components never communicate directly with each other; the ENS acts as a message broker, based on event types. The environmental monitoring components publish events (which are the values of temperature or humidity when they change). The display components are said to be consumers of these events. They each register their interest in certain event types by sending a subscribe message to the ENS. Thereafter, the ENS sends event update messages to each of the consumers, when new values are published.

To illustrate interoperability across components written in different programming languages, the environment monitoring application has been developed with publisher component variants and consumer component variants developed in three different languages (C++, C#, and Java).

A use scenario in which one of the publisher variants (the C# one) publishes events that are consumed by all three consumer components is used to illustrate the operation of the ENS and the transparency it provides. See the following four figures in which the activity of this scenario is broken down into four stages.

Figure 7.29 shows the first stage of activity in the environment monitoring application use scenario.1 The sequence of actions in this stage occurs in five steps: (1) the DS is started if not already running (the DS is an application provided by the Distributed Systems Workbench); (2) the ENS server registers its name, IP address, and port number with the DS; (3) the ENS client components (which are the environment monitoring application event publishers and event consumers) send resolve requests to the DS, containing the name of the ENS server, “Event_NS”; (4) the DS replies to each request with the IP address and port number of the ENS server; and (5) each of the ENS clients then makes a TCP connect request based on the ENS server's address and port details.


FIGURE 7.29 Stage 1: Automatic binding and connectivity facilitated by the directory service.

Figure 7.30 shows the second stage of activity in the environment monitoring application use scenario, in which the consumer clients subscribe to the event types they are each interested in. Working down from the top on the right-hand side of the figure, the C#-based consumer client subscribes to “TemperatureChange” events, the C++-based consumer client subscribes to both “TemperatureChange” and “HumidityChange” events, and the Java-based consumer client subscribes to “HumidityChange” events. Note that the buttons on the user interfaces toggle between “Subscribe” and “Unsubscribe” actions, and their labeling is updated accordingly when pressed.


FIGURE 7.30 Stage 2: Consumer clients subscribe to event types they are interested in.

Note that in the specific application scenario sequence illustrated in this section, the three consumers all subscribe to event types before any events have been published. The publish and subscribe actions are, however, completely asynchronous, and the ENS does not enforce or expect any particular ordering. For example, one of the consumers could have delayed subscribing to the event types until some values had already been published; in that case, upon receiving the subscription message, the ENS server would immediately send the client the latest stored value of the respective event type.

Figure 7.31 shows the third stage of activity. The C#-based publisher publishes a series of four TemperatureChange events, each of which is pushed out to the relevant subscribers in the form of event update messages. Notice that only the consumer clients that have subscribed to the particular event type receive the updates, in this case, the C#-based consumer and the C++-based consumer.


FIGURE 7.31 Stage 3: Publisher client publishes a series of TemperatureChange events.

Figure 7.32 shows the fourth stage of activity. The C#-based publisher publishes a series of four HumidityChange events, each of which is pushed out to the relevant subscribers in the form of event update messages. Only the consumer clients that have subscribed to the HumidityChange event type receive the updates, in this case, the C++-based consumer and the Java-based consumer.


FIGURE 7.32 Stage 4: Publisher client publishes a series of HumidityChange events.

7.4.8 Testing (Table 7.2)

Table 7.2

The Test Plan and Outcomes for the ENS


The test plan and test outcomes are shown in Table 7.2.

7.4.9 Transparency Aspects of the ENS

Access transparency is provided because the application data are transmitted in a platform-independent and language-independent serialized byte stream format. All event type names and event values are represented in simple 8-bit ASCIIZ character string format, hence avoiding encoding anomalies that can arise when data formats such as integers and floating point numbers are used with different encodings and precisions in different systems. The character string approach also means that any event type name and value can be encoded and managed by the ENS in a totally application-agnostic manner.
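To illustrate the text-based representation, the sketch below encodes a numeric reading as a character string before it is placed in an event value field, and decodes it on the consumer side. The helper names are hypothetical. Note that std::to_string and std::strtod follow the current C locale, so a program that changes its locale could see a comma used as the decimal separator.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Encode a reading as text. The characters "21.500000" are the same bytes on every
// platform, whereas the raw bytes of a double or int depend on the sender's representation.
std::string encodeValue(double reading) {
    return std::to_string(reading);    // formatted as with "%f"
}

// Decode on the consumer side; returns nothing if the text is not entirely a number.
std::optional<double> decodeValue(const std::string& text) {
    char* end = nullptr;
    const double v = std::strtod(text.c_str(), &end);
    if (end == text.c_str() || *end != '\0') return std::nullopt;
    return v;
}
```

The ENS itself never calls either function: it stores and forwards the strings unchanged, which is what keeps it application-agnostic.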

Location transparency is provided by the fact that the DS is used to resolve the ENS address from its name when clients need to connect.

Migration transparency is achieved in the sense that the ENS server self-registers with the DS on a 10 s periodic basis, which means that it can be moved between computers, and any newly started client components will receive the new location details when they subsequently issue resolve requests to the DS.
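Periodic self-registration can be implemented as a simple loop on a background thread. In this C++ sketch, the function name and parameters are hypothetical; sendRegister stands for whatever code broadcasts the register request to the DS, and the 10 s period from the text is passed in as an argument so that it can be shortened for testing.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Call sendRegister() immediately and then once per period until stop becomes true.
void registrationLoop(const std::function<void()>& sendRegister,
                      std::chrono::milliseconds period,
                      const std::atomic<bool>& stop) {
    while (!stop.load()) {
        sendRegister();
        // Sleep in short slices so that a stop request is noticed promptly.
        const auto deadline = std::chrono::steady_clock::now() + period;
        while (!stop.load() && std::chrono::steady_clock::now() < deadline)
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
```

Run on its own std::thread with a period of std::chrono::seconds(10), such a loop keeps the DS entry fresh, so after a move to another computer, any newly started client resolves the new location.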

Implementation transparency. The use of the standard TCP communication protocol, in combination with the simple data representation as described above (see access transparency), means that the ENS supports clients operating on any platform and developed in any programming language. This does however require great care when performing the message serialization and deserialization actions within application code (see the different techniques used in each of the C++, C#, and Java examples included in the supporting materials).

The loose coupling between application components, facilitated by the ENS, contributes to failure transparency because there are no direct dependencies between components. One component can crash without causing others to crash, thus achieving fault tolerance. The extent to which this is actually transparent is application-dependent. For example, while a consumer can fail without affecting the behavior of the publisher components, the role of the consumer is not necessarily being performed elsewhere in the system. Similarly, if a publisher were to fail, the events it generates are no longer published; this will not cause the consumers to crash, but they may wait indefinitely for the missing events. The asynchronous nature of the event updates means that the consumer may still be capable of responding to other events it has subscribed to (this depends on the application-specific interpretation of events).

In addition, the operation of the ENS has been designed to be application-independent; that is, it can support any publish-subscribe-based application. This contributes to flexibility and future-proofing.

7.4.10 Case Study Resources

The ENS server has been developed in C++. The environment monitoring application has been developed with three publisher component variants (developed in C++, C#, and Java) and three consumer component variants (also developed in C++, C#, and Java).

The full source code and project files for the ENS server and all of the application components of the environment monitoring system (which are clients of the ENS) are provided as part of the support resources that accompany the book. The Distributed Systems Workbench (which provides the DS necessary to run the ENS system) is also provided as part of the supporting resource materials.

7.5 Good Design Practice for Distributed Applications

This section summarizes some main design concepts that have been identified in the book and relates them to the case studies where relevant. Guidelines are provided which identify the key considerations and decisions that must be made when designing a distributed application, accompanied by a discussion of the significance of the various issues.

7.5.1 Requirements Analysis

Get the requirements analysis right. Take great care to ensure that you understand what is required, what features are needed, and how it should work. You should also consider the feasibility and form initial plans of how you are going to build and test the application. If you unknowingly make mistakes at this stage, they will propagate through the design and build. Many systems effectively fail at this stage, but the failure does not become apparent until much later, by which point it has cost time and other resources and is more difficult to put right.

Requirements fall into two categories: functional and nonfunctional. Both are equally important, but the functional requirements are easier to identify and usually a lot easier to test. The nonfunctional requirements can be overlooked, or sometimes the intention is to add them later. However, in general, it is not possible to add support for nonfunctional requirements effectively after the fact, because by their nature they are not provided by a specific software function (consider, e.g., transparency and scalability). Instead, they result from good design practice across all aspects of the functional behavior, the underlying software architectures, and the mechanistic aspects. Therefore, it is important to make the nonfunctional requirements a first-class concern at every stage of the project.

7.5.2 Architectural Aspects

Pay attention to separating concerns on a logical basis. This applies at the level of classes within the same software component as well as at the level of software components, which execute as separate processes. In a client-server application, for example, pay attention to the split of functionality between the client and the server; consider the distributed game use case, in which the server manages the game logic and holds the game state, and the client provides the user interface. Separation of concerns leads to a number of benefits, including a cleaner design (especially in terms of interfaces), less interaction across interfaces, and code structure and behavior that are easier to understand (and thus easier to test, validate, and document). In particular, clear separation of concerns can reduce the intensity of interprocess communication, so less network bandwidth is used and less latency is incurred. Separating the logical or behavioral scope of components on clear functional boundaries can also reduce the occurrence of direct dependencies, as components are less tightly integrated than when a particular function is spread across several components.
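As a sketch of the server-side half of such a split, the class below holds the game state and enforces the rules but contains no user interface or networking code, so it can be unit tested in isolation. Tic-tac-toe is used here as a hypothetical stand-in game; the `GameLogic` class is illustrative and is not the book's case study code.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Hypothetical server-side game logic for a two-player tic-tac-toe game.
// It holds the game state and enforces the rules but performs no I/O:
// the client (user interface) and the network layer are separate concerns.
class GameLogic {
public:
    enum class Cell { Empty, X, O };

    // Attempt a move for the player whose turn it is; returns false if illegal.
    bool makeMove(int row, int col) {
        if (row < 0 || row > 2 || col < 0 || col > 2) return false;
        if (winner() || board_[row * 3 + col] != Cell::Empty) return false;
        board_[row * 3 + col] = current_;
        current_ = (current_ == Cell::X) ? Cell::O : Cell::X;
        return true;
    }

    // Return the winning player's mark if any line of three is complete.
    std::optional<Cell> winner() const {
        static const int lines[8][3] = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {0, 3, 6},
                                        {1, 4, 7}, {2, 5, 8}, {0, 4, 8}, {2, 4, 6}};
        for (const auto& l : lines) {
            Cell c = board_[l[0]];
            if (c != Cell::Empty && c == board_[l[1]] && c == board_[l[2]]) return c;
        }
        return std::nullopt;
    }

private:
    std::array<Cell, 9> board_{};  // value-initialized: every cell is Cell::Empty
    Cell current_ = Cell::X;       // X moves first
};
```

A client only needs to send move requests and display the resulting state, which keeps the messages between client and server small and infrequent.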

Avoid direct coupling wherever possible. Direct coupling risks fault propagation, where one component cannot operate because of the failure or absence of another component. Direct coupling also projects a design-time vision of connectivity (which cannot in general foresee the full set of configuration scenarios that can occur) into the runtime. Loose coupling is more flexible and robust; it allows the connectivity associations between components to be formed at runtime, based on actual availability and other contextual factors such as system workload, and it allows resources to be reconfigured to overcome platform-level failures.
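The difference between design-time and runtime connectivity can be made concrete with a small selection routine: rather than hard-coding one server address, a client chooses among whichever instances are actually available when it needs to connect. This is a minimal sketch; the `ServiceInstance` record, the load metric, and how the candidate list is obtained (for example, from a directory service) are all assumptions for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of one running service instance, as it might be
// reported at runtime (e.g., by a directory service).
struct ServiceInstance {
    std::string address;  // e.g. "10.0.0.3:8000"
    bool available;       // currently reachable
    double load;          // workload: 0.0 (idle) to 1.0 (saturated)
};

// Late binding: choose the least-loaded available instance at the moment the
// connection is needed, instead of fixing one address at design time.
std::optional<ServiceInstance> chooseInstance(const std::vector<ServiceInstance>& candidates) {
    std::optional<ServiceInstance> best;
    for (const auto& c : candidates)
        if (c.available && (!best || c.load < best->load)) best = c;
    return best;  // empty if no instance is currently available
}
```

If the preferred instance fails, the next call simply selects another one, so the client is not tied to the fate of a single server.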

The components should be right-sized. Coarse-grained designs, having larger components, can reduce the total number of components and thus also the amount of communication necessary to link the application together. However, if this approach is taken too far, it reduces the benefits of distribution: the ability to separate functionalities, spread load across processing resources, facilitate sharing, and meet nonfunctional requirements such as availability, robustness, and scalability. Larger components are also more complex and therefore more challenging to test. Fine-grained systems with many small components are more flexible, and the individual components are easier to understand and test. However, if the system is too fine-grained, there is a risk of excessive connectivity and higher communication overhead. The configuration complexity of a system increases as the number of communicating components rises, and this is traded off against the reduced complexity of the individual components. Therefore, it is very important to reach a balance in which components are right-sized for the particular application in terms of their functionality and connectivity requirements.
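As a rough, worst-case illustration of how connectivity grows, assume every component may need to communicate directly with every other component (a pattern good designs try to avoid). The number of potential pairwise connections among n components is then:

```latex
L(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad
L(4) = \frac{4 \cdot 3}{2} = 6, \qquad
L(16) = \frac{16 \cdot 15}{2} = 120
```

Under this assumption, a fourfold increase in the number of components produces a twentyfold increase in potential connections, which is why very fine-grained designs can quickly become costly to configure and operate.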

Use common services to provide generic functionality and thus keep application components as simple as possible. Application components should focus mainly on the business logic and, where possible, should not provide additional services such as resource location. The use of common services permits standardized behavior across applications and entire systems. Common services can themselves be used in modular ways, as, for example, in the second case study in this chapter, in which the ENS makes use of an external DS. It would have been possible to build the DS functionality into the ENS or to use some form of service advertisement broadcast mechanism. However, this would add extra complexity to the ENS and possibly detract from the development of its core functionality. In most systems, other applications and services are likely to require the DS functionality too, in which case there is effectively no additional cost of providing the DS (i.e., it is not for the exclusive use of the ENS).
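The core idea of a directory service is a mapping from logical service names to network addresses: services register under a name, and clients resolve the name at runtime. The in-process sketch below shows only that mapping; the `DirectoryService` class, its method names, and the address format are hypothetical and do not reflect the protocol of the DS supplied with the Distributed Systems Workbench.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical, in-process sketch of the name-to-address mapping a directory
// service (DS) provides. Services register under a logical name; clients
// resolve the name at runtime instead of embedding addresses in their code.
class DirectoryService {
public:
    void registerService(const std::string& name, const std::string& address) {
        entries_[name] = address;
    }

    void deregisterService(const std::string& name) { entries_.erase(name); }

    // Returns the registered address, or no value if the name is unknown.
    std::optional<std::string> resolve(const std::string& name) const {
        auto it = entries_.find(name);
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, std::string> entries_;
};
```

Because the lookup is generic, the same service can be shared by the ENS, its clients, and any other component in the system.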

7.5.3 Communication Aspects

Communication is fundamental to all distributed applications. It is also a cause or factor in many different classes of problems relating to performance, latency, resource limitations, and the occurrence of errors. In general, it is best to minimize the frequency of communication and also to keep the actual messages as short as possible, based on careful analysis of application requirements. Messages must be serialized and transmitted in language-independent and architecture-independent formats to ensure consistency and correctness in heterogeneous systems (see the Event Notification Service case study).
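One simple way to reduce communication frequency, where the application's accuracy requirements allow it, is to send a new reading only when it differs meaningfully from the last value sent. The sketch below shows such a "deadband" filter on the publisher side; the `DeadbandFilter` class and the threshold value are hypothetical, and whether this trade-off is acceptable must be decided from the application requirements.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical publisher-side filter that reduces message frequency: a reading
// is transmitted only if it differs from the last transmitted value by at
// least `threshold` (a "deadband").
class DeadbandFilter {
public:
    explicit DeadbandFilter(double threshold) : threshold_(threshold) {}

    // Returns true if this reading should be sent; remembers what was sent.
    bool shouldSend(double reading) {
        if (!lastSent_ || std::fabs(reading - *lastSent_) >= threshold_) {
            lastSent_ = reading;
            return true;
        }
        return false;
    }

private:
    double threshold_;
    std::optional<double> lastSent_;  // empty until the first reading is sent
};
```

With a threshold of 0.5, for example, small fluctuations in a temperature sensor produce no messages at all, while significant changes are still published promptly.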

Blocking IO mode communication is efficient because a process (or thread) does not use CPU resource while waiting for a message to arrive. However, it can lead to unresponsive behavior when processes wait indefinitely (see Chapter 3). This can be overcome by using multithreaded designs in which some threads remain responsive while others wait on the IO. Alternatively, single-threaded components can be designed to be responsive without the additional complexity of threads, by using nonblocking IO mode sockets and an appropriately chosen time-out (see, e.g., the technique used in the time service client case study). This level of detail can only sensibly be considered in the context of specific application requirements, but it is important to flag up this aspect as it can have a significant impact on efficiency. Situations in which both sides of a connection use blocking IO mode sockets and can end up waiting to receive from each other at the same time should be avoided because of the risk of communication deadlock (see Chapter 3).
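One common way to get time-out behavior in a single-threaded component is to wait for readability with `select()` before calling `recv()`. This is a swapped-in technique shown for illustration, not necessarily the one used in the time service client; the sketch assumes POSIX sockets (Winsock provides a similar `select()`), and the `waitForData` helper is hypothetical.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper (POSIX sockets assumed): wait up to timeoutMs for data
// to arrive on `sock`, so a single-threaded component regains control instead
// of blocking indefinitely in recv(). Note: sock must be below FD_SETSIZE.
// Returns 1 if data is ready to read, 0 on time-out, -1 on error.
int waitForData(int sock, int timeoutMs) {
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(sock, &readSet);

    timeval tv;
    tv.tv_sec = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;

    int rc = select(sock + 1, &readSet, nullptr, nullptr, &tv);
    if (rc < 0) return -1;                       // error (e.g., bad descriptor)
    return FD_ISSET(sock, &readSet) ? 1 : 0;     // ready, or timed out
}
```

On a time-out the caller can update its user interface, retransmit a request, or give up, and then call `waitForData` again; only when it returns 1 is `recv()` called, so that call will not block indefinitely.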

7.5.4 Reuse Code When Opportunities Arise

Code reuse has many benefits. Most obviously, you don't have to write so much code. However, it also has other benefits, which are perhaps more important, as follows:

• Better readability of code. Extracting classes and methods when opportunities arise has the effect of reducing the size of methods, or blocks of code, making them easier to understand. This has the additional benefit of making it more likely that certain errors or inefficiencies will be noticed at the time of code writing.

• Better structure of code. Refactoring improves the structure of code, and the avoidance of duplication (through extracting methods) makes code more maintainable as any modifications in the future only need to be applied in one place; this also prevents inconsistency creeping in through incremental updates where there are multiple sections of code where changes need to be synchronized.

• Reduced testing effort. The reuse of code blocks, which have already been unit tested, saves testing effort, both in terms of avoiding the need to write additional tests and also by not having to run those additional tests each time the full test suite is run.
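As a small illustration of extracting a method, suppose two message handlers each validated event type names with the same inline checks. Extracting the check into one function means it is written, fixed, and unit tested in one place. The validation rule and the handler names below are hypothetical, and the real registration and delivery logic is omitted.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Extracted, reusable check (hypothetical rule): 1 to 32 characters,
// each a letter, a digit, or '-'.
bool isValidEventType(const std::string& type) {
    if (type.empty() || type.size() > 32) return false;
    for (char ch : type)
        if (!(std::isalnum(static_cast<unsigned char>(ch)) || ch == '-')) return false;
    return true;
}

// Both handlers now share the single tested check instead of duplicating it.
std::string handleSubscribe(const std::string& type) {
    if (!isValidEventType(type)) return "ERROR: bad event type";
    return "SUBSCRIBED " + type;  // real registration omitted
}

std::string handlePublish(const std::string& type, const std::string& value) {
    if (!isValidEventType(type)) return "ERROR: bad event type";
    return "PUBLISHED " + type + "=" + value;  // real delivery omitted
}
```

If the naming rule changes later, only `isValidEventType` needs to be modified, and the two handlers cannot drift out of step.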

7.5.5 Create Libraries of Tested and Trusted Code

Code libraries are a means of achieving code reuse across different software components and projects.

Libraries are a means of preserving your effort so that it can pay off again in the future. For example, suppose you have spent significant effort developing a class that implements some complex numerical formulas specific to your company's line of business. Encoding these formulas turned out to be quite error-prone and required a disproportionate testing effort compared to more routine aspects of the code. This is a good example of a situation where placing the class in a library could be highly beneficial: it provides a means of encapsulating the test cases and the documentation needed to explain the complex formulas, and it has the potential to avoid a repetition of the hard effort should another application need the same formulas in the future.

Modularize code (and thus, the development process) where opportunities arise. This leads to a cleaner division of functionality (separation of concerns), better code structure, and also better documentation.

Create libraries wherever possible. Refactor code to separate different aspects of business functionality or to separate business functionality from UI functionality as a precursor step to splitting off a library. See the example in Chapter 5 and also the time service client case study.
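In the spirit of the time service case study, the sketch below shows calculation code separated from any socket or UI code so that it can be placed in a library and tested once. The `ntpcalc` namespace and function names are hypothetical and are not the API of the NTP library in the supporting materials; the formulas themselves are the standard NTP on-wire offset and delay calculations, and the epoch constant is the 70-year difference between 1 January 1900 (NTP) and 1 January 1970 (Unix).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical library of pure time calculations: no sockets, no UI,
// so it can be unit tested once and reused by any NTP client.
namespace ntpcalc {

// Seconds between the NTP epoch (1 Jan 1900) and the Unix epoch (1 Jan 1970).
constexpr std::int64_t kNtpToUnixOffset = 2208988800LL;

// Convert the seconds field of an NTP timestamp to Unix time (NTP era 0 only).
constexpr std::int64_t ntpSecondsToUnix(std::uint32_t ntpSeconds) {
    return static_cast<std::int64_t>(ntpSeconds) - kNtpToUnixOffset;
}

// Standard NTP on-wire calculations, all times in seconds:
// t1 = client send, t2 = server receive, t3 = server send, t4 = client receive.
constexpr double clockOffset(double t1, double t2, double t3, double t4) {
    return ((t2 - t1) + (t3 - t4)) / 2.0;
}

constexpr double roundTripDelay(double t1, double t2, double t3, double t4) {
    return (t4 - t1) - (t3 - t2);
}

}  // namespace ntpcalc
```

A console client and a GUI client can both call these functions and differ only in how they present the results, which is the kind of separation the text recommends before splitting off a library.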

7.5.6 Testing Aspects

As the earlier chapters have illustrated and emphasized, distributed systems represent complex environments in which many components interact with unpredictable timing artifacts and the ever-present possibilities of hardware failures, message loss, and software component crashes. This provides a challenging backdrop for the design and development of distributed applications and places high importance on testing.
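Because real message loss is unpredictable, it helps to test loss-handling code against a deterministic stand-in for the network. The sketch below assumes a simple retry policy; the `sendWithRetry` function and the `LossyChannel` fault-injecting stub are hypothetical, and the stub drops a fixed number of messages so that each test run behaves identically.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical retry helper under test: calls trySend up to maxAttempts times.
// Returns the number of attempts used, or 0 if every attempt failed.
int sendWithRetry(const std::function<bool(const std::string&)>& trySend,
                  const std::string& msg, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt)
        if (trySend(msg)) return attempt;
    return 0;
}

// Deterministic fault-injecting stand-in for the network: loses the first
// dropCount messages, then delivers every message after that.
struct LossyChannel {
    int dropCount;
    int sent = 0;
    bool operator()(const std::string&) { return ++sent > dropCount; }
};
```

The same pattern extends to injecting delays, duplicates, or disconnections, allowing failure scenarios to be tested repeatedly during development rather than only observed by chance.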

Test plans are vital and should be established, to the extent possible, as part of the analysis and design stages of a project. However, it is very difficult to formally capture the full set of testing activities that an experienced developer carries out throughout the development life cycle.

Testing is often documented as a primarily postdevelopment step; this is partly due to the way in which the various development methodologies describe the software life cycle as a series of stages, which are typically as follows: analysis → design → development → testing.

Some methodologies such as the spiral model repeat these steps in a series of shorter cycles but essentially show testing at the end of each block of activity. While this sequence makes sense to a theorist (you cannot test something that you have not yet developed), it does not reflect what really needs to happen to ensure high-quality outcomes and efficient development progress.

Your perception of the role of testing and the way you apply it is linked to your mind-set with regard to building systems; it is about the extent to which you value having a quality outcome. You need to consider testing as a continual activity and to integrate testing throughout the entire development process.

As part of a quality-oriented mind-set, consider that every step of the development life cycle is an opportunity to make good or bad decisions. It can be difficult to elicit the full set of requirements of a system at the outset of a project and to express these in unambiguous ways. It may not be possible to capture every single nuance of behavior in the requirements statements; it may be necessary to retain some flexibility in their interpretation and to ensure that the consequences of any subsequent changes are considered across all levels of the design.

You should check that requirements do not conflict with one another such that they cannot all be satisfied; if such a situation occurs, it must be resolved before moving on to design. You can evaluate the correctness and feasibility of the requirements by asking the would-be users of the system carefully thought-out questions. You can also consider building partially functioning early prototypes, both to check that your design concepts are well aligned with the requirements and to let users see the consequences of the stated requirements, so that the requirements can be revisited to ensure they correctly reflect and describe the desired system.

During development, you should test regularly. I have had many discussions with students about testing where their initial viewpoint has been that testing is somehow additional work. The reality is that if testing is performed on a continuous basis, it actually saves time and effort because problems are found and their causes understood before they propagate through the design; undoing a bad design decision is usually a lot easier in the earlier stages of development.

I find that an incremental, iterative approach to testing, with the basic premise that only one feature is added or modified between tests, is most productive. On the face of it, this can seem a slow way to progress, but it leads to quality outcomes: it ensures that the effects of one change are understood before further additions are made, and it provides regular opportunities to reflect on the overall goals and progress toward them.

7.6 End of Chapter Programming Exercises

1. Integrate the NTP client-side functionality into an application. This programming challenge relates to the time service client use case.
The task. Build NTP client functionality into any application of your choice. You are recommended to use the NTP library provided as part of the support materials. Start by studying the program code of the two provided NTP application client programs, which integrate the library, and then mimic this library integration in your own application.
Note that the two application clients already discussed in the time service case study serve as sample answers for this exercise.

2. Develop a publish-subscribe application, which uses an external ENS as a means of decoupling the application components. You are recommended to use the ENS presented in the second case study in this chapter.
The event types and values will depend on the theme of the application you choose; for example, in a distributed game application, there may be event types such as new-player-arrival and new-game-started. Begin by studying the program code of the sample application publisher and consumer components provided as part of the supporting resources for the book. Consider developing your publisher in one language and your consumer in another, to experiment with the heterogeneity and interoperability aspects.
Note that the application clients already discussed in the ENS case study serve as sample answers for this exercise.

7.7 List of Accompanying Resources

The following resources are referred to directly in the chapter text and/or the end of chapter exercises:

• Distributed Systems Workbench, “systems programming” edition (specifically the DS application)

• Source code

• Executable code



1 For the purpose of simple presentation and annotation, all components were running on a single computer in the scenario illustrated.