Systems Programming: Designing and Developing Distributed Applications, FIRST EDITION (2016)

Chapter 3. The Communication View

Abstract

This chapter examines communication between components of distributed systems in terms of mechanisms and protocols, addressing, ports and binding, PDU formats, and message sequences, types and contents. The main focus is on the transport layer, and use of the sockets API to develop application-specific bespoke communication schemes. The TCP and UDP protocols are examined in depth.

Communication fundamentals are explained starting from a simple one-way protocol, progressing through the request-reply protocol to higher-level constructed forms of communication such as RPC and RMI. Quality-related attributes of communication, including reliability, latency, and efficiency, are woven into the discussion.

Also included are: the OSI and TCP/IP layered network models, IPv4, IPv6, network sockets, socket IO modes (blocking and non-blocking), addressing methodologies (unicast/multicast/broadcast/anycast), error detection and error correction techniques, building higher-level communication mechanisms using the transport-layer protocols as building blocks, and techniques to facilitate components locating each other.

The communication aspects of the case study distributed game are examined in detail, including the game-level communication protocol and the supporting data structures and state maintained in the game server, as well as message types, contents and sequences. End-of-chapter programming challenges are based on extending the case study application.

Keywords

TCP/IP network model

Request-reply communication

Unicast communication

Multicast communication

Broadcast communication

Remote Procedure Call (RPC)

Connectivity

ISO-OSI seven layer model

The IP protocol

The TCP protocol

The UDP protocol

Addressing

IP addresses

Ports

Sockets

Blocking and nonblocking IO modes

Binding

Application-specific protocols

Logical and physical views of systems

Protocols

3.1 Rationale and Overview

A key goal of distributed systems design is the achievement of transparency. At a high level, this is usually interpreted as hiding the underlying architecture, functional division into components, and communication between those components. Essentially, the goal is to make the distributed components of the system appear to the user as a single coherent whole. However, for the developer, the functional split and the communication between components are critically important aspects that impact overall quality and performance of the system and thus must be studied in detail.

This section looks at distributed systems from the communication viewpoint. This aspect of systems is important not only in terms of the reliability and quality of service achieved by applications but also in terms of the efficiency with which the finite communication bandwidth that must be shared by all applications in the system is used. Thus, it is vital that designers of systems have a clear understanding of communication concepts and of the features and limitations of the communications mechanisms that underpin the higher-level systems and are able to use communication protocols efficiently and effectively. The chapter starts with the configuration of the most basic form of communication between a pair of processes and progresses through to more complex derived forms including Remote Procedure Call and Remote Method Invocation.

3.2 The Communication View

3.2.1 Communication Basics

Communication in distributed applications takes place between a pair of processes, which can be located on the same or different computers.

We start by identifying the minimum requirements for communication to take place between a pair of processes, which we shall refer to as the sender and receiver, respectively.

A. The receiver process must be able to receive a message.

B. The sender process must be able to send a message; it also must have the actual message data stored in a buffer (this is a reserved block of memory).

C. The sender must also know (or be able to discover) the address of the receiver.

D. There must be a transmission system that both sender and receiver are connected to.

Consider the analogy of sending a letter to a friend.

As illustrated in Figure 3.1, for requirement (A), the receiver must have the means to receive the message; that is, there must be a fixed address that the postal service can find and an actual letter box into which the postman can place the letter. For requirement (B), the sender must write the content of the message. For requirement (C), the sender must know the address of the recipient and write this on the envelope. For requirement (D), in this scenario, the postal service is the network that connects the sender and recipient. Is the recipient guaranteed to receive the letter? This depends on the quality of the postal service; most postal services lose at least some letters.


FIGURE 3.1 Sending a letter; user (external) view of communication system.

The postal service may be viewed as a discrete block, as in Figure 3.1, or can be expanded to reveal its internal detail, as in Figure 3.2.


FIGURE 3.2 Sending a letter; network (internal) view of communication system.

Figure 3.2 provides a simplified illustration of the internal details of the postal service. Letters are sorted by the hierarchical components of the address (country, region, town, local area) until they are in fine-grained delivery-zone groups, which are then delivered to the recipients. The postal system requires a carefully designed internal structure to provide a large-scale yet efficient service. It is important to realize that the user of the system does not need to understand the precise nature of the internal structure, but should be aware of the basic principle on which it works and hence write the address clearly on the envelope. This is analogous to the operation of a communication protocol such as the Internet Protocol (IP). Its routing function is hierarchical for efficiency (based on patterns within the IP address), and its internal operation is transparent to the end user.

A wide variety of communication techniques are used within distributed systems; the choice among them depends on many factors, including the application requirements in terms of time constraints and reliability, the nature of the data to be transmitted, and the scale of the system. The rest of this chapter discusses the various techniques and their relative suitability. This is placed in the context of specific application examples, and the discussion and examples are linked to the popular protocols and technologies in current use.

3.3 Communication Techniques

3.3.1 One-Way Communication

One-way communication has limited applicability, but there are situations where it is adequate. In such cases, it is advantageous due to its simplicity both in terms of design and behavior; this is illustrated with an example. Consider an automated factory system, in which the production chain has a number of chemical processes that must be monitored to ensure that the temperatures of various parts of the system are kept within safe limits.

One component within this system is a temperature sensor unit that is connected to a part of the machinery. This unit has the following equipment:

• A temperature sensor

• A process that controls the reading of sensor values (sampling) and converts the analogue reading from the sensor into a digital value (which will be the message content)

• A network connection, so that the process is able to transmit the message

We might require that the sensor unit reads the sensor and sends a message containing the temperature once every 10 s.

Another component in this system is the monitoring unit that collects the data from the temperature sensor and determines if the system is within the safe limits; if not, it sounds an alarm. In all cases, the temperature value is displayed. This unit has the following equipment:

• A display.

• An audible alarm.

• A process that receives the message, extracts the temperature data from the message, compares the temperature data with a threshold level, sounds the alarm if the temperature has exceeded the temperature threshold, and in all cases displays the temperature value. It also sounds the alarm if there is a loss of signal from the temperature sensor unit.

• A network connection, so that the process is able to receive the message.

The system is illustrated in Figure 3.3.


FIGURE 3.3 One-way communication between two processes in the temperature monitoring system.

Figure 3.3 shows the communication (the solid arrow) between the two processes in the remote temperature monitoring system. The sensing process is responsible for sampling the temperature sensor on a periodic basis and sending the digitized temperature data in a message to the monitoring process. The diagram does not show physical computer boundaries, and it is important to note that the two processes are each part of different software components but are not necessarily running on physically different computers; they could be running on the same machine.

The two processes can be fully decoupled; this means that they operate independently, having the communication as the only common link. In this simple scenario, we can assume that the temperature sensing process transmits its periodic messages to a predecided address whenever it is running, that is, regardless of whether there is a monitoring process present. This approach is robust in the sense that the monitoring process can be shut down, or may fail, without affecting the behavior of the sensing process. Similarly, the monitoring process, when it is running, simply waits for the periodic messages from the sensing process. The lack of messages does not cause the monitoring process to fail; it simply continues to wait. As far as the monitoring process is concerned in this case, the communication is asynchronous; that is, even though the messages are sent on a periodic basis, the monitoring process does not need to be aware of this in either its design or its behavior. If the monitoring process were a critical part of a larger system, this decoupling could offer the significant benefit that if the sensing component were to fail, the monitoring component would continue to function and could generate an alarm to signal loss of connection to the sensor. This approach is also flexible because each component can be upgraded independently, the only invariant being the communication mechanism, including the format of the communication messages. The approach is also flexible with respect to the process lifetimes. It is feasible that in some implementations, the sensing activity would run continuously, while the monitoring process might only operate at certain times; the decoupling allows independent process lifetimes.
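
To make the scenario concrete, the following is a minimal sketch of the one-way protocol using UDP datagram sockets in Java. The port number, the plain-text payload format, the alarm threshold, and the loss-of-signal timeout are all illustrative assumptions; this is not the Workbench's actual implementation.

// TempLink.java - a minimal sketch of the one-way protocol over UDP datagrams.
// The port (8000), payload format, and thresholds are illustrative assumptions,
// not the Workbench's actual wire format.
import java.net.*;
import java.nio.charset.StandardCharsets;

public class TempLink {
    static final int PORT = 8000;

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("sender")) sender(); else monitor();
    }

    // Sender: samples (here, mocks) the temperature and transmits every 10 s,
    // regardless of whether any monitor is listening - full decoupling.
    static void sender() throws Exception {
        try (DatagramSocket s = new DatagramSocket()) {
            InetAddress dest = InetAddress.getByName("127.0.0.1");
            while (true) {
                byte[] p = String.format("TEMP:%.1f", 15 + Math.random() * 10)
                                 .getBytes(StandardCharsets.US_ASCII);
                s.send(new DatagramPacket(p, p.length, dest, PORT));
                Thread.sleep(10_000);
            }
        }
    }

    // Monitor: waits for periodic messages; a receive timeout longer than two
    // sample intervals is treated as loss of signal and raises the alarm.
    static void monitor() throws Exception {
        try (DatagramSocket s = new DatagramSocket(PORT)) {   // bind the agreed port
            s.setSoTimeout(25_000);
            byte[] buf = new byte[64];
            while (true) {
                try {
                    DatagramPacket pkt = new DatagramPacket(buf, buf.length);
                    s.receive(pkt);                           // blocks until a datagram arrives
                    double t = Double.parseDouble(new String(pkt.getData(), 5,
                            pkt.getLength() - 5, StandardCharsets.US_ASCII));
                    System.out.println((t > 22.0 ? "ALARM! " : "") + "temperature = " + t);
                } catch (SocketTimeoutException e) {
                    System.out.println("ALARM! loss of signal from sensor unit");
                }
            }
        }
    }
}

Note how the sender never learns whether a monitor exists, and the monitor turns the absence of messages into useful diagnostic output; the decoupling described above falls directly out of the connectionless datagram model.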

Activity C1 explores one-way communication, independent process lifetimes, and the decoupling aspect.

Activity C1

Using the Distributed Systems Workbench to Explore One-Way Communication, Decoupling, and Process Lifetimes

Prerequisites

The instructions below assume that you have previously performed activity I1 in Chapter 1. This activity places the necessary supplemental resources on your default disk drive (usually C:) in the directory “SystemsProgramming.” Alternatively, you can manually copy the resources and locate them in a convenient place on your computer and amend the instructions below according to the installation path name used.

Learning Outcomes

1. To gain an initial understanding of process-to-process communication

2. To gain an understanding of one-way communications

3. To gain an appreciation of the decoupling of components (loose synchrony through message passing)

4. To gain an appreciation of independent lifetimes and behaviors of components

Method

This activity uses the “Temperature Sensor” application, which is found on the “One-way Comms” tab of the Distributed Systems Workbench. The temperature sensor itself is mocked up with a random-value generator to avoid the need for a physical temperature sensor. The Temperature Sensor sender process sends the generated temperature values at 10 s intervals. The communication aspect is implemented as it would be in a real application; this is the aspect we are interested in.

Start two copies of the Distributed Systems Workbench. These can be either on the same computer or on different computers. From one of the copies of the workbench, start the Temperature Sensor sender program (from the “One-Way Communication” tab at the top level), and from the other copy of the workbench, start the Temperature Sensor receiver (user console) program.

The Temperature Sensor sender program requires that the IP address of the computer hosting the user console process is entered in the GUI. It autodetects its own IP address and populates the input field with this by default, so if both processes are on the same computer, there is no need to change the IP address; otherwise, you need to enter the IP address of the computer where the receiver is running into the sender's GUI. If you do not know the IP address of the receiver, you can use broadcast communication instead, so long as both processes are running within the same local network. Ensure that the sender's port number is set to the same value as the user console's “receive” port.

The communication behavior is evaluated under various scenarios to demonstrate the decoupling, separate lifetimes, and robustness aspect of the design.

1. Normal operation. Click the “START Temperature Sender” button on the sender and click the “START receiving” button on the user console. The temperature sensor values are generated periodically and sent to the user console for display. Observe the behavior. The screenshots below show the two processes running normally. Notice that because I started the sender slightly before the user console, the first sent message was not received; this is the correct behavior when using UDP (which has no reliability or delivery guarantee mechanisms) and should be noted.


2. Receiver not present. Have both processes running initially. Close down the receiver process (you can either shutdown just the program or the entire parent workbench). Notice that the sender process is unaffected. The two components are said to be decoupled in this design, and the sender is robust to failures in the receiver. Restart the receiver. Note that it picks up transmissions from the sender automatically.

3. Receiver stopped. Have both processes running initially. Stop the receiver process by pressing the button that was labeled “START…” initially but is now labeled “STOP…,” but do not shutdown the process. Wait while the sender generates at least three temperature values and sends them, and then restart the receiver. Observe that the receiver has not actually missed any of the intervening messages; they have been held in a buffer and are now displayed on the user console. Message buffering will be explored in more depth in subsequent activities.

4. Sender stopped. Have both processes running initially. Stop the sender process by pressing the button that was labeled “START…” initially but is now labeled “STOP…,” but do not shutdown the process. The receiver will notice after some time that the messages are not arriving and will signal an alarm to indicate possible failure of the sender component. However, note that the receiver component is not adversely affected by the failure of the sender in the sense that it continues to operate and provides important diagnostic information. Restart the sender process by pressing the “START…” button. Notice that the receiver resumes its normal display mode. This is an illustration of purposefully designed robust behavior in which the receiver handles the lack of communication from the sender in a predictable way. The screenshots below show the situation where the receiver has detected the absence of expected periodic messages from the sender (whose transmission has been stopped purposefully in the experiment but which, from the viewpoint of the receiver, could equally have crashed).


Expected Outcomes

You have explored one-way communication in an example application scenario that demonstrates its value, although it should be noted that most distributed systems communication is bidirectional between components. You have seen an example of decoupled components where each component's lifetime is independent of the other and where each can operate correctly in the absence of the other. Robustness stems from this decoupling; in this case, one component is able to detect the failure of the other.

Reflection

The ability to stop and start components independently without causing their communication partners to fail is a highly desirable characteristic for distributed applications. A robust design should support this where feasible because it is not possible to predict when one software component in a system or the hardware platform it is hosted on will crash or be purposefully shutdown.

The rules for a particular communication system are collectively called the protocol. The temperature sensing application described above is just about the simplest possible protocol scenario, as it is limited to one-way message transmission between a single sender and a single receiver (which the example assumes is at a preknown location).

Almost all communication is more complex, due to one or more of the following factors:

• Processes are not at preknown locations.

• Some processes perform services only when requested (rather than the continuous-send example discussed above).

• Multiple processes can be involved (possibly, more than one process may request service).

• Communication is bidirectional; in the simplest scenario, the requestor initiates communication by making a request and supplies its address details to the server so that the server is able to direct resultant data messages back to the requestor. Considerably more complex scenarios are possible.

• Application-specific aspects, such as the mean interval between message transmission and the mean size of messages.

• Systems are dynamic. The location and availability of services and components can change. Load, and hence delay, is continuously varying.

• Network infrastructure and message transmission are intrinsically unreliable. Messages can be lost, and if they are important, then some means to detect their loss and retransmit them is needed.

These factors lead to potentially highly complex communication scenarios in distributed systems, in terms of the component-to-component connectivity, the number of components involved, the size and frequency of transmission of messages, and the need for components to locate and identify each other. In addition, distributed applications have specific communication requirements and are impacted differently by the various aspects of interaction complexity and the resulting intensity of communication, depending on the specific application functionality. Due to the diverse communication requirements of distributed applications, a wide variety of communication protocols have been developed.

The communication protocol defines the rules of communication. The protocol governs aspects such as the quantity and format of information that is transmitted; the sequence of interactions between two parties and which party initiates the sequence; and whether a reply or an acknowledgment that the message has been received is sent back. Some protocols also incorporate features such as sequence numbers to ensure that messages are differentiable and that ordering of message delivery can be guaranteed. Automatic retransmission of sent messages that have not been acknowledged within a certain time frame is a popular way to add reliability.

The communication protocol is in many systems the only invariant between components that are moved or upgraded and thus is a vital aspect of reliable distributed systems. In addition, the communication protocol and the underlying support mechanisms such as network sockets facilitate connectivity in otherwise heterogeneous systems, thereby facilitating network transparency. For example, using TCP as the communication protocol, a client written in C# operating on a Microsoft Windows-based personal computer can interface to a server process written in C++ and operating on a rack-mounted processor unit running the Linux operating system. So long as both of the software components use the communication protocol correctly and both the platforms implement the underlying communication mechanisms correctly, the two components will be able to interact and exchange application data.

3.3.2 Request-Reply Communication

The request-reply communication mechanism is the basis of a popular yet simple group of protocols in which two-way communication occurs between a specific pair of processes.

A generic description of request-reply communication is as follows: Interaction begins with a request for service message, which is sent from a service requestor process to a service provider process. The service provider then performs the necessary computation and returns the result in a single message to the requestor; see Figure 3.4.


FIGURE 3.4 The request-reply protocol.

Figure 3.4 shows the request-reply protocol concept. The essence of this strategy is that control flows to the service provider (in the form of a request message) and data (the result of the request) flow back to the requestor.

A popular and easy to understand example of a request-reply protocol is the Network Time Protocol (NTP) service, which is one of several time services that constitute the Internet Time Service (ITS) provided by the National Institute of Standards and Technology (NIST), based in the United States. Synchronizing the time value of the clock on a computer to the correct real-world time is a vital prerequisite in order that many applications function correctly. NTP provides a means of getting an accurate time value from one of a pool of specially designated NTP time servers. The synchronization among the various time servers within the ITS service itself is performed separately with its own internal hierarchical structure comprising several strata (layers) of clocks, with highly accurate clocks, such as atomic clocks, in stratum 0. It is very important to note that external users of the NTP service do not need to know any details of this internal configuration; they simply send a request message to any one of the NTP time servers and receive a response containing the current time value. The ITS NTP service operation is illustrated in Figure 3.5.


FIGURE 3.5 Overview of the operation of the NTP service.

Figure 3.5 shows how the NTP service is accessed to retrieve a current time stamp value. In step 1, a request message formatted to comply with the NTP protocol is sent to one of the pool of NTP servers. In step 2, the specific NTP server responds with a reply message, which is sent back to the original requester (who is identified by examining the source address details in the request message). Part A of the figure provides an overview of the actual system, while part B shows the user's simplified view of the system. This example provides some important early insights into transparency requirements for distributed systems services: the NTP client does not need to know the details of the way in which the ITS service operates internally (in terms of the number of participating NTP servers and the way in which updates and synchronization are performed among the servers) in order to use the service. In terms of behavior, the NTP time service should provide a low-latency response, further reinforcing the transparency as seen by the user. In this respect, the NTP server instance should return the instantaneously available time value from its local clock, which has been presynchronized by the NTP service itself, rather than requesting a fresh synchronization activity within the service. Figure 3.6 provides pseudocode for an NTP client.


FIGURE 3.6 The NTP client pseudocode.
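
The pseudocode can be realized in a few lines. The sketch below issues a single SNTP-style request over UDP (the transport NTP uses) and decodes the server's transmit timestamp; the 48-byte message layout, the mode byte, and the 1900-to-1970 epoch offset come from the NTP specification, while the timeout value and default server name are illustrative choices.

// SntpQuery.java - request-reply in miniature: one NTP request, one reply.
import java.net.*;

public class SntpQuery {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "time.nist.gov";
        byte[] buf = new byte[48];          // an SNTP message is 48 bytes
        buf[0] = 0x1B;                      // LI=0, Version=3, Mode=3 (client)
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(3000);      // UDP gives no delivery guarantee
            InetAddress addr = InetAddress.getByName(host);
            socket.send(new DatagramPacket(buf, buf.length, addr, 123)); // request
            DatagramPacket reply = new DatagramPacket(buf, buf.length);
            socket.receive(reply);          // reply arrives in the same buffer
            // Server's transmit timestamp: seconds since 1900, at byte offset 40.
            long secs = 0;
            for (int i = 40; i < 44; i++) secs = (secs << 8) | (buf[i] & 0xFF);
            long unixMillis = (secs - 2208988800L) * 1000L; // shift 1900 epoch to 1970
            System.out.println("Server time: " + new java.util.Date(unixMillis));
        }
    }
}

The client holds no state between calls and the server needs none either, which is precisely the stateless request-reply pattern discussed below.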

Activity C2 explores request-reply protocols and the behavior of the Network Time Protocol (NTP) using the NTP client provided within the Distributed Systems Workbench.

Activity C2

Using the Network Time Protocol (NTP) Client Within the Distributed Systems Workbench to Explore Request-Reply Protocols and the Behavior of NTP

The US National Institute of Standards and Technology (NIST) maintains the Internet Time Service (ITS), which provides a number of well-known standard time services, one of which is the Network Time Protocol service.

Learning Outcomes

• To examine the use of a request-reply protocol

• To gain an initial understanding of time services

• To gain an initial understanding of the Network Time Protocol

• To gain an appreciation of the importance of standardization of well-known services

• To gain an appreciation of the importance of a clear separation of concerns between components of a distributed application

• To gain an appreciation of the importance of transparency in distributed applications

Method

Start the NTP client from the NTP tab in the Distributed Systems Workbench.

Part 1. The NTP client provides a partial list of NTP server URLs. Select each one in turn and see if they all respond with time values, and if so, do they all give the SAME time? NIST maintains a webpage at http://tf.nist.gov/tf-cgi/servers.cgi, which reports the current status of some of the NIST servers: it is not uncommon to find that one or more are unavailable at any time. This reinforces the reason why there are multiple NTP time servers available.

Part 2. NIST provides a global address: time.nist.gov, which is automatically resolved to different NIST time server addresses in a round-robin sequence to equalize the service-request load across the servers. Try selecting this URL and see what IP address it resolves to. If you make several attempts within a short time frame, then you will likely be directed to the same time server instance each time. However, if you wait several minutes between attempts, you will see that it does sequence through the available servers. Try this for yourself. Think about the importance of using this global address (especially if hard-coded into an application) rather than individual server domain names.

Expected Outcome

The first screenshot below shows the NTP client in operation, using the wolfnisttime.com NIST time server. The URL wolfnisttime.com has been resolved to IP address 207.223.123.18 and a series of NTP time requests have been sent, and responses received.


The screenshot below illustrates the use of NIST's global address time.nist.gov. In this instance, it resolved to the server at IP address 128.138.141.172. This screenshot also reveals the unreliability of UDP; NTP requests are carried over UDP, and you can see that while 86 requests were sent, only 84 responses were received.


The screenshot below shows how NIST's global address time.nist.gov resolves to different NIST time server addresses at different times. In this instance, it resolved to the server at IP address 24.56.178.140.


Reflection

This activity provides some insight into the importance of a clear separation of concerns in a distributed application. In this example, the client is a bespoke program, which requests and uses the up-to-date time value from the NTP time service, which is a well-known service with publicly documented behavior and a standard interface. The client in this application has very little business logic; it is limited to resolving the NTP server domain names into IP addresses and actually making the NTP protocol request. The rest of the client's functionality is related to the user interface. All of the time service-related business logic is held at the NTP server side. The request-reply protocol is very well suited to this software architecture; the client sends a request and the server sends back the appropriate reply in a stateless way (i.e., the server does not need to keep track of the particular client or keep any specific context about the client's request). This stateless approach leads to a highly scalable service, and the combination of the simple protocol and the high degree of separation of concerns means that it is very easy to develop NTP clients or to embed NTP client functionality into other applications.

The activity also illustrates some important aspects of transparency; the user does not need to know the internal structure of the ITS time services, the number of NTP server replicas or the way in which they are updated, in order to use the service. The NTP service appears to the client as a single server entity.

Further Study

The design and operation of the NTP client is explored in detail in the form of a case study in Chapter 7.

3.3.3 Two-Way Data Transfer

As explained above, the request-reply protocol is very common in services in which a command or request is passed in one direction and the reply (data) is passed in the other. There are also a very large number of applications in which data and control messages are passed in both directions (and not necessarily in such a structured sequence as with request-reply), for example, eCommerce, online banking, online shopping, and multiplayer games.

The various approaches to communication can be described in terms of the addressing methodology used and also in terms of the actual design of higher-level protocols and mechanisms built on top of the simpler transport layer protocols. These are discussed in the following sections.

3.3.4 Addressing Methodologies

There are four main addressing methodologies, that is, ways in which the recipient of a message is identified.

3.3.4.1 Unicast Communication

A message is delivered to a single destination process, which is uniquely addressed by the sender. That is, the message contains the address of the destination process. Other processes do not see the message.

Figure 3.7 illustrates unicast communication in which a message is sent to a single, specifically addressed destination process.


FIGURE 3.7 Unicast communication.

3.3.4.2 Broadcast Communication

A single message (as transmitted by the sender) is delivered to all processes. The most common way to achieve this is to use a special broadcast address, which indicates to the communication mechanism that the message should be delivered to all computers.

Figure 3.8 illustrates broadcast communication in which the sender sends a single message that is delivered to all processes. When considering the Internet specifically, the model of broadcast communication depicted in Figure 3.8 is termed “local broadcast,” in which the set of recipients are the processes on computers in the same IP subnet as the sender. The special IPv4 broadcast address to achieve this is 255.255.255.255.


FIGURE 3.8 Broadcast communication.

It is also possible to perform a directed broadcast with the IP. In this case, a single packet is sent to a specific remote IP subnet and is then broadcast within that subnet. In transit, the packet is forwarded in a unicast fashion. On reaching the destination subnet, it is the responsibility of the router on the entry border of the subnet to perform the last step as a broadcast. To achieve a directed broadcast, the network component of the original address must be the target subnet address, and all bytes of the host part of the address are set to the value 255. On reaching the final router in the delivery path, the address is converted to the IP broadcast address (i.e., 255.255.255.255) and thus delivered to all computers in the subnet. As an example, consider that the subnet address is 193.65.72.0, which may contain computers addressed from 193.65.72.1 to 193.65.72.254. The address used to send a directed broadcast to this subnet would be 193.65.72.255. The concept of directed broadcast is illustrated in Figure 3.9.


FIGURE 3.9 IP-directed broadcast communication.
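
As a sketch of how the two forms of broadcast are requested in practice, the fragment below sends one UDP datagram to the local broadcast address and one as a directed broadcast to the subnet from the example above. The port number is an assumption, and note that many routers drop directed broadcasts by default.

// BroadcastSend.java - sketch of local and directed broadcast over UDP.
import java.net.*;
import java.nio.charset.StandardCharsets;

public class BroadcastSend {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true);   // permission to send to a broadcast address
            byte[] payload = "hello all".getBytes(StandardCharsets.US_ASCII);
            // Local broadcast: delivered to every host on the sender's own subnet.
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName("255.255.255.255"), 8001));
            // Directed broadcast to subnet 193.65.72.0 (the example in the text);
            // forwarded as unicast until the subnet's border router broadcasts it.
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName("193.65.72.255"), 8001));
        }
    }
}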

When broadcasting using a special broadcast address, the sender does not need to know, and may not be able to know, the number of receivers or their identities. The number of recipients of messages can range from none to the entire population of the system.

A broadcast effect can also be achieved by sending a series of identical unicast messages to each other process known by the sending process. Where the communication protocol does not directly support broadcast (e.g., with TCP), this is the only way to achieve the broadcast effect. The advantage is greater security as the sender identifies each recipient separately, but the disadvantages are greater overheads for the sender in terms of the processing associated with sending and greater overheads on the network (in terms of bandwidth used, as each individual message must now appear on the medium).

There is also the consideration of synchronization. With broadcast address-based communication in a local area network, the transmission time is the same (there is only one message sent) and the propagation times will be similar (short distances); thus, although the actual delivery to specific processes at each node may differ because of local workloads on host computers, the reception is reasonably synchronized, certainly more so than when a series of unicast messages is sent, in which case one receiver may get the message before the other messages are even sent. This could have an impact in services where voting is used or where the order of response is intended to influence system behavior. For example, in a load-balancing mechanism, a message to solicit availability may be sent and the speed of response might be a factor in determining suitability (on the basis that a host that responds quickly is likely to be a good candidate for sending additional work to). In such cases, using a series of unicast messages to achieve a multicast or broadcast effect may require further synchronization mechanisms, because the sender has implicitly preordered the responses by the order in which it sent the requests.

Broadcast communication is less secure than unicast because any process listening on the appropriate port can hear the message and also because the sender does not know the actual identities of the set of recipient processes. IP broadcast communication can also be inefficient in the sense that all computers receive the packet at the network (IP) layer (effectively an interrupt requiring that the packet be processed and passed up to the transport layer) even if it turns out that none of the processes present are interested in the packet.

3.3.4.3 Multicast Communication

A single message (as transmitted by the sender) is delivered to a group of processes. One way to achieve this is to use a special multicast address.

Figure 3.10 illustrates multicast communication in which a group (a prechosen subset) of processes receive a message sent to the group. The light-shaded processes are members of the target group, so each will receive the message sent by process A; the dark-shaded processes are not members of the group and so ignore the message. The multicast address can be considered to be a filter; either processes listen for messages on that address (conceptually they are part of the group) or they do not.


FIGURE 3.10 Multicast communication.

The sender may not know how many processes receive the message or their identities; this depends on the implementation of the multicast mechanism.

Multicast communication can be achieved using a broadcast mechanism. UDP is an example of a protocol that supports broadcast directly, but not multicast. In this case, transport layer ports can be used as a means of group message filtering by arranging that only the subset of processes that are members of the group listen on the particular port. The group membership action join group can be implemented locally by the process binding to the appropriate port and issuing a receive-from call.

In both types of multicast communication, that is, directly supported by the communication protocol or fabricated by using a broadcast mechanism, there can be multiple groups, and each individual process can be a member of several different groups. This provides a useful way to impose some control and structure on the communication at the higher level of the system or application. For example, the processes concerned with a particular functionality or service within the system can join a specific group related to that activity.
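
Where multicast is supported directly via a special multicast address, joining and leaving a group and exchanging a message can be sketched as follows. The group address (chosen from the IPv4 administratively scoped multicast range) and the port number are illustrative assumptions.

// MulticastPeer.java - sketch of group communication via a multicast address.
import java.net.*;
import java.nio.charset.StandardCharsets;

public class MulticastPeer {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("239.1.2.3");
        try (MulticastSocket socket = new MulticastSocket(8002)) {
            socket.joinGroup(group);   // "join group": start listening on the address
            byte[] out = "to the group".getBytes(StandardCharsets.US_ASCII);
            socket.send(new DatagramPacket(out, out.length, group, 8002));
            byte[] in = new byte[64];
            DatagramPacket pkt = new DatagramPacket(in, in.length);
            socket.receive(pkt);       // only group members see this message
            System.out.println(new String(pkt.getData(), 0, pkt.getLength(),
                    StandardCharsets.US_ASCII));
            socket.leaveGroup(group);  // "leave group": stop receiving for the address
        }
    }
}

The fabricated alternative described above needs no special socket type: group members simply agree on a port, bind to it, and the sender broadcasts to that port.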

3.3.4.4 Anycast Communication

The requirement of an anycast mechanism is to ensure that the message is delivered to one member of the group. Some definitions are stricter, that is, that it must be delivered to exactly one member. Anycast is sometimes described as “delivery to the nearest of a group of potential recipients”; however, this is dependent on the definition of “nearest.”

Figure 3.11 illustrates the concept of anycast communication, in which a message is delivered to one member of a group of potential recipients. Whereas broadcast and multicast deliver a message to 0 or more recipients (depending on system size and group size, respectively), the goal of anycast is to deliver a message to 1 (or possibly more) recipients.


FIGURE 3.11 Anycast communication.

Neither TCP nor UDP directly supports anycast communication, although it could be achieved using UDP with a list of group members, sending a unicast message to each one in turn and waiting for a response before moving on to the next. As soon as a reply is received from one of the group, the sequence stops.
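
A sketch of this fabricated anycast follows: each group member is tried in turn over UDP, and the sequence stops at the first reply. The member addresses, port, and per-member timeout are illustrative assumptions.

// AnycastProbe.java - fabricated anycast: unicast to members in turn, stop at first reply.
import java.net.*;
import java.nio.charset.StandardCharsets;

public class AnycastProbe {
    public static void main(String[] args) throws Exception {
        String[] members = {"10.0.0.5", "10.0.0.6", "10.0.0.7"};
        byte[] req = "service?".getBytes(StandardCharsets.US_ASCII);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(1000);           // per-member wait before moving on
            byte[] buf = new byte[64];
            for (String m : members) {
                socket.send(new DatagramPacket(req, req.length,
                        InetAddress.getByName(m), 8003));
                try {
                    socket.receive(new DatagramPacket(buf, buf.length));
                    System.out.println("served by " + m); // first responder wins
                    return;                               // stop the sequence
                } catch (SocketTimeoutException e) {
                    // no reply from this member; try the next one
                }
            }
            System.out.println("no group member responded");
        }
    }
}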

The Arrows application within the Networking Workbench provides an interesting case example for exploring addressing modes; see Section 3.15.

3.3.5 Remote Procedure Call

Remote Procedure Call (RPC) is an example of a higher-level mechanism built on top of TCP. RPC involves making a call to a procedure that is in a different process space to that of the calling procedure. All programmers will understand the normal procedure call concept (which to avoid confusion we shall now call local procedure call) in which a call from one procedure to another occurs, all within the same program, and thus, at run time, the entire activity occurs within a single process. Both the calling procedure and the called procedure are within the same code project and are compiled together. The local call, if successfully compiled, is guaranteed to work because both procedures are in the same process image, which either is running or is not, and there is no network communication required to make the call.

Remote Procedure Call is an extension of local procedure call in which the called procedure is part of a different program to the calling procedure, and thus, at run time, the two procedures are in two different process spaces. Perhaps the main benefit of RPC is that from the programmer's viewpoint, the procedure call works the same regardless of whether it is a local or remote call. Thus, RPC provides location and access transparency and removes the need for the programmer to manually implement the communication aspects.

This is a very powerful facilitator for developing modular component-based applications. The developer is able to focus on the business logic of the application and to distribute the application logic across multiple software components and also to potentially distribute those software components over several physical computers, without significant effort spent on the networking aspects.

However, it would be misleading to suggest that the transparency provided by RPC removes all of the design challenges of distribution or that it removes all of the complications that networking introduces. In order to produce high-quality, robust, and scalable applications and systems, a developer needs to pay attention to the overall architecture of systems and the configurations of specific components and also the communication that occurs between pairs of components. Considerations include the frequency of procedure calls, the amount of data transferred in each of the request and reply arms of the call, and the processing effort required in the called procedure relative to the time costs of making the call itself and the underlying network latency when the call takes place. Programmers who have not yet implemented distributed systems will not immediately realize the potential pitfalls associated with inappropriate use of communication mechanisms such as RPC, so here are two example scenarios to put this into context.

RPC example 1. The called (remote) procedure is a helper procedure that performs some substeps in a large computation and has been separated off from the calling procedure as a result of a refactoring activity to improve code structure. A large amount of data must be passed into the called procedure, which performs a few straightforward computational steps requiring very limited processing time. It then returns the results to the calling procedure. Even with local code refactoring, it is important to understand the costs involved in performing the call, relative to the advantages of having the improved code structure. It may be that the call is invoked by several different callers and thus the improvement in structure is highly valuable, avoiding duplication of code. Where the called procedure is remote, there are additional costs of the network bandwidth used and the latency of the network communication. As RPC works on top of TCP, a connection has to be established at the beginning of an RPC call and has to be shut down at the end; thus, the additional latency can be significant for fast-operating applications. In addition, there are also the latency and processing costs associated with causing a system-level software interrupt in the host computer of the called procedure. For this particular example, RPC would appear to be inappropriate in the general case.

RPC example 2. An application is structured such that user-interface functionality is separated from the business logic, the latter needing access to data that are shared by many running instances of the application. This could be part of a banking system, for example. The Remote Procedure Call is used as a means of having the calling procedure based in the user-interface component, while the called procedure is in a back-end system close to where the data are located and also much more secure because this component only runs on the bank's own computers, behind their firewalls. Here, the use of RPC is for the purpose of implementing a scalable and secure system. The amount of data passed in banking transactions is usually quite low and is not the significant factor, neither is the typical amount of processing performed in the procedure particularly high. Rather, it is the need for secure and shared access to the database while retaining transparency to the developer of the user-interface (client) component of the system that is the critical factor. RPC is highly appropriate in this type of scenario.

Figure 3.12 illustrates the mechanism of RPC. A local procedure call is shown for comparison. From the viewpoint of the main thread of execution, both the local procedure call and the Remote Procedure Call appear the same. This is very important from the programmer viewpoint. The abstraction is that all procedure calls are local, and this removes from the developer of the application component the burden of implementing the network communication. The underlying communication is based on a TCP connection, which is set up and maintained automatically by the stubs (also referred to as proxies). This enables the developer of the calling application to write code as if calling a local procedure, while it also allows the developer of the remote procedure to write the code as a normal procedure and to not be concerned with the fact that it is actually called from a nonlocal procedure (i.e., from a different process).


FIGURE 3.12 The Remote Procedure Call mechanism.
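
The stub idea can be illustrated with a toy example. In the sketch below, the client-side stub add marshals two integers over a TCP connection and unmarshals the result, so the caller sees an ordinary local call. The port number and the single-procedure "protocol" are illustrative assumptions, not the wire format of any real RPC implementation.

// RpcSketch.java - a toy stub/skeleton pair showing the essence of RPC.
import java.io.*;
import java.net.*;

public class RpcSketch {
    // Client-side stub: presents add(a, b) as if it were a local procedure call.
    static int add(int a, int b) throws IOException {
        try (Socket s = new Socket("127.0.0.1", 9000);      // connection set up per call
             DataOutputStream out = new DataOutputStream(s.getOutputStream());
             DataInputStream in = new DataInputStream(s.getInputStream())) {
            out.writeInt(a);                                // marshal the arguments
            out.writeInt(b);
            out.flush();
            return in.readInt();                            // unmarshal the result
        }
    }

    public static void main(String[] args) throws Exception {
        ServerSocket listener = new ServerSocket(9000);     // bind before the call is made
        Thread server = new Thread(() -> {                  // server-side skeleton
            try (Socket c = listener.accept();
                 DataInputStream in = new DataInputStream(c.getInputStream());
                 DataOutputStream out = new DataOutputStream(c.getOutputStream())) {
                out.writeInt(in.readInt() + in.readInt());  // the "remote" procedure body
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        server.start();
        System.out.println("3 + 4 = " + add(3, 4));         // looks exactly like a local call
        server.join();
        listener.close();
    }
}

The connection setup inside add is exactly the per-call cost that made RPC example 1 above inappropriate for cheap, data-heavy helper procedures.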

The best known RPC implementation is Open Network Computing (ONC) RPC, which was developed by Sun Microsystems and is thus also known as Sun RPC. This originally supported C and C++ on Unix systems but is now available on Linux and Windows platforms. Heterogeneity is supported by using a special language and platform-independent data format called eXternal Data Representation (XDR).

XDR is one solution to the problem of facilitating communication between heterogeneous components when using mechanisms such as RPC. Another common approach is to use an intermediary language to define the component interfaces in a programming language-independent and platform-independent way. These are generically called Interface Definition Languages (IDLs). IDL is revisited in Chapter 6.

3.3.6 Remote Method Invocation

Remote Method Invocation (RMI) is the object-oriented equivalent of RPC. Instead of remotely calling a procedure, a remote method is invoked. RMI was first introduced in the Java language, and this specific implementation is sometimes referred to as Java RMI. However, the mechanism of RMI is also supported for other languages such as C#, where the equivalent facility is known as .NET remoting. Figure 3.13 provides an overview of the operation of Java RMI.


FIGURE 3.13 The Remote Method Invocation mechanism.

Figure 3.13 shows the basic operation of RMI. There needs to be some way by which the RMI client (within the calling process) can locate the required object; this is facilitated through the use of a specialized name service called the RMI registry. Step 1 in the figure shows the RMI server registering with the RMI registry the object that will be made accessible for remote access. The RMI registry can subsequently resolve requests for this object, as shown in step 2 where the calling process provides the name of the object it requires and is returned the address details. The calling process can now invoke method calls on the remote object, as shown in step 3 in the figure.

3.3.6.1 Java Interfaces

An interface defines the methods that a remote object exposes, without providing any implementation detail (i.e., it contains method names and parameter types, but does not include the program code within those methods).

The interface is necessary because the client-side code needs to be compiled independently of the server side. The compiler needs to be able to check that the remote methods are being used correctly in the client (in terms of the types and numbers of parameters passed in to the method and the type of return parameter expected). However, the compiler does not need to know details of the way in which the methods are actually implemented by the server, in order to perform these checks, and thus, the interface provides sufficient details. Note that the Java interface performs a similar role to header files in C or C++ and the Interface Definition Language (IDL), which is used in middleware.

To illustrate, an example interface for an Event Notification Service (ENS) is provided (see Figure 3.14). A non-RMI-based ENS is the subject of one of the case studies in Chapter 7. The example here is based loosely on that case study; the interface is essentially the same except that in the case of an RMI implementation, the functionality of the ENS is provided through remote method calls, instead of through the use of discrete application messages sent over a TCP connection, as in the case study version.


FIGURE 3.14 Java RMI interface example.

Figure 3.14 shows the Java RMI interface for an ENS. There are three server-side methods that are made available to be invoked remotely (SubscribeToEvent, UnSubscribeToEvent, PublishEvent), each described in terms of their parameter lists and types, but without revealing implementation detail. The use of the interface allows the application developer to incorporate calls to these methods in the client-side code, as if they were local methods; in this way, RMI achieves distribution transparency from the perspective of the client-side application developer. Notice that the interface extends the interface java.rmi.Remote. This is mandatory and causes the lower-level communication infrastructure to be automatically put in place so that the stated methods can be invoked remotely, without the program developer having to be concerned with the mechanistic aspects of the communication. As with RPC, RMI performs its communication over a TCP connection, which is set up and managed silently from the programmer's perspective.

A wide variety of run time problems can occur when invoking remote methods; these include problems associated with the server side of the application, the RMI registry, and the network connection. It is necessary to detect and handle these problems appropriately in order to achieve robust operation. For this reason, each method declared in a remote interface must specify java.rmi.RemoteException in its throws clause.
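
A sketch of what such an interface might look like is given below. The three method names are those described for Figure 3.14; the parameter and return types are illustrative assumptions.

// EventNotificationService.java - sketch of an RMI interface for the ENS.
import java.rmi.Remote;
import java.rmi.RemoteException;

public interface EventNotificationService extends Remote {
    // Every remotely invocable method must declare RemoteException so that the
    // client can handle network, registry, and server-side failures.
    void SubscribeToEvent(String eventType, String subscriberId) throws RemoteException;
    void UnSubscribeToEvent(String eventType, String subscriberId) throws RemoteException;
    void PublishEvent(String eventType, String payload) throws RemoteException;
}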

The Java interface contributes to implementation transparency, since the client-side object can invoke methods without knowledge of their implementation. This approach also contributes to component decoupling and thus flexibility with respect to deployment and maintenance. For example, after the client has been compiled, it is possible for the server-side implementation to be changed (e.g., a more efficient technique to achieve the same functionality may have been discovered); so long as the interface details are not changed, the application will still function correctly. Figure 3.15 illustrates the role of the Java interface.


FIGURE 3.15 Comparison of compile-time and run time views of Remote Method Invocation, showing the role of the Java interface.

Figure 3.15 shows the way in which a Java interface serves as a proxy for a remote method and therefore facilitates a compile-time view of remote methods, which are not actually present in the software component being compiled. At run time, the real remote methods are invoked.
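
Putting the pieces together, the sketch below shows the three steps of Figure 3.13 in code form: registration, name resolution, and invocation. It assumes the EventNotificationService interface sketched earlier is available; EnsImpl is a hypothetical implementation, and running everything in one process is purely for brevity (the point of RMI is, of course, that client and server normally run in different processes).

// RmiSteps.java - sketch of the register/lookup/invoke sequence of Figure 3.13.
import java.rmi.Naming;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.server.UnicastRemoteObject;

public class RmiSteps {
    // Hypothetical server-side implementation of the interface sketched above.
    static class EnsImpl extends UnicastRemoteObject implements EventNotificationService {
        EnsImpl() throws RemoteException { super(); }
        public void SubscribeToEvent(String type, String id) throws RemoteException { }
        public void UnSubscribeToEvent(String type, String id) throws RemoteException { }
        public void PublishEvent(String type, String payload) throws RemoteException {
            System.out.println("event " + type + ": " + payload);
        }
    }

    public static void main(String[] args) throws Exception {
        LocateRegistry.createRegistry(1099);                  // start an RMI registry
        Naming.rebind("rmi://localhost/ENS", new EnsImpl());  // step 1: register the object
        EventNotificationService ens = (EventNotificationService)
                Naming.lookup("rmi://localhost/ENS");         // step 2: resolve the name
        ens.PublishEvent("temperature", "21.5");              // step 3: remote invocation
    }
}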

3.4 Layered Models of Communication

The examples provided in the early part of this chapter have served to introduce the general concepts of communication and have also provided some insight into the key requirements and challenges of communication mechanisms for distributed systems.

Due to the many different types of technical challenge involved, communication systems are structured as a set of layers, each layer providing a specific set of functionalities. The layers are connected by well-defined interfaces called service-access-points (SAPs), because these are the means by which the components in one layer access services provided by components in an adjacent layer.

The division of network services into a set of well-defined layers has a number of benefits that include the following:

• Limiting the complexity of any particular layer: Each layer performs a well-defined subset of the network functionality.

• Simplifying further development of specific protocols without the need to modify adjacent layers: Well-defined interfaces between layers ensure that the communication between the layers is consistent and independent of the internal behavior of any specific layer. This allows replacement or upgrade of a particular layer within the stack without disturbing other layers, so long as the interface specifications are strictly adhered to.

• Facilitating standards and stable documentation: The interfaces between layers, and the functionality of each layer, are well defined and documented as standards.

• Interchangeability of technologies within the protocol stack: Technologies can be exchanged at a specific layer without affecting the layers on either side because the interfaces to these layers are standardized. For example, changing the data-link layer technology from a wired LAN technology such as Fast Ethernet to a wireless LAN technology such as IEEE 802.11 does not disturb the higher-level functionality of the network layer and above. The network layer has a logical view of the network connectivity and is not concerned with the actual technology of the links available to it.

• Application independence from the characteristics and behavior of the lower protocol layers: Applications need to operate independently of the underlying network technology. The layered model allows the communication technologies to change over time without affecting the behavior of the higher-level applications that use the network. This is important for the stability and robustness of applications.

• Separation of logical concerns: The upper layers of the network stack are concerned with issues such as logical connections, communication semantics, and the presentation of data. The lower layers are focused on the actual transmission of messages from point to point. Therefore, it is important to separate high-level concepts such as the sending and receiving of messages from the lower-level physical concerns related to, for example, the communication medium, frame formats, timing, signaling, and bit errors that occur.

There are two very important layered models for network communication. The Open Systems Interconnection (OSI) model (ISO/IEC 7498-1), produced and supported by the International Organization for Standardization (ISO), divides the network communication functionality into seven clearly defined and standardized layers. This model is mostly viewed as being conceptual, since relatively few popular protocols adhere closely to it. The model is however very useful as a structure guide and an aid to understanding and describing behavior in networks. The TCP/IP model on the other hand directly maps onto the TCP/IP suite, which is by far the most popular set of protocols in use and is the basis for the majority of Internet communication.

Protocol Data Units: The correct term for a network message format is protocol data unit (PDU). PDUs are defined for each protocol, at each level of the network-layered model. The normal format for a communication protocol PDU is a header part with predefined fields and a payload part that contains the PDU of the layer above. This concept is called encapsulation.

Encapsulation: Encapsulation is the term given to embedding the PDU of one protocol as the payload of another. In this way, the lower protocol carries the higher-layer protocol's data. This process repeats all the way down the protocol stack. This is illustrated in Figure 3.16.


FIGURE 3.16 Encapsulation of higher-layer protocols into the payload of the protocol in the layer below.

Figure 3.16 illustrates the concept of encapsulation using a common real example scenario in which a file transfer using the File Transfer Protocol (FTP) is passed across an Ethernet network link. Encapsulation serves to keep separate the concerns of the different layer protocols as the application data is passed across the network.

In the example scenario, the application layer protocol is FTP. The FTP protocol is concerned with information such as the file name and the type of data encoding used. FTP is not concerned with the type of network technology used or whether individual packets arrive successfully or have to be retransmitted. The overall concern of the FTP protocol is to ensure that the entire file is sent from one specific location to another.

The transport layer protocol is concerned with the identity of the process the message must be delivered to when it reaches the final destination computer. The process is indicated indirectly by the port number held in the transport layer header. A process on the destination computer will have to have associated itself with this port number in advance, so that the message can be delivered correctly. In the specific example, the transport layer protocol would be TCP because FTP requires a reliable transport protocol and so rules out UDP. The overall concern of TCP in this example is to ensure that each separately transmitted chunk of the file is delivered to the destination process (which is either the FTP client or FTP server at the destination computer) in such a way that the entire file can be reconstructed from the chunks.

The network layer protocol is IP. The overall concern of the IP in this example is to get the TCP segments containing the file data to the destination computer. The routing protocol in the system will use the IP address information in the IP packet header to select the path the packet takes through the network.

The concern of the data-link layer technology, in this example Ethernet, is to pass the packet across a single data link to the correct destination (at the link level), which for all but the final link will be the next router in the chain. If the next link the message must pass through is a different type of technology, such as wireless IEEE 802.11, then the Ethernet frame will be discarded and a new link technology-dependent frame will be used to carry the data. In such a case, the higher-layer data are not changed; they are re-encapsulated into the new frame.

3.4.1 The OSI Model

The ISO-OSI seven-layer model provides a conceptual reference for communication systems. The model supports a modularized approach in which each layer is isolated from the concerns of the layers above and below it; this enforces clear demarcation of responsibilities, avoids duplication of functionality, and promotes better understanding.

Figure 3.17 shows the OSI model network layers and their main functions. The model serves as a standard reference for network protocol design. Communication systems are vastly complex, dealing with a very wide range of challenges: higher-level concerns such as the way applications interface to the network system and the way data are represented so that heterogeneous systems can communicate; intermediate concerns such as how devices are addressed and how information is routed to the correct destination; and lower-level issues such as how to accommodate different network link technologies and the type of signaling used. The OSI model therefore plays a vital role as a standard reference and framework for the division of concerns and functionalities.


FIGURE 3.17 The ISO OSI seven-layer reference model.

The TCP/IP model was already well established when the OSI standard was introduced. In particular, the popularity of TCP/IP is driven by the fact that the Internet is based on the TCP/IP suite. As a result, the TCP/IP protocols are far more commonly used than OSI-specific technologies. The OSI model is however extremely useful as a discussion and modeling framework. It serves to describe network systems in a common language that is independent of any particular implementation and is thus used in teaching and also in research, with the TCP/IP model being mapped onto it so that specific protocols in the TCP/IP suite can be discussed in terms of their position within the seven-layer reference model.

3.4.2 The TCP/IP Model

The TCP/IP model comprises four layers. The lowest of these generically represents the network interface and physical network aspects. The upper three layers represent logical communication (i.e., they are concerned with communication based on logical addresses and are not concerned with the physical technologies and characteristics of networks such as host adapters, cables, frame formats, bandwidth, and bitrates). The protocols of the TCP/IP suite reside in the three upper layers.

Figure 3.18 shows the TCP/IP network layers aligned against the OSI equivalent layers and also shows the protocol data units used at each layer and some popular example protocols found at each layer.


FIGURE 3.18 The TCP/IP network model.

The link layer is not of concern to application developers in general. This layer deals with the technical and operational characteristics of the underlying technology of the network links. The use of layers provides decoupling of the application concerns (at the upper levels) from the technology at the lower levels, and thus, applications can be developed without having to know details of the network itself. This is very important because apart from the technical complexity that would otherwise be involved each time an application were built, the decoupling allows for the technology to change over time without affecting the behavior and operation of the applications.

The Internet layer is of limited interest to application developers. This is because the IP (IPv4 or IPv6) operates at the level of the host; that is, it is concerned with the delivery of messages between computers and is not concerned with specific applications and their processes. Obviously, the way that IP delivers messages (its datagram basis and the way it uses addressing) needs to be understood by application developers, but they will not find themselves directly sending messages at the IP level.

The transport layer, on the other hand, is concerned with communication between specific processes. This is the lowest level at which the application programmer can work. The main protocols in this layer are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP), which will be discussed in detail later.

The application layer is also very important to applications developers. This layer contains a wide range of application protocols that perform specific commonly required functions such as transferring files or web pages. A developer needs to be aware of what protocols are supported in this layer and the functionality they provide and any limitations. This is very important when determining whether the needs of a particular system can be met with the generic protocols or otherwise whether bespoke communication schemes based on top of the transport layer protocols need to be developed. The case study game used throughout this book follows the latter approach; that is, the communication is designed for the specific needs of the game and is based directly on the transport layer protocols; it does not use any of the protocols in the application layer.

3.5 The TCP/IP Suite

The notation TCP/IP can be ambiguous; it can imply the specific use of the TCP carried over an IP-based network (in this text, this will be written as “TCP over IP”), but more correctly, it is used to refer to the protocol family itself (this is the meaning ascribed here). This means that when a system is described as using TCP/IP, it could actually be using any combination of the protocols in the TCP/IP suite, so long as the layer ordering is respected. The following protocol combinations provide common examples: UDP over IP, TCP over IP, SNMP over UDP over IP, and FTP over TCP over IP.

You may be familiar with many of the protocols that occupy the application layer. These include FTP (File Transfer Protocol) and HTTP (HyperText Transfer Protocol), which are used very frequently. Protocols in this layer provide well-defined fixed functionality.

Figure 3.19 shows a subset of application layer protocols and their mapping onto the transport layer protocol that carries them. The mapping is significant because the TCP and UDP have distinctly different characteristics. The Simple Mail Transfer Protocol (SMTP), the HyperText Transfer Protocol (HTTP), the File Transfer Protocol (FTP), and the Telnet Protocol are all examples of protocols that are transported via TCP because of its advantage of ensuring reliable data transfer. The Simple Network Management Protocol (SNMP), the Trivial File Transfer Protocol (TFTP), the Bootstrap Protocol (BOOTP), and the Network File System (NFS) are each transported via UDP because of one or more relative benefits of UDP (which are lower overheads, lower latency, and the ability to use broadcast addressing).


FIGURE 3.19 The TCP/IP suite showing a subset of application layer protocols.

However, many applications have specific communication requirements that need greater flexibility than is provided by the well-defined generic functionality of the application layer protocols. This can be achieved by directly using sockets programming at the transport layer. The transport layer operates at a lower level than the application layer and supports flexible communication based on the passing of messages (called segments) between software components in whatever pattern or sequence is necessary. There are two main protocols in the TCP/IP transport layer: the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). Applications that directly transmit data over the network use either or both of these.

TCP and UDP implement process-to-process communication and use ports to identify a particular process within a computer. Both TCP and UDP operate over the Internet Protocol (IP), which works in the network layer. IP will deliver a packet to a destination computer, but does not know about processes. Therefore, the transport layer protocols make use of the IP's ability to deliver a message to a computer and extend the addressing with a port number to determine which process the message will be delivered to. Note that it is the combination of IP address and port number that constitutes a unique identifier of a particular process in a network. To use the TCP or UDP, an application developer uses the “socket API” to make calls into the protocol software stack.
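To make the address-plus-port idea concrete, the following minimal sketch (POSIX-style C; the port number and IP address are illustrative values only) fills in the socket address structure that the socket API uses to identify one specific process on one specific computer.

#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>

int main(void)
{
    struct sockaddr_in addr;                 /* transport-level address    */
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(8080);           /* port: selects the process  */
    inet_pton(AF_INET, "193.65.72.27",       /* IP address: selects the    */
              &addr.sin_addr);               /* computer                   */

    printf("process address: %s:%u\n",
           inet_ntoa(addr.sin_addr), (unsigned)ntohs(addr.sin_port));
    return 0;
}

Note the htons and ntohs conversions: multi-byte values in protocol headers are transmitted in network byte order, so port numbers must be converted to and from the host's native byte order.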

3.5.1 The IP

The IP is the main protocol of the network layer. The IP is at the very heart of Internet operation. Network traffic is carried across the Internet from computer to computer in the form of IP packets. Routing in the Internet is based on the IP destination address carried in the IP's header. In short, without the IP, there would be no Internet.

Two versions of IP are currently in use: IPv4 and IPv6. IPv4 was introduced in the early 1980s and has worked very well for many years. However, there is a limit to the IPv4 address range, which led to the development of IPv6 that has a far greater IP address range and several other important improvements over IPv4, for example, in terms of better quality of service provision.

IPv6 was defined during the mid-1990s and was due to run concurrently with IPv4 for a number of years and then replace IPv4 completely. The take-up of IPv6 has been much slower than expected, and IPv4 is still in common use. However, the current severe shortage of IPv4 addresses is likely to cause an accelerated rate of take-up of IPv6 (during 2011-2014, the availability of new addresses officially ran out in several geographic regions).

Figure 3.20 shows the IPv4 header. The first field is a 4-bit version number, which will always be set to the value 4 in an IPv4 packet. The second field is the header length measured in units of 4 bytes. For example, in the default case where no options are added at the end of the header, the header will be 20 bytes long and the value in the header length field will thus be 5. The type of service field has been used for various purposes related to quality of service, priority, and congestion notification. In addition to knowing the header length, it is necessary to know the length of the whole packet because the data portion, which follows the header, can have variable length. The Total Length field is 16 bits wide and thus can hold a value up to (2^16) − 1, which is 65,535; this therefore limits the size of an IP packet. The IP supports fragmentation, such that packets can be broken into smaller "fragments" to pass over link technologies with different size frame limits (this limit is called the Maximum Transmission Unit—MTU). The Identification field identifies each IP packet; all fragments of a packet retain the original ID so that the packet can subsequently be reassembled at the receiving computer. There are three flag bits, but only two are used. The More Fragments flag, if set, signifies that the packet has been fragmented and that there are more fragments to follow this current fragment. If not set, the flag signifies either that the packet was not fragmented or that this is the last of several fragments. The Don't Fragment flag, if set, prevents the packet from being fragmented; this means that if the packet exceeds the MTU for a particular link, it will be dropped. The fragment offset field signifies the position of the fragment within the original packet. If this is the first fragment or the packet has not been fragmented, then the offset will be zero. The fragment offset field is only 13 bits long, so in order that it can represent the same overall packet length as the Total Length field, the offset values increment at 8-byte intervals. Thus, a fragment offset of 1000 actually means that the data carried in that fragment start at position 8000 in the original packet. The 8-bit Time To Live (TTL) field is used to prevent routing loops. Each time a packet passes through a router, the TTL value is decremented, and if the TTL value reaches zero, the packet is dropped. TTL values are usually initialized to a value such as 64, which should be adequate to reach all destinations without issue so long as routers are configured and operating correctly. The Protocol field identifies the next layer protocol and thus identifies how the packet's payload should be handled. If, for example, the packet contains a TCP segment as its payload, then the value in the Protocol field will be the code 6. The code 17 is used to signify UDP. The header checksum provides a means of verifying that the header has not been corrupted in transit. IPv4 does not check the correctness of the data it carries; this is left for higher-layer protocols to deal with. The IP source address and destination address are each 32-bit values, discussed in a subsequent section.


FIGURE 3.20 The IPv4 header format.
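The field positions described above can be seen directly in code. The following illustrative sketch (plain C; the function and buffer are hypothetical, not from the book) extracts a few of the IPv4 header fields from a raw packet buffer laid out as in Figure 3.20.

#include <stdint.h>
#include <stdio.h>

void print_ipv4_header(const uint8_t *p)
{
    unsigned version      = p[0] >> 4;          /* 4-bit version, always 4      */
    unsigned header_bytes = (p[0] & 0x0F) * 4;  /* header length is in units of */
                                                /* 4 bytes, so 5 means 20 bytes */
    unsigned total_length = (p[2] << 8) | p[3]; /* 16 bits: at most 65,535      */
    unsigned ttl          = p[8];               /* decremented at each router   */
    unsigned protocol     = p[9];               /* 6 = TCP, 17 = UDP            */

    printf("IPv%u, header %u bytes, total %u bytes, TTL %u, protocol %u\n",
           version, header_bytes, total_length, ttl, protocol);
}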

Figure 3.21 shows the IPv6 header. As with IPv4, the first field is a 4-bit version field, which will always be set to the value 6 in an IPv6 packet. IPv6 has a fixed-length main header, replacing the header options of IPv4 with extension headers and thus not needing a header length field. The Traffic Class field serves two purposes: classification of packets for quality of service purposes and explicit congestion notification and control. The Flow Label field is used by routers to keep packets of related flows on the same path. The Payload Length field contains the length of the payload, including any extension headers. The Next Header field indicates which extension header, if any, follows the current header. The Next Header field is also used to signify the payload type; for example, a TCP header or UDP header can follow the IPv6 extension headers. The Hop Limit field serves the same purpose as the TTL field in IPv4. The IPv6 source address and destination address are each 128-bit values. Extension headers are only used when needed, avoiding the huge, inefficient single-header format that would otherwise be necessary to support all of the features of the sophisticated IPv6 protocol. Extension headers deal with additional functionality including hop-by-hop options, fragmentation, and authentication.


FIGURE 3.21 The IPv6 header format.

3.5.2 The TCP

TCP is a connection-oriented protocol. This means that a logical connection must be established before messages (termed segments in transport layer parlance) can be sent and received over the connection. Because a connection is necessary, all TCP communication is unicast; that is, a connection can only be between a pair of processes and thus each process can only communicate with one other, via a particular connection. Processes can however have multiple connections, if they need to communicate with more than one other process.

TCP divides the application data to be transmitted into segments (the TCP message type). Each segment begins with a TCP header of the format shown in Figure 3.22. The TCP header contains various fields, which are necessary to support its quality and reliability features. The source and destination ports provide the process-level addressing capability and thus facilitate process-to-process communication. The sequence and acknowledgment numbers support segment ordering and also facilitate detection of lost segments. When a segment is sent, a timer is started on the sender side. When the segment is received by the connected process, an acknowledgment is sent back to the original sender. When the acknowledgment is received, it cancels the timer. If, however, the acknowledgment is not received before the timer expires, the segment is assumed to have been lost, and so, the original segment is retransmitted.


FIGURE 3.22 The TCP header format.

An application data stream may have to be broken down into many TCP segments for transmission across the network. The 32-bit sequence number is used in each segment to provide a unique identifier (this represents the numerical offset of the data in the segment, measured from the beginning of the stream). The acknowledgment number indicates which byte in the stream is expected next and thus which bytes have been received. This enables the original sender to determine which of its sent segments have been received and which ones need retransmitting. The use of sequence numbers thus also provides the benefit of ensuring that received segments are ordered correctly when reconstituting the data stream as it is passed up to the higher-layer protocol or application.

There are six flags:

• SYN, FIN, and RST support the setup, closing down, and reset of connections, respectively.

• The ACK flag indicates when the value in the acknowledgment field is valid (i.e., the message contains an acknowledgment).

• The URG flag signals that urgent data have been placed in the message at the offset indicated by the urgent pointer and must be processed before other data.

• The PSH flag causes TCP to send a message before a full buffer is available (this "push" function is used, e.g., in Telnet, which requires that a TCP message is sent for each character that is typed).

The TCP also includes two transmission-management mechanisms: flow control and congestion control. TCP flow control uses a sliding window technique to prevent a sending process from overwhelming a receiving process; this is important in heterogeneous systems in which the different host computers may have different processing speeds, different workloads, or different buffer sizes. To implement this, the receiver process advertises the amount of buffer space it has to the sender (via the window size field); this is a dynamic quantity as the buffer may be partially filled with data. TCP congestion control helps prevent congestion buildup by controlling the rate at which data enter the network. TCP detects congestion indirectly by the fact that when routers' queues are full, they drop any newly arriving packets; this triggers TCP's congestion control activity. Each side of a TCP connection maintains a congestion window variable, which limits the total number of unacknowledged packets that may be in transit at the same time. In response to the detection of congestion, the congestion window is reduced in size. TCP uses a “slow-start” technique in which the congestion window is increased with each successful transmission. In this way, the congestion control mechanism is continuously adaptive, which is necessary because congestion levels in networks are dynamic and can change abruptly.
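The following deliberately simplified model (C; the constants and the single simulated loss event are invented for illustration, and real TCP congestion control is considerably more sophisticated) shows the general shape of the behavior described above: the congestion window grows rapidly during slow start, grows more gradually afterwards, and collapses when loss signals congestion.

#include <stdio.h>

int main(void)
{
    unsigned cwnd = 1, ssthresh = 16;     /* measured in segments (illustrative)  */
    int round;

    for (round = 1; round <= 10; round++) {
        int loss_detected = (round == 7); /* pretend congestion occurs in round 7 */

        if (loss_detected) {
            ssthresh = cwnd / 2;          /* remember half the achieved window    */
            cwnd = 1;                     /* restart slow start                   */
        } else if (cwnd < ssthresh) {
            cwnd *= 2;                    /* slow start: exponential growth       */
        } else {
            cwnd += 1;                    /* congestion avoidance: linear growth  */
        }
        printf("round %2d: congestion window = %u segments\n", round, cwnd);
    }
    return 0;
}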

As a result of the reliability and quality mechanisms outlined above, TCP incurs significant overheads when compared with simpler protocols such as UDP. Additional network bandwidth is used because of the larger header size necessary to contain the additional control information and also because of the need for additional handshaking messages used in the setup and teardown of network connections, as well as acknowledgment messages. There is also additional communication latency because there is additional processing that must be carried out in both the sender and receiver processes; there are also the initial delay while a connection is set up and the delays that occur while waiting for segments to be acknowledged.

3.5.3 The TCP Connection

A TCP connection is set up using three primitives (listen, connect, and accept) in sequence. Each component must initially create a socket. A connection is then established as follows. First, the passive side (usually the server in a client-server application) executes the listen primitive, which has the effect of making it receptive to connection requests from other processes. The active side (usually the client in a client-server application) then executes the connect primitive, which initiates the special sequence of three messages necessary to set up a connection, hence called the TCP three-way handshake; see Figure 3.23.


FIGURE 3.23 The TCP three-way handshake.

Figure 3.23 illustrates the three-way handshake and the sequence of primitive calls and related actions, which occur to establish a TCP connection between two processes. The so-called three-way handshake requires that each side sends a synchronization request message (SYN), which must be acknowledged by the other process (by replying with an ACK message). The first ACK is piggybacked onto the second SYN, so there are only three messages sent to achieve the four components of the handshake.

At this point, the connection is established, and the active side already has a socket associated with the connection so is ready to proceed with communication. The passive side must now execute the accept primitive, which makes a logical link between the application and the new connection (specifically, a new socket is created on the passive side for dedicated use with the new connection). The additional accept stage is necessary to allow the passive (server) side to continue to listen for additional connection requests, using the original socket that was created specifically for this purpose.

In many applications, a server will support many connections simultaneously; consider, for example, a web-server application. In this case, the server will still only need to execute the listen primitive once. However, the accept primitive will have to be executed each time a different client makes a connect request. To achieve this, it is usual to place the accept primitive call in a loop or in a timer event handler that is invoked periodically, perhaps every 100 ms. As the accept primitive creates a new socket each time it successfully accepts a connection request, the server ends up with multiple sockets: the original listen socket and one new socket for each connected client. Recall that the endpoint for communication is actually a socket, so this model of the server having one socket per client is ideal in terms of reflecting the communication configuration of the components and also in terms of code structure.

Figure 3.24 illustrates the way in which the accept primitive creates a new socket for each client, leaving the original “listen” socket to await additional connection requests. The server process accumulates an additional socket for each connected client; this simplifies application logic as the socket provides a representative identity at the server side for its specific connected client and keeps the state for each connection separate from other connections.


FIGURE 3.24 accept primitive creates dedicated sockets for each connection.
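The listen/accept pattern just described might be coded as in the sketch below (POSIX-style C; the book's figures use the Winsock primitive names, where closesocket corresponds to close here, and the port number is illustrative). Error handling is trimmed to keep the shape of the pattern clear.

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int main(void)
{
    int listen_sock = socket(AF_INET, SOCK_STREAM, 0);   /* TCP socket */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(9000);                  /* illustrative port */
    bind(listen_sock, (struct sockaddr *)&addr, sizeof addr);

    listen(listen_sock, 5);               /* listen is executed only once */

    for (;;) {
        /* accept returns a NEW socket dedicated to this one client;
           listen_sock remains free to receive further connection requests */
        int client_sock = accept(listen_sock, NULL, NULL);
        if (client_sock < 0)
            continue;

        /* ... exchange data with this client via client_sock ... */

        close(client_sock);
    }
}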

3.5.3.1 Higher-Layer Protocols That Use TCP as a Transport Protocol

• File Transfer Protocol (FTP). Files must be transferred verbatim, and the received version must be an identical replica of the original, thus requiring a reliable transport protocol.

• Telnet. Telnet facilitates remotely logging in to a computer and executing commands. The characters typed at the keyboard form a continuous stream that must be replicated in exactly the same order at the remote computer. It is vital that no parts of the character sequence are lost or duplicated. It is also not desirable that Telnet data be buffered and delivered in discrete chunks, as would be the case if UDP were used (single-character datagrams would have to be used).

• HyperText Transfer Protocol (HTTP). Web pages contain mixed media content, which must be delivered and rendered exactly as designed and therefore warrant the use of TCP.

• Simple Mail Transfer Protocol (SMTP). Email content is fundamentally data and thus has similar correctness requirements to file transfers. Hence, TCP is used as the transport protocol.

3.5.4 The UDP

UDP is a connectionless protocol; it does not establish logical connections. A process can send a segment to another process without any prior handshaking. As there is no connection, UDP communication need not be unicast (although this is the default mode of addressing used); it is also possible to broadcast a UDP segment. UDP segments are also called datagrams.

Figure 3.25 shows the UDP header format; this is significantly shorter than the TCP header and reflects the fact that the UDP has less functionality than TCP. UDP is a transport layer protocol, so (as with TCP) it uses ports to achieve process-to-process communication. The length field holds the length of the UDP datagram (header and data), and the checksum field is used to detect transmission errors in both header and data.


FIGURE 3.25 The UDP header.

UDP is a simple protocol, lacking the reliability and quality mechanisms provided by TCP. There are no sequence numbers, acknowledgments, or automatic retransmission. There is also no congestion control or flow control. As a result, UDP is said to be unreliable. In fact, the chances of any single UDP segment being lost in isolation are basically the same as the chances of an isolated loss of a TCP segment. However, the big difference is that while TCP has mechanisms to discover the loss and automatically retransmit the segment, UDP does not have such mechanisms, so a lost segment stays lost unless this is resolved by a higher-layer protocol or at the level of the application. For this reason, UDP is often described as having "send and pray" reliability.

The order in which UDP datagrams are received is not guaranteed to be the same as the order in which they were sent because of the lack of sequence numbers. Without sequence numbers, it is also not possible for the receiver to know if a datagram has been duplicated during transmission across the network and thus received twice or more; as far as UDP is concerned, these are all different datagrams. Thus, if an application is sensitive to the receipt of duplicate data, then additional support to detect this must be designed into the application itself. If UDP datagrams carry commands in the application information flow, then it is highly recommended that the commands are designed to be idempotent; that is, they are designed to be repeatable without having side effects. The concept of idempotent commands is discussed in Chapter 6, but to give a simple example, the command "add ten pounds to account number 123" is not idempotent, while the command "set the balance of account number 123 to thirty pounds" is.
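As a sketch of this point, the two hypothetical account commands above can be expressed as functions (C; the account representation is invented for illustration). If a duplicated datagram causes a command to be applied twice, the absolute set form still leaves the correct result, whereas the relative add form does not.

#include <stdio.h>

static int balance = 20;                                 /* account 123 (illustrative) */

void set_balance(int amount)    { balance = amount; }    /* idempotent                 */
void add_to_balance(int amount) { balance += amount; }   /* NOT idempotent             */

int main(void)
{
    set_balance(30);
    set_balance(30);          /* duplicate delivery: balance is still correct (30) */
    printf("after duplicated set: %d\n", balance);

    add_to_balance(10);
    add_to_balance(10);       /* duplicate delivery: balance is now wrong (50)     */
    printf("after duplicated add: %d\n", balance);
    return 0;
}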

UDP incurs significantly lower overheads than its transport layer counterpart TCP does. UDP uses less network bandwidth because it has a smaller header (8 bytes compared with TCP's 20 bytes) as it does not have to carry the additional control information that TCP needs. UDP does not send any handshaking messages, further reducing bandwidth usage when compared with TCP. UDP is a low-latency protocol as it does not require that a connection be set up and does not wait for acknowledgments.

3.5.4.1 Higher-Layer Protocols That Use UDP as a Transport Protocol

• Trivial File Transfer Protocol (TFTP). TFTP was designed as a lightweight file transfer mechanism primarily used for transferring short configuration files to routers and other devices, typically over a short dedicated link or at least within a LAN environment. TFTP is a cut-down version of FTP, designed so that a TFTP server can be hosted within a device such as a router without requiring excessive processing and memory resources. Many text files that are used to configure routers will fit into a single typical segment, so the issues of ordering are greatly reduced. Therefore, for TFTP, the use of UDP at the transport layer is desirable because the low overheads and latency outweigh any reliability concerns. TFTP uses simple checksum techniques to detect whether a file has been corrupted, in which case it is rejected.

• Domain Name System (DNS). DNS uses UDP as its transport protocol (by default) for lookup queries sent to DNS servers and for responses returned from DNS servers. The fundamental reason for this is to keep the latency of DNS lookups as low as possible; using TCP would incur significantly higher latency because of the need to set up and shut down a TCP connection. However, DNS uses TCP to perform zone transfers between DNS servers, as these are effectively the equivalent of a file transfer, and it is vitally important that the data are not corrupted.

• Simple Network Management Protocol (SNMP). SNMP uses UDP as its transport protocol fundamentally because of the need for low latency and also to keep network usage overheads low so that the network management system does not itself become a source of excessive network load.

3.5.5 TCP and UDP Compared

The main distinguishing characteristics of TCP and UDP are compared in Table 3.1.

Table 3.1

Comparison of TCP and UDP

Characteristic            TCP                                 UDP
Connection model          Connection-oriented                 Connectionless
Addressing                Unicast only                        Unicast or broadcast
Reliability               Lost segments detected and          No detection or retransmission
                          retransmitted                       of lost datagrams
Ordering                  Sequence numbers ensure ordered,    No ordering; duplicates not
                          duplicate-free delivery             detected
Flow/congestion control   Provided                            Not provided
Header size               20 bytes                            8 bytes
Latency and overheads     Higher (connection setup,           Lower (no handshaking or
                          acknowledgments)                    acknowledgments)

3.5.6 Choosing Between TCP and UDP

The choice as to which transport protocol is used depends entirely on the communication needs of the particular application or on the higher-layer protocols that the application uses. As discussed above, TCP and UDP are significantly different in almost all aspects of their operation, to the extent that they are almost complete opposites. Therefore, when the communication needs of any particular application are scrutinized, it will usually become clear which transport protocol is most appropriate. Where there are conflicts (e.g., one part of the application requires reliable transport and another part requires the use of broadcast), it may be necessary to use a mixture of both TCP and UDP, each for the specific parts of the communication they are suited to. An example of the combined use of both TCP and UDP is provided by DNS (as explained above). There are also applications that use connection-oriented communication to transmit data between processes but use the broadcast capability of UDP to implement server advertising and thus allow the processes to initially locate each other automatically (see the programming exercise at the end of this chapter for an opportunity to practice this).

Some general guidelines for choosing between the two transport protocols are provided:

• If reliable transport is required, choose TCP.

• If ongoing dialogues between components are necessary, or if one component needs to keep track of the number and state of its communication partners, TCP is likely to be the best choice.

• If broadcast communication is needed, choose UDP.

• If latency and network bandwidth usage are the main considerations, choose UDP.

• If the higher-layer protocols or the application itself provides reliability, the case for using TCP is reduced, and in some cases, UDP would be acceptable.

• UDP datagrams are the simplest units of communication at the transport layer and thus are suitable as building blocks on which to build other protocols. A hybrid transport protocol (developed on top of UDP datagrams) may have a subset of the features of TCP, with some other features not supported by TCP; for example, a broadcast mechanism might be needed, but with acknowledgments from recipients.

• UDP is ideal for real-time data streaming applications in which it is better to permanently lose a dropped packet than to suffer the retransmit latency that would occur if TCP were used.

• If the communication is unidirectional, such as service advertisement, heartbeat messages, and some synchronization messages such as clock synchronization, the UDP is likely to be the most suitable transport protocol. If the messages are transmitted periodically, such that occasional loss has no overall effect on the application's behavior, then the case for UDP is even stronger.

TCP is more complex than UDP from the developer viewpoint. However, in situations where the additional reliability or other features of TCP are warranted, it is a false economy to save on development effort and end up with an inferior application.

3.6 Addresses

An address is a description of where to, or perhaps how to, find something. Everyone will be immediately familiar with this concept, because everyone has an “address” where they live. If you want your friend to come round to your house, you must first give them your address. However, rather than one address, most people have many addresses, perhaps as many as ten or even more! What am I talking about? Well, the address where you live could be more specifically described as your postal address, the address I would use to post a letter to you. However, if I wish to phone you, I need to know your phone number. Assuming that the full number including international code and regional code is used, then your phone number is a worldwide unique address; if I use this number, it will connect me to you. You may even have several phone numbers (home, mobile, and work). Similarly, if I wish to send you an email, I need to know your email address. If I wish to contact you through social media, I will need to know your Facebook name, or to connect via Skype, I need your Skype name. If you have a website, it has a unique address that I would use to view the site. So, you can see that there can be a wide variety of types of address, with various different formats, but all of the examples above have one thing in common; they are unique to you (i.e., they identify you).

Resources in a computer system must also have addresses, so that we can access them. Some of the examples above are relevant to this point, especially website addresses and email addresses. These are actually special cases of a class of addresses called Uniform Resource Locators (URLs).

Network addresses are long numeric values that are not human-friendly (to illustrate this point, consider how many of your friends' phone numbers you can remember). In contrast, URLs are a textual form of address that is much easier for humans to remember and communicate; this is largely due to their pattern-based nature, so that reciting, for example, a website address becomes relatively easy compared with trying to remember the numerical IP address.

The Domain Name System (DNS) provides a special translator service to translate the textual URL addresses into their numeric format when necessary. URLs are discussed in Chapter 4, and DNS is discussed in Chapter 6.

3.6.1 Flat Versus Hierarchical Addressing

Addresses can be flat (which means that there is no pattern or structure within the addresses that can be used to aid locating resources) or hierarchical (which means that the address values contain a structure and that addresses can be allocated in such a way that the value of the address identifies the resource's position in the network).

Phone numbers provide a useful analogy. A phone number has a three- or four-level structure; it comprises an international code, an area code, and a subscriber number. Some subscriber numbers are followed by an extension number, which extends the single subscriber number to represent the different users within an organization (this is analogous to the way a port number extends an IP address; see later). The subscriber number is unique within the set of phone numbers that share the same combined international code and area code values. Of course, it is possible to duplicate each subscriber number in the other international code and area code combinations. The hierarchical structure of a phone number is used to efficiently route the call. The international code is used to route the call to the correct country's phone system. Only once in the correct country does it make sense to apply the area code. Once routed to the correct area, the subscriber number is used to connect to a specific user. If phone numbers used a flat scheme, that is, if a phone number was just a long number with no pattern to it, then the routing would be very difficult. At each routing node in the network, there would need to be access to a full database of all the phone numbers and how to route calls to them from any position in the network.

The Philips Inter-Integrated Circuit (I2C) system is a short-range serial bus communication system designed to facilitate interconnecting microprocessors and peripherals. The I2C address scheme provides a useful example of a flat addressing system. The 7-bit address range limits the number of addressable devices to 127, as address 0 is used to make a general call (broadcast). Applications are usually closed systems of embedded components, for example, a factory automation system with several microcontrollers controlling machines, conveyor belts, and robots and providing control, synchronization, and monitoring. Another example of its use is within a complex piece of office machinery, such as a multifunction printer/photocopier or a paper-folding machine, in which several microcontrollers are connected via a short bus. The scale of such systems is limited, and thus, the flat addressing is not usually a problem.

3.6.2 Addresses in the Link Layer

The link layer provides connectivity between devices on a single data link. This can include an entire LAN, for example, Ethernet, up to the point where it is bounded by a router. The link layer needs to uniquely identify physical devices and ensure that no two devices have the same physical address. Therefore, the MAC address is fixed for the specific hardware network interface and must be globally unique.

There are many different manufacturers of network hardware devices (network adapter cards, routers, switches, etc.), which all need MAC addresses. The problem of ensuring global uniqueness is solved by having the first three bytes of the MAC address assigned centrally to the device manufacturer. The value of these three bytes is referred to as the Organizationally Unique Identifier (OUI). The device manufacturer is then responsible for ensuring that the second group of three bytes is unique for each device it produces under a given OUI. Figure 3.26 shows some typical MAC addresses. Note that several of the MAC addresses have the same OUI, indicating the same hardware manufacturer, but that there is no pattern to the MAC addresses overall and that each one is unique.


FIGURE 3.26 Link layer addresses and network layer addresses.

The OUI is purely used to ensure global uniqueness across the different manufacturers. Thus, even though MAC addresses do in some sense have two components, they are a flat addressing scheme as the OUI plays no role in terms of locating devices.
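Since the OUI is simply the first three bytes of the MAC address, checking whether two addresses share a manufacturer prefix is a three-byte comparison, as in this small sketch (C; the MAC address values are invented examples).

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int same_oui(const uint8_t a[6], const uint8_t b[6])
{
    return memcmp(a, b, 3) == 0;       /* compare bytes 0-2, the OUI */
}

int main(void)
{
    uint8_t mac1[6] = {0x00, 0x1A, 0x2B, 0x11, 0x22, 0x33};
    uint8_t mac2[6] = {0x00, 0x1A, 0x2B, 0x44, 0x55, 0x66};
    printf("same OUI: %s\n", same_oui(mac1, mac2) ? "yes" : "no");
    return 0;
}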

3.6.3 Addresses in the Network Layer

The network layer provides logical computer to computer communication. By logical, it is meant that the network is divided into groups of devices (called subnets), with each group having a set of related addresses and the groups of addresses having patterns such that it is easy to find a specific computer. This is facilitated by the hierarchical nature of IP addresses; all computers in a particular subnet will have the same network component of their address, but a different host part, making their overall address unique. Therefore, the computer's network layer address is based on its position within a network and not on the identity of its physical network adapter; see Figure 3.26.

Figure 3.26 illustrates the differences between link layer addresses and network layer addresses and the roles they play. The diagram shows the way in which the hierarchical network layer (IPv4) addresses contain patterns in their network part (the first three bytes in the case of the IPv4 addresses shown) based on the subnet to which they belong, thus relating to their logical position within the network. The subnet addresses are used by routing protocols to deliver packets to the appropriate subnet. The host part of the address (the fourth byte in the case of the addresses shown), which is unique within each subnet, is then used to deliver the packet the last step of its journey, that is, to the specifically addressed computer. In order to achieve this last step, it is important that each computer has a unique address at the link technology level, to which a frame must be addressed, hence the requirement that MAC addresses are globally unique, because regardless of what device is placed in a particular subnet, it must not be possible for it to have the same MAC address as any other device in that subnet.

The MAC addresses contain an OUI (the first three of the six bytes; see earlier), which can be common among devices produced by the same manufacturer. Figure 3.26 depicts an example where several of the MAC addresses share the same OUI, while others don't; this is done to make the point that this partial pattern plays no role in terms of locating devices.

Note that if we were to replace one computer with another, in the same network, we could assign the same IP address to the replacement as the original had, but the physical address (the MAC address) will be different. If we move a computer to a new location, its IP address will change to reflect its new location, but its physical address will remain the same (because it has the same network adapter).

3.6.3.1 IP Addresses

There are two versions of the IP in current use: IPv4 and IPv6. One of the main reasons that IPv6 was introduced was because the address range provided by IPv4 is insufficient to meet the growing demand for Internet addresses. Thus, IPv6 supports a much wider range of address values.

3.6.3.2 IPv4 Addresses

IPv4 addresses are 32 bits long (4 bytes). An IPv4 address is written in the format of 4 decimal numbers each separated by a single dot, hence called dotted decimal notation. This is generically represented as d.d.d.d where each d represents a decimal number in the range 0-255. An example is 193.65.72.27.

IPv4 addresses are hierarchical in the sense that they contain a network component that is used by routing protocols to deliver packets to the appropriate subnet (as discussed above) and a host part that is used to deliver a packet to a specific computer within the subnet. IPv4 addresses are divided into classes A, B, and C based on the way the 32 bits are split across the network and host parts of the address. Class D is reserved for multicast addressing, used, for example, within some routing protocols so that the routers can communicate among themselves. The division into classes is illustrated in Figure 3.27.


FIGURE 3.27 IPv4 address classes.

Figure 3.27 shows the three main address classes used in IPv4 and the multicast address class D. The address class can be determined by inspecting the first few bits of the highest-order byte; the critical bits for this purpose are shown in the figure; for example, if the address begins with “10,” then it is class B.

A subnet mask is a bit pattern that is used to separate the network part of an address from the host part. It is particularly important that routers can determine which part of an address is the network part; this is determined by the part of the subnet mask that is set to binary “1s” (where 255 decimal is the pattern “11111111” in binary). The default subnet masks for address classes A, B, and C are shown in the right-hand side of Figure 3.27.
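The separation performed by a subnet mask is a bitwise AND, as the following sketch shows (C; the address is the earlier example from this chapter and the mask is the class C default).

#include <stdio.h>
#include <arpa/inet.h>

int main(void)
{
    struct in_addr ip, mask, network;
    char buf[INET_ADDRSTRLEN];

    inet_pton(AF_INET, "193.65.72.27",  &ip);    /* a class C address     */
    inet_pton(AF_INET, "255.255.255.0", &mask);  /* default class C mask  */

    network.s_addr = ip.s_addr & mask.s_addr;    /* keep the network part */

    printf("network part: %s\n",
           inet_ntop(AF_INET, &network, buf, sizeof buf));   /* 193.65.72.0 */
    printf("host part:    %u\n",
           ntohl(ip.s_addr & ~mask.s_addr));                 /* 27          */
    return 0;
}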

Some computers have multiple network adapters (each with their own IP address) and can receive packets on any of them. The developers of distributed applications are usually not interested in which physical interface is used to receive a particular packet. However, in such cases where there are multiple adapters and thus the computer has multiple IP addresses, problems can arise if the socket address structure is set up to contain the specific IP address of one of the adapters. The special IP address INADDR_ANY can be used when binding a socket, indicating to TCP or UDP that the socket can receive messages sent to any of the IP addresses the computer has. Note that if a socket bound with INADDR_ANY is used for sending, the actual address used (the source address in the packet sent) will be the computer's default IP address, which is the lowest numbered of the addresses it has.
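A minimal sketch of binding with INADDR_ANY follows (POSIX-style C; the book's examples use the corresponding Winsock calls, and the port number is illustrative). The socket will then receive datagrams addressed to the chosen port on any of the computer's IP addresses.

#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int make_receive_socket(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);      /* UDP socket           */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);    /* any local IP address */
    addr.sin_port        = htons(5150);          /* illustrative port    */

    bind(s, (struct sockaddr *)&addr, sizeof addr);
    return s;
}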

If a message is to be sent to a destination process that is on the same computer (it is said to be "local"), the special loopback address 127.0.0.1 can be used. This causes the outgoing message to be turned around (looped back) within the protocol stack, and thus, it is not actually transmitted externally on the network. The most common usage of the loopback address is for testing and diagnostics.

3.6.3.3 IPv6 Addresses

IPv6 addresses are 128 bits (16 bytes). An IPv6 address is written in the format of 8 hexadecimal numbers each separated by a single colon. This is generically represented as x:x:x:x:x:x:x:x where each x represents a 16-bit hexadecimal number, that is, in the range 0H-FFFFH. An example is FF36:0:0:0:11CE:0:E245:4BC7. The address range represented by an IPv6 address is so vast that parts of the number range may be effectively unused for some time to come, leading to several “0s” in the address; in such cases, a compressed form can be used to replace one string of “0s” (the longest such string) with the symbol “::.” The example given above thus becomes FF36::11CE:0:E245:4BC7, and because only one continuous series of “0s” was changed, it can be unambiguously converted back to the original 8 hexadecimal number representation.
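The standard address-conversion functions understand both notations, as this sketch shows (POSIX-style C, using the example address above); inet_pton parses either form, and inet_ntop typically emits the compressed form.

#include <stdio.h>
#include <arpa/inet.h>

int main(void)
{
    struct in6_addr a;
    char buf[INET6_ADDRSTRLEN];

    inet_pton(AF_INET6, "FF36:0:0:0:11CE:0:E245:4BC7", &a);  /* full form */
    inet_ntop(AF_INET6, &a, buf, sizeof buf);

    printf("compressed form: %s\n", buf);  /* typically ff36::11ce:0:e245:4bc7 */
    return 0;
}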

3.6.3.4 Translation Between IP Addresses and MAC Addresses

The Address Resolution Protocol (ARP) translates an IP address into a MAC address. It is used when a sending device (computer or router) needs to create a frame (which is a link layer message) to carry a packet (a network layer message) to another device for which the sender only knows the IP address and therefore needs to find out the destination device's MAC address.

3.6.4 Addresses in the Transport Layer (Ports)

A distributed application comprises two or more processes. Each process must have a unique address so that messages can be directed to it. The transport layer provides process-to-process communication and thus is perhaps the most significant layer from the viewpoint of distributed application developers.

An IP address refers to a particular computer. Thus, the IP address is sufficient to get a packet to the correct computer, and all messages with a particular IP address will be delivered to the same computer. However, modern computers support many processes concurrently, and it is always a process that is the final recipient of a message. Hence, a further level of address detail is needed to provide a unique address to each process within the computer, to facilitate communication, that is, to ensure that the message can be passed to the appropriate process. The additional part of the address is called the port; that is, a port is an extension of the IP address that specifies which process to deliver a message to, once the packet that carries the message has reached the destination computer.

A useful analogy here (which extends the postal address scenario provided earlier) is that of a shared house in which several people live, each having a room numbered from 1 to 4. The sender of a letter addressed to one of these people will write the room number as an additional line of address. The postal system (analogous to the IP) will ignore the room number detail; it is meaningless to the postman; his job is to post the letter to the right building based on the street address, but he has no knowledge of the internal allocation of rooms inside the house. Once the letter has been received at the property, the occupants can examine the room number detail (analogous with port) to determine which person (analogous with process) should receive the message.

Figure 3.28 illustrates a scenario where there are several processes spread across two computers, and the requirement is to send a message (which is encapsulated within a packet) to one specific process. As the two computers have different IP addresses, the IP address (which will be included in the IP header of the packet) is sufficient to get the message delivered to the correct computer. The port number is included in the transport layer protocol (e.g., TCP or UDP) as this layer is concerned with process-to-process communication. Once the packet has reached the destination computer, the transport layer header is inspected and the operating system will extract and deliver the message to the appropriate process based on the port number. Notice that while IP addresses must be globally unique, it is only necessary that port numbers be locally unique (i.e., no two processes at a single computer can be using the same port at the same time). In the figure, there is a process using the port number 1933 on each computer. Note that even in this case, the combination of an IP address and a port number is still globally unique.


FIGURE 3.28 Ports identify specific processes located at a particular computer.

The port number can be written as an extension of the IPv4 address, using a colon to separate the address and port:

193.65.72.27:23

The port number is expressed in decimal and in the example above signifies Telnet.

This is unambiguous because the separator character used between the address components is a dot. However, IPv6 uses the colon as its address component separator character, and thus, the use of a colon to indicate port number is confusing. There are several ways in which an IPv6 address and port number combination can be written, but the preferred technique is to place the IPv6 address within square brackets, followed by a colon, and then the port number:

[FF36::11CE:0:E245:4BC7]:80

The port number is expressed in decimal and in the example above signifies HTTP.

3.6.5 Well-Known Ports

Port numbers (in TCP and UDP) are 16-bit values, which means that approximately 65,000 are available on each computer. Bespoke applications are free to use the majority of the port numbers, but some parts of the number range are reserved for the common protocols and services, so that particular values can be mapped to specific services. This greatly simplifies the development of distributed systems and the components thereof, as the port numbers that the server side will bind to (and the client side will connect to) are known in advance and fixed for a wide variety of services and thus can be designed into software, reducing the amount of run time discovery and configuration needed.

The well-known port numbers occupy the range 1-1023. Some examples are shown in Figure 3.29.


FIGURE 3.29 A selection of well-known port allocations.

Figure 3.29 illustrates the great diversity of services that have been allocated well-known port numbers to facilitate service location and client-service binding. The well-known port numbers can be considered the premier set reserved for the most popular services. There is also a secondary set called the registered ports, which are reserved for a larger group of less common services. These occupy the numerical range of port numbers from 1024 to 49,151. A sample of the registered ports is shown in Figure 3.30.


FIGURE 3.30 A selection of the registered port allocations.

Port numbers greater than 49,151 are called dynamic ports and can be used by any application or service. If you are developing an experimental system or a service for one specific company, then it should use ports in the dynamic range.

3.7 Sockets

A socket is a structure in memory that represents the endpoint for communication (i.e., sockets are the means by which processes are identified by the communication system). The central concept is that sockets belonging to each communicating process are connected together, to create communication channels between the processes.

TCP and UDP operate in the transport layer; thus, the sockets are virtual. A socket exists only in the form of a data structure and is a means of making a logical association between processes. Do not confuse virtual sockets with physical sockets; that is, a virtual socket is not a physical portal such as an Ethernet socket.

Figure 3.31 illustrates the concept of virtual sockets. A socket represents the process in terms of identifying it for communication purposes; this means that the socket is effectively the process' interface to the communication system. A socket is associated with the address of the process, which includes both its IP address (identifying the host computer) and the port number, which is unique on each computer and thus identifies the specific process. Each process creates as many sockets as necessary, depending on its communication requirements. The figure shows the way in which processes can communicate via sockets with other processes, which are on the same, or different, physical computer. This aspect is important in fully appreciating the “virtual” nature of the communication. The transport layer protocols send and receive messages on behalf of processes in an access-transparent way; that is, the mechanism is exactly the same from the process' point of view regardless of whether its communication partner is local or remote.


FIGURE 3.31 Virtual sockets provide process-to-process connections.

Blocking Versus Nonblocking IO Modes of Sockets: A Brief Introduction: Network communication is a form of IO. Writing a message to a network adapter for transmission over the network is similar to writing a file to a disk in the sense that both cases involve slow external devices requiring specific handling via a device driver. Waiting for a network message to arrive is similar to reading a file from a disk in the sense that in both cases, there is significant delay because the device involved operates slowly relative to the speed of the processor.

Sockets can be configured to operate in one of two IO modes, blocking or nonblocking. The choice between these modes impacts on the scheduler's treatment of the owner process. If the socket is in blocking mode, then the process will be moved into the blocked process state whenever it has to wait for an event such as receiving a message, whereas if the socket is in nonblocking mode, the process can carry on with other activities while the event is pending.

The behavior of the communication protocol itself, either TCP or UDP, is unaffected by the socket's IO mode. The socket IO mode is discussed in more detail later.
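As a brief preview, switching a socket's IO mode is a single call. The book's examples use the Winsock call ioctlsocket(s, FIONBIO, &mode); the POSIX equivalent, sketched below, uses fcntl.

#include <fcntl.h>

/* Put a socket into nonblocking mode; returns -1 on failure. */
int set_nonblocking(int sock)
{
    int flags = fcntl(sock, F_GETFL, 0);              /* current mode flags       */
    if (flags < 0)
        return -1;
    return fcntl(sock, F_SETFL, flags | O_NONBLOCK);  /* add the nonblocking flag */
}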

3.7.1 The Socket API: An Overview

The socket Application Programmer Interface (API) is a set of library routines (called socket primitives) that a programmer uses to configure and use the TCP and UDP communication protocols from within an application.

Versions of the socket API are available on almost every platform and supported by almost every high-level programming language, making the TCPs and UDPs near-universally available to applications developers.

The main advantages of developing at the transport level are flexibility and control. At this level, the communication is broken down to the level of individual messages, which can be combined together as building blocks to create any protocol necessary. For example, higher-level communication mechanisms such as RPC, RMI, and middleware are all built on top of the TCP and thus developed using the socket primitives. Some applications require specific communication patterns and behaviors that are not provided by the existing application layer communication protocols. Using the socket API, it is possible to embed the custom communication protocol required for a particular application directly into the program code of each component. The case study game provides an interesting example of this approach.

However, building communication logic at the transport level is challenging as the developer needs to understand the low-level characteristics of communication and especially the types of error that can occur and the run time behaviors that these will cause and ultimately the impact these have on the reliability and correctness of the application itself. The developer must ensure that faults and failures are handled robustly, taking care to use the resources of the system efficiently, especially in terms of network bandwidth.

The socket API primitives are explained individually in the Appendix, where annotated code examples are provided for each primitive.

3.7.2 The Socket API: UDP Primitive Sequence

This section describes the typical use sequence of the socket primitives when implementing communication based on the UDP.

1. The socket primitive is used to create a socket. All of the other primitives require the identity of this socket when performing actions such as connect, send, and receive.
The socket's configuration is important; it can be set to operate as a TCP stream-based socket or as a UDP datagram-based socket. It can be configured to operate in either the blocking or nonblocking IO mode, using the ioctlsocket utility.

2. The bind primitive is used to map a process to a port (binding is discussed in detail later in this chapter).

3. The sendto primitive is used to send data to another process.

4. The recvfrom primitive is used to retrieve data from the receive buffer.

5. The closesocket primitive is used to close the socket.
A typical sequence for UDP communication is illustrated in Figure 3.32.


FIGURE 3.32 Sequence of primitive calls in a typical exchange between a pair of processes using UDP.

Figure 3.32 shows a possible sequence of primitive calls involved in a typical exchange between two processes using the UDP. Prior to communication, each process creates a socket, which serves as the logical endpoint for the communication that will take place. Each process then binds its socket to its local address (which comprises the computer's IP address and the port that the specific process will use to receive messages). This is necessary because when the process subsequently issues a recvfrom primitive request, the local operating system must have a mapping between the process and the port the process uses, so that the operating system can direct arriving messages to the appropriate process. The two processes now issue sendto and recvfrom requests in whatever sequence is necessary to achieve the application's communication needs. For example, if transferring a large file in a series of fragments, process 1 could send the first fragment and process 2 could send back an acknowledgment (there are no in-built acknowledgments provided by the UDP); this could be repeated several times until the file transfer is complete. Each process then closes its socket, using the closesocket primitive.
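To make the sequence concrete, the sketch below shows one side of such an exchange: a process that receives a file fragment and returns an application-level acknowledgment. It is written against the Winsock form of the socket API (hence the WSAStartup/WSACleanup calls), with error handling omitted for brevity; the port number and buffer size are arbitrary illustrative choices.

#include <winsock2.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);             /* initialize Winsock */

    /* socket primitive: create a datagram (UDP) socket */
    SOCKET s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    /* bind primitive: map this process to its local port so that the
       operating system can deliver arriving messages to it */
    struct sockaddr_in local = {0};
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);    /* any local interface */
    local.sin_port = htons(8000);                 /* assumed port number */
    bind(s, (struct sockaddr *)&local, sizeof(local));

    /* recvfrom primitive: retrieve a datagram from the receive buffer
       (in the default blocking mode, this waits until a message arrives) */
    char buf[1024];
    struct sockaddr_in sender;
    int senderLen = sizeof(sender);
    int n = recvfrom(s, buf, sizeof(buf), 0,
                     (struct sockaddr *)&sender, &senderLen);

    /* sendto primitive: return an application-level acknowledgment
       (UDP itself provides no in-built acknowledgments) */
    if (n > 0)
        sendto(s, "ACK", 3, 0, (struct sockaddr *)&sender, senderLen);

    /* closesocket primitive: close the socket */
    closesocket(s);
    WSACleanup();
    return 0;
}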

Note that in Figure 3.32 (and all similar figures in which arrows are used to depict communication events against a time line), the arrows are sloped in the time direction. This is to reinforce that all communication has some latency and thus the actual sending of a message always occurs a short time before the message arrives at its destination.

Activity C3 provides an introduction to the UDP and datagram buffering.

3.7.3 The Socket API: TCP Primitives Sequence

This section describes the typical use sequence of the socket primitives when implementing communication based on the TCP.

1. The socket primitive is used to create a socket.

2. The bind primitive is used to map a process to a port.

3. The listen, connect, and accept primitives are used to set up a connection.

4. The send primitive is used to send data to another process.

5. The recv primitive is used to retrieve data from the receive buffer.

6. The shutdown primitive is used to close the connection.

7. The closesocket primitive is used to close the socket.

Activity C3

Using the Networking Workbench to Explore the UDP and Datagram Buffering

Prerequisite

Download the Networking Workbench and the supporting documentation from the book's supplementary materials website. Read the document “Networking Workbench Activities and Experiments.”

Learning Outcomes

1. To gain an initial understanding of the UDP

2. To understand the concept of datagrams and the sending and receiving of messages without having to set up a connection

3. To gain an initial understanding of message buffering and how it applies to UDP datagrams

4. To gain basic familiarity with the Networking Workbench

Method

This activity is carried out in three steps:

1. Experimentation with UDP
Use two copies of the “UDP Workbench” application, within the Networking Workbench (on the UDP Tab), to investigate the basic communication functionality of UDP. Ideally, use two computers (but you can run both instances of the workbench on the same computer if need be). Set up a simple two-way communication in which each user can send and receive messages to/from the other.
Ensure that the unicast radio button is selected. You will need to configure the send IP address (Address 1) in each instance of the UDP Workbench to hold the address of the computer where the other copy is located. You will also need to set the send and receive ports appropriately. If running the two copies on separate computers, start with all the port numbers being the same; otherwise, set the ports as shown in the screenshot below. After you have set the receive port number, you need to click on the “Enable Receiver” button. The screenshot shows the configuration of two copies of the UDP Workbench on a single computer. A single message has been sent in each direction.

b03-06-9780128007297

2. Buffering (Part 1)
For this part, you will need to use one copy of the UDP Workbench (this will be used to send messages) and one copy of the Non-Blocking Receive, which is also available on the UDP tab of the Networking Workbench (this will be used to receive messages).
The two applications can be on the same or different computers, but ensure that address and port settings are appropriate before continuing (make sure that the sending IP address of the UDP Workbench is that of the computer where the Non-Blocking Receive resides and that the send port on the UDP Workbench is the same as the receive port on the Non-Blocking Receive).
Take care to follow the EXACT sequence of the steps set out below.
Start the Non-Blocking Receiver, but do not press any buttons.

1. Using the unicast addressing mode, use the UDP Workbench to send a message containing “A” to the receiver (the send port should be 8000).

2. Ensure that the receive port on the Non-Blocking Receiver is set to 8000, and click “Bind.”

3. Click “Receive” on the Non-Blocking Receiver—did it receive the “A”? If not, why not?

4. Send a message containing “B” to the receiver.

5. Did anything happen at the receiver?

6. Click “Receive”—what happens?
The result is different from that of step 3 above. Why?

7. Send a message containing “C” to the receiver.

8. Send a message containing “D” to the receiver.

9. Click “Receive”—what happens?

10. Click “Receive” again—what happens?
The screenshot below shows the configuration of the two applications on the same computer (after step 6).

b03-07-9780128007297

3. Buffering (Part 2)
Use the same configuration as for Buffering (Part 1) above.

1. Start the Non-Blocking Receiver, and press “Bind.”

2. Use the UDP Workbench to send a message many times to the receiver in quick succession.

3. Now press “Receive”—what happens? And again?

4. Describe what is happening here in terms of packet queuing and buffering.

5. Where do you think the buffer is held, at the sender or receiver end?

a. Devise a simple experiment to confirm your hypothesis. (Hint: what happens if the sender is shut down after it sends a message but before the message has been displayed by the receiver?)

6. Are the UDP datagram boundaries maintained throughout the entire transmission process (including any buffering that might occur), or can they be further divided or concatenated? In other words, are the messages kept separate, or are they merged when retrieved from the buffer? Try to confirm your answer through empirical investigation (i.e., carry out some more experiments to find out).

Expected Outcome

Through these experiments, you should have gained an initial understanding of how the UDP works and how datagrams are sent and received. The second and third parts of the activity focus specifically on the buffering behavior; you should have found that the datagrams are buffered on the receiver side and that they are kept separate in the buffer, so that each datagram must be retrieved one by one.

Reflection

As a result of doing this activity, you should be able to list, in correct sequence, the different actions that need to be performed in order to set up and use UDP communication between a pair of processes. Try to identify the steps required, starting from “each process creates a socket.” You may wish to rerun the activity to enable further exploration.

A typical sequence for TCP communication is illustrated in Figure 3.33.


FIGURE 3.33 A typical TCP exchange.

Figure 3.33 shows a typical sequence of primitive calls involved in the setting up, use, and shutting down of a TCP connection. Each process first creates a socket as the logical endpoint for the connection between the processes. The server side (process 1) then binds its socket to its local address (which comprises the computer's IP address and the port that the specific process will use to receive messages). If the bind primitive is successful, the server-side process then executes the listen primitive. The client side (process 2) subsequently executes the connect primitive, which causes a connection to be established between the two processes. The server side then executes the accept primitive to handle the specific newly created connection; this involves creating a new dedicated socket at the server side for use with the connection. The processes now communicate using send and recv requests in whatever sequence is necessary depending on the application's communication needs. The connection is then shut down by each side invoking the shutdown primitive. Each process then closes its socket, using the closesocket primitive.
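A corresponding minimal sketch of the server (process 1) side is shown below, again in Winsock form with error handling omitted; the port number is an arbitrary choice, and a real server would normally loop around accept so as to serve many connections.

#include <winsock2.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    /* socket primitive: create a stream (TCP) socket */
    SOCKET listenSock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    /* bind primitive: map this process to its local port */
    struct sockaddr_in local = {0};
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(9000);                /* assumed port number */
    bind(listenSock, (struct sockaddr *)&local, sizeof(local));

    /* listen primitive: await incoming connection requests */
    listen(listenSock, 5);

    /* accept primitive: creates a NEW dedicated socket for this
       particular connection */
    SOCKET conn = accept(listenSock, NULL, NULL);

    /* send and recv primitives operate on the connection socket */
    char buf[1024];
    int n = recv(conn, buf, sizeof(buf), 0);
    if (n > 0)
        send(conn, buf, n, 0);                   /* echo the data back */

    shutdown(conn, SD_BOTH);                     /* close the connection */
    closesocket(conn);                           /* then both sockets */
    closesocket(listenSock);
    WSACleanup();
    return 0;
}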

Activity C4 provides an introduction to the TCP and stream buffering.

Activity C4

Using the Networking Workbench to Explore the TCP, the TCP Connection, and Stream Buffering

Prerequisite

Download the Networking Workbench and the supporting documentation from the book's supplementary materials website. Read the document “Networking Workbench Activities and Experiments.”

Learning Outcomes

1. To gain an initial understanding of the TCP

2. To understand the concept of the TCP connection

3. To understand sending and receiving of messages using a connection

4. To gain an initial understanding of message buffering and how it applies to TCP data streams

Method

This activity explores the TCP and thus uses the TCP:Active Open and TCP:Passive Open applications (each found on the TCP tab of the Networking Workbench). The TCP:Active Open application initiates connection establishment; this is typically the client side of a client server application. The TCP:Passive Open application waits for connection requests; this would typically be the server side.

The activity is carried out in three steps.

1. Understand the behavior of the TCP primitives and the sequences in which these are used
The TCP API is implemented as a set of primitives (simple functions). These primitives (such as bind, listen, accept, connect, send, and recv) must occur in a valid sequence between the two endpoint processes, in order to establish and use a connection. For example, the Passive Open end must create a socket and then bind that socket to a port before certain other steps can occur.
This experiment is designed to enable students to investigate the effects of using the primitives in various sequences. It is intended that students identify the correct sequence through logical deduction.

1. Start two copies of the Networking Workbench, preferably at different computers (one for TCP:Active Open and one for TCP:Passive Open).

2. Use the applications to create connections, send and receive packets, and close connections.

3. Investigate the event sequences that lead to successful communication. Perform sufficient experiments to confirm your understanding.

The screenshot below shows the two processes once a connection has been set up and a single message has been sent in each direction.

b03-08-9780128007297

2. Understand the behavior of streams and stream communication

1. Start two copies of the Networking Workbench, preferably at different computers (one for TCP:Active Open and one for TCP:Passive Open).

2. Establish connection.

3. Set the active open end's socket to nonblocking.

4. Enable receive at the Active Open end, and send a packet to it.

5. Note that the packet is delivered as it was sent.

6. Disable receive at the Active Open end and then send two packets to it.

7. Enable receive.

Q1. What happens (what exactly is delivered to the application)?

Q2. What does this tell you about the nature of stream communication?

3. Understand connections and the relationship between listen and accept

1. Start two copies of the Networking Workbench, preferably at different computers (one for TCP:Active Open and one for TCP:Passive Open).

2. Create sockets at both the Active and Passive ends.

3. At the Active end, try to Connect—what happens and why?

4. Bind at the Passive end.

5. At the Active end, try to Connect—what happens and why?

6. Listen and Accept at the Passive end.

7. At the Active end, try to Connect—what happens and why?

8. Close connections and sockets at both ends.

9. Repeat steps 2 and 4, and then listen at the Passive end.

10. At the Active end, try to Connect—what happens and why?

11. Accept at the Passive end—what happens and why?
You should now have a better idea of the relationship between Listen and Accept.

Q1. State the role of each.

Q2. Why might it be important to separate these two steps of connection establishment (hint: think of typical process behavior and lifetimes in client server applications).

Expected Outcome

Through these experiments, you should have gained an initial understanding of how the TCP works, the nature of the TCP connection, and the steps required to create a connection. You should also have found that the stream segments become concatenated together when buffered on the receiver side so that the entire buffer contents can be retrieved in one go.

Reflection

As a result of doing this activity, you should be able to list, in correct sequence, the different actions that need to be performed in order to use the TCP. In particular, the sequence for creating a connection is important. Continue exploration until the concepts and event sequences are clear to you.

3.7.4 Binding (Process to Port)

Binding is the term used to describe the making of an association between a process and a port and is performed by the local operating system (i.e., the operating system of the computer where the particular process resides). The port is particularly important in the mechanism of receiving a message, as the operating system must know which process to pass a particular arriving message to. This is done by inspecting the port number in the received message's transport layer header (e.g., TCP and UDP both have a destination port number in their header). The operating system then looks at its port table to find a match. There will be an entry in the port table for each process-port relationship, so if a process is associated with the particular port number, the operating system will be able to get the process' PID and thus deliver the message to the process.

The bind primitive (discussed in outline earlier; see also the Appendix) is the means by which a process makes a request to its operating system for the use of a port. The operating system must check certain restrictions before granting the request; most importantly, it must ensure that the port is not already in use by another process. Each port must be mapped to only one process, because otherwise the operating system would not know how to deliver a message. Allowing two processes to use the same port at the same time would be analogous to having two houses in the same street with the same door number: how would the postman know how to deliver letters? Another analogy would be giving two people the same telephone number; it would be fine when they make calls, but what would happen when someone tries to call one of them?
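As a minimal sketch (assuming the Winsock API, in which a failed bind reports the error code WSAEADDRINUSE when the port is already owned; the function name and port values are illustrative, and WSAStartup is assumed to have been called already), a process discovers a refused port request as follows:

#include <winsock2.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

/* Attempt to take ownership of a UDP port on behalf of this process.
   Returns the bound socket, or INVALID_SOCKET if the request is refused. */
SOCKET bind_to_port(unsigned short port)
{
    SOCKET s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    struct sockaddr_in local = {0};
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);   /* any local interface */
    local.sin_port = htons(port);

    if (bind(s, (struct sockaddr *)&local, sizeof(local)) == SOCKET_ERROR) {
        if (WSAGetLastError() == WSAEADDRINUSE)  /* port already mapped */
            printf("Port %hu is already in use by another process.\n", port);
        closesocket(s);
        return INVALID_SOCKET;
    }
    return s;  /* the OS has added a process-port entry to its port table */
}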

You might at this point be asking why ports are needed at all. Why not address messages using the process' ID (the PID), which is created when the process is created (and thus is also known by the operating system), is guaranteed unique, and cannot change throughout the process' lifetime? On the face of it, this is a very good question.

The answer is that because the operating system allocates process IDs automatically, in a simple round-robin fashion, as processes are created, it is not possible to know in advance what the PID will be for any particular process. If you think back to the discussion of programs and processes in the Process View chapter, you will recall that a single program can be executed several times simultaneously, even on the same computer, giving rise to several processes (each having a different PID). Even if the program is only executed once at a time, the PID allocated to the process is far more likely to be different each time it is executed than to be the same, and if the same program is run on a number of different computers at the same time, each process will likely have a different PID. So although the PID is in all cases guaranteed to be locally unique to its process, a sending process located on a different computer cannot know what it is; if the PID were used as the delivery address, a lot of additional work would have to be done in order that the sender could find out the PID of a particular remotely running process.

In contrast, because the port number is supplied by the process itself when making a bind request to the operating system, it can be known in advance; in particular, it can be known when the application is written and thus can be embedded in the logic of both the sending and receiving programs. This is the reason why certain port numbers are "well known" and reserved for a particular application (see earlier discussion). As an example, an FTP server uses the well-known port numbers 20 (for data transfer) and 21 (for control and commands). An FTP client does not need to perform any complex lookup to find the PID of the remote file server process; it simply sends its messages to the appropriate port number. A new FTP client can be developed without any knowledge of the internal operation of the FTP service itself. As long as the FTP communication protocol is followed and ports 20 and 21 are used appropriately to address the server, the client will be able to connect to the FTP server and have its FTP commands actioned.

Arriving messages are filtered based on their destination IP address (this filtering is performed by the IP device driver), and for those that match the local machine's address, the operating system takes responsibility for delivering the message to the appropriate process. The operating system looks up the destination port number, identified in the transport-layer header within the message, in its port table. If there is an entry for the particular port number, then the message is passed to the appropriate process (identified by its PID in the port table). If the port number is not found in the port table, the message is dropped. This basic mechanism is illustrated in Figure 3.34.


FIGURE 3.34 Binding a process to a port.

Figure 3.34 illustrates the mechanism of binding a process to a port. In step 1, the process identified by its PID (17) issues a bind request, asking to use port number 1234. In step 2, the operating system checks to see if the port is already in use, and as it is not in use, it creates an entry in the ports table and returns “success” when the bind call completes (so that the process knows the port has been allocated to it). Subsequently, a message arrives, addressed to port 1234 (step 3). The operating system searches the port table and finds that process 17 is using this port number, so the message is held in a buffer for the process. If a message arrives for a port that does not have an entry in the port table (as in step 5), it is discarded (step 6). Storing the message would tie up precious space in the buffer and would be pointless because there is no process interested in the message. When process 17 issues a receive request (step 7), the message is passed to the process (step 8).

Although discussed in other sections, it is worth mentioning here that the illustration of the operation of bind (as shown in Figure 3.34) reinforces the important point that the operating system's buffering of arriving messages decouples the target process from the actual mechanism of receiving the message from the network. As long as the process has issued a successful bind request, the operating system will receive messages from the network on its behalf. This is important for two main reasons: (1) individual processes do not directly access or control the network adapter and the IP driver, and (2) the process may not be in the "run" state at the time the message actually arrives and in that case would not be able to move the message into a buffer itself.

Activity C5 explores binding.

Activity C5

Using the Networking Workbench to Explore Binding

Prerequisite

You should have completed Activity C3 and have gained a reasonable understanding of the UDP before attempting this activity.

Learning Outcomes

1. To gain an understanding of the need for binding processes to ports

2. To gain an initial understanding of the operation of bind

3. To use bind to establish a link between a process and a port

4. To understand why incoming messages are ignored before bind has been executed

5. To understand common reasons why bind may fail

6. To understand that the same port number can be reused on different computers, that is, it is the port and IP address combined that must be unique

Method

This activity is carried out in two parts:

1. The fundamental purpose of binding
Use two copies of the “UDP Workbench” application, both running on the same computer.
Set up the IP addresses and port numbers appropriately. Both IP addresses should be set to the address of the local computer. Set the send port of the first copy of the UDP Workbench to 8000 and its receive port to 8001. Set the send port of the second copy of the UDP Workbench to 8001 and its receive port to 8000.

1. BEFORE enabling the receiver at either copy, send a message from the first copy to the second copy.

Q1. What happens (does the message arrive)?

2. Now, enable the receiver at the second copy of the UDP Workbench.

Q2. Does the message arrive if you now enable the receiver (i.e., has it been buffered)?

3. Send a second message from the first copy to the second copy.

Q3. What happens (does the message arrive)?

Q4. What does this behavior tell you about the significance of binding?

2. Exclusive Use of Ports
Use two copies of the “Non-Blocking Receiver” (found on the UDP tab of the Networking Workbench), both running on the same computer.

1. Start a copy of the Non-Blocking Receiver and leave its receive port at the default value.

2. Press “Bind.”

3. Start a second copy of the Non-Blocking Receiver at the same computer. Ensure that its receive port value is the same as for the first copy.

4. Press “Bind.”

Q1. What happens? Why is this behavior not only correct but also necessary?

Q2. What do you think would have happened if the second receiver had been at a different computer?

Q3. Confirm your hypothesis empirically (check your answer by trying it out). What happens?

The screenshot below shows the situation after the second attempt to bind to the same port.

b03-09-9780128007297

Expected Outcome

From part 1, you should have discovered that messages sent to a port before the bind has occurred are ignored; that is, they are not placed in the buffer. This is because the operating system is not aware that the receiver process is expecting them, until it has bound to the particular port.

From part 2, you should have discovered that you can only bind one process per computer to any particular port. If two processes try to bind to the same port, the second attempt is refused. This is important because the operating system needs to know which process to pass a received message to, and it does this based on the port number in the message. If two processes were to be allowed to bind to the same port then the operating system would not know which one to pass an arriving message to.

From part 2, you should have also discovered that the same port number can be reused on different computers. For example, two processes on different computers can both bind to the same port number. It is the port and IP address combined that must be unique.

Reflection

It is very important to understand the way in which the operating system identifies the recipient processes by the port number contained in each message and the role that bind plays in making the association between the process and the port.

3.8 Blocking and Nonblocking Socket Behaviors

The discussion above provides an explanation of the basic mechanism of receiving a message and passing it to a process (see Figure 3.34). However, the way in which the message is passed to the process is affected by additional factors, thus leading to several variations in behavior, as discussed here.

Sockets can be configured to operate in two IO modes: blocking and nonblocking. This affects the way in which the operating system treats the socket's owner process if it attempts an operation that uses the socket and that cannot complete immediately. For example, the process may issue a recv request and have to wait for a message to arrive. The process must be in the run state when it actually issues the recv request (because it can only execute instructions when using the CPU). This means that the process must either be able to do something else useful while waiting for the message to arrive or must be moved into the blocked process state until the message arrives. The programmer chooses between these two approaches when deciding which socket IO mode to use.

A socket in blocking mode will cause its process to be moved to the blocked state, by the operating system, whenever a request cannot be completed immediately. This means that the process will be held at the point where the primitive request (e.g., accept, recv, or recvfrom) was made until the awaited event occurs. Put another way, think of the primitive request as being effectively a subroutine call; the call will not return until the event (such as receiving a message) has completed. There is no limit as to how long a process may have to wait.

A socket in nonblocking mode will cause an error message (type = "WOULD_BLOCK") to be returned by the operating system to the process whenever a request cannot be completed immediately. The important point here is that the process does not encounter any delay (it does not wait for the event to occur). Instead, the process must examine the error message received and determine its action accordingly.
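The fragment below is a minimal sketch (Winsock assumed; the helper function name is illustrative) showing how the IO mode is selected with the ioctlsocket utility and how the "would block" indication, which Winsock reports as the error code WSAEWOULDBLOCK, is treated as "no message yet" rather than as a failure:

#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

/* Poll a bound UDP socket once, without any risk of blocking.
   Returns the number of bytes received, or 0 if no message is waiting. */
int try_receive(SOCKET s, char *buf, int bufLen)
{
    u_long nonBlocking = 1;                 /* 1 = nonblocking, 0 = blocking */
    ioctlsocket(s, FIONBIO, &nonBlocking);  /* select the socket's IO mode */

    struct sockaddr_in from;
    int fromLen = sizeof(from);
    int n = recvfrom(s, buf, bufLen, 0, (struct sockaddr *)&from, &fromLen);
    if (n == SOCKET_ERROR && WSAGetLastError() == WSAEWOULDBLOCK)
        return 0;   /* not a real error: no message has arrived yet */
    return n;       /* n > 0: a datagram was copied into buf */
}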

Figure 3.35 shows the sequence and behavior of socket primitives when a message is sent by one process to another process that is using a blocking socket. In order that communication can take place, each process first creates a socket. This must be accompanied by a socket address structure populated with the appropriate address details; for the UDP example illustrated, the sending side must be configured with the IP address and port number of the process it is sending to. For the receiving side, the address structure must contain its own IP address and the port number on which the message will be received.


FIGURE 3.35 Behavior of socket primitives, in a UDP context, when the socket is set to blocking mode.

As the UDP is datagram-based, there is no need to establish a virtual connection (as there would be with TCP). As soon as the sender has created its socket, it can use it as the basis to make a sendto primitive call.

The receiving side must use the bind primitive call to take ownership of the required port (thus identifying the receiving process to the operating system and enabling the operating system to associate the process ID and the port number together in its port table). The figure shows the scenario where a message arrives at the receiving side before the bind has occurred; in this case, the operating system rejects the message as it has no entry in its port table and thus cannot identify the recipient process. Messages arriving after the bind has been completed are buffered and forwarded to the receiving process. Exactly how this occurs depends on the timing, with respect to the issuance of recvfrom calls. If the message arrives before a recvfrom has been issued, it is buffered and delivered upon issuance of the recvfrom call (as with the second message sent in Figure 3.35). If a recvfrom call has already been issued, the process will be in blocked state, and in this case, a message will be delivered to the process as soon as it arrives in the operating system's buffer. This has the effect of moving the process into the ready state.

Figure 3.36 shows the same sequence of primitive calls and with the same relative timing as in Figure 3.35; however, in this case, the socket is set to nonblocking mode. The important difference in behavior arises when the second recvfrom primitive call is made. In the nonblocking socket case, the call returns immediately and the process can continue in the running state, whereas in the blocking socket mode, the process was blocked until a message arrived.


FIGURE 3.36 Behavior of socket primitives, in a UDP context, when the socket is set to nonblocking mode.

Figure 3.37 shows an overview of the behavior of, and sequence of use of, the main socket primitives in a TCP context. As with UDP, the exact behavior depends on whether the socket is configured for blocking or nonblocking operation, and this leads to the same two types of behavior in cases when the process would have to wait for an event to occur: either the process is moved into the blocked state by the operating system, or an error message is returned to the primitive call that would otherwise have blocked. However, the TCP is more complex than UDP, especially in terms of the need to set up a connection prior to communication taking place and the need to close (tear down) the connection when communication is complete. Thus, it is not possible to clearly represent all of the possible timing sequences and resulting behaviors in a single diagram.


FIGURE 3.37 Behavior of socket primitives, in TCP context.

The TCP primitives that are of most interest in terms of the socket IO mode are recv and accept. recv is analogous to and exhibits the same respective behavior as UDP's recvfrom in each of the two socket modes. This is for the same underlying reason that it is not possible to know at the receiving side exactly when a message will be sent to it and thus when it will arrive (so there is always a likelihood that it will have to wait for it). Similarly, it is not known in advance when the accept primitive is issued whether a connection request is pending or when in the future such a connection request will arrive. Note however that the listen primitive only has to be issued once and its waiting behavior is handled at a lower level, within the TCP software.

3.8.1 Handling Nonblocking Socket Behavior

When a nonblocking primitive call cannot complete immediately, it returns with an error code. In most cases, this does not represent a real error in the sense that something has gone wrong, merely that the requested action could not be performed yet. Thus, a common requirement is for the application logic to wait a while and then try the action again. Perhaps the easiest way to achieve this is to use a timer that generates a software interrupt after a programmable time interval. This can be implemented as follows: when a primitive call such as recv returns with the "WOULD_BLOCK" error code, start a timer (configured for the appropriate time interval) so that the action can be tried again after the given time span. In the meantime (while the timer is counting down), continue with other processing, which is entirely application-dependent. When the timer expires, retry the primitive. This approach is illustrated in Figure 3.38.


FIGURE 3.38 Using a timer to facilitate autoretry for failed nonblocking primitive calls.

Figure 3.38 shows how a timer can be used to enable a process to repeat a failed action after a set time period. My preferred extension of this approach is to assume that, due to the asynchronous nature of network message receipt, there will generally not be a message waiting when the process requests it, and therefore to make the attempt to receive a message periodic by design, effectively treating the timer and event-handler mechanism as a separate thread of control that operates continuously in the background. By shortening the time period, I can make the process more responsive to message arrival (by checking for messages more frequently), but with the trade-off of increased processing overhead. This trade-off can be tuned so that the rate of checking for messages matches the typical rate of actual message arrival in a particular application. This approach is illustrated in Figure 3.39 and has been used in both the client and server components of the case study game; see later.


FIGURE 3.39 Periodic handling of nonblocking primitive events using a programmable timer.
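As a minimal sketch (assuming the Winsock API and reusing the hypothetical try_receive helper from the earlier nonblocking example), the loop below approximates the programmable timer of Figure 3.39 with a fixed delay between receive attempts; the interval constant embodies the responsiveness/overhead trade-off discussed above.

#include <winsock2.h>
#include <windows.h>
#pragma comment(lib, "ws2_32.lib")

int try_receive(SOCKET s, char *buf, int bufLen);  /* sketched earlier */

#define POLL_INTERVAL_MS 100   /* tunable: shorter = more responsive,
                                  but higher processing overhead */

void message_loop(SOCKET s)
{
    char buf[1024];
    for (;;) {
        int n = try_receive(s, buf, sizeof(buf));
        if (n > 0) {
            /* a message has arrived: handle it (application-specific) */
        }
        /* ...continue with other application processing here... */
        Sleep(POLL_INTERVAL_MS);   /* stands in for the programmable timer */
    }
}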

3.8.2 Communication Deadlock

A deadlock occurs in a system when a group of two or more processes each wait for resources that are held by other processes in the group, such that none can make progress and hence the resources are never released for the waiting processes to use them. This is explored in detail in Chapter 4.

Distributed deadlock is an extension of the deadlock problem that occurs in distributed systems, where the processes and resources involved in the deadlock are spread across more than one of the computers in the system.

Communication deadlock is a special case of distributed deadlock, in which the resources that the group of blocked processes are waiting for are messages from other processes in the group. See Figure 3.40.


FIGURE 3.40 Communication deadlock.

Figure 3.40 illustrates the problem of communication deadlock in which a pair of processes are each blocked while waiting for a message to arrive from the other one. Since each process is blocked, it will not be able to send a message, and thus, the messages that each process is waiting for will never arrive.

Fortunately, the situation can only arise when certain communication configurations are used and thus can be avoided by design. The requirements for communication deadlock to be possible are that the receive sockets are both configured in the blocking IO mode and that the send operations are on the same threads as the receive operations in both processes. In addition, both processes have to be waiting to receive at the same time for the deadlock to actually occur.

The easiest way to prevent communication deadlock by design is to ensure that at least one of the processes uses the nonblocking socket IO mode. Alternatively, use separate threads to handle sending and receiving actions such that the send thread can continue to operate while the receive thread is blocked.
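For illustration only, the deadlock-prone pattern can be sketched as below (Winsock assumed; conn is a connected TCP socket in blocking mode). If both communicating processes execute this sequence at the same time, each blocks inside recv and neither ever reaches its send.

#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

/* Deadlock-prone: if BOTH processes call this concurrently over blocking
   sockets, each is blocked in recv() and the sends are never executed. */
void deadlock_prone_exchange(SOCKET conn)
{
    char buf[128];
    recv(conn, buf, sizeof(buf), 0);   /* blocks until the peer sends...   */
    send(conn, "reply", 5, 0);         /* ...never reached: the peer is
                                          blocked in its own recv() too    */
}
/* Avoidance by design, as discussed above: set at least one socket to
   nonblocking mode (ioctlsocket with FIONBIO, shown earlier), or perform
   send and receive on separate threads within each process. */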

Activity C6 investigates blocking and nonblocking socket behavior.

Activity C6

Practical Investigation of Blocking and Nonblocking Socket Behaviors and Communication Deadlock, Using the Networking Workbench

Prerequisite

You should have completed Activity C4 and have gained a clear understanding of the TCP and how to establish a connection, before attempting this activity.

Learning Outcomes

1. To gain a basic understanding of the differences between blocking and nonblocking socket behavior

2. To understand what is meant by communication deadlock and how it can occur

Method

This activity is carried out in two parts:

1. Understand the Implications of Blocking Versus Nonblocking Communication

1. Start two copies of the Networking Workbench, preferably at different computers (one for TCP:Active Open and one for TCP:Passive Open).

2. Establish a connection between the Active and Passive ends.

3. Set BOTH connection sockets to nonblocking mode.

4. Enable both receivers.

5. Send and receive several messages in each direction between the two ends.

6. Set ONE of the connection sockets to blocking mode.

7. Send and receive several messages in each direction between the two ends. Do you notice any change in behavior from that in step 5 above? How do you account for this behavior?

2. Understand how deadlock can occur within communicating applications when both ends use blocking sockets

1. Start two copies of the Networking Workbench, preferably at different computers (one for TCP:Active Open and one for TCP:Passive Open).

2. Establish connection between Active and Passive ends.

3. Set BOTH connection sockets to blocking mode.

4. Enable both receivers.

5. Attempt to send and receive several messages in each direction between the two ends. What happens? How do you account for this behavior?

Expected Outcome

From part 1, you should have discovered the fundamental difference between blocking and nonblocking socket IO modes. When both sockets were set to nonblocking mode, you should see that both processes can send and receive in any order that the user chooses. However, when the socket is set to blocking mode, the process is unresponsive while waiting to receive a message (i.e., the process has been blocked by the operating system because it is waiting for an IO operation to complete).

From part 2, you should have encountered communication deadlock. When both processes have blocking sockets and are waiting to receive a message from the other one, they are each in blocked state; this means that neither one is able to send a message, and thus, they are both doomed to keep waiting indefinitely for a message that can never be sent and thus will never arrive.

Reflection

The IO mode is one of the more complex aspects of sockets. However, from this activity, you should appreciate how the choice of mode can have significant effect on the behavior of processes, and thus, it is an aspect well worth mastering early on. You need to understand the different behaviors arising from each of the two modes and also be able to determine when each mode is appropriate to use.

You can also explore the socket IO modes with the UDP, within the workbench. Use the UDP Workbench as the sending process and experiment with each of the Blocking Receiver and the Non-Blocking Receiver applications (all available on the UDP tab).

3.9 Error Detection and Error Correction

Communication protocols in general are designed to detect errors that occur in transmission so that corrupted messages can be rejected. The actual detection of corruption is usually based on a checksum technique, in which a low-cost numerical representation of the data is generated at the sending side and included in the header of the communication protocol. On arrival at the receiver, the checksum is again generated from the received data and compared with the value included in the message. If there is any difference, the message is considered corrupted and is discarded. Reliable protocols go a step further by signaling that a retransmission is needed. Forward Error Correction (FEC) codes, in contrast, carry additional information to enable reconstruction of the data at the receiver side, without the need for retransmission.

There is a clear trade-off between the two techniques. Error detection incurs additional latency while a retransmission is requested and the original message is resent. This is generally suitable where errors occur infrequently and the occasional added latency can be tolerated. Error correction incurs additional overhead in every message, so it is costly in systems where errors occur rarely, but it is ideal where errors occur very frequently (such as over very unreliable links), because in such situations, even if there is a retransmission mechanism in place, there is no guarantee that the resent message will not also be corrupted. Error correction avoids the additional latency incurred by retransmission, which can be very important in high-speed services but is perhaps even more important where there are high propagation delays due to long-distance connections. A very good example of this is space exploration: messages sent to spacecraft or robot missions to other planets take a very long time to propagate due to the large distances involved, making retransmissions highly undesirable.

Error-correcting codes are also important where the messages have very high value, for example, control signals used in fly-by-wire systems, or when sensing dangerous environments such as the internal state of a nuclear reactor. In such scenarios, it is important to ensure that a message can be understood despite a limited number of bit errors, thus being able to deal with the information promptly, avoiding the delay of having to resend the message.

3.9.1 A Brief Introduction to Error Detection and Error Correction Codes

When designing an FEC code, a main consideration is the number of bit errors that the code will be able to correct; in other words, the number of bit transitions (0-to-1 or 1-to-0) that can occur within a transmitted bit stream of a given length while still allowing the original data value to be determined.

Consider the 8-bit data value 11001100. A single bit error could give rise to, for example, 10001100, 11011100, or 11001110. Two bit errors could give rise to, for example, 00001100, 11101110, or 01001101.

The number of bits that are different in the data (from the original to the modified versions) is called the Hamming distance (d); the single-bit error examples shown above have a Hamming distance of 1 (i.e., d = 1), while the two-bit error examples have a Hamming distance of 2 (i.e., d = 2).

If all bit combinations in a particular code are possible correct values, then the value of d for the data code is 1 and there is no redundancy; that is, a single bit change shifts the code from one legal value to another and therefore cannot be detected as an error. Consider a 4-bit code that holds a Customer ID (CID) and that CIDs can range from 0 to 15 (decimal). The following binary values are all acceptable: 0000, 0001, 0010, 0011, …, 1111. If, during transmission, a single bit error occurs in any of these values, the resulting code (which is wrong) has the value of another acceptable code and is thus undetectable as an error. For example, 0010 can become 0011, which are both valid codes, and thus, the receiver of the value 0011 cannot tell if this is the value that was actually sent or if an error has occurred turning a different value into 0011.

To perform error detection or correction on an original data message, additional information must be added at the point of transmission. The additional information is a form of overhead (also referred to as “redundant bits”) since it must be transmitted as part of the new larger message but does not carry any application-level information.

The simplest way to detect a single bit error in the 4-bit code is to use parity checking, in which case one additional bit (the parity bit) must be added. For every four data bits transmitted, a fifth parity bit must be transmitted, so the overhead is 20%. Alternatively, you can consider the efficiency to be 80%: 80% of the bits transmitted carry application data, while the remaining 20% are redundant from the application viewpoint.

Consider “even parity” in which the number of “1” bits must be even in the 5-bit code that is transmitted. CID values 0000, 0011, and 1001 are examples where the number of “1” bits is already even; thus, a parity bit of “0” will be added, turning these values into 00000, 00110, and 10010, respectively. CID values 0001, 0010, and 1101 are examples where the number of “1” bits is odd; thus, a parity bit of “1” will be added, turning these values into 00011, 00101, and 11011, respectively.

If a single bit error now occurs, for example, 00110 becomes 00100 during transmission, the error will be detected by the receiver because the parity check will fail (the number of "1" bits is no longer even). Notice that although the error is detected, it cannot be corrected, because there are many valid codes that a single bit error could have turned into the received value 00100, including 00101, 10100, and 01100.
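The parity mechanism can be sketched in a few lines of C; the function names are illustrative, and the code works on the 4-bit CID code described above.

#include <stdio.h>

/* Append an even-parity bit to a 4-bit value, producing a 5-bit codeword
   in which the total number of 1 bits is always even. */
unsigned add_even_parity(unsigned nibble)          /* nibble: 0..15 */
{
    unsigned ones = 0;
    for (unsigned i = 0; i < 4; i++)
        ones += (nibble >> i) & 1u;                /* count the 1 bits */
    return (nibble << 1) | (ones & 1u);            /* parity bit is the LSB */
}

/* Receiver-side check: 1 if the 5-bit codeword has even parity,
   0 if a single bit error (or any odd number of errors) is detected. */
int parity_ok(unsigned codeword)
{
    unsigned ones = 0;
    for (unsigned i = 0; i < 5; i++)
        ones += (codeword >> i) & 1u;
    return (ones & 1u) == 0;
}

int main(void)
{
    unsigned cw = add_even_parity(0x3);              /* 0011 -> 00110 */
    printf("intact:    %d\n", parity_ok(cw));        /* prints 1: passes */
    printf("corrupted: %d\n", parity_ok(cw ^ 0x2));  /* 00100: prints 0 */
    return 0;
}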

To achieve error correction, additional redundant bits must be added such that there is only one valid data code that could be translated into any particular received value, as long as the number of bit errors does not exceed the number supported by the error correction code. The theory aspect of error correcting codes is a complex subject, so the discussion will be limited to a simple example based on the triple modular redundancy technique to provide an introduction.

In a triple modular redundancy code, each bit is repeated three times, that is, transmitted as 3 bits, so each "0" becomes "000" and each "1" becomes "111." There are only two valid code patterns per three-bit sequence that can be transmitted, and it is not possible for a single bit error to convert one valid code into another. A single bit error turns 000 into one of 001, 010, or 100, all of which are closer to 000 than to 111 and thus are all interpreted as a 0 data value. Similarly, a single bit error turns 111 into one of 110, 101, or 011, all of which are closer to 111 than to 000 and thus are all interpreted as a 1 data value. Thus, a single bit error can be automatically corrected, but at the cost of a significantly increased message size. For the triple modular redundancy technique, the overhead is 67%; in other words, the code efficiency is 33%: only 33% of the bits transmitted carry useful data, while the remaining 67% are redundant. Fortunately, more efficient error-correcting codes do exist.

If triple modular redundancy were applied to the earlier customer ID example, each 4-bit CID value would be transmitted as a 12-bit message. For example, 0001 would be transmitted as 000000000111, and 0010 would be transmitted as 000000111000.
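The following C sketch (function names illustrative) implements this scheme for the 4-bit CID code: encoding triples each bit, and decoding takes a majority vote within each 3-bit group, so any single bit error per group is corrected automatically.

#include <stdio.h>

/* Triple modular redundancy: a 4-bit value becomes a 12-bit codeword. */
unsigned tmr_encode(unsigned nibble)               /* nibble: 0..15 */
{
    unsigned code = 0;
    for (int i = 3; i >= 0; i--) {
        unsigned bit = (nibble >> i) & 1u;
        code = (code << 3) | (bit ? 7u : 0u);      /* 1 -> 111, 0 -> 000 */
    }
    return code;
}

/* Decode by majority vote within each 3-bit group. */
unsigned tmr_decode(unsigned code)
{
    unsigned nibble = 0;
    for (int i = 3; i >= 0; i--) {
        unsigned triple = (code >> (3 * i)) & 7u;
        unsigned ones = (triple & 1u) + ((triple >> 1) & 1u)
                      + ((triple >> 2) & 1u);
        nibble = (nibble << 1) | (ones >= 2);      /* majority vote */
    }
    return nibble;
}

int main(void)
{
    unsigned code = tmr_encode(0x2);               /* 0010 -> 000000111000 */
    code ^= 1u << 4;                               /* inject one bit error */
    printf("decoded: %X\n", tmr_decode(code));     /* still prints 2 */
    return 0;
}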

3.10 Application-Specific Protocols

An application-specific protocol describes the structure, sequence, and semantics of the communication between the components of a distributed application. This is necessary when applications communicate directly at the transport layer or otherwise have communication requirements beyond the functionality provided by the standard protocols in the application layer.

An application-specific protocol is designed for a particular application and defines the sequence of messages, and the contents of those messages, necessary to achieve the communication requirements of the application.

The application-specific protocol is not the same as an application-layer protocol, and this is a very important distinction. The application-layer protocols are a set of standard protocols that provide clearly defined communication services, such as transferring a file (FTP), retrieving a WWW page (HTTP), and facilitating remote login (Rlogin), among many others (some of which were discussed earlier). These all reside in the application layer of the network stack. However, a specific application may require a unique mix of different types of communication. For example, it may require that an initial connection is made using TCP to exchange credentials, check security and access rights, and establish the actual communication requirements for a specific data exchange. An eCommerce application may involve access to some web services, transfer of a specific file, and/or retrieval of data from one or more databases. It may use a time service such as NTP to provide trusted timestamps for transactions (i.e., not trusting the local computer's clock). A multimedia service may use HTTP as its main interface but, depending on user selections, may require access to various databases or may need to transfer files or stream some real-time data; this may require use of FTP, Rlogin, and others. A networked game may use TCP or UDP directly to achieve the main login and in-play game-data transfers but may also require other protocols, for example, to stream some content or to provide players with an in-built text-chat or email facility. Thus, the application itself must be considered to be "above" the application layer and not "within" this layer, because it uses the resources of the application layer.

The application-specific protocol will build on the facilities provided by the underlying communication protocol. So, for example, when using TCP, the application-specific protocol can rely on the transport protocol (i.e., TCP) to deliver the messages in the correct sequence, because TCP has in-built acknowledgment-based recovery and sequence number-based message ordering. However, if the same application were redesigned to use UDP instead as the transport layer protocol (e.g., because of UDP's broadcast capability), then the application would have to take care of message recovery and sequencing itself because UDP does not provide these functions. This would require that the application implements its own sequence numbers and also sends and monitors acknowledgments for packet arrival.
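As a concrete illustration (every name and size here is hypothetical rather than part of any standard), an application-specific protocol layered over UDP might define a PDU along the following lines, carrying its own message type and sequence number so that the application can implement its own acknowledgment and ordering:

#include <stdint.h>

#define MSG_DATA 1     /* carries a payload fragment                  */
#define MSG_ACK  2     /* acknowledges receipt of a numbered fragment */

#pragma pack(push, 1)                   /* fixed wire layout, no padding */
typedef struct {
    uint8_t  type;                      /* MSG_DATA or MSG_ACK           */
    uint32_t sequenceNumber;            /* detects loss and reordering   */
    uint16_t payloadLength;             /* valid bytes in payload[]      */
    uint8_t  payload[512];              /* application data              */
} AppMessage;
#pragma pack(pop)

A sender would increment sequenceNumber for each MSG_DATA message, convert the multibyte fields with htonl/htons before transmission, and retransmit any fragment for which no MSG_ACK arrives within a timeout, thereby recreating, in simple form, the recovery and ordering that TCP would otherwise provide.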

3.11 Integrating Communication with Business Logic

When developing distributed applications, it is necessary to integrate the communication events (send and receive) with the application's business logic. The way in which the application functionality is split across the various components will be a major factor in determining the nature of the communication between the components. Handling communication aspects brings a new dimension to application development and can introduce significant additional complexity, which must be managed without compromising the quality by which the business logic is designed and implemented.

Developers may prefer to build the two aspects of systems separately and then combine them once each part has reached a certain level of maturity (e.g., initially invoking business-logic functions from local "stub" interfaces rather than making calls from remote components and/or testing communication mechanisms in skeletal code that does not contain the business logic).

A mapping is required for each component, detailing the way in which internal events cause messages to be sent to other components and also detailing the way messages received from other components are handled and the way such messages drive local behavior. For the purposes of a simple illustration, consider a distributed application in which a user-local client component provides a user interface and depending on the user's actions causes contextual processing to occur remotely (at a server component). In this scenario, a user clicking a certain button on the client interface might cause a particular message to be sent to the server. Figure 3.41 shows one way to represent the addition of communication between a pair of components.


FIGURE 3.41 Representation of application-level logic showing the client and server components, loosely synchronized by the passing of messages across the network.

Figure 3.41 illustrates a scenario in which an application operates as a pair of processes distributed across two computers. The two application processes operate independently, loosely synchronized by the passing of messages. The figure provides an example of how application logic can be divided across two components, coupled together by message transmission. In this case, a GUI-based client provides the user interface to the application whose business logic is mostly located at the server side. User requests arising from interface interaction are passed in network messages to the server process, which provides the business logic to deal with the requests. Each message contains a “type” field that signals to the server the meaning of the message, and thus, the server is able to divide the messages based on this application-specific type value, for contextualized processing. The figure emphasizes the lack of direct coupling between the processes. Messages sent by one process are initially buffered by the host operating system of the other process, this being a requirement due to the multiprocessing scheduling in use and the fact that the server process may not be running at the instant the message arrives. The server process pulls messages from the receive buffer using one of the receive primitives: recv (if using TCP) or recvfrom (if using UDP). Despite the loose synchrony between the processes, it is important to realize that they remain independent entities.
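A sketch of this server-side dispatch, reusing the hypothetical AppMessage structure and MSG_* type values introduced in Section 3.10, might look as follows (the function name is illustrative, and the message is assumed to have been pulled from the receive buffer with its fields already converted to host byte order):

/* Dispatch an arriving message to the appropriate business-logic routine,
   based on the application-specific type field. */
void handle_message(const AppMessage *msg)
{
    switch (msg->type) {
    case MSG_DATA:
        /* contextualized processing for an application data message */
        break;
    case MSG_ACK:
        /* record that the fragment with this sequence number arrived */
        break;
    default:
        /* unknown type: discard (and possibly log) the message */
        break;
    }
}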

3.12 Techniques to Facilitate Components Locating Each Other

One of the main challenges in the operation of distributed applications is to enable components to locate each other. Ideally, this should be automatic and transparent to users. Applications and their subcomponents generally need to operate in the same manner regardless of their relative locations, and it is not realistic to design-in location details such as the IP address, as these will change whenever a component is relocated or if the network itself is restructured. This implies that the mechanism of locating communication partners needs to be dynamic, reflecting the state of the system at the moment when the connection needs to be established. Typically, the active component (the client in a client server application) will contact the passive component (the server). In order to set up a TCP connection or to send a UDP datagram, the sender must first know the IP address of the server. Several techniques by which this can be achieved, each having specific strengths and weaknesses, are described below:

• Service advertising: A broadcast message with the meaning “I am here” is sent periodically by the server. The message contains the IP address and port number and possibly other information such as a description of services offered. This technique is highly effective in local environments but does not work beyond the first router that blocks the broadcast. It is efficient in the sense that the one message may be heard by many recipients, but inefficient in the sense that the transmission continues indefinitely even when there are no curious clients present.

• Use of local broadcast to find a server: A client, upon startup, broadcasts a message with the meaning "where are you." The server responds with a unicast message containing its IP address and port number, and on receipt of this, the client has the necessary information to contact the server. As with service advertising, the "where are you" technique is highly effective in local environments but is limited by the local range of the broadcast. However, this technique can be more efficient than service advertising, because the request is only sent when needed, and it can be more responsive, because the client does not have to wait for a periodic broadcast from the server; with the "where are you" approach, the server should respond promptly. (A code sketch of this technique is given after this list.)

• Hard coding: Embedding an IP address is only suitable for prototyping and testing communication aspects of systems. It is the most secure but least scalable approach, as the client application requires recompilation if the server address changes. Hard coding should be avoided in general.

• Asking the user: Requiring that the user provide the address details of the server via the user interface of the client has very limited applicability, as it assumes that the user knows what an IP address is and how to find it for the computer hosting the server. It is useful in development environments when the developer needs to focus on the business logic of the application and/or, for example, when there are multiple prototype servers to choose from as part of a testing regime.

• Lookup file: Upon startup, the client reads a locally held file to find the address or domain name details of the server. Storing configuration details in a client-local file offers security and should be considered for use with very sensitive applications in which accidental connection to a spoof service represents a significant threat. This approach can also be used as a simple form of access control, as the configuration file is only given to users with the appropriate rights to use the service, such as having security clearance or having paid a license fee. This approach is far superior to hard coding, because only the configuration file needs to be updated when the server address changes; the client application does not need recompilation. However, the lookup file approach is not very scalable due to the need to update and propagate the configuration files.

• Name service: A name service maintains a database of services and their locations. When component A needs to find the address of component B, it sends a request to the name service, supplying the name of component B as a parameter. The name service replies with the address details of component B. This is a superior and location-transparent approach because the location-finding aspect is handled externally to the user applications and is kept updated by techniques such as service registration. For example, upon startup, a server sends a message to the name service informing it of its location and the services it offers; these details are then added to the name service's database. The most common and important example of a name service is the Domain Name System (DNS), which is discussed in Chapter 6.

• Middleware: Middleware is a collection of services that sits between software components and the platforms they run on to provide various forms of transparency to applications. Middleware usually provides location transparency by means of a built-in name service or subscription to an external service such as DNS. Middleware is discussed in detail in Chapter 6.
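To illustrate the service advertising technique described above, the following is a minimal sketch of a broadcast advertisement sender, written in Winsock-style C++. The message content, advertisement port number, and the one-second interval are illustrative assumptions, not part of a specific system:

SOCKET AdvertSock = socket(AF_INET, SOCK_DGRAM, 0); // UDP socket for the advertisements
BOOL bBroadcast = TRUE;
setsockopt(AdvertSock, SOL_SOCKET, SO_BROADCAST, (char*)&bBroadcast, sizeof(bBroadcast)); // Permit broadcasting
sockaddr_in BroadcastAddr;
memset(&BroadcastAddr, 0, sizeof(BroadcastAddr));
BroadcastAddr.sin_family = AF_INET;
BroadcastAddr.sin_port = htons(8010); // Hypothetical advertisement port, known to clients in advance
BroadcastAddr.sin_addr.s_addr = htonl(INADDR_BROADCAST); // The local broadcast address
char szAdvert[] = "I am here 192.168.1.20:8000"; // Hypothetical content: the server's IP address and service port
// The following send would be repeated periodically, for example, once per second:
sendto(AdvertSock, szAdvert, sizeof(szAdvert), 0, (const sockaddr*)&BroadcastAddr, sizeof(BroadcastAddr));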

The use of well-known ports (as discussed earlier) contributes to transparency, as for many popular services the client will know the (fixed) port number of the service, and in such cases, the port number can often be hard-coded since these values are standardized.
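As an illustration of name-based location, the following minimal sketch resolves a server's domain name via DNS using the standard getaddrinfo() call and then connects to the returned address. The hostname and port shown are hypothetical, and error handling is abbreviated:

addrinfo Hints;
memset(&Hints, 0, sizeof(Hints));
Hints.ai_family = AF_INET;        // IPv4
Hints.ai_socktype = SOCK_STREAM;  // TCP
addrinfo* pResult = NULL;
// Resolve the (hypothetical) server name; "8000" is the service/port string
int iError = getaddrinfo("gameserver.example.com", "8000", &Hints, &pResult);
if (0 == iError && pResult != NULL)
{
    SOCKET ClientSock = socket(pResult->ai_family, pResult->ai_socktype, pResult->ai_protocol);
    connect(ClientSock, pResult->ai_addr, (int)pResult->ai_addrlen); // Connect to the resolved address
    freeaddrinfo(pResult); // Release the address list when no longer needed
}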

3.13 Transparency Requirements from the Communication Viewpoint

Several forms of transparency are particularly relevant from the communication viewpoint, as discussed below.

Access transparency. Network-to-device boundaries should not be visible to applications and the communications mechanisms they use. Requests to services should have the same format and should not require any different actions at the application level, regardless of whether the communication partner is local or remote.

Location transparency. Network resources, especially communication partners, should be automatically locatable. A location-transparent system enables connecting to and sending a message to a process without prior knowledge of its physical location.

Network transparency. This involves hiding the physical network characteristics, such as the actual technology and its configuration, and faults that occur at the physical network level, such as bit errors and frame corruption. Ideally, the presence of the network itself is hidden, such that the separation of components and the heterogeneity of the underlying platforms are hidden.

Distribution transparency. This concerns hiding the separation between components, such that the entire application appears to be running on a single computer. All communication should appear to be local, that is, between pairs of processes that are local to each other.

Failure transparency. Some applications require reliable transmission of data and thus need a protocol that ensures that data are received at the intended destination or otherwise are automatically retransmitted, without involvement by the application itself.

Scaling transparency. Communication mechanisms and protocols need to be efficient in terms of network resource usage to maximize the scalability of communications aspects of systems. This in turn impacts on the number of communicating components that can be supported with appropriate quality of service before the system performance degrades noticeably.

3.13.1 Logical and Physical Views of Systems

A physical view of a system is based on actual physical details such as the location and configuration of devices and is related to concepts at the lower layers of the network model. In contrast, a logical view is based on concepts in the higher layers.

The goals of transparency in distributed systems require that physical details are hidden from the user. It is perhaps more accurate to say that users of the system (including the developers of applications) need to be shielded from the underlying complexity of systems, thus the need to provide an abstract or logical view. From a communication viewpoint, it is very important to be able to separate out logical and physical views of the system and to use each where appropriate.

For distributed systems, perhaps, the most important topic in the context of logical and physical views is connections. There is a need to achieve connectivity at the software component level without having to know the actual physical locations, details of underlying platforms, network technologies, and topologies. A developer needs to ensure that a particular group of components can interact, through the use of higher-level mechanisms provided in systems, without having to take into consideration all of the possible network technologies and configurations that could occur. This is quite significant because some of the network technologies that the application will eventually operate over may not even have been invented at the time the application is developed.

Therefore, when developing distributed systems, it is necessary to use a combination of logical and physical representations of systems and to be able to translate between them as necessary. Most or all of the application-level design, especially communication- and component-level connectivity, is necessarily done on a logical basis to abstract away the physical aspects, which are specific to a particular computer, network technology, or system.

3.14 The Case Study from the Communication Perspective

A main consideration is the choice of transport layer communication protocol for the game application. The relative advantages and disadvantages of TCP and UDP need to be considered in the context of the specific communication requirements of the game. The choice of transport layer protocol in turn impacts other aspects of design and implementation.

In the case of the game, all messages are valuable; any loss would result in incorrect game status or play sequence. This indicates that a reliable protocol such as TCP should be used; otherwise, if UDP is chosen, additional mechanisms must be implemented at the application level to take care of message sequencing, acknowledgment, and recovery if a message is lost. TCP is the better choice in this scenario and has been used in the implementation.

Socket IO modes. It was decided that both components should use nonblocking sockets. For the client (user-interface) component, this solves the issue of responsiveness without the need to make it multithreaded. For the server, this approach makes it responsive to many connected clients simultaneously, again without the need for multithreading. For this particular application, multithreading was considered to be unnecessary, and the avoidance of multithreading makes it a simpler and more suitable example to illustrate the communication aspects. The fact that at least one side of the communication is nonblocking means that the game is not susceptible to communication deadlock.

Having chosen the transport layer protocol, it is necessary to define the application-specific protocol for the game. This is a particular set of allowable message sequences, unique to the game, which facilitates the operation of the game.

The specific game application requires the following activity sequence:

• The server is started and runs continuously as a service.

• A user starts an instance of the client when they wish to join in.

• The client connects to the server.

• The user chooses an alias name, which is sent to the server and propagated to other clients as part of a list of all available players.

• The user selects an opponent from the advertised list of available players.

• The game is played, each user taking turns. The server mediates by determining whose turn it is and keeps track of the game state to determine if the game has ended and, if so, what the outcome is. The server propagates a user's gameplay moves to the user's opponent, so their interface can be updated.

• Users are notified of the game result by the server.

• The client's connection to the server is closed.

Each of these activities involves sending messages between the game components to update state and synchronize behavior. The actual message sequence that arises from the activity sequence is illustrated in Figure 3.42.

f03-42-9780128007297

FIGURE 3.42 The game application-specific protocol, showing interaction between three clients and the server.

Figure 3.42 shows a typical message sequence that occurs in a scenario in which three clients connect to the server and a game is then created between two of the clients. The server can support up to ten connected clients. It does this by maintaining an array of structures, each of which represents a single connected client (or is unused). The structure contains the client socket, its related socket address structure, the user's chosen alias name, and a flag to signify whether the instance of the structure is in use or is available for use when the next client connect request is received. The in-use flag also enables functionality such as iterating through the array of structures and sending player-list messages to all connected clients.

Figure 3.43 shows some of the game state information maintained by the server. The connection structure (which is discussed above) is shown and is central to the operation of the server. The server has an array of these connection structures so that it can support up to 10 client connections simultaneously. A logical game object is used to keep track of game instances that exist between pairs of clients. This is represented in the code by the game structure; an array of 5 of these is maintained because each game is shared between 2 clients. The game structure holds the connection array index positions of the two involved clients and the gameplay state in terms of the actual tic-tac-toe grid.

f03-43-9780128007297

FIGURE 3.43 The connection and game structures.
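A minimal sketch of these two structures, using hypothetical type and field names consistent with the description above (the names and sizes in the actual case study code may differ):

#define MAX_CONNECTIONS 10 // The server supports up to ten connected clients
#define MAX_GAMES 5        // Two clients per game, hence five games

struct Connection // One entry per connected client (or unused)
{
    SOCKET clientSocket;       // The socket used to communicate with this client
    sockaddr_in clientAddress; // The client's socket address structure
    char szAlias[32];          // The user's chosen alias name (size assumed)
    bool bInUse;               // Signifies whether this entry is in use
};

struct Game // One entry per logical game between a pair of clients
{
    int iPlayerConnection[2];  // Connection-array index position of each of the two clients
    char cGrid[9];             // The tic-tac-toe grid holding the gameplay state (flat layout assumed)
    bool bInUse;               // Signifies whether this game is in progress
};

Connection g_Connections[MAX_CONNECTIONS];
Game g_Games[MAX_GAMES];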

The message sequence shown in Figure 3.42 describes the communication behavior of the application but is not a full description of the application logic. The application comprises two programs: a server and a client; an in-progress game comprises three processes: a server process and two client processes. (It is important to realize that the two client processes are instances of the same client program and thus have the same logic, although their run-time state will differ because they represent two different players in the game.) Flowcharts for the two programs are provided in Figures 3.46 (client) and 3.47 (server).

The choice of TCP as the transport layer protocol has another advantage, in addition to the earlier discussion, in that only the server needs to bind to a specific port; that is, one that is known in advance by the clients so they can issue connect requests. The fact that the clients don't need to use any particular port means that they can be coresident with the server and with each other on the same computer. This is ideal for this particular application because its fundamental purpose is the demonstration of process-to-process communication and the fact that applications should operate in the same way regardless of the physical location of components. The game can be configured with all three components on a single computer, with two components on one computer and the other component on a separate computer, or with all three components on separate computers. Obviously, for a real distributed game, the normal configuration would be that each user sits in front of their own computer (hence each one running the client process locally), and the server can be hosted on any computer in the network.

As TCP is used, all messages are sent in unicast mode. PLAYER_LIST messages are sent to each connected client in rapid succession, creating the effect of a broadcast.

Figure 3.44 shows the receive logic for the game client and illustrates how careful selection of message content and structure facilitates well-structured receive-handler code. The outer-level switch statement deals with the message type, while, where relevant, inner switch statements deal contextually with the actual message data.

f03-44-9780128007297

FIGURE 3.44 The receive logic of the client (C++), showing nested switch statements to provide contextual handling of the message based on message type and content.

The first step after the call to the recv primitive is to check for errors that may have occurred. Since a nonblocking socket is used, the WSAEWOULDBLOCK error is not a true error; it signals that there was no message in the buffer and thus that the call would otherwise have blocked (instead, it returns this error code). This is a normal occurrence, and the response in the code is to restart the interval timer and exit this instance of the handler (after 100 ms, the timer will expire, and the receive handler will be entered again). If, however, another error has occurred, the StopTimerIfConnectionLost() method is called to check whether the error code is one of several that indicate that the TCP connection has been lost (e.g., because the server has been shut down), and if so, the client is shut down gracefully. A further check on the received message length is performed; receipt of a zero-byte message (as opposed to no message) is also taken to imply that the connection has been lost, in the context of the communication logic of this game.

Next, the message type field is checked, and the message processing is performed contextually based on the specific message type code; this is performed with a switch statement, which ensures good code structure. The specific actions followed for each message type that can be received by a client are discussed below, and a skeletal sketch of the receive handler follows the list:

• PLAYER_LIST: Display the list of available players, sent by the server, in the player list box on the user interface.

• START_OF_GAME: If the server has signaled that a game has started, a number of things must be done at the client, driven by the message content: the opponent's alias is displayed on the user interface; the game tokens for the user and the opponent are set, for display as the game progresses; the user is told either that it is their move or that it is the opponent's move (this game sequencing is controlled by the server); and the game status displayed value is set to “Game In Progress.”

• END_OF_GAME: A second-level switch statement is used to provide a contextual end-of-game on-screen notification depending on the message's end-of-game code, which can indicate that the user has won or lost or that the game was a draw.

• OPPONENT_CELL_SELECTION: This signals that the opponent has made a move. This also uses a second-level switch statement to update the user's display by drawing the opponent's token in the cell of the game grid identified by the iCell field of the message. This signaling by the server that the opponent has made a move is also used as a trigger to enable the user interface to accept a move, through a call to the method EnableEmptyCells(), and the user is told it is their move.
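The following skeletal sketch, in Winsock-style C++, illustrates the shape of the client receive handler just described. StopTimerIfConnectionLost() and EnableEmptyCells() are named in the text; the remaining helper functions, and message field names other than iCode and iCell, are hypothetical placeholders rather than the actual case study code:

void DoClientReceive()
{
    Message_PDU Message; // Application-defined message structure (layout assumed)
    int iBytesRecd = recv(m_ClientSock, (char*)&Message, sizeof(Message_PDU), 0);
    if (SOCKET_ERROR == iBytesRecd)
    {
        if (WSAEWOULDBLOCK == WSAGetLastError())
        {
            RestartTimer(); // No message waiting; the timer will re-enter this handler in 100 ms
            return;
        }
        StopTimerIfConnectionLost(); // Other errors may mean the TCP connection has been lost
        return;
    }
    if (0 == iBytesRecd) // A zero-byte receive also implies the connection has been lost
    {
        ShutDownClientGracefully();
        return;
    }
    switch (Message.iMessageType) // Outer switch: contextual handling by message type
    {
        case PLAYER_LIST:
            DisplayPlayerList(Message); // Show the available players on the user interface
            break;
        case START_OF_GAME:
            SetUpGameDisplay(Message); // Opponent alias, tokens, whose move, "Game In Progress"
            break;
        case END_OF_GAME:
            switch (Message.iCode) // Inner switch: contextual end-of-game notification
            {
                case END_OF_GAME_WIN:  ShowGameResult("You have won");        break;
                case END_OF_GAME_LOSE: ShowGameResult("You have lost");       break;
                case END_OF_GAME_DRAW: ShowGameResult("The game was a draw"); break;
            }
            break;
        case OPPONENT_CELL_SELECTION:
            DrawOpponentToken(Message.iCell); // Place the opponent's token in the identified cell
            EnableEmptyCells(); // It is now the local user's move
            break;
    }
}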

Figure 3.45 shows the server-side receive logic for the game. The DoReceive method is called from within a timer handler routine, once per connected client, at a rate of ten times per second, to ensure that the server is highly responsive to received messages. As with the client logic, the first step after the call to the recv primitive is to check for errors that may have occurred. The server also uses nonblocking sockets, but here an array of sockets is used, one for each connected client. If the WSAEWOULDBLOCK error has occurred, the remainder of the method is skipped. If another error has occurred, the ConnectionLost() method is called to check whether the error code is one of several that indicate that the TCP connection has been lost (e.g., because the specific client has been shut down); if so, the server closes its socket related to the particular client and also closes the connection to the opponent of the disconnected client. If the recv has been successful, a check on the received message length is performed; any zero-byte messages are ignored.

f03-45-9780128007297

FIGURE 3.45 Server-side receive logic (C++), showing use of switch statement to provide contextual handling of the message based on message type.

Next, the message type field is checked, and the message processing is performed contextually based on the specific message type code, following the same approach as in the client. This is performed with a switch statement. The specific actions followed for each message type that can be received by the server are discussed:

• REGISTER_ALIAS: The client identified by its index position in the connection array has sent the alias name the user wishes to use for identification in the game. The server updates its diagnostic display and also writes the alias name into the client's connection array entry. The server then updates all connected clients by sending a PLAYER_LIST message containing a list of all connected users' aliases.

• CHOOSE_OPPONENT: The client identified by its index position in the connection array has made a selection from the available players list. In response, the server creates a game (this is a server-internal logical game entity that represents the game and its state and is the basis on which the server relates the two clients together, synchronizes their moves, and determines the outcome of the game). As a substep of creating the game, the server sends a START_OF_GAME message both to the client that has chosen an opponent and to the opponent. These messages inform each client of the token that it is to use (“o” or “x”) and whether it has the next move or must wait for the other player to move. The server updates its diagnostic display accordingly.

• LOCAL_CELL_SELECTION: The client identified by its index position in the connection array has made a move (the user has selected a cell in which to place their token). The server sends the client's opponent an OPPONENT_CELL_SELECTION message so that the user interface can be updated and the opponent user is thus informed. The server updates the game status and then checks to see whether the new move has led to a win (any straight line of three tokens in any orientation; a sketch of such a line check follows this list). If so, the winner's client is sent an END_OF_GAME message with the message iCode field containing END_OF_GAME_WIN, and the opponent client is sent an END_OF_GAME message with the message iCode field containing END_OF_GAME_LOSE. If the game has not been won, a draw is checked for; this situation arises when all cells have been filled and there has been no winner. If a draw has occurred, both clients are sent an END_OF_GAME message with the message iCode field containing END_OF_GAME_DRAW.

• CLIENT_DISCONNECT: This message type supports graceful disconnect of a client and causes the corresponding game to be closed in the server. Upon receipt of this message type, the server updates its diagnostic display, closes the socket it uses to communicate with the specific client, and clears the relevant connection array entry. It also closes the opponent client's connection. The list of available players is updated, and a PLAYER_LIST message containing a list of remaining players is sent to all connected clients.
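The win check mentioned under LOCAL_CELL_SELECTION can be performed by testing the eight possible lines of the three-by-three grid. A minimal sketch follows (a hypothetical helper; the case study code may implement the check differently):

// Returns true if cToken ('x' or 'o') occupies any straight line of three cells.
// The grid is assumed to be a flat array of nine cells, indexed 0 to 8, row by row.
bool IsLineOfThree(const char cGrid[9], char cToken)
{
    static const int aiLines[8][3] = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8}, // rows
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8}, // columns
        {0, 4, 8}, {2, 4, 6}             // diagonals
    };
    for (int i = 0; i < 8; i++)
    {
        if (cGrid[aiLines[i][0]] == cToken &&
            cGrid[aiLines[i][1]] == cToken &&
            cGrid[aiLines[i][2]] == cToken)
        {
            return true;
        }
    }
    return false;
}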

Figure 3.46 shows the behavior of the client process. The client is event-driven, and each event is modeled as a separate activity, which starts and stops independently of other events. In addition to the timer-based activity (which deals with message receiving on the nonblocking socket), there are a number of user activity events, which start when a specific user-interface event occurs, such as when the user selects an opponent to play against or makes a move in the gameplay.

f03-46-9780128007297

FIGURE 3.46 Client flowchart.

Figure 3.47 shows the behavior of the server process. The server is event-driven; each event is modeled as a separate activity, which is initiated when the relevant event is detected. Nonblocking sockets are used in combination with a timer to implement periodic listening and receiving activities. The server maintains one socket for listening for connection requests and one socket per connected client, on which messages are received. The per client sockets are held in an array and can thus be tested for message receipt in a loop, each time the timer handler is invoked. The timer operates at 10 Hz to ensure that the server is responsive to connection requests and incoming messages from game clients.

f03-47-9780128007297

FIGURE 3.47 Server flowchart.
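A minimal sketch of the 10 Hz timer handler implied by this description, reusing the hypothetical connection array from the earlier structure sketch (DoReceive is named in the text; OnTimer and DoAccept are hypothetical placeholders):

// Invoked by the timer ten times per second
void OnTimer()
{
    DoAccept(); // Check the listening socket for a pending connection request
    for (int i = 0; i < MAX_CONNECTIONS; i++) // Poll each per-client socket in turn
    {
        if (g_Connections[i].bInUse)
            DoReceive(i); // Nonblocking receive for connected client i
    }
}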

3.15 End-of-Chapter Exercises

3.15.1 Questions

1. Determine which transport layer protocol is most appropriate for each of the following applications; justify your answers.

(A) Real-time streaming

(B) File transfer that is only used in a local area network

(C) File transfer that is used across the Internet

(D) A clock synchronizing service for all computers in a local network

(E) An eCommerce application

(F) A local file-sharing application in which clients need to automatically locate the server and cannot rely on a name service being present

2. Determine which of the following sequences of socket primitives are valid to achieve communication, and state whether the communication implied is based on UDP or TCP.

(A) create socket (client side), sendto (client side), create socket (server side), recvfrom (server side), close (server side), close (client side)

(B) create socket (client side), create socket (server side), bind (server side), listen (client side), connect (client side), accept (server side), send (client side), recv (server side), shutdown (server side), shutdown (client side), close (server side), close (client side)

(C) create socket (client side), create socket (server side), bind (server side), sendto (client side), recvfrom (server side), close (server side), close (client side)

(D) create socket (client side), create socket (server side), bind (server side), listen (server side), connect (client side), accept (server side), send (server side), recv (client side), shutdown (server side), shutdown (client side), close (server side), close (client side)

(E) create socket (client side), create socket (server side), bind (server side), listen (server side), connect (server side), accept (client side), send (server side), recv (client side), shutdown (server side), shutdown (client side), close (server side), close (client side)

(F) create socket (client side), create socket (server side), sendto (client side), bind (server side), recvfrom (server side), close (server side), close (client side)

3. Explain the fundamental difference between RPC and RMI.

4. Explain the main differences between constructed forms of communication such as RPC or RMI and lower-level communication based on the socket API over TCP or UDP.

5. Identify a way in which communication deadlock can occur when using the socket API primitives to achieve process-to-process communication, and explain a simple way to avoid it.

6. Identify one benefit and one drawback for each of the two socket IO modes (blocking and nonblocking).

3.15.2 Exercises with the Workbenches

The exercises below use the Networking Workbench to investigate various aspects of the behavior of UDP and TCP, ports and binding, buffering, addressing, multicasting, and broadcasting.

Most of the exercises are best performed using two networked computers, although they can be partially completed (in some cases, fully completed) by running two or more copies of the Networking Workbench on a single computer.

Exercise 1. Understanding the Use of Ports

1. Start one copy of the UDP Workbench (found under the UDP tab on the Networking Workbench) on computer A and one copy on computer B.

2. Set the correct (destination) IP addresses for each workbench so that the address shown in the “Address 1” field is the address of the other computer. Select the “unicast” mode for this exercise.

3. The send port corresponds to the port that the packet will be sent to; that is, it is the value that is written into the destination port field of the UDP packet header.

4. The receive port number is the port number that the receiving UDP software module listens on.

5. You must click the “Enable Receiver” button to enable receiving.

6. To change a receive port number, you must click the “Stop Receiving” button, then change the port number, and then re-enable receiving.

Try each of the port number configurations shown below. In each case, try holding a two-way conversation between the two workbench instances.

Configuration    Computer A                  Computer B
                 Send port   Receive port    Send port   Receive port
A                8000        8002            8001        8003
B                8001        8002            8001        8002
C                8001        8001            8002        8002
D                8001        8001            8001        8001
E                8001        8002            8002        8001
F                8000        8001            8002        8003

Q1. Which of these send and receive port configurations work (i.e., allow 2-way communication)?

Q2. What then is the underlying requirement for port-based communication?

Q3. Which of the above configurations work if both copies of the UDP Workbench are running on the same computer?

Q4. What is the reason for the difference in outcome?

Exercise 2. Achieving a Multicasting Effect with UDP

There are several ways to achieve multicast addressing over UDP in the higher software layers (as it is not directly provided by UDP; UDP directly supports only unicast and broadcast modes of addressing).

Q1. Suggest one method by which multicast could be achieved.

1. Start the UDP Workbench. Select multicast communication.

2. Set the three addresses (“Address 1,” “Address 2,” and “Address 3”) to point at three different, existing computers.

3. Set up additional copies of the UDP Workbench, one on each target computer, and enable the receivers.

4. Make sure the port configurations are correct (refer to exercise 1 above if you have any doubt).

Q2. What happens when you send a message, in terms of the number of messages actually sent by the sending process and the number of messages received by each receiving process?

5. Now, set the three addresses to all point at the same computer.

Q3. Now, what happens when you send a message (hint: look at the statistics windows)?

Q4. Can you now determine which method has been used (in the UDP Workbench) to achieve the multicast effect?

Exercise 3. Broadcasting with UDP

Devise a simple experiment using several copies of the UDP Workbench to determine how broadcasting is achieved in UDP.

Q1. How does the sender know the addresses of the receiver(s)? Does it need to?

Q2. When a message is broadcast to a network of 5 computers, how many messages are sent? How many messages are received (hint: look at the statistics windows)? Explain how this can be.

Exercises 4, 5, and 6 use the “Arrows” application within the Networking Workbench (on the UDP tab). The Arrows application consists of the Arrows Server (which is simply a display server) and the Arrows Controller (which tells one or more Arrows Servers what to display).

The Arrows application requires that each Arrows Server be given a different port number (whether on the same computer or not). The port numbers used by the Arrows Servers must be sequential (e.g., you might use 8000, 8001, 8002, etc.). The Arrows Controller must be told the lowest port number used (8000 in this example) and the number of Arrows Servers in use.

Each Arrows Server must bind to its port (click “listen” to achieve this) before it can start to receive UDP messages from the Arrows Controller.

The rest of the controls on the Arrows Controller are intuitive and should become familiar after a few minutes' experimentation.

Figure 3.48 shows the configuration of an Arrows Controller and three Arrows Servers on the same computer, before clicking on the “listen” buttons. Note that the port numbers allocated to the servers are in sequence.

f03-48-9780128007297

FIGURE 3.48 Initial state of the Arrows application components.

Figure 3.49 shows the Arrows application in operation; the arrow “moves across” the server windows in sequence.

f03-49-9780128007297

FIGURE 3.49 The Arrows application in operation.

Exercise 4. Bind Constraints

Q1. Can 2 or more Arrows Servers reside at the same computer?

Q2. If so, are there any specific conditions (with respect to Binding) that must be met?

Try to predict the answer and then try binding two Arrows Servers (by clicking “listen”) at the same computer with various port combinations.

Exercise 5. Addressing (Identifying Servers)

Q1. How does the Arrows Controller identify its Arrows Servers?

• IP address only

• Port only

• IP address and port

Q2. Is this approach appropriate for applications in general?

Q3. Is this approach appropriate for the Arrows application?

Q4. Can you think of any alternative addressing scheme that would be suitable (remember that there can be more than one Arrows Server per computer)?

Exercise 6. Buffering

Configure two copies of the Arrows Controller to control at least two Arrows Servers (both controllers are to attempt to control the SAME pair of Arrows Servers, so the settings at the two controllers will be identical).

1. Assign the port values to the Arrows Servers, and click “listen” on each one.

2. Configure and start both Arrows Controllers.

3. Move the speed slider control to maximum on both controllers and leave the system running for about a minute.

4. Now, click stop on each Arrows Controller and close the Arrows Controller windows. The Arrows Servers keep running—apparently still receiving messages.

Q1. Explain what is actually happening.

3.15.3 Programming Exercises

Programming Exercise #C1: The case study game currently requires that the user enter the IP address of the server into the client user interface so that the client can establish communication with the server.

Modify the game so that, when both the client and server are in the same local network, the client can automatically locate the server based on server advertisement broadcasts. To do this, you will need to implement UDP broadcast. The server should broadcast a message containing its IP address and the port it is listening on, at a regular interval such as every second. The client, during initialization, should listen for the server's broadcast message; once it has been received, the client should use the address contained in the message to send a connection request to the server automatically. This means that the client may appear unresponsive for a short while until the server's message is received, which is why the interval between the server's advertisement messages should not be too long.

An example solution is provided in the programs: CaseStudyGame_Client_withServerAdvertising and CaseStudyGame_Server_withServerAdvertising.

Programming Exercise #C2: The case study game server currently detects client disconnection, which can occur through explicit disconnection of the client (using the “Disconnect” button), through closing the client process (using the “Done” button), through the client's host computer crashing, or through network connectivity problems between the client and server. The current response of the server, on detecting a client disconnect, is to close its socket related to the particular client and also to close the connection to the opponent of the disconnected client, which causes the opponent client to close abruptly.

Modify the game server code such that when a client disconnect is detected, it sends a new message type such as “opponent connection lost” to the disconnected client's opponent. Alternatively, you could send an existing “END_OF_GAME” message type, but with a new message code which means “opponent connection lost.” You will also need to modify the client code such that on receipt of this new message, the opponent client returns to its initial connected state, so that the user can select another player from the available players list and does not automatically shut down, as it does now.

An example solution is provided in the programs:

CaseStudyGame_Client_withServerAdvertising_AND_ClientDisconnectManagement and CaseStudyGame_Server_withServerAdvertising_AND_ClientDisconnectManagement.

3.15.4 Answers to End-of-Chapter Questions

Q1. Answer

(A) UDP is preferable due to its lower latency.

(B) Either TCP or UDP is possible. FTP uses TCP as its transport layer protocol, but TFTP uses UDP and works well on local networks and especially where short files are transferred.

(C) TCP is needed because of its error-handling and message-ordering capabilities.

(D) UDP is needed because of its broadcast capability.

(E) TCP is needed because of its robustness and message ordering.

(F) Use UDP broadcast to locate the server (server advertisement), and then, use TCP for file sharing/file transfers.

Q2. Answer

(A) The set of primitives is consistent with UDP. However, the client sends a datagram to the server before the server has created a socket, and thus, the datagram will not be delivered. Also, the server side does not bind to a port.

(B) The set of primitives is consistent with TCP. However, the client listens where, in fact, the server should listen, so a connection will not be established.

(C) The set of primitives is consistent with UDP. The sequence of primitive calls is correct; a single datagram will be sent from the client to the server.

(D) The set of primitives is consistent with TCP. The sequence of primitive calls is correct; a message will be sent over the connection from the server to the client.

(E) The set of primitives is consistent with TCP. However, the connect primitive occurs at the server side and the accept primitive occurs at the client side (these have been reversed), so a connection will not be established.

(F) The set of primitives is consistent with UDP. However, the client sends a datagram to the server before the server binds to a port, and thus, the datagram will not be delivered.

Q3. Answer

RPC is a means of calling procedures remotely. It is used with procedural languages such as C and is also supported by C++.

RMI is a means of invoking methods on remote objects. It is used in Java; a similar mechanism, called remoting, is supported in C#.

RMI can be considered the object-oriented counterpart of RPC.

Q4. Answer

Constructed forms of communication such as RPC or RMI provide structured communication within a particular framework (i.e., calling remote procedures or invoking remote methods). An abstraction is provided to developers such that remote objects can be accessed in program code as local ones are. Lower-level communication details are handled automatically, such as the setup of the underlying TCP connection and dealing with certain types of error. This approach achieves a high degree of transparency in several of its forms.

On the other hand, lower-level communication based on the socket API over TCP or UDP is less structured and more flexible. Developers can construct their own protocols and higher-layer communication mechanisms (e.g., it is possible to construct an RMI or RPC system in this way). However, the developer must deal with many more aspects of the communication in this case, such as establishing and maintaining connections, controlling the sequence of messages, and dealing with errors that arise. Most significantly, the low-level communication does not provide transparency; the developer is faced with the complexity of the transport layer and the separation of processes.

Q5. Answer

Communication deadlock can occur when two processes communicate with blocking sockets (and are single-threaded). If a situation arises where both processes are waiting for a message from the other process, communication deadlock has occurred.

There are several ways to prevent communication deadlock. The use of nonblocking sockets ensures that a process does not wait indefinitely for a message to arrive. Alternatively, placing send and receive activities in different threads can resolve the problem as long as the operating system schedules at the thread level (and thus blocks at this level) and not at the process level.

Q6. Answer

Blocking socket IO is efficient in terms of the use of system resources. A process will be blocked while waiting for an event such as message receipt or for a client connection request to occur. Blocking IO is the simplest operation mode of sockets from a programmer viewpoint. However, blocking sockets can lead to unresponsive applications and possibly communication deadlock.

Nonblocking socket IO is more flexible than blocking socket IO and can be used to achieve responsive processes that are able to perform other work while waiting for communication events. However, nonblocking IO mode requires more complex program logic: timers are needed, failed actions must be retried, and the pseudo-error code returned when an event could not complete immediately must be handled.

3.15.5 Answers/Outcomes of the Workbench Exercises

Exercise 1. Ports

Q1. Combinations D and E work.

Q2. The send port (the port to which the message is sent) at the sender must be the same as the receive port (the port that is listened on) at the receiver.

Q3. Only configuration E works.

Q4. When the two processes are hosted on the same computer, they cannot bind to the same port. So, while they can both send to the same port, they cannot both receive on the same port. The second one to request the same port (using bind) will have its attempt refused.

Exercise 2. Multicasting

Q1. There are two easy ways to achieve a multicast effect with UDP. One is to actually use broadcast, but to arrange that only a subset of the potential recipient processes are actually listening on the appropriate port. The other way is to use unicast addressing and to send the same message to each of a set of specific computers, using a loop at the sending process.

Q2. In the experiment, you should see that the sender actually sends three messages and that each recipient process receives one message.

Q3. When all three addresses are the same, you should see that the sender actually sends three messages and that the addressed recipient process receives three messages.

Q4. The UDP Workbench is using unicast addressing, in a loop to send one copy of the message to each of the three specified addresses.
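A minimal sketch of this unicast-loop technique in Winsock-style C++ (the destination addresses are assumed to have been placed in the array beforehand):

// Send one copy of the message to each of the three specified addresses
sockaddr_in RecipientAddr[3]; // Previously filled with the three destination socket addresses
char szMessage[] = "example message";
for (int i = 0; i < 3; i++)
{
    sendto(UDP_SendSock, szMessage, sizeof(szMessage), 0,
           (const sockaddr*)&RecipientAddr[i], sizeof(RecipientAddr[i]));
}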

Exercise 3. Broadcasting

Q1. The sender does not need to know the addresses of receivers when using broadcast. A special broadcast address is used.

Q2. The delivery of a broadcast message is implemented by the communication protocol and underlying network mechanisms. This means that the sender only has to send a single message, which is delivered to each recipient (they each receive one message).

Exercise 4. Experimentation with the Arrows Application: Exploring Bind Constraints

Q1. Yes, many Arrows Servers can reside at the same computer.

Q2. Each Arrows Server must be bound to a different port.

Exercise 5. Experimentation with the Arrows application: Exploring Addressing

Q1. The Arrows Servers are identified by port only. All messages are broadcast by the Arrows Controller, so the servers can be anywhere in the local network (within the broadcast domain of the controller).

Q2. This is a quite specific means of addressing. It is not appropriate for applications in general, fundamentally because it relies on broadcasting, which should be used sparingly.

Q3. This means of addressing is ideal for the Arrows application because it allows the servers to be placed on the same or different computers, and the port number sequence is used as the means to order the servers logically; this is necessary to get the “traveling arrow” effect across a number of Arrows Server instances.

Q4. An alternative is unicasting to each process specifically. This would require a combination of IP address and port number.

Exercise 6. Experimentation with the Arrows application: Exploring buffering

Q1. The Arrows Servers do not recognize any particular instance of the Arrows Controller; they simply receive command messages and perform the appropriate display action. When multiple Arrows Controllers are present (and because both controllers use broadcast communication), all Arrows Servers receive the possibly conflicting commands from all controllers, so the display behavior may appear erratic. Moreover, because the controllers were running at maximum speed, command messages arrive faster than the servers process them and accumulate in each server's receive buffer. When the controllers are closed, the servers continue to drain this buffered backlog, which is why they appear to still be receiving messages.

3.15.6 List of in-Text Activities

Activity Number   Section   Description
C1                3.3.1     One-way communication with a temperature sensor application
C2                3.3.2     Request-reply communication and an NTP application example
C3                3.7.2     Introduction to UDP and datagram buffering
C4                3.7.3     Introduction to TCP and stream buffering
C5                3.7.4     Exploration of binding
C6                3.8.2     Investigation of blocking and nonblocking socket IO modes and communication deadlock

3.15.7 List of Accompanying Resources

The following resources are referred to directly in the chapter text, the in-text activities, and/or the end-of-chapter exercises.

• Distributed Systems Workbench (“Systems programming” edition)

• Networking Workbench (“Systems programming” edition)
Note that, in addition to the actual exercises supported by the workbenches, the Networking Workbench can be used as a powerful diagnostic tool when developing networking applications. In particular, the UDP Workbench and the blocking and nonblocking receive programs (for UDP), and the TCP:Active Open and TCP:Passive Open programs (for TCP), are useful for testing the connectivity of applications you are developing, for example, if you are developing an application with client and server components and need to test the client in the absence of the server, or vice versa.

• Source code

• Executable code

The programs listed below are referred to in the relevant sections of the chapter; each is available as source code and as executable code.

• Use-case game application: client (Source code, Executable; Section 3.14)

• Use-case game application: server (Source code, Executable; Section 3.14)

• CaseStudyGame_Client_withServerAdvertising, the client-side solution to programming exercise #C1 (Source code, Executable; Section 3.15)

• CaseStudyGame_Server_withServerAdvertising, the server-side solution to programming exercise #C1 (Source code, Executable; Section 3.15)

• CaseStudyGame_Client_withServerAdvertising_AND_ClientDisconnectManagement, the client-side solution to programming exercise #C2 (Source code, Executable; Section 3.15)

• CaseStudyGame_Server_withServerAdvertising_AND_ClientDisconnectManagement, the server-side solution to programming exercise #C2 (Source code, Executable; Section 3.15)

Appendix Socket API Reference

This section presents the socket API primitives individually with the method prototypes and annotated code examples in C++, C#, and Java. Supporting information concerning socket options and the socket address structure is also provided.

Important note. To avoid repetition, examples of exception/error handling are shown only for some API calls. However, all socket API calls can fail for a variety of reasons relating to the state of the connected-to or intended-to-be-connected-to process and the sequence in which the primitive calls are used, so robust exception/error handling is necessary in all cases.

A1 Socket

C++ prototype: SOCKET socket(int AddressFamily, int Type, int Protocol)

Types: SOCK_DGRAM (for UDP), SOCK_STREAM (for TCP)

Return parameter: SOCKET is derived from an integer type and identifies the socket

Example:

SOCKET ClientSock = socket(AF_INET, SOCK_DGRAM, 0); // 0 selects the default protocol for the socket type (UDP for SOCK_DGRAM)

if(INVALID_SOCKET == ClientSock)

{

// Display error message "Could not create socket"

}

C#

Example:

Socket ClientSock;

try

{

ClientSock = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);

}

catch (SocketException se)

{

// If an exception occurs, handle it and display an error message

}

Java. A variety of Socket classes are available, including for client-side TCP use, for server-side TCP use, and for use with UDP.

Client-side TCP: uses the Socket class. There are various overloads of the Socket constructor, which automatically perform the connect action; one use example is shown below:

InetAddress IAddress; // An object which represents the IP address to connect to

int iPort;

… assign application-specific values to IAddress and iPort to identify remote socket to connect to …

Socket ClientSock;

try

{

ClientSock = new Socket(IAddress, iPort);

// Create socket and connect it to the socket identified by the InetAddress object and port number.

}

catch (IOException e)

{

System.out.println(e);

}

There is also a socket constructor that creates an unconnected socket:

Socket ClientSock = new Socket(); // Create an unconnected socket, which requires subsequent use of connect().

Server-side TCP uses the ServerSocket class, which automatically performs the bind and listen actions.

int iPort = 8000; // Assign server-side port number, for binding

ServerSocket ServerSock = new ServerSocket(iPort); // Create a socket and bind to the local port specified.

// A timeout can be set, which allows the socket to be used in a nonblocking fashion.

ServerSock.setSoTimeout(50); // wait 50 milliseconds.

// For example, when attempting to receive data from the socket, the process will wait (block) for a specified

// time (rather than permanently), before returning control to the process.

UDP: uses the DatagramSocket class, which automatically binds the socket to a port.

Example:

DatagramSocket ClientSock = new DatagramSocket(5027); // Create a UDP socket and bind to the port specified.

A2 Socket Options

Options that can be selected for use with a particular socket include the following:

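Commonly used options include SO_BROADCAST, which permits the sending of broadcast messages on the socket; SO_KEEPALIVE, which causes keep-alive probes to be sent on a connection; SO_LINGER, which controls whether a close operation lingers while unsent data remains; SO_RCVBUF and SO_SNDBUF, which set the sizes of the receive and send buffers; and SO_REUSEADDR, which allows a socket to bind to an address that is already in use. (This list is representative rather than exhaustive.)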

C++. To get or set socket options, calls to getsockopt() or setsockopt() are made.

Prototype: int setsockopt(SOCKET s, int level, int optname, const char* optval, int optlen);

Prototype: int getsockopt(SOCKET s, int level, int optname, char* optval, int* optlen);

If no error occurs, setsockopt and getsockopt return zero, otherwise they return an error code.

level is the level at which the option applies (usually SOL_SOCKET).

optname is the name of the option to set.

optval is the buffer containing the option value to set (for setsockopt) or into which the current option value is placed (for getsockopt).

optlen is the length of the optval buffer.

Example (turn on broadcasting):

BOOL bOptVal = TRUE; // TRUE enables the broadcast option

int iError = setsockopt(ClientSock, SOL_SOCKET, SO_BROADCAST, (char*)&bOptVal, sizeof(bOptVal));

if(SOCKET_ERROR == iError)

{

// Display error message "setsockopt() Failed"

}

Example (test if broadcasting mode is set):

BOOL bOptVal = FALSE;

int iOptLen = sizeof(bOptVal);

int iError = getsockopt(ClientSock, SOL_SOCKET, SO_BROADCAST, (char*)&bOptVal, &iOptLen);

if(SOCKET_ERROR == iError)

{

// Display error message "getsockopt() Failed"

}

C# uses the GetSocketOption() and SetSocketOption() methods; some examples are given:

// Set option to allow socket to close gracefully without lingering

ClientSock.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.DontLinger, true);

// Set option to allow broadcasts on the socket

ClientSock.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.Broadcast, true);

// Test whether the linger option is set on the ServerSock

// For the Linger option, the two-argument GetSocketOption() call returns a LingerOption object

LingerOption lingerResult = (LingerOption)ServerSock.GetSocketOption(SocketOptionLevel.Socket, SocketOptionName.Linger);

Java uses a variety of methods on the socket object; some examples are given:

boolean getKeepAlive()

// Determines whether the SO_KEEPALIVE socket option is enabled

int getSoLinger()

// Returns the setting for the SO_LINGER property

int getSoTimeout()

// Returns the setting for the SO_TIMEOUT property

void setSoTimeout(int timeout)

// Sets the SO_TIMEOUT property value, in milliseconds, for the socket

int getReceiveBufferSize()

// Returns the value of SO_RCVBUF, i.e. the receive buffer size

void setReceiveBufferSize(int size)

// Sets the value of SO_RCVBUF, i.e. set the receive buffer size

int getSendBufferSize()

// Returns the value of SO_SNDBUF, i.e. the send buffer size

void setSendBufferSize(int size)

// Sets the value of SO_SNDBUF, i.e. set the send buffer size

boolean isBound()

// Determines whether the socket has been bound to a port

boolean isClosed()

// Determines whether the socket is closed

boolean isConnected()

// Determines whether the socket is connected

boolean isInputShutdown()

// Determines whether the connection is shutdown in the read direction

boolean isOutputShutdown()

// Determines whether the connection is shutdown in the write direction

A3 Socket Address Formats

A socket address comprises an IP address and a port number. A socket address can represent the local socket to which other sockets connect or can represent a remote socket to connect to, depending on the context of its use.

C++ uses a socket address structure.

struct sockaddr {

unsigned short int sa_family; // address family (fixed size of 2 bytes)

char sa_data[14]; // up to 14 bytes of address

};

A special version of the Socket Address Structure is used with IP addresses:

struct sockaddr_in {

short int sin_family; // Address family (AF_INET signifies IPv4 address)

unsigned short int sin_port; // Port number (fixed size of 2 bytes)

struct in_addr sin_addr; // Internet address (fixed size of 4 bytes for IPv4)

};

C#

System.Net.IPEndPoint represents a socket address (the equivalent of the sockaddr_in structure in C++).

System.Net.IPAddress represents an IP address.

Example:

// Combine an IP address and a port to create an endpoint

IPAddress DestinationIPAddress = IPAddress.Parse("192.168.100.5");

int iPort = 9099;

IPEndPoint localIPEndPoint = new IPEndPoint(DestinationIPAddress, iPort);

ServerSock.Bind(localIPEndPoint); // bind the socket to the IPEndPoint

Java

The SocketAddress class represents a socket address (the equivalent of the sockaddr_in structure in C++).

The InetAddress class represents an IP address.

Prototype:

InetSocketAddress(InetAddress address, int port) // Creates a socket address using the supplied parameters.

Related methods on the socket object which determine address-related settings include:

int getPort() // Returns the remote port number the socket is connected to.

int getLocalPort() // Returns the local port number the socket is bound to.

SocketAddress getRemoteSocketAddress() // Returns the remote endpoint address the socket is connected to.

SocketAddress getLocalSocketAddress() // Returns the local endpoint address the socket is bound to.

A4 Setting a Socket to Operate in Blocking or Nonblocking IO Mode

A socket operates in blocking mode by default. It can be changed between the two modes, to suit application requirements.

C++ uses a utility function, ioctlsocket(), to control the IO mode of a socket.

Example:

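A minimal sketch, assuming the Winsock API, using the FIONBIO command to select the IO mode:

u_long ulMode = 1; // 1 selects nonblocking mode; 0 selects blocking mode

int iError = ioctlsocket(ClientSock, FIONBIO, &ulMode);

if (SOCKET_ERROR == iError)

{

// Display error message "ioctlsocket() Failed"

}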

C# uses a property Blocking on the socket object to set IO mode.

Example:

ServerSock.Blocking = true; // or false to set to Non-blocking IO mode

Java uses a timeout value to determine how long a socket will block for. This is a flexible approach that combines the benefits of blocking initially for a period during which an event (such as message receipt) is expected and preventing the socket from waiting indefinitely.

Example:

ClientSock = new DatagramSocket(8000);

// Create a UDP socket and bind to port 8000

ClientSock.setSoTimeout(50);

// Set timeout to 50 milliseconds

ClientSock.receive(receivePacket);

// This call blocks for up to 50 milliseconds. The call returns when

// either a message is received or when the timer expires.

A5 Bind

Binding associates a socket with a local socket address (which comprises a local port and IP address).

The side that will be connected to (usually the server side in client server applications) must issue this call, so that clients can “locate” the server by its port number.

C++ uses the bind() function.

Prototype: int bind(SOCKET s, const struct sockaddr* name, int namelen);

// name is a sockaddr structure that holds the address to bind to (comprises IP address and port number).

// namelen is the size of the sockaddr structure.

Example:
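The socket address structure passed to bind is typically prepared first; the following setup is a sketch of what precedes the call shown below (the port number is illustrative):

sockaddr_in m_LocalSockAddr;

memset(&m_LocalSockAddr, 0, sizeof(m_LocalSockAddr));

m_LocalSockAddr.sin_family = AF_INET;

m_LocalSockAddr.sin_port = htons(8000); // The local port number, in network byte order

m_LocalSockAddr.sin_addr.s_addr = htonl(INADDR_ANY); // Accept messages arriving on any local interface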

int iError = bind(ServerSock, (const SOCKADDR FAR*)&m_LocalSockAddr, sizeof(m_LocalSockAddr));

C# uses the Bind() method of the Socket class.

Example:

int iPort = 8000;

IPEndPoint localIPEndPoint = new IPEndPoint(IPAddress.Any, iPort); // Create an Endpoint

// IPAddress.Any signifies that the binding will apply to all IP addresses of the computer, such that

// a message arriving on any network interface, addressed to the appropriate port, will be received

ServerSock.Bind(localIPEndPoint); // Bind to the local IP Address and selected port

Java uses the bind() method of the Socket class, but note that bind is performed automatically when creating a ServerSocket (in which case a separate bind action is not performed).

// Create a socket address object using the local host’s IP address and port 8000.

InetAddress Address = InetAddress.getLocalHost(); // Get local host’s IP address

InetSocketAddress localSocketAddress = new InetSocketAddress(Address, 8000);

ServerSock.bind(localSocketAddress); // Bind to the local IP Address and selected port

A6 Listen

Listen is used on the passive side (usually the server side in client server applications), after bind. This sets the socket into listening-for-connection state. This is only used with TCP sockets.

C++

Prototype: int listen(SOCKET s, int backlog);

backlog is the maximum length of the queue of pending connections.

Example:

int iError = listen(ServerSock, 5); // Listen for connections, with a backlog queue maximum of 5

C#

Example:

ServerSock.Listen(4); // Listen for connections, with a backlog queue maximum of 4

Java. The listen action is integrated with binding. It is performed automatically when creating a ServerSocket or can be performed separately using the bind() method of the ServerSocket class.

A7 Connect

Connect is used on the active side of a connection (usually the client side in client server applications), to establish a new TCP connection with another process. This is not required when using UDP.

C++

Prototype: int connect(SOCKET s, const struct sockaddr* name, int namelen);

name is the socket address structure containing the address and port details of the other socket to connect to.

namelen is the size of the sockaddr structure.

Example:

int iError = connect(ClientSock, (const SOCKADDR FAR*)& ConnectSockAddr, sizeof(ConnectSockAddr));

C#

Example:

String szIPAddress = IP_Address_textBox.Text; // Get a user-specified IP address from a text box

IPAddress DestinationIPAddress = IPAddress.Parse(szIPAddress); // Create an IPAddress object

String szPort = SendPort_textBox.Text; // Get a user-specified port number from a text box

int iPort = System.Convert.ToInt16(szPort, 10);

IPEndPoint remoteEndPoint = new IPEndPoint(DestinationIPAddress, iPort); // Create an IPEndPoint

m_SendSocket.Connect(remoteEndPoint); // Connect to the remote socket identified by the endpoint

Java

Connect is performed automatically by several of the Socket class constructors (those with address arguments, through which the remote-side address and port details are supplied).

If the no-argument constructor was used, connect is necessary and serves as a means of supplying the remote-side address and port details (in the form of a SocketAddress object).

Example:

ClientSock.connect(new InetSocketAddress(hostname, 8000));

A8 Accept

Accept is used on the passive side of a connection (usually the server side in client server applications). It services a connection request from a client. Accept automatically creates and returns a new socket for the server side to use with this specific connection (i.e., to communicate with the specific connected client). This is only required when using TCP.

C++

Prototype: SOCKET accept(SOCKET s, struct sockaddr* addr, int* addrlen);

addr is the socket address structure containing the address and port details of the connecting process.

addrlen is the size of the sockaddr structure

Example:

SOCKET ConnectedCliSocket = accept(ServSock, (SOCKADDR FAR*)& ConnectSockAddr, &iRemoteAddrLen);

C#

Example:

Socket ConnectedClientSock = ServerSock.Accept();

Java

Example:

Socket ConnectedClientSock = ServerSock.accept();

A9 Send (Over a TCP Connection)

C++

Prototype: int send(SOCKET s, const char* buf, int len, int flags);

If no error occurs, send returns the number of bytes sent. Otherwise, it returns an error code.

buf is the area of memory containing the message to send.

len is the size of the message in the buffer.

flags can be used to specify some control options.

Example:

int iBytesSent;

iBytesSent = send(ClientSock, (char *) &Message, sizeof(Message_PDU), 0);

C#

Example:

int iBytesSent;

byte[] bData = System.Text.Encoding.ASCII.GetBytes(szData); // Assumes message to send is in string szData

iBytesSent = ClientSock.Send(bData, SocketFlags.None);

Java. Sending is performed using IO streams. First, the stream objects need to be obtained, and then IO operations can be performed.

Example:

OutputStream out_stream = ClientSock.getOutputStream(); // Obtain OutputStream object

DataOutputStream out_data = new DataOutputStream(out_stream); // Obtain DataOutputStream object

out_data.writeUTF("Message from client"); // Write data to stream

A10 Recv (Over a TCP Connection)

The receive action retrieves data that has arrived over a TCP connection and been placed in the socket's local receive buffer. If there is a message in the buffer, it is passed to the application; in the default blocking IO mode, the call waits until data arrives.

C++

Prototype: int recv(SOCKET s, char* buf, int len, int flags);

If no error occurs, recv returns the number of bytes received. If the connection has been closed, the return value is zero. Otherwise, it returns an error code.

buf is the area of memory that will contain the message.

len is the size of the buffer (i.e., the maximum amount of data that can be retrieved in one go).

flags can be used to specify some control options.

Example:

int iBytesRecd = recv(ConnectedClientSock, (char *) &Message, sizeof(Message_PDU), 0);

C#

Example:

byte[] ReceiveBuffer = new byte[1024]; // Create a byte array (buffer) to hold the received message

int iReceiveByteCount;

iReceiveByteCount = ConnectedClientSock.Receive(ReceiveBuffer, SocketFlags.None);

Java. Receiving is performed using IO streams. First, the stream objects need to be obtained, and then IO operations can be performed.

Example:

InputStream in_stream = ClientSock.getInputStream(); // Obtain InputStream object

DataInputStream in_data = new DataInputStream(in_stream); // Obtain DataInputStream object

System.out.println("Message from server: " + in_data.readUTF()); // Read data from stream

A11 Sendto (Send a UDP Datagram)

C++

Prototype: int sendto(SOCKET s, const char* buf, int len, int flags, const struct sockaddr* to, int tolen);

If no error occurs, sendto returns the number of bytes sent. Otherwise, it returns an error code.

buf is the area of memory containing the message to send.

len is the size of the message in the buffer.

flags can be used to specify some control options.

to is the sockaddr structure holding the address of the recipient socket.

tolen is the size of the address structure.

Example:

int iBytesSent = sendto(UDP_SendSock, (char FAR *) szSendBuf, iSendLen, 0,

(const struct sockaddr FAR *)& SendSockAddr, sizeof(SendSockAddr));

C#

Example:

byte[] bData = System.Text.Encoding.ASCII.GetBytes(szData);

UDP_SendSock.SendTo(bData, remoteEndPoint);

Java

UDP is datagram-based, so sending over UDP sockets cannot use stream-based IO. Instead, a DatagramPacket object is used to encapsulate the message data and the IP address and port number of the destination socket. This is then sent as a discrete datagram.

Example:

DatagramPacket datagram = new DatagramPacket(buf, buf.length, address, port); // Create the datagram object

UDP_SendSock.send(datagram); // Send the datagram to its specified destination

A12 Recvfrom (Receive a UDP Datagram)

C++

Prototype: int recvfrom(SOCKET s, char* buf, int len, int flags, struct sockaddr* from, int* fromlen);

If no error occurs, recvfrom returns the number of bytes received. If the connection has been closed, the return value is zero. Otherwise, it returns an error code.

buf is the area of memory that will contain the message.

len is buffer size (i.e., max amount of data that can be retrieved in one go).

flags can be used to specify some control options.

from is an optional pointer to a sockaddr structure that will be filled in with the sending socket's address.

fromlen is a pointer to the size of the from structure.

Example:

int iBytesRecd = recvfrom(UDP_ReceiveSock, (char FAR*) szRecvBuf, 1024, 0, NULL, NULL);

C#

Example:

IPEndPoint localEndPoint = new IPEndPoint(IPAddress.Any, 9000); // Local endpoint to receive on (port 9000 chosen for illustration)

UDP_ReceiveSock.Bind(localEndPoint); // Bind the receiving socket to the local endpoint

IPEndPoint SenderIPEndPoint = new IPEndPoint(IPAddress.Any, 0);

EndPoint SenderEndPoint = (EndPoint) SenderIPEndPoint; // Create endpoint to hold the sender’s address

byte[] ReceiveBuffer = new byte[1024]; // Create a byte array (buffer) to hold the received message

int iReceiveByteCount;

iReceiveByteCount = UDP_ReceiveSock.ReceiveFrom(ReceiveBuffer, ref SenderEndPoint);

Java

UDP is datagram-based, so receiving over UDP sockets cannot use stream-based IO. Instead, a DatagramPacket object is created to hold received message data and the IP address and port number of the sending socket. The receive method is used to receive a message and place it into the datagram object.

Example:

byte[] buf = new byte[1024];

DatagramPacket datagram = new DatagramPacket(buf, buf.length); // Create an empty datagram object

UDP_ReceiveSock.receive(datagram); // Receive a message and place it into the datagram object
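
Putting the Java send and receive fragments together, the following sketch shows a complete datagram round trip on a single host (port 9000 is illustrative; exception handling is omitted):

DatagramSocket UDP_ReceiveSock = new DatagramSocket(9000); // Receiving socket bound to port 9000

DatagramSocket UDP_SendSock = new DatagramSocket(); // Sending socket on an ephemeral port

byte[] msg = "Hello".getBytes();

UDP_SendSock.send(new DatagramPacket(msg, msg.length, InetAddress.getLocalHost(), 9000)); // Send the datagram

byte[] buf = new byte[1024];

DatagramPacket datagram = new DatagramPacket(buf, buf.length); // Create an empty datagram object

UDP_ReceiveSock.receive(datagram); // Blocks until a datagram arrives

System.out.println(new String(datagram.getData(), 0, datagram.getLength())); // Print the received message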

A13 Shutdown

Shutdown disables sending and/or receiving on a connection (without releasing the socket itself) and is used only with TCP.

C++

Prototype: int shutdown(SOCKET s, int how);

how is a flag that indicates which actions are no longer allowed. Values are {SD_RECEIVE (subsequent calls to recv are disallowed), SD_SEND (subsequent calls to send are disallowed), SD_BOTH (disables sends and receives)}.

Example:

int iError = shutdown(ClientSock, SD_BOTH);

C#

Example:

ClientSock.Shutdown(SocketShutdown.Both);

Java uses two methods on the socket object:

void shutdownInput()

// Close the input stream.

void shutdownOutput()

// Close the output stream.
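
For example, a client that has finished sending but still expects a reply can shut down only its output stream; the server's next read then sees end-of-stream, while the client remains able to receive (a minimal sketch):

ClientSock.shutdownOutput(); // Send TCP's end-of-stream indication; the socket can still receive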

A14 CloseSocket

Close (or closesocket in the Winsock API) closes a socket and releases the resources associated with it. It is used with both TCP and UDP.

C++

Example:

int iError = closesocket(ClientSock);

C#

Example:

ClientSock.Close();

Java

Example:

ClientSock.close();
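
To show how the primitives fit together, the fragments below sketch a complete one-message exchange in Java (the two halves would normally run in separate processes; imports from java.net and java.io, and exception handling, are omitted for brevity):

// Server side: create, bind and listen (via the constructor), accept, receive, close

ServerSocket serverSock = new ServerSocket(8000); // Create, bind to port 8000 and listen

Socket connectedClientSock = serverSock.accept(); // Wait for a client to connect

DataInputStream in_data = new DataInputStream(connectedClientSock.getInputStream());

System.out.println("Received: " + in_data.readUTF()); // Read one message from the client

connectedClientSock.close();

serverSock.close();

// Client side: create and connect (via the constructor), send, close

Socket clientSock = new Socket("localhost", 8000); // Create the socket and connect in one step

DataOutputStream out_data = new DataOutputStream(clientSock.getOutputStream());

out_data.writeUTF("Message from client"); // Send one message to the server

clientSock.close();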


1 This is sometimes referred to as binding. In this particular context, it refers to one component locating and associating with another. Take care not to confuse this use of the term “binding” with the process-to-port binding facilitated by the sockets API bind primitive (discussed earlier in this chapter).