Software Engineering: A Methodical Approach (2014)

PART C. Software Design

Chapter 10. Database Design

If you review the OO modeling hierarchy (Figure 7-11) of chapter 7, you will notice that the left hand side is characterized by the term object structure analysis (OSA). This chapter focuses on OSA, or more precisely, object structure design. In this chapter, we shall relax any distinction between database design and object structure design, for the following reason: As established in the previous chapter, irrespective of the software engineering paradigm employed, data (object) structure design is of paramount importance. By way of observation, most software engineering environments embrace the idea of a relational database, upon which an OO user interface is superimposed. This chapter presumes that convention, and provides you an overview of the database design experience. For a more comprehensive coverage of database systems, please refer to the recommended readings.

The chapter proceeds under the following captions:

· Introduction

· Approaches to Database Design

· Overview of File Organization

· Summary and Concluding Remarks

10.1 Introduction

A database is the record keeping component of a software system. Database design is critical part of software engineering. Underlying most software products is a database that stores data that is critical to the successful operation of the software. Figure 10-1 provides you with some examples of this. In most cases, the database is superimposed by a user interface, and may therefore sometimes not be obvious to the end user. However, whether it is obvious or not the database component is real and potent. A full discussion of database design is beyond the scope of this course; it is best done in a course in database systems. This chapter provides you with a useful overview of the territory.

Figure 10-1. Illustrations of the Importance of Database

Database design is very crucial as it affects what data will be stored in and therefore accessible from the system. Hence, it affects the success of the system. Poor design leads to the following software flaws:

· Poor response time, hence

· Poor performance

· Data omissions

· Inappropriate data structures

· Redundancies

· Modification anomalies

· Integrity problems

· Lack of data independence

· Difficulty in system maintenance

· Inflexibility

· Lack of clarity

· Security and reliability problems

· Pressure on the programming effort to compensate for the poor design

Poor database design puts pressure on the software development team to program its way out of the poor design. By contrast, good design leads to the exact opposite of these conditions induced by poor design.

Some objectives of database design include the following:

· Comprehensive data capture

· Efficiency

· Flexibility

· Reliability

· Control of Redundancy

· Security and Protection

· Consistency and Accuracy

· Ease of access and ease of change

· Availability of information on demand

· Desirable data integrity

· Data independence — immunity of application programs to structural, storage or hardware changes of the database

· Clarity and multi-user access

10.2 Approaches to Database Design

There are two approaches database design:

· Conventional files

· Database approach which includes

· Relational model

· Object-oriented model

· Hierarchical model

· Network model

· Inverted List model

10.2.1 Conventional Files

Figure 10-2 illustrates the idea of conventional file approach. Application programs exist to update files or retrieve information from files.

Figure 10-2. Conventional File-Based Design

This is a traditional approach to database design that might still abound in very old legacy systems (to be discussed in chapter 18). You may use this approach if the software system is already designed using this approach, and the task is to maintain it.

Note Do not attempt to redesign the software system without management consent. Also, be aware that people (including managers) sometimes get annoyed with a software engineer who walks around looking for every problem to fix. Ironically, fixing problems often created by human ineptitude or limitations is an integral part of your job. Just be discreet in the execution of your job (chapters 14 and 15 provide more guidelines on how to conduct yourself on the job).

10.2.2 Database Approach

In the database approach, a database is created and managed via a database management system (DBMS) or CASE tool. A user interface, developed with an appropriate application development software, is superimposed on the database, so that end users access the system through the user interface. Figure 10-3 illustrates the basic idea.

Figure 10-3. Database Approach to Design

Of the five methodologies for database design, the relational model and the OO model are the two which dominate contemporary software engineering; this is expected to continue into the foreseeable future. The other three approaches are traditional (from the 1960s and 1970s), but occasionally show up in legacy systems. They will not be discussed any further; for more information on them, see the recommended readings.

Following are the steps involved in designing a relational or OO database:

1. Identify data entities or object types

2. Identify relationships

3. Eliminate unnecessary relationships

4. Develop an entity-relationship diagram (ERD) or an object-relationship diagram (ORD)

5. Prepare the database specification

6. Develop and implement the database

Note Step 6 belongs to the field of database systems and will not be explored any further in this course. However, you should appreciate the close nexus between database design and software engineering.

10.2.3 Identifying and Defining Entities or Object Types

Identifying information entities or object types requires skills, experience, and practice. In this regard, the techniques discussed in chapter 5 are applicable. As you will recall, an information entity is a concept or object about which data is (to be) stored. An object type is concept or object about which data is (to be) stored, and upon which a set of operations are defined. If you relax the latter part of the definition of an object type, you will readily see that information entities and object types are similar.

Of equal importance is the structure of each information entity (object type). We define structure in terms of attributes: An entity (object type) is made up of attributes (also called elements or properties) that describe the entity (object). Attributes are non-decomposable (atomic) properties about the entity (object), as illustrated in the example below (Figure 10-4).

Figure 10-4. Partial Entity-Attributes List for a Manufacturing Environment

10.2.4 Identifying Relationships

Identifying relationships among entities (object types) also requires skills, experience and practice, but can often be intuitively recognized by observation. To identify relationships, you have to know what a relationship is and what types of relationships there are. Your course in database systems will elucidate these issues to some level of detail. For now, you may consider a relationship as a mapping involving two or more information entities (or object types) so that a data item (an object) in one relates in some way to at least one data item (object) in the other(s) and vice versa. There are seven types of relationships:

· Component relationship (if OO database)

· Aggregation relationship (if OO database)

· Super-type-sub-type relationship (if OO database)

The first four types of relationships are referred to as traditional relationships because up until object model (for database design) gained preeminence, they were essentially the kinds of relationships that were facilitated by the relational model. Observe also, that the only difference between a 1:M relationship and a M:1 relation is a matter of perspective; thus, a 1:M relationship may also be described as a M:1 relationship (so that in practice, there are really three types of traditional relationships). Put another way:

For traditional relationships, to determine the type of relationship between two entities (object types) E1 and E2, ask and determine the answer to the following questions:

· How many data items (objects) of E1 can reference a single data item (object) of E2?

· How many data items (objects) of E2 can reference a single data item (object) of E1?

To test for a component relationship between any two relations (object types) E1 and E2, ask and determine the answer to the following questions:

· Is (a data item of) E1 composed of (a data item of) E2?

· Is (a data item of) E2 composed of (a data item of) E1?

For a subtype relationship, the test is a bit more detailed; for entities (or object types) E1 and E2, ask and determine the answer to the following questions:

· Is (a data item of) E1 also a (a data item of) E2?

· Is (a data item of) E2 also a (a data item of) E1?

The test is identical for object types, except that in the object-oriented paradigm, the term “instance” is preferred to “data item.” Possible answers to the questions are always, sometimes, or never. The possibilities are shown in Figure 10-5.

Figure 10-5. Testing for Sub-type Relationship

Example 1: Again using the sample manufacturing subsystem of Figure 10-4, several relationships can be identified, as indicated in Figure 10-6.

Figure 10-6. Relationships List for a Manufacturing Environment

10.2.5 Developing the ERD or ORD

An entity-relationship diagram (ERD or E-R diagram) is a graphical illustration of entities and their relationships in the database. In the OO paradigm, the equivalent diagram is called an object-relationship diagram (ORD or O-R diagram). For small and medium sized projects, it is a very useful modeling technique. However, as the size and complexity of the system increases, the ORD/ERD tends to become unwieldy. In these circumstances, unless the software engineering team is using a CASE tool that facilitates generation and maintenance of the ERD/ORD, other pragmatic approaches are recommended. One such approach is to tabulate as illustrated above. Another approach is to construct for each information entity (object type), an object/entity specification grid (O/ESG). This will be discussed shortly.

The symbols used in an ERD are as shown in Figure 10-7. Figure 10-8 shows the ERD for the manufacturing system of Figure 10-4, using the Crows-foot notation. In the diagram, the convention to show attributes of each entity has been relaxed. Note also that relationships are labeled as verbs so that in mapping one entity (or object type) to another, one can read an object-verb-object formation. If the verb is on the right or above the relationship line, the convention is to read from top-to-bottom or left-to-right. If the verb is on the left or below the relationship line, the convention is to read from bottom-to-top or right-to-left.

Figure 10-7. Symbols Used in E-R Diagrams

Figure 10-8. E-R Diagram for Manufacturing Environment

It is customary to indicate on the ERD, the multiplicity (also called the cardinality) of each relationship. By this we mean, how many occurrences of one entity can be associated with one occurrence of the other entity. This information is particularly useful when the system is being constructed. Moreover, violation of multiplicity constraints could put the integrity of the system is question, which of course is undesirable. Usually, the DBMS does not facilitate enforcement of multiplicity constraints at the database level. Rather, they are typically enforced at the application level by the software engineer.

Several notations for multiplicity have been proposed, but the Chen notation (first published in 1976, and reiterated in [Chen, 1994]) is particularly clear; it is paraphrased here: Place beside each entity, two numbers [x,y]. The first number (x) indicates the minimum participation, while the second (y) indicates the maximum participation.

An alternate notation is to use two additional symbols along with the Crow’s Foot notation: an open circle to indicate a participation of zero, and a stroke (|) to indicate a participation of 1. The maximum participation is always indicated nearest to the entity box.

The Chen notation is preferred because of its clarity and the amount of information it conveys. Figure 10-9 provides an illustrative comparison of the two notations.

Figure 10-9. Illustrating Multiplicity Notations

Let us now turn our attention to the O-R diagram. The symbol for object type (mentioned in chapter 9) is also used in the ORD, and replaces the entity symbol (as you will soon see, they are actually similar). Assuming the UML notation, the following guidelines apply:

· Similar to the information entity, a box (square or rectangle) represents an object type. The object-type box has two additional compartments: one for attributes, and the other for operations.

· A triangle or arrowhead (pointing towards the super-type) represents an inheritance relationship.

· An open diamond represents an aggregation relationship (the parts existing independent of the whole).

· A filled in diamond represents a composition relationship (the parts only exist as part of the whole).

· A line connecting two object types represents a traditional relationship; the multiplicity (also called cardinality) of this relationship is indicated by a pair of integers next to each object type; the lower value is indicated first, and an asterisk is sometimes used to mean “many.” The multiplicity of a relationship is the level of participation of each object type (or entity) in the relationship. The role that each object type plays in the relationship is also indicated next to the object type.

Since the object symbol automatically incorporates object attributes, there is therefore no need for an attribute symbol. In any event, including attributes and operations on the ORD tends to clutter the diagram rather quickly. It is therefore recommended that you omit this detail from the diagram for very large and/or complex systems. The upcoming subsection will suggest a creative and elegant way to represent attributes and related operations for object types comprising a system.

Figure 10-10 illustrates an ORD (using UML notation) depicting aspects of a college environment. According to the diagram, Student and Employee are subtypes of College Member. Additionally, Employee is a composition of Employee Personal Info, Employee Work History,Employee Academic Log, Employee Publication, Employee Extra Curricular, and Employee Dependents Log.

Figure 10-10. Partial O-R Diagram for College Environment (UML Notation)

Once, created, the ERD/ORD (or its equivalent) must be maintained for the entire life of the software system. If you are fortunate to be using a sophisticated CASE tool that automatically generates the diagram from the current database, then this will not be a problem for you.

10.2.6 Preparing the Database Specification

The database specification may be in different forms, depending on the available resources. In an OO environment where you have the use a CASE tool that supports UML, it may simply be a detailed ORD where each object type is represented as explained and illustrated in section 9.2.2 of the previous chapter. In an FO environment, it may simply be a detailed ERD where the attributes of each entity are included on the diagram.

For large, complex projects (involving huge databases with tens of information entities or object types), unless a CASE or RAD tool which automatically generates the ERD/ORD is readily available, manually drawing and maintaining this important aspect of the project becomes virtually futile. In such cases, an object/entity specification grid (O/ESG) is particularly useful. The grid contains the following components:

· Descriptive name of the entity (object type)

· Implementation name of the entity (object type) — typically indicated in square brackets

· Reference identification for each entity, to facilitate easy referencing

· Descriptive name, implementation name (in square brackets) and characteristics (in square brackets) for each attribute

· References (implying relationships) to other entities in the system (indicated in curly braces)

· Comments on the entity and selected attributes

· Indexes (including primary key or candidate keys) to be defined on the entity

· Operations to be defined on each entity (object type)

· Optionally, implementation names of operations are be indicated in square brackets next to respective operations

The convention for specifying attribute characteristics is to use a letter to represent the nature of the data (A for alphanumeric, N for numeric and M for memo) followed by numbers representing the length and precision (for decimals). Figure 10-11 provides an illustration of an O/ESG for the manufacturing environment, or the college/university environment of earlier discussions. The ESG for three entities are included in the figure. In actuality, there would be one for each entity (object type) comprising the system. Also note the special data attributes that reference other entities in the figure (E1.3, E2.4, E2.8). In order to determine when to introduce such references, you need to apply principles of database design. These principles are best discussed in a course on database design (see the recommended readings).

Figure 10-11. Sample Object/Entity Specification Grid for Manufacturing Environment

10.3 Overview of File Organization

In planning the underlying database for a software system, it is important that the software engineer understands the different types of file organization techniques and the rationale and benefits of each. From your earlier programming courses, you should recall that there are four types of file organization:

· Sequential File organization

· Relative (direct) File Organization

· Indexed Sequential File Organization

· Multi-Access File Organization

Let us pause to briefly review each technique.

10.3.1 Sequential File Organization

In a sequential file, records are arranged in arrival sequence usually stored on systematic tape. Accessing of records must also be done sequentially (the file may be sorted in a particular order).

Sequential file organization is useful when a large volume of records is to be added or updated in bulk or batch mode. Figure 10-12 illustrates what a sequential file of student records may look like. Accessing the N^th record means first accessing N-1 records.

Figure 10-12. Illustration of Sequential File

Sequential file organization is not suited for interactive processing. This is so because records have to be accessed sequentially in arrival sequence. If you have a file with thousands of records, attempt to provide interactive processing on a sequential file would produce very poor results, and would therefore be counterproductive.

10.3.2 Relative or Direct File Organization

In relative (direct) file organization, records are arranged in some logical order where there is a relationship between the key used to identify a particular record and the record’s address on the storage medium. In computer science, this is often represented as follows:

F(key) Address

Use of linked list is one method of implementing relative file. Each record has a pointer to the next logical record. Access is improved but additional data has to be stored with each record. Figure 10-13 illustrates a relative file implemented by linked list (on surname). The main problem with this implementation is that only sequential access is facilitated.

Figure 10-13. Illustration of Linked List (Relative File)

Other more desirable methods of implementation of relative files are direct mapping, table lookup, hash functions, and open addressing with buckets (you should be familiar with these methods from your course in data structures and algorithms). These methods facilitate random (direct) access of the file(s). Consequently, interactive processing is facilitated.

10.3.3 Indexed Sequential File Organization

In an indexed sequential file organization, records are ordered sequentially, but can also be accessed randomly via some key.

Indexed sequential access method (ISAM) is suitable for batch processing as well as interactive processing. ISAM files are typically implemented as B-trees (or some derivative of the B- tree). It is the most widely used method of file organization (again, please review your data structures).

10.3.4 Multi-Access File Organization

In multi-access file organization, a record can be accessed by any key order. An enhancement of ISAM, multi-access file organization is typically implemented by sophisticated DBMS suites and CASE tools. The DBMS maintains the index(es) that may be defined on the file in a manner that is transparent to the user. These indexes are typically B-tree or bitmap implementations.

10.4 Summary and Concluding Remarks

Let us summarize what we have covered in this chapter:

· A database is the record keeping component of a software system. It must be properly designed. Failure to do so will seriously compromise the quality of the software system.

· Contemporary databases are designed to be relational or object-oriented, but mostly relational.

· Database design involves five steps: identifying the information entities (or object types), identifying relationships among the entities, eliminating unnecessary relationships, developing the ERD or ORD, and preparation of the database specification.

· There are four types of file organizations: sequential, direct, indexed-sequential and multi-access. In sequential file organization, the records are organized in arrival sequence. The file can only be accessed sequentially. In direct file organization, each record has a specific address, thereby allowing random access. Indexed-sequential file organization supports both sequential access and random access. In a multi-access file, the records can be accessed sequentially or randomly, as well as via alternate access paths.

This is merely an introduction to database design from the context of software engineering. Study of database systems is a field of computer science, so a full discussion is beyond the scope of this course (please see the recommended readings). Appendix 10 provides you with a real example of the database specification for the Inventory Management System of earlier mention. The next chapter discusses design of the software user interface.

10.5 Review Questions

1. How important is database design? Cite four concrete examples of database playing an important role in computer software.

2. Identify the problems that are likely to occur due to poor database design.

3. Identify six objectives of good database design.

4. Outline the steps involved in the design of a database.

5. The following is an excerpt from the requirements for a college academic administration system:

· Courses are offered by various departments without any overlap (a department offers between 5 and 30 courses).

· The courses make up academic programs, in some instances a course may occur in more than one program. Academic programs are offered by departments (no overlap allowed).

· A faculty typically consists of several departments.

· A lecturer is scheduled to lecture at least two courses. Each course is lectured in a specific lecture room.

· A student may register for several courses; typically a course is pursued by a minimum of fifteen students.

· Each student is registered to one department only.

From the information given, develop an ERD for the system.

6. The Inventory Management Information System (IMIS) of a marketing company has the following database specification:

· The company has several warehouses, each storing certain inventory items without overlap.

· The company has a cadre of suppliers, each supplying various items of inventory, with possible overlap.

· The company purchases items by first sending a purchase order to a supplier (of course, the supplier could receive several orders). Each purchase order details the items required. In responding to the purchase order, the supplier submits an invoice, detailing the items supplied, along with other relevant information.

· The company may also sell items from its inventory. In such a case, a sale-invoice is submitted to the customer, which details the items sold, along with other relevant information.

· A sale-invoice is usually with respect to a sale-order, received from a customer. A sale-order is essentially a purchase order, coming into the company, from one of its customers.

· Each inventory item belongs to a category.

· A department may make a requisition for inventory items. In response, inventory items may be issued to department(s).

From this information, develop an ERD for the system.

By conducting a brainstorming session (or otherwise), and using your E-R diagram as well as guidelines illustrated in Figure 10-10, construct an initial entity specification grid (ESG) for the IMIS project.

10.6 References and/or Recommended Readings

[Chen, 1994] Chen, Peter. “The Entity-Relationship Model – Toward a Unified View of Data,” In Readings in Database Systems 2^nd ed., pages 741-754. San Francisco, CA: Morgan Kaufmann, 1994.

[Date, 2004] Date, C. J. An Introduction to Database Systems 8^th ed. Menlo Park, CA: Addison-Wesley, 2004. See chapters 1, 3, 5, 6, 11, and 12.

[Foster, 2010] Foster, Elvis C. with Shripad Godbole. Database Systems: A Pragmatic Approach. Bloomington, IN: Xlibris Publishing, 2010.

[Hoffer, 2007] Hoffer, Jeffrey A., Mary B. Prescott and Fred R. McFadden. Modern Database Management 8^th ed. Upper Saddle River, New Jersey: Prentice Hall, 2007. See chapters 3 and 4.

[Kendall, 2005] Kendall, Kenneth E. and Julia E. Kendall. Systems Analysis and Design 6^th ed. Upper Saddle River, NJ: Prentice Hall, 2005. See chapter 13.

[Kifer, 2005] Kifer, Michael, Arthur Bernstein and Philip M. Lewis. Database Systems: An Application-Oriented Approach 2^nd ed. New York: Addison-Wesley, 2005. See chapters 3 and 4.

[Lee, 2002] Lee, Richard C. and William M. Tepfenhart. Practical Object-Oriented Development With UML and Java. Upper Saddle River, NJ: Prentice Hall, 2002. See chapter 8.

[Martin, 1993] Martin, James, and James Odell. Principles of Object-Oriented Analysis and Design. Eaglewood Cliffs, NJ: Prentice Hall, 1993. See chapters 6 and 7.

[Rumbaugh, 1991] Rumbaugh, James, Michael Blaha, William Premerlani, Frederick Eddy and William Lorensen. Object Oriented Modeling And Design. Eaglewood Cliffs, NJ: Pretence Hall, 1991. See chapter 4.

[Ullman, 1997] Ullman, Jeffrey and Jennifer Widom. A First Course in Database Systems. Upper Saddle River, NJ: Prentice Hall, 1997. See chapters 1-3.