Pragmatic Enterprise Architecture (2014)
Strategies to Transform Information Systems in the Era of Big Data

Appendix

This appendix provides one useful artifact for the discipline known as physical database design within data modeling architecture, a discipline that participates primarily in the software development life cycle and the data-centric development life cycle, as well as in several other life cycles.

A.1 Transaction Path Analysis (TAPA)

A.1.1 TAPA 01 Significant Transactions

Many software applications contain a large variety of transactions, and usually, the majority of these transactions are incapable of having a noticeable impact upon the performance of the application. Since time and labor are valuable, it is best to focus only on the small number of transactions that are likely to impact the availability of shared resources.

A significant transaction would be an application function that is expected to have any of the following characteristics:

- Business criticality

- The functionality of some business transactions is more business critical than others, and it is important for the business users to alert the database administrator as to which transactions are the most business critical.

- High volume

- As computer processing power increases, the threshold that defines high volume continues to increase. In general, it is worth bringing it to the attention of your DBA when the frequency of the transaction is expected to exceed one hundred per minute, or when the number of concurrent users may exceed one hundred.

- Complex

- A number of factors should be considered when determining whether a transaction is complex. The types of characteristics to be on the alert for are transactions that use the database and contain:

– Lengthy calculations, either embedded or called externally

– Intersystem and/or intra-session communication

– SQL JOINs that exceed three tables

– More than 20 SQL statements within the transaction

– Multiple commits to the database

- Long running

- A number of factors also should be considered when determining whether a transaction is long running. The types of characteristics that you may wish to alert the DBA to include transactions that contain:

– CURSORS that may touch more than 500,000 rows before completing

– Batch transactions that may process more than 100,000 input records

– Conversational interaction (two-way communication) with an end-user

- Memory intensive

- In general, it is worth bringing the transaction to the attention of the DBA when it contains large arrays, data fields that exceed the typical 4 kB size (such as an XML tag, or a column defined with a large data type such as CLOB, BLOB, TEXT, NTEXT, or VARCHAR(MAX)), or a return set exceeding a gigabyte (a hypothetical DDL sketch follows).
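
As a minimal sketch, with hypothetical table and column names, of a table whose columns would flag transactions as potentially memory intensive (large-object type names vary by DBMS: CLOB and BLOB are DB2 and Oracle types, while TEXT, NTEXT, and VARCHAR(MAX) are SQL Server types):

```sql
-- Hypothetical table whose large-object columns flag transactions
-- against it as potentially memory intensive.
CREATE TABLE document_archive (
    document_id    INTEGER       NOT NULL PRIMARY KEY,
    title          VARCHAR(200)  NOT NULL,
    document_xml   CLOB,   -- large XML payload that can exceed a 4 kB page
    scanned_image  BLOB    -- binary content such as a scanned page
);
```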

In order to provide the DBA the opportunity to perform their responsibilities, the DBA will need to become familiar with the significant transactions. As a result, the DBA will be looking to the development team to provide the following information for their significant transactions:

- Transaction name

- The database transaction is the component about which a DBA can learn the most pertinent information for performing physical design.

- Maximum transactions per minute

- Transaction rates are usually stated in transactions per minute or transactions per second.

- Maximum number of concurrent users

- The overall resources that are required to allow a database management system (DBMS) to keep up with new requests are often determined by the amount of concurrent user activity.

- Lengthy calculation(s)

- Some calculations can prolong the length of time that locks are maintained on database tables.

- Intersystem and intra-session communication

- Communication across the transaction boundary creates an interdependency that can prolong the length of time that locks are maintained on database tables.

- Maximum number of SQL tables in a join

- Table joins that exceed three tables can severely prolong the length of time that locks are maintained on database tables.

- Number of SQL statements

- A large volume of SQL statements in a transaction can severely prolong the length of time that locks are maintained on database tables.

- Multiple commits within a transaction

- Although uncommon within online transactions, the presence of multiple commits is a signal to the DBA to consult with the development team to learn more about the factors creating a need for multiple commits.

- Maximum number of cursor rows

- A large number of cursor rows can severely increase the number of update locks in a relational DBMS and may also prolong the length of time that locks are maintained on database tables (a sketch of a cursor-driven batch transaction with periodic commits follows this list).

- Batch versus online

- Batch and online transactions usually demonstrate a significantly different performance profile from one another.

- Number of batch input records

- A large volume of input records can significantly increase the execution time of a batch transaction, thus requiring the DBA to become aware of the commits and restart recovery capabilities of the transaction.

- Two-way user communication

- Online transactions that employ a conversational design approach to communications can lock resources for prolonged time periods, as compared to a pseudo-conversational design approach.

- CLOBs or BLOBs or TEXT or NTEXT or VARCHAR(MAX)

- Data columns that support large data sizes are an indication of transactions that may prolong the length of time that locks are maintained on database tables by incurring waits for large storage requests.
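
To illustrate several of the items above in one place, here is a minimal T-SQL-style sketch (all table, column, and variable names are hypothetical; cursor syntax varies by DBMS) of a cursor-driven batch transaction that touches many rows and issues periodic commits so that update locks are not held for the entire run:

```sql
-- Hypothetical batch reprice job: a cursor that may touch hundreds of
-- thousands of rows, with a commit every 10,000 rows so that locks are
-- released periodically rather than accumulated across the whole run.
DECLARE @product_id INT, @rows_done INT = 0;

DECLARE price_cursor CURSOR FOR
    SELECT product_id FROM product WHERE reprice_flag = 'Y';

OPEN price_cursor;
BEGIN TRANSACTION;
FETCH NEXT FROM price_cursor INTO @product_id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE product
       SET retail_price = retail_price * 1.05
     WHERE product_id = @product_id;

    SET @rows_done += 1;
    IF @rows_done % 10000 = 0
    BEGIN
        COMMIT;               -- periodic commit releases accumulated locks
        BEGIN TRANSACTION;
    END;

    FETCH NEXT FROM price_cursor INTO @product_id;
END;
COMMIT;
CLOSE price_cursor;
DEALLOCATE price_cursor;
```

Note that whether a commit closes an open cursor depends on DBMS settings (for example, WITH HOLD cursors in DB2, or SQL Server's CURSOR_CLOSE_ON_COMMIT option), which is exactly the kind of detail the DBA will want to weigh in on.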

A.1.2 TAPA 02 SQL Data Services

Application programs may contain a variety of database I/O requests. The database optimizer is the database component that receives the database I/O request. What is most important is that the way in which the database I/O request is structured can often have a profound impact upon the way that the optimizer interprets and processes the request.

For example, some database I/O requests for data may be structured as one large JOIN statement, or a statement with one or more statements embedded within it, or as multiple individual statements that follow after one another.
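
For instance, the same requirement, retrieving a customer's open orders, could reach the optimizer in either of these forms (table and column names are hypothetical; the question marks are parameter markers):

```sql
-- Structure 1: one join statement, leaving the access path to the optimizer.
SELECT c.customer_name, o.order_id, o.order_date
  FROM customer c
  JOIN customer_order o ON o.customer_id = c.customer_id
 WHERE c.customer_id = ?
   AND o.status = 'OPEN';

-- Structure 2: two individual statements issued one after the other.
SELECT customer_name
  FROM customer
 WHERE customer_id = ?;

SELECT order_id, order_date
  FROM customer_order
 WHERE customer_id = ?
   AND status = 'OPEN';
```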

Since many database I/O requests can exhibit unexpected results, or may adopt inefficient access strategies when accessing the data, it is important to pass your SQL statements to the DBA, so that the structure of the I/O request, as well as the structure of the database's physical design, can be adjusted to eliminate performance problems and programming errors that are easy to avoid for someone intimately familiar with the database's physical design and the particular DBMS's physical design options.

By presenting the application’s database I/O requirements to the DBA, the interaction allows the DBA to add value by sharing their knowledge of the particular DBMS, including alternatives involving the use of database built-in functions, the applicability of unions, the correct use of cursors, or the use of carefully chosen options within the syntax of the database I/O request.

Aside from having the DBA participate as a valued member of the software development team, the SQL Data Services Workbook allows the DBA to determine and ensure the necessary SLA for database performance. In order to provide the DBA the opportunity to perform their responsibilities, the DBA will need to become familiar with every database I/O request in the application requiring the following information:

- Transaction name

- A transaction is considered the source of all database I/O requests, and hence the first piece of information to identify to the DBA is the transaction making the database I/O request.

- Average # times invoked in transaction

- Transactions that are infrequently invoked can also be responsible for frequently invoking one or more database I/O requests, such as I/O requests that are repeated in a loop.

- Purpose of SQL statement

- Only after the intent of the database I/O request is made known to the DBA will the DBA be able to determine whether the actual request will support the needs of the developer, and it will also make it easier for the DBA to recommend an alternate structure for the database I/O request.

- SQL statement

- Paste the SQL statement here for the DBA to review its structure and to determine how the optimizer might better support the request by potentially altering the structure of the database I/O request or by altering various aspects of the physical database design.

- Suggestions, questions, concerns

- A working knowledge of an application’s components can provide the developer with a perspective that the DBA may not have considered, or it may simply be an opportunity for the developer to pose a question or concern.

As information is provided by the developers, the following pieces of information can be documented by their DBA:

- DBA notes

- The DBA can record notes, such as the determination that the particular SQL statement is simple or infrequently invoked, thereby not requiring further analysis.

- Alternate SQL

- After analyzing the purpose of the I/O request, the DBA can propose an alternate SQL statement to address accuracy or performance issues (see the sketch following this list).

- Indexing and other design considerations

- A wide variety of physical design decisions can result from understanding an application’s database I/O requirements, potentially leading the DBA to partition tables, resize table spaces, reallocate tables to table spaces, or alter the use of indices. However, the most common use for an application’s I/O requests is to determine the cost-benefit of denormalization.

- Status

- A status can help organize SQL data services into useful groups that help communicate which database I/O requests are pending, in progress, or approved.
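
As a hypothetical illustration of the "Alternate SQL" entry above (all names are illustrative, not from the source), a DBA might propose replacing a correlated subquery, which can be re-executed for every row, with a join against a derived table that is computed once:

```sql
-- As submitted by the developer: a correlated subquery evaluated per row.
SELECT o.order_id
  FROM customer_order o
 WHERE o.order_total > (SELECT AVG(o2.order_total)
                          FROM customer_order o2
                         WHERE o2.customer_id = o.customer_id);

-- As proposed by the DBA: the same intent, with the per-customer
-- averages computed once in a derived table and joined back.
SELECT o.order_id
  FROM customer_order o
  JOIN (SELECT customer_id, AVG(order_total) AS avg_total
          FROM customer_order
         GROUP BY customer_id) a
    ON a.customer_id = o.customer_id
 WHERE o.order_total > a.avg_total;
```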

A.1.3 TAPA 03 Significant Transaction DB Table Use

The primary flow of business transactions and their underlying activities tend to demonstrate a natural progression and sequence of data and information recorded and retrieved.

For example, let us focus momentarily on a set of business transactions common to a retail outlet.
We begin the business process by placing an initial order, where we should record the new product information, including its description, wholesale and retail price, and the order that we placed for the product, including its order date, price, quantity, vendor, and vendor contact information. If this is our first transaction with this vendor, we will provide the vendor with our company credit card information, our shipping instructions, and our business sales tax id so that we will not be invoiced for sales tax on top of the product cost, and shipping and handling.
When the shipment arrives, the second activity in the sequence will be to record the delivery information, including its delivery date, delivered quantity, invoice amount, invoice number, as well as updating the product inventory.
The third activity may be to read the product information to generate a barcode product label for each unit, containing the product price and product identifier, and to record the fact that each unit was placed in the retail area for sale, as opposed to being allocated for consumption by the business operation.
The fourth activity occurs when customers purchase the product, which would cause the sale to be recorded and the product inventory to be updated.

In the abovementioned example, we have a total of four business transactions: (1) order product, (2) accept delivery, (3) put product out for sale, and (4) sell product. In our bookkeeping, we have seven separate types of records to house the data related to these business transactions, which include a place to record: (1) product information, (2) a delivery, (3) inventory line item increases, (4) expenses, (5) a product line item sale, (6) receipts from a sale, and (7) inventory line item reductions.

If we look at our definition of “significant transactions,” our transactions for ordering product, accepting delivery, and putting product out for sale are not high volume, complex, long running, or memory intensive. Our highest volume transaction is selling product, which places update locks on a product line item sale entry, a receipt from a sale, and an inventory line item reduction. As additional business transactions of the application are identified, we will be on the alert for any other “significant transactions” that share any of the same data, and whether they use an access sequence that opposes the sequence of this transaction.
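
A minimal sketch of the "sell product" transaction described above (table and column names are hypothetical, and the question marks are parameter markers) makes the execution sequence of the update locks explicit:

```sql
BEGIN TRANSACTION;

INSERT INTO sale_line_item (sale_id, product_id, quantity, sale_price)
VALUES (?, ?, ?, ?);                             -- 1. product line item sale

INSERT INTO sale_receipt (sale_id, receipt_amount, tendered_at)
VALUES (?, ?, CURRENT_TIMESTAMP);                -- 2. receipt from the sale

UPDATE product_inventory                         -- 3. inventory line item
   SET quantity_on_hand = quantity_on_hand - ?   --    reduction
 WHERE product_id = ?;

COMMIT;
```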

In order to provide the DBA the opportunity to perform their responsibilities, the DBA will need to become familiar with the significant transactions in this and every application. As a result, the DBA will be looking to the development team to provide the following information for their significant transactions:

- Transaction name

- A transaction is considered the source of all database I/O requests, and hence the first piece of information to identify to the DBA is the transaction making the database I/O request.

- Table name

- Each table name that is acted upon by the transaction.

- Control record

- A control record would be one or a few rows that centrally manage information, such as an application-maintained sequence number, and that have the probability of creating a single point of contention (see the sketch following this list).

- CRUD (C, R, U, D)

- The database operation that the table participates in, representing Create, Read, Update, Delete, and Cursor Select.

- Execution sequence

- A number indicating the order in which the tables are acted upon within the transaction.

- Average number of rows

- The average number of rows that are affected by the database operation.
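
To make the "control record" entry above concrete, here is a hypothetical single-row control table that hands out the next sale number; because every sale transaction must update this one row, it is a classic single point of contention:

```sql
-- Hypothetical control record: one row that every sale transaction
-- must update, serializing all sales behind an exclusive lock.
UPDATE next_sale_number
   SET last_sale_id = last_sale_id + 1
 WHERE control_key = 1;

SELECT last_sale_id
  FROM next_sale_number
 WHERE control_key = 1;
```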

As information is provided by the developers, the following pieces of information can be documented by their DBA.

- Lock type (S, X, U)

- Shared lock types result from selects; exclusive lock types result from updates, inserts, and deletes; and update lock types result from cursor selects from schemas/users with other than read-only access.

- Number of indices affected

- Deadlock contention occurs most frequently within the data structures of indices, as opposed to occurring on rows of the underlying target tables, and the probability of a problem increases proportionally with an increase in the number of indices updated.

- Number of columns per index

- The probability of an index being updated generally increases as the number of columns participating in the index increases.

- Direction

- When the sequence of tables having locks placed on them coincides with the sequence in which those tables are being updated by other transactions, the direction is in “agreement”; however, when that sequence opposes the sequence used by prior transactions, the direction is in “conflict” (see the sketch following this list).

- Conflicts with transaction name(s)

- When conflicts in sequence have been identified, the name(s) of the transaction with which a conflict exists is recorded.
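
The following hypothetical pair of transactions (all names illustrative) shows a direction in “conflict”: each acquires its first lock on the table the other needs last, so running them concurrently can deadlock:

```sql
-- Transaction A: sell product
-- (locks product_inventory first, then sale_receipt)
UPDATE product_inventory SET quantity_on_hand = quantity_on_hand - 1
 WHERE product_id = 42;
UPDATE sale_receipt SET receipt_amount = 19.99
 WHERE sale_id = 7;

-- Transaction B: correct a receipt
-- (locks sale_receipt first, then product_inventory)
UPDATE sale_receipt SET receipt_amount = 17.99
 WHERE sale_id = 7;
UPDATE product_inventory SET quantity_on_hand = quantity_on_hand + 1
 WHERE product_id = 42;

-- If A and B run concurrently, each can hold the first lock the other
-- needs; reordering B to match A's sequence puts them in "agreement."
```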

A.1.4 TAPA 04 Database Tables

Database tables represent a significant portion of what a DBA works with for the bulk of their planning and administration. Database tables determine database size, each having their own growth rate, retention period, purge frequency, and archival requirements, and they represent the focal point of the access strategy for the optimizer of the DBMS.

In order to provide the DBA the opportunity to perform their responsibilities, the DBA will need to become familiar with the various tables and indices of the database. As a result, the DBA will be looking to the development team to provide the following information:

- Table name

- A table will be strategically located within a table space, either isolated or among other tables and indices, and sized to manage the growth rate needed to support the online retention period.

- Average row length

- The columns of a table will vary in size, and they will be populated to various degrees, called sparseness, which together will determine the average length for a row of data.

- Initial volume

- The database may contain an initial set of data that may be loaded from another file, database, or by manual entry.

- Annual rate of growth

- This estimates the net rate of growth at which additional data is added to the table. Tables that have the same amount of growth each year as the initial year have an annual rate of growth of 100%.

- Distribution of growth

- Depending upon the values of the columns in newly added rows, their position within the primary key index may fall at the beginning, at the end, at hot spots, or be evenly distributed. The distribution and rate of growth will help the DBA determine the desired frequency of database reorganizations and the amount of free space necessary to promote good access behavior.

- Percent column expansion

- Depending upon the degree to which NULL values will be populated in subsequent updates, the DBA will want to appreciate how much FREESPACE to allocate on the database pages to allow for expansion without causing splits and pointers to distant database pages (see the sketch following this list).

- Online retention period

- Availability requirements for data online will determine the retention period. Many databases hold their data indefinitely, although they may only have a business requirement that spans a specified number of years for historical and analytical purposes. Please note that online data retention should not be confused with archival requirements, which are governed by more than regulatory requirements (see “Archival requirements” below).

- Purge frequency

- Depending upon the rate at which data is accumulated for archival, data from the online database may be offloaded and purged at a higher or lower frequency.

- Archival requirements

- Once data is offloaded and purged from online storage, archives can be retained both internally and externally for various periods of time. The determination of archival requirements, however, does not stem solely from laws and regulations, as senior management needs to determine how best to deal with lawsuits and discovery requests.

Hence, when considering laws such as Sarbanes-Oxley, which specifies, in section 103, that firms retain records relevant to audit and financial review for 7 years, and the Securities and Exchange Commission, which specifies, in SEC rule 17a-4, that records pertaining to certain parties be retained for specified periods of time, the most valuable criterion overall to consider may be the statute of limitations for the various legal actions from which the company may need to protect itself.

The issues are:

- Laws and regulations

- Lawsuits and discovery requests

- Ability to access and support the necessary search capabilities

- Data security of both data archived in-house and externally

- Balancing the cost to the business of both providing and not providing various search capabilities for performing discovery

- Encryption

- The use of encryption should be limited to database columns that are considered to be moderately sensitive or higher (see Section A.1.4.2).

- Number of indices

- Tables should have a minimum of one index which would be associated with the primary key columns; however, a number of additional indices may provide a worthwhile benefit to read performance at the expense of update and insert performance.

- Index key size(s)

- Each index is built upon one or more columns, each having an average length. In order to calculate the variable portion of the space needed to support each index, the average length of the columns that comprise an index must be added together. The average length of columns that are in multiple indices must be included in the size for each index they participate in.

- Triggers/stored procedures

- As an alert to the DBA, if there is an expectation that application components will be developed as either a trigger or a stored procedure, then the DBA can prepare for compiling their source code.
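
As an illustration of the free-space planning mentioned under "Percent column expansion" above, here is a SQL Server-style sketch with hypothetical names; the mechanism and terminology (FREESPACE, PCTFREE, FILLFACTOR) vary by DBMS:

```sql
-- Reserve roughly 20% of each leaf page so rows that expand as NULL
-- columns are later populated do not immediately cause page splits.
CREATE INDEX ix_customer_name
    ON customer (last_name, first_name)
  WITH (FILLFACTOR = 80);
```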

As information is provided by the developers, the following pieces of information can be documented by their DBA:

- Projected number of gigabytes required

- The DBA can calculate the sum of the space needed for any initial data, one year’s growth for tables and indices, and free space, plus an initial margin of 30% (a worked example follows this list).

- Originating versus passed

- The DBA can document whether the data in this table originates in this database, or is passed from another database, or contains data that both originates in this database as well as data that is passed from another database.

- Data quality

- See Section A.1.4.1.

- Data sensitivity

- See Section A.1.4.2.

- Status

- The DBA can designate the status of this table’s analysis as pending, in progress, or approved.
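
As a worked example of the space projection above (all figures hypothetical), a table with 10 GB of initial data, 5 GB of first-year growth, 4 GB for its indices, and 3 GB of free space would be provisioned, with the 30% initial margin, as (10 + 5 + 4 + 3) GB × 1.30 = 28.6 GB.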

A.1.4.1 Data Quality

Data quality is ultimately a rating that communicates the level to which one can reasonably rely upon the accuracy of each data field within that database. The process of assigning a data quality level to a data field should be limited to database fields that originate at the database, as nonoriginating fields should have their respective data quality rating established at their point of origin. The levels of data quality assigned to a field should be defined in business terms addressing how sparsely populated the data is when a value can exist, and how accurate the data values are when they are populated:

High data quality should be assigned to fields that are fully populated with a non-null value on each and every row of data, and whose values are considered to be completely accurate and fully reliable. These are typically required fields with comprehensive edits to ensure reliability.

Moderate data quality should be assigned to fields that are populated a majority (90%) of the time when a value was known, and whose values are considered to have a majority of accurate (90%) data, and may hence be considered reasonably reliable. These are typically nonrequired fields usually populated when their value was known, having edits that minimally guarantee format. To guard against the possibility that these fields have been systematically populated with inaccurate data of the correct format, one or more statistical analyses may be performed.

No data quality should be assigned to fields that are either sparsely populated with a value, or whose values, when populated, are not reasonably reliable in either accuracy or format. These are typically nonrequired fields with no edits to ensure reliability or correct data format.
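
A hypothetical profiling query (names illustrative) of the kind that supports the statistical analyses mentioned above would measure, for a single field, how often it is populated and how often a populated value matches the expected format:

```sql
-- Population rate and format-validity rate for one field;
-- COUNT(column) counts only the non-null values.
SELECT COUNT(*)                                        AS total_rows,
       COUNT(phone_number)                             AS populated_rows,
       SUM(CASE WHEN phone_number LIKE '___-___-____'
                THEN 1 ELSE 0 END)                     AS format_valid_rows
  FROM customer;
```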

This initiative should begin by analyzing the (a) root causes of the various data qualities, (b) extent of data quality issues within various databases, and (c) identification of procedural and organizational approaches for addressing data quality, so that the requirements and planning for addressing our data quality needs may be determined.

The assignment of database data quality should begin with those databases that originate the majority of their data so that downstream databases can benefit by inheriting the data quality associated with those fields that are passed on down to them.

A.1.4.2 Data Sensitivity

Data sensitivity should be assigned to a database, as derived by identifying the data field(s) with the highest sensitivity within that database. The process of assigning a data sensitivity level to a data field should be limited to database fields that originate at the database, as nonoriginating fields should have their respective sensitivity rating established at their point of origin. The levels of data sensitivity assigned to a field should be defined in business terms:

The highest level of sensitivity should be assigned to fields that should only be disclosed on a strict need-to-know basis within the enterprise, in that disclosure to nonauthorized individuals poses a legal and/or financial risk to the enterprise, including anything that can be used to adversely affect its credit rating or stock price, or to severely diminish any aspect of the public's perception of the enterprise, the value of the enterprise, or its ability to secure private information.

A moderate level of sensitivity should be assigned to fields that should only be disclosed on a need-to-know basis within the enterprise, in that disclosure to nonauthorized individuals may pose a near certain competitive risk to the enterprise if the data fell into the hands of a competitor, including anything that can be used to moderately diminish any aspect of the public's perception of the enterprise, the value of the enterprise, or its ability to secure private information.

A low level of sensitivity should be assigned to fields that may be disclosed to a limited set of individuals who have been trusted to work with particular sets of data, in that disclosure to nonauthorized individuals may pose some level of competitive risk to the enterprise if the data fell into the hands of a competitor, including anything that can be used to slightly diminish any aspect of the public's perception of the enterprise, the value of the enterprise, or its ability to secure private information.

Nonsensitive data, usually limited to externally acquired data, may be disclosed to a limited set of individuals who have been trusted to work with particular sets of data, in that disclosure to nonauthorized individuals would not be usable to determine anything about the enterprise or its practices.

The assignment of database data sensitivity should begin with those databases that originate the majority of their data so that downstream databases can benefit by inheriting the data sensitivities associated with those fields that are passed on down to them.

A.1.5 TAPA 05 Database User Availability

At a high level, the profile of database availability is useful to help determine which databases can coexist from an availability perspective, which will facilitate system maintenance to the server and the various components within its configuration.

At a more detailed level, database user availability communicates to the DBA when they will be able to schedule database maintenance, including regular maintenance, such as database backups and reorganizations, as well as any extraordinary maintenance, such as design changes or the application of database software patches.

Hence, in order to provide the DBA the opportunity to perform their responsibilities, the DBA will need to know the availability profile of the online business users and business batch processing. As such, the following information needs to be provided:

- Time of day/day of week availability

- Two fields exist for each 1-hour time period of the day, where the application’s availability for online and batch processing can be recorded.

- User business holidays

- The commonly observed business holidays for the business users need to be recorded for both domestic and international users.

As an example, if Thanksgiving is observed in the USA, but not observed among international users, then the system should be available for international users on Thanksgiving Day.

A.1.6 TAPA 06 Database Processing Volume Profile

The profile of database processing volume is useful to help determine which databases can coexist from the perspective of system resources. If the sum of the processing volume profiles of various databases remains within acceptable levels across the entire time period, then they can coexist. If, however, peak processing volume periods coincide with one another, then the maximum capacity of the server may be exceeded, rendering those databases incompatible with one another.

In order to provide the DBA the opportunity to safely determine database compatibility, the DBA will need to know the processing volume of online business users and batch processes. As such, the following information needs to be provided:

- Daily transaction volume profile by hour

- The most useful estimate of transaction volume is simply to designate the magnitude of the volume for each hour, as opposed to an exact number. A single-digit number representing the range of volume will make it easy for the DBA to work with. As the TAPA diagram indicates, the volume will be represented as a range of up to 10, up to 100, up to 1000, and so on, where the only number that needs to be captured is the number of trailing zeros. Hence, the selection for “up to 10,000” (in scientific notation, 1 × 10⁴) is simply the number “4,” which is just the exponent on the 10.

- Monthly transaction volume profile by week

- The primary use of this volume profile is to communicate cyclical peak processing periods that occur during particular times of the year, as would be the situation for quarterly reporting or year-end processing. Applications that do not have cyclical peak processing periods will assume a consistent value for each week of each month, equivalent to the sum of the volume for the week.

A.1.7 TAPA 07 Database Performance SLA

The service-level agreement (SLA) for database response time is useful to help determine the physical design objectives of the database and the corresponding hardware configuration needed to support the application. It is important to note that the majority of transaction response time is often determined by transaction design and the total amount of time that transactions spend in performing their nondatabase activities. Even though this may be the situation, the importance of estimating the database portion of a transaction’s overall response time cannot be overstated.

At the most technical level, every random and every sequential access to the database is determined by the specifications of the hardware components that participate in servicing each I/O request. When accuracy is absolutely critical for an application, a DBA will determine the number of random and sequential I/Os for each transaction type, reference the manufacturer’s specifications for the device, and then total the milliseconds to determine the response time.
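
As a worked example (device timings hypothetical), a transaction performing 30 random I/Os at 5 ms each and 2,000 sequential I/Os at 0.1 ms each would project to 30 × 5 ms + 2,000 × 0.1 ms = 150 ms + 200 ms = 350 ms of database time, before any nondatabase activity is added.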

When the actual response time varies unexpectedly from the projected response time, it is almost always due to unexpected behavior within either the application transaction or the number and type of I/Os that the DBMS is performing. Using the proper performance analysis tools, a DBA can readily determine the source of the unexpected result.

In order for the DBA to focus their attention appropriately, the following information needs to be provided:

- Geographic locations of users

- Locations may be identified in terms of whatever is most useful to describe where the users are located. This can be an address housing company offices, a city, state, country, or even continent. Communicating the location is still important even if the application is Internet based, as many countries and regions have various degrees of Internet infrastructure in place. The geographic variety of the users can also help explain an application’s availability profile to the DBA.

- Communications

- The application may be communicating through the Internet, Intranet, or the company network. Depending upon the communication mechanism and the location of the application, the database should be located within the zone that affords the appropriate level of firewall protection.

- Number of users per location

- The SLAs for various geographic regions will likely differ from one another, and the number of users can determine the relative priority that performance tuning should be assigned.

- Number of concurrent users per location

- The number of concurrent users can have a significant impact upon the performance profile of the database server, and testing should incorporate as many of the real-world constraints as are reasonably required to support Production.

- Maximum acceptable average response time in seconds

- It is important that the SLA stated for the maximum acceptable average response time be realistic for two reasons.
First, the maximum acceptable average response time is not the same as the ideal response time desired; otherwise, it would be called that. It is intended to communicate the maximum average response time that users can realistically tolerate and still conduct their business activities without harm.
Second, unnecessarily identifying a more stringent SLA will unnecessarily increase the demand upon the limited resources of the company, thereby causing expenses to escalate. If, however, performance is critical to the business function, such as with certain program trading systems, then a more stringent SLA would be appropriate.

- Application environment

- Applications are most frequently based on one or more application servers, which may support connectivity with “end-users” through an Internet, an Intranet, or a company network. When application servers are involved, database performance analysis can focus upon the requests received directly from the application server.

- Applications that run locally on the user’s workstation are far less common, as they are more expensive to maintain than application server-based environments.

- Applications that are colocated on the database server are the least common type of application environment, as they are usually not permissible because they pose the greatest degree of risk to the database and everything that is dependent upon the database server.
In contrast, the failure of an application server is far easier to tolerate: the database rapidly recovers from any in-flight updates of the failed application server and remains available for the application server’s return. When a database server fails, the overall recovery process for the databases supported by that server is far more complex and far more time consuming.
When applications are permitted to run on a database server, the company’s support expenses are usually higher due to the fact that the environment is typically dedicated exclusively to the application in order to avoid introducing unnecessary risk to other systems.

A.1.8 TAPA 08 Database Instances

Database instance information provides an opportunity to identify additional requirements of the production database as well as the application team’s development process.

In order for the DBA to focus their attention appropriately, the following information needs to be provided:

- DBMS

- Although the choice of DBMS is determined by EDM in accordance with the EDM Database Strategy, the application development team is afforded the opportunity to communicate their preference. If there is sufficient justification, such as a major business or technical requirement for features that would otherwise be unavailable, then the project team may wish to request a variance.
There are multiple economic and technical demands that motivate the enterprise to standardize the use of DBMS platforms to DB2 and SQL Server, which both strictly adhere to the ANSI standard. Developers having a preference for or familiarity with one brand over another would not be able to use that to provide sufficient justification to vary from the EDM Database Strategy. Since all relational databases use SQL, developers who need to know SQL should not be hindered by the deployment of one brand over another.
To name two examples, most developers are unaware of the fact that Oracle does not comply with the ANSI standards, and both developers and DBAs are often completely unaware of the resulting technical, financial, and business implications.

- DBMS version

- The choice of version is also determined by EDM in accordance with EDM Standards. In general, the version to be deployed is the highest version that EDM has validated as being stable and reliable. Newer versions may sometimes be requested by a development team, usually to take advantage of a new or enhanced feature only available in the newer version; depending upon the business and/or technical requirements, EDM may grant a variance to allow the use of DBMS software that data engineering has not yet verified as production ready.

- Backup window

- The backup window refers to a standard start and end time each day during which the database can be successfully backed up by momentarily preventing business activity.

- Environment and relative database sizing in gigabytes

- The standard number of database environments for supporting the entire SDLC is generally five, usually consisting of a Unit Test, Integration Test, User Acceptance Test, Production, and Production Maintenance environment. Since the names and uses of some of these environments may vary, the names can be changed.

- The relative size of each database environment is identified to allow varying database sizes, such as a much smaller unit test environment, or a much larger integration test environment for testing higher than expected system capacity.

- Applications requiring access and access type

- Certain databases will contain data that may meet the informational needs of more than a single application. As they become known, these applications should be identified along with the type of access (i.e., update or read-only) they require.

- Database use

- The possible uses for the database are OLTP/transactional, ODS/operational data store, OLAP/multidimensional, DW/data warehouse, or a combination of purposes. When the database captures data that originates in it, it is a transactional database (OLTP). If it captures a subset of data from across multiple other databases, then it is an ODS. If it calculates aggregates from detail data, then it is a multidimensional database (OLAP). If it generally consolidates all detail data from across many databases, then it is a data warehouse (DW). If it does some combination of the above, then it is a combination of database types.

- Inter-database dependencies

- Rather than replicating data that is not entirely stable in multiple locations, databases will occasionally provide views of data that is managed centrally in another database. This dependency may extend to or from other databases.

- Production failover and hot backup requirements

- Business critical applications need to identify their requirements for recovery.

- Firewall requirements

- Database servers must be accessible from the various ports supporting communications, and hence, it is important to note where the various interfacing applications and databases reside within our security architecture.

- Security requirements

- Databases can vary substantially in their security requirements, such as requiring access to encryption key files on other servers when using data encryption.

- Comments

- Additional notes or comments may be pertinent to the database instance, particularly when advanced business intelligence and analysis capabilities need to be incorporated into the database environment as add-ons.