
PART II
Best Practices

CHAPTER 10
Migrating to the Exadata Database Machine

Up until now, we’ve covered the features of the Oracle Exadata Database Machine and the best practices for using it. If you are creating and deploying brand-new databases on this platform, you would have all the information you need. But there are an enormous number of existing databases and applications that can benefit from the Exadata Database Machine, which means that you will have to perform a migration to this new platform.

This chapter discusses the best practices and methodologies for migrating from Oracle and non-Oracle relational database sources to the Exadata Database Machine. In general, the methodology and steps required for such migrations depend on the source database and version, the hardware platform, and the operating system. You should remember that the process for migrating to the Database Machine is no different from the process of migrating to a regular Oracle Database 11g Release 2 on a Linux x86-64-based or Solaris-based platform configured with RAC and ASM. There are no special steps in the migration process that are specific to either the Exadata Storage Servers or to the Database Machine.

Also, there are no special certifications or validation requirements for Oracle Database applications to run on the Database Machine. If the applications are validated or certified to work against a non-Exadata-based Oracle Database 11g Release 2 utilizing ASM as the storage, they will automatically work without any modifications against the database deployed on the Database Machine. The application will seamlessly be able to utilize the Exadata Storage Servers and reap the benefits of features like Exadata Smart Scans and storage indexes, out of the box and without any modifications. This benefit is achieved because the connections between Oracle Database 11g Release 2 and the Exadata Storage Servers are visible only to the database server processes and are transparent to the application.

When considering migrating Oracle Databases from non-Exadata platforms, the migration process will include a platform migration, and may also include a database upgrade based on the version of the source database. Often, multiple methods can be used to accomplish the migration, but you should evaluate each method and choose the one that fits your scenario and business requirements the best. Typically, the requirements for a migration are lower costs, quick implementation times, low manual conversion, and minimal downtime to the applications during the migration process.

On the other hand, migrating non-Oracle databases from any platform is generally perceived as complex and requiring substantial manual effort, which drives up migration costs and resource requirements. However, by following the best-practice methodologies listed in this chapter and automating many of the steps using proven processes and software tools, you can drastically reduce the costs and timelines of such migrations. Automation is the key to lowering migration costs, and it should be considered wherever the opportunity presents itself.

In addition, the power of the Oracle Exadata Database Machine means that some portions of a migration effort, such as loading data, can be accomplished in a much shortened time, given appropriate resource allocations.

Oracle platform migrations from Oracle and non-Oracle hardware platforms have been performed for decades and quite a bit of documentation is available on this topic, which should give you some comfort in dealing with these types of migrations. This chapter will help you define a migration strategy to the Database Machine from both Oracle and non-Oracle sources, and also compares and contrasts the different migration methods available, with the pros and cons of each.

NOTE

Migrating hierarchical databases like IMS-DB is not in the scope of this book. The steps involved in migrating non-relational databases to Oracle tend to be specific to the type of source and cannot be generalized in a short chapter.

Premigration Steps

Platform migration projects need proper planning in order to accurately estimate the resources, cost, and complexity of the migration, as well as the timelines for the effort. This exercise will set expectations and determine the perception of success on the project. Premigration steps also play a key role in choosing the right migration strategy, architecting the migration steps, determining the tools needed to perform the migration, and most importantly, estimating how much automation can be incorporated into the migration process.

The typical premigration steps are

- Discovering the current environment

- Database Machine capacity planning and sizing

- Choosing a migration strategy

NOTE

Most of the premigration steps discussed in this section are focused on migrating non-Oracle databases, since they are the most complex of migrations. Considerations specific to Oracle sources will be pointed out as relevant.

Discovering the Current Environment

Sessions focused on discovering the current source system are essential planning exercises in a migration project. The purpose of such sessions is to document as much information as possible about the source, which will help you plan an appropriate migration strategy. This information will also help estimate project costs, timelines, and resources.

The discovery sessions should be held with the subject matter experts and architects who can provide the level of detail needed, and the conversations should be focused on gathering an appropriate amount of detail so they don’t drag on. The information captured as a result of these sessions provides details related to systems and network architecture, performance metrics, applications architecture, database architecture, and the backup and recovery architecture. Hopefully, the information provided in this section will guide you in asking the right questions while keeping focused on the right topics.

Before we talk about the environment discovery phase, we will first define a few terms that will be used in the later discussions.

- Server platform: The server platform consists of the combination of the hardware and the operating system (OS). When we talk about migrating to a server platform, it usually means replacing the current hardware and OS with a different hardware and OS, for reasons such as outgrowing the current hardware capacity or implementing a technology refresh. In the context of database migrations, the platform migration could also involve a database software replacement or a version upgrade.

- Data model, stored objects, and database schema: The data model includes the data storage structures such as tables and summaries, database views, keys, constraints, and the data access structures such as the different types of indexes.

The database stored objects are mainly the procedures, functions, triggers, object types, and methods that are stored in the database. The stored objects incorporate some sort of processing logic and are usually implemented using database vendor-specific structured programming languages such as Oracle PL/SQL.

The database schema is the data model and the database stored objects, combined.

- Server-side scripts: Server-side scripts are the OS scripts or commands that interact with the database using SQL queries or with database-specific OS commands and utilities to perform certain functions. The scripts can be written using shell scripts, Perl, or batch programs, or may be stand-alone scripts invoking database commands. Server-side scripts need not necessarily reside only on the database server platform, but can also reside on other servers and connect to the database server remotely. A few examples are scripts that perform backups, data archiving, data loads, or data dumps, or scripts that simply run SQL commands.

- Surrounding applications: Surrounding applications are the applications that connect to the database servers from other hosts. These applications may be resident on application servers, reporting servers, or ETL (Extract, Transform, and Load) servers—basically any server with software components that initiate database connections and run SQL and other commands on the database servers.

- Source environment metrics: Metrics that are associated with a server platform are a measure of some property or state of a software or hardware component. Some examples of metrics associated with hardware are CPU specifications, total memory, and hard disk specifications.

Metrics related to performance of the current system play an important role in sizing the Database Machine. Some examples of performance metrics are the CPU utilization rates, disk I/Os per second, and disk MBs scanned per second. You need to utilize the current system’s built-in methods and tools for capturing performance metrics. The best practice for capturing performance metrics is to capture peak and average values, as both values play an important role in sizing the destination system.

Source code metrics are related to the software code that is associated with the database being migrated. The source code metrics of interest in this chapter would be the total number of lines of code (LOC) that have database-specific calls, and the complexity of the code categorized as low, medium, or high. The level of complexity will depend on the use of source database-specific features and functions, or features that do not have a one-to-one translation to the Oracle components.

Consider the following example for calculating source code metrics for a server-side script. Say, for example, part of the server-side script interacts with the database by creating a database connection and executes SQL statements that are specific to the database technology. The rest of the script has logic that processes the results of the SQL and does additional processing that is non-database related. For calculating the source code LOC metric, you can count, either exactly or approximately, the total lines of code of the script that have database-specific interactions. For calculating the source code complexity metric, you need to approximately determine the complexity of such calls and categorize the script as a whole to be of low, medium, or high complexity.

The source code metrics should be calculated for the surrounding applications, server-side scripts, database stored objects, and any other code that runs some processing logic on the database.

Discovering Current Systems Architecture

The architecture of the current systems helps in defining the appropriate Database Machine architecture that can support the same or better levels of SLAs, the disaster recovery requirements, and the network connectivity requirements to the external applications and systems. This information can be captured easily using the system administration tools and the OS-level monitoring commands of the current systems.

Capture the systems architecture by focusing on the following details:

- Server architecture, platform specifications, and hardware details.

- High availability (HA) and disaster recovery (DR) architecture of the current system, and application-level service-level agreements.

- Components used in DR and HA, including third-party proprietary technologies, active/passive or active/active HA configurations, and failover times.

- Network diagrams depicting applications, external storage, and systems connectivity.

- Network throughput, latency, and bandwidth requirements for all connections going in and out of the source systems.

Discovering the Non-Oracle Database Environment

When performing a migration of non-Oracle databases to the Database Machine, the majority of the work involved is in migrating the data, database schema, and the server-side scripts. Hence, it is important to discover what resides on the source platform and identify the components that will be migrated over. This discovery step will also identify the components that are candidates for automated migrations and the components that can be grouped together for phased migrations.

This step is focused on gathering source code and performance metrics that are related to the source database platform, database schema, and the server-side scripts. This information can be gathered from the source database monitoring tools, administration tools, or by using SQL scripts.

A few high-level pointers on capturing these metrics are

- Capture the source database versions, the OS platform, and the availability of the JDBC, ODBC, and native drivers for connecting into the database.

- Investigate the use of different types of database structures in the source database, the use of complex data types, and features that are specific to the source database technology.

- Investigate if database-specific access structures have been defined for speeding queries. For example, the Teradata database has the capability to create join indexes for speeding up queries that involve joins. Join indexes in Teradata can be translated to Oracle Materialized Views. If such straight translations to Oracle-equivalent objects exist, the objects will be candidates for automated migration. The objects that are proprietary to the source database technology and for which an equivalent object does not exist on Oracle will need to be manually analyzed and migrated.

- Document the use of procedures, functions, triggers, complex types, and methods. Also gather the source code LOC metrics and the complexity metrics that categorize each object as low, medium, or high.

- Analyze the use of non-ANSI SQL implementations for DML (Data Manipulation Language, or write operations) and queries. This usage makes the SQL hard to translate, so it is important to document such occurrences.

- Analyze the use of server-side scripts, their purpose, the scripting languages used, and all the source code-related metrics associated with them.

- Capture the performance metrics of the database server—the I/Os per second (IOPS) for short I/Os and the MBs per second (MBPS) for long I/Os.

- Calculate the disk space utilization of the current database. The space used by data, indexes, summary tables, and staging schemas, along with the projected growth requirements for the next three to five years, should be estimated. Also incorporate nonlinear growth requirements that arise in order to support ad hoc growth.

- Capture the database authentication requirements—mainly the administrator and user authentication and any requirement for LDAP-based authentication mechanisms.

- Capture the security authorization requirements of the database schema—mainly the database roles, users, and privileges.

IOPS and MBPS

IOPS (input/output operations per second) and MBPS (megabytes per second) are the metrics that measure performance of storage devices. IOPS measures the small I/O performance, which is a typical characteristic of OLTP workloads. The objective of measuring IOPS is to see how fast the storage is capable of writing (or reading) transactions that span a small number of disk blocks. IOPS can be measured for reads, writes, or mixed read/write transactions, with a typical 4K or 8K block size.

The MBPS metric measures the storage bandwidth utilization, which is the total number of bytes flowing through the storage device or the network per second. The MBPS metrics are associated with large I/Os that prevail in data warehousing and reporting environments. These environments have workloads that scan large volumes of data from the storage.

Discovering the Oracle Database Environment

Capturing performance and capacity information from Oracle Databases is easy when compared to non-Oracle databases. The Automatic Workload Repository (AWR) contains all performance-related metrics for the database instance, as well as the CPU utilization metrics of the database server. The AWR report should be run for snapshot intervals that cover peak system activity, and the results of these AWR reports will be used to calculate the current database performance numbers.

The metrics that you need to derive from the AWR reports are the storage performance metrics and the IOPS and MBPS the current storage is able to deliver. You will use these metrics later to calculate the right-sized Database Machine configuration that will provide an equivalent or better performance than the current source system.

In order to calculate these numbers from the AWR, locate the instance activity section and look up the “per second” column of the following metrics:

- Physical read total bytes

- Physical write total bytes

- Physical read total IO requests

- Physical write total IO requests

- Physical read total multiblock requests

- Physical write total multiblock requests

Since multiblock requests are associated with large I/Os, they need to be subtracted from the total I/O requests in order to get the IOPS for short I/Os. Use these formulas for calculating MBPS and IOPS (the byte-based statistics are reported in bytes per second, so divide by 1,048,576 to express the result in megabytes per second):

MBPS = [(physical read total bytes) + (physical write total bytes)] / 1,048,576

IOPS = (physical read total IO requests) - (physical read total multiblock requests) + (physical write total IO requests) - (physical write total multiblock requests)
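
If you would rather derive these figures directly from the AWR repository than read them off a report, a query along the lines of the following sketch can be used. This is a minimal illustration: it assumes access to the DBA_HIST views (which requires the Diagnostics Pack license), the statistic names are as they appear in Oracle Database 11g Release 2, and :begin_snap and :end_snap are placeholders for two snapshot IDs that bracket the peak interval.

-- Hedged sketch: compute MBPS and IOPS from the deltas of two AWR snapshots.
-- On a RAC source, add an instance_number filter or sum across instances.
WITH deltas AS (
  SELECT s.stat_name,
         MAX(CASE WHEN s.snap_id = :end_snap   THEN s.value END)
       - MAX(CASE WHEN s.snap_id = :begin_snap THEN s.value END) AS delta_value
  FROM   dba_hist_sysstat s
  WHERE  s.snap_id IN (:begin_snap, :end_snap)
  AND    s.stat_name IN ('physical read total bytes',
                         'physical write total bytes',
                         'physical read total IO requests',
                         'physical write total IO requests',
                         'physical read total multi block requests',
                         'physical write total multi block requests')
  GROUP  BY s.stat_name
),
elapsed AS (
  SELECT (CAST(MAX(end_interval_time) AS DATE)
        - CAST(MIN(end_interval_time) AS DATE)) * 86400 AS seconds
  FROM   dba_hist_snapshot
  WHERE  snap_id IN (:begin_snap, :end_snap)
)
SELECT ROUND(SUM(CASE WHEN d.stat_name LIKE '%total bytes'
                      THEN d.delta_value ELSE 0 END) / 1048576 / e.seconds, 1) AS mbps,
       ROUND(SUM(CASE WHEN d.stat_name LIKE '%total IO requests'
                      THEN d.delta_value
                      WHEN d.stat_name LIKE '%multi block requests'
                      THEN -d.delta_value ELSE 0 END) / e.seconds)             AS iops
FROM   deltas d, elapsed e
GROUP  BY e.seconds;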

In addition to these calculations, you should:

- Capture the database feature usage from the current system, mainly the use of partitioning, OLTP compression, SecureFiles compression, and the use of LOB segments, if any. If the current database does not use partitioning or compression options, then you have the opportunity to gain their benefits by introducing these features as part of the migration. Also, the best practice is to convert LOBs to SecureFiles storage. Some of the migration steps discussed later in this chapter will take care of these types of conversions, so you should decide whether you want to implement these best practices during the migration process, which will help narrow down the appropriate migration method.

- Calculate the metrics related to disk space utilization from the database instance. These metrics are easily available through Oracle Enterprise Manager or the database dictionary views (a sample set of dictionary queries follows this list). The metrics of interest are

  - The total space allocated for all the tablespaces and the percentage of this space utilized

  - Space utilized by compressed objects (if any)

  - Total SecureFiles storage space (if any)

  - Total LOB segment space (if any)

- Capture the source database platform specifications, mainly the details on the database versions, use of ASM, ASM allocation unit size (AU_SIZE), ASM diskgroup redundancy levels, and the database extent sizes. These values will help determine the right migration strategy in the later stages.
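
The following queries are a hedged sketch of how these space and platform figures can be pulled from the data dictionary of the source Oracle database; adjust the filters (for example, to exclude SYS-owned segments) to suit your environment.

-- Allocated versus used space per tablespace.
SELECT df.tablespace_name,
       ROUND(df.alloc_gb, 1)                      AS allocated_gb,
       ROUND(df.alloc_gb - NVL(fs.free_gb, 0), 1) AS used_gb
FROM  (SELECT tablespace_name, SUM(bytes) / 1073741824 AS alloc_gb
       FROM   dba_data_files GROUP BY tablespace_name) df,
      (SELECT tablespace_name, SUM(bytes) / 1073741824 AS free_gb
       FROM   dba_free_space GROUP BY tablespace_name) fs
WHERE  df.tablespace_name = fs.tablespace_name(+)
ORDER  BY used_gb DESC;

-- Space consumed by LOB segments (candidates for SecureFiles conversion).
SELECT ROUND(SUM(bytes) / 1073741824, 1) AS lob_gb
FROM   dba_segments
WHERE  segment_type LIKE 'LOB%';

-- ASM diskgroup redundancy and allocation unit size (when ASM is in use).
SELECT name, type AS redundancy, allocation_unit_size / 1048576 AS au_size_mb
FROM   v$asm_diskgroup;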

NOTE

The Oracle IO Numbers (ORION) tool can be used to calculate performance metrics of the source database storage subsystem. The metrics captured by ORION mimic the actual Oracle Database I/O performance, since ORION uses the same I/O libraries and kernel code as the Oracle Database software. The ORION tool does not need the Oracle Database software to function and can be installed on both Oracle and non-Oracle source systems, provided it is certified on the platform.

ETL, Data Loads, and Extracts Environment

Capture all ongoing data loads and extracts occurring on the source system, whether they are file-based or have other databases as direct sources or destinations. Capture the tools and processes used to perform these loads and extracts, whether they are packaged ETL tools or custom scripts. This information is needed to estimate the effort needed to refactor these processes to work against the Database Machine.

When capturing this information, you should:

- Identify the sources for the data loads and the destination systems for the extracts, and their connection methods into the database, whether through SQL*Net, JDBC, or ODBC.

- Identify any replication tools used either to feed into the source system or to replicate changes from the source to external systems. Investigate the current replication tool's support for Oracle Databases.

- Capture the source code metrics for the custom scripts used for loads and extracts. Document the tools or scripts used to perform the loads and extracts and whether they are proprietary to the third-party source technology.

- Capture the ETL tools used for loads, extracts, or ongoing ETL; their complexity; connectivity and driver requirements; source database-specific features used in ETL; and the tools' support for Oracle databases.

Surrounding Applications Discovery

It is critical to discover the applications that will be affected when the source database is migrated to the new platform. The number and types of applications affected and the degree of the impact will determine the applications migration strategy. Once the applications and their details are known, the level of effort involved in modifying these applications to work with the Database Machine can be estimated. To collect this information, you should:

- Capture the complexity of the applications connecting into the source system, whether they are using database- and vendor-specific technologies, and the support of these tools and applications for Oracle Database 11g Release 2. Focus on capturing source code metrics on the portions of code that comprise database-related logic.

- Capture the use of vendor-specific SQL functions or logic within the application code. The more complex the vendor-specific SQL being used in these applications, the harder the translation becomes.

- Document the drivers used for connectivity—native, JDBC, or ODBC, and their versions—and any specific requirements for running with particular drivers and versions.

- Identify the application servers connecting to the source database, whether they are based on Java or .NET, the application server connection pools and mechanisms used, and the Oracle Database 11g Release 2 support provided by the application servers.

- Capture the off-the-shelf reporting and business intelligence tools used, along with the tools' support for Oracle Database 11g Release 2 and any custom SQL statements used within or generated by the tools that are proprietary to the source database.

Backup and Recovery Architecture

Although you may end up using different tools and techniques for performing backup and recovery of the Database Machine, you have to capture the backup and recovery architecture, methods, and SLAs from the current environment to ensure that you do not lose any existing capabilities in this area. Whether the current strategy is to back up to tape directly or to disk or you are using a hybrid disk and tape strategy, the idea should be to utilize the existing policies and infrastructure when configuring backups in the Database Machine environment.

To collect this information, you should:

- Capture the recovery time objective (RTO) and recovery point objective (RPO) from the current SLAs.

- Consider the requirement to connect external tape drives or disk-based backup storage systems to the Database Machine. The Database Machine connects to the outside world using the built-in InfiniBand and Ethernet ports.

- Investigate if third-party snap mirroring technologies are used to perform backups and also to stand up test or development environments. These technologies will not work on the Database Machine, as the third-party vendors cannot directly access the Exadata Storage Servers and there is no support for hardware RAID. Therefore, it is important to have an alternative solution to the snap mirror if this capability is needed on the Database Machine.

- Remember that Oracle RMAN is the only tool capable of performing backups of the databases residing in the Database Machine. For further details on this topic, refer to the section on backup and recovery best practices in Chapter 6.

Database Machine Capacity Planning and Sizing

The topic of capacity planning is huge in itself, so we will focus on a few important considerations that will help size the Database Machine. The best practice for sizing systems is to size for performance and then for capacity. Sizing for capacity is easier when compared to sizing for performance and more commonly practiced in the real world. Most people do not go the extra mile to size for performance, and this oversight is a major cause of performance issues that arise in the field deployments.

The metrics of the current system obtained through the discovery process will provide the information needed to size the Database Machine. As mentioned earlier, it is quite essential to have the discovery phase done right, since all the activities, including the migration methods and sizing, are dependent on the metrics captured in this process.

The process of sizing a Database Machine is focused on sizing the Exadata Storage Servers and the database nodes for capacity and performance. Once you have the sizing metrics that you need, you will match them against Tables A-1 and A-2 from the appendix and come up with the flavor of the Database Machine that will satisfy the capacity and performance requirements.

Sizing for Capacity

The metrics of the current system that will help size the destination system for capacity are mainly database space utilization metrics. These metrics include data and index space utilization, the planned growth (linear), and unplanned growth (nonlinear) metrics. Determining unplanned or nonlinear growth is a bit tricky, since the Database Machine is a consolidation platform and the rate at which an enterprise chooses to consolidate multiple databases and applications to the new platform may be unknown. Once multiple lines of business (LOBs) start seeing benefits, the rate of consolidation to the Database Machine can grow rapidly.

Once the capacity data is obtained, you can use it to calculate the rack size of the Database Machine that satisfies the capacity requirements. The Database Machine comes prebundled with one of two flavors of hard drives, the 600GB High Performance SAS (HP-SAS) or the 2TB High Capacity SAS (HC-SAS), and by choosing one of these flavors of drives in a full-rack Database Machine, you can get a total raw capacity of 100TB or 336TB, respectively. However, this is the raw capacity; the advertised usable data capacity (uncompressed) of the full rack, after taking out ASM mirroring and other system overheads, comes down to 45TB for HP-SAS and 150TB for HC-SAS. Use Table A-1 to match the total space you need against the capacity of each configuration to come up with the Database Machine of the right size that satisfies the capacity requirements.

Since you cannot mix and match HP-SAS drives and HC-SAS drives in the same Database Machine, the decision on the flavor of drive to use should be based on performance specifications of the drives and not just capacity. We will discuss this topic in the next section.

When factoring OLTP compression into the space requirements, you might be able to get a twofold to threefold savings on storage for OLTP compressed data. Exadata Hybrid Columnar Compression (EHCC) query compression can potentially give about a 10-fold savings, and EHCC archive compression about a 15-fold savings. The actual savings are, of course, dependent on the repetitiveness of the data and the proximity of repetitive values in each compression unit (CU) of a table. It is a best practice to implement compression on the Database Machine, and if you would like to implement this best practice, consider compressing during the migration process; otherwise, compressing the data after it has been loaded will require a data reorganization operation on the designated tables, which may incur an additional outage.
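
If you choose to compress during the migration, the compression attributes can simply be declared on the target tables before the data is loaded, as in the minimal sketch below. The table definitions are placeholders; the EHCC clauses apply only to tables residing on Exadata storage, and OLTP compression requires the Advanced Compression Option.

-- Placeholder DDL illustrating the three compression choices on the target.
CREATE TABLE orders (                  -- actively updated OLTP data
  order_id   NUMBER,
  order_date DATE,
  status     VARCHAR2(20)
) COMPRESS FOR OLTP;

CREATE TABLE sales_history (           -- mostly queried warehouse data
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
) COMPRESS FOR QUERY HIGH;

CREATE TABLE sales_archive (           -- rarely accessed historical data
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
) COMPRESS FOR ARCHIVE HIGH;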

The DBMS_COMPRESSION package supplied with Oracle Database 11g Release 2 can be used to estimate the compression ratios achievable by the OLTP and EHCC methods prior to actually compressing the data. The package estimates compression ratios by sampling the data being compressed. For estimating compression ratios on database versions 9i through 11g Release 1, you can use the Oracle Advanced Compression Advisor package with its DBMS_COMPRESSION procedures. More detail on the Advanced Compression Advisor is available on the Oracle Technology Network.
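
The anonymous block below is a hedged sketch of invoking the advisor on an 11g Release 2 database to estimate the EHCC "query high" ratio for one table; the schema, table, and scratch tablespace names (SCOTT, SALES_HISTORY, SCRATCH_TS) are placeholders, and the parameter list should be verified against your exact database version.

SET SERVEROUTPUT ON
DECLARE
  l_blkcnt_cmp    PLS_INTEGER;
  l_blkcnt_uncmp  PLS_INTEGER;
  l_row_cmp       PLS_INTEGER;
  l_row_uncmp     PLS_INTEGER;
  l_cmp_ratio     NUMBER;
  l_comptype_str  VARCHAR2(100);
BEGIN
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    scratchtbsname => 'SCRATCH_TS',    -- scratch tablespace used for sampling
    ownname        => 'SCOTT',
    tabname        => 'SALES_HISTORY',
    partname       => NULL,
    comptype       => DBMS_COMPRESSION.COMP_FOR_QUERY_HIGH,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  DBMS_OUTPUT.PUT_LINE('Estimated ' || l_comptype_str ||
                       ' compression ratio: ' || ROUND(l_cmp_ratio, 1) || 'x');
END;
/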

Another way to reduce the storage footprint if you are using large objects is to take advantage of SecureFiles compression and deduplication, a feature available with the Oracle Advanced Compression Option. You should be able to get a two- to threefold compression for large objects (LOBs) that are stored using this feature. If you are using LOBs (non-SecureFiles based) in the current database, you should definitely consider converting them to SecureFiles. Deduplication, compression, and the other SecureFiles features deliver storage savings as well as performance improvements, since compressed objects require less I/O and internal improvements in SecureFiles also contribute to better performance.
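
The statements below are a hedged sketch of what such a conversion might look like; the table, column, and tablespace names are placeholders, and for large production tables an online redefinition (DBMS_REDEFINITION) is generally preferred over an offline move.

-- Convert an existing BasicFiles LOB column to compressed, deduplicated
-- SecureFiles storage (requires the Advanced Compression Option).
ALTER TABLE docs
  MOVE LOB (doc_body)
  STORE AS SECUREFILE (
    TABLESPACE lob_ts
    COMPRESS MEDIUM
    DEDUPLICATE
  );

-- New tables created during the migration can specify SecureFiles up front.
CREATE TABLE docs_new (
  doc_id   NUMBER PRIMARY KEY,
  doc_body CLOB
)
LOB (doc_body) STORE AS SECUREFILE (COMPRESS MEDIUM DEDUPLICATE);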

In order to estimate compression ratios on non-Oracle databases before actually migrating them over to Oracle, consider loading a sample of the source data into an Oracle Database 11g Release 2 and running the DBMS_COMPRESSION package. You do not need to load the complete table, but only enough to estimate the compression ratio.

Sizing for Performance

The performance metrics of the current system that help size the Database Machine are mainly the I/Os per second (IOPS) and the MBs per second (MBPS). The IOPS are typically used for sizing OLTP workloads, whereas the MBPS measurement is used for sizing data warehousing workloads. The Database Machine has sufficient CPU cores to handle all types of real-world workloads; however, if for any reason the application loads demand more cores than come preinstalled in the flavor of the machine, then the CPU sizing should also be taken into consideration. This might be the case with applications that perform extensive processing on the database using PL/SQL or server-side programs, or databases that have ongoing high-volume data loads coming in through the InfiniBand network. Data loads coming in through high-speed networks are generally known to bottleneck the CPU.

Keep in mind that when you size for all these metrics, you need to match up with the flavor of the Database Machine that will satisfy all four metrics we have discussed so far (not just one or two), namely the IOPS, MBPS, CPU, and storage requirements. For example, based on your IOPS, storage capacity, and CPU requirements, you might determine that a Database Machine X2-2 Half Rack is enough to cater to your needs. However, if you determine that in order to satisfy the MBPS requirements you need an X2-2 Full Rack, then the choice should be to go with the Full Rack and not the Half Rack.

Considerations for choosing a High Performance SAS- or a High Capacity SAS-based Database Machine will depend on the performance specifications of the drives and the storage requirements of the database. An HP-SAS drive is, for the most part, roughly twice as fast as an HC-SAS drive, although your mileage may vary based on the particulars of your workload. Remember that there is more to I/O performance than simply the seek times to retrieve data from disks.

Use Table A-2 from the appendix to match up your IOPS, MBPS, and CPUs to come up with the right-sized Database Machine to fit your performance requirements. The last step will be to compare the Database Machine configuration you select when sizing for capacity, with the configuration needed to achieve the required performance, and use the bigger machine of the two to provide you both the performance and the capacity that will be required from the Database Machine.

Exadata Flash Cache Considerations in Sizing

OLTP workloads have a concept of active data, or the working set, which basically is the portion of the data that will be actively accessed for the majority of the time, as opposed to the size of the entire data stored in the database. If this working set can be stored in some sort of cache or main memory, then the I/O performance will be extremely fast, since the data is accessed from the cache. Depending on the underlying technology used for the cache, the speed-up can be 20 to 100 times faster than fetching the same data from the hard disks.

As discussed in Chapters 3 and 5, the flash cards in the Exadata Storage Servers are configured as Flash Cache by default. The Exadata Flash Cache is utilized by a sophisticated, database-aware, LRU-like caching algorithm, which results in the most-used blocks staying pinned in the Flash Cache. The total uncompressed usable data capacity available in a full rack containing HP-SAS drives is about 45TB, and the total capacity of Flash Cache is 5.3TB, which translates to about 12 percent of the user data capacity. Based on this calculation, if the working set of an OLTP system is less than 12 percent, most of the working set can potentially be stored in Flash Cache. When the read I/O requests are targeted at the working set, the Exadata Storage Servers will perform the I/O using the Flash Cache rather than the hard disks. Exadata Flash Cache IOPS performance is 20 times faster than hard disks and MBPS performance is 2 times faster than hard disks, which enables the Exadata Storage Servers to deliver extreme improvements in performance when the data is accessed from the Flash Cache.

NOTE

The effective capacity of Flash Cache can be increased tremendously by using EHCC or OLTP compression. For example, an EHCC compressed table with a 10× compression factor will store about 53TB of uncompressed user data into 5.3TB of Flash Cache.

The benefits of Flash Cache can be factored in when sizing for performance. However, this is not an easy task, since the size of the working set is hard to calculate and you can only go with assumptions. A few studies suggest that the typical working set for OLTP systems is about 10 percent of the total data size, and if you go with this, the Exadata Flash Cache should be able to hold a typical working set quite comfortably.

Choosing a Migration Strategy

Once the right information is gathered from the current source systems in the discovery sessions, the appropriate migration strategy can be chosen. Typically in a situation dealing with large and complex database migrations, a single migration strategy may not be the best option, but rather a combination of multiple strategies that best suits the specific platform being migrated should be considered.

The following sections discuss a variety of database platform migration strategies that can be used in real-world migration projects.

Replatform with Refactoring Strategy

A replatform migration approach for databases is usually focused on replacing the existing database server platform with a bigger and faster platform like the Exadata Database Machine. This replacement is done while keeping the surrounding applications and their interfaces with the database system intact. In a replatform of non-Oracle databases to the Database Machine, the legacy database software will be replaced with the Oracle Database software.

When you replatform the database server from a non-Oracle source system, you need to modify the surrounding applications, the database stored objects, the server-side scripts, and the database schema in order to make them functional against the new environment. These modifications are called refactoring.

The refactoring strategy can approach the task at different levels. The level with the least amount of intervention would be to make the minimum number of changes to the application and the code in order to make them function just like they did before. The next level would be to make additional changes that will implement the best practices of the new environment. Another level would be to implement or fix any code inefficiencies so that the code is more optimized and performs with higher efficiency. Other higher levels of refactoring will start resembling the rearchitecting strategies, which are discussed in the next section.

Even when you are migrating from Oracle database sources, you might need some level of refactoring of the surrounding applications, the database schema, and the server-side scripts. The refactoring in this case will be to implement the best practices of the Database Machine and of Oracle Database 11g Release 2. Although implementing best practices is not a requirement, the process is generally well worth the effort, as it can fully unlock the potential of the Database Machine environment and deliver extra improvements in performance.

Estimating the costs of refactoring will depend on the lines of code (LOC) or the number of objects considered for refactoring, along with the complexity of the changes. Generally, the higher the LOC and degree of complexity of the code, the more resource consuming and complex the refactoring will be. When moving from non-Oracle systems, refactoring of the database schema can be performed using automated tools, which will be discussed in the migration steps. Refactoring of other objects will be mostly manual, and the tools will be of little or no help.

Application Rearchitecting Strategy

Evolution of software is quite natural, and over time, a software technology or application will evolve either for maintenance purposes or to meet additional business requirements. Rearchitecting involves a rewrite of the application and other processes to either make them more maintainable and optimized or to be in sync with the latest and greatest technologies. Sometimes, rearchitecting becomes a necessary evil when either the technology in use is outdated or out of support, or if the application was badly designed to begin with, resulting in poor performance and low reliability, or to conform to enterprise-wide IT standards that have evolved over time.

Rearchitecting strategies are more complex than refactoring and require a thorough analysis of the systems and the applications. With refactoring, you are simply reorganizing existing components, while the rearchitecting process allows you to redesign and implement entire portions of functionality, potentially including new capabilities in the product. Businesses usually undertake rearchitecting projects when there is a renewed direction from upper management to implement a specific tool or technology, or as part of a platform migration project. Migration projects provide an opportunity to modify legacy applications when the organization budgets the resources, time, and the expense.

The process of rearchitecting an application involves manual conversion and is similar to a brand-new implementation in many ways. Rearchitecting an application requires exhaustive and time-consuming test cycles and is the most complex migration strategy when compared with replatforming and refactoring. Rearchitecture projects do not necessarily provide a rapid payoff, but can have a long-term return on investment that is best determined by the business and the IT departments.

Consider an example of a rearchitecture scenario for migrating Teradata BTEQ scripts to the Oracle-equivalent components. Teradata BTEQ is a type of server-side script and is mostly used for performing ETL into the Teradata environment. The appropriate technology to use in Oracle to perform ETL is Oracle Data Integrator (ODI). ODI provides point-and-click interfaces for creating data flows and mappings rather than hand-coding the ETL logic, which is what BTEQ essentially does. The benefits of using ETL tools for performing data transformations (as opposed to using scripts) are manifold; some of these are improved productivity and maintainability, better manageability of processes, easier configurability of ETL, and flexibility in accommodating change. A rearchitecture of these server-side scripts might be a necessary component of a migration effort, but as you can see, it will pay off with added benefits for the short and long term.

Phased Migration Strategy

A phased migration strategy is essential in large-scale migrations and a necessity that ensures a seamless transition to the new environment. This strategy uses a divide-and-conquer approach in which a large system is divided into smaller groups and the groups are migrated one or more at a time in a phased manner, resulting in incrementally enabling certain functionality on the new system. You would group components and applications together based on their self-contained nature, the degree of interdependencies between groups and within groups, sharing of common database objects, and the degree of impact and risk to the business during the migration. The order in which phases are executed should reduce the overall impact and cost of the complete migration.

With large database implementations, sometimes the business continuity and availability requirements require parallel runs of the legacy and the new platforms for a time until the new platform is proven to be ready for production. During the parallel run period, the loads and the inputs will be dual-feeding the two systems, and full cycles of all processes need to run on both of the systems in parallel. This parallel-run strategy adds to the cost and complexity of the migration, but is sometimes a necessity when the business demands zero-downtime migrations and testing strategies require production-like loads.

Automated Migration Strategy

Automating the steps of the migration process using software-based migration tools is an important component for the overall migration strategy. Automation can help to bring down migration costs, and thereby help accelerate the return on investment of the Database Machine deployment. The steps that can be fully or partially automated in a refactoring methodology are the database schema migration steps and the data migration steps.

When automated migration tools are unable to convert the object to the equivalent target technology, then the tool will flag the object as fallout. The fallouts of automated migrations will have to be dealt with manually, which adds to the costs and the timelines. Hence, it is important to estimate the fallouts early in the planning phase. The best approach to accurately estimate fallouts would be to implement a proof of concept that would evaluate the migration tool against a subset of the source database environment and prove its effectiveness and capabilities against the schema that is specific to the source database system. The code complexity metrics captured during the environment discovery process are also useful in estimating automated conversion rates and fallouts at a high level.

If automated migration is the most efficient way to proceed, why not use automation in each and every migration project from the start? The answer depends on the migration tool's limitations regarding the types and versions of the source and destination databases and its capabilities for migrating the different types of components. For example, data models are the easiest components to automatically translate, and most tools can easily incorporate this translation. Stored objects and SQL are harder to migrate, and fewer tools will support such capability.

The Oracle tools that are useful in automating migrations are Oracle SQL Developer for database schema and data migrations; Oracle GoldenGate (OGG) for data replication; and Oracle Data Integrator (ODI) for data migration, transformations, and replication. We will discuss the capabilities of these tools in detail in the later sections.

Migration Steps for Non-Oracle Databases

As discussed earlier, the recommended approach for migrating non-Oracle systems to the Database Machine is the replatform-with-refactoring strategy. The typical steps required to implement this strategy are outlined here:

- Database schema migration

- Server-side scripts migration

- Data migration and synchronization

- Using automated tools for migration

Database Schema Migration

Migration of data models from non-Oracle sources can be automated using software tools, and this step is usually the easiest in the migration process. The standard data types, data storage structures, and data access structures existing in the source databases can easily be translated on a one-to-one basis to the Oracle equivalent objects. Challenges exist when the source database uses advanced types and objects that are proprietary to the source vendor technology. These objects are prone to fallouts since the tool might not be able to translate to an Oracle-equivalent object.

There are two scenarios for fallouts in a schema migration. The first scenario is when the fallouts happen because the respective feature or type is not named the same in both environments. To fix the fallout, you would need to manually convert these objects, but the conversion would still be one-to-one. An example of this scenario is the use of Teradata Join Indexes or DB2 Snapshot Views in the source systems. The equivalent technology in Oracle that provides the same functionality is the Oracle Materialized View object. If this translation is not programmed into the tool, the tool cannot perform the translation.
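
For instance, a Teradata join index that pre-joins two tables could be re-created manually as an Oracle materialized view along the lines of the following sketch. The table and column names are hypothetical, and the materialized view logs are what allow the view to be fast-refreshed on commit.

-- Materialized view logs with rowids enable fast refresh of the join view.
CREATE MATERIALIZED VIEW LOG ON customers WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON sales     WITH ROWID;

CREATE MATERIALIZED VIEW mv_sales_by_customer
  BUILD IMMEDIATE
  REFRESH FAST ON COMMIT
  ENABLE QUERY REWRITE
AS
SELECT c.rowid      AS customer_rid,
       s.rowid      AS sale_rid,
       c.customer_id,
       c.customer_name,
       s.sale_id,
       s.sale_date,
       s.amount
FROM   customers c, sales s
WHERE  s.customer_id = c.customer_id;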

The second scenario is when the technology or feature used on the source system does not have a direct translation on the Oracle platform. In such a case, you need to investigate 1) how much work is involved in refactoring the object or the code to accomplish the same function on the Oracle platform; and 2) the repercussions of not moving the object or the code over to Oracle. Depending on the outcome of this investigation, the work involved in fixing the fallouts can be estimated.

An example of the second scenario occurs when migrating a Teradata database to Oracle and encountering a Teradata hash index on the source, an object that does not have an exact equivalent on the Oracle platform. The nearest equivalent in Oracle for a hash index is the B-tree index. However, the approach in this case should be to abandon the hash index altogether in the migration. Once the migration is complete and you are testing the migrated platform, you can use the Oracle SQL Access Advisor (part of the Oracle Enterprise Manager Tuning Pack) to identify missing index structures that can be created on Oracle to improve performance.

Server-side Scripts Migration

Currently, the migration tools provided by Oracle do not support automated conversion of server-side scripts. You must convert them manually or investigate third-party tools that might be able to provide this functionality.

You should also consider rearchitecting the server-side scripts into a business logic layer that can be implemented using middleware tools or other best practices available for the tasks that the scripts perform. However, you need to evaluate the costs of a rearchitecture versus a manual conversion. Rearchitecting using ODI or other tools might be the way to go, especially when you consider long-term maintenance costs of custom scripts and the issues that arise for supporting manually written code.

The metrics captured in the discovery phases will determine the total lines of code of the server-side scripts and their complexity. This should give you a rough estimate of the rearchitecture or the manual conversion effort involved.

Data Migration and Synchronization

The source system requirements listed next help in determining the appropriate data migration strategy:

- The downtime admissible on the source database and the applications as per the current SLAs.

- The requirement to replicate data between the current source database and the Database Machine. This is required when the migration strategy dictates a phased migration of components or a parallel production environment on the Database Machine.

- The effective throughput of the network between the current source and the Database Machine.

- The total size of the source database that will be migrated to Oracle.

Based on these metrics and requirements, there can be two options for moving data to the Database Machine. The first option involves shutting down all the activity on the current source database, performing a data dump of the source database into flat files, and then loading the dump files into the Oracle database on the Database Machine. This method incurs downtime—how much will depend on how fast the data can move from the source to the destination system.

The second option is to replicate the data, in real time or in near real-time, from the current source database to the Oracle Database and essentially run a parallel production system on the Database Machine. The replication process can be performed using software tools or custom scripts. Migrating using this option incurs zero to minimal downtime, and it allows you to run parallel production environments, which are quite useful for testing and for phased migrations.

The migration strategy can also involve combining the two approaches, with each approach working on a different set of data, based on the method that works best for each set. However, both the options discussed previously have a common step—the initial data migration step. In the first migration option, the initial data migration would also be the final data migration. In the second option involving replication, the initial load will be the method used to instantiate the destination system with the initial copy of the source database before starting the incremental migration process.

Initial Data Migration

The initial data migration process will consist of two main steps. The first step is to extract the source database into a temporary staging area, and the second step is to load from the staging area to the Database Machine. In order to accelerate the overall data migration process, the extract and the load steps can be parallelized, as long as the steps work on different data sets and excessive parallelism does not throttle down the system or network resources.

In a real-world migration scenario, the flat file extracts from the source system tend to be the bottleneck and not the network throughput or the data loads. The bottleneck is mainly due to the performance limitation of the source database storage system, the CPU limitation on the source platform, or writes to the temporary staging location.

It is not easy to estimate the time it takes for the initial data load, since multiple components exist in the path of the data and each component will have different throughput. It is important to remember that the overall data transfer throughput is limited by the slowest link in the data flow. The best way to estimate data transfer times is to measure the actual performance of a test run on the subset of data, under similar network connectivity like it would be in production.

The initial data migration process will push large amounts of data over the network connecting the source and the destination systems, and in order to get the best possible throughput, it is important to factor in the network throughput. The best throughput possible is on the Database Machine side using the InfiniBand network. Refer to Chapter 4 for the different mechanisms available to connect the Database Machine to the outside world.

The best practice for extracting data from the source database is to utilize the data dump utilities that come bundled with the source database software installation. These utilities are the most efficient and supported methods to pull data out of the source system. The loads into Oracle should be performed using the best practices of data loading such as the external table load.
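
The fragment below is a minimal sketch of loading one such flat-file extract through an external table followed by a direct-path insert; the directory path, file names, table definitions, and delimiter are all placeholders.

-- Stage directory pointing at the location of the flat-file extracts.
CREATE DIRECTORY stage_dir AS '/stage/extracts';

-- External table describing the delimited extract files.
CREATE TABLE customers_ext (
  customer_id   NUMBER,
  customer_name VARCHAR2(100),
  created_date  VARCHAR2(10)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY stage_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('customers_01.dat', 'customers_02.dat')
)
PARALLEL 8
REJECT LIMIT UNLIMITED;

-- Direct-path, parallel load into the target table on the Database Machine.
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND */ INTO customers
SELECT customer_id,
       customer_name,
       TO_DATE(created_date, 'YYYY-MM-DD')
FROM   customers_ext;
COMMIT;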

Initial loads can be performed by using custom extract and load scripts, and by using automated tools such as Oracle Data Integrator, Oracle GoldenGate, or Oracle SQL Developer. We will discuss these products and their capabilities in detail in a later section.

Incremental Data Migration and Replication

Incremental data migration is the process of synchronizing the source and the destination systems by moving only the changed data (instead of the complete set) since the last synchronization process. When the incremental data migration happens in real time, it is called real-time replication.

The key requirement of the incremental data migration is the ability to capture changed data from the source system using an appropriate change data capture (CDC) strategy. Some of the CDC methods that can be used for capturing changes are

- Data model-based CDC: In order for the source database to support a data model-based CDC, the tables on the source database should be able to capture the timestamp of all the DML activity as it occurs on the source system. This is usually accomplished by using a column that stores a row modified (or inserted) timestamp for every row. The incremental extracts can use this timestamp to pull out only the changed rows since the last synchronization process. However, deletes will have to be captured in a separate log table using mechanisms such as delete triggers.

- Trigger-based CDC: Triggers can be defined on the source tables to execute when insert, update, or delete activity occurs, allowing them to take further action. The triggers can be programmed to capture changes in a separate log table, or even propagate the changes directly to the destination database using queuing-based mechanisms (a sketch of this approach follows this list).

- Log-based CDC: Almost all databases have some sort of journalizing capability. Journalizing is the process of capturing database transactions (or modifications) in a log file, which is stored in a format that is proprietary to the database vendor. Most database vendors provide an interface to these log files using an Application Programming Interface (API). By mining the log files using the API, it is possible to re-create the SQL as it occurred on the source database and then execute it against the destination database, at the same time preserving the transactional integrity of the DML. Real-time replication tools are based on the ability to mine database transaction logs in real time.
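
The following is a minimal sketch of the trigger-based approach referenced in the list above, written in Oracle syntax purely for illustration; on a non-Oracle source, the same idea would be expressed in that vendor's trigger dialect, and all of the names are placeholders. A real implementation would also need to address transaction ordering, error handling, and purging of the log table.

-- Change-log table and sequence used to record row-level changes.
CREATE SEQUENCE customers_chg_seq;

CREATE TABLE customers_chg_log (
  change_seq    NUMBER,
  change_type   VARCHAR2(1),          -- I = insert, U = update, D = delete
  change_time   TIMESTAMP,
  customer_id   NUMBER,
  customer_name VARCHAR2(100)
);

-- Row-level trigger that records every change made to the source table.
CREATE OR REPLACE TRIGGER customers_cdc_trg
AFTER INSERT OR UPDATE OR DELETE ON customers
FOR EACH ROW
BEGIN
  INSERT INTO customers_chg_log
  VALUES (customers_chg_seq.NEXTVAL,
          CASE WHEN INSERTING THEN 'I'
               WHEN UPDATING  THEN 'U'
               ELSE 'D' END,
          SYSTIMESTAMP,
          COALESCE(:NEW.customer_id,   :OLD.customer_id),
          COALESCE(:NEW.customer_name, :OLD.customer_name));
END;
/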

Based on the CDC strategy chosen, the methods that can be used for performing the incremental loads can be determined. Broadly speaking, the incremental load methods can be categorized into near real-time and real time.

- Near real-time incremental loads: When the CDC is performed by using the data model approach, then the incremental loads can, at best, be done in near real-time. Each run of the incremental load process will synchronize a batch of transactions that will be queued up to be processed. How "near" to real time the loads are is determined by the time it takes the incremental load to synchronize a batch without causing a measurable impact on the source system. The methods available for performing near real-time incremental loads are custom extract and load scripts and ETL tools like Oracle Data Integrator.

The custom scripts and ETL mappings created for performing initial loads can also be used to perform incremental loads by modifying the SQL to extract only the changed rows since the last run instead of the complete data (a sketch of such an extract follows this list).

- Real-time incremental loads: Using the log-based CDC approach, changes can be replicated as they occur on the source system in real time to the destination system. The Oracle tools that support real-time replication using the log-based CDC approach are Oracle GoldenGate and Oracle Data Integrator. Both tools can perform replication from heterogeneous sources to the Oracle database. In the next section, we will discuss the capabilities and guidelines that will help you choose the right tool.
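
The following is a minimal sketch of the data model-based incremental extract referenced in the near real-time bullet above; the ETL_CONTROL table, the LAST_MODIFIED column, and the staging table are placeholder names, and in practice you would capture the cutoff timestamp before the extract starts so that rows committed during the run are not missed.

-- Pull only the rows changed since the last successful synchronization.
INSERT INTO customers_stage
SELECT c.*
FROM   customers c
WHERE  c.last_modified > (SELECT last_sync_time
                          FROM   etl_control
                          WHERE  table_name = 'CUSTOMERS');

-- Record the new high-water mark once the batch has been applied.
UPDATE etl_control
SET    last_sync_time = SYSTIMESTAMP
WHERE  table_name = 'CUSTOMERS';
COMMIT;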

Using automated tools to perform real-time incremental loads is the most preferred method for performing real-time synchronization of data between the current source and the Database Machine. This method does not need the data model to support change data capture, as the tools are capable of capturing changes using built-in mechanisms. Also, the work involved in the setup and configuration of the tools is much less when compared to the custom scripts method.

Using Automated Tools for Migration

As mentioned previously, automating portions of the migration process can deliver a faster, smoother migration. This section will describe some of the tools that can be used for automating migration tasks.

Oracle SQL Developer

Oracle SQL Developer is a GUI-based tool that can be used for the purposes of database development against a variety of database sources. SQL Developer has support for many different SQL-based tasks, such as creating and browsing database objects, running SQL statements and scripts, and creating and debugging PL/SQL code. It has a data modeling component and provides the capability to migrate third-party databases to Oracle. SQL Developer also has the ability to create graphical reports based on the results of SQL statements. These features enable SQL Developer to perform a variety of tasks using point-and-click interfaces, which in turn will enhance user productivity.

In the context of migrations, Oracle SQL Developer has support for migrating non-Oracle database sources, including Microsoft SQL Server, Microsoft Access, IBM DB2, MySQL, Teradata, and Sybase. The tool allows you to perform automated migrations of the database schema and the data. SQL Developer supports data migrations by creating data movement scripts for offline migration, and can also migrate data online by connecting to the source and destination systems simultaneously and performing the data move in real time.

New capabilities and features are continuously added to SQL Developer, so before you plan to use it for a migration, verify its up-to-date support for your source databases and versions, especially with respect to stored object migrations.

SQL Developer for Schema Migration

Migration of database schemas from non-Oracle databases can be performed using Oracle SQL Developer. Schema objects such as the data models, stored procedures, and triggers can be seamlessly converted to the Oracle-equivalent objects by a simple press of a button. SQL Developer supports a variety of third-party database sources, and the specific source object types that can be migrated automatically by the tool should be verified prior to performing the migration.

Oracle SQL Developer uses a database schema as a repository in which it will store all the source database schema definitions, the converted schemas, migration reports, and conversion logs. The database schema migration process in SQL Developer consists of the following steps:

Image Capture the current model into the SQL Developer migration repository.

Image Translate the captured source database model to the Oracle-equivalent model.

Image Generate the Oracle-equivalent Data Definition Language (DDL) statements from the model.

SQL Developer has a schema capture mechanism that can capture the source database schemas, either directly by connecting to the source (online) or indirectly by the use of source database-specific capture scripts (offline). The online method will capture the schema by reading the data dictionary of the source database. The offline method will effectively do the same thing and can be used when you do not have the privileges or the network access to connect to the source system. SQL Developer can generate source-specific capture scripts, which can be shipped to the source system and executed using appropriate credentials. The output generated by these scripts can be shipped back and loaded into the SQL Developer repository to create the source model.

SQL Developer has built-in reports that display a summary of all the conversions it has performed, which it produces by querying the migration repository. These reports can be run at the end of the conversion process to evaluate whether there were any fallouts or warnings during the conversion. Also, at each step during the conversion, the fallouts and warnings will be displayed in the SQL Developer window. This will help you monitor the progress of the conversion and make appropriate decisions when encountering issues.

SQL Developer has built-in intelligence to handle a few types of fallouts. When it finds an object type on the source database that is not named exactly the same in Oracle but functions similarly, it will translate it to the Oracle-equivalent object. A good example is DB2 snapshot views on a source DB2 database: the Oracle equivalent of DB2 snapshot views is Materialized Views, and SQL Developer will automatically convert DB2 snapshots to Materialized Views.

Oracle SQL Developer for Data Migration

Oracle SQL Developer is capable of creating extract and load scripts that can be utilized to perform the initial data migration. The scripts generated by SQL Developer will incorporate the source-specific utilities for performing extracts and the Oracle-specific utilities for performing the loads.

If SQL Developer does not support the source database type or version, the data extract scripts need to be created manually. The time and effort to create the scripts need to be factored into the migration process and, for the most part, will be proportional to the number of source database objects being migrated. However, SQL Developer can still be used for creating the load scripts on the Oracle side, provided the format of the extracted data conforms to the SQL*Loader control files created by SQL Developer.

SQL Developer can also be used for moving data online between the database systems it supports. It does this by connecting to the databases using JDBC drivers, selecting the data out of the source, and inserting it into the destination. However, using SQL Developer for online data moves is not efficient for migrating large data sets.

Oracle GoldenGate

Oracle GoldenGate provides real-time capture, transformation, routing, and delivery of database transactions between Oracle and non-Oracle databases. GoldenGate facilitates high-performance, low-impact data movement in real time to a wide variety of databases and platforms while maintaining transaction integrity.

Oracle GoldenGate supports a variety of use-cases, including real-time business intelligence, query offloading, zero-downtime upgrades and migrations, disaster recovery, data synchronization, high availability, and active-active databases using bidirectional replication. In the context of migrations to the Database Machine, Oracle GoldenGate can be used to support the following use-cases:

Image Synchronization of parallel production or dual-active environments

Image Phased migration requirements in which portions of the database are made available on one or both systems, with single or bidirectional replication

Image Zero-downtime migrations from both Oracle and non-Oracle source systems, needed for applications with stringent availability SLAs

Image Instantiation of destination databases by performing initial loads from the source

Prior to starting a heterogeneous data replication project, verify GoldenGate’s support for the non-Oracle source system and the specific version you are interested in migrating. If GoldenGate is selected as the tool of choice, refer to the best practices for configuring GoldenGate for the Database Machine as discussed in Chapter 6.

Oracle Data Integrator

Oracle Data Integrator (ODI) is a data integration tool that integrates heterogeneous systems and platforms using a web services-based architecture. It also provides E-LT capability by utilizing the source or the destination platform to perform the transformations, rather than using a separate transformation engine. ODI supports a variety of sources and destinations, and connects to the database systems using the native drivers or JDBC/ODBC.

ODI has a framework of knowledge modules, which are plug-ins that encapsulate the best practices for loading, transforming, and integrating data for a specific source or target. Using knowledge modules, ODI is able to integrate with a variety of sources and utilize the source-specific and optimized techniques that are efficient for performing E-LT on the source platform. This feature is useful in the context of data migrations because it provides a highly efficient data transfer mechanism by merely using point-and-click interfaces, thus eliminating the use of custom extract and load scripts. Prebuilt knowledge modules are available for a variety of source systems that help you perform tasks such as data extraction, loading, and change data capture (CDC).

ODI is also useful for performing incremental data loads in near real-time and replication in real time between the source and destination databases. The CDC method used determines how close to real time the modified data can be propagated. Following are the mechanisms available with ODI for change data capture:

Image Change data capture using the source data model This method is appropriate if the tables involved in CDC have columns that store a timestamp of the changed row when it gets modified or inserted.

Image Change data capture using ODI Knowledge Modules This method uses techniques such as triggers or mining of database logs to capture changes.

Image Change data capture using ODI GoldenGate Knowledge Modules ODI and Oracle GoldenGate can be connected together to provide greater optimization and extensibility. Specifically, these Knowledge Modules will leverage the power of Oracle GoldenGate for its real-time, log-based CDC.

Once the changes are captured, ODI can use a publish-and-subscribe model for propagating changes to the destination database in real time or near real-time (based on the capture method). Verify the CDC Knowledge Module support for your non-Oracle sources and their specific versions before deciding to use ODI.

Third-party Tools and Services

Third-party tools on the market are capable of migrating database schemas, database stored objects, and server-side scripts from non-Oracle systems to Oracle. The effectiveness of these tools needs to be investigated thoroughly against your database environment. A tool should be evaluated for its support for the source database type, the object types it can automatically migrate, and its automated conversion rates.

You should remember that each database environment is different; the success and automated conversion rates these tools achieved in one migration cannot be generalized, since you might not get the same results in your environment. Advertised conversion rates and success stories may help as a guideline, but they cannot be taken as accurate estimates and applied as-is to your migration project. The best way to gain insight into the effectiveness of a tool is to perform a proof of concept with the vendor on a representative sample of your environment and observe the results.

There are third-party vendors that provide end-to-end migration services that encompass the use of automated tools along with manual migration services. These vendors utilize the best on-shore and off-shore implementation models (sometimes referred to as best shore) to considerably lower the cost of manual migrations. This book does not discuss third-party service providers or third-party migration tools.

Migration Steps for Oracle Databases

This section will highlight the important methods and steps currently in use for migrating Oracle Databases residing on non-Exadata platforms to the Oracle Database 11g Release 2 on the Database Machine. These methods will be discussed at a high level; for detailed step-by-step instructions, refer to the Oracle Database documentation and the My Oracle Support portal.

The considerations that determine the selection of the right method are the need to migrate with zero downtime, whether to implement best practices during the migration, whether to perform a full database migration or migrate a subset, the current database version, the OS version, and the database platform being migrated.

The migration methods can be grouped under two main categories—physical migration methods and logical migration methods.

NOTE

At the time of writing this book, the Solaris-based Database Machine configurations were not offered by Oracle. Hence, the steps for migrating Oracle databases to a Solaris-based platform are not discussed in this chapter. However, the migration methods available to you in such cases are similar to the ones highlighted in this section. But you need to consider the special requirements for the Oracle Solaris 11 Express target platform. Refer to the Oracle Support notes and the Oracle Database 11g Release 2 documentation for further details.

Migrating Using Physical Methods

Physical migration methods perform a block-for-block copy of the database blocks from the source system to the destination Database Machine platform. By definition, these methods are rigid and do not allow the data within the blocks to be modified during the migration process. Consequently, physical migration methods do not allow the implementation of Oracle Database best practices during the migration. If your strategy requires you to implement best practices as part of the migration, consider using the logical migration methods.

The key characteristics of physical methods are

Image Physical migration methods are generally faster than the logical migration methods, unless the logical method is moving far less data than the physical method.

Image Physical migrations move complete databases and not subsets, and these methods are not capable of performing a database upgrade during the migration. The only exception is the Transportable Tablespace method, which can migrate subsets at the tablespace level and can upgrade the database during the migration.

NOTE

It is recommended that you implement the Oracle Database best practices to unleash the full power of the Database Machine. The important best practices are an ASM allocation unit size of 4MB, a minimum database extent size of 4MB, using SecureFiles for LOBs, table partitioning, and database compression technologies such as OLTP compression and Exadata Hybrid Columnar Compression (EHCC).

Migrating Using ASM Redundancy

Using ASM, it is possible to add disks to an existing ASM diskgroup and ASM will automatically initiate a rebalance operation and redistribute the data blocks on the newly added disk. Similarly, when a disk is dropped from an ASM diskgroup, ASM will automatically re-create the data blocks that were on the dropped disk by copying them from the mirrored location (provided the ASM diskgroup is mirrored and enough space exists). These add and drop steps can be performed while the ASM and the database instances are online, without affecting the availability of data.

This capability of ASM can be used to migrate databases to the Database Machine. The migration process starts by adding grid disks on the Exadata Storage Servers to the existing ASM diskgroups configured on the non-Exadata storage. You would then drop the disks on the non-Exadata storage and initiate a rebalance operation, which re-creates the blocks that were originally on the dropped disks (using their mirrored copies) on the Exadata Storage Servers. These steps are repeated until all diskgroups are migrated over to the Database Machine.
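As a minimal sketch of the add and drop steps (assuming a hypothetical diskgroup named DATA, Exadata grid disks matching the discovery pattern 'o/*/DATA*', and a legacy ASM disk named LEGACY_DISK1; your names and paths will differ):

-- Add the Exadata grid disks to the existing diskgroup; ASM rebalances automatically.
ALTER DISKGROUP data ADD DISK 'o/*/DATA*' REBALANCE POWER 4;

-- After the rebalance completes, drop the legacy disk; ASM re-creates its contents
-- on the Exadata Storage Servers from the mirrored copies.
ALTER DISKGROUP data DROP DISK legacy_disk1 REBALANCE POWER 4;

-- Monitor the progress of the rebalance operations.
SELECT group_number, operation, state, power, est_minutes
  FROM v$asm_operation;

A higher POWER value speeds up the rebalance at the cost of more I/O against the source storage.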

The considerations for using this method are

Image The source system platform should be based on Linux x86-64, and the database and ASM version should be the same as the Database Machine (i.e., 11.2).

Image The source system’s ASM diskgroups must be configured with normal or high redundancy.

Image Application impact can be controlled so that the rebalance proceeds with minimal overhead. This is adjusted with the POWER clause of the rebalance operation.

Image Since ASM on Exadata needs to access the legacy storage, the legacy storage must be exposed to the Database Machine using NFS or iSCSI. If moving from a previous version of the Database Machine, then you should connect the two machines using the InfiniBand Network or the 10 Gigabit Ethernet.

Image Although the migration itself incurs zero downtime, in reality there will be a brief outage to switch the database and ASM instances from the non-Exadata platform to the Database Machine.

Migrating Using Partition Roll-in and Roll-out

A partition roll-in and roll-out strategy deletes old table partitions from the database when they are no longer required (roll-out) and loads new data into newly added table partitions (roll-in). Using this strategy, you can load the new data into partitions that reside on the Exadata Storage Servers, while the old data on the legacy storage is deleted over time, based on the roll-out criteria. This method does not incur a data migration from the source to the destination database.

However, the main requirement for using this method is that the data model must be designed for the partition roll-in/roll-out strategy. This requirement may seem too stringent for most realistic migration use-cases, but it might be possible to use this method in conjunction with other methods, such as Data Pump, to move the portion of the data that does not fit the partition roll-in/roll-out rules. We will discuss the Data Pump method in later sections.
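As a minimal sketch of the roll-in and roll-out operations (assuming a hypothetical range-partitioned SALES table and a tablespace named TBS_EXADATA created on the Exadata grid disks; your partitioning scheme and names will differ):

-- Roll-in: new data lands in a partition created in a tablespace on Exadata storage.
ALTER TABLE sales ADD PARTITION sales_2011_q1
  VALUES LESS THAN (TO_DATE('01-APR-2011','DD-MON-YYYY'))
  TABLESPACE tbs_exadata;

-- Roll-out: a partition on the legacy storage is dropped once it ages out.
ALTER TABLE sales DROP PARTITION sales_2008_q1;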

The considerations for using this method are

Image The source system platform should be based on Linux x86-64 and the database should be 11.2.

Image The use of ASM on the source database is optional. The new table partitions will reside in the ASM on the Database Machine.

Image The legacy storage needs to be connected to the Database Machine using NFS or iSCSI.

Image Data migration is not needed in this option. There will be a slight downtime associated in order to switch the database and ASM instances from the non-Exadata platform to the Database Machine.

Image The source database data model should support partition roll-out and roll-in requirements as stated earlier.

Image Smart Scans across the new partitions residing on the Exadata Storage Servers and the old partitions residing on the legacy storage are not possible.

Image The legacy storage cannot be decommissioned quickly, and this could be an issue when justifying the purchase of the Database Machine.

Migrating Using Physical Standby Database

The Data Guard physical standby database is a block-for-block replica of the primary database. The Data Guard apply process is responsible for replicating the changes from the redo logs on the primary database to the physical standby database in real time, and it supports both synchronous and asynchronous mechanisms.

Physical standby databases can be used for performing database platform migrations. The method involves creating a Data Guard physical standby database of the current source database on the Database Machine and then performing a Data Guard switchover to the physical standby.
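The following is a minimal sketch of the final switchover in SQL*Plus, assuming the standby on the Database Machine is fully synchronized and the Data Guard broker is not in use; in practice, you would also verify SWITCHOVER_STATUS at each step and restart the instances as described in the Data Guard documentation.

-- On the current primary (the source system): confirm readiness, then convert it
-- to a physical standby.
SELECT switchover_status FROM v$database;
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;

-- On the standby running on the Database Machine: make it the new primary and open it.
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
ALTER DATABASE OPEN;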

The considerations for using this method are

Image The source system should be a Linux-based or Microsoft Windows-based platform running database version 11.1 or 11.2.

Image This method can incur less than a minute of downtime, which is required for performing the Data Guard switchover to the 11g Release 2 database on the Database Machine.

Image If the source database version is 11.1, then you need to perform an upgrade to 11.2, either at the source or at the destination. The upgrade process will incur additional downtime.

Image The network bandwidth and throughput between the Database Machine and the source system should be adequate to handle the peak transaction rates. Otherwise, the downtime will be greater than what was mentioned earlier.

Image Consider this method when migrating from the HP Oracle Database Machine. The source database can be upgraded to 11.2 after moving it to the Database Machine.

Migrating Using Transportable Tablespaces

A transportable tablespace can be used for moving (or copying) a database tablespace along with all the objects that reside in it from one database to another. Using this method, tablespaces can be migrated from Oracle Database versions 10.1 and later, residing on any platform, to the Oracle Database 11g Release 2 residing on the Database Machine. The major steps involved in this method are to modify the tablespaces on the source database to read-only mode, to copy the data files of the tablespaces from the source to the Database Machine, and last, to plug in the tablespaces into the destination 11g Release 2 database.
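A minimal sketch of the source-side SQL is shown here, assuming a hypothetical tablespace named USERS_DATA; the tablespace metadata itself is then exported with the Data Pump utility using its transportable tablespace options, and any endian conversion is handled with the RMAN CONVERT command described under "Platform Endianness" later in this section.

-- Verify that the tablespace set is self-contained before transporting it.
EXEC DBMS_TTS.TRANSPORT_SET_CHECK('USERS_DATA', TRUE);
SELECT * FROM transport_set_violations;

-- Place the tablespace in read-only mode before copying its data files and
-- exporting its metadata with Data Pump.
ALTER TABLESPACE users_data READ ONLY;

-- After the plug-in on the destination database is complete, return the
-- tablespace to read-write mode.
ALTER TABLESPACE users_data READ WRITE;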

The considerations for using this method are

Image The source system can be on any platform and OS, and the database version can be 10.1 or later.

Image This method allows you to upgrade from older versions of the database and to move between platforms of different endian formats.

Image This method supports partial database migrations at the tablespace level.

Image The time-consuming part of this process is the copy of the data files from the source to the Database Machine, and the time it takes for the file transfer will determine the downtime needed for the migration process. To minimize the downtime, consider connecting the source database through the InfiniBand or the 10 Gigabit Ethernet network.

Platform Endianness

Endianness is the way computer systems order the bytes of multibyte data. The endianness of a platform can be one of two types: little endian or big endian. Little-endian systems store the least significant byte at the lowest memory address, whereas big-endian systems store the most significant byte at the lowest address. The lowest memory address of multibyte data is the starting address of the data. Examples of little-endian systems are Linux and Windows; examples of big-endian systems are HP-UX and Sun SPARC.

When moving data between platforms of different endian formats, the data needs to be converted so that data integrity is preserved. This is not an issue with logical migrations, since the SQL layer performs the endian conversion. But in physical migrations, the data is migrated at the block level as-is, and an additional step is needed to perform the endian conversion. The RMAN CONVERT command can accomplish this conversion.
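Both the endian format of the current platform and the full list of supported platforms can be checked from the data dictionary; the following is a minimal sketch of such a check.

-- Endian format of the current (source) platform.
SELECT d.platform_name, tp.endian_format
  FROM v$database d
  JOIN v$transportable_platform tp
    ON tp.platform_name = d.platform_name;

-- All platforms supported for transport, with their endian formats.
SELECT platform_id, platform_name, endian_format
  FROM v$transportable_platform
 ORDER BY endian_format, platform_name;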

Migrating Using Transportable Database

Using the transportable database method, an entire database can be migrated from one platform to another as long as the source and destination platforms have the same endian format. The method involves transferring the data files to the Database Machine, performing the RMAN convert operation if there is a change in the OS, and running the transport script at the end, which opens the newly transported database.

Some key considerations of using the transportable database method are

Image The source database version needs to be 11.2 and the source platform should be little-endian based.

Image When the source is on a non-Linux-based, little-endian platform, an additional step of converting the database files to Linux format is involved. The RMAN CONVERT command is utilized to perform this operation.

Image If using the RMAN CONVERT command to perform file conversion, then consider converting the files on the Database Machine after the transfer is complete. The conversion will be much faster because of the high performance and throughput of the Exadata Storage Servers.

Image The downtime with this method is dependent on how fast the files can be transferred to the Database Machine, the time to run the RMAN CONVERT command, and the time to execute the transport script.

Migrating Using Logical Methods

Logical migration methods will extract the data out of the source database and load it into the destination database, using utilities that interface through the SQL layer. Logical methods are more flexible than the physical methods, but generally require longer downtimes to perform the migration.

The main benefit of using logical methods is that they allow you to implement the Database Machine best practices during the migration process. If you implement the best practices after the migration, as will be the case when you use physical methods, the data needs to be reorganized using a process that could require another system outage.
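For example, when you precreate the destination objects ahead of a logical migration, the best practices from the earlier note can be built directly into the DDL. The following is a minimal sketch using hypothetical names that combines range partitioning, EHCC compression, and SecureFiles LOB storage; the options appropriate for your workload may differ.

CREATE TABLE sales_history (
  sale_id     NUMBER,
  sale_date   DATE,
  customer_id NUMBER,
  notes       CLOB
)
COMPRESS FOR QUERY HIGH            -- Exadata Hybrid Columnar Compression
LOB (notes) STORE AS SECUREFILE    -- SecureFiles for LOB storage
PARTITION BY RANGE (sale_date) (
  PARTITION p2010 VALUES LESS THAN (TO_DATE('01-JAN-2011','DD-MON-YYYY')),
  PARTITION p2011 VALUES LESS THAN (TO_DATE('01-JAN-2012','DD-MON-YYYY'))
);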

The key characteristics of logical methods are

Image Logical methods enable migration of subsets of data and also allow you to implement the database best practices during the migration process.

Image Logical methods will allow you to perform a database upgrade during the migration process.

Image Logical migrations generally incur longer downtimes, with the exception of the GoldenGate and the Data Guard logical standby methods.

Migrating Using Logical Standby Database

The Oracle Data Guard logical standby database is a logical replica of the primary database, which is kept in sync using the Data Guard SQL Apply process. The SQL Apply process converts the transactions from the primary database redo logs to SQL statements, and the SQL is then applied on the logical standby database to replicate the transactions.

The logical standby database can have different physical characteristics than the primary database, and can even be on different versions. Using this capability, the source system can be migrated by creating a logical standby of the source on the Database Machine and then performing a Data Guard switchover to the logical standby database.

The considerations for using this method are

Image The source database can be 10.1.0.3 or later on a Linux x86-64 platform.

Image The outage time on the source database is the time it takes to perform a Data Guard switchover to the standby database.

Image This method allows you to perform an upgrade of the database during the migration process.

Image You can also use this method to upgrade from HP Oracle Database Machine to the Oracle Exadata Database Machine with minimal downtime.

Image The SQL Apply process does not support all data types and objects. If you have unsupported objects or data types in your source database, consider using methods such as Data Pump to migrate those objects. For a complete list of unsupported data types and objects, refer to the Oracle Data Guard documentation; you can also identify them with the query sketched after this list.

Image Since the SQL Apply process requires more work on the target system than the Redo Apply used for physical standbys, you must make sure that the target machine can keep up with the changes sent to it, although with an Exadata Database Machine as the target, the chances of a failure in this area are slight.
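To identify the objects that SQL Apply cannot maintain, you can query the source database data dictionary, as in this minimal sketch.

-- Tables containing data types that SQL Apply cannot replicate; migrate these
-- objects with another method, such as Data Pump.
SELECT DISTINCT owner, table_name
  FROM dba_logstdby_unsupported
 ORDER BY owner, table_name;

-- Tables without a primary key or unique index, which SQL Apply handles less efficiently.
SELECT owner, table_name, bad_column
  FROM dba_logstdby_not_unique;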

Migrating Using Oracle GoldenGate

Oracle GoldenGate can replicate data between two Oracle Databases running on different versions and platforms. This feature allows you to migrate and upgrade a prior version of Oracle Database on any platform to the Database Machine and keep the databases in sync for a time, and then perform a switchover once the destination is ready to go live.

The benefits of using Oracle GoldenGate for migration are

Image This method provides zero-downtime migrations. The only impact is that applications must reconnect to the Oracle Database on the Database Machine.

Image There are few restrictions on the source platform or database version: the source needs to be 10.1 or later, on any platform. Even non-Oracle sources are supported with GoldenGate.

Image The initial instantiation of the database on the Database Machine will happen while the source database is online and with minimal impact.

Image There is no performance impact on the source database, since GoldenGate asynchronously mines the redo logs for capturing changes from the source.

As mentioned earlier, GoldenGate can be used to run a parallel production environment on the Database Machine. This capability is especially valuable for databases that require a zero-downtime migration with minimal risk.

Migrating Using Oracle Data Pump

If you can afford some downtime for performing the migration, then consider the Data Pump method. Some highlights and benefits of using Data Pump are

Image This is a simple solution, but needs downtime. The amount of downtime depends on the data volume and the effective network bandwidth available between the source database and Exadata.

Image The source database needs to be 10.1 or later on any platform.

Image If Data Pump is used in network mode, there is a one-time data movement between the source and the destination, and there is no need for a temporary staging environment (see the sketch after this list).

Image If a temporary staging environment is needed, its capacity will depend on the amount of data being migrated.

Image Consider tuning the Data Pump jobs by using parallelism.

Image This method enables migration of subsets of data, including subsets of schemas and even portions of tables or partitions by using the QUERY clause.
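As a minimal sketch of a network-mode import driven from the Database Machine (assuming a hypothetical database link named SRC_LINK to the source and a schema named SALES), the DBMS_DATAPUMP PL/SQL API can be used as follows; the command-line impdp client with its NETWORK_LINK parameter performs the equivalent.

DECLARE
  h     NUMBER;
  state VARCHAR2(30);
BEGIN
  -- Open a schema-mode import job that pulls data directly over the database link.
  h := DBMS_DATAPUMP.OPEN(operation   => 'IMPORT',
                          job_mode    => 'SCHEMA',
                          remote_link => 'SRC_LINK');
  -- Restrict the job to the schema(s) being migrated.
  DBMS_DATAPUMP.METADATA_FILTER(h, 'SCHEMA_EXPR', 'IN (''SALES'')');
  -- Parallelize the job to shorten the outage window.
  DBMS_DATAPUMP.SET_PARALLEL(h, 8);
  DBMS_DATAPUMP.START_JOB(h);
  DBMS_DATAPUMP.WAIT_FOR_JOB(h, state);
END;
/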

Migrating Using Create Table as Select (CTAS) and Insert as Select (IAS)

This method is extremely flexible and supports all versions of Oracle Databases on all platforms. With this method, you first create a brand-new database on the Database Machine with all the best practices in place, create a database link to the existing Oracle database, and then start the data migration using Create Table as Select (CTAS) or Insert as Select (IAS) statements.
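The following is a minimal sketch, assuming a hypothetical database link named SRC_LINK and source tables named SALES and SALES_HISTORY; the CTAS form creates and loads a table in one step, while the IAS form performs a direct-path load into a precreated table (for example, one built with the best-practice DDL shown earlier in this chapter).

-- CTAS: create the table on the Database Machine and pull its data over the link.
CREATE TABLE sales AS
  SELECT * FROM sales@src_link;

-- IAS: direct-path insert into a precreated table; a WHERE clause can migrate a
-- subset, or split the load into several concurrent statements.
INSERT /*+ APPEND */ INTO sales_history
  SELECT * FROM sales_history@src_link
   WHERE sale_date >= DATE '2010-01-01';
COMMIT;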

Image This method supports migration from any Oracle database version on any platform.

Image It requires downtime, which for the most part depends on the time it takes to move the data through the network.

Image CTAS and IAS methods are single threaded and are not automatically parallelized since the database link is inherently serial when used in a statement. However, you can manually parallelize the migration process by initiating multiple CTAS or IAS statements. But keep in mind that the overall throughput is limited by the available network bandwidth between the source and the destination.

Image This method enables migration of subsets of data, including subsets of schemas and even portions of tables or partitions, by filtering with a WHERE clause in the SELECT statement.

Image Consider routing the database links over the InfiniBand or the 10 Gigabit Ethernet network to improve the data transfer rate.

Summary

In this chapter, you have learned about the different methodologies for migrating databases to the Exadata Database Machine. These methodologies have been proven successful in real-world migrations with complex requirements. The details in this chapter will help you navigate a large migration project and plan the approach that is most successful, most cost effective, and best suited to your environment.