
Oracle Database 12c DBA Handbook (2015)

PART
II

Database Management

CHAPTER
9

In-Memory Option

In Chapter 8, we talked about tuning methodologies and other ways to maximize throughput and minimize response time. One of the most useful and powerful new features of Oracle Database 12c, the In-Memory option (available starting with version 12.1.0.2), adds another tool you can use to make your queries run faster than ever before.

This chapter gives a high-level overview of the Oracle In-Memory option, what system requirements you need to use it effectively, and what it can and cannot do. I’ll give a few real-world examples of how In-Memory works and review the dynamic performance views you need to use to identify how well In-Memory is working in your environment.

Overview of Oracle In-Memory Option

The primary considerations for using the In-Memory option are licensing costs and how much memory you can allocate to the In-Memory column store. As of version 12.1.0.2, the licensing costs for In-Memory are comparable to those for RAC. Memory prices go down every year, but because the bottleneck moves from I/O to memory, your server's memory bus and memory speed become critical to the overall performance of In-Memory.

The In-Memory (IM) column store is allocated as part of the SGA; therefore, your total SGA size (SGA_MAX_SIZE) includes the memory you want to allocate to the IM column store. As the name implies, the IM column store contains one or more columns from a table stored in the SGA alongside the table’s row store in the buffer cache and on disk. Any changes to values in the IM column are kept in synch with the row store in the buffer cache and, of course, in the table’s datafile.

You don't have to enable the IM column store for an entire table: you can enable it for a single column, or you can enable it by default for an entire tablespace. For a partitioned table, one partition can reside in the IM column store while the rest of the partitions are stored only in the row store.

Using the IM column store gives the most benefit in these scenarios:

- Query aggregates
- Scanning a very large number of rows using operators such as =, <, >, and IN
- Frequently retrieving a very small number of columns from a table with a large number of columns
- Joins between a small table and a large table (for example, a data warehouse dimension table and fact table)

If you have the memory available, you can keep most columns of your biggest tables in the IM column store continuously; once a column resides in the IM column store, it is maintained alongside the row store on disk and is removed only if it is pushed out by a more frequently accessed column of another table or if the database instance is restarted. Using ALTER TABLE to disable the IM column store for a column immediately invalidates and flushes that column's contents from the SGA.
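For example, a statement along these lines evicts a column (or an entire table) from the IM column store immediately; the table and column names here are hypothetical:

ALTER TABLE order_detail NO INMEMORY (discount_pct);   -- drop one column from the column store
ALTER TABLE order_detail NO INMEMORY;                   -- or evict the whole table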

System Requirements and Setup

The requirements to use the IM column store are not much more than what you likely already have: you may need a bit more memory and perhaps another CPU, but your I/O requirements will be less demanding because you will almost certainly be doing less I/O. You may not even need more CPU threads, because the processing required to manage the IM column store is offset by the reduced I/O; as a result, your elapsed time, especially in a batch or data warehouse environment, will be significantly reduced.

The most important hardware component for implementing the IM column store is the additional memory required to hold the tables or table columns in memory. Memory bandwidth, not just the amount of memory, becomes much more critical because the throughput bottleneck has moved from the I/O subsystem to memory: you're still using the existing buffer cache to hold table rows (the row store) while maintaining some or all columns in the dedicated IM column store.

Do you need enough memory to hold all of your tables and all of their columns in memory? Not at all. Most likely only a small subset of columns in your biggest tables is accessed most often. In addition, any given table column can be stored in memory at one of several compression levels; which level you choose depends on the type and distribution of the data in the column as well as the CPU resources available. A primary key column will not compress much regardless of the compression type, but a column such as SERVICE_DATE or LOCATION_ID will compress nicely.

You won’t need an Exadata engineered system to effectively use the IM column store. A monolithic traditional server with lots of fast memory will actually give you some of the advantages of an Exadata system at a somewhat lower total cost of ownership (TCO).

If you already have an Exadata engineered system, you can certainly take advantage of the IM column store. While the Exadata storage subsystem leverages many of the algorithms used by the IM column store, having several tables’ columns in memory across nodes in an Exadata RAC environment still reduces the amount of I/O required by the storage subsystem. The best performing I/O is still the I/O that you don’t have to do!

In-Memory Case Study

Using the IM column store is easy as long as you understand its limitations and requirements. Setting the initialization parameters is straightforward, but you must also be aware of which initialization parameters cannot be changed while the instance is running and of the dependencies among those parameters. In this section I'll present the main initialization parameters you need to set and show you how to mark entire tablespaces, tables, or columns to leverage the IM column store.

Initialization Parameters

The key parameters for using the IM column store are INMEMORY_SIZE, SGA_TARGET, and SGA_MAX_SIZE. On a server with 384GB of RAM, you start by defining the total size of the SGA, including the IM column store, leaving room for the OS and the PGA. You don’t want to set the IM column store area too big since you still need space in the SGA for the shared pool and the standard buffer cache; you can still use SGA_TARGET in this scenario, but keep in mind that the target size must include the total size of the IM column store:

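For example, on that 384GB server you might use settings along these lines; the sizes here are illustrative, so adjust them for your own OS, PGA, and workload requirements:

-- Illustrative sizes only: leave room for the OS and the PGA.
ALTER SYSTEM SET sga_max_size = 250G SCOPE=SPFILE;
ALTER SYSTEM SET sga_target = 250G SCOPE=SPFILE;
-- The IM column store is carved out of the SGA, so SGA_TARGET must cover it
-- plus the shared pool and the standard buffer cache.
ALTER SYSTEM SET inmemory_size = 100G SCOPE=SPFILE;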

Restart the instance after setting these parameters. If the value for SGA_TARGET is too low or, worse yet, less than the value of INMEMORY_SIZE, your instance will not start and you’ll need to create a temporary text-based initialization file (PFILE) to get the instance started again!
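If that happens, a recovery sequence along these lines (the PFILE path is just an example) gets the instance back up:

-- From an idle instance, connected AS SYSDBA:
CREATE PFILE='/tmp/initfix.ora' FROM SPFILE;
-- Edit /tmp/initfix.ora so that SGA_TARGET comfortably exceeds INMEMORY_SIZE,
-- then start with the corrected PFILE and rebuild the SPFILE from it.
STARTUP PFILE='/tmp/initfix.ora'
CREATE SPFILE FROM PFILE='/tmp/initfix.ora';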

Since the parameters SGA_MAX_SIZE and INMEMORY_SIZE are static, you might need to control use of the IM column store in other ways until you can restart the instance. For example, you can set INMEMORY_QUERY to DISABLE either at the system level to turn off all use of the IM column store or at the session level to easily test query performance with and without the IM column store, even if the table columns in the query might currently be in the IM column store for other sessions.
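For example, assuming the default value of ENABLE, you can toggle the parameter like this:

ALTER SYSTEM SET inmemory_query = DISABLE;    -- turn off IM column store access instance-wide
ALTER SESSION SET inmemory_query = DISABLE;   -- or compare performance in a single session
ALTER SESSION SET inmemory_query = ENABLE;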

Marking Tablespaces, Tables, and Columns

Your other point of control for using the IM column store is, of course, marking tables and columns to use it. You can perform this operation while the instance is up and running. Keep in mind, though, that turning the IM column store attribute off and then back on for a column requires a repopulation operation, which can temporarily affect the performance of the instance because of the higher CPU usage during repopulation.

If you want all of your IM-enabled tables to reside in a single tablespace, you can mark the entire tablespace that way and automatically enable the IM column store for any table created in or moved to that tablespace by using the ALTER TABLESPACE command:

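A minimal form of that command, assuming a tablespace named EDW_DATA, looks like this:

ALTER TABLESPACE edw_data DEFAULT INMEMORY;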

The defaults that go along with the INMEMORY setting may be fine for you—but you also have the option to fine-tune how much compression should be used, whether the compression should favor SELECT statements or DML statements, and what priority this tablespace’s objects will have in the IM column store when there is not enough room to hold all the selected columns simultaneously:

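For example, a statement such as the following (again using the hypothetical EDW_DATA tablespace) compresses for query performance and populates this tablespace's objects ahead of lower-priority objects:

ALTER TABLESPACE edw_data
  DEFAULT INMEMORY MEMCOMPRESS FOR QUERY HIGH PRIORITY HIGH;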

You can always override the settings at the table level or column level even if the table resides in a tablespace whose default setting is INMEMORY. Marking the ORDER_PROC table for residency in the IM column store is as easy as this:

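Reconstructed from the description that follows, the statement is essentially this:

ALTER TABLE order_proc INMEMORY MEMCOMPRESS FOR QUERY HIGH;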

Specifying QUERY HIGH means that all columns of ORDER_PROC are stored with relatively high compression but still suitable for frequent SELECT query activity. However, knowing that the ORDER_PROC table has 215 columns and that you will likely not need to use all of those columns in most reports, you want to be more selective. If most of your big reports only need the ORDER_PROC_ID, PROC_CODE, and PANEL_PROC_ID columns, you can instead mark just those columns:

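One way to express this is to keep the table's INMEMORY attribute but exclude the columns you don't need; the excluded column names below are placeholders for the remaining 212 columns of ORDER_PROC:

ALTER TABLE order_proc INMEMORY MEMCOMPRESS FOR QUERY HIGH;
ALTER TABLE order_proc
  NO INMEMORY (order_comment, order_instructions);   -- list the other unneeded columns here as well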

If you need the flexibility, you can even mark some columns with a different compression level.
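For instance, a bulky, rarely filtered column might use capacity-oriented compression instead; the column name here is hypothetical:

ALTER TABLE order_proc INMEMORY MEMCOMPRESS FOR CAPACITY HIGH (proc_descr);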

Query Performance Before and After

As I mentioned earlier in this chapter, you may not have the memory to keep all desired columns in the IM column store. You might also need to keep memory contention in check by leaving some of the smaller tables' columns out of the IM column store and relying instead on the SGA buffer cache with traditional I/O. While some queries use the IM column store, the others may not need as much CPU at any given moment because they are bound by I/O waits; even a fast disk subsystem is usually not as fast as server memory. In this example, a daily report batch consisting of about 4000 reports consumes I/O at a rate of about 3 GBps, maxing out the throughput of the I/O subsystem. Figure 9-1 shows the I/O throughput and type for this report batch.


FIGURE 9-1. Report batch with IM column store inactive

After some analysis, you determine that a small number of tables, though large, account for most of the long-running queries. In addition, only about 20 columns across those big tables account for all columns referenced in the reports. Therefore, you mark those tables for INMEMORY and QUERY HIGH as in the example in the previous section. After taking a few minutes to construct the ALTER TABLE statements, you rerun the report batch to see the results of your new IM column store configuration. In Figure 9-2 you see that even your minimal level of analysis has yielded significant results:


FIGURE 9-2. Report batch with IM column store active

Not only is the elapsed time reduced from about 20 minutes to 12 minutes, but the overall I/O consumption has also been cut by about two-thirds. The queries in the latter portion of the report batch reference many of the same columns as the queries at the beginning of the batch (although probably with different predicates and aggregations). With some additional analysis you might be able to reduce the elapsed time even more, but even at this stage, offloading that I/O to the IM column store has freed up bandwidth in the I/O subsystem for other database instances.

Execution Plans

As you might expect, the Oracle optimizer is aware of the execution costs of using column data from the IM column store versus retrieving the column from the buffer cache via traditional I/O against the row store on disk. In the execution plan in Figure 9-3, every column referenced in the query was marked for storage in the IM column store.


FIGURE 9-3. Execution plan for a query with IM column store columns

Most of the steps referencing tables confirm that the IM column store is being used:

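Illustratively, the relevant plan operations look something like the following fragment; the rows and cost columns are omitted, and the INMEMORY keyword in the operation name is the telltale sign that the column store is being used:

| TABLE ACCESS INMEMORY FULL | ORDER_PROC  |
| TABLE ACCESS INMEMORY FULL | PAT_ENC_HSP |
| TABLE ACCESS FULL          | PAT_ENC     |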

However, it appears that the table PAT_ENC is being accessed in the buffer cache using traditional I/O: the optimizer cost of a full table scan came out lower than that of using the copy in the IM column store. This could be the case for several reasons. For example, the columns in PAT_ENC could be marked for the IM column store as CAPACITY HIGH, meaning that the CPU cost of decompressing and scanning those columns in the IM column store was higher than that of retrieving the uncompressed columns that were already in the buffer cache.

You’ll also see cases where every column of every table in a query is in the IM column store as QUERY LOW but the optimizer still chooses to use one of the table’s indexes on disk. This costing decision makes sense if the number of rows retrieved is low—retrieving a couple of blocks of an index from disk will often take less time or resources than scanning an entire table’s column values in memory.

Data Dictionary Views

There are only a few dynamic performance views you’ll use to see the status of your IM column store: V$IM_SEGMENTS, V$INMEMORY_AREA, and, an old favorite, V$SGA.

V$IM_SEGMENTS

The V$IM_SEGMENTS view is clearly the most granular and important view you’ll use to monitor the status of your IM column store.

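A query along these lines, using a handful of the view's columns, shows each populated segment's on-disk size, its size in the column store, and the resulting compression ratio:

SELECT owner, segment_name, inmemory_compression,
       bytes,                                        -- on-disk size of the segment
       inmemory_size,                                -- size in the IM column store
       ROUND(bytes / NULLIF(inmemory_size, 0), 1) AS comp_ratio,
       populate_status
FROM   v$im_segments
ORDER  BY owner, segment_name;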

Note that even for columns marked as FOR QUERY LOW, the compression ratios are quite high; in the case of PAT_ENC_HSP, the compression ratio is approximately 12:1. The column’s compression (primarily due to repeating column values) reduces both the memory footprint for storing that column and the time it takes to scan all column values. This somewhat offsets the CPU cost of “uncompressing” the column values when using them in a query.

Other columns in V$IM_SEGMENTS show the status of the IM column store population process (POPULATE_STATUS), the type of segment in the IM column store (SEGMENT_TYPE: TABLE, TABLE PARTITION, or TABLE SUBPARTITION), and the segment's priority both for the population process and for retention in the IM column store (INMEMORY_PRIORITY).

V$INMEMORY_AREA

The view V$INMEMORY_AREA shows a high-level status of each pool in the IM column store. The two pools have a distinctly different purpose: the 1MB pool stores the actual column values in memory while the 64KB pool contains the metadata about the column values stored in the 1MB pool:

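For example:

SELECT pool, alloc_bytes, used_bytes, populate_status
FROM   v$inmemory_area;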

As you might expect, the 64KB pool is going to be much smaller and take up less memory than the 1MB pool as long as you’re storing columns from tables with millions of rows instead of only hundreds of rows.

V$SGA

The view V$SGA has the same rows as in previous releases of Oracle Database along with the new row for the In-Memory Area:

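For example:

SELECT * FROM v$sga;
-- With INMEMORY_SIZE set, the output includes the usual Fixed Size, Variable
-- Size, Database Buffers, and Redo Buffers rows plus the In-Memory Area row.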

Summary

The In-Memory option, new in Oracle Database 12c (12.1.0.2), is one of the release's most useful and powerful features, second only to the multitenant architecture. You can speed up your queries by an order of magnitude or more by keeping some or all columns of a table in a special compressed format in a new area of the SGA known as the In-Memory column store. This column store works side by side with the traditional row store, which maintains tables in row format on disk and whose data blocks reside in the SGA buffer cache or a session's private PGA area. All of your applications work as before, and the IM column store stays in synch with any DML operations on the underlying table in the row store on disk.

The increased speed of queries using the IM column store and the significant reduction in I/O come at a price, however. You'll need more memory allocated to the SGA and potentially more CPU resources to perform the compression and decompression operations on the column store. These higher CPU and memory demands are typically offset by the reduction in I/O and the accompanying wait events, meaning not only that the elapsed time for any given query is much shorter, but also that you'll more easily meet your customers' SLAs. Alternatively, shorter execution times mean that you can run more queries in the same amount of time.
