An Elegant Architecture Where Information Flows - Information Management: Strategies for Gaining a Competitive Advantage with Data (2014)


Chapter 14. An Elegant Architecture Where Information Flows

Guidelines for absorbing the information in the book and putting it into practice.

Keywords

business intelligence; information architecture; leadership; information technology; information management maturity; big data; data warehouse; cloud computing; data virtualization; master data management; data warehouse appliance; analytics; cubes; multidimensional databases; data warehouse consolidation; data marts; data stream processing; NoSQL

If you’ve been reading from the beginning, by now I have reviewed with you all of the major data platforms, permutations, and workloads. Now it’s time to put it all together.

The Starting Point

On the operational side, where business transactions are forged, there has been less change in recent years than on the analytical side. Most organizations look at the line between operational and analytical systems as fairly hard and fast, and resources are allocated as such. Most operational systems are based on relational databases. The pattern that operational systems go through over time, especially the more common Enterprise Resource Planning (ERP) programs like SAP, is:

1. We provide all of the reporting you would ever need right out of the ERP system.

2. When concurrency issues happen with that database and customers begin to need to do extensive development of reports, a separate database—akin to a data mart or warehouse—is built, where you can also try to integrate other data.

3. Concurrency issues are obviated and some additional structures are commingled with the operational database; integration with other data is still challenging.

4. A balance is struck between operational/ERP reporting and post-operational data mart/warehouse reporting, with the latter being used primarily for analytical reporting, for form-fit reporting, and when data integration is necessary.

Though it’s hard to generalize, many information environments will be loosely centered around one, or a few, data warehouses. See Figure 14.1. These will feed data marts a subset of the data in the warehouse for more specific consumption profiles. There will also be numerous marts that sit in isolation. No environment is without these inelegant structures due to reasons of expediency and difficulty integrating with IT structures like the data warehouse.


FIGURE 14.1 Desired information architecture through 2013.

Many of today’s environments can have many multidimensional/cube structures. Like data marts, these can be fed from operational sources, data marts, or data warehouses. Similar to how marts are structured, analytical cubes should be fed from the data warehouse, where data is (theoretically) guaranteed to be “clean” (see Chapter 4) according to corporate governance and to otherwise conform to standard.

Where cubes have been introduced into an organization, they can expand their footprint, either in number or size. I have provided a Data Warehouse Rescue Service to many organizations over the years and one of the main reasons a rescue was needed is abuse of the cube idea. While organizations are largely moving forward into new areas regardless of the current situation, there are a couple of project areas that are well worth pursuing:

1. Data Warehouse Consolidation

2. Cube Consolidation

Plenty of Work to be Done

The architecture to move organizations into the next couple of decades is the No-Reference architecture. See Figure 14.2. It is labeled No-Reference to distinguish it from hard-and-fast references. It is a guideline as to where things belong relatively and what technologies are viable, but there are many permutations, especially when it comes to the data warehouse/marts/appliances and Hadoop section.


FIGURE 14.2 The No-Reference information architecture.

I have been giving Action Plans in each chapter. The summation of all of the action plans leads you to the No-Reference Architecture.

I have added many items to the Starting Point architecture. You will notice the new functions placed into both the “post-operational” (right-hand) side of the line and the operational side. There is no one-size-fits-all when it comes to optimizing the important information asset.

The hard line between an operational side and an analytical side to information management is blurring. Reporting, analytics, and integration are needed everywhere. The skills owned by the analytical-side people are now needed everywhere in the organization. As a result, Information Management is a discipline being led more by those with analytics, data quality, and integration skills than operational/ERP skills.

Master Data Management is worthwhile just about everywhere. The biggest problem with MDM is the long, strange justification cycle. Getting cooperation from different departments about value proposition is always a challenge. To keep Figure 14.2 from looking like spaghetti, it shows MDM feeding some of the other operational systems, but in reality MDM will eventually feed just about ALL of the systems in the No-Reference Architecture. This could be, as described in Chapter 7, feeding structures fit for working with that system, or it could be allowing access to the data in the master hub. Regardless, the data that gets IN to the MDM hub will be considered clean and governed, which may necessitate a workflow process with manual involvement from multiple constituencies—which is what the workflow icon in Figure 14.2 references.

Continuing around the architecture for its additions from the Starting Point, there is syndicated data. Syndicated data is not a new data store. It is a new class of data being brought into the architecture. I show it in the MDM hub, as discussed in the chapter on MDM.

In the Operational Applications and Users area, we are now doing more business intelligence in the form of Data Stream Processing, discussed in Chapter 8. High velocity data with real-time needs can be processed before it is optionally stored with Data Stream Processing. MDM data can accompany this processing.
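A minimal Python sketch of this idea follows, under assumptions that are not in the text: the event shape, the `MASTER_CUSTOMERS` lookup, the `is_urgent` rule, and the in-memory `stored` list are all hypothetical stand-ins. It shows events being acted on as they arrive, accompanied by master data, with storage as an optional afterthought.

```python
from typing import Iterable, Iterator

# Hypothetical master data, standing in for a lookup against an MDM hub.
MASTER_CUSTOMERS = {"c1": {"segment": "gold"}, "c2": {"segment": "standard"}}


def is_urgent(event: dict) -> bool:
    """Hypothetical real-time rule: large amounts from gold customers."""
    return event["segment"] == "gold" and event["amount"] > 1000


def process_stream(events: Iterable[dict], store: list,
                   keep_all: bool = False) -> Iterator[dict]:
    """Act on each event as it arrives; storing it is optional."""
    for event in events:
        master = MASTER_CUSTOMERS.get(event["customer_id"], {})
        # Enrich the in-flight event with master data before deciding anything.
        enriched = {**event, "segment": master.get("segment", "unknown")}
        urgent = is_urgent(enriched)
        if urgent:
            yield enriched  # the real-time action comes before any storage
        if keep_all or urgent:
            store.append(enriched)


stored: list = []
incoming = [
    {"customer_id": "c1", "amount": 2500},
    {"customer_id": "c2", "amount": 4000},
    {"customer_id": "c1", "amount": 50},
]
alerts = list(process_stream(incoming, stored))
```

With these sample events, only the first one raises an alert and only that one is kept; passing `keep_all=True` would store every event while leaving the alerts unchanged.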

Finally, there is the new significant data store of NoSQL, supporting operational big data needs, as described in Chapter 10. There is data integration and data virtualization all over the place in the architecture, even in the operational space.

Moving to the post-operational side, we have Hadoop for analytical data with potential varying structure (“unstructured”) that grows much more rapidly than the structured data in the data warehouse. We need to have the business capability to deal with Hadoop’s data for this to be worthwhile. That will certainly be a gating factor to success.

Hadoop looks larger than the data warehouse, and it very well may be in terms of data size. In terms of importance, it very well may become a vital part of competitive advantage, but the data warehouse usually is quite important today and its advancement is assured.

There still may be cubes, although the performance advantages these structures were implemented for have been trumped over time. There will certainly be a data warehouse, although consolidation may mean only one, or fewer than before.

The data warehouse ecosystem within information architecture may include relational, row-based, “normal” database management systems, data warehouse appliances, a separate columnar database, in-memory databases, or utilization of in-memory capabilities. As much as we are moving into heterogeneity—a familiar concept of this book—market offerings are consolidating necessary features like in-memory and columnar capabilities to the point of mitigating some of the diversity of structure. Nonetheless, it will be important to implement these capabilities, wherever they may be, for the right workloads.

And, oh yes, all of the above could be in a cloud or clouds. Ultimately, no two architectures will appear the same with all of these variables. The key to architecture, and consequently to business success, is allocation of workloads to the right platforms and tying it all together meaningfully.
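To make workload allocation concrete, here is a rough Python sketch. The `Workload` fields, the rule order, and the platform labels are illustrative assumptions drawn loosely from the tour of the architecture above; they are not a prescribed decision procedure, and a real allocation would weigh many more factors, including whether any of these platforms sits in the cloud.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    operational: bool          # serves business transactions directly
    real_time: bool = False    # must be acted on before (or without) storage
    big_data: bool = False     # high volume or varying structure
    master_data: bool = False  # shared reference data such as customer or product


def suggest_platform(w: Workload) -> str:
    """Hypothetical first-match rules; the ordering itself is an assumption."""
    if w.master_data:
        return "MDM hub"
    if w.real_time:
        return "data stream processing"
    if w.operational:
        return "NoSQL" if w.big_data else "relational operational database"
    return "Hadoop" if w.big_data else "data warehouse ecosystem"


examples = [
    Workload("customer reference data", operational=True, master_data=True),
    Workload("fraud alerts", operational=True, real_time=True),
    Workload("clickstream analysis", operational=False, big_data=True),
    Workload("monthly sales reporting", operational=False),
]
suggestions = {w.name: suggest_platform(w) for w in examples}
```

The point of the sketch is only that allocation is a deliberate, per-workload decision; two organizations applying their own rules to the same workloads could reasonably end up with different architectures.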

Companies continually develop good architecture, but it usually happens in a series of “two steps forward, one step back” moves over time. Eventually, the criteria of architectural progress, defined in Chapter 1 as performance, fast performance, and scalability, are improved. The goal of this book is to accelerate your progress toward good architecture and eliminate the steps backward.


FIGURE 14.3 How eBay determines best fit of a workload to a large data platform.4

Information Management Maturity

With all this movement afoot and all the endless possibilities, I give you some conceptual information about what brings companies to the architectural table and how you need to progress with your maturity. Without architecture, you might as well skip to the next chapter on business intelligence and just keep expanding the data access layer instead of the data platform layer. That is a loser’s game. It reminds me of a passage in the book “Thinking, Fast and Slow,” which explains that we try hard just to do the expedient and not to think:

“System 1 provides the impressions that often turn into your beliefs, and is the source of the impulses that often become your choices and your actions. It offers a tacit interpretation of what happens to you and around you, linking the present with the recent past and with expectations about the near future. It contains the model of the world that instantly evaluates events as normal or surprising. It is the source of your rapid and often precise intuitive judgments. And it does most of this without your conscious awareness of its activities. System 1 is also, as we will see in the following chapters, the origin of many of the systematic errors in your intuitions.”1

While, obviously, adding BI to our information architecture is the necessary last step, it is fast thinking and perhaps not the best (slow) thinking. BI is the “tip of the iceberg” of information management. The more leverageable and important work can be found in the “back end” of data warehouses, Hadoop, NoSQL stores, master data management, data stream processing, data virtualization, and data warehouse appliances, with some strategically placed platforms and software in the cloud.

How Cisco will Improve its Information Management Maturity2

Year 0: Establish the vision, charter, goals, benefits, and roadmap.

Year 1: Establish oversight and execution teams; implement sustainable processes; identify success measures aligned with the business.

Year 2+: Execute and evolve operations; drive continuous improvement; achieve best-in-class recognition for data management practices.


2Source “Creating An Enterprise Data Strategy: Managing Data As A Corporate Asset”: http://docs.media.bitpipe.com/io_10x/io_100166/item_417254/Creating%20an%20Enterprise%20Data%20Strategy_final.pdf

However, to be sure, all the information management back end (the focus of this book) is ultimately about supporting the BI. BI is what the users crave. Information management software sales teams understand this and commonly sell BI to the users, leaving the detail of getting the data act (the back end) together for an unsuspecting customer.

This is not necessarily an unseemly tactic since many sales teams do not understand architecture or the real work involved just to use a product. Even the “packaged” solutions typically require 50%–300% more work in data integration, data quality, and setup than is claimed, due to the ultimate uniqueness of companies. Information architecture is essential. You cannot simply acquire products in the shop and “stack” your way to success.

As much as a consultant might be able to truly see the future of an organization’s needs, in the real world, it’s hard to fix problems that aren’t glaringly “in your face.” Therefore, we have stages of progress.

There are steps in the maturity progression and they will be followed, whether we like it or not. In my maturity workshops and assessments, I move organizations to the next level and lay out a plan to Leadership. I do not take a Reactive organization and move them to Leadership in one week. However, with knowledge, you can move very quickly through the steps to Leadership, spending as little as a month in each Stage.

Organization size and age are not barometers for information management maturity.

Reactive Stage

In this stage, everything is “bottom up” and it is about reacting to the needs of the day. New applications get new databases. There is little reusability from past work and little focus on software maintenance or much of anything after the initial implementation. While there may not be a focus on maintenance and support, the need for them exists. These necessities are treated as a surprise and crammed into the team’s workload.

More importantly, there is no focus on architecture and, thus, no reuse is built into the architecture. This usually manifests itself in the “spaghetti” flow of data and the limited to no reusability of any data extract. Those who see no value proposition beyond the Reactive Stage will gloss over most of this book, except maybe to land on the next chapter on business intelligence topics.

Education is nonexistent and information management, as an extension of IT (which is usually an extension of something else), is completely off the radar as a necessary discipline. Phrases like “big data,” “columnar databases,” “master data management,” “analytics,” “data quality,” “cloud computing,” “self-service,” “documentation,” etc. are never heard in these organizations. Curiosity and creativity are low and the method for problem solving changes at a snail’s pace.

There may be no data warehouse, although there may be some structures that are replicas of source structure that receive some data to “off-load” reporting. There is limited integration in any data store, but if there is any done, it is seen as a “one time only” need that will not need to be repeated (“Who needs integrated data?” may be heard).

Management lobs simple-sounding requests at IT,3 but, in reality, without any informational foundation laid, these requests are hard. Often, IT will make a yeoman’s effort to do the “one-time” report. Word of this effort seldom gets back to the requesting management, furthering the incorrect impression that the request was easy and that the result is correct. I had one client who “covered up” the immense work these requests caused for years, until the executive raised some questions about the data and was irate to learn the process was not repeatable. He reflected on the decisions he had made with the data and determined that he had made wrong moves to the tune of several million dollars. Explain your work efforts and give management the chance to advocate change.

IT is a bottleneck in these organizations since the expedient “IT does it” mentality comes into play very quickly whenever anything remotely technical sounding is mentioned.

If you have a data warehouse, think about how the program started. Most likely, it was avoided until deep pain ensued.

This is a description of a worst-case scenario, obviously. These organizations clearly miss a lot of business value with their inattention to the important asset of information. They are also at a crossroads at which they are very susceptible to vendor product pitches (“cure-alls”) for “sudden” perilous information needs. These might be accepted without reference to forming any architecture or processes to support the product, or light bulbs might go off which indicate that the organization is very much in the information business and needs to treat information as an asset.

In order to progress to the Repeatable Stage of Information Management maturity, an organization will:

• Understand and plan for the support and maintenance requirements of information management systems

• Reuse one or more data extracts by using them for different data stores

• Schedule at least one extract rather than treating it as “one-off”
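The three criteria above can be read as a simple checklist. The following Python sketch is one hedged way to write that down; the class name, field names, and the "at least one" thresholds are my own illustrative assumptions, not a formal assessment instrument.

```python
from dataclasses import dataclass


@dataclass
class ReactiveAssessment:
    plans_for_support_and_maintenance: bool
    extracts_reused_across_stores: int  # extracts feeding more than one data store
    scheduled_extracts: int             # extracts run on a schedule, not one-off


def ready_for_repeatable(a: ReactiveAssessment) -> bool:
    """True only when all three criteria listed above are met."""
    return (
        a.plans_for_support_and_maintenance
        and a.extracts_reused_across_stores >= 1
        and a.scheduled_extracts >= 1
    )


still_reactive = ReactiveAssessment(True, extracts_reused_across_stores=0,
                                    scheduled_extracts=3)
progressing = ReactiveAssessment(True, extracts_reused_across_stores=1,
                                 scheduled_extracts=1)
```

Here `still_reactive` fails the check because no extract is reused, however many are scheduled, while `progressing` passes with the bare minimum on each criterion.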

Repeatable Stage

The feeds for those marts built for the one-offs that turned into repeatable needs might be systemized in the Repeatable Stage, a time when information begins to turn the corner and be seen as something that will need repeatable processes. Reports are recurring. Calculations are shared across multiple purposes.

Some variability in the business intelligence formats is being requested. IT is starting to get educated and speak up about the possibilities. “Analytics” and “data warehousing” may enter the vernacular and we may see a first-pass data warehouse: a copy of operational data onto a separate platform where concurrency with operations is not an issue and reporting can be pursued to the fullest. Data there may share a platform, but it is not integrated into the data model. For now, it’s just data silos sharing a platform. Integration and data model revisions (perhaps substantial) will come into play for the Productive Stage.

Often companies will hit a wall trying to add the second internal “customer” to the data warehouse because they did not architect the data warehouse initially with scalable principles in mind. It does not take longer to do it right. It does, however, take knowledge.

Reactive Stage tools may not be good enough anymore in the Repeatable Stage. Budgets begin to come in line with, although not quite completely to, the reality of the workload. A full life-cycle mentality that incorporates post-production and an iterative nature to information management may creep in.

An organization cannot progress beyond the Repeatable Stage unless IT shows some leadership (bringing the possibilities to bear) and non-IT (colloquially referred to as the “business”) brings its requirements to, and trusts, IT.

Ironically, it occasionally takes a screw-up like a minor security breach to bring organizations to the next stage.

Productive Stage

Now we come to a critical juncture. Information is recognized as critical in all projects. There is a data warehouse. There may be a burgeoning master data management project. Basic analytics (see Chapter 3) are calculated, shared, and incorporated into business processes. Database options like in-memory and columnar are turned on and used, maybe not to the fullest, but the path has been started.

One client, a large financial services firm, went through an exercise to determine how to utilize the latest features of their data warehouse DBMS, including in-memory and columnar. After implementing the changes, query times were reduced, on average, by 75%, some by 99%.
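It can help to restate those percentages as speedup multiples. The short Python sketch below is plain arithmetic on the quoted figures (the function name is mine, and it says nothing about how the client measured query times): a reduction of r percent in elapsed time corresponds to a speedup of 100 / (100 - r).

```python
def speedup_factor(reduction_pct: float) -> float:
    """Convert a percentage reduction in query time to a speedup multiple."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be at least 0 and below 100")
    return 100.0 / (100.0 - reduction_pct)


average_case = speedup_factor(75)  # 4.0, i.e. queries ran four times faster
best_case = speedup_factor(99)     # 100.0, i.e. one hundred times faster
```

Seen this way, the gap between the average and the best cases is much wider than the 75% and 99% figures suggest at first glance.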

Non-data warehouse analytical data storage options emerge, but not always for the right reasons. Still-rogue business departments doing their own thing and decentralized but uninformed decision making may have brought some data warehouse appliances into the shop. It will require adjustment and some workload maneuvering to get to the Leadership Stage.

Data virtualization may or may not be utilized, since the habit of data integration is the first resort. There may be some optimization to be done here.

Master data management is not in place, at least not formally. MDM is not treated as a separate discipline. “Master data” comes together in the data warehouse, the one-way street, and is not shared back with the operational environment. The value of MDM would be clear, but it takes organizations some time to be ready for the MDM message.

No data stream processing or graph databases, yet. These may be foreign concepts, even at the Productive Stage of information management. Syndicated data is brought in for a single application or two into data marts, but not into any leverageable data store.

Big data may be discussed, both in the context of supporting operations as well as in the context of post-operational analytical data. Whatever big data is stored is selectively (by date or limited scale) stored in the data warehouse. However, there may be some “double secret” downloads of big data software like Hadoop happening for experimentation purposes.

One way or another, either from organizational directive or information management initiative, some data storage is happening in the cloud—perhaps some marts with unique characteristics or some of those still-rogue business departments doing their own thing have made it happen. This bottom-up cloud strategy will morph into a more heads-up approach that works best for the organization in the Leadership Stage.

The Productive Stage is, well, productive, for information management. It would certainly clear ROI hurdles if ever measured. Critical hurdles have been crossed and the organization is set up nicely for the Leadership Stage. IT is finally staying slightly ahead of the business community, but not by much.

It will require immense leadership to get the organization to the exponentially valuable Leadership Stage. This book is geared toward those who want to bring their organizations this kind of value.

Leadership Stage

In the Leadership Stage, the lens of information is very clear. All information within the organization is captured because all information can be used. Outside, syndicated information is brought in to leverageable data stores like an MDM hub.

The MDM hub is now in place for key subject areas, taking the formerly immense burden of coming up with this data for each application off of each application and replacing it with an “API”5 approach. Best of all, the data is trusted.
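As a rough illustration of the “API” approach, the Python sketch below uses a hypothetical `MasterDataHub` class; its method names and the in-memory dictionaries are assumptions for illustration only and do not represent any MDM product. Records become visible to applications only after an approval step, and every application reads the same trusted record instead of assembling its own.

```python
class MasterDataHub:
    """Hypothetical hub: records are served only after governance approval."""

    def __init__(self) -> None:
        self._pending = {}
        self._approved = {}

    def submit(self, key: str, record: dict) -> None:
        """Queue a record for the (possibly manual) workflow step."""
        self._pending[key] = record

    def approve(self, key: str) -> None:
        """Stand-in for sign-off by the data's stewards."""
        self._approved[key] = self._pending.pop(key)

    def get(self, key: str) -> dict:
        """The call every application makes instead of keeping its own copy."""
        # Return a copy so that callers cannot alter the master record.
        return dict(self._approved[key])


hub = MasterDataHub()
hub.submit("cust-42", {"name": "Acme Corp"})
hub.approve("cust-42")
billing_view = hub.get("cust-42")
shipping_view = hub.get("cust-42")
```

Both applications receive identical data, and a record that has been submitted but not yet approved raises `KeyError` on lookup rather than leaking ungoverned data.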

Big data curiosity is replaced by big data projects. Big data solutions are necessary to capture all of the organization’s data and then serve the data operationally and serve it up to analytics. Analytics is embraced in the Leadership Stage and embedded into all applications and business processes.

One large telecommunications company was only able to find value in storing one month’s worth of call detail records in their data warehouse DBMS. By adopting Hadoop, they decided to store seven years’ worth, and found value in doing so.

Like MDM, Data Stream Processing is siphoning cycles and emphasis away from the data warehouse and toward a more real-time function. Speaking of the data warehouse, it is a mini-ecosystem now with elements of in-memory, columnar, and data temperature consideration spanning multiple databases.

In the Leadership Stage, no two companies’ architectures are going to be the same. Though most of the data stores in this book are necessary in every Leadership Stage company, some will be embraced to different degrees than others. There will be a diversity of data stores in every Leadership Stage company as a result of the agile pursuit of information, and business, leadership.

Though it’s never a primary focus, even in Leadership Stage companies, to “clean up” legacy messes, data warehouse consolidation will be done to reduce unnecessary and redundant stores and cube consolidation will be done to deemphasize these structures in favor of more targeted data stores.

A corporate cloud strategy, or at least an Information Management one, is in place; it is a hybrid strategy (Chapter 13) and the strategy is being realized. At decision points for information management software and hardware, the decisions consider the cloud strategy.

In addition, illustrating the importance of information to the bottom line, the CIO has a seat in the boardroom.

Finally, these companies will pursue their projects using agile strategies and organizational change management, and with collaboration and self-service approaches to business intelligence, which also incorporate mobile platforms. Read on to the last three chapters for more information on these leadership enablers.

Do the stage descriptions above sound too specific? Too descriptive? I’ve been in information management my whole career and have performed close to 100 Assessments in which I review an organization’s progress and get them to the next level, with a plan to become leaders in information and leaders in business. It is uncanny how closely maturity follows the paths I outline here. I anticipate this to be the case for many years to come.

Leadership

Leadership is the ability to impress, guide, and challenge people toward common goals. What I said about information management leadership in 2008 is true today:

“More demonstrated leadership would help secure funding, leverage vendors, and establish partnerships with the business that would uncover true business requirements. It would increase the confidence of information management personnel in leadership positions to make organizational suggestions for improvement, to bring in new technologies and processes and, importantly, to gain a greater depth of vision with which to provide ROI to the business. Leadership gives more leash to information management undertakings, allowing for some limited deferral for business gains. The switchover from non-credible, lights-on tactical information management to a credible source of potential business ROI happens with the maturity and success of the business, but it does not happen without demonstrated leadership.”6

I get concerned when my clients want to tightly hold the reins and fade into the wallpaper. That is a losing strategy. At the end of the day, information management leadership is all about personal leadership. Only through personal leadership can the progress be made to support a company to compete with information.

Action Plan

• Diagram your current information architecture

• Diagram a 3-year, currently desired future state information architecture

• Know which stage of maturity you are in

• Study what it takes to get to the next stage

• Develop specific action plans to achieve that stage; it will be uncannily consistent with improving the business

• Take a keen focus on leadership; your personal education should not be strictly technical

www.mcknightcg.com/bookch14


1Kahneman, Daniel (2011-10-25). Thinking, Fast and Slow (p. 58). Farrar, Straus and Giroux. Kindle Edition.

3Again, referring to those who do the function, not necessarily those in an organization with an “IT” title.

4“Singularity” is an eBay homegrown platform; ignore for our purposes.

5Application Programming Interface.

6http://www.information-management.com/issues/2007_48/10001363-1.html