Foreword - Information Management: Strategies for Gaining a Competitive Advantage with Data (2014)


Foreword

Andy Palmer

In 2014, it’s no longer a question of whether[1] to become an analytics-driven organization, but how.

William McKnight tells us how, beautifully, in this definitive book. It’s an easily understood, action-oriented guide for Information Managers who want to help their organizations compete on data analytics in an era of Big Data and rapid technological innovation.

Successful analytics-driven organizations will build information architectures that match analytics workloads in the context of key questions that people in their organizations need to answer in order to create value on the front lines of their business. The best architectures will be loosely coupled so that they can absorb rapidly changing technologies to meet competitive challenges and opportunities. Just as the one-size-fits-all database is dead,[2] the one-size-fits-all information architecture is dead.

I faced this reality personally while building out analytical informatics infrastructure as SVP and CIO at Infinity Pharmaceuticals and later as Global Head of Software and Data Engineering for Novartis Institutes for Biomedical Research (NIBR). It was also starkly clear when bringing new database and analytics products to market as a founder, advisor and investor in startups like Vertica, Data-Tamer, VoltDB and Cloudant.

This book provides an invaluable framework for making sane decisions about which technologies and approaches are right for you in building your information architecture. You’ll learn how to focus your resources on solving the problems that will have the biggest impact on your business this month, this quarter and this year. Competing on analytics is not about big multi-year projects: it’s about having an impact on decision-making every day, week and month. It’s also about ensuring that when you do have a significant strategic decision to make, you make it in the context of all the data and information available to your organization.

Everyone involved in Information Management can benefit from this book: from Information Architecture experts and business-process owners to IT pros to the C-Suite. It covers the changing role of traditional analytics architecture – e.g., the data warehouse and DBMSs – and the incorporation of new analytics architecture, from column-oriented approaches and NoSQL/Hadoop to rapidly evolving IT infrastructure like cloud, open source and mobile.

Ultimately, however, this book isn’t about technology or even about analytics. It’s about people and empowering them.

Analytics starts with questions, from real people at all levels of an organization. What are the 100 (or 200 or 500) questions that would create significant value – if the people in your organization could answer them with the support of all the data available in your organization at any given time?

The questions that need to be answered aren’t just the broad strategic questions that C-level execs talk about, but also the very tactical questions. In the past, most enterprises have focused on the former, because it cost too much and was a huge pain to extend analytics to anyone but business analysts or senior management. The technologies discussed in this book are making analytics practical for people throughout an enterprise. Democratizing analytics is a key trend that I see every day.

Great analytics provides the CONTEXT for all business people to create value throughout their day so they can make more-strategic decisions on tactical matters. Think about product support. Pretty prosaic stuff, right? Not really. When someone calls in for product support, what’s the value of knowing that caller represents a top 5% customer – or whether she’s even a customer? How do the support people know how to prioritize requests without analytic context?

The Information Manager’s job over the next 20 years is to provide analytical context for every employee in the company, so that he or she can make the best decisions about how to allocate his or her time and the company’s resources.

To the great information, experience and clear-thinking advice that William shares here, I’d like to add some personal observations.

• Always start with the questions. What are the questions that the people in your organization find most interesting and want to answer? Avoid data engineering projects that take quarters or years. Instead, embrace projects that are focused on collecting and answering very specific questions with high-quality data, using repeatable and sharable queries that interconnect data sources across the company and leverage both external (publicly available) and internal data.

• Segment your workloads! William makes a big point about this, and I totally agree. The simplest approach is to divide them into “read-oriented” and “write-oriented.” Then implement your infrastructure to ensure minimal latency between your read and write systems, and you’ll have something close to real-time analytics. Within “read-oriented” workloads, separate read access that requires longitudinal access (a small number of records and many or all columns/fields of data) from “data mining” access (a small number of columns across many or all records). This ensures that, under the covers, each query runs against a system designed to match its requirements. These two types of queries are orthogonal, and the most effective way to address them is to run each workload against a system designed for it.

• Remember the three key types of analytics: descriptive, predictive and prescriptive.

Descriptive: Reporting on historical data and trends

Predictive: Reporting on and exploration of the future (ranging from very short-term and tactical to very forward-looking and exploratory)

Prescriptive: Recommendations of actions based on descriptive and predictive analysis


Statistics matter for all analytics. But for predictive and prescriptive analytics, you can’t operate without significant statistical expertise and infrastructure. R and SAS are no longer good enough. You need next-generation tools and infrastructure, most of which are not yet available in commercial third-party products. So, start with descriptive and work your way up. For an interesting reference framework for an infrastructure spanning (or ready for) all three kinds of analytics, check out Mu Sigma.[3]

• Don’t trust product vendors to optimize for you. To minimize the number of lines of code in their systems and the cost of maintenance, product vendors usually force you into the design pattern of their product instead of setting you up with a competitive product that is better aligned with a given workload. Further, for obvious reasons, vendors don’t make it easy to integrate between products. This is one of the reasons for building a best-of-breed infrastructure (as William recommends) rather than one based on a single vendor. No vendor has it all, and almost all of them are radically biased towards one data engineering design pattern (row-oriented, column-oriented, document-oriented, viz-oriented or graph-oriented). Remember: one size doesn’t fit all!

• Plan for the number of data sources to be vast, and set up your analytics infrastructure accordingly. Data quality matters! The best approach is to control data quality at the point of data creation and to leverage all your data sources to assess and augment any one source. Yes, the ambiguity of your data sources is significant and broad – so much so that we’re going to need new ways of curating to improve and maintain data quality for any analytical use case. All data is valuable, but not all data is analytically relevant given the context of a specific question. This is why the collection and curation of a set of key analytical questions is so important: it helps you determine what data is analytically relevant to your organization, so you know where to invest your curation time and budget.

• Accept that you’re never done optimizing for performance. Achieving performance requires significant effort. You’ll need to integrate products from multiple vendors thoughtfully and iteratively over a long time period.

• Push for simplicity. Database appliances have given IT shops a taste of it. But I’m betting that more enterprises will realize that true simplicity comes from cloud-based, solution-oriented services such as DBaaS (database-as-a-service), rather than from bearing the cost and complexity of maintaining dedicated physical appliances.

• The cloud matters. Hosted, multi-tenant databases such as Cloudant and Dynamo are going to be the default choice for building new systems. Eventually enterprises will realize that leveraging hosted, multi-tenant and highly optimized infrastructure is radically more cost-efficient and effective than trying to replicate internally the expertise required to run high-performance database systems as a service.

• The future of master data management is automated data integration at scale. This means bottom-up development of integrated models of data and meta-data using machine learning techniques – similar to the logical evolution of data virtualization. Top-down models for information management such as “master data management” do not work. Modern analytics need more bottom-up data management and stewardship of data.

• Focus on the real issues, not the red herrings. Things like the NoSQL/SQL debate are just semantics, and trivialize the real struggle: who needs access to data and how are you going to get them that data? Most of your users don’t care if you’re using declarative languages such as SQL or not. Therefore, don’t allow your organization to get caught up in the nonsensical narratives fueled by industry press.
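The advice to separate longitudinal reads (one record, many fields) from data-mining reads (one field, many records) can be made concrete with a toy model in Python. This is a sketch under stated assumptions, not a description of Vertica or any other product: the class names, the sample customer records and the cost counters are all hypothetical, and "units touched" simply counts each layout's native contiguous unit (a whole row for the row store, a whole column for the column store).

```python
# Toy cost model (illustrative only, not a real storage engine) contrasting
# row-oriented and column-oriented layouts for two kinds of read workload.
# Hypothetical sample data: customer records with five fields each.
FIELDS = ["customer_id", "region", "segment", "lifetime_value", "open_tickets"]
ROWS = [
    (101, "east", "enterprise", 120000.0, 2),
    (102, "west", "smb", 8000.0, 0),
    (103, "east", "smb", 15000.0, 5),
    (104, "north", "enterprise", 95000.0, 1),
]


class RowStore:
    """Row-oriented: each record's fields sit together; the unit is a row."""

    def __init__(self, fields, rows):
        self.fields = list(fields)
        self.rows = [tuple(r) for r in rows]
        self.position = {r[0]: i for i, r in enumerate(self.rows)}  # key -> row slot

    def get_record(self, key):
        # Longitudinal access: touching one row yields every field.
        row = self.rows[self.position[key]]
        return dict(zip(self.fields, row)), {"values_read": len(row), "units_touched": 1}

    def column_sum(self, field):
        # Data-mining access: every whole row is touched just to use one field.
        i = self.fields.index(field)
        total = sum(row[i] for row in self.rows)
        values = sum(len(row) for row in self.rows)
        return total, {"values_read": values, "units_touched": len(self.rows)}


class ColumnStore:
    """Column-oriented: each field's values sit together; the unit is a column."""

    def __init__(self, fields, rows):
        self.fields = list(fields)
        self.columns = {f: [r[i] for r in rows] for i, f in enumerate(self.fields)}
        # Map each key (first field) to its position within every column.
        self.position = {k: i for i, k in enumerate(self.columns[self.fields[0]])}

    def get_record(self, key):
        # Longitudinal access: one value from each column, so every column is touched.
        i = self.position[key]
        record = {f: self.columns[f][i] for f in self.fields}
        return record, {"values_read": len(self.fields), "units_touched": len(self.fields)}

    def column_sum(self, field):
        # Data-mining access: only the one relevant column is touched.
        col = self.columns[field]
        return sum(col), {"values_read": len(col), "units_touched": 1}


if __name__ == "__main__":
    rows, cols = RowStore(FIELDS, ROWS), ColumnStore(FIELDS, ROWS)
    print(rows.get_record(103)[1], cols.get_record(103)[1])
    print(rows.column_sum("lifetime_value")[1], cols.column_sum("lifetime_value")[1])
```

In this model the row store answers the single-record lookup by touching one unit while the column store touches five, and for the single-column sum the column store reads 4 values against the row store's 20. Real engines add compression, indexes and caching, but this asymmetry is the reason to route each read workload to a system shaped for it.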

Competing on analytics requires a combination of great systems and empowered, motivated people who believe in their right to information and analytics for optimal, value-creating decisions. As William emphasizes in this book, it’s not either-or. It’s the seamless integration of systems and people that creates non-incremental value.

We need to empower business people at the point of decision-making with analytics that will help them create significant value for their companies – every single day. Information Management: Strategies for Gaining a Competitive Advantage with Data is your roadmap. Good luck!

Andy Palmer is Co-Founder of Data-Tamer and Founder of Koa Labs; previously he co-founded Vertica Systems with Michael Stonebraker, PhD. A technologist and serial entrepreneur, Andy has founded, backed or advised more than 40 start-ups over the last 20 years.


[1] Tom Davenport and Jeanne Harris made the case in their 2007 book Competing on Analytics: The New Science of Winning.

[2] See: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.9136&rep=rep1&type=pdf. One of the technologies described briefly in this paper and fully in a later paper, the C-Store column-store database, became the basis of Vertica Systems, the company that I co-founded with Michael Stonebraker in 2006.

[3] See http://en.wikipedia.org/wiki/Mu_Sigma_Inc.