2013-10-21

Online analytical processing (OLAP) is a mature technology. Let’s just go ahead and call it an old technology: for nigh on two decades – and, really, before that – business analysts have been working with cubes. A new generation of analysts is working with cubes today, actually, inasmuch as the tools that power business intelligence (BI) “discovery” are driven by OLAP engines. The user interfaces and the overall user experience are vastly improved in the visualization and discovery tools, but the basic building block of interactive BI, now as ever, is the cube.

Watch Mark Madsen in The Briefing Room with IBM Analytics’ Chris McPherson.

But: doesn’t big data change everything? More to the point, didn’t the Golden Age of OLAP coincide with the Age of Scarce Data? Viewed through the lens of big data and advanced analytics, isn’t OLAP a relic technology? No, it isn’t. In fact, there’s still plenty of innovation on the OLAP tip, as the recent success of vendors such as Tableau and QlikView demonstrates. Established players are by no means sitting still, either: consider IBM, which last year introduced a new “Dynamic Cubes” technology with version 10.2 of its Cognos BI platform – and which just recently shipped a revamped version of Dynamic Cubes with Cognos BI v10.2.1.

The simple fact is that OLAP is uniquely adapted to the requirements and vicissitudes of user-driven analysis. Even in an age of big data, OLAP-driven analysis – drilling up and drilling down; slicing and dicing; moving sideways and adding other bits of data; viewing information (e.g., sales and inventory information, which are typically sourced from different systems) in context – is still an extremely useful tool. So much so that some of the would-be Hadoop-based BI platforms – Platfora, for example – essentially implement an OLAP-on-Hadoop model.

That being said, OLAP – like most other decision support technologies – needs to be accelerated if it’s to support analysis at bigger scale. Processing, memory and storage performance have all increased over time, as have data volumes.

Over the same period, software has become a lot smarter about how and where it stores or caches data: in a growing number of cases, and particularly in the database space, software is able to exploit processor-specific optimizations and different levels of memory or “cache.”

You’ve got two to three levels of cache in your microprocessor, but you also have different levels of caching for I/O: on the storage controller or host bus adapter (HBA); on the physical storage device itself; on the network controller; and (increasingly) on the PCIe bus in the form of dedicated flash cache modules. All of these are instances of physical cache – over and above a computer’s main memory – that software can exploit if it intelligently optimizes for memory.

This ability to optimize for memory usage is one of the advantages QlikView has always touted with its in-memory engine – and it’s likewise one of the things IBM is trumpeting with its latest Cognos release. The claim is that in-memory OLAP technologies better support interactive analysis and reporting over bigger volumes of data.

There’s something to this. A big problem with BI is that its interaction model is very stilted: it makes us wait while it goes off and fetches data via SQL. In most cases, this means going out to a database, fetching some data, doing some calculations and – preferably within two to five seconds – bringing something back to display on our screen. This approach works reasonably well for reporting, but not so well for analysis, where the user’s train of thought is important.

When we start asking more complicated questions, or questions that build on prior answers, traditional SQL-driven analysis breaks down. It doesn’t break down because it’s technically impossible; it breaks down because it can’t accommodate the pace – the flow – at which human analysis happens.

OLAP-driven analysis also breaks down when it interrupts, disrupts or distracts the thought processes of the human analyst – when it undermines its own premise, namely interactivity. In-memory OLAP technologies and smart caching, particularly across multiple physical or virtual cubes, aim to support an interaction paradigm with response times of less than two to three seconds. That’s consonant with the way in which we think: a response in that range doesn’t break a train of thought, interrupt a flow or engender frustration.

Cognos Dynamic Cubes is one such example. It’s a memory-optimized OLAP technology that makes use of several different kinds of cache – e.g., it caches result sets and aggregates – but it also maintains a live connection to a source database. Unlike SAP’s (claimed) approach with HANA, however, IBM explicitly isn’t moving everything into memory. Instead, says IBM Cognos senior product manager Chris McPherson, Dynamic Cubes works by moving attribute and hierarchy data into memory and by extensively caching results in memory. The detail facts, however, may remain in the data warehouse.

It works like this: a user initiates a query, which Cognos attempts to answer from its own cache, if possible. If it can’t, it will a) post simple queries to the source database and b) store the responses it receives in its local in-memory caches. Over time, Dynamic Cubes will be able to answer a majority of queries from cache – without querying the source database at all.

IBM Cognos also offers an “Aggregate Advisor” tool that it says can be used to tweak or accelerate OLAP-powered applications. You’d use Aggregate Advisor to analyze your cube models or the workload for a particular application. The Advisor examines the queries that the application generates – What’s being executed in the database? What are the query response times? What constraints or bottlenecks, if any, are getting in the way? – and looks for ways to improve them by creating in-database or in-memory aggregates.
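The query flow McPherson describes is essentially a cache-aside pattern: check the local cache, fall back to the source database, then remember the answer. Here is a minimal Python sketch of that idea, with invented class and function names; it illustrates the general technique, not IBM’s actual implementation or API.

```python
from typing import Any, Callable, Dict, Hashable

class CubeQueryCache:
    """Cache-aside sketch: answer from the in-memory cache when possible,
    otherwise query the source database and remember the response."""

    def __init__(self, run_sql: Callable[[str], Any]):
        self._run_sql = run_sql                  # executes SQL against the warehouse
        self._results: Dict[Hashable, Any] = {}  # query key -> cached result set

    def answer(self, key: Hashable, sql: str) -> Any:
        # 1. Try to answer from the local in-memory cache.
        if key in self._results:
            return self._results[key]
        # 2. Cache miss: push a simple query down to the source database ...
        result = self._run_sql(sql)
        # 3. ... and keep the response so later queries avoid the round trip.
        self._results[key] = result
        return result
```

As the cache warms, the hit rate climbs, which is the behavior described above: most queries eventually get answered without touching the warehouse.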

The memory improvements in the latest release of Cognos attempt to make it easier to support analysis that spans multiple fact tables by using so-called “virtual cubes.” IBM Cognos defines a “dynamic cube” as a single fact table with any number of dimensions. It uses virtual cubes – combinations of cubes that share at least one dimension – to enable analysis across subject areas. From a user’s perspective, a virtual cube appears as just one cube – i.e., a single collection of measures and dimensions – even though, in the background, Cognos automatically routes queries to the appropriate dynamic cubes.
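To make the routing idea concrete, here is a small Python sketch of a “virtual cube” that presents two member cubes as one and forwards each request to whichever cube owns the requested measure. All names are invented for illustration; this is not the Cognos object model.

```python
from typing import Any, List

class DynamicCube:
    def __init__(self, name: str, measures: List[str], dimensions: List[str]):
        self.name = name
        self.measures = set(measures)
        self.dimensions = set(dimensions)

    def query(self, measure: str, by: List[str]) -> Any:
        # Placeholder: a real cube would aggregate its fact table here.
        return f"{self.name}: {measure} by {', '.join(by)}"

class VirtualCube:
    """Presents several cubes that share at least one dimension as a single cube."""

    def __init__(self, cubes: List[DynamicCube]):
        self.cubes = cubes
        # Dimensions common to every member cube: the shared join points.
        self.shared_dims = set.intersection(*(c.dimensions for c in cubes))

    def query(self, measure: str, by: List[str]) -> Any:
        # Route the request to the member cube that owns the measure.
        for cube in self.cubes:
            if measure in cube.measures:
                return cube.query(measure, by)
        raise KeyError(f"No member cube exposes measure {measure!r}")

# Example: sales and inventory come from different fact tables but share dimensions.
sales = DynamicCube("sales", ["revenue"], ["product", "store", "date"])
inventory = DynamicCube("inventory", ["on_hand"], ["product", "warehouse", "date"])
combined = VirtualCube([sales, inventory])
print(combined.query("revenue", by=["product", "date"]))
print(combined.query("on_hand", by=["product", "date"]))
```

From the user’s point of view, the combined object looks like one cube with both measures; the routing happens behind the scenes, which is the effect described above.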

OLAP technologies have changed a lot from their 1990s roots, when one defined a cube containing as much data as one could fit on a PC for one user with no other real options for performance improvement. The newest products on the market try to use memory more intelligently, keeping frequently accessed metrics, attributes or aggregates in memory and building dynamic caches of different types based on the usage patterns of users.
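One common way to build caches “based on the usage patterns of users” is a least-recently-used (LRU) policy: the cache keeps whatever analysts keep touching and evicts what they don’t. A minimal sketch, again illustrative rather than a description of any particular vendor’s implementation:

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional

class LRUAggregateCache:
    """Keeps the most recently used aggregates in memory, up to a fixed budget."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._entries:
            return None                        # miss: caller computes or fetches it
        self._entries.move_to_end(key)         # mark as recently used
        return self._entries[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
```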

About the Author: Mark, president of Third Nature, is a former CTO and CIO with experience working both in IT organizations and at vendors, including a stint at a company used as a Harvard Business School case study. Over the past decade, Mark has received awards for his work in data warehousing, business intelligence and data integration from the American Productivity & Quality Center, the Smithsonian Institution and TDWI. He is co-author of “Clickstream Data Warehousing” and lectures and writes about business intelligence, emerging technology and data integration.
