The OLAP Report

Performance matters

The OLAP Surveys hold some performance surprises

You can contact Nigel Pendse, the author of this section, by e-mail at NigelP@olapreport.com if you have any comments, observations or user experiences to add. Last updated on August 28, 2007.

When you’re choosing a BI product, there are many considerations. You’ll probably start with functionality — after all, it’s not worth considering a product unless it can do the things you need. Ease of use probably comes high on the list, too, if you expect business users to pick up the product without much training and support. If budgets are tight, price may be an issue, too. But how high on your list of priorities is query performance?

If you’re like the many thousands of people surveyed over the years in The OLAP Surveys, you’ll probably put functionality first, followed by ease of use, and then query performance:

Figure 1: These are the results from The OLAP Survey 6 in 2006, but the results have been much the same year after year. The selection reasons are color-coded into product and corporate categories.

This may seem perfectly sensible, but is this the right priority? The results from the same Surveys suggest not.


Figure 2: Trend of most serious problems reported

For example, projects where the product was selected on the basis of query performance have consistently delivered more business benefits than those that selected products based on functionality. In fact, a number of other criteria had a higher correlation with project success than did functionality. It seems that, contrary to widespread assumptions, functionality is not the best way to choose products.

Why is this? Probably because many of the features on buyers’ shopping lists aren’t really needed. If the product is mature and widely used, the chances are that all the key features needed in the real world are already included, and it’s not worth demanding additional features dreamed up by a selection committee with limited experience of this type of product. And if a requested feature really is missing, there may well be workarounds, but there is no way to disguise poor performance.

But this is not the only evidence: slow query performance has consistently been the most serious product-related problem reported, and for the last few years it has been the most frequently reported problem of any kind (see Figure 2). The figure shows that complaints about data quality and company politics have declined significantly over the years, while more and more people complain of poor performance. Indeed, slow queries are the only problem whose reported frequency has increased every single year. Conversely, few respondents reported missing functionality as a major problem.

This may seem strange: after all, hardware performance improves every year. Indeed, many people implementing BI solutions assume that even if the initial application is too slow, a hardware upgrade will take care of the problem. So, is performance actually getting worse, or are expectations increasing?

The most obvious theory might be that data volumes are increasing even faster than hardware performance. We’ve all heard of the exploding data volumes in data warehouses, so this does sound like the most likely reason for the rising complaint rate.

Once again, The OLAP Surveys can help answer the question, and the answer is very surprising (see Figure 3). The figure plots median (i.e., typical) input data volumes and median reported query times from thousands of sites from 2002 to 2006.

Note that these are the input data volumes of the largest BI applications reported, not the data warehouse as a whole. This is sensible, as the query performance will depend on the data volumes in a particular data mart or OLAP database, rather than the enterprise data warehouse.

The first surprise is that median data volumes have changed little over the years and are much lower than you might expect. We all hear about the headline-making giant data warehouses, but these are relatively rare: multi-terabyte BI applications account for less than five percent of the total. As Figure 3 shows, the median (‘typical’) BI application has about five gigabytes of input data, and this hasn’t changed significantly over the last few years.


Figure 3: Median input data volume and median query performance trends

The second surprise is the unexpectedly close correlation between median input data volumes and query times. As volumes fluctuate up and down each year, so does query time. Roughly speaking, you can expect a query time of around 1.5 seconds per gigabyte of input data.
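The 1.5-seconds-per-gigabyte rule of thumb lends itself to back-of-the-envelope arithmetic. The Python sketch below assumes that query time scales linearly with input volume at that industry-wide average; the names `SECONDS_PER_GB` and `estimate_query_seconds` are hypothetical, and the result is not a prediction for any particular product or architecture. Applied to the typical five-gigabyte application, it gives roughly 7.5 seconds per query, about 30 times the quarter-second response of a Web search.

```python
# Hypothetical helper illustrating the survey's rule of thumb.
# Assumption: median query time grows linearly at ~1.5 s per GB of input data.
SECONDS_PER_GB = 1.5


def estimate_query_seconds(input_gb: float, seconds_per_gb: float = SECONDS_PER_GB) -> float:
    """Return a rough, industry-average query time estimate in seconds."""
    if input_gb < 0:
        raise ValueError("input_gb must be non-negative")
    return input_gb * seconds_per_gb


if __name__ == "__main__":
    typical_gb = 5.0  # median ("typical") BI application input volume
    seconds = estimate_query_seconds(typical_gb)
    print(f"~{seconds:.1f} s per query")  # prints "~7.5 s per query"
    web_search_seconds = 0.25  # assumed quarter-second Web search response
    print(f"~{seconds / web_search_seconds:.0f}x slower")  # prints "~30x slower"
```

A single linear constant hides large differences between products, so it is useful only for sanity-checking whether a reported query time is far out of line with the survey average.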

Of course, this is an industry-wide average, and the actual performance of a particular application will depend on a host of other factors, notably the product architecture: MOLAP products are invariably faster than ROLAP products with the same data volumes. ROLAP performance also degrades more as data volumes increase, although ROLAP products are better able to handle very large applications.


But we now return to the original question: if query performance hasn’t changed much over the years, why is the complaint rate rising steadily? This must be because of rising expectations. We all benefit from steadily improving Internet speeds and faster computers, and as a result, other applications are getting steadily faster. One obvious example is Web search: any Google query seems to complete in under a quarter of a second, regardless of its complexity or whether it finds a single result or tens of thousands. With a fast connection, you probably see the result on-screen in well under a second.

Not surprisingly, if Google can search unimaginably large volumes of unstructured data, which grow every hour, and consistently respond in well under a second, it is increasingly unacceptable for BI queries on just a few gigabytes of structured data to be an order of magnitude slower. So BI query performance that was perfectly acceptable five years ago is now regarded as painfully slow.

Why does this matter? It’s not just a matter of user grumbles. For example, if users grow impatient with slow queries, they are more likely to print query results or copy them into local Excel spreadsheets, which they can recall more quickly and conveniently. But these copies can become outdated without the users realizing it.

If users are restricted to running live queries, they may simply do less analysis, either because they can’t afford the time or because they lose their train of thought while waiting. And the time spent waiting cuts into their productive working time.

So what’s the solution? There are several steps you can take:

This page is part of the free content of The OLAP Report, which represents less than a tenth of the information available to subscribers. You can register to access a free preview of a small sample of the large volume of subscriber-only information.