Multidimensional data structures
You can contact Nigel Pendse, the author of this section, by e-mail on NigelP@olapreport.com if you have any comments or observations. Last updated on March 19, 2001.
Hypercubes
Multicubes
Which is better?
This is one of the older analyses in The OLAP Report, and it refers to many products that are no longer on the market. For historical accuracy, we have left in these references.
The simplest view of the data for a multidimensional application is that it exists in a large Cartesian space bounded by all the dimensions of the application. Some might call this a data space, emphasizing that it is a logical rather than a physical concept. While it would be perfectly possible to build a product that worked like this, with all the data held in a simple uncompressed array, no large-scale product could be as basic as this.
In fact, as shown in Figure 1, multidimensional data is always sparse, and often clumpy: that is, data cells are not distributed evenly through the multidimensional space. Pockets of data cells are clustered together in places and time periods when a business event has occurred.
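The gap between the logical data space and the populated cells can be sketched in a few lines of Python. This is only an illustration: the dimension members and values are invented, and a plain dictionary stands in for whatever compression scheme a real product would use.

```python
from itertools import product

# Hypothetical dimensions, invented for illustration only.
dimensions = {
    "product": ["P1", "P2", "P3", "P4"],
    "region": ["North", "South", "East", "West"],
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
}

# The logical data space is the Cartesian product of all dimension members.
data_space = list(product(*dimensions.values()))
total_cells = len(data_space)  # 4 * 4 * 6 cells

# Only cells where a business event occurred hold data. The data is clumpy:
# P1 sold in the North from Jan to Mar, P3 sold in the East in May and Jun.
populated = {
    ("P1", "North", "Jan"): 120.0,
    ("P1", "North", "Feb"): 135.0,
    ("P1", "North", "Mar"): 128.0,
    ("P3", "East", "May"): 40.0,
    ("P3", "East", "Jun"): 55.0,
}

# Fraction of the logical space that actually holds a value.
density = len(populated) / total_cells


def read_cell(coords):
    """Return the stored value, or None for an empty cell."""
    return populated.get(coords)
```

Here five cells out of 96 are populated, and they sit in two clusters rather than being spread evenly, which is the pattern the paragraph above describes.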
The designers of multidimensional products all know this, and they adopt a variety of technical strategies for dealing with this sparse, but clustered data. They then choose one of two principal ways of presenting this multidimensional data to the user. These are not just viewing options: they also determine how the data is processed, and, in particular, how many calculations are permitted.
Hypercubes

Some large-scale products do present a simple, single-cube logical structure, even though they use a different, more sophisticated, underlying model to compress the sparse data. They usually allow data values to be entered for every combination of dimension members, and all parts of the data space have identical dimensionality. In this report, we call this data structure a hypercube, without any suggestion that the dimensions are all equally sized or that the number of dimensions is limited to any particular value. We use this term in a specific way, and not as a general term for multidimensional structures with more than three dimensions. However, the use of the term does not mean that data is stored in any particular format, and it can apply equally to both multidimensional databases and relational OLAPs. It is also independent of the degree to which the product pre-calculates the data.
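A minimal sketch of a hypercube in this sense, assuming a sparse dictionary as the underlying store: the logical structure is a single cube in which every cell is addressed by one member from every dimension, while only populated cells take up space. The class and method names are hypothetical and do not correspond to any product's API.

```python
class Hypercube:
    """Single logical cube: every cell has the same dimensionality."""

    def __init__(self, dimensions):
        # dimensions: dict mapping dimension name -> list of members
        self.dimensions = dimensions
        self._cells = {}  # sparse store: only populated cells are kept

    def _check(self, coords):
        # A cell address needs exactly one valid member per dimension.
        if len(coords) != len(self.dimensions):
            raise KeyError("every cell needs one member per dimension")
        for member, (name, members) in zip(coords, self.dimensions.items()):
            if member not in members:
                raise KeyError(f"{member!r} is not a member of {name}")

    def set(self, coords, value):
        self._check(coords)
        self._cells[coords] = value

    def get(self, coords):
        self._check(coords)
        return self._cells.get(coords)

    def logical_size(self):
        # Number of addressable cells in the full data space.
        size = 1
        for members in self.dimensions.values():
            size *= len(members)
        return size

    def stored_size(self):
        # Number of cells that actually hold a value.
        return len(self._cells)


cube = Hypercube({
    "account": ["Sales", "Headcount"],
    "product": ["P1", "P2"],
    "month": ["Jan", "Feb", "Mar"],
})
cube.set(("Sales", "P1", "Jan"), 100.0)
# Headcount by product may be meaningless, but the cell is still addressable.
cube.set(("Headcount", "P2", "Feb"), 3)
```

In this example the cube exposes 12 addressable cells and stores two of them. The user sees one structure with identical dimensionality throughout, whatever the storage underneath looks like.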
Purveyors of hypercube products emphasize their greater simplicity for the end-user. Of course, the apparently simple hypercube is not how the data is usually stored and there are extensive technical strategies, discussed elsewhere in The OLAP Report, to manage the sparsity efficiently. Essbase (at least up to version 4.1) is an example of a modern product that used the hypercube approach. It is not surprising that Comshare chose to work with Arbor in using Essbase, as Comshare had adopted the hypercube approach in several previous generations of less sophisticated multidimensional products going back to System W in 1981. In turn, Thorn-EMI Computer Software, the company that Arbor's founders previously worked for, adopted the identical approach in FCS-Multi, so Essbase could be described as a third generation hypercube product, with logical (though not technical) roots in the System W family.
Many of the simpler products also use hypercubes and this is particularly true of the ROLAP applications that use a single fact table star schema: it behaves as a hypercube. In practice, therefore, most multidimensional query tools look at one hypercube at a time. Some examples of hypercube products include Essbase (and therefore Analyzer and Comshare/Geac Decision), Executive Viewer, CFO Vision, BI/Analyze (the former PaBLO) and PowerPlay.
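To illustrate why a single fact table behaves as a hypercube, the following sketch treats each fact row as one populated cell, addressed by one key per dimension. The table, column names and helper functions are hypothetical; a real ROLAP tool would generate SQL against the star schema rather than loop over rows in memory.

```python
from collections import defaultdict

# Hypothetical single fact table: one row per populated cell, with one
# dimension key per column and a single measure.
fact_sales = [
    {"product": "P1", "region": "North", "month": "Jan", "amount": 100.0},
    {"product": "P1", "region": "North", "month": "Feb", "amount": 110.0},
    {"product": "P2", "region": "South", "month": "Jan", "amount": 80.0},
    {"product": "P2", "region": "North", "month": "Feb", "amount": 70.0},
]


def slice_total(facts, measure, **fixed):
    """Sum a measure over all rows matching the fixed dimension members."""
    return sum(
        row[measure]
        for row in facts
        if all(row[dim] == member for dim, member in fixed.items())
    )


def rollup(facts, measure, by):
    """Aggregate a measure along one dimension (a simple GROUP BY)."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[by]] += row[measure]
    return dict(totals)
```

Every row carries the same set of dimension keys, so the whole table has identical dimensionality, which is the defining property of the hypercube as the term is used here.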
There is a variant of the hypercube that we call the fringed hypercube. This is a dense hypercube, with a small number of dimensions, to which additional analysis dimensions can be added for parts of the structure. The most obvious products in this report to have this structure are Hyperion Enterprise and Financial Management, CLIME and Comshare (now Geac) FDC.
Multicubes

The other, much more common, approach is what we call the multicube structure. In multicube products, the application designer segments the database into a set of multidimensional structures each of which is composed of a subset of the overall number of dimensions in the database. In this report, we call these smaller structures subcubes; the names used by the various products include variables (Express, Pilot and Acumate), structures (Holos), cubes (TM1 and Microsoft OLAP Services) and indicators (Media). They might be, for example, a set of variables or accounts, each dimensioned by just the dimensions that apply to that variable. Exponents of these systems emphasize their greater versatility and potentially greater efficiency (particularly with sparse data), dismissing hypercubes as simply a subset of their own approach. This dismissal is unfounded, as hypercube products also break up the database into subcubes under the surface, thus achieving many of the efficiencies of multicubes, without the user-visible complexities. The original example of the multicube approach was Express, but most of the newer products also use modern variants of this approach.
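A minimal sketch of the multicube idea, assuming three invented variables: each subcube is dimensioned only by the dimensions that apply to it. The class and names are hypothetical and are not modeled on any of the products listed above.

```python
class Subcube:
    """One structure that uses a subset of the database's dimensions."""

    def __init__(self, name, dim_names, all_dimensions):
        unknown = [d for d in dim_names if d not in all_dimensions]
        if unknown:
            raise ValueError(f"unknown dimensions: {unknown}")
        self.name = name
        self.dim_names = tuple(dim_names)
        self._all = all_dimensions
        self.cells = {}  # sparse store of populated cells

    def _key(self, coords):
        # A cell must be addressed by exactly this subcube's dimensions.
        if set(coords) != set(self.dim_names):
            raise KeyError(f"{self.name} is dimensioned by {self.dim_names}")
        return tuple(coords[d] for d in self.dim_names)

    def set(self, value, **coords):
        self.cells[self._key(coords)] = value

    def get(self, **coords):
        return self.cells.get(self._key(coords))

    def logical_size(self):
        size = 1
        for d in self.dim_names:
            size *= len(self._all[d])
        return size


dimensions = {
    "product": ["P1", "P2", "P3"],
    "region": ["North", "South"],
    "month": ["Jan", "Feb", "Mar", "Apr"],
}

database = {
    "sales": Subcube("sales", ["product", "region", "month"], dimensions),
    "price": Subcube("price", ["product", "month"], dimensions),
    "headcount": Subcube("headcount", ["region", "month"], dimensions),
}

database["sales"].set(100.0, product="P1", region="North", month="Jan")
database["price"].set(9.99, product="P1", month="Jan")
database["headcount"].set(12, region="North", month="Jan")

# Logical cells across the three subcubes: 24 + 12 + 8.
multicube_cells = sum(cube.logical_size() for cube in database.values())

# The same three variables in one hypercube need every dimension for every
# variable: 3 variables * (3 * 2 * 4) cells.
full_space = 1
for members in dimensions.values():
    full_space *= len(members)
hypercube_cells = len(database) * full_space
```

With these invented sizes the multicube exposes 44 logical cells against 72 for the equivalent hypercube. The difference is purely logical: as noted above, a hypercube product can still break the data up in a similar way under the surface.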
Because Express has always used multicubes, this can be regarded as the longest established OLAP structure, predating hypercubes by at least a decade. In fact, the multicube approach goes back even further, to the original multidimensional product, APL from the 1960s, which worked in precisely this way.
ROLAP products can also be multicubes if they can handle multiple base fact tables, each with different dimensionality. Most serious ROLAPs, like those from Informix, CA and MicroStrategy, have this capability. Note that the only large-scale ROLAPs still on sale are MicroStrategy and SAP BW.
It is also possible to identify two main types of multicube: the block multicube (as used by BusinessObjects, Gentia, Holos, Microsoft Analysis Services and TM1) and the series multicube (as used by Acumate, Express, Media and Pilot). Note that several of these products have now been discontinued.
Block multicubes use orthogonal dimensions, so there are no special dimensions at the data level. A cube may consist of any number of the defined dimensions, and both measures and time are treated as ordinary dimensions, just like any other. Series multicubes treat each variable as a separate cube (often a time series), with its own set of other dimensions. However, these distinctions are not hard and fast, and the various multicube implementations do not necessarily fall cleanly into one type or the other.
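The two layouts can be contrasted with the same data held both ways. This is a simplified sketch with invented values: the block form keys every cell by measure, product and month alike, while the series form keeps one cube per variable whose cells are whole time series. The conversion helper is hypothetical.

```python
months = ["Jan", "Feb", "Mar"]

# Block multicube: measures and time are ordinary dimensions of the cube.
block_cube = {
    ("Sales", "P1", "Jan"): 100.0,
    ("Sales", "P1", "Feb"): 110.0,
    ("Sales", "P1", "Mar"): 90.0,
    ("Cost", "P1", "Jan"): 60.0,
    ("Cost", "P1", "Feb"): 66.0,
    ("Cost", "P1", "Mar"): 54.0,
}

# Series multicube: each variable is its own cube, keyed by its other
# dimensions, and each cell holds a time series.
series_cubes = {
    "Sales": {("P1",): [100.0, 110.0, 90.0]},
    "Cost": {("P1",): [60.0, 66.0, 54.0]},
}


def block_to_series(block, months):
    """Regroup block cells into one time-series cube per measure."""
    out = {}
    for (measure, product, month), value in block.items():
        series = out.setdefault(measure, {}).setdefault(
            (product,), [None] * len(months)
        )
        series[months.index(month)] = value
    return out


# The same margin calculation in each layout.
block_margin = [
    block_cube[("Sales", "P1", m)] - block_cube[("Cost", "P1", m)]
    for m in months
]
series_margin = [
    s - c
    for s, c in zip(series_cubes["Sales"][("P1",)], series_cubes["Cost"][("P1",)])
]
```

Both layouts hold identical information, and the margin works out the same either way. What differs is which dimensions are treated as special, which is why real implementations can sit between the two types.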
The block multicubes are more flexible, because they make no assumptions about dimensions and allow multiple measures to be handled together, but are often less convenient for reporting, because the multidimensional viewers can usually only look at one block at a time (an example of this can be seen in TM1's implementation of the former OLAP Council's APB-1 benchmark). The series multicubes do not have this restriction. Microsoft Analysis Services, GentiaDB and especially Holos get round the problem by allowing cubes to be joined, thus presenting a single cube view of data that is actually processed and stored physically in multiple cubes. Holos allows views of views, something that not all products support. Essbase transparent partitions are a less sophisticated version of this concept.
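The idea of joining cubes into a single view can be sketched as a thin read-only layer over separately stored cubes. This is a deliberately simplified illustration and not how any of the products named above implement it. The class is hypothetical, and it assumes that a dimension a physical cube lacks is simply ignored on lookup, which is only one of several possible design choices.

```python
class JoinedView:
    """Read-only single-cube view over physically separate cubes."""

    def __init__(self, cubes):
        # cubes: measure name -> (that cube's dimension names, its cell dict)
        self.cubes = cubes

    def get(self, measure, **coords):
        dim_names, cells = self.cubes[measure]
        # Assumption: coordinates for dimensions the cube lacks are ignored,
        # so the value appears the same for every member of that dimension.
        key = tuple(coords[d] for d in dim_names)
        return cells.get(key)


# Two physical cubes with different dimensionality (invented data).
sales_cube = (
    ("product", "month"),
    {("P1", "Jan"): 100.0, ("P2", "Jan"): 80.0},
)
headcount_cube = (
    ("month",),
    {("Jan",): 12},
)

view = JoinedView({"Sales": sales_cube, "Headcount": headcount_cube})

sales_p1 = view.get("Sales", product="P1", month="Jan")
headcount_p1 = view.get("Headcount", product="P1", month="Jan")
```

A viewer addressing `view` sees one cube with measure, product and month dimensions, even though the data sits in two blocks of different shape.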
Series multicubes are the older form, and only one product (Speedware Media) whose development first started after the early 1980s has used them. The block multicubes came along in the mid-1980s, and now seem to be the most popular choice. There is one other form, the atomic multicube, which was mentioned in the first edition of The OLAP Report, but it appears not to have caught on.
Which is better?

We do not take sides on this matter. We have seen widespread, successful use of hypercube products and of both flavors of multicube products by customers in all industries. In general, multicubes are more efficient and versatile, but hypercubes are easier to understand. End-users will relate better to hypercubes because of their higher level view; MIS professionals with multidimensional experience prefer multicubes because of their greater tunability and flexibility. Multicubes are a more efficient way of storing very sparse data and they can reduce the pre-calculation database explosion effect, so large, sophisticated products tend to use multicubes. Pre-built applications also tend to use multicubes so that the data structures can be more finely adjusted to the known application needs.
One relatively recent development is that two products so far have introduced the ability to de-couple the storage, processing and presentation layers. The now-discontinued GentiaDB and the Holos compound OLAP architecture allow data to be stored physically as a multicube, but calculations to be defined as if it were a hypercube. This approach potentially delivers the simplicity of the hypercube, with the more tunable storage of a multicube. Microsoft's Analysis Services also has similar concepts, with partitions and virtual cubes.
All information copyright ©2005, Business Application Research Center, all rights reserved.