The OLAP Report

Commentary: Project Gemini — Microsoft’s Brilliant OLAP Trojan Horse

You can contact Nigel Pendse, the author of this section, by e-mail on NigelP@olapreport.com if you have any comments, observations or user experiences to add. Last updated on October 29, 2008.

Project Gemini was announced at the second annual Microsoft BI Conference in Seattle on October 6, 2008, and there is no doubt that it was the highlight of the three-day conference. The Gemini code-name is meant to imply that Excel and Analysis Services, and end-users and IT, will be twinned using the new product, but our pun (‘BOTH’) reflects the view that Gemini appears to be both a brilliant way of introducing Analysis Services to Excel users and some rather smart technology.

OLAP Report subscribers can read a copiously illustrated preview of Gemini here.

In-memory, column-oriented processing

It’s no surprise to see Microsoft responding to the hype surrounding in-memory analytics and column-oriented processing, though it is sometimes forgotten that Microsoft has been shipping a local OLAP engine for a decade: the local cube engine that works with cub files and is included with every copy of Excel as well as desktop OLAP tools such as ProClarity. This is more scalable than many people expected – it can happily handle files with tens of megabytes – but is limited by the fact that cub files are normally just subsets of server cubes, and that they are not optimized for in-memory processing. Nor is the local cub engine itself (which shares source code with the server engine). In particular, cub files are not used as a way of building AS server cubes.

Of course, the ubiquitous Excel itself is also an in-memory analytical program. Not only does it already include basic multidimensional analysis in the form of PivotTables, but it can be used to build dashboards. Just about any BI calculation is possible in Excel, and its reporting abilities are better than many BI tools. But it does not feature column-oriented processing, and so cannot handle Gemini’s large data sets.
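
As a rough illustration of what column-oriented processing means in general, the Python sketch below stores the same records row by row and column by column. It is not a description of Gemini's engine, whose internals are not covered here, and the field names and figures are invented. Summing one field in the columnar layout scans a single contiguous array, whereas the row layout visits every record in full.

```python
from array import array

# Hypothetical sales records; names and values are invented for illustration.
rows = [
    {"region": "North", "product": "A", "amount": 120.0},
    {"region": "South", "product": "B", "amount": 80.0},
    {"region": "North", "product": "B", "amount": 50.0},
]

def to_columns(records):
    """Turn a list of row dicts into one sequence per field."""
    return {
        "region": [r["region"] for r in records],
        "product": [r["product"] for r in records],
        "amount": array("d", (r["amount"] for r in records)),
    }

def total_row_store(records):
    # Row layout: every record is visited in full just to read one field.
    return sum(r["amount"] for r in records)

def total_column_store(columns):
    # Column layout: only the contiguous 'amount' array is scanned.
    return sum(columns["amount"])

columns = to_columns(rows)
```

Both functions return the same total; the difference lies in how much data has to be touched to compute it, which matters more as the number of rows and fields grows.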

The all-new Gemini engine is not based on the old local cub files engine, although one seasoned observer immediately described Gemini as “cub files done right”. That description understates the technology in Gemini, for which Microsoft expects to gain more than a dozen patents before it is even released.

Although developed by the Analysis Services team as part of the SQL Server ‘Kilimanjaro’ release, Gemini is expected to be delivered as an extension to Excel (probably as a free add-in, either like the existing Analysis ToolPak which ships with Excel or the data mining add-in which is a free download). Quite simply, it will provide a new kind of turbo-charged table that can handle far more data than Excel 2007 (which was already much extended compared to its predecessors), with, if the early demonstrations are anything to go by, electrifying performance. Amir Netz, the inventor of Gemini (as well as Analysis Services itself), delivered an audience-wowing demonstration at the BI conference of near-instantaneous analysis of 100m rows of BI data, with no need for cube building or pre-aggregation. However, these demonstrations did not include any data loading and cleansing, which may turn out to be more cumbersome.

Timing

It looks like Microsoft pre-announced Gemini months earlier than it might normally have done, thanks to the competitive pressure from aggressive small vendors like QlikTech, as well as the recent acquisition of in-memory products by IBM (TM1), Oracle (TimesTen) and SAP (BI Accelerator). There is also an increasing buzz about column-oriented processing, hitherto a long-established but rather quiet backwater.

Microsoft has said that Gemini will ship in the first half of 2010, which we expect to mean not much before June 2010, so the pre-announcement is perhaps as much as 21 months before full product availability. But as with other Microsoft products, the release date is more likely to slip than to be brought forward, particularly if problems emerge in the TAP (alpha, expected in early 2009) and CTP (beta, expected in late 2009) phases.

Curiously, although Gemini will require the next release of Excel, Microsoft is still months away from talking about that release, which already appears to be running late. The release is presumed to be code-named Excel 14; it may ship as Excel 2010 if Microsoft continues its recent Office naming convention, or simply as Excel 14 if Microsoft follows the new Windows 7 approach. In fact, Microsoft will not even officially confirm the widely known Excel 14 code-name, let alone its features (Excel 2007 is actually Excel 12; Microsoft superstitiously skipped version 13).

This (premature?) pre-announcement of Gemini is probably designed to disrupt the market for other in-memory BI tools. For Gemini to be fully deployable, new releases of SQL Server (Kilimanjaro), SharePoint Server and Office will all be needed, and it would hardly be surprising if at least one of these product releases slips. And even when all three new versions are shipping, customers wishing to take full advantage of Gemini will have to make the decision to install the latest versions of all three products, which they would not normally rush to do. So it may be a number of years before Gemini has any real impact on the market.

Gemini as a Trojan horse

Gemini’s manifestation as an Excel extension disguises its real role as an ingenious Trojan horse for Analysis Services. The seductively inviting Gemini world is refreshingly free of off-putting jargon such as star/snowflake schemas, fact tables, cubes, measures, dimensions, hierarchies, levels, attributes, aggregations, partitions, MDX calculations and scripts, all typically encountered in OLAP server deployments. Instead, Excel power users with Gemini installed will be able to analyze and summarize vast amounts of data with absolutely no need to pre-define models or structures. In true spreadsheet style, they work with the available data, rather than having to first build structures to slot it into. Microsoft is betting that this concession to the natural style of the millions of Excel power users will finally deliver the ‘BI for the masses’ that has so far proved so disappointingly elusive.

Any number to do with Excel user totals normally has a long string of zeros attached. Nobody really knows the number of active Excel users, but anywhere between 400 and 500 million is a reasonable guess. Of course, the majority of this multitude are simply consumers of Excel spreadsheets written by others. But up to ten percent of the total could be regarded as Excel authors and power users, who build spreadsheets for consumption by others. Roughly the same group of power users also build PivotTables, based on either native Excel data or Analysis Services server cubes. Not all of these will need the power and performance of Gemini, but some will, and this ‘small’ minority could still number in the tens of millions, far more than the current total number of developers of all BI tools combined. These millions are the people Microsoft is targeting with Gemini.

These potential Gemini users will just see a turbo-charged Excel; of course, they will have to upgrade to the latest (post-2007) version of Excel to access Gemini, which is one of Microsoft’s major motivations for this project. Microsoft finds it increasingly difficult to convince corporate customers to upgrade to successive new releases of Office, as most organizations feel that the Office products already have more features than their employees can use, so eye-catching new capabilities like Gemini are needed to give them a stronger incentive to upgrade.

Gemini will include new tools to load and cleanse data, and users will be able to load data from any source, including their own keyboards. It’s expected that bulk data loads will be from in-house databases and data warehouses, and Gemini will be able to refresh such data automatically. Data edits and cleansing steps will be remembered and re-applied to refreshed data, even if the model is uploaded to a server. SharePoint will provide features to manage data refreshes on the server, and will maintain statistics on lineage, usage, performance, etc, thus allowing models that started as individual ad hoc projects to become professionally managed server applications.

Dealing with large data files

The inconvenience of moving large data files to users’ PCs may limit Gemini’s usage, so Microsoft is likely to add capabilities to Integration Services to enable it to generate compressed Gemini models on the server. This approach will mean that the files that need to be moved around could be much smaller, perhaps as little as a tenth of the original size. This may be essential for really large models.
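
How Gemini compresses its models is not stated here, so the following Python sketch only shows two techniques commonly used by column-oriented stores in general: dictionary encoding followed by run-length encoding. The column contents are invented, and the compression ratio achieved depends entirely on the data (low-cardinality, sorted columns compress best).

```python
def dictionary_encode(values):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = []
    index = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def run_length_encode(codes):
    """Collapse consecutive repeats into [code, count] pairs."""
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return runs

def decode(dictionary, runs):
    """Rebuild the original column from the dictionary and the runs."""
    out = []
    for code, count in runs:
        out.extend([dictionary[code]] * count)
    return out

# Hypothetical column: a low-cardinality field, sorted so repeats are adjacent.
column = ["North"] * 5 + ["South"] * 3 + ["West"] * 2
dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)
```

Here ten string values reduce to a three-entry dictionary and three runs, and `decode(dictionary, runs)` restores the original column exactly.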

The theoretical Gemini limits are probably irrelevant, being far larger than the data that can be practically loaded into even a well-endowed personal computer with many gigabytes of RAM. This increased size will undoubtedly startle many Excel users, who will be amazed to be able to filter and sort tens of millions of rows of data in a fraction of a second. Microsoft says it is confident that Gemini will comfortably out-scale other in-memory BI solutions, such as QlikView, but this has yet to be proved.

But row-level sorting and filtering has its limits. As part of the Gemini delivery, the next version of Excel will also include an upgraded PivotTables module that can work with Gemini’s astonishing data volumes. This will allow easy multidimensional analysis of seriously large data volumes that previously would have needed a hefty server. Consumers of these spreadsheets won’t need Gemini to view these PivotTables, but if they don’t have a Gemini-enabled Excel, they won’t be able to refresh the data or PivotTables.
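
To make the idea of multidimensional analysis without pre-aggregation concrete, here is a minimal Python sketch of a PivotTable-style cross-tabulation computed on the fly from raw column data. It is a generic illustration, not Excel's or Gemini's implementation; the function name `pivot_sum` and the sample data are hypothetical.

```python
from collections import defaultdict

def pivot_sum(row_keys, col_keys, values):
    """Cross-tabulate: sum values for each (row key, column key) pair."""
    table = defaultdict(lambda: defaultdict(float))
    for r, c, v in zip(row_keys, col_keys, values):
        table[r][c] += v
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {r: dict(cols) for r, cols in table.items()}

# Hypothetical detail rows held as three parallel columns.
region = ["North", "South", "North", "South", "North"]
year = [2007, 2007, 2008, 2008, 2008]
amount = [100.0, 40.0, 60.0, 25.0, 15.0]

result = pivot_sum(region, year, amount)
```

Nothing is summarized in advance: each call scans the detail rows and builds the region-by-year totals directly, which is practical only if the scan itself is fast.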

But even if their colleagues do have the latest version of Excel, complete with the Gemini add-in, it may not be practical to send them gigabyte-sized spreadsheets. Nor will multiple concurrent users be able to open an xlsx file containing a Gemini model on a shared drive, as Gemini will demand exclusive access to the file.

It might make much more sense to publish a sharable server version of larger models, and Microsoft has conveniently (or craftily) made this easy, inevitably using SharePoint Server and Analysis Services. This will provide thin-client Web access to the Gemini Excel reports and PivotTables, with no software needing to be installed locally (not even Excel). Clearly, to gain all the potential benefits of Gemini, customers will need to commit to the Microsoft stack, using the latest versions of Excel, SQL Server and SharePoint Server.

Gemini Excel workbooks shared in this way actually create Analysis Services cubes on the server, with reports delivered via Excel Services. The uploaded Gemini models become a new type of in-memory Analysis Services cube, without anyone actually touching Analysis Services. This new cube type will be accessible from standard Analysis Services client tools, including both those from Microsoft and third parties. Excel power users will therefore be able to build server solutions that eventually make no use of Excel, and even tools that know nothing of Gemini will be able to report on its server models.

This seductive scenario means that, without their even knowing it, millions of Excel users are likely to become OLAP developers in a few years. They won’t have to learn anything about dimensions, measures, cubes, MDX, partitions, levels, etc, but they will still be creating local multidimensional cubes based on these concepts. They will be free to remain blissfully ignorant of these concepts by just doing their analyses locally, but if they decide to share their models using a server, these aspects will come into play.

Microsoft’s win-win scenario

This seems like a win-win scenario for Microsoft. Excel power users will get a huge increase in power and capacity without having to do any more than upgrade to the latest version of Excel and install a free add-in. They won’t have to learn any unfamiliar techniques, or install any new products. They will be able to integrate data from multiple sources, including corporate databases, other Excel worksheets and external sources, using familiar Excel-style tools, not complex professional ETL tools like Integration Services.

If they do want to share the results of some of their analyses over the Web, SharePoint will make it easy, but this won’t be compulsory. And the server cubes they generate can be extended using Analysis Services (for example, to add fine-grained security and sophisticated calculations, using MDX). If such applications turn out to be widely used, IT professionals can take over the management of the server cubes, including extending them and managing security, thus freeing the original authors from what might seem a rather boring and repetitive duty (maintaining a stable application is much less fun for an end-user than doing the initial analysis).

Microsoft gains in a number of ways. Its millions of customers will have a real incentive to upgrade to Excel 14 much sooner than they might otherwise have done, thus justifying Microsoft’s maintenance charges and quelling any plans to defect to OpenOffice. And many customers will have been given a good reason to purchase or upgrade to the latest versions of SharePoint and SQL Server. Taking advantage of the ubiquity of Excel gives Microsoft the opportunity to dominate the in-memory analytics market in a way that is simply impossible for vendors that need to sell each copy of their software, rather than simply encouraging existing users to upgrade to a new release, often at no additional cost.

Post-OLAP paradigm?

Some commentators have described Gemini as an ‘emerging new post-OLAP paradigm’. We do not agree: Gemini is not post-OLAP, it is OLAP. Gemini users will be building in-memory OLAP cubes in an Excel-like environment, just as TM1 users have been doing for many years. Gemini will be much easier to use than Analysis Services, which is certainly very welcome, but it will be doing much the same sort of things as Analysis Services has always done.


This page is part of the free content of The OLAP Report, but ten times more information is available only to subscribers, including reviews of dozens of products, case studies and in-depth analyses. Gemini is previewed here. You can register for access to a preview of some of the subscriber-only material in The OLAP Report or subscribe on-line.


 
