Preview of Microsoft’s Project Gemini
THIS PAGE REPRESENTS ONLY A VERY SHORT EXTRACT FROM THE FULL REVIEW.
TO VIEW THE FULL REVIEW YOU CAN PURCHASE THE REVIEW INDIVIDUALLY OR PURCHASE AN ANNUAL SUBSCRIPTION TO THE OLAP REPORT WHICH ALLOWS ACCESS TO ALL OLAP REPORT CONTENT.
Gemini is designed to look as much like Excel as possible, so its data will look like a set of Excel tables. Anyone familiar with Excel 2007 tables (probably still only a small minority of the world’s hundreds of millions of Excel users) will immediately feel at home. However, Gemini tables are actually stored as highly compressed, column-indexed, in-memory objects, not Excel worksheets. They are viewed and manipulated in the new Gemini client window, not in the Excel worksheet itself.
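Microsoft has not published details of Gemini's storage format, so the following is only a minimal sketch of one common way to build a compressed, column-indexed, in-memory table: dictionary encoding. The function name `encode_column` and the sample data are hypothetical and are not drawn from Gemini.

```python
from collections import defaultdict


def encode_column(values):
    """Dictionary-encode one column (illustrative only, not Gemini's format).

    Each distinct value is stored once; every row then holds a small
    integer code instead of the full value.
    """
    dictionary = []             # distinct values, in first-seen order
    code_of = {}                # value -> integer code
    codes = []                  # one integer code per row
    index = defaultdict(list)   # code -> row numbers (simple inverted index)

    for row, value in enumerate(values):
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        code = code_of[value]
        codes.append(code)
        index[code].append(row)

    return dictionary, codes, dict(index)


# Hypothetical column with many repeated values, as is typical of dimension keys.
region = ["North", "South", "North", "East", "South", "North"]
dictionary, codes, index = encode_column(region)

# Rows where region == "North" can be found from the index without scanning.
north_rows = index[dictionary.index("North")]
```

Columns with few distinct values compress well under this scheme, which is one reason column-oriented engines can hold large tables in memory.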
Gemini will include capabilities to load data from databases, the clipboard, or elsewhere. There will also be simple data cleansing features, but Microsoft is not yet willing to be specific about these, nor to demonstrate what they might look like (or how well they will perform). It would seem that these features are currently less developed than the engine itself.
Microsoft does say that the data cleansing features will be aimed at Excel end-users, and will not need scripting. It also says that Gemini will remember the data cleansing tasks associated with a model’s data, and will be able to repeat them if the data is refreshed. This will also apply if a model is moved to the server.
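The idea of remembering cleansing tasks and repeating them on refresh can be sketched as a recorded list of steps that is replayed against each new load. This is an assumption about the general approach, not a description of Gemini's implementation; `CleansingRecipe` and the step functions are hypothetical names.

```python
class CleansingRecipe:
    """An ordered list of cleansing steps that can be replayed on new data."""

    def __init__(self):
        self.steps = []  # list of (description, function) pairs

    def add(self, description, func):
        self.steps.append((description, func))
        return self  # allow chaining

    def apply(self, rows):
        # Copy each row so the source data is left untouched.
        for _description, func in self.steps:
            rows = [func(dict(row)) for row in rows]
        return rows


def strip_customer(row):
    row["customer"] = row["customer"].strip()
    return row


def upper_country(row):
    row["country"] = row["country"].upper()
    return row


recipe = (
    CleansingRecipe()
    .add("trim customer names", strip_customer)
    .add("uppercase country codes", upper_country)
)

first_load = [{"customer": " Acme ", "country": "uk"}]
refreshed = [
    {"customer": "Acme", "country": "uk"},
    {"customer": " Bolt", "country": "de"},
]

cleaned_first = recipe.apply(first_load)
cleaned_refresh = recipe.apply(refreshed)  # same steps, new data
```

Because the recipe is stored separately from the data, it could in principle travel with the model, for example when the model is moved to a server.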
This may be an area that potential users will need to focus on: with Gemini’s ability to handle potentially gigabytes of input data, data-loading and cleansing performance are likely to prove every bit as important as the engine itself. Indeed, Gemini has to index and compress data when it is first loaded from external sources, so this might be slow when processing tens of millions of rows of data, just as it can be with a MOLAP server. However, unlike most disk-based OLAP servers, Gemini does not need to pre-aggregate data before it can be used. Data cleansing could also be slow if it involves fuzzy matching of text keys. It will certainly be interesting to see if Microsoft’s future demonstrations of these capabilities are as impressive as the engine’s query performance.
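To illustrate why fuzzy matching of text keys can be slow, here is a minimal sketch using Python's standard `difflib` module. It is not Gemini's method, which Microsoft has not described; the function `fuzzy_match_keys`, the 0.8 cutoff and the sample keys are assumptions chosen for illustration.

```python
import difflib


def fuzzy_match_keys(incoming, reference, cutoff=0.8):
    """Map each incoming key to its closest reference key, or None.

    Every incoming key is compared against every reference key, so the
    work grows with len(incoming) * len(reference).
    """
    normalized = {ref.lower(): ref for ref in reference}
    candidates = list(normalized)
    matches = {}
    for key in incoming:
        close = difflib.get_close_matches(
            key.lower(), candidates, n=1, cutoff=cutoff
        )
        matches[key] = normalized[close[0]] if close else None
    return matches


# Hypothetical keys: one clean, one with a stray full stop, one unknown.
reference = ["Acme Corp", "Bolt Ltd"]
incoming = ["acme corp.", "Bolt Ltd", "Zenith plc"]
matches = fuzzy_match_keys(incoming, reference)
```

With tens of millions of rows and a large reference list, this all-pairs comparison becomes expensive, so a production tool would need to restrict the candidates considered for each key.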
Gemini will be capable of assembling star schemas from a set of tables. It will attempt to do this as automatically as possible by inspecting the data in tables and deducing links between fields, but it is likely that users will be able to override the automatic schemas to define key linkages, hierarchy fields, etc. The trick will be to deliver enough power to cope with messy real-world data without the complexity of professional OLAP cube building tools. While it is simple to demonstrate automatic cube building from carefully constructed, clean demo data, it is much harder to get this to work with unpredictable, inconsistent real-world data sources. It is far too early to assess whether Microsoft has perfected this tricky balancing act.
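One simple way to deduce links by inspecting data is to look for a column with unique values in one table whose values cover all the values of a column in another table. The sketch below shows that heuristic only; it is an assumption about how such inference might work, not Microsoft's algorithm, and `infer_links` and the sample tables are hypothetical.

```python
def infer_links(tables):
    """Guess (fact_table, fact_col, lookup_table, lookup_col) links.

    A column is a candidate lookup key if its values are unique; another
    table's column links to it if all of its values appear in that key.
    """
    links = []
    for lookup_name, lookup in tables.items():
        for lookup_col, lookup_values in lookup.items():
            if len(set(lookup_values)) != len(lookup_values):
                continue  # duplicates, so not usable as a lookup key
            key_set = set(lookup_values)
            for fact_name, fact in tables.items():
                if fact_name == lookup_name:
                    continue
                for fact_col, fact_values in fact.items():
                    if set(fact_values) <= key_set:
                        links.append((fact_name, fact_col, lookup_name, lookup_col))
    return links


# Hypothetical clean demo data: tables held as column-name -> list of values.
tables = {
    "sales": {
        "product_id": [1, 2, 2, 3, 1],
        "amount": [10.0, 25.5, 25.5, 7.0, 12.0],
    },
    "products": {
        "id": [1, 2, 3],
        "name": ["Widget", "Gadget", "Gizmo"],
    },
}
links = infer_links(tables)
```

On clean data the heuristic finds the intended link. On real-world data, a single orphaned or misspelt key would break the subset test, and unrelated integer columns could match by coincidence, which is why user overrides are likely to matter.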
Microsoft has not said what the overall Gemini model size limits are, but it was comfortable demonstrating a model with more than 100m rows of data at the Seattle BI conference in October 2008. This is the sort of data volume that would normally be encountered in a serious database, not a spreadsheet, and means that despite the Excel-like user interface and in-memory architecture, Gemini is actually more like a database than a spreadsheet. As such, the power and performance of its tools for loading and cleansing data will need to be compared with professional ETL products, but will have to be much easier to use.
Microsoft has initially positioned Gemini as a solution for self-service reporting, so it clearly envisages scenarios where Gemini users will be given access to already cleansed data from data warehouses, and will then be able to do their own local analysis, saving IT staff from constantly having to build new data marts, OLAP cubes and reports. This will give end-users much more ad hoc flexibility than if they had to wait for IT to build formal solutions for all their reporting needs. In some cases, these ad hoc end-user applications will grow to become large and important enough to be turned over to IT to administer and maintain, and this forms much of the rationale for the Gemini project.
All information copyright ©2008, Business Application Research Center, all rights reserved.