Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Expert Cube Development with Microsoft SQL Server 2008 Analysis Services

You're reading from   Expert Cube Development with Microsoft SQL Server 2008 Analysis Services Design and implement fast, scalable and maintainable cubes with Microsoft SQL Server 2008 Analysis Services with this book and eBook

Arrow left icon
Product type Paperback
Published in Jul 2009
Publisher Packt
ISBN-13 9781847197221
Length 360 pages
Edition 1st Edition
Languages
Arrow right icon
Toc

Table of Contents (17) Chapters Close

Expert Cube Development with Microsoft SQL Server 2008 Analysis Services
Credits
About the Authors
About the Reviewers
Preface
1. Designing the Data Warehouse for Analysis Services FREE CHAPTER 2. Building Basic Dimensions and Cubes 3. Designing More Complex Dimensions 4. Measures and Measure Groups 5. Adding Transactional Data such as Invoice Line and Sales Reason 6. Adding Calculations to the Cube 7. Adding Currency Conversion 8. Query Performance Tuning 9. Securing the Cube 10. Productionization 11. Monitoring Cube Performance and Usage Index

The source database


Analysis Services cubes are built on top of a database, but the real question is: what kind of database should this be?

We will try to answer this question by analyzing the different kinds of databases we will encounter in our search for the best source for our cube. In the process of doing so we are going to describe the basics of dimensional modeling, as well as some of the competing theories on how data warehouses should be designed.

The OLTP database

Typically, a BI solution is created when business users want to analyze, explore and report on their data in an easy and convenient way. The data itself may be composed of thousands, millions or even billions of rows, normally kept in a relational database built to perform a specific business purpose. We refer to this database as the On Line Transactional Processing (OLTP) database.

The OLTP database can be a legacy mainframe system, a CRM system, an ERP system, a general ledger system or any kind of database that a company has bought or built in order to manage their business.

Sometimes the OLTP may consist of simple flat files generated by processes running on a host. In such a case, the OLTP is not a real database but we can still turn it into one by importing the flat files into a SQL Server database for example. Therefore, regardless of the specific media used to store the OLTP, we will refer to it as a database.

Some of the most important and common characteristics of an OLTP system are:

  • The OLTP system is normally a complex piece of software that handles information and transactions; from our point of view, though, we can think of it simply as a database.

    We do not normally communicate in any way with the application that manages and populates the data in the OLTP. Our job is that of exporting data from the OLTP, cleaning it, integrating it with data from other sources, and loading it into the data warehouse.

  • We cannot make any assumptions about the OLTP database's structure.

    Somebody else has built the OLTP system and is probably currently maintaining it, so its structure may change over time. We do not usually have the option of changing anything in its structure anyway, so we have to take the OLTP system "as is" even if we believe that it could be made better.

  • The OLTP may well contain data that does not conform to the general rules of relational data modeling, like foreign keys and constraints.

    Normally in the OLTP system, we will find historical data that is not correct. This is almost always the case. A system that runs for years very often has data that is incorrect and never will be correct.

    When building our BI solution we'll have to clean and fix this data, but normally it would be too expensive and disruptive to do this for old data in the OLTP system itself.

  • In our experience, the OLTP system is very often poorly documented. Our first task is, therefore, that of creating good documentation for the system, validating data and checking it for any inconsistencies.

The OLTP database is not built to be easily queried, and is certainly not going to be designed with Analysis Services cubes in mind. Nevertheless, a very common question is: "do we really need to build a dimensionally modeled data mart as the source for an Analysis Services cube?" The answer is a definite "yes"!

As we'll see, the structure of a data mart is very different from the structure of an OLTP database and Analysis Services is built to work on data marts, not on generic OLTP databases. The changes that need to be made when moving data from the OLTP database to the final data mart structure should be carried out by specialized ETL software, like SQL Server Integration Services, and cannot simply be handled by Analysis Services in the Data Source View.

Moreover, the OLTP database needs to be efficient for OLTP queries. OLTP queries tend to be very fast on small chunks of data, in order to manage everyday work. If we run complex queries ranging over the whole OLTP database, as BI-style queries often do, we will create severe performance problems for the OLTP database. There are very rare situations in which data can flow directly from the OLTP through to Analysis Services but these are so specific that their description is outside the scope of this book.

Beware of the temptation to avoid building a data warehouse and data marts. Building an Analysis Services cube is a complex job that starts with getting the design of your data mart right. If we have a dimensional data mart, we have a database that holds dimension and fact tables where we can perform any kind of cleansing or calculation. If, on the other hand, we rely on the OLTP database, we might finish our first cube in less time but our data will be dirty, inconsistent and unreliable, and cube processing will be slow. In addition, we will not be able to create complex relational models to accommodate our users' analytical needs.

The data warehouse

We always have an OLTP system as the original source of our data but, when it comes to a data warehouse, it can be difficult to answer this apparently simple question: "Do we have a data warehouse?" The problem is not the answer, as every analyst will happily reply, "Yes, we do have a data warehouse"; the problem is in the meaning of the words "data warehouse".

There are at least two major approaches to data warehouse design and development and, consequently, to the definition of what a data warehouse is. They are described in the books of two leading authors:

  • Ralph Kimball: if we are building a Kimball data warehouse, we build fact tables and dimension tables structured as data marts. We will end up with a data warehouse composed of the sum of all the data marts.

  • Bill Inmon: if our choice is that of an Inmon data warehouse, then we design a (somewhat normalized), physical relational database that will hold the data warehouse. Afterwards, we produce departmental data marts with their star schemas populated from that relational database.

If this were a book about data warehouse methodology then we could write hundreds of pages about this topic but, luckily for the reader, the detailed differences between the Inmon and Kimball methodologies are out of the scope of this book. Readers can find out more about these methodologies in Building the Data Warehouse by Bill Inmon and The Data Warehouse Toolkit by Ralph Kimball. Both books should be present on any BI developer's bookshelf.

A picture is worth a thousand words when trying to describe the differences between the two approaches. In Kimball's bus architecture, data flows from the OLTP through to the data marts as follows:

In contrast, in Inmon's view, data coming from the OLTP systems needs to be stored in the enterprise data warehouse and, from there, goes to the data marts:

What is important is to understand is that the simple phrase "data warehouse" has different meanings in each of these methodologies.

We will adopt Inmon's meaning for the term "data warehouse". This is because in Inmon's methodology the data warehouse is a real database, while in Kimball's view the data warehouse is composed of integrated data marts. For the purposes of this chapter, though, what is really important is the difference between the data warehouse and the data mart, which should be the source for our cube.

The data mart

Whether you are using the Kimball or Inmon methodology, the front-end database just before the Analysis Services cube should be a data mart. A data mart is a database that is modeled according to the rules of Kimball's dimensional modeling methodology, and is composed of fact tables and dimension tables.

As a result we'll spend a lot of time discussing data mart structure in the rest of this chapter. However, you will not learn how to build and populate a data mart from reading this chapter; the books by Kimball and Inmon we've already cited do a much better job than we ever could.

You have been reading a chapter from
Expert Cube Development with Microsoft SQL Server 2008 Analysis Services
Published in: Jul 2009
Publisher: Packt
ISBN-13: 9781847197221
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image