baldwhiteguy.co.nz

-
Technical Blog


Star Schema: The Complete Reference

21November

Back in 2008, I was lucky enough to attend a Dimensional Data Modeling course in Auckland run by Ralph Kimball and Margy Ross. My team and I were thrilled at the opportunity to meet two “rock stars” of the Data Warehousing world (yes, sad geeks that we were). Being taught dimensional modeling techniques by the man famous for kicking it all off was a great experience. Although Ralph didn’t claim to be the inventor of dimensional modeling or star schemas, his workshops offered through the Kimball Group, and books such as The Data Warehouse Toolkit were instrumental in getting the techniques widely known and deployed as the de facto standard for data warehousing. To those in our part of the world at least, Ralph Kimball was “Mr Data Warehousing”. Admittedly, being a Data Warehousing god must be tough. At our workshop in Auckland, Ralph and Margy both needed to work through the crowd signing books and smiling through a couple of hundred photos since everyone took the opportunity to capture their magic moment next to a legend.
The Data Warehouse Toolkit

The seminal work explaining dimensional modeling method was Ralph and Margy’s The Data Warehouse Toolkit. This provided an introduction to the basic techniques, then went on to show (1) how the models were applied in different industries, and (2) how to cater for different data issues or reporting requirements. The book was perfect for those of us based on the other side of the world, who couldn’t easily get to a Kimball Group course. It was also the perfect reference for data warehouse designers because of the template solutions that it provided.

I read through several well-thumbed and coffee-stained copies of Ralph and Margy’s books, including both the Data Warehouse Toolkit and the Data Warehouse Lifecycle Toolkit, which contained other useful information and templates covering the business requirements analysis, design, build, and ongoing operations for data warehouses. These books were well written and clear - but (as great as they were) they were written from an IT practitioner’s point of view, and seemed a little too intimidating for the business analysts that we’d sometimes drag into our projects.
Star Schema draft

In truth, the basics of dimensional modeling weren’t really that difficult. I’d often sketch out simple star schema diagrams in my user interviews. I found them to be a powerful tool when communicating data entities and reporting concepts with business users. Just having a picture in front of us made things easier - and star schemas were a LOT easier and less trouble to explain than 3NF entity relationship diagrams (e.g. “Okay, so you’d like to analyse Purchase Order line-items by: Purchase Order, Buyer, Supplier, Order Date, Delivery Date, Delivery Status …”). These diagrams proved especially useful on occasions when I was interviewing users in Latin America, attempting to struggle through in my broken Spanish or Portuguese. Aside from the obvious advantage of identifying the required Fact and Dimension tables, the resulting discussions helped confirm the grain and identify the need for further aggregates.

Impressions

For me, Christopher Adamson’s Star Schema: The Complete Reference fills the gap for introducing new team members to the concepts of dimensional data modeling. Following in the footsteps of Kimball, Adamson has run workshops and taught the techniques to the newest generation of Data Warehouse designers and analysts. His book is more introductory in nature, and includes an early chapter nicely contrasting Kimball, Inmon and stand-alone datamart architectures - while reinforcing the point that dimensional modeling remains important no matter which approach you follow. When reading the text, one gets the impression that Adamson has written each explanation in response to questions that he’s faced hundreds of times in his classes, and that he’s gotten the knack of summarizing potentially complicated subjects down to their essential points. His writing style communicates in a simple manner that’s easy to follow and understand.

Having gotten through the basics in the first couple of chapters, Adamson progresses onto more advanced dimensional modeling topics. Then, towards the end, there’s good practical advice concerning projects, user interviews and documentation. This replicates similar advice from Kimball and Ross’ book - albeit Adamson’s efforts are a little clearer and more concise (IMHO). One comment that struck a chord with me was the observation that development of Star Schema designs is really part of the project analysis phase, since this mostly a matter of teasing out the data and reporting requirements. The project design phase occurs more when you start looking at the physical implementation, performance, and ETL needed to construct/update those star schemas from the original source data.

Structure

If you’re looking for a reference on techniques, Kimball’s book is a little harder to use since its been structured by industry types. For example, “Factless” fact tables are discussed under the chapter on Education (designing datamarts for a university), whereas bridge tables are discussed in the chapter on designing datamarts for Financial Services companies. The Dimensional Modeling training course run by the Kimball Group was similarly structured. Essentially, each chapter presents the reader/student with a real-life business scenario. These serve as lead-ins to illustrate the techniques - showing how dimensional models can be developed to tackle each situation or requirement.

By contrast, Adamson’s book is structured by Dimensional Modeling technique. Therefore, each technique or variation gets its own section - telling you about that technique, the problem it aims to address, where it might be used, and advantages/disadvantages of that approach.

Verdict

Although I recognise the achievement and worth of the The Data Warehouse Toolkit (and it’s nice to learn about Star Schemas from the authors most associated with them), I found Adamson’s Star Schema: The Complete Reference easier to read, and more informative in relation to specific techniques. Kimball and Ross’ book includes the same information - but it’s a little trickier to find and not quite as clear. Hence, I think Adamson’s book is the one that I’d be using more as a day-to-day reference. However, if I was asked to develop datamarts for a particular industry (e.g. Financial Services) then I’d refer to Kimball and Ross’ book as a starting point because of the background it gives about various industries and typical problems/solutions.

Star Schema: The Complete Reference is the book that I’d recommend to anyone who was new to the subject. This is possibly just my preference in writing style, and the truth is that either book would serve you well. My opinion might also be swayed by the fact that now I’m an experienced BI practitioner, the material is more familiar compared to my first experience working through Kimball’s book a decade ago. The best option might simply be to browse both books (or download sample chapters) then choose that which is the easiest to read from your perspective. As for me, I have both books conveniently loaded for quick reference on my iPad (via Safari Books Online) so I’m set either way.

Star Schema: The Complete Reference
Author: Christopher Adamson
ISBN-13: 9780071744324
Link to Amazon.com page or Read on Safari Books Online

The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
Author: Ralph Kimball and Margy Ross
ISBN-13: 9780471200246
Link to Amazon.com page or Read on Safari Books Online
Andrew Mercer (Bald White Guy)
Andrew Mercer
I'm a Business Intelligence and Data Warehousing consultant based in Brisbane, Australia. I've consulted on or managed several large BI systems in New Zealand, Australia and Latin America.
Stacks Image 9105
Contact Info
Please use the contact form on this site.
Or phone 04 5704 1640 (Australia)
Latest Photo

Old-timer with a machine gun. Vintage War truck in 2015 ANZAC Day. Flag with original Australian (and ANZAC) colours is held in the background.

Stacks Image 11045
Stacks Image 11049
Stacks Image 11047
blog comments powered by Disqus
© 2015 Andrew Mercer