Data Warehousing – Star Schema vs Flat Table

Question

It’s a good question. The Kimball case for dimensional modelling rests on two things: better performance and better usability.

But I don’t think it is outdated, or dogma; it is a reasonable, practical approach for many situations and platforms.

The way relational databases store data means there’s a balance to be struck between the number and types of tables, the routes into the data for typical queries, ease of maintenance, how relationships between data are described, the number of joins and how they are constructed, the indexability of columns, and so on.

3NF (or further) is one end of the spectrum, suiting OLTP systems, and a single table is the other end of the spectrum. Dimensional models are in the middle and appropriate for reporting, at least when using certain technologies.

Performance isn’t all about the number of joins, although a star schema does perform better for reporting workloads than a fully normalised database, in part because of the reduced number of joins. Dimensions are typically very wide. If you include all those dimension fields in every row of every fact, you end up with very large rows indeed, and typical queries that have to find their way into those rows will perform very badly.
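To put rough numbers on the row-width point, here is a back-of-envelope sketch. Every figure in it (row counts, byte widths, number of dimensions) is a made-up assumption chosen only to show the arithmetic. It ignores compression, per-row overhead, indexes and columnar storage, all of which change the picture on real platforms.

```python
# Hypothetical sizes, chosen only to illustrate the arithmetic.
FACT_ROWS = 100_000_000   # assumed number of fact rows
MEASURE_BYTES = 16        # assumed: two 8-byte measures per fact row
DIMENSIONS = 5            # assumed number of dimensions per fact
KEY_BYTES = 4             # assumed 4-byte surrogate key per dimension
DIM_ATTR_BYTES = 200      # assumed width of one dimension's descriptive columns
DIM_ROWS = 10_000         # assumed number of rows in each dimension table


def star_bytes():
    """Compact fact rows (measures + keys) plus the separate dimension tables."""
    fact = FACT_ROWS * (MEASURE_BYTES + DIMENSIONS * KEY_BYTES)
    dims = DIMENSIONS * DIM_ROWS * (KEY_BYTES + DIM_ATTR_BYTES)
    return fact + dims


def flat_bytes():
    """Single table: every fact row repeats every dimension attribute."""
    return FACT_ROWS * (MEASURE_BYTES + DIMENSIONS * DIM_ATTR_BYTES)


if __name__ == "__main__":
    print(f"star: {star_bytes() / 1e9:.1f} GB")
    print(f"flat: {flat_bytes() / 1e9:.1f} GB")
```

Under these assumptions the flat table is many times larger than the star schema, simply because the wide descriptive columns are repeated on every fact row.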

Facts are numerous, so if you can make those tables compact, with the ‘wordier’ dimensions filterable, you hit a sweet spot of performance that a single table isn’t going to match, unless heavily indexed.
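As a minimal sketch of that shape, the example below builds a tiny star schema in an in-memory SQLite database using Python's standard sqlite3 module. The table names (dim_product, fact_sales), the columns and the sample rows are all hypothetical. The fact table holds only a key and measures, and the query filters on the descriptive dimension column before aggregating.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,  -- surrogate key
    product_name TEXT,
    category     TEXT,
    brand        TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,               -- measures only, no descriptive text
    amount      REAL
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)", [
    (1, "Trail Bike", "Bikes", "Acme"),
    (2, "Road Bike", "Bikes", "Acme"),
    (3, "Helmet", "Accessories", "Safehead"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    (1, 2, 1000.0),
    (2, 1, 800.0),
    (3, 5, 150.0),
    (1, 1, 500.0),
])


def sales_by_category(category):
    """Filter on the wide dimension, aggregate the compact fact."""
    return conn.execute("""
        SELECT SUM(f.quantity), SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_key = f.product_key
        WHERE d.category = ?
    """, (category,)).fetchone()


if __name__ == "__main__":
    print(sales_by_category("Bikes"))
```

SQLite is used here only because it ships with Python; it says nothing about how a particular warehouse platform would plan or execute the same query.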

And yes, a single table for a fact is simpler in terms of the number of tables, but is it really easier to navigate? Dimensions and facts are easy concepts to understand, and what if you want to run your queries across facts? You may have many different data marts, but one of the benefits of having a data warehouse in the first place is that they aren’t distinct: they’re related and can be reported across. Conformed dimensions enable this.
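Reporting across facts through a conformed dimension can be sketched as follows, again with Python's sqlite3 module and hypothetical tables: fact_sales and fact_returns both reference the same dim_product. Each fact is aggregated to the shared dimension attribute first and the results are joined afterwards, which avoids double counting when the two facts have different numbers of rows per product.

```python
import sqlite3

# Hypothetical schema and data: two facts sharing one conformed dimension.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales   (product_key INTEGER, amount REAL);
CREATE TABLE fact_returns (product_key INTEGER, refund REAL);
INSERT INTO dim_product  VALUES (1, 'Bikes'), (2, 'Bikes'), (3, 'Accessories');
INSERT INTO fact_sales   VALUES (1, 1000.0), (2, 800.0), (3, 150.0), (1, 500.0);
INSERT INTO fact_returns VALUES (1, 500.0), (3, 30.0);
""")


def sales_and_returns_by_category():
    """Aggregate each fact by the shared dimension, then combine the results."""
    return db.execute("""
        WITH s AS (
            SELECT d.category, SUM(f.amount) AS sales
            FROM fact_sales AS f
            JOIN dim_product AS d USING (product_key)
            GROUP BY d.category
        ),
        r AS (
            SELECT d.category, SUM(f.refund) AS refunds
            FROM fact_returns AS f
            JOIN dim_product AS d USING (product_key)
            GROUP BY d.category
        )
        SELECT s.category, s.sales, COALESCE(r.refunds, 0.0)
        FROM s
        LEFT JOIN r ON r.category = s.category
        ORDER BY s.category
    """).fetchall()


if __name__ == "__main__":
    for row in sales_and_returns_by_category():
        print(row)
```

With a single wide table per fact, the category values would have to be kept identical in both tables by hand for this kind of comparison to line up.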
