Database throwdown: Transactional vs Dimensional

At its core, the dimensional design (as described by Ralph Kimball) is intended to simplify the experience of business users that might want to connect to and actually USE their data.  The dimensional model is a response to the transactional model.  Let’s break down those two ideas.

‘Traditional’ databases, or databases that are designed to capture data, are usually designed in what database folk refer to as a transactional design.  There are many rules we use to design these databases (check here for more info on these rules).  To keep it simple, when I’m designing a transactional system I often refer to a post-it on my monitor that reminds me of the golden rule ‘The Key, the WHOLE key and nothing but the key, so help me Codd …’

What all that boils down to is that at the end of the day we are left with a fairly complex database design that included dozens (often hundreds) of small tables.  To further complicate things, to bring all this data back together sometimes there are multiple join paths we could use to define the connections between tables.

To get around this when we create a dimensional design we bring together a lot of the data that we split apart in the transactional design.

The most common form of a dimensional design is a star schema.  It’s called this because we usually see a central table surrounded by 6-10 (sometimes more) other tables.  The central table in this design is called a fact table.  The fact table contains the things we are trying to measure for one business process.  For example, if we are modeling a sales process, the fact table might hold Sale Amount, Tax Amount, Shipping Price and quantity.  The sales fact table would also contain links to other tables that give those measurements context.  If we are modeling a sales process, we will need to know some things about the total sale amount.  We would want to know the product, the customer, the date of the sale and maybe where the product was shipped from.  These additional details give the measurement context.

The great thing about a star schema is that it’s easy to connect to and query because all the supporting tables are (usually) only one join away from the measurements we want to explore.  The queries needed to retrieve data from a star schema are usually significantly less complex than a query designed to pull exactly the same information from a transactional schema.

Note:  Schema is just a geeky way to say database design 🙂

The takeaway here:  Transactional databases are great for capturing data, Dimensional databases are great of analyizing data.

I hope this helps you understand some of the differences between a traditional (transactional) database and a data warehouse (dimensional design).  What do you think?  I’m always looking for new ways to make these concepts easier to understand.  Leave me a note in the comments with how YOU see these databases differently.

Leave a Reply

Your email address will not be published. Required fields are marked *