This post shares and expands on ideas I picked up reading Agile Data Warehouse design by Lawrence Corr. If you are interested in data warehouse design, you should read this book. It’s a game changer.
We are hardwired for stories. It’s baked into us from early childhood. We learned by watching Big Bird, Bob the builder and Dora the Explorer. When we got older, we learned to read so we could read to learn. Today in business the most compelling story wins. We look to the data, form ideas based on the patterns and outliers we see in the data and tell those stories. Nobody remembers facts, but when you surround those facts with context, when you look back at trends and look forward to project growth … you are telling stories. The more compelling your story, the more successful you will be. In business and in life, we crave a good story.
I share these ideas because there is more than one way to organize our data in a data warehouse, and each choice we make in how we save our data will affect the kind of story we can tell.
The most common type of story (by far), is the short story. I think of a short story as a story that has a known beginning and end. A short story is built on an event that happened, we recorded the details of that event and then we moved on. We can look back at any point in time to craft a story of who, what, where, when, why, how many … all the basic building blocks we learned when we wrote our first stories in grade school.
We think of the elements of a short story as being discrete events. These events happened once, had a known duration (usually) and most (or all) of the details of the event are known by the time we record the event in our database.
Examples of short stories we might capture in the data warehouse include retail sales (WHO made the sale, WHO was the customer, WHERE did the sale occur, WHEN did the sale occur, HOW MANY things were sold) and incoming calls to a call center (WHO called, WHO worked the call, WHEN did the call start, WHEN did the call end, HOW MANY minutes was the call active).
Another story we might tell is the saga. The saga is a story with many milestones. Think of one of my favorite movies, the Blues Brothers.
- Jake gets out of jail
- Jake and his brother Elwood decide to put the band back together
- The brothers find the other guys in the band
- The band does a couple gigs
- The band plays a big show to save the orphanage
- The brothers deliver the money to the tax office in Chicago
One story, six milestones.
Now let’s look at a more business focussed example of a saga. Think of a college admissions process. We’ll imagine that in this process there are several known milestones.
- Student submits college application
- Application is reviewed
- offer is sent to the student
- Student decides to accept or decline the offer
- Student starts classes
I love this pattern for storing data in a data warehouse. It gives us a lot of interesting analytics that would be difficult to get otherwise. With our data saved in this way, we can look at a record to determine the interval between milestones (average # days between application received at college and offer sent to the student, average # of days between student submitting an application and student starting classes). We can also look down through the records to see how many students were at any milestone at any point in time and we can calculate how many students dropped out of the process at any given milestone. I think of this story as an evolving event. One of the things that set the evolving event (saga) apart from the discrete events (short story) we started with is that the evolving event (saga) is updated as new details become available. In a discrete event (short story), we usually know all the details of the event when we initially saved the record.
Serial Drama (Soap opera)
The final type of story I’d like to share is the serial drama (or soap opera). The defining characteristic of this kind of story is that we see the same things happen over and over and over again with only minor changes to the details. We can also think of these as recurring events. These events recur at predictable intervals. Common examples of this type of event might be stock levels in a warehouse or account balances in a bank account. We might have other tables that record actual, ‘real time’ activity in bank accounts or warehouse inventory but it would be cumbersome to go through all those details to aggregate weekly, monthly, quarterly or yearly averages. We can save snapshots of the totals at whatever level of detail we need and end up with tables that are much smaller and faster to query. We do lose detail, but for some analytics, we won’t need that detail anyway.
The data warehouse team would think of the short story as a transactional fact table, the saga as an accumulating snapshot fact table and a serial (soap) as a periodic snapshot fact table. I’ve found that it’s better to avoid the technically correct name for the different kinds of fact tables in favor of the more memorable story based names. My experience has been that conversations between the business and technical teams are a lot easier when we keep the terminology we use as simple and descriptive as possible, also, I constantly confuse the accumulating snapshot and periodic snapshot. For me, evolving facts (saga) and recurring facts (serial/soap) are just a lot easier to remember. 🙂
I hope this post helps you see some new ways you might work with your data. To continue on and get more detail about these fact table types and how to implement them I recommend you read Agile Data Warehouse design (for background) and Star Schema, the complete reference (for how to implement). Both books are part of my Data Warehouse Design reading list.