I often daydream about how I would run things if I had a shop of my own, or if I were given the opportunity to build a team from scratch. The following are some of the things that I feel are important. I’ve only seen all of these together in one shop; someday they will all be core elements of MY shop.
Use a consistent naming convention for all the things, pick one, and stick with it to the end of time. Tables, views, stored procedures, functions, aliases within your T-SQL, temp tables … ALL THE THINGS. It’s a lot like adding a semicolon to the end of each statement: it’s a bitch to pick up the habit, but once you’ve got it, you’ve got it forever. Note: Redgate SQL Prompt can help you stay consistent in how you name your aliases in T-SQL. You will probably be shocked at how much easier your life is once you get in the habit of using the same alias for an object every time you use that object in a query.
You can use any naming convention that you like; I recommend and use Jamie Thomson’s rules. They are solid, and in my experience most people who use a naming convention use his.
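To make the alias habit concrete, here is a small sketch (the table and alias names are mine, purely for illustration, not part of Jamie Thomson’s actual rules): once you decide that `Sales.[Order]` is always `o` and `Sales.Customer` is always `c`, every query you write reads the same way.

```sql
-- Illustrative only: the same alias for the same object, every time.
SELECT  o.OrderID,
        o.OrderDate,
        c.CustomerName
FROM    Sales.[Order]  AS o
JOIN    Sales.Customer AS c
        ON c.CustomerID = o.CustomerID;
```

Six months from now, when you open someone else’s query (or your own), `o` and `c` mean exactly what they meant in every other query in the shop.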
I think frameworks are awesome. Yes, they take some time to build from scratch, or to adapt to your environment if you find an existing framework you like, but the time you invest is totally worth it. I have multiple variations of Biml-based, metadata-driven ETL frameworks for the different ways I have used SSIS over the years. I have a pattern-based framework, another that lets me pick and choose different packages from different projects, and a third that is great when all the ETL is in stored procedures and SSIS is just the orchestration engine. BTW, that last pattern (using SSIS to run ELT via stored procedures) is my LEAST favorite way to use SSIS, but sometimes as a consultant I have been offered little choice in how things are done.
So what is a framework? At the most basic level, frameworks are a great way to ensure consistency and to increase efficiency when implementing common patterns. One of my favorite SSIS frameworks is the Biml-based, metadata-driven framework that lets you choose (or change) the pattern you want to implement in each set of packages you build. Using this engine, I can rest easy knowing that every package I create will use the same pattern, yes … but more than that, they will all have the same key elements that let you know a pro is behind the design of the ETL. Each package has the same naming convention, error handling and logging, runtime auditing, performance tweaks, and file archiving strategy. You get all those features with EVERY package the framework creates. I love that.
The other thing I love about the pattern-based framework is flexibility. If I want to start with a simple truncate-and-load pattern I can do that; then later, if I choose, I can very easily switch over to a more sophisticated pattern that handles slowly changing dimensions. It’s as simple as changing an entry in a control table and rerunning the Biml to rebuild the packages. Now that’s COOL!
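A minimal sketch of what that control table could look like (every name here is hypothetical, not the actual schema of my framework): one row per target table, with a column telling the Biml engine which pattern to generate.

```sql
-- Hypothetical metadata table read by the Biml engine at build time.
CREATE TABLE etl.PackageConfig
(
    SourceTable varchar(128) NOT NULL,
    TargetTable varchar(128) NOT NULL,
    LoadPattern varchar(50)  NOT NULL  -- e.g. 'TruncateAndLoad', 'SCD2'
);

-- Switching dim.Customer from truncate-and-load to a slowly changing
-- dimension pattern is a one-row update; then rerun the Biml to
-- regenerate the packages.
UPDATE etl.PackageConfig
SET    LoadPattern = 'SCD2'
WHERE  TargetTable = 'dim.Customer';
```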
I LOVE tSQLt. If you aren’t using it yet, stop reading right now and go download it. It’s amazing. I love having solid test coverage of my T-SQL code. At first I was discouraged by how often writing and running tests brought subtle (and not so subtle) bugs into the light; I make a lot more mistakes than I would have guessed before adopting testing into my routine. Now, I love it. I LOVE seeing a test go red when I expect green. An opportunity to clean up a mess before it gets out of hand? Yeah buddy, I’ll take that every day.
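If you haven’t seen a tSQLt test before, here is roughly what one looks like. The object under test (`dbo.TrimCustomerName`) is a made-up example of mine; the `tSQLt.NewTestClass`, `tSQLt.AssertEquals`, and `tSQLt.Run` procedures are the real framework API.

```sql
-- Create a test class (a schema that groups related tests).
EXEC tSQLt.NewTestClass 'CustomerTests';
GO

-- A unit test: call the code under test, assert on the result.
-- dbo.TrimCustomerName is a hypothetical function for illustration.
CREATE PROCEDURE CustomerTests.[test TrimCustomerName strips whitespace]
AS
BEGIN
    DECLARE @Actual nvarchar(100);
    SELECT @Actual = dbo.TrimCustomerName(N'  Acme  ');

    EXEC tSQLt.AssertEquals @Expected = N'Acme', @Actual = @Actual;
END;
GO

-- Run every test in the class; each test runs in its own transaction
-- that is rolled back, so your data is untouched.
EXEC tSQLt.Run 'CustomerTests';
```

The rollback-per-test behavior is a big part of why it’s so easy to get in the habit: you can run the whole suite against a dev database all day without cleanup scripts.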
The other thing I love about tSQLt is that you can use it for what I think of as assumption tests, or data validation tests. I now spend a lot more time in focused conversation with business stakeholders to learn what they expect from the data. Do they expect records every day? Is there any day when they expect significantly more or fewer records than the previous day? What range of values would they expect for key fields? Then I look at reports and quiz them about how they read those reports. What kinds of things have they seen in the reports in the past that raised flags? When this report has ‘bad data’, how do they know?
With this kind of background I can write hundreds of data validation tests, each testing a different assumption. Automating execution of these tests after my ETL finishes is a great way to make sure that my work isn’t failing silently, or succeeding despite missing records, duplicate records, or bad values that might otherwise slip through unnoticed. Any day I can raise a flag over issues in the source data before the business community finds those problems is a good day.
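As a sketch of one such assumption test (the `dbo.FactOrders` table and the “records arrive every day” rule are hypothetical stand-ins for whatever your stakeholders told you), a stakeholder expectation becomes a test that fails loudly when the load looks wrong:

```sql
-- Hypothetical data validation test: "we expect orders every day."
-- Runs after the nightly ETL; tSQLt.Fail is the real framework call.
CREATE PROCEDURE ValidationTests.[test orders were loaded for yesterday]
AS
BEGIN
    DECLARE @RowCount int;

    SELECT @RowCount = COUNT(*)
    FROM   dbo.FactOrders
    WHERE  OrderDate = CAST(DATEADD(DAY, -1, SYSDATETIME()) AS date);

    IF @RowCount = 0
        EXEC tSQLt.Fail
             'No orders loaded for yesterday; check the source feed.';
END;
```

Write one of these for each assumption the business gave you (row counts, value ranges, duplicate checks), schedule the test run right after the ETL, and silent failures become loud ones.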
I’ll write more about the things that are important to me in future posts. What are some of the things that are important to YOU? Speak out in the comments.