NoETL – Data Vault Satellite tables
The recent presentations provided a push to wrap up the development and release of the Data Vault virtualisation initiative, so now that everything is working properly the next few posts should be...
Quick and easy referential integrity validation (for dynamic testing)
This post is in a way related to the recent post about generating some test data. In a similar way I was looking for ways to make life a bit easier when it comes to validating the outputs of Data Vault...
NoETL – Data Vault Link tables
Virtualising Data Vault Link structures follows a similar process to that of the virtual Hubs, with some small additions such as the support for (optional) degenerate attributes. To make things a bit...
NoETL and ETL automation metadata overview
One of the last items to write about regarding Data Warehouse virtualisation (and any other form of ETL generation) is the handling of the metadata itself. In a previous post I covered what metadata...
NoETL – Data Vault Link Satellite tables (part 1)
The last in the series of planned posts (for now at least) about Data Warehouse Virtualisation is all about Link Satellites. As with some of the earlier posts there are various similarities to the...
NoETL – Data Vault Link Satellite tables (part 2)
This is the second part of the Link Satellite virtualisation overview (the first post on this topic is here), and it dives deeper into the logic behind Driving Key based Link Satellites. Driving Key...
Data Warehouse versioning… for virtualisation
Recent discussions around Data Warehouse virtualisation made me realise I forgot to post one of the important requirements: version control. In the various recent presentations this was discussed at...
Loading too fast for unique date/time stamps – what to do?
Let’s start by clarifying that this concerns the RDBMS world, not the Hadoop world. It’s a good problem to have – loading data too quickly. So quickly that, even at high precision, multiple changes for...
Virtual Enterprise Data Warehouse ideas & updates (towards 1.2)
Lately I have had a bit more head space to work on some ideas I find interesting, and these are now intended to culminate in ‘version 1.2’ of the Virtual EDW tool I have been developing. I’ve been...
The DWH Time Machine: synchronising model and automation metadata versions
I’ve completed a fairly large body of work that I’ve been meaning to do for a long time: how to automatically version the Data Warehouse data model in sync with the version of the ETL automation...
Foreign Keys in the Staging Layer – joining or not?
Warning – this is another post in the ‘options and considerations’ context, meaning that some people will probably disagree with this based on their personal convictions or ideas! One or two...
Best practices on developing Data Vault in SQL Server (including SSIS)
Sharing is caring, so today’s post covers some technical details for the Microsoft world: implementing Data Vault models on the SQL Server database and corresponding ETL using SSIS and technologies...
Data Vault ETL Implementation using SSIS: Step 7 – Link Satellite ETL – part...
I’m catching up on old drafts within WordPress, and in the spirit of being complete on the older SSIS series felt I should pick this one up and complete it. While most of my focus is on developing the...
Unknown keys (zero keys or ghost keys) in Hubs for DV2.0
I am still working towards capturing the generation (using BIML in SSIS) and virtualisation (using views / SQL) of the Presentation Layer (in a Dimensional Model). But before we get there, some topics...
Why you really want a Persistent Staging Area in your Data Vault architecture
Recently at the Worldwide Data Vault Conference in Vermont USA (WWDVC) I had many conversations about the Persistent Staging Area (PSA) concept, also known as Historical Staging Area. I have been using...
Advanced row condensing for Satellites
When it comes to record condensing, DISTINCT just doesn’t cut it. I’ve been meaning to post about this for ages as the earliest templates (as also posted on this site) were not flexible enough to work...
Tech tip: making SSIS Project Connections generate correctly using BIML Express
A bit more of a technical view on things today. In order to stay up to date with the latest when it comes to generating ETL for the Microsoft stack (SSIS), I recently upgraded from Visual Studio 2013...
Creating Data Vault Point-In-Time and Dimension tables: merging historical...
Beyond creating Hubs, Links and Satellites and current-state (Type 1) views off the Data Vault, one of the most common requirements is the ability to represent a complete history of changes for a...
Some insights about … Insights
Can I get some insights, please? Over the years, I have come to somewhat dislike the term ‘insights’ almost to the same level as, say, a ‘Data Lake’. And that’s saying something. Not because these...
When is a change a ‘change’?
This is a post that touches on what I think is one of the essential best practices for ETL design: the ability to process multiple changes for the same key in a single pass. This is specifically relevant...