Quantcast
Channel: Roelant Vos
Viewing all articles
Browse latest Browse all 56

NoETL – Data Vault Link Satellite tables (part 1)

$
0
0

The final of the series of planned posts (for now at least) about Data Warehouse Virtualisation is all about Link Satellites. As with some of the earlier posts there are various similarities to the earlier approaches – most notably the Satellite virtualisation and processing. Concepts such as zero records and ‘virtual’ or computed end-dating are all there again, as are the constructions of using subqueries to do attribute mapping and outer queries to calculate hash keys. As with the previous solutions the logic will use pre-calculated hash keys if they’re there (e.g. already generated going into Staging) but otherwise calculate the hash keys ‘on the fly’.

For the purpose of explaining the logic I have categorised ‘Link Satellites’ into normal, or historical, Link-Satellites and Driving Key based Link Satellites. The normal ones act exactly the same as the Satellite processes including the support of multi-active attributes and the way zero records are handled. The only different, obviously, is that Link Satellites provide context (track history) for relationships (Links) instead of business concepts (Hubs).

The Driving Key based Link Satellites are slightly different, as they focus on capturing history of the relationship itself. There are (typically) no attributes to be tracked here – it’s really about the relationship (as in the combination of Hub keys) itself.

An easy example is the hypothetical relationship between an employee and a company. This information is captured in a Link relationship table; an employee key and company key are created as a Link record. If the employee decides to work for another company this creates a new Link record that captures the existence of a relationship between the employee and the other (new) company. There are now two records in the Link table as over time there have been two relationships.

But here is the thing: if we agree that the move from the employee should end-date the relationship of that employee with the previous company we should create a record in the Link Satellite to reflect this. We are effectively defining what is the ‘one’ side (as in ‘one-to-many’) in an entity that is setup as a ‘many-to-many’ table. From a modelling point of view we are making the employee the ‘one’ side: an employee can only be associated with a single company at any point in time. In other words: employee is the ‘driving key’.

In the virtualisation / automation metadata I use user defined properties / labels on to identify which of the Hub keys in a given Link acts as the driving key. This is forward-engineered as an extended property that can be queried from the data dictionary.

Driving key based logic is more interesting than the regular Satellite / Link-Satellite logic, mainly because you need to cater for re-opening closed relationships over time. An interesting challenge to virtualise!

I won’t repeat myself talking about the metadata in detail (the examples can be downloaded here anyway). The virtualisation tool performs the usual SQL / view generation and deploys the views using the SQL Server SMO API. It’s probably better to go straight into the query logic itself. The only comment I want to make relates to the example metadata which uses LSAT_CUSTOMER_OFFER as the Driving Key example:

LSAT_generation

For normal / historical Link Satellites (the ones with attributes) the example and SQL are displayed below. The example used is LSAT_CUSTOMER_COSTING; this is a ‘multi-active Link-Satellite’ in the spirit in adding some complexity. The COSTING_EFFECTIVE_DATE is sourced from ‘Date_Effective’, just to demonstrate some options. Since this is the last post I have added the majority of exceptions in this SQL :-) .

LSAT_CUSTOMER_COSTING

Regular_LSAT_SQL

This query provides you the representation of the Link Satellite table.

Regular LSAT Output

 As before INSERT INTO statements can be generated from here, but if you have read some of the previous posts you will very likely believe this  by now. The next post will go into more detail about the Driving Key logic since this differs slightly from the SQL as displayed above.


Viewing all articles
Browse latest Browse all 56

Trending Articles