This post touches on what I think is one of the essential best practices for ETL design: the ability to process multiple changes for the same key in a single pass. This is specifically relevant for typical ETL processes that load data into a time-variant target (PSA, Satellite, Dimension etc.). For non-time-variant targets (Hubs, Links etc.) the process is a bit easier, as this is essentially built into the patterns already :-). In a given process, there are usually (at least) two rules I maintain:
- Making sure there is a safety catch to prevent loading information multiple times (by accident, out of order etc.)
- Making sure the correct delta is selected to be merged with the target
The paper I’ve written here (click the link below to open) captures the essence of the second rule: how the correct delta is selected.
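To make the two rules a bit more tangible, here is a minimal sketch in Python of what handling multiple changes per key in one pass could look like. It is not the approach from the paper, just an illustration under simplifying assumptions: the names `Change`, `select_delta`, `load_ts`, `attrs` and `target_latest` are all hypothetical, timestamps are plain integers, and the target is represented only by its most recent row per key.

```python
from typing import NamedTuple


class Change(NamedTuple):
    key: str        # business key (hypothetical name)
    load_ts: int    # load timestamp, simplified to an integer for this sketch
    attrs: tuple    # the attribute values being tracked over time


def select_delta(staged, target_latest):
    """Return the staged rows that are real changes, ordered by key and time.

    staged: iterable of Change, possibly several per key, in any order.
    target_latest: dict of key -> most recent Change already in the target.
    """
    delta = []
    previous = {}  # key -> last row accepted for that key during this run
    for row in sorted(staged, key=lambda r: (r.key, r.load_ts)):
        # Compare against the last accepted row, else the latest target row.
        prior = previous.get(row.key, target_latest.get(row.key))
        # Rule 1 (safety catch): skip rows at or before what is already loaded.
        if prior is not None and row.load_ts <= prior.load_ts:
            continue
        # Rule 2 (delta selection): keep a row only if it differs from the
        # row that precedes it in time for the same key.
        if prior is None or row.attrs != prior.attrs:
            delta.append(row)
            previous[row.key] = row
    return delta
```

For example, if the target already holds key `A` at timestamp 1 with value `red`, and the staging set contains `A` at timestamps 1 (`red`), 2 (`red`), 3 (`blue`) and 4 (`red`), only the rows at timestamps 3 and 4 are selected: the first is a reload, the second is not a change, and the last two each differ from their predecessor.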
Also, if you happen to be in Melbourne in March and are interested in hearing more on these topics in a classroom setting, please have a look at the Data Vault implementation course as well.