Support SCD Type 2 Merges

Description

An extremely common data warehousing pattern is ingesting data into a dimension table and performing 'Slowly Changing Dimension Type 2' (SCD Type 2) updates.
You will encounter this need when, for example, Customer, Product, or Location (store) data is ingested into a Big Data data warehouse.
It also arises when claim data updates are ingested and a current view of each claim is required.

This operation is relatively easy to perform on an ACID-based store, but the logic is complex for Hive because its backing distributed file system is append-only.
This story will extend the Table Merge processor with the logic needed to easily merge updates into an existing dimension table stored in Hive on HDFS (or S3 or ADLS).

The requested extension will need to know which fields make up the primary key, which strategy (versioning or end dating) is in use, and which fields hold the corresponding version or end date.
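
To make the requested inputs concrete, here is a minimal sketch of the merge logic in plain Python, operating on lists of dicts rather than Hive tables. It is an illustration only, not the Table Merge processor's actual implementation. The function name `scd2_merge`, its parameters, and the default field names (`start_date`, `end_date`, `version`) are all hypothetical. It assumes that an open row is marked by an end date of `None`, that version numbers start at 1, and that the result is written out as a full replacement of the table, since rows cannot be updated in place on an append-only store.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def scd2_merge(existing: List[Row], updates: List[Row],
               key_fields: Sequence[str], strategy: str, load_date: str,
               start_field: str = "start_date", end_field: str = "end_date",
               version_field: str = "version") -> List[Row]:
    """Return a new table snapshot with SCD Type 2 history preserved.

    strategy is "end_date" or "version" (hypothetical names for the two
    strategies mentioned in this story).
    """
    if strategy not in ("end_date", "version"):
        raise ValueError("strategy must be 'end_date' or 'version'")

    # Fields that track history and are not part of the business attributes.
    tracking = {start_field, end_field} if strategy == "end_date" else {version_field}

    def key(row: Row) -> tuple:
        return tuple(row[f] for f in key_fields)

    def attrs(row: Row) -> Row:
        return {k: v for k, v in row.items() if k not in tracking}

    # Copy rows so the caller's data is never mutated.
    merged = [dict(r) for r in existing]

    # Index the current row for each primary key.
    current: Dict[tuple, Row] = {}
    for row in merged:
        k = key(row)
        if strategy == "end_date":
            if row[end_field] is None:  # assumption: None marks the open row
                current[k] = row
        elif k not in current or row[version_field] > current[k][version_field]:
            current[k] = row

    for upd in updates:
        k = key(upd)
        cur = current.get(k)
        if cur is not None and attrs(cur) == attrs(upd):
            continue  # nothing changed, so keep the current row as-is
        new_row = attrs(upd)
        if strategy == "end_date":
            if cur is not None:
                cur[end_field] = load_date  # close the superseded row
            new_row[start_field] = load_date
            new_row[end_field] = None
        else:
            new_row[version_field] = cur[version_field] + 1 if cur else 1
        merged.append(new_row)
        current[k] = new_row

    return merged
```

For example, with `key_fields=["customer_id"]` and the end dating strategy, an update that changes a customer's city closes the existing row by setting its end date to the load date and appends a new open row; an update for an unseen customer simply appends an open row. In Hive the same result would typically be produced by a join between the existing table and the incoming feed, followed by an overwrite of the target table or of the affected partitions.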

Status

Assignee

Unassigned

Reporter

Douglas Moore

Labels

Reviewer

None

Components

Priority

Medium