Architecture for joining datasets and handling ingest

Description

Design how our framework handles joins across multiple sources (i.e. two or more files/datasets with potentially different watermarks).

  • Support a watermark for each source and, at the join condition, prompt the user with options for how to proceed.

  • How do we handle the use case where only some of the datasets have updates? Do we roll back the watermark?

  • Do we support just a single "primary" source that drives processing? That source would be the trigger and would dictate when the data is processed.

  • Can we create a framework that also includes the Precondition event model and handles joining different datasets, with watermarks and preconditions for triggering and running feeds?
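
To make the questions above concrete, here is a minimal Python sketch of one possible shape for the design. It is only an illustration, not the framework's existing API: `SourceState`, `JoinCoordinator`, and all method names are hypothetical, watermarks are assumed to be monotonically increasing integers, and the "primary source" trigger rule is just one of the options raised above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceState:
    """Watermark bookkeeping for one input of a join (hypothetical model)."""
    committed: int = 0             # last watermark whose data was fully processed
    pending: Optional[int] = None  # watermark seen on ingest but not yet committed


@dataclass
class JoinCoordinator:
    """Tracks one watermark per source and decides when a join may run."""
    primary: str  # assumed: a single source drives the trigger
    sources: Dict[str, SourceState] = field(default_factory=dict)

    def register(self, name: str) -> None:
        self.sources[name] = SourceState()

    def observe(self, name: str, watermark: int) -> None:
        """Record a newer watermark for a source; stale values are ignored."""
        state = self.sources[name]
        current = state.pending if state.pending is not None else state.committed
        if watermark > current:
            state.pending = watermark

    def updated_sources(self) -> List[str]:
        """Names of sources that have data newer than their committed watermark."""
        return sorted(n for n, s in self.sources.items() if s.pending is not None)

    def should_trigger(self) -> bool:
        """Precondition: run the join only when the primary source has new data."""
        return self.sources[self.primary].pending is not None

    def commit(self) -> None:
        """Join succeeded: advance every source that had pending data."""
        for state in self.sources.values():
            if state.pending is not None:
                state.committed = state.pending
                state.pending = None

    def rollback(self) -> None:
        """Join failed: discard pending watermarks so the data is re-read next run."""
        for state in self.sources.values():
            state.pending = None


coordinator = JoinCoordinator(primary="orders")
coordinator.register("orders")
coordinator.register("customers")

coordinator.observe("customers", 5)   # only the secondary source changed
print(coordinator.should_trigger())   # False: the primary has nothing new
coordinator.observe("orders", 3)
print(coordinator.should_trigger())   # True: the primary now has new data
coordinator.commit()
```

Keeping a separate `pending` value per source means a failed join can be rolled back without touching the committed watermarks, and `updated_sources()` gives the list of partially updated inputs that a user prompt at the join condition could present.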

Assignee

Unassigned

Reporter

Scott Reisdorf

Labels

None

Reviewer

None

Priority

Medium