Match data sets being created against existing instances to avoid duplicates

Description

We need to match the properties of a data set that a client is attempting to create against existing sources/sets, so that we return the matching data set rather than creating a duplicate with the same properties. Properties such as the format and paths should provide enough detail to establish whether two submitted data sets point to the same underlying data.

Changes needed:

  • Convert the simple create() method on DataSetProvider into a builder so that enough details can be captured before determining whether a create is needed

  • Add a hash or key field (derived from data set properties) to the data set metadata so that existing data sets can be queried for a match against a newly submitted one

  • (Possibly) delegate derivation of the hash to the connector plugins, since a connector is the best place to encapsulate knowledge of how the properties of a particular type of data set uniquely identify underlying data in the system the connector represents
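
A minimal sketch of how the changes above could fit together, not the actual implementation. Only DataSetProvider and create() come from this ticket. The DataSet fields, the matchKey name, the in-memory map standing in for the metadata store, the use of SHA-256, and the rule that path order does not affect identity are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical data set record; field names are illustrative only.
final class DataSet {
    final String format;
    final List<String> paths;
    final String matchKey; // key derived from the identifying properties

    DataSet(String format, List<String> paths, String matchKey) {
        this.format = format;
        this.paths = paths;
        this.matchKey = matchKey;
    }
}

final class DataSetProvider {
    // Stand-in for the metadata store, indexed by the derived key (assumption).
    private final Map<String, DataSet> byKey = new HashMap<>();

    Builder builder() {
        return new Builder();
    }

    final class Builder {
        private String format;
        private final List<String> paths = new ArrayList<>();

        Builder format(String format) { this.format = format; return this; }
        Builder path(String path) { this.paths.add(path); return this; }

        // Returns the existing data set when the key matches; creates one otherwise.
        DataSet create() {
            if (format == null) {
                throw new IllegalStateException("format is required");
            }
            List<String> sorted = new ArrayList<>(paths);
            Collections.sort(sorted); // assumption: path order does not affect identity
            String key = deriveKey(format, sorted);
            return byKey.computeIfAbsent(key,
                    k -> new DataSet(format, Collections.unmodifiableList(sorted), k));
        }
    }

    // Could be delegated to a connector plugin; kept inline here for brevity.
    static String deriveKey(String format, List<String> sortedPaths) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(format.getBytes(StandardCharsets.UTF_8));
            for (String path : sortedPaths) {
                digest.update((byte) 0); // separator so ("a", "bc") differs from ("ab", "c")
                digest.update(path.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, two builders given the same format and the same paths (in any order) would resolve to the same DataSet instance, while a different format or path set yields a new one. If key derivation moves into the connector plugins, deriveKey would become a method on a connector interface and the builder would only pass the captured properties through.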

Status

Assignee

Sean Felten

Reporter

Sean Felten

Labels

None

Reviewer

None

Epic Link

Components

Sprint

None

Fix versions

Priority

Highest