We need to match the properties of a data set that a client is attempting to create against existing sources/sets, so that we return the matching data set rather than creating a duplicate with the same properties. Properties such as the format and paths should provide enough detail to establish whether two submitted data sets point to the same underlying data.
Convert the simple create() method of DataSetProvider into a builder so that enough details can be captured before determining whether a create is needed
Add a hash or key field (derived from data set properties) to the data set metadata so that existing data sets can be queried for a match against a newly submitted one
(Possibly) delegate derivation of the hash to the connector plugins, since a connector is the best place to encapsulate knowledge of how the properties of a particular type of data set uniquely identify the underlying data in the system that connector represents
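
The tasks above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual DataSetProvider API: the class name DataSetKeySketch, the Builder with its property() and key() methods, deriveKey(), findOrCreate(), and the in-memory map standing in for the metadata store are all hypothetical, and SHA-256 is just one plausible choice of hash. In a real implementation deriveKey() might be supplied by the connector plugin instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class DataSetKeySketch {

    /** Hypothetical builder: collects properties before deciding whether a create is needed. */
    static final class Builder {
        private final Map<String, String> props = new TreeMap<>();

        Builder property(String name, String value) {
            props.put(name, value);
            return this;
        }

        /** Key derived from everything captured so far. */
        String key() {
            return deriveKey(props);
        }
    }

    /** Hypothetical stand-in for the metadata store: derived key -> data set id. */
    static final Map<String, String> idsByKey = new HashMap<>();

    /** Derives a stable hex key; properties are sorted so insertion order does not matter. */
    static String deriveKey(Map<String, String> properties) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> e : new TreeMap<>(properties).entrySet()) {
                update(digest, e.getKey());
                update(digest, e.getValue());
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(ex);
        }
    }

    /** Length-prefixes each field so that ("ab","c") and ("a","bc") hash differently. */
    private static void update(MessageDigest digest, String field) {
        byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
        digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
        digest.update(bytes);
    }

    /** Returns the id of a matching data set if one exists, otherwise registers a new one. */
    static String findOrCreate(Builder builder) {
        String key = builder.key();
        String existing = idsByKey.get(key);
        if (existing != null) {
            return existing;
        }
        String created = "dataset-" + (idsByKey.size() + 1); // placeholder id scheme
        idsByKey.put(key, created);
        return created;
    }

    public static void main(String[] args) {
        Builder first = new Builder().property("format", "csv").property("path", "/data/x");
        Builder second = new Builder().property("path", "/data/x").property("format", "csv");
        // Both submissions describe the same data, so the second call reuses the first id.
        System.out.println(findOrCreate(first).equals(findOrCreate(second)));
    }
}
```

Sorting the properties and length-prefixing each field keeps the key independent of the order in which a client supplies properties. Any normalisation of values (for example canonicalising paths) is left out here and would probably belong in the connector.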