Rows duplicated within the same file are dropped in "Dedupe and merge" mode

Description

As reported by Fabian, if there is a file with an internal duplicate:

key,name,userid
0,paolo,1
1,vanessa,2
1,vanessa,2
2,carlo,3
3,kim,4

with the "Dedupe and merge" merge strategy, the row for "vanessa" is missing from Hive.
This happens because the function "generateMergeNonPartitionQueryWithDedupe" in TableMergeSyncSupport.java applies the condition "having count(processing_dttm) = 1".
This check prevents the row from being inserted, because for "vanessa" the count is 2.
The bug can be fixed by adding a group by to the second part of the query, *union all select " + selectSQL.
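
The effect can be reproduced outside Hive with a minimal in-memory sketch. It is not the code from TableMergeSyncSupport.java: the class DedupeMergeSketch and its methods are hypothetical, and the real query is only approximated, on the assumption that it unions the incoming rows with the target table and keeps the incoming rows whose group count is exactly 1. The sketch shows that the duplicated row is dropped, and that collapsing duplicates in the incoming data first (which is what an added group by would do) restores it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and logic are assumptions,
// not the actual implementation in TableMergeSyncSupport.java.
public class DedupeMergeSketch {

    // Approximates "incoming union all existing ... group by <all columns>
    // having count(...) = 1": an incoming row is inserted only if it occurs
    // exactly once across the incoming file and the existing table.
    static List<String> keepIncomingRowsSeenOnce(List<String> incoming, List<String> existing) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String row : incoming) {
            counts.merge(row, 1, Integer::sum);
        }
        for (String row : existing) {
            counts.merge(row, 1, Integer::sum);
        }
        List<String> toInsert = new ArrayList<>();
        for (String row : new LinkedHashSet<>(incoming)) {
            if (counts.get(row) == 1) {
                toInsert.add(row);
            }
        }
        return toInsert;
    }

    // Stand-in for the proposed fix: collapse duplicates inside the
    // incoming data before the count check is applied.
    static List<String> distinct(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        List<String> file = List.of(
                "0,paolo,1", "1,vanessa,2", "1,vanessa,2", "2,carlo,3", "3,kim,4");
        List<String> emptyTable = List.of();

        // Without the fix the "vanessa" row has count 2 and is dropped.
        System.out.println(keepIncomingRowsSeenOnce(file, emptyTable));

        // With duplicates collapsed first, all four distinct rows are kept.
        System.out.println(keepIncomingRowsSeenOnce(distinct(file), emptyTable));
    }
}
```

In this sketch, rows already present in the target table are still skipped after the fix, because their count across both inputs remains 2, so the deduplication against existing data is preserved.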

Environment

None

Assignee

Scott Reisdorf

Reporter

Davide Gazze

Labels

Reviewer

None

Story point estimate

None

Time tracking

8h

Components

Sprint

None

Fix versions

Affects versions

Priority

Medium