By default, the transformation template splits any column value on commas after the transformation phase

Description

The transformation template splits every column value on commas after the transformation phase, which shifts values into the wrong columns. This happens after the Spark job that applies the transformations (if any) has run.

Expected: row values are preserved unchanged, even if they contain commas.

The visual query does not split the contents while applying transformations, so the fault could be in the Merge table phase.

How to reproduce:

  • Ingest a semicolon-separated CSV in which one or more columns contain values with commas. Target format: parquet

  • Transform the ingested table. No actual transformation needs to be applied. Target format: parquet or orc (other formats were not tested)

  • Check the data in the target table: the column that contained comma-separated values has been split, and its values now appear in other columns
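
For the first step, a small input file can be generated with a script along these lines. This is only a sketch: the column names and values are made up for illustration and are not taken from the original data set. It writes semicolon-separated rows where one column holds commas, and confirms that every row has three fields when parsed with the semicolon delimiter.

```python
import csv
import io

# Hypothetical sample data: the "tags" column holds values that contain commas.
ROWS = [
    ["id", "tags", "city"],
    ["1", "red,green,blue", "Cluj"],
    ["2", "single", "Bucharest"],
    ["3", "a,b", "Iasi"],
]


def build_sample_csv(rows, delimiter=";"):
    """Return the rows as semicolon-separated CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def column_counts(text, delimiter=";"):
    """Parse the text with the given delimiter and return the field count per row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]


sample = build_sample_csv(ROWS)
print(sample)
print(column_counts(sample))
```

If the target table ends up with more populated columns per row than the counts printed here, the values were split somewhere after ingestion.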

The impact is that rows with valid values containing commas, as well as transformations that concatenate values using commas, are not processed correctly.
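
The observed column shift is consistent with a row being serialized with commas and then split on commas again without quoting or escaping. The sketch below only illustrates that mechanism; it is an assumption about the cause, not the confirmed behaviour of the template, and `naive_round_trip` is a hypothetical helper.

```python
def naive_round_trip(row):
    """Join fields with commas, then split on commas again (no quoting or escaping)."""
    serialized = ",".join(row)
    return serialized.split(",")


# Hypothetical row: the second field contains commas.
original = ["1", "red,green,blue", "Cluj"]
shifted = naive_round_trip(original)

print(len(original), len(shifted))
print(shifted)
```

Here a three-field row comes back with five fields, so every value after the first comma lands in a later column, which matches the symptom described in this report.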

Environment

CDH 5.10
NiFi 1.3

Assignee

Unassigned

Reporter

Claudiu Stanciu

Labels

None

Reviewer

None

Story point estimate

None

Affects versions

Priority

Medium