ArrayIndexOutOfBoundsException when excluding attributes

Description

I have an Ingest feed that takes a CSV with 17 fields as input.
I excluded 3 of them (#3, #15 and #16) using the feed wizard UI.
When I import a file, I get the following ArrayIndexOutOfBoundsException in nifi-app.log:

2017-06-19 16:14:14,130 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=2b1f1ef0-4e06-1bd6-4fb2-c6132e1a8dde] 17/06/19 16:14:14 INFO DAGScheduler: ResultStage 1 (saveAsTable at DataSet16.java:89) failed in 0.138 s
2017-06-19 16:14:14,130 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=2b1f1ef0-4e06-1bd6-4fb2-c6132e1a8dde] 17/06/19 16:14:14 INFO DAGScheduler: Job 1 failed: saveAsTable at DataSet16.java:89, took 0.189666 s
2017-06-19 16:14:14,134 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=2b1f1ef0-4e06-1bd6-4fb2-c6132e1a8dde] 17/06/19 16:14:14 ERROR Validator: Failed to perform validation
2017-06-19 16:14:14,134 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=2b1f1ef0-4e06-1bd6-4fb2-c6132e1a8dde] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.ArrayIndexOutOfBoundsException: 16
2017-06-19 16:14:14,134 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=2b1f1ef0-4e06-1bd6-4fb2-c6132e1a8dde] at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.genericGet(rows.scala:227)
2017-06-19 16:14:14,134 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=2b1f1ef0-4e06-1bd6-4fb2-c6132e1a8dde] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getAs(rows.scala:35)
2017-06-19 16:14:14,134 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=2b1f1ef0-4e06-1bd6-4fb2-c6132e1a8dde] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.isNullAt(rows.scala:36)
2017-06-19 16:14:14,134 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=2b1f1ef0-4e06-1bd6-4fb2-c6132e1a8dde] at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.isNullAt(rows.scala:221)
2017-06-19 16:14:14,134 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=2b1f1ef0-4e06-1bd6-4fb2-c6132e1a8dde] at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:107)
2017-06-19 16:14:14,134 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=2b1f1ef0-4e06-1bd6-4fb2-c6132e1a8dde] at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:104)

I created another feed where those 3 fields are not excluded, and that time the import worked.

The issue looks related to https://kylo-io.atlassian.net/browse/KYLO-493.
However, that ticket is marked as closed in 0.8.1, which is the version I am running in the sandbox.
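
For reference, here is a minimal, Kylo-independent Java sketch (an assumption about the cause, not the actual Kylo/Spark code) of the kind of index mismatch the stack trace suggests: a row built from the 14 remaining fields being read with a position taken from the original 17-field layout.

public class ExcludedFieldIndexSketch {
    public static void main(String[] args) {
        // The CSV has 17 fields; fields #3, #15 and #16 are excluded,
        // so the materialized row keeps only 14 values (indices 0..13).
        Object[] row = new Object[14];

        // Reading the row with an index from the original 17-field layout
        // (the last field sits at position 16) reproduces the same failure,
        // java.lang.ArrayIndexOutOfBoundsException: 16, matching the
        // isNullAt(16) / genericGet(16) frames in the log above.
        boolean isNull = (row[16] == null);
        System.out.println(isNull);
    }
}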

Environment

Anonymized and modified CSV (the header was also modified manually)

EMAIL;IP;NS;DATE1;GENDER;FNAME;LNAME;DATE2;ADDRESSE_L1;ADDRESSE_L2;ZIP;CITY;COUNTRY;PHONE;ATTR1;ATTR2;ID
user.20131025.161309@wrage.com;15.255.155.255;0e58162530b4fdf849c2e5689e9dc8a6;2015-10-25 15:15:25;3;fake;test;1955-05-25;55 Versailles Street;;75020;paris;;;ee25e7b935f074c656bb50ba75f9a5f235;73d575ee531784f57315640df5b65cb4;XYZ

Assignee

Scott Reisdorf

Reporter

Mathieu Marie

Labels

Reviewer

None

Story point estimate

None

Components

Sprint

None

Fix versions

Affects versions

Priority

Medium