Kylo v0.7.1 validate jar causes Spark stage failure on Spark 2+
The issue was due to an API change between Spark 1 and 2. Fixed the issue and added a related unit test for each version.
NOTE: the jar cannot be attached to the JIRA issue due to a 10MB file size limit in the system. Follow the instructions below to work around the issue.
My steps to work around the problem were roughly as follows:
% rm /opt/nifi/current/lib/app/kylo-spark-validate-cleanse-jar-with-dependencies.jar
% cd /tmp
% wget http://bit.ly/2l5p1tK (the 0.7.0 RPM)
% rpm2cpio 2l5p1tK | cpio -idmv
% scp opt/kylo/setup/nifi/kylo-spark-validate-cleanse-spark-v2-0.7.0-jar-with-dependencies.jar /opt/nifi/current/lib/app
% ln -s kylo-spark-validate-cleanse-spark-v2-0.7.0-jar-with-dependencies.jar /opt/nifi/current/lib/app/kylo-spark-validate-cleanse-jar-with-dependencies.jar
Make sure the link points to a valid jar. There is no need to restart the processor; the next spark-submit from Validate and Split Records will pick up the new jar.
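A quick way to sanity-check that the symlink resolves to a real jar before the next spark-submit. This is a minimal sketch using a throwaway temp directory so it can be run anywhere; in the real setup the link and jar live in /opt/nifi/current/lib/app.

```shell
# Minimal sketch: verify a symlink resolves to an existing file.
# Temp dir stands in for /opt/nifi/current/lib/app.
dir=$(mktemp -d)
touch "$dir/kylo-spark-validate-cleanse-spark-v2-0.7.0-jar-with-dependencies.jar"
ln -s kylo-spark-validate-cleanse-spark-v2-0.7.0-jar-with-dependencies.jar \
      "$dir/kylo-spark-validate-cleanse-jar-with-dependencies.jar"

link="$dir/kylo-spark-validate-cleanse-jar-with-dependencies.jar"
# test -e follows the link and fails if it is dangling.
if [ -e "$link" ]; then
    echo "link OK -> $(readlink "$link")"
else
    echo "DANGLING LINK: $link" >&2
fi
```

The same `[ -e "$link" ]` check against the real path catches the common failure mode where the symlink was created before the jar was copied in.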
Tested with a userdata.csv file: validation works with kylo-spark-validate-cleanse-spark-v2-0.7.0-jar-with-dependencies.jar, but fails with the 0.7.1 jar.
Test file schema: id - int, all remaining columns - string.
Things I tried to remedy the situation:
changed spark.version and scala.version to exactly match the EMR cluster I was testing on (Spark 2.1.0, Scala 2.11.8) - no luck
dropped in kylo-spark-validate-cleanse-spark-v2-0.7.0-jar-with-dependencies.jar from version 0.7.0, which made the error go away
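The spark.version / scala.version change in the first bullet amounts to editing the Maven properties before rebuilding. A minimal sketch, using a throwaway pom.xml to stand in for the real Kylo pom (the property names come from the bullet above; the starting values are made up for illustration):

```shell
# Sketch of the version bump tried above: rewrite the Maven properties
# to match the EMR cluster (Spark 2.1.0, Scala 2.11.8).
# The pom.xml written here is a stand-in, not the real Kylo pom.
cat > pom.xml <<'EOF'
<properties>
  <spark.version>2.0.0</spark.version>
  <scala.version>2.11.7</scala.version>
</properties>
EOF

sed -i -e 's|<spark.version>[^<]*</spark.version>|<spark.version>2.1.0</spark.version>|' \
       -e 's|<scala.version>[^<]*</scala.version>|<scala.version>2.11.8</scala.version>|' \
       pom.xml

grep -E 'spark.version|scala.version' pom.xml
```

After the edit the module would be rebuilt as usual (e.g. `mvn -DskipTests package`) to produce the jar-with-dependencies.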