Kylo v0.7.1 validate jar causes spark stage failure in Spark2+

Activity

Jagrut Sharma
April 7, 2017, 11:59 PM

The issue was due to an API change between Spark 1 and 2. Fixed the issue and added a related unit test for each version.

Tim Harsch
March 28, 2017, 2:41 PM

NOTE: the jar cannot be attached to this JIRA issue because of the system's 10 MB file size limit. Follow the workaround steps in my March 21, 2017 comment instead.

Tim Harsch
March 21, 2017, 12:12 AM
Edited

My steps to alleviate the problem were roughly of the form:
% rm /opt/nifi/current/lib/app/kylo-spark-validate-cleanse-jar-with-dependencies.jar
% cd /tmp
% wget http://bit.ly/2l5p1tK   # the 0.7.0 RPM
% rpm2cpio 2l5p1tK | cpio -idmv
% cp opt/kylo/setup/nifi/kylo-spark-validate-cleanse-spark-v2-0.7.0-jar-with-dependencies.jar /opt/nifi/current/lib/app
% ln -s kylo-spark-validate-cleanse-spark-v2-0.7.0-jar-with-dependencies.jar /opt/nifi/current/lib/app/kylo-spark-validate-cleanse-jar-with-dependencies.jar

Make sure the link points to a valid jar. There is no need to restart your processor; the next spark-submit from Validate and Split Records will use the new jar.
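
One way to check the link is sketched below. This is only an illustration, not part of Kylo or NiFi: the function name check_jar_link is hypothetical, and it assumes a Linux host where `readlink -f` and `head -c` are available. It confirms that the path resolves to an existing file and that the file starts with the "PK" signature that jar (zip) archives carry.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Kylo or NiFi).
# Returns 0 when the given path resolves to an existing file whose
# first two bytes are "PK", the zip/jar signature; returns 1 otherwise.
check_jar_link() {
    link="$1"
    # -e follows symlinks, so a dangling link fails this test.
    if [ ! -e "$link" ]; then
        echo "broken or missing: $link" >&2
        return 1
    fi
    # Resolve the link to the real file it points at.
    target=$(readlink -f "$link")
    # Read the first two bytes and compare with the zip signature.
    magic=$(head -c 2 "$target")
    if [ "$magic" != "PK" ]; then
        echo "not a jar/zip archive: $target" >&2
        return 1
    fi
    echo "ok: $link -> $target"
}
```

After defining the function, running `check_jar_link /opt/nifi/current/lib/app/kylo-spark-validate-cleanse-jar-with-dependencies.jar` should print an "ok" line if the symlink from the steps above was created correctly. The signature check is only a quick sanity test; it does not prove the jar contents are intact.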

Scott Reisdorf
March 21, 2017, 12:00 AM
Edited

Tested with the userdata.csv file; it works with kylo-spark-validate-cleanse-spark-v2-0.7.0-jar-with-dependencies.jar, but doesn't with the 0.7.1 jar.

registration_dttm - timestamp
id - int
rest of the columns - string

Tim Harsch
March 20, 2017, 11:55 PM

Things I tried to remedy the situation:

  • changed spark.version and scala.version to exactly match the EMR cluster I was testing on, Spark 2.1.0 and Scala 2.11.8 (no luck)

  • dropped in the kylo-spark-validate-cleanse-spark-v2-0.7.0-jar-with-dependencies.jar from version 0.7.0 (which made the error go away)

Done

Assignee

Jagrut Sharma

Reporter

Tim Harsch

Priority

High