Spark Shell not able to run in cluster mode in 0.9.1 sandboxes

Description

The 0.9.1 sandboxes appear unable to run the Spark Shell in cluster mode, and I'm not sure why yet. The issue came up while I was testing user impersonation in the VirtualBox sandbox with my local modifications. After some time I decided to test the stock 0.9.1 CDH and HDP sandboxes. For CDH, I loaded a pre-existing AWS sandbox of mine that I believe had no modifications from the base sandbox, and I saw the error there as well. So I went to EC2 and created a fresh 0.9.1 HDP sandbox. After waiting some time for it to start up, I modified spark.properties to be:

I then went to the Visual Query page and waited until I saw a process appear:

I watched the Spark Shell logs proceed through the steps of launching a Spark job until it failed with:

Environment

None

Activity

Tim Harsch
September 11, 2018, 4:20 PM

On EC2 0.9.1 CDH the behavior was slightly different. The logs showed the state ACCEPTED, then RUNNING, then KILLED after some time:

Tim Harsch
September 11, 2018, 4:59 PM

I tried with a fresh CDH VirtualBox sandbox. The Spark Shell was able to launch, but it was stuck repeating this log line every 1 s:

After several minutes I went to the Visual Query page, chose the venues table, and proceeded to the next step. After 2 minutes or so I got the following error:

Tim Harsch
September 11, 2018, 7:56 PM

Attached is a large stack trace log, KYLO-2576_0.9.1_CDH_VirtualBox-err.txt, taken from one of the container logs while the Spark job was running in the 0.9.1 CDH VirtualBox environment. It came from the container log at
/var/log/hadoop-yarn/containers/application_1536684030384_000*
The pertinent part of the error is:

This error was also observed in an older version of Kylo; see KYLO-441.

Tim Harsch
September 11, 2018, 7:58 PM

YARN cluster mode is working in my EC2 0.9.1 HDP instance (earlier it produced the log in the description). I'm guessing the first attempt failed because the instance was overloaded.

Tim Harsch
September 11, 2018, 8:24 PM

The EC2 0.9.1 CDH environment still behaves differently, but it is also unable to run the Spark Shell in YARN cluster mode. The error messages in the kylo-services logs and in the container logs say little.

After visiting the Visual Query page, the shell launches and we see this repeatedly in the kylo-services log:

Periodically we see this in the container logs:

After some time I chose concerts.venues and clicked Continue, and we immediately saw:

A couple of the containers eventually got this error after the application was killed by YARN:

Assignee

Unassigned

Reporter

Tim Harsch

Labels

None

Reviewer

None

Story point estimate

None

Components

Affects versions

Priority

Medium