The 0.9.1 sandboxes seem unable to run Spark Shell in cluster mode. I'm not sure why yet. The issue came up while I was testing user impersonation in the VirtualBox sandbox with my local mods. After some time I decided to test the 0.9.1 CDH and HDP sandboxes. For CDH I loaded a pre-existing AWS sandbox of mine that I believe had no modifications from the base sandbox, and I saw the error there. So I went to EC2 and created a fresh 0.9.1 HDP sandbox. After waiting some time for startup, I modified spark.properties to be:
I then went to the Visual Query page and waited until I saw a process appear:
I watched the Spark Shell logs proceed through the steps of launching a Spark job until it failed with:
On EC2 0.9.1 CDH the behavior was slightly different. The logs showed the state ACCEPTED, then RUNNING, then KILLED after some time:
Tried with a fresh CDH VirtualBox. Spark Shell was able to launch, but got stuck repeating this log message every second:
After several minutes I went to the Visual Query page, chose the venues table and proceeded to the next step. After 2 minutes or so I got the following error:
Attached is a large stack trace log, KYLO-2576_0.9.1_CDH_VirtualBox-err.txt, taken from one of the container logs while the Spark job was running in the 0.9.1 CDH VirtualBox environment. It came from the container log at
The pertinent part of the error is:
This error was also observed in an older version of Kylo. See KYLO-441.
YARN cluster mode is now working in my EC2 0.9.1 HDP instance (earlier it produced the log in the description). I'm guessing it failed the first time because the instance was overloaded.
The EC2 0.9.1 CDH environment still behaves differently and is still unable to run Spark Shell in YARN cluster mode. The error messages in the kylo-services logs and in the container logs don't say a lot.
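Since the logs say so little, one option might be to ask the YARN ResourceManager directly for the application's state and diagnostics. Below is a minimal sketch, assuming the ResourceManager REST API is reachable (commonly on port 8088, but that depends on the cluster) and that the application ID is taken from the kylo-services log. The helper names `summarize_app` and `fetch_app` are made up for illustration and are not part of Kylo.

```python
import json
import sys
from urllib.request import urlopen


def summarize_app(payload: dict) -> str:
    """Condense a ResourceManager application document into one line."""
    app = payload.get("app", {})
    state = app.get("state", "UNKNOWN")
    final_status = app.get("finalStatus", "UNDEFINED")
    # The diagnostics field may be missing, null, or blank.
    diagnostics = (app.get("diagnostics") or "").strip() or "(none)"
    return (
        f"{app.get('id', '?')}: state={state} "
        f"finalStatus={final_status} diagnostics={diagnostics}"
    )


def fetch_app(rm_url: str, app_id: str) -> dict:
    """Fetch one application from the YARN ResourceManager REST API.

    rm_url is an assumption about the environment, e.g. http://<rm-host>:8088
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/apps/{app_id}", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Usage: python yarn_app_state.py <rm_url> <application_id>
    rm_url, app_id = sys.argv[1], sys.argv[2]
    print(summarize_app(fetch_app(rm_url, app_id)))
```

Running this every few seconds while the shell is starting should show the same ACCEPTED, RUNNING, KILLED progression, plus whatever diagnostics text YARN recorded for the kill, if any.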
After visiting the Visual Query page the shell launches and we see this repeatedly in the kylo-services log:
Periodically we see this in the container logs:
After some time I chose concerts.venues and clicked continue, and we immediately saw:
A couple of the containers eventually got this error after the application was killed by YARN: