The tutorial should make clear that it can only be run on the HDP sandbox. It is not compatible with the CDH sandbox, which ships Kafka 0.8 and Spark 1.6, and there is no way to upgrade that sandbox to Kafka 0.10 and Spark 2 without parcels (which cannot be used in the sandbox).
Further, the Twitter Sentiment analysis utility linked to on GitHub is not compatible with Spark 2, even though all the other scripts and templates that ship with the tutorial are meant for Spark 2. Luckily, I found a fork that does work: https://github.com/jatin7/twitter-sentiment-analysis
Building this Twitter Sentiment code was trickier than it needed to be because nothing explained which parts were relevant. For the record, all that was needed for the tutorial to work was:
git clone https://github.com/jatin7/twitter-sentiment-analysis.git
curl -L -O https://dl.bintray.com/sbt/native-packages/sbt/0.13.9/sbt-0.13.9.tgz
tar xvzf sbt-0.13.9.tgz
JAVA_OPTS=-Xmx2G sbt/bin/sbt clean package assembly
Getting the Spark classpath right was also difficult:
sudo vim /etc/spark2/conf/spark-defaults.conf
where /opt/spark-receiver/ is a directory containing these files:
-rw-r--r-- 1 centos centos  211938 Jul 15 11:44 ejml-0.23.jar
-rwxr-xr-x 1 centos centos    6047 Jul 15 10:03 sentiment-job-kafka.scala
-rw-r--r-- 1 centos centos  190507 Jul 15 11:37 spark-streaming-kafka-0-10_2.11-126.96.36.199.6.5.0-292.jar
-rw-r--r-- 1 centos centos 8146873 Jul 15 11:37 stanford-corenlp-3.9.2.jar
-rwxr-xr-x 1 centos centos    2103 Jul 15 10:02 stream-submit-kafka.sh
-rw-r--r-- 1 centos centos  779055 Jul 15 11:37 twitter-sentiment-analysis-0.1-SNAPSHOT.jar
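The exact lines added to spark-defaults.conf are not reproduced above. As a rough sketch only, assuming the jars in /opt/spark-receiver/ were exposed through Spark's standard extraClassPath properties (an assumption on my part, not necessarily what was actually used), the entries might look something like this:

```
# Assumed entries for /etc/spark2/conf/spark-defaults.conf (illustrative only).
# The trailing /* is the JVM classpath wildcard: it picks up every .jar in the directory.
spark.driver.extraClassPath     /opt/spark-receiver/*
spark.executor.extraClassPath   /opt/spark-receiver/*
```

Setting both properties matters for a streaming job, since the Kafka and CoreNLP classes have to be visible to the executors as well as the driver.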
The actual Kylo template needs some refinement, as the Twitter API credentials and the twitter.terms variable do not pass through properly into NiFi.
I was able to get it to work eventually, but it shouldn't have been so hard.