Twitter Sentiment Tutorial docs improvements

Description

https://kylo.readthedocs.io/en/v0.10.0/how-to-guides/SparkStreamingTutorial.html

It should be made clear that this tutorial can only be run on the HDP sandbox. It is not compatible with the CDH sandbox, which ships Kafka 0.8 and Spark 1.6, and the sandbox cannot be upgraded to Kafka 0.10 and Spark 2 because that requires parcels, which cannot be used in the sandbox.

Further, the Twitter Sentiment analysis utility that the tutorial links to on GitHub is not compatible with Spark 2, even though all the other scripts and templates that ship with the tutorial are meant for Spark 2. Luckily, I found a fork that does work: https://github.com/jatin7/twitter-sentiment-analysis

Building this Twitter Sentiment code was needlessly tricky because there was no explanation of which parts were relevant. For the record, all that was needed for the tutorial to work was:

git clone https://github.com/jatin7/twitter-sentiment-analysis.git
cd twitter-sentiment-analysis
curl -L -O https://dl.bintray.com/sbt/native-packages/sbt/0.13.9/sbt-0.13.9.tgz
tar xvzf sbt-0.13.9.tgz
JAVA_OPTS=-Xmx2G sbt/bin/sbt clean package assembly
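
To confirm that the build actually produced something, it can help to list the jars it wrote. The following is only a sketch: `list_built_jars` is a hypothetical helper name, and it assumes sbt's default layout where artifacts land somewhere under `target/`. The exact jar names depend on the project's build settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the tutorial): list every jar found under
# a build directory. Defaults to "target", which is sbt's default output
# directory; pass another directory if the project overrides it.
list_built_jars() {
  local build_dir="${1:-target}"
  find "$build_dir" -type f -name '*.jar' | sort
}

# Example, run from the twitter-sentiment-analysis checkout after the build:
#   list_built_jars target
```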

Getting the Spark classpath right was also difficult:

sudo vim /etc/spark2/conf/spark-defaults.conf

spark.driver.extraClassPath /usr/lib/kafka/libs/:/usr/lib/spark2/jars/spark-streaming_2.11-2.3.0.2.6.5.0-292.jar:/usr/hdp/2.6.5.0-292/kafka/libs/kafka-:/usr/hdp/2.6.5.0-292/spark2/examples/jars/spark-examples_2.11-2.3.0.2.6.5.0-292.jar:/usr/hdp/2.6.5.0-292/kafka/libs/kafka_2.11-1.0.0.2.6.5.0-292.jar:/usr/hdp/2.6.5.0-292/kafka/libs/kafka-clients-1.0.0.2.6.5.0-292.jar:/opt/spark-receiver/*

where /opt/spark-receiver/ is a directory containing these files:
-rw-r--r-- 1 centos centos 211938 Jul 15 11:44 ejml-0.23.jar
-rwxr-xr-x 1 centos centos 6047 Jul 15 10:03 sentiment-job-kafka.scala
-rw-r--r-- 1 centos centos 190507 Jul 15 11:37 spark-streaming-kafka-0-10_2.11-2.3.0.2.6.5.0-292.jar
-rw-r--r-- 1 centos centos 8146873 Jul 15 11:37 stanford-corenlp-3.9.2.jar
-rwxr-xr-x 1 centos centos 2103 Jul 15 10:02 stream-submit-kafka.sh
-rw-r--r-- 1 centos centos 779055 Jul 15 11:37 twitter-sentiment-analysis-0.1-SNAPSHOT.jar
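
Since a single wrong path in spark.driver.extraClassPath is easy to miss, a small check of each entry may save some time. This is a minimal sketch, not something shipped with Kylo or the tutorial: `check_classpath` is a hypothetical helper that splits a colon-separated classpath and reports entries that do not exist on disk, treating a trailing `/*` as a JVM wildcard and checking the directory itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every classpath entry that is missing on disk.
# Returns 0 when all entries exist, 1 when at least one is missing.
check_classpath() {
  local classpath="$1" entry path missing=0
  local -a entries
  # Split on ':' without glob-expanding entries such as /opt/spark-receiver/*
  IFS=':' read -r -a entries <<< "$classpath"
  for entry in "${entries[@]}"; do
    [ -z "$entry" ] && continue
    # Strip a trailing "/*" (JVM classpath wildcard) and test the directory.
    path="${entry%/\*}"
    if [ ! -e "$path" ]; then
      echo "MISSING: $entry"
      missing=1
    fi
  done
  return "$missing"
}

# Example, assuming the property and its value are separated by whitespace
# in spark-defaults.conf:
#   cp_line=$(awk '$1 == "spark.driver.extraClassPath" {print $2}' \
#     /etc/spark2/conf/spark-defaults.conf)
#   check_classpath "$cp_line"
```

The check only confirms that paths exist; it says nothing about whether the jar versions are compatible with each other.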

The Kylo template itself needs some refinement, as the Twitter API credentials and the twitter.terms variable don't pass through properly into NiFi.

I was able to get it to work eventually, but it shouldn't have been so hard.

Environment

None

Status

Assignee

Unassigned

Reporter

Brad Rushworth

Labels

None

Reviewer

None

Story point estimate

None

Components

Affects versions

0.10.0

Priority

Medium