Twitter Sentiment Tutorial docs improvements


It should be made clear that this tutorial can only be run on the HDP sandbox. It is not compatible with the CDH sandbox, which ships the incompatible Kafka 0.8 and Spark 1.6, and there is no way to upgrade that sandbox to Kafka 0.10 and Spark 2 without parcels (which cannot be used in the sandbox).
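The version requirement can be checked up front. This is a minimal bash sketch that assumes GNU sort with -V (version sort). version_ge and check_sandbox are hypothetical helpers, not part of the tutorial. You pass the Spark and Kafka versions in by hand; spark-submit --version reports the Spark one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 >= version $2.
# Assumes GNU sort, whose -V flag orders version strings (0.8.2 < 0.10).
version_ge() {
  # The smaller of the two sorts first; if that is $2, then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical gate for this tutorial's minimums: Spark 2 and Kafka 0.10.
check_sandbox() {
  local spark_version="$1" kafka_version="$2"
  if version_ge "$spark_version" 2.0 && version_ge "$kafka_version" 0.10; then
    echo "ok"
  else
    echo "unsupported: Spark $spark_version / Kafka $kafka_version"
    return 1
  fi
}
```

For example, check_sandbox 1.6 0.8 (the CDH sandbox versions) prints "unsupported: Spark 1.6 / Kafka 0.8" and returns 1.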

Further, the Twitter Sentiment analysis utility that the tutorial links to on GitHub is not compatible with Spark 2, while all the other scripts and templates that ship with the tutorial are written for Spark 2. Luckily I found a fork that does work:

Building this Twitter Sentiment code was harder than it needed to be, because nothing explained which parts were relevant. For the record, all that was needed for the tutorial to work was:

git clone
cd twitter-sentiment-analysis
curl -L -O
tar xvzf sbt-0.13.9.tgz
JAVA_OPTS=-Xmx2G sbt/bin/sbt clean package assembly

Getting the Spark classpath right was also difficult:

sudo vim /etc/spark2/conf/spark-defaults.conf

spark.driver.extraClassPath /usr/lib/kafka/libs/:/usr/lib/spark2/jars/spark-streaming_2.11-*

where /opt/spark-receiver/ is a directory containing these files:
-rw-r--r-- 1 centos centos 211938 Jul 15 11:44 ejml-0.23.jar
-rwxr-xr-x 1 centos centos 6047 Jul 15 10:03 sentiment-job-kafka.scala
-rw-r--r-- 1 centos centos 190507 Jul 15 11:37 spark-streaming-kafka-0-10_2.11-
-rw-r--r-- 1 centos centos 8146873 Jul 15 11:37 stanford-corenlp-3.9.2.jar
-rwxr-xr-x 1 centos centos 2103 Jul 15 10:02
-rw-r--r-- 1 centos centos 779055 Jul 15 11:37 twitter-sentiment-analysis-0.1-SNAPSHOT.jar
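Both the classpath value and the jar directory had to be right, so a quick check before launching the job could save time. This is a bash sketch. check_classpath, check_receiver_dir and the required array are hypothetical helpers, not part of the tutorial.

check_classpath splits a classpath value on colons and reports any entry that matches nothing on disk. It also flags bare directory entries. On a JVM classpath, a directory such as /usr/lib/kafka/libs/ contributes only loose .class files, while dir/* adds the jars inside it.

check_receiver_dir looks for the files from the listing above. It leaves out the unnamed 2103-byte file. Bash globbing only approximates how the JVM expands classpath wildcards.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for checking the Spark receiver setup.

# Report whether each entry of a colon-separated classpath value (such as the
# spark.driver.extraClassPath value) matches something on disk. Bash globbing
# only approximates how the JVM treats classpath wildcards.
check_classpath() {
  local entry rc=0
  local -a entries matches
  # read -a splits on ':' without performing pathname expansion
  IFS=':' read -r -a entries <<< "$1"
  for entry in "${entries[@]}"; do
    [ -z "$entry" ] && continue
    # Unquoted on purpose so bash expands any glob; an unmatched glob
    # stays literal and then fails the -e test
    matches=( $entry )
    if [ -e "${matches[0]}" ]; then
      echo "found: $entry"
    else
      echo "missing: $entry"
      rc=1
    fi
    # A bare directory adds loose .class files, not the jars inside it
    if [ -d "$entry" ]; then
      echo "  note: $entry is a directory; ${entry%/}/* would add its jars"
    fi
  done
  return "$rc"
}

# Files from the /opt/spark-receiver/ listing. The spark-streaming-kafka-0-10
# jar's version is cut off there, so it is matched with a glob.
required=(
  "ejml-0.23.jar"
  "sentiment-job-kafka.scala"
  "spark-streaming-kafka-0-10_2.11-*"
  "stanford-corenlp-3.9.2.jar"
  "twitter-sentiment-analysis-0.1-SNAPSHOT.jar"
)

# Print each required file missing from a directory (default /opt/spark-receiver).
check_receiver_dir() {
  local dir="${1:-/opt/spark-receiver}" name rc=0
  local -a matches
  for name in "${required[@]}"; do
    # $name is unquoted so its glob expands; no match leaves it literal
    matches=( "$dir"/$name )
    if [ ! -e "${matches[0]}" ]; then
      echo "missing: $dir/$name"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, check_classpath '/usr/lib/kafka/libs/:/usr/lib/spark2/jars/spark-streaming_2.11-*' prints one found or missing line per entry. If the Kafka directory exists, it also prints a note for that entry. check_receiver_dir with no argument checks /opt/spark-receiver. Both return non-zero if anything is missing.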

The Kylo template itself needs some refinement, as the Twitter API credentials and the twitter.terms variable don't pass through properly into NiFi.

I was able to get it to work eventually, but it shouldn't have been so hard.


Brad Rushworth