Day 3 – Lab1: Spark Streaming with Kafka Example

Introduction

In this example, we will write a Spark Streaming program that consumes messages from Kafka. We will reuse the whole setup from the previous lab, so this lab is best done as a continuation of the Kafka Streaming setup. For Spark, we will use Scala. To build the program, you will need to download SBT, the Scala Build Tool. This is an easy install! Follow the installation instructions from the site http://www.scala-sbt.org/ and install it for your machine.

Spark Streaming Program

package com.scispike.kafka

import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.{Duration, Seconds, StreamingContext}
import org.apache.log4j.{Level, Logger}

object SparkKafka {

  def main(args: Array[String]): Unit = {
    println("Spark Kafka Example - Word count from a Kafka stream")

    if (args.length < 3) {
      System.err.println(s"""
        |Usage: SparkKafka <brokers> <topics> <interval>
        |  <brokers> is a list of one or more Kafka brokers: broker1,broker2
        |  <topics> is a list of one or more kafka topics to consume from
        |  <interval> interval duration (ms)
        |""".stripMargin)
      System.exit(1)
    }

    // Show only errors in the console
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)

    // Consume command line parameters
    val Array(brokers, topics, interval) = args

    // Create Spark configuration
    val sparkConf = new SparkConf().setAppName("SparkKafka")

    // Create streaming context, with batch duration in ms
    val ssc = new StreamingContext(sparkConf, Duration(interval.toLong))
    ssc.checkpoint("./output")

    // Create a set of topics from a string
    val topicsSet = topics.split(",").toSet

    // Define Kafka parameters
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> brokers,
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "use_a_separate_group_id_for_each_stream",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean))

    // Create a Kafka stream
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topicsSet, kafkaParams))

    // Get messages - lines of text from Kafka
    val lines = stream.map(consumerRecord => consumerRecord.value)

    // Split lines into words
    val words = lines.flatMap(_.split(" "))

    // Map every word to a tuple
    val wordMap = words.map(word => (word, 1))

    // Count occurrences of each word
    val wordCount = wordMap.reduceByKey(_ + _)

    // Print the word count
    wordCount.print()

    // Start stream processing
    ssc.start()
    ssc.awaitTermination()
  }
}
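The spark-kafka directory used in the next step already contains a build.sbt for this program, so you do not need to write one yourself. For orientation only, a minimal build.sbt for a job like this might look roughly as follows; the dependency versions, the Scala version, and the sbt-assembly setting shown here are assumptions and may differ from the file shipped with the lab:

// Hypothetical minimal build.sbt for a Spark Streaming + Kafka job (sketch only)
name := "spark-kafka"
version := "1.0"
scalaVersion := "2.11.12"   // a 2.11.x release, matching the target/scala-2.11 output path

libraryDependencies ++= Seq(
  // Spark itself is provided by the cluster, so it is not bundled into the fat jar
  "org.apache.spark" %% "spark-streaming"            % "2.1.1" % "provided",
  // Kafka 0.10 direct-stream connector used by the imports above
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.1"
)

// Name of the fat jar produced by the sbt-assembly plugin
// (the plugin itself is enabled separately via addSbtPlugin in the project/ directory)
assemblyJarName in assembly := "spark-kafka.jar"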

Building and Packaging the Spark Program

Go to the directory spark-kafka. You will see that it contains the file build.sbt. We will use it to compile the program and build a jar file that we can deploy to Spark. In that directory, run the following command in the terminal:

sbt assembly

The first time you run sbt, it may take a while, as SBT needs to download the right version of Scala and all the needed libraries. The result is a jar file created at target/scala-2.11/spark-kafka.jar.

Deploying the Program to Dockerized Spark

Move the jar file to the directory from which we will deploy to Spark:

mv target/scala-2.11/spark-kafka.jar ../docker/spark/

Go to the docker directory and run the deployment command. If you don't have the Docker cluster running already, you will first have to run the docker-compose up command. Then run the command to submit the jar to Spark. The directory app in the image is mapped to the directory spark on our machine, as set in the volumes parameter in the docker-compose.yml file (a rough sketch of that mapping follows at the end of this section). We are also passing command line parameters for the broker, topic, and interval:

docker-compose exec master spark-submit \
  --master spark://master:7077 \
  /app/spark-kafka.jar \
  kafka:9092 stream-input 2000

The program should now be running and waiting for input from the stream-input topic.
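For reference, the volume mapping mentioned above would look roughly like this in docker-compose.yml. This is a hypothetical excerpt that shows only the mapping itself; all other settings of the master service are omitted:

master:
  volumes:
    - ./spark:/app   # the host directory ./spark appears as /app inside the container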

In the terminal, you should see the output of Spark Streaming:

-------------------------------------------
Time: 1497743366000 ms
-------------------------------------------

-------------------------------------------
Time: 1497743368000 ms
-------------------------------------------

Soon, we will see some word counts!

In some instances, we have observed crashes of the Spark program on deployment. If that happens, just re-run the previous command. Sometimes, the dockerized Spark does not start until we attempt to deploy to it. Watch the docker-compose console and you will see the logs for the start of the Spark master.

Running Spark Streaming with Kafka

Run the producer as in the previous lab, enter some text, and observe the output. In the producer terminal, start the producer and then enter a line of text:

$ docker-compose exec kafka /opt/kafka/bin/kafka-console-producer.sh \
    --broker-list kafka:9092 --topic stream-input
hello hello hi

In the Spark terminal, you will see the word counts for the interval:

-------------------------------------------
Time: 1497746685000 ms
-------------------------------------------
(hello,2)
(hi,1)

Done! Congratulations - you've made it!
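Optional sanity check: if the word counts never appear, you can verify, independently of Spark, that your lines are actually reaching the stream-input topic by running the Kafka console consumer in a separate terminal. This is a suggested check that assumes the same dockerized Kafka setup used above:

$ docker-compose exec kafka /opt/kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server kafka:9092 --topic stream-input --from-beginning

If your text shows up here but no counts appear in the Spark terminal, the problem is on the Spark side; press Ctrl+C to stop the consumer when you are done.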
