Spark Code Camp
Discover Spark Streaming & Spark SQL
Project Overview
Focus on Spark Streaming and Spark SQL
Explored Streaming API of Apache Spark on Ukko Cluster
○ Window-based stream content
○ Direct stream content
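The difference between direct and window-based stream content can be sketched with the Spark Streaming API from the 1.0.x line. This is a minimal sketch, not the project's actual code: the object name, application name, and all durations are assumptions chosen for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object StreamModes {
  // Illustrative durations (assumptions). Spark Streaming requires the window
  // length and slide interval to be multiples of the batch interval.
  val BatchSeconds = 2
  val WindowSeconds = 60
  val SlideSeconds = 10

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("StreamModes")
    val ssc = new StreamingContext(conf, Seconds(BatchSeconds))

    // Passing None lets twitter4j read OAuth credentials from its own
    // configuration (for example the twitter4j.oauth.* system properties).
    val tweets = TwitterUtils.createStream(ssc, None)

    // Direct stream content: one tweet count per batch.
    tweets.count().print()

    // Window-based stream content: tweet count over the last 60 seconds,
    // recomputed every 10 seconds.
    tweets.window(Seconds(WindowSeconds), Seconds(SlideSeconds)).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The windowed stream overlaps with itself between slides, so each tweet contributes to several consecutive window counts, whereas the direct stream counts it once.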
○ Find out popular hashtags
○ Discover tweet frequency per location
○ Discover tweeting trends over time
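The hashtag and location goals can be illustrated on a plain Scala collection, without a cluster. This is a sketch under stated assumptions: `Tweet`, `TweetStats`, and their fields are hypothetical names, since the slides do not show the project's own tweet representation.

```scala
// Hypothetical record type; the project's actual representation is not shown.
case class Tweet(text: String, location: String)

object TweetStats {
  /** Extracts hashtags, lower-cased so "#Spark" and "#spark" count together. */
  def hashtags(text: String): Seq[String] =
    text.split("\\s+").toSeq
      .filter(token => token.startsWith("#") && token.length > 1)
      .map(_.toLowerCase)

  /** Returns the n most frequent hashtags; ties are broken alphabetically. */
  def topHashtags(tweets: Seq[Tweet], n: Int): Seq[(String, Int)] =
    tweets.flatMap(t => hashtags(t.text))
      .groupBy(identity)
      .map { case (tag, occurrences) => (tag, occurrences.size) }
      .toSeq
      .sortBy { case (tag, count) => (-count, tag) }
      .take(n)

  /** Counts tweets per location string. */
  def tweetsPerLocation(tweets: Seq[Tweet]): Map[String, Int] =
    tweets.groupBy(_.location).map { case (loc, ts) => (loc, ts.size) }
}
```

In a streaming job, a function like `hashtags` could be passed to `flatMap` on the tweet stream, with the counting done by a keyed reduction instead of `groupBy`.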
○ "org.apache.spark" %% "spark-core" % "1.0.2" % "provided"
○ "org.apache.spark" %% "spark-streaming" % "1.0.2" % "provided"
○ "org.twitter4j" % "twitter4j-core" % "3.0.3"
○ "org.twitter4j" % "twitter4j-stream" % "3.0.3"
○ "org.apache.spark" %% "spark-streaming-twitter" % "1.0.2" % "provided"
○ "com.typesafe.akka" % "akka-actor_2.10" % "2.2-M1"
○ "org.mashupbots.socko" % "socko-webserver_2.10" % "0.4.2"
○ "org.apache.spark" %% "spark-sql" % "1.0.0" % "provided"
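Put together, the dependencies listed above might appear in a `build.sbt` roughly as follows. The coordinates and versions are copied from the slide; the project name and `scalaVersion` are assumptions (Spark 1.0.x was built against Scala 2.10, which also matches the `_2.10` suffixes on the Akka and Socko artifacts).

```scala
// build.sbt (sketch). The name and scalaVersion are assumptions;
// the dependency coordinates are taken from the slide as listed.
name := "spark-code-camp"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.2" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.0.2" % "provided",
  "org.twitter4j" % "twitter4j-core" % "3.0.3",
  "org.twitter4j" % "twitter4j-stream" % "3.0.3",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.0.2" % "provided",
  "com.typesafe.akka" % "akka-actor_2.10" % "2.2-M1",
  "org.mashupbots.socko" % "socko-webserver_2.10" % "0.4.2",
  "org.apache.spark" %% "spark-sql" % "1.0.0" % "provided"
)
```

The `%%` operator appends the Scala binary version to the artifact name, while the single `%` entries spell out `_2.10` by hand; both forms resolve to Scala 2.10 artifacts under the assumed `scalaVersion`.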
streaming
○ One million tweets collected
○ Few tutorials available for exploring streaming in Spark
○ Few streaming sources: Twitter or other?
○ Maven or SBT?
○ Short time to explore and experiment with different open-source software stacks
○ Decision challenges
■ Scala-based framework: Akka or Play?
■ Web server: Socko or another HTTP web server?
■ Graphs: Chart.js or other chart libraries?
■ Storage: file system, Hive, Shark, or Spark SQL?
○ Which attributes of a Twitter status (a user tweet == status) are useful?
○ What is possible with a huge stream of data?
○ Maninder Pal Singh
○ Ayesha Ahmad