Making Sense at Scale with Algorithms, Machines & People

UC Berkeley
Making Sense at Scale with Algorithms, Machines & People
PI: Michael Franklin, University of California, Berkeley
Expeditions in Computing PI Meeting, May 15, 2013

The Berkeley AMPLab

Sources Driving Big Data
• It's all happening on-line. Every: click, ad impression, billing event, fast forward, pause, ..., friend request, transaction, network message, fault, ...
• User generated (Web & mobile)
• Internet of Things / M2M
• Scientific computing

Challenge 1: Data is Big
[Figure: Projected Growth. Y axis: increase over 2010 (0 to 60); X axis: 2010 to 2015. Series: Moore's Law, Overall Data, Particle Accel., DNA Sequencers.]
Data grows faster than Moore's Law [IDC report, Kathy Yelick, LBNL]

Challenge 2: Data is Dirty
• Variety of diverse sources
• Uncurated
• No schema
• Inconsistent syntax and semantics
Dirty data is worse than Big Data

Challenge 3: Complex Questions
• Hard questions
  – What is the impact on traffic and home prices of building a new on-ramp?
• Detect real-time events
  – Is there a cyber attack going on?
• Open-ended questions
  – How many supernovae happened last year?

Our Vision: A Necessary Synergy
Algorithms · Machines · People
• Challenge 1: Data is Big: ✔ ✔
• Challenge 2: Data is Dirty: ✔ ✔ ✔
• Challenge 3: Questions are complex: ✔ ✔ ✔

The AMPLab Big Bets
• Traditional intellectual borders hinder “Big Data” stacks
  – Need Machine Learning/Systems/Database co-design
  – Requires cohabitation and real collaboration
• Now is a unique opportunity to rethink fundamental design points:
  – Changing latency demands
  – Changing consistency requirements
  – Cloud-based elastic resources
  – Huge desire for new solutions in the marketplace
  – Open source is the key to tech transfer in Big Data
• Need to consider the role of people throughout the entire analytics lifecycle

AMPLab: Collaborative Research
An integration of faculty interests (*Directors):
• Alex Bayen (Mobile Sensing)
• Anthony Joseph (Sec./Privacy)
• Ken Goldberg (Crowdsourcing)
• Randy Katz (Systems)
• *Michael Franklin (Databases)
• Dave Patterson (Systems)
• Armando Fox (Systems)
• *Ion Stoica (Systems)
• *Mike Jordan (Machine Learning)
• Scott Shenker (Networking)
50+ amazing grad students, post-docs, undergrads, developers, staff & visitors
Twice-yearly research retreats (industry & sponsors)

Co-Located for Collaboration

Collaboration: Industry + Government
AMPLab launched January 2011 (5-year plan)
Founding Sponsors: [logos not preserved]
Sponsors and Affiliates: [logos not preserved]
Federal Grants and Contracts: Expeditions in Computing, XData Program

Collaboration: Applications
• Participatory Sensing
  – Mobile Millennium: traffic
• Collective Discovery
  – Opinion Space: opinions
  – Carat: smartphone energy
• Urban Planning and Simulation
  – UrbanSim: data integration
• Cancer Genomics/Personalized Medicine (w/ UCSF and UCSC)
  – SNAP: fast sequence alignment
  – Genome Data Warehouse

Shared Deliverable: Berkeley Data Analytics Stack (BDAS)

BDAS: Current Snapshot
[Figure: stack diagram, by layer]
• Data Processing: BlinkDB, Spark Streaming, Spark Graph, MLBase, Shark, Spark, Pig, Hive, Storm, MPI, Hadoop
• Data Mgmt.: Tachyon, HDFS
• Resource Mgmt.: Mesos
Legend: In development (BDAS); Released (BDAS); Existing open source stack
BDAS components are being released under the BSD or Apache open source license.

Big Data Landscape – Our Corner

Impact (so far)
• Open source release of BDAS components:
  – Mesos: Cluster Virtualization
    · Business-critical services on 6000+ servers at Twitter
    · See “How Twitter Rebuilt Google’s Secret Weapon”, Wired 3/13
  – Spark: In-memory Computation Framework & Shark: Hive-Compatible SQL Query Engine on Spark
    · In use at large companies, start-ups, and govt. agencies
    · 100x performance improvement over Hadoop/Apache Hive
    · Available on Amazon Elastic MapReduce
    · 700+ member Meetup group
• Best Paper Awards: EuroSys 13, ICDE 13, NSDI 12, SIGCOMM 12, and Best Demo Award: SIGMOD 12
• Students in high demand in academia and industry

Spark: Sys/ML Collaboration at Work
Technical Challenge: disk-oriented Hadoop MapReduce is inefficient for iterative machine learning (iter. 1, iter. 2, ...)
Research Challenge Addressed: How to design a distributed memory abstraction that is both fault-tolerant and efficient?
Solution: Resilient Distributed Datasets (RDDs)
[Figure: Logistic Regression Performance. 29 GB dataset on 20 EC2 m1.xlarge machines (4 cores each).]
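The point of the slide above, that iterative machine learning re-reads the same dataset on every pass, can be illustrated with a minimal sketch in plain Python. This is not Spark or the RDD API; `load_points` is a hypothetical stand-in for an expensive disk read, and the four data points are invented for illustration. The sketch only counts how often the data is loaded when the working set is kept in memory versus reloaded each iteration.

```python
import math

def load_points():
    # Hypothetical stand-in for an expensive read from disk; counts its calls.
    load_points.calls += 1
    return [((1.0, 2.0), 1), ((2.0, 1.5), 1),
            ((-1.0, -1.5), 0), ((-2.0, -1.0), 0)]
load_points.calls = 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient(points, w):
    # Gradient of the logistic loss summed over all points.
    g = [0.0] * len(w)
    for x, y in points:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for i, xi in enumerate(x):
            g[i] += err * xi
    return g

def train(iterations, cache, lr=0.5):
    # cache=True keeps the dataset in memory across iterations;
    # cache=False reloads it on every pass, as a disk-oriented job would.
    w = [0.0, 0.0]
    cached = load_points() if cache else None
    for _ in range(iterations):
        points = cached if cache else load_points()
        g = gradient(points, w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w
```

Both modes produce the same weights; the difference is that the cached run loads the data once while the uncached run loads it once per iteration, which is the cost the slide attributes to disk-oriented processing.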

Impact: Carat Smartphone App
Over 500,000 downloads

MLBase – Declarative ML
Vision: Make machine learning usable by “mere mortals”
• Allow high-level (declarative) specification of ML tasks
• Use database-style “query optimization” to generate an efficient execution strategy
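As a rough illustration of the declarative idea above, the following Python sketch lets the caller state the task (classify) while a small "optimizer" chooses among candidate learners by held-out accuracy. It is not MLBase's interface; `do_classify`, `stump`, and the toy data are all hypothetical names and values introduced here.

```python
def stump(feature):
    """Return a learner that fits a one-feature threshold classifier."""
    def fit(rows):
        pos = [x[feature] for x, y in rows if y == 1]
        neg = [x[feature] for x, y in rows if y == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        threshold = (mean_pos + mean_neg) / 2
        sign = 1 if mean_pos >= mean_neg else -1
        return lambda x: 1 if sign * (x[feature] - threshold) > 0 else 0
    return fit

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

def do_classify(train, holdout, candidates):
    """Declarative entry point: the caller says what to do, not how."""
    best = None
    for name, fit in candidates.items():
        model = fit(train)
        score = accuracy(model, holdout)
        if best is None or score > best[2]:
            best = (name, model, score)
    return best

# Toy data (invented): feature 0 is noise, feature 1 is informative.
TRAIN = [((5.0, 1.0), 1), ((1.0, 0.9), 1), ((4.0, 0.1), 0), ((2.0, 0.0), 0)]
HOLDOUT = [((3.0, 0.8), 1), ((3.2, 0.2), 0), ((0.5, 0.95), 1), ((6.0, 0.05), 0)]
CANDIDATES = {"stump_f0": stump(0), "stump_f1": stump(1)}
```

Calling `do_classify(TRAIN, HOLDOUT, CANDIDATES)` returns the name, model, and held-out score of the chosen candidate. A real optimizer would presumably search a much larger space of algorithms and parameters; the exhaustive loop here is only the simplest possible strategy.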

Hybrid Human/Machine Systems
• Use machines for bulk data processing
• Leverage human activity for data collection and event detection
• Leverage human knowledge, reasoning and perception for:
  – subjective entity comparisons
  – complex predicates
  – finding missing data
  – disambiguating questions
e.g., CrowdDB Architecture
[Figure: CrowdDB architecture. Input: CrowdSQL; output: Results. Components: Parser, Optimizer, Executor, Files Access Methods; Turker Relationship Manager, UI Creation, Form Editor, UI Template Manager, HIT Manager; MetaData, Statistics; Disk 1, Disk 2.]
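The division of labor described above can be sketched in a few lines of Python: the machine applies an objective filter in bulk, and only the subjective comparison is delegated to people. This is not CrowdDB or CrowdSQL; `run_query`, `crowd_order`, and `simulated_crowd` are hypothetical, and the crowd is simulated by a hidden preference table so the example runs offline. Answers are cached so the same question is never asked twice.

```python
from functools import cmp_to_key

def crowd_order(items, ask_crowd):
    """Sort items by a human-answered pairwise comparison, caching answers."""
    cache = {}
    def compare(a, b):
        if a == b:
            return 0
        key = (a, b) if a < b else (b, a)
        if key not in cache:
            cache[key] = ask_crowd(*key)  # returns the preferred item
        return -1 if cache[key] == a else 1
    ranked = sorted(items, key=cmp_to_key(compare))
    return ranked, len(cache)  # ranking and number of questions asked

def run_query(rows, max_price, ask_crowd):
    # Machine step: objective predicate evaluated over all rows.
    shortlist = [r["name"] for r in rows if r["price"] <= max_price]
    # Human step: subjective ranking of the survivors only.
    return crowd_order(shortlist, ask_crowd)

# Simulated crowd (invented): lower rank means more preferred.
HIDDEN_RANK = {"cafe_a": 2, "cafe_b": 0, "cafe_c": 1, "cafe_d": 3}

def simulated_crowd(a, b):
    return a if HIDDEN_RANK[a] < HIDDEN_RANK[b] else b

ROWS = [
    {"name": "cafe_a", "price": 10},
    {"name": "cafe_b", "price": 15},
    {"name": "cafe_c", "price": 30},
    {"name": "cafe_d", "price": 12},
]
```

Filtering before asking people matters because each human answer is slow and costs money; the machine step shrinks the set the crowd has to compare.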

Outreach
• AMPCamp I @ Berkeley, August 2012
• AMPCamp II @ Strata Conf., Feb 2013
• AMPCamp III @ Berkeley, August 2013
• AMPCamp Online: ampcamp.berkeley.edu

What do we get from Expeditions?
Simply put – the ability to “swing for the fences”

For More Information
amplab.cs.berkeley.edu
• Papers and project pages
• News updates and blogs
Twitter: @amplab
Github and Apache
http://spark.meetup.com
franklin@cs.berkeley.edu
