Refactoring Earthquake-Tsunami Causality and Messaging via Big Data Analytics: The Transformative Potential of Credible Tweets



slide-1
SLIDE 1

Refactoring Earthquake-Tsunami Causality and Messaging via Big Data Analytics: The Transformative Potential of Credible Tweets

  • L. I. Lumb¹,² & J. R. Freemantle³

¹York University, ²Univa Corporation & ³Independent

MCBDA 2016 (First Workshop) PVAMU, May 17, 2016

slide-2
SLIDE 2

Agenda

  • Motivation
  • Traditional Data
  • Social-Networking Data

○ Graphs, Semantics & Machine Learning

  • Conclusions
slide-3
SLIDE 3

Geist, E.L., Titov, V.V., and Synolakis, C.E., 2006, Tsunami: wave of change: Scientific American, v. 294, p. 56-63

slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6
slide-7
SLIDE 7

Motivation

  • Non-deterministic cause

○ Uncertainty inherent in any attempt to predict earthquakes
    ■ In situ measurements may reduce uncertainty

  • Lead times

○ Availability of actionable observations
○ Communication of situation - advisories, warnings, etc.

  • Cause-effect relationship

○ Energy transfer - inputs ... coupling ... outputs
    ■ ‘Geometry’ - bathymetry and topography
○ Other factors - e.g., tides

  • Established effect

○ Far-field estimates of tsunami propagation (pre-computed) and coastal inundation (real-time) have proven to be extremely accurate ... requires

  • Distributed array of deep-ocean tsunami detection buoys + forecasting model
slide-8
SLIDE 8

Agenda

  • Motivation
  • Traditional Data
  • Social-Networking Data

○ Graphs, Semantics & Machine Learning

  • Conclusions
slide-9
SLIDE 9

http://www.gitews.org/en/concept/

slide-10
SLIDE 10

http://www.eas.slu.edu/GGP/images/igrav2.jpg

slide-11
SLIDE 11

Lumb & Aldridge, http://dx.doi.org/10.1109/HPCS.2006.26

slide-12
SLIDE 12

Agenda

  • Motivation
  • Traditional Data
  • Social-Networking Data

○ Graphs, Semantics & Machine Learning

  • Conclusions
slide-13
SLIDE 13
slide-14
SLIDE 14

                GGP Scientific Data                Twitter SN Data
Volume          small, finite                      BIG, ‘infinite’
Variety         semi-structured, restricted        unstructured, unrestricted - except for IDs, hashtags & URLs (pages, images)
Velocity        slow, sampled                      fast, streamed
Veracity        biases, noise & abnormalities
Validity        accuracy & correctness
Volatility      low (stationary, irreplaceable)    high? (mobile?, disposable?)

6Vs: Scientific vs. Social Networking Data

http://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/

slide-15
SLIDE 15

Karau et al., Learning Spark, O’Reilly, 2015

Machine Learning Pipeline

slide-16
SLIDE 16

Deep Learning from Twitter?

Represent data

  • Twitter data manually curated into ‘ham’ and ‘spam’
  • In-memory representation via Spark RDDs

Extract features

  • Frequency-based usage via Spark MLlib HashingTF ⇒ feature vectors

Develop model object

  • Spark MLlib LogisticRegressionWithSGD used for classification

Evaluate model
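
The four steps on this slide (represent, extract features, develop model, evaluate) can be sketched without a cluster. The block below is a minimal, standard-library stand-in, not the authors' Spark code: `hashing_tf` imitates the idea behind MLlib's HashingTF, and `train_sgd` imitates logistic regression fitted by stochastic gradient descent. The function names, the feature-space size, the step size, and the six toy tweets are all assumptions made up for illustration.

```python
import math
import zlib

NUM_FEATURES = 1024  # assumed size of the hashed feature space


def hashing_tf(text, num_features=NUM_FEATURES):
    """Map a tweet to a sparse term-frequency vector via the hashing trick."""
    vec = {}
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        idx = zlib.crc32(token.encode("utf-8")) % num_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def predict_proba(weights, bias, vec):
    """Logistic model: estimated probability that a tweet is 'ham'."""
    z = bias + sum(weights[i] * v for i, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(examples, epochs=200, step=0.5, num_features=NUM_FEATURES):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * num_features
    bias = 0.0
    for _ in range(epochs):
        for vec, label in examples:
            err = predict_proba(weights, bias, vec) - label
            for i, v in vec.items():
                weights[i] -= step * err * v
            bias -= step * err
    return weights, bias


# Represent data: invented toy tweets (1.0 = ham, 0.0 = spam).
TRAIN = [
    ("magnitude 7 earthquake reported offshore tsunami warning issued", 1.0),
    ("strong earthquake felt downtown buildings evacuated", 1.0),
    ("tsunami advisory issued for coastal areas after earthquake", 1.0),
    ("win a free phone click this earthquake link now", 0.0),
    ("earthquake deals click now free followers", 0.0),
    ("free followers click this link now", 0.0),
]

# Extract features, then develop the model object.
examples = [(hashing_tf(text), label) for text, label in TRAIN]
weights, bias = train_sgd(examples)

# Evaluate model: training-set accuracy only; a real study would hold out data.
correct = sum(
    (predict_proba(weights, bias, vec) >= 0.5) == (label == 1.0)
    for vec, label in examples
)
accuracy = correct / len(examples)
print(f"training accuracy: {accuracy:.2f}")
```

The value returned by `predict_proba` is a number between 0 and 1 rather than a hard label, so it could also serve as a rough score for how ham-like a tweet is.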

slide-17
SLIDE 17
slide-18
SLIDE 18

Future Work

  • Machine Learning

○ Classification algorithms ... with categories?
○ Training Experiments
    ■ Larger data sets
    ■ Degrees of ‘hammyness’
    ■ Stop-word removal, stemming, ...
○ Real-time streaming - data from Twitter

  • Multiparameter credibility - TweetCred + ML + RDF/OWL GA
  • Cloud-native platform

○ Containerization, dynamic scheduling and microservices

  • Other examples

○ Alberta wildfires
○ Industrial incidents
○ Hurricanes

slide-19
SLIDE 19

Agenda

  • Motivation
  • Traditional Data
  • Social-Networking Data

○ Graphs, Semantics & Machine Learning

  • Conclusions
slide-20
SLIDE 20

Conclusions

  • Credible tweets could be transformative

○ Mission-critical Big Data complement to existing data sources and approaches

  • Current challenges/opportunities

○ Twitter Data
    ■ Extraction - only 100 tweets at a time (!!!)
    ■ Curation - manual (read: time consuming!!!)
○ Emphasizing Machine Learning ... appears encouraging, BUT ...
    ■ Graph Analytics ... as well ???
    ■ Semantics ... as well ???

slide-21
SLIDE 21

Q&A

  • L. I. Lumb¹,² & J. R. Freemantle³

¹ianlumb@yorku.ca, ²ilumb@univa.com & ³james.freemantle@rogers.com

slide-22
SLIDE 22

Graph Analytics Problem

http://www.jma.go.jp/jma/en/2016_Kumamoto_Earthquake/2016_Kumamoto_Earthquake.html

slide-23
SLIDE 23
slide-24
SLIDE 24

Perl script prototype

  • Acquires tweets with the keyword “earthquake”

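    use strict;
    use warnings;
    use Net::Twitter::Lite::WithAPIv1_1;

    # Emit tweet text as UTF-8 to avoid "Wide character" warnings
    binmode(STDOUT, ':encoding(UTF-8)');

    # Authenticate with application credentials (elided here)
    my $nt = Net::Twitter::Lite::WithAPIv1_1->new(
        consumer_key        => 'xxxx...xxxxxxx',
        consumer_secret     => 'xxxxxx.....xxxxxxxxxx',
        access_token        => 'xxxxx....xxxxxxxxxxx',
        access_token_secret => 'xxxxx.....xxxxxxxxxxx',
        ssl                 => 1,
    );

    # Search for recent tweets containing the keyword
    my $result = $nt->search("earthquake");

    # Print the text of each returned status
    for my $status (@{ $result->{statuses} }) {
        print "$status->{text}\n";
    }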
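
The prototype above issues a single search request, which is where the limit of 100 tweets at a time comes from. One common way around a per-request cap is to page backwards through results by tweet ID. The sketch below shows only that loop, under stated assumptions: `fetch_page` is a hypothetical callable standing in for a real search call (it is not part of any Twitter client library), and `fake_fetch_page` serves invented tweets so the example runs offline.

```python
def collect_tweets(fetch_page, query, total, page_size=100):
    """Gather up to `total` tweets by requesting successive pages.

    `fetch_page(query, count, max_id)` is a hypothetical callable returning a
    list of dicts with 'id' and 'text', newest first, restricted to ids that
    are <= max_id (or unrestricted when max_id is None).
    """
    collected = []
    max_id = None
    while len(collected) < total:
        count = min(page_size, total - len(collected))
        page = fetch_page(query, count, max_id)
        if not page:
            break  # no older tweets are available
        collected.extend(page)
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = min(t["id"] for t in page) - 1
    return collected


def fake_fetch_page(query, count, max_id):
    """Offline stand-in for an API call: 250 invented tweets, newest first."""
    ids = range(250, 0, -1)
    if max_id is not None:
        ids = [i for i in ids if i <= max_id]
    return [{"id": i, "text": f"{query} report #{i}"} for i in list(ids)[:count]]


tweets = collect_tweets(fake_fetch_page, "earthquake", total=230)
print(len(tweets), tweets[0]["id"], tweets[-1]["id"])
```

Passing the fetch function in as a parameter keeps the paging logic independent of credentials and rate limits, which a real client would still have to handle.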

slide-25
SLIDE 25

Resilient Distributed Datasets (RDDs)

  • Abstraction for in-memory computing
  • Fault-tolerant, parallel data structures
  • Cluster-ready
  • Optionally persistent
  • Can be partitioned for optimal placement
  • Manipulated via operators

Zaharia et al., NSDI 2012
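
The properties listed on this slide can be illustrated with a toy, single-process model. `MiniRDD` below is a hypothetical class written for this sketch and is not Spark's implementation: it only shows how partitioned data, lazily recorded operators, optional persistence, and recovery of a lost partition from its lineage fit together. The sample tweets are invented.

```python
class MiniRDD:
    """Toy stand-in for an RDD: partitions plus a recorded lineage of operators."""

    def __init__(self, source_partitions, lineage=()):
        self._source = source_partitions  # list of lists: the stable input
        self._lineage = tuple(lineage)    # ordered (kind, function) steps
        self._cache = None                # materialized partitions, if persisted

    @classmethod
    def parallelize(cls, data, num_partitions):
        """Split `data` round-robin into `num_partitions` partitions."""
        return cls([list(data[i::num_partitions]) for i in range(num_partitions)])

    def map(self, fn):
        # Operators are lazy: they only extend the lineage.
        return MiniRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, fn):
        return MiniRDD(self._source, self._lineage + (("filter", fn),))

    def compute_partition(self, index):
        """(Re)build one partition by replaying the lineage on its source."""
        rows = self._source[index]
        for kind, fn in self._lineage:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows

    def persist(self):
        """Optionally keep the computed partitions in memory."""
        self._cache = [self.compute_partition(i) for i in range(len(self._source))]
        return self

    def lose_partition(self, index):
        """Simulate a failure that drops one cached partition."""
        if self._cache is not None:
            self._cache[index] = None

    def collect(self):
        out = []
        for i in range(len(self._source)):
            part = self._cache[i] if self._cache is not None else None
            if part is None:
                part = self.compute_partition(i)  # recover from lineage
            out.extend(part)
        return out


raw = ["Earthquake felt", "free phone", "Tsunami warning", "earthquake link"]
rdd = (
    MiniRDD.parallelize(raw, 2)
    .map(str.lower)
    .filter(lambda t: "earthquake" in t)
    .persist()
)
before = rdd.collect()
rdd.lose_partition(0)   # drop cached data for partition 0
after = rdd.collect()   # partition 0 is rebuilt from its lineage
print(before == after)
```

Because each partition can be rebuilt from its source and the recorded operators, losing cached data costs recomputation rather than correctness, which is the essence of the fault-tolerance claim on the slide.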

slide-26
SLIDE 26
slide-27
SLIDE 27