Refactoring Earthquake-Tsunami Causality and Messaging via Big Data

Dec 30, 2022


SLIDE 1

Refactoring Earthquake-Tsunami Causality and Messaging via Big Data Analytics: The Transformative Potential of Credible Tweets

L. I. Lumb (1,2) & J. R. Freemantle (3)

(1) York University, (2) Univa Corporation & (3) Independent

MCBDA 2016 (First Workshop) PVAMU, May 17, 2016

SLIDE 2

Agenda

Motivation
Traditional Data
Social-Networking Data

○ Graphs, Semantics & Machine Learning

Conclusions

SLIDE 3

Geist, E.L., Titov, V.V., and Synolakis, C.E., 2006, Tsunami: wave of change: Scientific American, v. 294, p. 56-63

SLIDE 4

SLIDE 5

SLIDE 6

SLIDE 7

Motivation

Non-deterministic cause

○ Uncertainty inherent in any attempt to predict earthquakes
  ■ In situ measurements may reduce uncertainty

Lead times

○ Availability of actionable observations
○ Communication of situation - advisories, warnings, etc.

Cause-effect relationship

○ Energy transfer - inputs ... coupling ... outputs
  ■ ‘Geometry’ - bathymetry and topography
○ Other factors - e.g., tides

Established effect

○ Far-field estimates of tsunami propagation (pre-computed) and coastal inundation (real-time) have proven to be extremely accurate ... requires

Distributed array of deep-ocean tsunami detection buoys + forecasting model
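The link between bathymetry and lead time can be made concrete with the standard long-wave approximation, in which a tsunami in water of depth h travels at roughly sqrt(g·h). The sketch below is an illustration only, not part of the original slides: the uniform 4000 m depth, the 1000 km distance, and the function names are assumptions, and operational forecasts rely on numerical models over real bathymetry rather than this single formula.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def tsunami_speed(depth_m):
    """Long-wave (shallow-water) speed sqrt(g * h), in m/s."""
    return math.sqrt(G * depth_m)


def lead_time_minutes(distance_km, depth_m):
    """Travel time over a path of assumed uniform depth, in minutes."""
    return distance_km * 1000.0 / tsunami_speed(depth_m) / 60.0


# Hypothetical scenario: source 1000 km offshore, uniform 4000 m ocean depth.
speed = tsunami_speed(4000.0)
lead = lead_time_minutes(1000.0, 4000.0)
print(f"speed: {speed:.0f} m/s ({speed * 3.6:.0f} km/h)")
print(f"lead time: {lead:.0f} min")
```

Shallower water slows the wave, which is one reason the coastal part of the path needs real bathymetry rather than a single average depth.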

SLIDE 8

Agenda

Motivation
Traditional Data
Social-Networking Data

○ Graphs, Semantics & Machine Learning

Conclusions

SLIDE 9

http://www.gitews.org/en/concept/

SLIDE 10

http://www.eas.slu.edu/GGP/images/igrav2.jpg

SLIDE 11

Lumb & Aldridge, http://dx.doi.org/10.1109/HPCS.2006.26

SLIDE 12

Agenda

Motivation
Traditional Data
Social-Networking Data

○ Graphs, Semantics & Machine Learning

Conclusions

SLIDE 13

SLIDE 14

Dimension | GGP Scientific Data | Twitter SN Data
Volume | small, finite | BIG, ‘infinite’
Variety | semi-structured, restricted | unstructured, unrestricted - except for IDs, hashtags & URLs (pages, images)
Velocity | slow, sampled | fast, streamed
Veracity | biases, noise & abnormalities
Validity | accuracy & correctness
Volatility | low (stationary, irreplaceable) | high? (mobile?, disposable?)

6Vs: Scientific vs. Social Networking Data

http://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/

SLIDE 15

Karau et al., Learning Spark, O’Reilly, 2015

Machine Learning Pipeline

SLIDE 16

Deep Learning from Twitter?

Represent data

Twitter data manually curated into ‘ham’ and ‘spam’
In-memory representation via Spark RDDs

Extract features

Frequency-based usage via Spark MLlib HashingTF ⇒ feature vectors

Develop model object

Spark MLlib LogisticRegressionWithSGD used for classification

Evaluate model
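The four steps above can be sketched without a Spark cluster. What follows is a minimal pure-Python stand-in, not the authors' code and not the Spark MLlib API: `hashing_tf` imitates the idea behind HashingTF (term counts placed in hashed buckets), and `train_logistic_sgd` imitates logistic regression fitted by stochastic gradient descent. The four sample messages, the bucket count, the learning rate, and every function name are assumptions made for illustration.

```python
import math
import zlib

NUM_FEATURES = 1024  # assumed bucket count for the hashing trick


def hashing_tf(text, num_features=NUM_FEATURES):
    """Map a text to a fixed-length term-frequency vector via hashing."""
    vec = [0.0] * num_features
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % num_features] += 1.0
    return vec


def predict_proba(weights, bias, features):
    """Sigmoid probability that the features belong to class 1 ('ham')."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_sgd(examples, epochs=200, learning_rate=0.1):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights = [0.0] * NUM_FEATURES
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = predict_proba(weights, bias, features) - label
            bias -= learning_rate * error
            for i, x in enumerate(features):
                if x:
                    weights[i] -= learning_rate * error * x
    return weights, bias


# 1. Represent data: hypothetical, hand-labelled messages (1 = ham, 0 = spam).
labelled = [
    ("strong earthquake felt near the coast tsunami advisory issued", 1),
    ("magnitude 6 earthquake reported aftershocks continuing", 1),
    ("win a free phone click this link now", 0),
    ("cheap followers click here for a free offer", 0),
]

# 2. Extract features.
training = [(hashing_tf(text), label) for text, label in labelled]

# 3. Develop the model object.
weights, bias = train_logistic_sgd(training)

# 4. Evaluate the model (on the training set only, which overstates accuracy).
correct = sum(
    1 for features, label in training
    if (predict_proba(weights, bias, features) >= 0.5) == bool(label)
)
accuracy = correct / len(training)
print(f"training accuracy: {accuracy:.2f}")
```

Spark's own hash function, defaults, and optimizer details differ from this sketch; a realistic evaluation would also hold out labelled tweets that the model never saw during training.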

SLIDE 17

SLIDE 18

Future Work

Machine Learning

○ Classification algorithms ... with categories?
○ Training Experiments
  ■ Larger data sets
  ■ Degrees of ‘hammyness’
  ■ Stop-word removal, stemming, ...
○ Real-time streaming - data from Twitter

Multiparameter credibility - TweetCred + ML + RDF/OWL GA
Cloud-native platform

○ Containerization, dynamic scheduling and micro services

Other examples

○ Alberta wildfires
○ Industrial incidents
○ Hurricanes
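Two of the training experiments listed under Machine Learning above, stop-word removal and stemming, are text-preprocessing steps applied before feature extraction. The sketch below shows the general idea only; the tiny stop-word list, the suffix list, and the function names are assumptions, and the suffix stripping is deliberately crude rather than a real stemming algorithm such as Porter's.

```python
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and"}  # assumed, tiny list
SUFFIXES = ("ing", "ed", "es", "s")  # crude suffix stripping, not a real stemmer


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lower-case, strip punctuation, drop stop words, then stem each token."""
    tokens = [t.strip(".,!?:;\"'()") for t in text.lower().split()]
    return [crude_stem(t) for t in tokens if t and t not in STOP_WORDS]


tokens = preprocess("Buildings shaking in the city, aftershocks reported!")
print(tokens)
```

The resulting tokens would then be passed to feature extraction in place of the raw words, so that variants such as "reported" and "reports" fall into the same feature.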

SLIDE 19

Agenda

Motivation
Traditional Data
Social-Networking Data

○ Graphs, Semantics & Machine Learning

Conclusions

SLIDE 20

Conclusions

Credible tweets could be transformative

○ Mission-critical Big Data complement to existing data sources and approaches

Current challenges/opportunities

○ Twitter Data
  ■ Extraction - only 100 tweets at a time (!!!)
  ■ Curation - manual (read: time consuming!!!)
○ Emphasizing Machine Learning ... appears encouraging, BUT ...
  ■ Graph Analytics ... as well ???
  ■ Semantics ... as well ???

SLIDE 21

Q&A

L. I. Lumb (1,2) & J. R. Freemantle (3)

(1) ianlumb@yorku.ca, (2) ilumb@univa.com & (3) james.freemantle@rogers.com

SLIDE 22

Graph Analytics Problem

http://www.jma.go.jp/jma/en/2016_Kumamoto_Earthquake/2016_Kumamoto_Earthquake.html

SLIDE 23

SLIDE 24

Perl script prototype

Acquires tweets with the keyword “earthquake”

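use strict;
use warnings;
use Net::Twitter::Lite::WithAPIv1_1;

# Tweets contain non-ASCII characters; encode output as UTF-8.
binmode( STDOUT, ':encoding(UTF-8)' );

# Authenticate with application credentials (elided here).
my $nt = Net::Twitter::Lite::WithAPIv1_1->new(
    consumer_key        => 'xxxx...xxxxxxx',
    consumer_secret     => 'xxxxxx.....xxxxxxxxxx',
    access_token        => 'xxxxx....xxxxxxxxxxx',
    access_token_secret => 'xxxxx.....xxxxxxxxxxx',
    ssl                 => 1,
);

# Search for the keyword and print the text of each returned status.
my $result = $nt->search("earthquake");
for my $status ( @{ $result->{statuses} } ) {
    print "$status->{text}\n";
}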

SLIDE 25

Resilient Distributed Datasets (RDDs)

Abstraction for in-memory computing
Fault-tolerant, parallel data structures
Cluster-ready
Optionally persistent
Can be partitioned for optimal placement
Manipulated via operators

Zaharia et al., NSDI 2012
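The properties listed above can be illustrated with a toy model. The class below is a pure-Python sketch, not Spark and not its API: `ToyRDD` and its methods are hypothetical names, the data stay on one machine, and the partitioning is a simple round-robin split. It shows only the core idea that operators are recorded lazily, so any partition can be rebuilt from source data by replaying that record, which is the lineage mechanism behind RDD fault tolerance.

```python
import functools


class ToyRDD:
    """Toy stand-in for an RDD: partitioned data plus a recorded operator chain."""

    def __init__(self, partitions, lineage=()):
        self._partitions = partitions  # source data, one list per partition
        self._lineage = lineage        # recorded operators, applied lazily

    @classmethod
    def parallelize(cls, items, num_partitions=2):
        items = list(items)
        return cls([items[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, fn):
        return ToyRDD(self._partitions, self._lineage + (("filter", fn),))

    def compute_partition(self, index):
        """Build (or rebuild after a loss) one partition by replaying the lineage."""
        data = self._partitions[index]
        for kind, fn in self._lineage:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def collect(self):
        return [
            x
            for i in range(len(self._partitions))
            for x in self.compute_partition(i)
        ]

    def reduce(self, fn):
        return functools.reduce(fn, self.collect())


# Hypothetical messages; nothing is computed until collect() or reduce() is called.
tweets = ToyRDD.parallelize(
    ["Earthquake felt downtown", "free phone offer", "earthquake aftershock now"]
)
quake_lengths = tweets.filter(lambda t: "earthquake" in t.lower()).map(len)
print(quake_lengths.collect())
print(quake_lengths.reduce(lambda a, b: a + b))
```

In Spark itself the partitions live on different cluster nodes and can optionally be persisted, so replaying the lineage is needed only for partitions that are lost.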

SLIDE 26

SLIDE 27