SLIDE 1

Tutorial: Complex Event Recognition in the Big Data Era

Nikos Giatrakos (1), Alexander Artikis (2,3), Antonios Deligiannakis (1), Minos Garofalakis (1,4)

(1) Technical University of Crete, Chania, Greece; (2) University of Piraeus, Greece; (3) NCSR Demokritos, Athens, Greece; (4) ATHENA Research & Innovation Center, Athens, Greece

SLIDE 2

Big Data is Big News (and Big Business)

  • Rapid growth due to several information-generating technologies, such as mobile computing, sensornets, and social networks
  • How can we cost-effectively manage and analyze all this data…?

SLIDE 3

Big Data Challenges: The Four V's (… and one D)

  • Volume: Scaling from Terabytes to Exa/Zettabytes
  • Velocity: Processing massive amounts of streaming data
  • Variety: Managing the complexity of multiple relational and non-relational data types and schemas
  • Veracity: Handling inherent uncertainty and noise in the data
  • Distribution: Dealing with massively distributed information
SLIDE 4

Existing Big Data Platforms

  • Map/Reduce, Hadoop, Spark: simple programming models, scalable, replication for robustness. BUT: batch processing of static data; focus on the relational model (tables, SQL)
  • Storm/Heron, Flink, Spark Streaming: simple, scalable dataflow processing. BUT: hard to map from higher-level logic and complex analytics tasks!
  • Large computing clusters – scale out to 1000s of commodity nodes
SLIDE 5

Complex Event Recognition (Event Pattern Matching, CEP)

  • Input: massive streams of time-stamped Simple Derived Events (SDEs) coming from (distributed) sources
  • Output: Complex/Composite Events (CEs) – collections of SDEs and/or CEs satisfying some pattern
  • Patterns defined using a variety of constraints (temporal, spatial, logical, …)
  • Not restricted to simple aggregation!
  • Complex, multi-level CE hierarchies
  • Inherent uncertainty (SDEs, patterns)
SLIDE 6

Complex Event Recognition (Event Pattern Matching, CEP)

[Figure: distributed CER per cluster over local event streams]

SLIDE 14

This Tutorial: CER + Big Data (4Vs + D)

  • Introduction
  • Complex Event Recognition Languages
  • Handling Uncertainty
  • Scalable (Parallel and Distributed) CER
  • Outlook
SLIDE 26

Statistical Relational Learning

[Venn diagram: LOGIC – formal and declarative relational representation; LEARNING – improving performance through experience; PROBABILITIES – sound mathematical foundation for reasoning under uncertainty]

SLIDE 28

Event Calculus in Markov Logic Networks (MLN-EC)

INPUT (Simple Event Stream; Complex Event Definitions + Event Calculus Axioms) › TRANSFORMATION (Compact Knowledge Base) › INFERENCE (Markov Logic Networks) › OUTPUT (Recognised Complex Events)

SLIDE 38

Part 3: Scalable, Distributed Complex Event Recognition

SLIDE 39

How to scale CER in the Big Data Era? Scaling out to:

– Parallel Architectures: Computer Clusters/Grids, The Cloud
– Networked Settings: Dispersed Clusters, Multi-Cloud Platforms

[Image: Blue Gene supercomputer – https://en.wikipedia.org/wiki/Blue_Gene]

SLIDE 40

Scalable - Distributed Complex Event Recognition

Why? Well, it's the Big Data Era: Volume, Velocity, Variety, Veracity (Uncertainty)

[Diagram: centralized architecture, sequential CER – input streams/queries feed a single CER system, which outputs the recognised CEs]

SLIDE 42

Scalable - Distributed Complex Event Recognition

Tools › Parallelism › Elastic Resource Allocation
Performance metrics › Throughput › CPU utilization

[Diagram: clustered architecture, parallel CER – input streams/queries are split across multiple CER instances, whose outputs are merged into the recognised CEs]

SLIDE 43

Scalable Complex Event Recognition

Parallelization & Elasticity in state-of-the-art DSMSs:
› Horizontal scalability in stream processing by design
› Facilities for elastic resource allocation
› Fault tolerance in message processing
› Popular platforms: Apache Storm (Heron/Trident), Spark Streaming

CER Languages & CER Systems:
› High-level CER language support
› Uncertainty-aware CER (sometimes)
› Support for various streaming operations (windowing etc.)

How to bridge the gap?

[Photo: Hackerbrücke, Munich]

SLIDE 44

CER + modern DSMSs: Case Study – Apache Storm

[Diagram: a Storm topology – spouts emit tuples to bolts; each spout and bolt runs as a set of parallel tasks]

SLIDE 45

CER + modern DSMSs: Case Study – Apache Storm

[Diagram: the same Storm topology, with CER logic running inside the bolts' tasks]

CER queries and CER operators go here (manually / via custom automation) – Open-Source Examples

SLIDE 46

CER + modern DSMSs: Case Study – Apache Storm

Data Partitioning – which task does a tuple go to?
› Shuffle Grouping: random tuple distribution
› Fields Grouping: partition based on field(s) – keys
› All Grouping: replicate the tuple to all tasks
› Custom: define your own

[Diagram: the Storm topology with CER logic in the bolts' tasks; CER queries and CER operators go here (manually / via custom automation)]
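The grouping choices above can be sketched as tiny routing functions (an illustrative Python sketch, not Storm's actual Java API; the tuple fields and task counts are hypothetical):

```python
import random

def shuffle_grouping(tup, n_tasks, rng):
    # Shuffle Grouping: random tuple distribution across tasks.
    return [rng.randrange(n_tasks)]

def fields_grouping(tup, n_tasks, key_fields):
    # Fields Grouping: tuples with equal key fields always reach the same task.
    key = tuple(tup[f] for f in key_fields)
    return [hash(key) % n_tasks]

def all_grouping(tup, n_tasks):
    # All Grouping: replicate the tuple to every task.
    return list(range(n_tasks))

call = {"caller_id": "42", "area_id": "7"}   # hypothetical call-event tuple
assert fields_grouping(call, 4, ["caller_id"]) == fields_grouping(call, 4, ["caller_id"])
assert all_grouping(call, 4) == [0, 1, 2, 3]
assert 0 <= shuffle_grouping(call, 4, random.Random(0))[0] < 4
```

Under a fields grouping, a stateful CER task sees every tuple of a given key, which is what key-based partitioning schemes later in this part rely on.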

SLIDE 47

CER + modern DSMSs: Case Study – Spark Streaming

[Diagram: a receiver discretizes the input over time into a DStream of RDDs (RDD@t1 … RDD@t4); CER logic is expressed through transformations, window operators, and output operators, producing a CE stream]

SLIDE 48

Are we done?

CER parallelization must guarantee correctness: patterns in centralized CER ≡ patterns in parallel CER.

Which parallelization scheme to use? Criteria – common pitfalls:
› Support for event selection policies
› Support for event consumption policies
› Support for parallelization of windows
› Parallelization granularity – agility
› Load (im)balance
› Need for replication/communication

SLIDE 49

Categorization of Parallelization Approaches in CER & Parallelization Granularity – Agility

Data Parallelism
› Partition-based [Hirzel et al, DEBS'12] [Mayer et al, DEBS'16]
› State-based [Balkesen et al, DEBS'13]
› Run-based [Balkesen et al, DEBS'13]
› Graph-based [Mayer et al, DEBS'16]
› Hardware-based [Woods et al, PVLDB'10] [CudaCEP, JPDC'12]

Task Parallelism
› Query-based [T-REX, JSS'12]
› Operator-based [Moeller et al, DEBS'09]

SLIDE 50

Recap on Event Selection Policies

› Strict contiguity [Sc]: No intervening events allowed between two sequence events in the pattern.

› Partition contiguity [Pc]: Same as above, but the stream is partitioned into substreams according to a partition attribute. Events must be contiguous within the same partition.

› Skip-till-next-match [Stnm]: Irrelevant events are skipped until an event matching the next pattern component is encountered. If multiple events in the stream can match the next pattern component, only the first of them is considered. E.g., for SEQ(A, B, C) and a1, b1, b2, c1, only a1, b1, c1 will be detected.

› Skip-till-any-match [Stam]: Most flexible (and expensive). Detects every possible occurrence. For the previous example, a1, b2, c1 will also be detected.
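The two skip policies can be made concrete with a toy SEQ matcher (a minimal Python sketch; windows, predicates, and the contiguity policies are deliberately omitted):

```python
def seq_matches(stream, pattern, policy):
    """Toy SEQ matcher over a stream of (event_type, event_id) pairs.

    policy: "stnm" = skip-till-next-match (only the first event that can
    extend a partial match is used); "stam" = skip-till-any-match
    (every candidate extension is explored).
    """
    runs = [((), 0)]                  # (events matched so far, next pattern position)
    matches = []
    for ev in stream:
        ev_type = ev[0]
        next_runs = []
        for events, pos in runs:
            if pos < len(pattern) and ev_type == pattern[pos]:
                extended = events + (ev,)
                if pos + 1 == len(pattern):
                    matches.append(extended)            # complete match
                else:
                    next_runs.append((extended, pos + 1))
                if policy == "stam" or pos == 0:
                    next_runs.append((events, pos))     # branch / allow later starts
            else:
                next_runs.append((events, pos))         # skip the irrelevant event
        runs = next_runs
    return matches

stream = [("A", "a1"), ("B", "b1"), ("B", "b2"), ("C", "c1")]
# skip-till-next-match finds only (a1, b1, c1):
assert seq_matches(stream, ("A", "B", "C"), "stnm") == \
    [(("A", "a1"), ("B", "b1"), ("C", "c1"))]
# skip-till-any-match additionally finds (a1, b2, c1):
assert len(seq_matches(stream, ("A", "B", "C"), "stam")) == 2
```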

SLIDE 51

Event Consumption Policies

› Consume [Co]: A single event is used in a single pattern match (event → 1 match)
› Reuse [Re]: A single event can participate in multiple pattern matches as long as it remains valid, e.g. given window constraints (event → * matches)
› Bounded Reuse [BRe]: A single event can participate in up to N pattern matches as long as it remains valid (event → N matches)

E.g., for SEQ(A, B, C) and a1, b1, b2, c1:
skip-till-any-match & Reuse → (a1, b1, c1), (a1, b2, c1)
skip-till-any-match & Consume → (a1, b1, c1)
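Consumption policies can be sketched as a post-filter over the matches produced under some selection policy (an illustrative Python sketch; real engines enforce this during matching, not afterwards):

```python
from collections import Counter

def apply_consumption(matches, policy, n=1):
    # Post-filter matches: "co" = Consume (each event in at most one match),
    # "re" = Reuse (unlimited), "bre" = Bounded Reuse (each event in at
    # most n matches). Earlier matches take priority.
    if policy == "re":
        return list(matches)
    limit = 1 if policy == "co" else n
    used = Counter()
    kept = []
    for m in matches:
        if all(used[ev] < limit for ev in m):
            used.update(m)
            kept.append(m)
    return kept

m1 = (("A", "a1"), ("B", "b1"), ("C", "c1"))
m2 = (("A", "a1"), ("B", "b2"), ("C", "c1"))   # shares a1 and c1 with m1
assert apply_consumption([m1, m2], "co") == [m1]        # a1, c1 already consumed
assert apply_consumption([m1, m2], "re") == [m1, m2]
assert apply_consumption([m1, m2], "bre", n=2) == [m1, m2]
```

This reproduces the slide's example: under skip-till-any-match, Reuse keeps both matches while Consume keeps only the first.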

SLIDE 52

Generic Stream Window Types

› Time-based Windows [TiW]: The upper bound of the current window is the current timestamp, while the lower bound is determined by a given time-interval parameter.

› Tuple-based Windows [TuW]: The upper and lower bounds of the current window are determined so that it contains a certain number of tuples.
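A minimal sketch of the two window types (illustrative Python; the timestamps and counts are hypothetical):

```python
def time_window_bounds(now, interval):
    # Time-based window [TiW]: the upper bound is the current timestamp,
    # the lower bound follows from the time-interval parameter.
    return (now - interval, now)

def tuple_window(stream, count):
    # Tuple-based window [TuW]: bounds chosen so the window holds exactly
    # `count` tuples (here: the most recent ones).
    return stream[-count:]

assert time_window_bounds(now=100, interval=30) == (70, 100)
assert tuple_window([1, 2, 3, 4, 5], count=3) == [3, 4, 5]
```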

SLIDE 54

Automaton Models: Query-based Parallelization [T-REX, JSS'12]

[Diagram: event streams fan out to replicated automaton engines, one per CER query; each engine keeps stored events with a static index, per-state indexes, and a sequence generator; recognised CEs are delivered to subscribed applications]
SLIDE 56

Operator-based Parallelization [Moeller et al, DEBS'09]

› Allows for multi-query and intra-query optimizations
› Intra-query optimizations → Query Rewriting:
  • Commutativity: OP(A,B) = OP(B,A), e.g. OR
  • Associativity: OP(OP(A,B),C) = OP(A,OP(B,C)), e.g. OR, SEQ
  • Evaluate operators with the rarest events first
› Multi-query optimizations → Operator Sharing

[Diagram: event streams flow through operators 1…n; each operator hosts an automaton with multiple automaton instances; shared operators i and j serve several queries; output: recognised CEs]

SLIDE 58

Partition key-based Parallelization [Hirzel et al, DEBS‟12]

› Claims CER as a special operator MatchRegex(Input_Events) › Includes a PARTITION BY(key) statement for key-based data partitioning › Partition-isolation and uniqueness of longest match for correctness › Implemented as an extension of IBM System S

. . . Event Streams . . . . . . . . .

Splitter Merger

. . . . . . Recognised CEs . . . . . .

… …

C E F A B D 1 1 C E F A B D 1 1 C E F A B D 1 1

Operator Instance 1 Operator Instance n Operator Instance i Key-based Split

SLIDE 59

Partition key-based Parallelization - Examples

[Examples: call events partitioned by Caller ID (one caller, callees 1…n); location updates partitioned by User ID or by Area ID]

SLIDE 60

Pattern-sensitive Partition-based Parallelization [Mayer et al, DEBS'16]

› Introduces pattern-sensitive data partitioning, beyond key-based
› Partition Start (Ps): e → BOOL; Partition End (Pe): (partition, e) → BOOL
› A new event may start, be part of, or terminate a partition
› No partition isolation → replication of an event to multiple partitions
› Can be used to parallelize sliding windows!

[Diagram: a splitter performs a pattern-sensitive split across operator instances 1…n; a merger combines the recognised CEs]

SLIDE 61

Pattern-sensitive Partition - Examples

[Examples: overlapping sliding windows w1…w4 – each window slide starts a new partition; overlapping spatiotemporal partitions – Ps: vessel neighborhood formation, Pe: neighborhood dissolves]

SLIDE 63

State-based Parallelization [Balkesen et al, DEBS'13]

› NFA states (A, B, …) → Processing Units (PUs); NFA edges → pipelines
› Event type-based data partitioning
› Filtering and predicate evaluation per state
› Results are pipelined among states following the NFA structure
› Evaluation load grows towards the final state → Column-based Delayed Processing (CDP)
› Hardware acceleration: FPGAs [Woods et al, PVLDB'10], GPUs [CudaCEP, JPDC'12]

[Diagram: a splitter performs an event type-based split of the stream …, e4, e3, e2, e1; partial matches, e.g. (a1), (a3), (a3b4), (a3f1), (a3b4f1c1), flow between the state PUs until recognised CEs emerge]

SLIDE 66

Run-based Parallelization [Balkesen et al, DEBS'13]

› Split the stream into overlapping batches of size B
› Size of the overlap: S = maximal_match_length − 1 ≤ B/2
› Assign each batch to one PU
› A PU detects all matches that start in the first B−S events of its batch
› Batch-based data partitioning → load balancing

[Diagram over time: with batch size 10 and S = 3, PUs 1–3 (operator instances 1–3) process consecutive overlapping batches]
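The batch-splitting rule can be sketched as follows (illustrative Python; the parameter names are ours, and a real splitter would work incrementally over an unbounded stream):

```python
def overlapping_batches(stream, batch_size, max_match_len):
    # Run-based splitting: consecutive batches overlap by
    # S = max_match_len - 1 events, so no match spanning a batch boundary
    # is lost; the PU owning a batch reports only matches that START in
    # its first batch_size - S events, which avoids duplicate detections.
    overlap = max_match_len - 1
    assert overlap <= batch_size // 2
    step = batch_size - overlap
    return [stream[i:i + batch_size] for i in range(0, len(stream), step)]

batches = overlapping_batches(list(range(10)), batch_size=4, max_match_len=3)
assert batches[0] == [0, 1, 2, 3]
assert batches[1] == [2, 3, 4, 5]     # overlaps the previous batch by S = 2
```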

SLIDE 68

[Comparison table – criteria vs. approaches; individual cell values are not recoverable from this transcript. Criteria: selection policies (Sc, Pc, Stnm, Stam), consumption policies (Co, Re, BRe), window parallelization (TuW, TiW), agility, load balance (LB), replication/communication (Rep/Comm). Approaches: query-based, operator-based, partition key-based, pattern-sensitive, state-based, run-based, hybrid]

No one-size-fits-all solution!

SLIDE 69

[Diagram: MAPE control loop for elasticity – Measure (provisioning/statistics collection) → Analyze → Plan (parallelization adaptation, operator placement) → Actuate (operator migration)]

SLIDE 70

Elastic Resource Allocation in CER – FUGU Approach [Heinze et al, DB3@VLDB'13, DEBS'14]

Key Concepts
› First-Fit Bin Packing for operator placement
› Elastic, workload-unaware resource allocation
  • Local & global threshold-based approach
  • Reinforcement learning approach

[Diagram: queries Q1=6, Q2=3, Q3=5, Q4=3, Q5=1, Q6=2 packed onto PUs 1–5 via first-fit]

SLIDE 71

Elastic Resource Allocation in CER – FUGU Approach [Heinze et al, DB3@VLDB'13, DEBS'14]

Key Concepts
› First-Fit Bin Packing for operator placement
› Elastic, workload-unaware resource allocation

  • Threshold-based approach: track host utilization over time; crossing the upper threshold triggers scale out, crossing the lower threshold triggers scale in
  • Reinforcement learning approach: a look-up table describing the "benefit" of each action based on recent experience, e.g.:

    Utilization | Scale In | No Action | Scale Out
    80%         | 0.28     | 0.7       | 0.88
    90%         | 0.28     | 0.5       | 0.9
    100%        | 0.1      | 0.4       | 1.0
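The threshold-based variant reduces to a simple per-host rule (illustrative Python sketch; the 0.3/0.8 thresholds are made up, not FUGU's actual configuration):

```python
def scaling_decision(utilization, lower=0.3, upper=0.8):
    # Threshold-based elasticity: compare the measured host utilization
    # against a lower and an upper threshold.
    if utilization > upper:
        return "scale out"    # allocate an additional host
    if utilization < lower:
        return "scale in"     # release a host
    return "no action"

assert scaling_decision(0.92) == "scale out"
assert scaling_decision(0.12) == "scale in"
assert scaling_decision(0.55) == "no action"
```

The reinforcement-learning variant replaces the fixed thresholds with the learned per-utilization benefit values from the table above.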

SLIDE 72

Elastic Resource Allocation in CER – Queueing Models [Mayer et al, IEEE BigData'14]

Key Concepts
› Workload-, latency-, and load-shedding-aware scheme
› Choices based on a probabilistic buffer limit (BL)

[Diagram: event streams enter an incoming event queue Q served by C PUs (exponential arrivals, exponential/deterministic departures), feeding an outgoing queue of recognised CEs. Adaptation loop: if nowP = P(Q(t) ≤ BL) < Pthres then C ← C+1; else if lastP > Pthres then C ← C−1; else return C]

SLIDE 73

Elastic Resource Allocation in CER – Time Series-based [Zacheilas et al, IEEE BigData'15]

Key Concepts
› Monitor the event input rate and processing latency
› Predict their values (Gaussian Processes, SVMs, NNs)
› Construct a state graph and compute a shortest path

[Diagram: over a lookahead time horizon H, each window W1…WH has candidate states of 1…k PUs; edges carry costs, e.g. Cost(κ PUs → λ PUs); the shortest path from the initial state gives the scaling plan]

SLIDE 76

Scalable - Distributed Complex Event Recognition

Networked Architecture: Geographically Distributed CER
› A business user poses CER queries (business logic)
› The business logic is independent of geographic locations – it does not specify which operations are performed at each site
› Goal: use the business logic and perform "efficient" CER
› Data centralization is often not possible in Big Data applications

[Figure: distributed CER per cluster over local event streams]

SLIDE 77

Key Ingredients for Distributed CER in Big Data

Networked Architecture: Geographically Distributed CER
› Tools/optimizations for reducing data exchange between clusters
› Architectures that support these tools
› An optimizer: decide the best way to distribute the business logic given the tools & architecture

[Figure: distributed CER per cluster over local event streams]

SLIDE 78

Tool 1 for In-Situ Processing: Push-Pull Paradigm

Key Concept: Do not transmit frequent events unless rare events occur. This may increase latency but decreases network cost.

› Decreases network cost
› Increases latency
› Increases buffer requirements (for cached events that may be pulled later)
› The same idea can speed up CER WITHIN a cluster [Kolchinsky et al, DEBS'15]

Example: different ways of evaluating AND(e1, e2, e3), where some inputs are rare and some frequent:
› e2 is pulled when e1 appears
› e3 is pulled when e1 and e2 appear
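The buffering behaviour behind a pull can be sketched for a two-input AND (illustrative Python; a real operator would also enforce window constraints and expire cached events):

```python
class PushPullAnd:
    # Push-pull evaluation of AND(rare, frequent): the frequent-event
    # source only buffers locally; its events are pulled once the rare
    # event is pushed, so network cost drops at the price of latency.
    def __init__(self):
        self.remote_buffer = []     # frequent events cached at their source
        self.matches = []

    def on_frequent(self, event):
        self.remote_buffer.append(event)      # cached, not transmitted

    def on_rare(self, event):
        pulled = list(self.remote_buffer)     # the pull request fetches the cache
        self.matches += [(event, f) for f in pulled]
        return len(pulled)                    # events transmitted by this pull

op = PushPullAnd()
op.on_frequent("f1")
op.on_frequent("f2")
assert op.matches == []        # nothing is shipped while only frequent events occur
assert op.on_rare("r1") == 2   # the rare event triggers one pull of both
assert ("r1", "f2") in op.matches
```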

SLIDE 79

Push-Pull Approach for CER [Akdere et al, PVLDB'08]

Key Ideas:
› All operators are evaluated at a central site/cluster
› Data is pushed/pulled to the central location based on the desired optimization criteria: bandwidth cost, latency, available memory
› DP + greedy algorithms provided; Pareto optimality between communication cost and latency

Sufficient for Big Data CER?
› Processing is not actually pushed inside the network
› May not be suitable for large-scale distributed topologies

[Diagram: a single site evaluates the operator graph over remote sites TL, TR, BL, BR]

SLIDE 80

Tool 2: Distributed Function Monitoring (DFM)

Key Idea:
› Define a function f() over the data of different clusters
› Communicate only when the function f() crosses a threshold

[Diagram: should these clusters communicate? Each cluster applies f() on a vector summarizing its data]
SLIDE 81

Tool 2: Distributed Function Monitoring (DFM)

Key Idea:
› Define a function f() over the data of different clusters
› Communicate only when the function f() crosses a threshold
› The definition of the function depends on the desired task:
  • Simple aggregates of the data cross a threshold (e.g., SUM)
  • Event frequency statistics have changed significantly (e.g., cosine similarity, Pearson coefficient)
  • The global model of the data has changed significantly (distributed machine learning)
  • The variance of some data has changed significantly
  • And many more…

Key Tool: Geometric Monitoring
› Generic tool
› The DFM problem is much simpler for linear functions
› More efficient solutions may be derived for specific functions

SLIDE 82

Basic Tool: Geometric Monitoring (GM) - Setup

› Track whether f(v(t)) > T
› Works for any f() over the (weighted) average of the local vectors vi(t)

[Diagram: N sites S1…SN, each maintaining local vector(s) vi(t) over its local data stream(s), coordinate with a coordinator for continuous tracking of f(v(t)) > T or f(v(t)) < T, where v(t) = (1/N) · Σ_{i=1..N} vi(t)]

SLIDE 83

Basic GM Scheme [Sharfman et al, SIGMOD'06]

  • e(t): last known average vector
  • Each site checks f() within the ball B(e + Δvi/2, ||Δvi||/2)
  • If the union of the balls B(e + Δvi/2, ||Δvi||/2) crosses the threshold, then v(t) may have crossed the threshold

Key Points
› Monitoring is done in a distributed way
› Sites perform local tests to see whether f() may have crossed T
› Test: find the min/max of f() over a sphere (costly!)
› Many improvements have followed…

[Figure: drift vectors Δv1…Δv5 around e; v(t) lies in the convex hull of the local balls; shaded region where f(v) > T]
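As a concrete instance of the local test, take f(v) = ||v||², whose maximum over a ball is known in closed form (illustrative Python sketch; real GM implementations use more general optimization or safe zones):

```python
import math

def local_ball_test(e, delta_v, f_max_on_ball, T):
    # One site's GM test: its drift ball has centre e + Δv/2 and radius
    # ||Δv||/2. If even the maximum of f over this ball stays below the
    # threshold T, the site can stay silent.
    centre = [ei + d / 2 for ei, d in zip(e, delta_v)]
    radius = math.sqrt(sum(d * d for d in delta_v)) / 2
    return f_max_on_ball(centre, radius) > T

def norm_sq_max(centre, radius):
    # For f(v) = ||v||^2, the maximum over the ball B(c, r) is (||c|| + r)^2.
    return (math.sqrt(sum(x * x for x in centre)) + radius) ** 2

e = [1.0, 0.0]                 # last known average vector
assert not local_ball_test(e, [0.0, 0.0], norm_sq_max, T=2.0)   # no drift: stay silent
assert local_ball_test(e, [2.0, 0.0], norm_sq_max, T=2.0)       # ball crosses T: report
```

If every site's test returns False, the union of the balls stays below T and no communication is needed.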

SLIDE 84

GM Scheme – Key Advances

Key Problems & Solutions (at a glance)
› Make the local test much simpler and more efficient
  • Safe Zones [Keren et al, TKDE'12]: check whether e + Δvi is inside a "safe" convex region
  • Convex Decomposition + Convex Bounds [Lazerson et al, PVLDB'15, KDD'16]: a methodology that helps find a good safe zone

SLIDE 85

GM Scheme – Key Advances (cont)

Key Problems & Solutions (cont.)
› Prediction Models [Giatrakos et al, SIGMOD'12, TODS'14]: if we can predict the values of the local vectors, can we do better?
› Sampling [Giatrakos et al, SIGMOD'16]: with many sites, the chance of communication increases → use sampling
› Sketches [Garofalakis et al, PVLDB'13]: how to combine GM with sketches when the vectors are too large

SLIDE 86

Key Ingredients for Distributed CER in Big Data

› Tools/optimizations for reducing data exchange between clusters: the push-pull paradigm (for regular event operators) and Distributed Function Monitoring/GM
› Architectures that support these tools
› An optimizer: decide the best way to distribute the business logic given the tools & architecture

[Figure: distributed CER per cluster over local event streams]

SLIDE 87

Architectures for Distributed CER in Big Data

› No current support for the desired tools for CER (push-pull paradigm, Distributed Function Monitoring/GM)
› How hard is it to develop them? Simplest approach:
  • Take a CER engine for distributed (intra-cluster) CER
  • Move Distributed Function Monitoring outside the CER engine
  • Easier to write custom code this way

[Diagram: push-pull AND operator trees over e1, e2, e3]

SLIDE 88

Architectures for Distributed CER in Big Data (cont.)

› How hard is it to develop them? Simplest approach (cont.):
  • The CER engine must emit an event on pull requests
  • The event must be handled outside the CER engine
  • Emitting events is simple and already done for output events
  • Pull requests can only occur on state transitions → not too much code to add
  • Hardest task: out-of-order data
  • Let's see an example…

[Diagram: push-pull AND operator trees over e1, e2, e3]

SLIDE 89

The FERARI Approach [Flouris et al, SIGMOD'16]

An Architecture for CER in Big Data Applications

Full-fledged, end-to-end CER solution
› Distributed CER per site (using Storm)
› Adaptive
› Distributed in-network / in-situ processing

SLIDE 90

FERARI [Flouris et al, SIGMOD'16]: Inside each Cluster (implementation using Storm)

[Architecture diagram – components include: statistics collection for the optimizer; pull-request handling; partitioned-state handling; out-of-order processing; inter-site communication (push/pull messages, events, recall of pushed data per site); storage of derived events that may be sent remotely and satisfaction of pull requests; storage of GM-related data; GM monitoring; distributed machine learning operators]

SLIDE 91

In-Network Processing → Operator Placement Problem

Goals:
› exploit data Variety
› push computation to the sites

Optimizer Inputs:
› Business logic
› Network parameters
› Event frequency statistics
› Optimization goals

[Diagram: a network of sites (TL, TR, BL, BR) and the operator graph]

SLIDE 92

In-Network Processing → Operator Placement Problem in Traditional Streaming Settings

› Key concept: exploit data Variety, push computation to the sites → distributed complex event recognition

[Diagram: a network of sites (TL, TR, BL, BR) and the operator graph]

SLIDE 94

FERARI Optimizer

The optimizer is mostly independent of the underlying CER engine.

[Pipeline: annotated CER model → logical plan → physical plan → site configurations; an event stream analyzer and runtime statistics feed the cost model]

› Consider multiple equivalent logical plans via query rewriting
› For each logical plan, consider different physical plans (placements of operators)
› Pick the best plan based on cost
› Generate site configurations (JSON, GM, communication)
› Check at runtime whether to adapt the plan

SLIDE 95

Outlook

SLIDE 96

Future Exciting Research Domains

› IoT Domain
  • 100,000s of nodes, heterogeneous capabilities, not data centers
  • How to detect complex events? In-situ processing is extremely crucial
› Automatic learning & adaptation of CER patterns
  • Patterns of interest change over time
› Effective support for complex analytics operators
  • E.g., time series analysis, machine learning

SLIDE 97

Additional Readings (beyond what is in the tutorial's abstract)

› G. Cugola, A. Margara. Processing Flows of Information: From Data Stream to Complex Event Processing. ACM Computing Surveys, 2012.
› E. Alevizos, A. Skarlatidis, A. Artikis, G. Paliouras. Probabilistic Complex Event Recognition: A Survey. ACM Computing Surveys, 2017.
› G. Cugola, A. Margara. Low Latency Complex Event Processing on Parallel Hardware. J. Parallel Distrib. Comput., 2012.
› T. Heinze, V. Pappalardo, Z. Jerzak, C. Fetzer. Auto-scaling Techniques for Elastic Data Stream Processing. In DEBS, 2014.
› R. Mayer, B. Koldehofe, K. Rothermel. Meeting Predictable Buffer Limits in the Parallel Execution of Event Processing Operators. In IEEE BigData, 2014.
› I. Kolchinsky, I. Sharfman, A. Schuster. Lazy Evaluation Methods for Detecting Complex Events. In DEBS, 2015.

SLIDE 98

Additional Readings (beyond what is in the tutorial's abstract, cont.)

› N. Giatrakos, A. Deligiannakis, M. Garofalakis. Scalable Approximate Query Tracking over Highly Distributed Data Streams. In SIGMOD, 2016.
› D. Keren, I. Sharfman, A. Schuster, A. Livne. Shape Sensitive Geometric Monitoring. IEEE Trans. Knowl. Data Eng., 2012.
› A. Lazerson, I. Sharfman, D. Keren, A. Schuster, M. Garofalakis, V. Samoladas. Monitoring Distributed Streams using Convex Decompositions. PVLDB, 2015.
› A. Lazerson, D. Keren, A. Schuster. Lightweight Monitoring of Distributed Streams. In KDD, 2016.
› M. Garofalakis, D. Keren, V. Samoladas. Sketch-based Geometric Monitoring of Distributed Stream Queries. PVLDB, 2013.