Publish/Subscribe Hans-Arno Jacobsen Bell University Laboratory - - PowerPoint PPT Presentation



SLIDE 1

Publish/Subscribe

Hans-Arno Jacobsen Bell University Laboratory Chair in Software Engineering Middleware Systems Research Group University of Toronto

MIDDLEWARE SYSTEMS RESEARCH GROUP

SLIDE 2

Amazon to Chapters to you ....

  • Monday, October 10th, in Cyberspace: Your book “...” is available at ....
  • Thursday, November 15th, in Toronto: $10 off

SLIDE 3

Business Process Execution & Web Service Composition

[Figure: a network of brokers connects web services (WS), agents, a database, and clients; two business processes are composed of BPEL activities such as Pick, Invoke, Wait, Scope, Receive, Assign, Flow, Switch, and Reply]

SLIDE 4

Other “Similar” Applications

  • Selective information dissemination
  • Location-based services
  • Personalization
  • Alerting services
  • Application integration
  • Job scheduling
  • Monitoring, surveillance, and control
  • Network and distributed system management
  • Workforce management
  • (Scientific) workload management
  • Business activity monitoring
  • Business process management, monitoring, and execution

SLIDE 5

What Relates All These Applications?

Asynchronous state transitions, captured as events, drive and underlie all of these applications and the infrastructures implementing them.

These applications require middleware support for event processing.

Publish/Subscribe is ideally suited to fulfill these requirements.

SLIDE 6

Publish/Subscribe

SLIDE 7

Publish/Subscribe

[Figure: publishers (the NYSE, NASDAQ, and TSX stock markets) send publications such as IBM = 84, MSFT = 27, INTC = 19, JNJ = 58, ORCL = 12, HON = 24, AMGN = 58 to the broker(s); subscribers register subscriptions such as IBM > 85, ORCL < 10, JNJ > 60 and receive notifications]
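The broker's role in this figure can be sketched in a few lines of Python. This is a minimal, single-process illustration, not the system behind the slides: the `Broker` class, its `subscribe`/`publish` methods, and the (symbol, operator, threshold) form of a subscription are all hypothetical.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Comparison operators a subscription may use (an illustrative subset).
OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}

Subscription = Tuple[str, str, float]   # (symbol, operator, threshold), hypothetical form
Notify = Callable[[str, float], None]   # callback invoked on a match


class Broker:
    """Hypothetical single broker decoupling stock-quote publishers from subscribers."""

    def __init__(self) -> None:
        self._subs: List[Tuple[Subscription, Notify]] = []

    def subscribe(self, sub: Subscription, notify: Notify) -> None:
        self._subs.append((sub, notify))

    def publish(self, symbol: str, price: float) -> int:
        """Deliver a quote to every matching subscriber; return how many were notified."""
        delivered = 0
        for (sym, op, threshold), notify in self._subs:
            if sym == symbol and OPS[op](price, threshold):
                notify(symbol, price)
                delivered += 1
        return delivered


if __name__ == "__main__":
    broker = Broker()
    inbox: List[Tuple[str, float]] = []
    for sub in [("IBM", ">", 85), ("ORCL", "<", 10), ("JNJ", ">", 60)]:
        broker.subscribe(sub, lambda s, p: inbox.append((s, p)))
    # Quotes from the figure: none of them satisfies a subscription.
    for sym, price in [("IBM", 84), ("MSFT", 27), ("INTC", 19), ("JNJ", 58),
                       ("ORCL", 12), ("HON", 24), ("AMGN", 58)]:
        broker.publish(sym, price)
    print(inbox)                 # []
    broker.publish("IBM", 86)    # a later quote crosses the IBM threshold
    print(inbox)                 # [('IBM', 86)]
```

With the figure's quotes no subscriber is notified, because IBM = 84, ORCL = 12, and JNJ = 58 satisfy none of IBM > 85, ORCL < 10, and JNJ > 60. The publisher never learns whether anyone received a quote; that knowledge stays with the broker.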

SLIDE 8

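That’s Like Data Base Querying!

  • Database querying: a query is evaluated against a lot of data (sets of tuples); the query is about the past.
  • Publish/subscribe: a publication is evaluated against a lot of subscriptions to find the matching subscriptions; subscriptions are about the future.

A query and a subscription are very similar, as are a set of tuples and a publication. However, the two problem statements are inverse: the database stores the data and receives the query, whereas the publish/subscribe system stores the subscriptions and receives the publication.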

SLIDE 9

Key Benefit of Publish/Subscribe

Decoupling of publishers and subscribers:

  • Publishers do not need to know subscribers
  • Publishers and subscribers do not need to be up simultaneously
  • Amenable to physical distribution

SLIDE 10

Benefits of Publish/Subscribe

  • Independence of participants lends itself well to distributed system development
  • De-coupled development and processing; (dynamic) system evolution
  • Interaction with a large number of entities is facilitated
  • Naturally supports non-continuous operation
  • Potential for scalability and fault tolerance
  • Open for (legacy) system integration on either end

Of course, it is not a one-size-fits-all paradigm, but it is a good solution for certain kinds of problems.

SLIDE 11

Publish/Subscribe Matching Problem

Given a set of subscriptions, S, and a publication, e, return all s in S matched by e.

e is referred to as an event or a publication. Splitting hairs:

  • An event is a state transition of interest in the environment
  • A publication is the information about e submitted to the publish/subscribe system

A simple problem statement, widely applicable, and with lots of open questions.

SLIDE 12

Problem Instantiations I

  • Text / search strings (information filtering)
  • Semi-structured data / queries: attribute-value pairs / attribute-operator-value predicates; XML, HTML
  • Tree-structured data / path expressions: XML / XPath expressions
  • Graph-structured data / graph queries: RDF / RDF queries (e.g., SPARQL)
  • Regular languages / regular expressions
  • Tables / SQL queries

SLIDE 13

Problem Instantiations II

Different matching semantics:

  • Crisp
  • Approximate, similar
  • n-of-m (n of m predicates match)
  • Probability of match
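Crisp and n-of-m semantics differ only in how many predicates must hold. A minimal sketch, assuming predicates are Python callables over a publication dictionary; the function names and the attributes `price`, `size`, and `rating` are hypothetical:

```python
from typing import Callable, Dict, List

Publication = Dict[str, float]
Predicate = Callable[[Publication], bool]


def satisfied(preds: List[Predicate], pub: Publication) -> int:
    """Count the predicates that hold; a predicate on a missing attribute counts as false."""
    count = 0
    for p in preds:
        try:
            if p(pub):
                count += 1
        except KeyError:        # attribute absent from the publication
            pass
    return count


def crisp_match(preds: List[Predicate], pub: Publication) -> bool:
    """Crisp semantics: all m predicates must hold."""
    return satisfied(preds, pub) == len(preds)


def n_of_m_match(preds: List[Predicate], pub: Publication, n: int) -> bool:
    """n-of-m semantics: at least n of the m predicates must hold."""
    return satisfied(preds, pub) >= n


# Hypothetical subscription with m = 3 predicates.
sub: List[Predicate] = [
    lambda e: e["price"] < 20,
    lambda e: e["size"] == 42,
    lambda e: e["rating"] > 4,
]
pub = {"price": 15.0, "size": 42.0, "rating": 3.0}
print(crisp_match(sub, pub))       # False: rating > 4 fails
print(n_of_m_match(sub, pub, 2))   # True: 2 of 3 predicates hold
```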

SLIDE 14

Problem Instantiations III

Centralized and distributed instantiation

Networking architecture:

  • Internet (as an overlay network)
  • Peer-to-peer style interface (DHT, table lookup)
  • With mobile publishers, subscribers, and brokers
  • Ad hoc network

SLIDE 15

Publish/Subscribe Models

Channel-based model

Subscribe & publish to a channel

Topic-based model

… topics and topic hierarchy

Type-based model

… typed objects

Content-based model

… to content of messages

Subject-spaces (State-based model)

Maintain state in publications and subscriptions

SLIDE 16

Channel-based

[Figure: two publishers publish to a broadcast channel, which delivers their messages to three subscribers]
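A channel-based model performs no filtering: every subscriber of a channel receives every message published to it. A minimal in-process sketch, where the `ChannelBus` class and the channel name "broadcast" are hypothetical:

```python
from typing import Callable, Dict, List

Handler = Callable[[str], None]


class ChannelBus:
    """Hypothetical channel-based pub/sub: a subscriber gets every message on its channel."""

    def __init__(self) -> None:
        self._channels: Dict[str, List[Handler]] = {}

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._channels.setdefault(channel, []).append(handler)

    def publish(self, channel: str, message: str) -> int:
        """Broadcast to all subscribers of the channel; no content filtering is done."""
        handlers = self._channels.get(channel, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = ChannelBus()
inboxes: Dict[str, List[str]] = {"s1": [], "s2": [], "s3": []}
for inbox in inboxes.values():
    bus.subscribe("broadcast", inbox.append)
bus.publish("broadcast", "from publisher 1")
bus.publish("broadcast", "from publisher 2")
print(inboxes["s3"])  # ['from publisher 1', 'from publisher 2']
```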

SLIDE 17

Topic-based publish/subscribe

[Figure: a publication and the topic hierarchy news → Canada → {politics, sports → soccer} and news → US → {politics, sports → soccer}]
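One common way to use a topic hierarchy is to let a subscription to a topic also cover all of its subtopics. The sketch below assumes that semantics and writes topics as slash-separated paths (for example "news/Canada/sports/soccer"); both choices, the `TopicBus` class, and the subscriber names are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class TopicBus:
    """Hypothetical topic-based pub/sub over a slash-separated topic hierarchy.

    Assumption: subscribing to a topic also subscribes to all of its subtopics.
    """

    def __init__(self) -> None:
        self._subs: DefaultDict[str, Set[str]] = defaultdict(set)  # topic -> subscriber IDs

    def subscribe(self, subscriber: str, topic: str) -> None:
        self._subs[topic.strip("/")].add(subscriber)

    def publish(self, topic: str) -> List[str]:
        """Return the subscribers of the topic and of every ancestor topic, sorted."""
        parts = topic.strip("/").split("/")
        recipients: Set[str] = set()
        for i in range(1, len(parts) + 1):
            recipients |= self._subs.get("/".join(parts[:i]), set())
        return sorted(recipients)


topics = TopicBus()
topics.subscribe("alice", "news/Canada")
topics.subscribe("bob", "news/US/sports/soccer")
topics.subscribe("carol", "news")
print(topics.publish("news/Canada/sports/soccer"))  # ['alice', 'carol']
print(topics.publish("news/US/sports/soccer"))      # ['bob', 'carol']
print(topics.publish("news/US/politics"))           # ['carol']
```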

SLIDE 18

The Content-based Model

Language and Data model

Conjunctive Boolean functions over predicates; predicates are attribute-operator-value triples, e.g.:

[class,=,trigger]

Subscriptions are conjunctions of predicates

[class,=,trigger],[appl,=,payroll],[gid,=,g001]

Publications are sets of attribute-value pairs

[class,trigger],[appl,printer],[gid,g007]

Matching semantics

A subscription matches if all its predicates are matched
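The matching semantics can be written down directly. The sketch below is a brute-force solution of the matching problem (test every subscription against the publication), assuming predicates are (attribute, operator, value) tuples, publications are dictionaries, and a predicate on an absent attribute is false; the operator set beyond "=" and the function names are assumptions.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

Predicate = Tuple[str, str, Any]   # (attribute, operator, value)
Publication = Dict[str, Any]       # attribute -> value

# Assumed operator set; the slide's examples only use "=".
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def matches(subscription: List[Predicate], publication: Publication) -> bool:
    """A subscription matches iff every one of its predicates is satisfied."""
    for attr, op, value in subscription:
        if attr not in publication:               # missing attribute: predicate is false
            return False
        if not OPS[op](publication[attr], value):
            return False
    return True


def match_all(subscriptions: Dict[str, List[Predicate]], publication: Publication) -> List[str]:
    """Brute-force answer to the matching problem: IDs of all matched subscriptions."""
    return [sid for sid, sub in subscriptions.items() if matches(sub, publication)]


s = [("class", "=", "trigger"), ("appl", "=", "payroll"), ("gid", "=", "g001")]
p = {"class": "trigger", "appl": "printer", "gid": "g007"}
print(matches(s, p))  # False: appl and gid differ
```

Applied to the slide's own example, the publication does not match the subscription, since [appl,=,payroll] and [gid,=,g001] are not satisfied. Brute force costs one full evaluation per subscription, which the matching algorithms later in the deck aim to avoid.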

SLIDE 19

Content-based routing

Distributed publish/subscribe:

  • Network of publish/subscribe brokers
  • Subscriptions and publications are injected into the network at the closest edge broker
  • A routing protocol distributes subscriptions throughout the network
  • The network routes relevant publications to interested subscribers
  • Routing is based on content, not on addresses, which are not available
  • Subscriptions may change dynamically
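A simple way to realize these steps is subscription flooding with reverse-path forwarding: each broker records which hop a subscription came from, and publications travel back along those hops. The sketch below assumes an acyclic broker overlay, predicates limited to =, <, and >, and hypothetical class, broker, and client names; it is one illustrative routing scheme, not the protocol used by any particular system named in these slides.

```python
import operator
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Optional, Set, Tuple

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}
Sub = Tuple[Tuple[str, str, Any], ...]    # conjunction of (attribute, operator, value)
Pub = Dict[str, Any]


def matches(sub: Sub, pub: Pub) -> bool:
    return all(a in pub and OPS[op](pub[a], v) for a, op, v in sub)


class RoutingBroker:
    """Hypothetical broker in an acyclic overlay that floods subscriptions.

    Each routing-table entry maps a subscription to the hop it arrived from
    (a neighbor broker's name or a local client's name); publications travel
    back along those hops (reverse-path forwarding).
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.neighbors: Dict[str, "RoutingBroker"] = {}
        self.table: List[Tuple[Sub, str]] = []                # (subscription, last hop)
        self.delivered: DefaultDict[str, List[Pub]] = defaultdict(list)

    def connect(self, other: "RoutingBroker") -> None:
        self.neighbors[other.name] = other
        other.neighbors[self.name] = self

    def subscribe(self, sub: Sub, hop: str) -> None:
        """Record the subscription, then forward it to every other neighbor."""
        self.table.append((sub, hop))
        for name, broker in self.neighbors.items():
            if name != hop:
                broker.subscribe(sub, self.name)

    def publish(self, pub: Pub, hop: Optional[str] = None) -> None:
        """Send pub once toward each hop holding a matching subscription, never back to hop."""
        targets: Set[str] = {h for s, h in self.table if h != hop and matches(s, pub)}
        for t in sorted(targets):
            if t in self.neighbors:
                self.neighbors[t].publish(pub, self.name)
            else:
                self.delivered[t].append(pub)              # t is a local client


# Overlay B1 - B2 - B3; client c1 attaches to B3, the publisher to B1.
b1, b2, b3 = RoutingBroker("B1"), RoutingBroker("B2"), RoutingBroker("B3")
b1.connect(b2)
b2.connect(b3)
b3.subscribe((("class", "=", "stock"), ("price", ">", 50)), "c1")
b1.publish({"class": "stock", "name": "HP", "price": 55})
b1.publish({"class": "stock", "name": "HP", "price": 40})
print(b3.delivered["c1"])  # [{'class': 'stock', 'name': 'HP', 'price': 55}]
```

This sketch floods every subscription to every broker and supports neither unsubscription nor covering or advertisement-based optimizations that shrink routing tables; a cyclic overlay would additionally need duplicate suppression.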

SLIDE 20

Content-based Routing

Publisher Subscriber

  • 1. Advertise
  • 2. Subscribe
  • 3. Publish

Labels: event-based, decoupled, flexible, responsive, content routing, declarative

A (advertisement): [class, =, stock], [name, =, HP], [price, >, 50]
S (subscription): [class, =, stock], [name, =, *], [price, >, 50]
P (publication): [class, stock], [name, *], [price, 50]

SLIDE 21

Applications

  • RFID and sensor networks
  • Service oriented architecture
  • Workflows, business processes and job scheduling
  • Supply chain and logistics

[Figure: event-based applications A–F exchanging events such as Light, Callback, Razor, SKU, Transform, Fault, Temperature, Invoke, Loan, Order, Delivered, In flight, Job, and Trigger]

SLIDE 22

Publish/Subscribe in Industry

Standards

  • CORBA Event Service, CORBA Notification Service
  • OMG Data Distribution Service
  • Java Message Service, WS-Eventing, WS-Notification, INFO-D (Grid Forum)

Emerging technologies

  • RSS aggregators: PubSub.com, FeedTree
  • Real-time data dissemination: TIBCO, RTI Inc., Mantara Software
  • Application integration: Softwired
  • Hardware-based brokers: Sarvega (Intel), Solace Systems, DataPower (IBM)

SLIDE 23

Publish/Subscribe in Academia

Research projects:

  • Elvin (Australia)
  • Gryphon (IBM)
  • Hermes (Cambridge)
  • SIENA (Boulder)
  • REBECA (Darmstadt)
  • ToPSS (UofT)
  • PADRES (UofT)

SLIDE 24

The Toronto Publish/Subscribe System Family (ToPSS)

Matching algorithms: language expressiveness, scalability, speed

Routing protocols: network architectures, scalability

Higher-level abstractions: workflow execution, monitoring

  • S-ToPSS (semantic)
  • X-ToPSS (XML matching)
  • A-ToPSS (approximate)
  • persistent-ToPSS (subject spaces)
  • L-ToPSS (location-based)
  • ToPSS (matching)
  • M-ToPSS (mobile)
  • Ad hoc-ToPSS (ad hoc networking)
  • Federated-ToPSS (federation of ToPSS brokers)
  • Rb-ToPSS (rule-based)
  • P2P-ToPSS (peer-to-peer)
  • LB-ToPSS (load balancing)
  • FT-ToPSS (fault tolerance)
  • Historic-ToPSS (historic data)
  • CS-ToPSS (composite subs)
  • BPEL-ToPSS (BPEL execution)
  • JS-ToPSS (job scheduling)

SLIDE 25

Overall Project Vision

A Real-Time Event-driven Enterprise

SLIDE 26

Matching and Content-based Routing

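[Figure: inside a broker, advertisements (A), subscriptions (S), and publications (P) arrive on an input queue, are matched against the routing tables, and are forwarded to output queues]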

SLIDE 27

Matching Algorithms

For solving the pub/sub matching problem:

  • Tree-based algorithms
  • Graph-based algorithms
  • Automaton-based algorithms (NFA, DFA)
  • Two-stage algorithms: predicate matching, then subscription matching

SLIDE 28

Predicate Matching

SLIDE 29

Predicate matching problem

Given a set P of predicates and an event e, identify all predicates p of P that evaluate to true under e.

Example:

e = {…, (price, 5), (color, white), …}
p1: price > 5; p2: color = red; p3: price < 4
p1 is false (5 > 5 does not hold), p2 is false, and p3 is false (5 < 4 does not hold)

The result is a predicate bit vector indexed by predicate ID (p1, p2, p3, ...), with a 1 for each predicate that evaluates to true.
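The predicate bit vector can be computed by evaluating each predicate once against the event. A minimal sketch, assuming predicates are (attribute, operator, value) tuples and that a predicate on an attribute missing from e is false; p4: price = 5 is taken from the finite-domain example later in the deck so that one bit is set, and the function name is hypothetical.

```python
import operator
from typing import Any, Dict, List, Tuple

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}
Predicate = Tuple[str, str, Any]   # (attribute, operator, value)


def predicate_bit_vector(preds: List[Predicate], event: Dict[str, Any]) -> List[int]:
    """Bit i is 1 iff predicate i evaluates to true under the event."""
    return [
        int(attr in event and OPS[op](event[attr], value))
        for attr, op, value in preds
    ]


preds = [
    ("price", ">", 5),        # p1
    ("color", "=", "red"),    # p2
    ("price", "<", 4),        # p3
    ("price", "=", 5),        # p4 (from the finite-domain example)
]
e = {"price": 5, "color": "white"}
print(predicate_bit_vector(preds, e))  # [0, 0, 0, 1]
```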

SLIDE 30

Data structure overview

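[Figure: a hash table on attribute name (price, color, ...) points to a predicate index for each attribute; the pair (price, 5) of e = {…, (price, 5), …} is looked up through the price entry]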

SLIDE 31

Predicate Index

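[Figure: predicate index for price with one list per operator (=, <, >, !=); list nodes hold constants (1, 2, 4, 5, 7) and the IDs of predicates using them (p1, p3, p4, p6, p7, p9, p11, p13, p17)]

  • One ordered linked list per operator
  • Insert, delete, and match are O(n) operations (per attribute name in e and per operator)
  • Alternatively, use a B-tree, B+-tree, etc.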
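A list-based predicate index along these lines can be sketched with Python lists kept in sorted order standing in for the ordered linked lists; insert, delete, and match all stay linear, as stated above. The class and method names are hypothetical, integer constants are assumed, and the example predicates are the ones defined on the finite-domain slide.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Entry = Tuple[int, str]  # (constant, predicate ID), kept sorted by constant


class ListPredicateIndex:
    """Hypothetical list-based predicate index: per attribute, one sorted list per operator."""

    OPERATORS = ("=", "<", ">", "!=")

    def __init__(self) -> None:
        # attribute -> operator -> sorted list of (constant, predicate ID)
        self._lists: DefaultDict[str, Dict[str, List[Entry]]] = defaultdict(
            lambda: {op: [] for op in self.OPERATORS}
        )

    def insert(self, pred_id: str, attr: str, op: str, const: int) -> None:
        bisect.insort(self._lists[attr][op], (const, pred_id))   # O(n) element shift

    def delete(self, pred_id: str, attr: str, op: str, const: int) -> None:
        self._lists[attr][op].remove((const, pred_id))            # O(n) scan

    def match(self, event: Dict[str, int]) -> List[str]:
        """Linearly scan every operator list of every attribute present in the event."""
        hits: List[str] = []
        for attr, x in event.items():
            lists = self._lists.get(attr)
            if lists is None:
                continue
            hits += [pid for c, pid in lists["="] if x == c]
            hits += [pid for c, pid in lists["<"] if x < c]    # predicate: attr < c
            hits += [pid for c, pid in lists[">"] if x > c]    # predicate: attr > c
            hits += [pid for c, pid in lists["!="] if x != c]
        return sorted(hits)


idx = ListPredicateIndex()
for pid, op, c in [("p1", ">", 5), ("p3", "<", 4), ("p4", "=", 5), ("p9", "!=", 5)]:
    idx.insert(pid, "price", op, c)
print(idx.match({"price": 5}))  # ['p4']
print(idx.match({"price": 3}))  # ['p3', 'p9']
```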

SLIDE 32

Observations

Countable domain types with small cardinality:

  • Integer intervals
  • Collections (enums)
  • A set of tags

Examples:

  • price: [0, 1000] models a variety of prices
  • color, city, state, country, size, weight
  • All tags defined in a given DTD

The predicate domain is often context dependent, but limited in size, e.g., prices of cars vs. prices of groceries.

SLIDE 33

Predicate Matching for Finite Domains

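[Figure: table-based predicate index for price: [0, 1000], with one row per domain value (1, 2, 3, …, 1000) and one column per operator (=, <, >, !=); cells hold predicate IDs (p1, p3, p4, p6, p7, p9, p11, p13, p17), and the event e = {…, (price, 5), …} is matched against the table]

p1: price > 5; p3: price < 4; p4: price = 5; p9: price != 5; ...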
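Because the domain is finite, one possible table layout stores, for every domain value v, the IDs of all predicates that are true when the attribute equals v; matching then becomes a single row lookup. This is an assumed realization for illustration, not necessarily the exact table on the slide, and the class name is hypothetical.

```python
from typing import List, Set


class TablePredicateIndex:
    """Hypothetical table-based index for one attribute with a finite integer domain.

    Row v of the table holds the IDs of all predicates that are true when the
    attribute equals v, so matching is a single row lookup.
    """

    def __init__(self, lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi
        self.rows: List[Set[str]] = [set() for _ in range(hi - lo + 1)]

    @staticmethod
    def _holds(op: str, x: int, const: int) -> bool:
        return {"=": x == const, "<": x < const, ">": x > const, "!=": x != const}[op]

    def insert(self, pred_id: str, op: str, const: int) -> None:
        # Cost is proportional to the domain size: every row is examined.
        for x in range(self.lo, self.hi + 1):
            if self._holds(op, x, const):
                self.rows[x - self.lo].add(pred_id)

    def delete(self, pred_id: str) -> None:
        for row in self.rows:
            row.discard(pred_id)

    def match(self, x: int) -> List[str]:
        if not self.lo <= x <= self.hi:
            return []
        return sorted(self.rows[x - self.lo])


price = TablePredicateIndex(0, 1000)
for pid, op, c in [("p1", ">", 5), ("p3", "<", 4), ("p4", "=", 5), ("p9", "!=", 5)]:
    price.insert(pid, op, c)
print(price.match(5))  # ['p4']
```

The design trades memory (one row per domain value) and insertion cost for constant-time lookup per attribute value.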

SLIDE 34

Predicate Matching Symmetries

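[Figure: T/F truth values of the predicates on price: [0, 1000], laid out along the domain, for e = {…, (price, 5), …}]

For a fixed event value (here 5), the truth values along the domain of predicate constants c form contiguous runs:

  • price = c is true only at c = 5
  • price > c is true for every c < 5 and false from 5 on
  • price < c is false up to 5 and true for every c > 5
  • price != c is true everywhere except at c = 5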
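Since the satisfied predicates form contiguous runs, per-operator lists sorted by constant can be matched with two binary searches instead of a full scan. A minimal sketch assuming integer constants (the `(x + 1, "")` search key relies on that) and non-empty predicate IDs; the function name and list layout are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple


def match_sorted(lists: Dict[str, List[Tuple[int, str]]], x: int) -> List[str]:
    """Match integer value x against per-operator lists sorted by constant.

    Truth values along each sorted list form contiguous runs, so binary search
    finds the run boundaries instead of testing every predicate.
    """
    hits: List[str] = []
    for op, entries in lists.items():
        lo = bisect.bisect_left(entries, (x, ""))      # first entry with constant >= x
        hi = bisect.bisect_left(entries, (x + 1, ""))  # first entry with constant > x
        if op == ">":      # attr > c holds for c < x: a prefix
            run = entries[:lo]
        elif op == "<":    # attr < c holds for c > x: a suffix
            run = entries[hi:]
        elif op == "=":    # attr = c holds for c == x: the middle run
            run = entries[lo:hi]
        elif op == "!=":   # attr != c holds everywhere except the middle run
            run = entries[:lo] + entries[hi:]
        else:
            raise ValueError(f"unsupported operator {op!r}")
        hits.extend(pid for _, pid in run)
    return sorted(hits)


lists = {">": [(5, "p1")], "<": [(4, "p3")], "=": [(5, "p4")], "!=": [(5, "p9")]}
print(match_sorted(lists, 5))  # ['p4']
```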

SLIDE 35

Experiments and evaluation

SLIDE 36

Predicate Matching Performance (list-based scheme)

[Figure: list-based scheme results for domain sizes 250, 10,000, and 100,000; axis marks at 500 K and 4.5 M]

SLIDE 37

Predicate Matching Performance (table-based scheme)

[Figure: table-based scheme results for domain sizes 250, 10,000, and 100,000; axis marks at 500 K and 4.5 M]

SLIDE 38

Predicate Matching Performance (table-based vs. list-based scheme) for the mixed domain

[Figure: axis mark at 4.5 M]

SLIDE 39

Predicate Matching Data Structure Size (table-based vs. list-based scheme)

[Figure: data structure sizes; axis marks at 20 MB, 1.4 GB, and 4 M]

SLIDE 40

Subscription Matching

SLIDE 41

Multiple one-dimensional indexes

One-dimensional indexes:

  • Hash tables
  • B-trees
  • Interval skip lists

Algorithms:

  • Counting algorithm
  • Hanson algorithm
  • Propagation algorithm

SLIDE 42

Counting algorithm

Subscriptions consist of a set of predicates:

  • S1: (2 < A < 4) & (B = 6) & (C > 4) ⇒ pA: (2 < A < 4), pB: (B = 6), pC: (C > 4)
  • S2: (2 < A < 4) & (C = 3) ⇒ pA: (2 < A < 4), pC: (C = 3)

A subscription matches the event if all of its predicates are satisfied.

Idea: count the number of satisfied predicates per subscription.
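The counting idea can be sketched as follows: an inverted index maps each predicate to the subscriptions containing it, each satisfied predicate increments those subscriptions' counters, and a subscription matches when its counter equals its number of predicates. In this sketch predicate matching is done by direct evaluation rather than with one-dimensional indexes, each subscription is assumed to list distinct predicate IDs, and the function name and the ID "pC3" (for C = 3, to keep it distinct from C > 4) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, float]


def counting_match(
    predicates: Dict[str, Callable[[Event], bool]],
    subscriptions: Dict[str, List[str]],
    event: Event,
) -> List[str]:
    """Counting algorithm: a subscription matches when its hit count equals its size."""
    # Inverted index: predicate ID -> subscriptions containing it (built once in practice).
    pred_to_subs: DefaultDict[str, List[str]] = defaultdict(list)
    for sid, pids in subscriptions.items():
        for pid in pids:
            pred_to_subs[pid].append(sid)

    # Stage 1: predicate matching (here by direct evaluation instead of indexes).
    satisfied = [pid for pid, test in predicates.items() if test(event)]

    # Stage 2: count satisfied predicates per subscription.
    count: DefaultDict[str, int] = defaultdict(int)
    for pid in satisfied:
        for sid in pred_to_subs[pid]:
            count[sid] += 1
    return sorted(sid for sid, c in count.items() if c == len(subscriptions[sid]))


preds = {
    "pA": lambda e: "A" in e and 2 < e["A"] < 4,
    "pB": lambda e: e.get("B") == 6,
    "pC": lambda e: "C" in e and e["C"] > 4,   # C > 4, used by S1
    "pC3": lambda e: e.get("C") == 3,          # C = 3, used by S2 (hypothetical ID)
}
subs = {"S1": ["pA", "pB", "pC"], "S2": ["pA", "pC3"]}
print(counting_match(preds, subs, {"A": 3, "B": 6, "C": 5}))  # ['S1']
print(counting_match(preds, subs, {"A": 3, "C": 3}))          # ['S2']
```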

SLIDE 43

Data structures for the counting algorithm

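[Figure: indexes on attributes A, B, and C lead to predicates p1–p5; each predicate lists the subscriptions that contain it (S1: p1, p2, p4; S2: p1, p3; S3: p3, p5); a table stores, per subscription, the total number of predicates (S1: 3, S2: 2, S3: 2) and a count of satisfied predicates, initially 0]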

SLIDE 44

Counting algorithm

[Figure: for the event E: (A,5),(B,6), the indexes on A, B, and C yield the satisfied predicates; the count of every subscription containing a satisfied predicate is incremented and compared with that subscription's total number of predicates]

SLIDE 45

Subscription Matching

[Figure: two subscription-matching variants, bit-matrix based and list based; each combines the predicate bit vector with a hit vector and a preds-per-sub count, indexed by subscription ID (s1, s3, s4, s6, s7, s9, s11, s13, s17)]

Plus a subscription-predicate association to support deletion.
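The bit-matrix idea can be approximated with machine words: treat the predicate bit vector as an integer and store each subscription as a mask of its predicate IDs; a subscription matches when every bit of its mask is set. This is a simplified sketch under that assumption that omits the hit vector and preds-per-sub counters; the function names are hypothetical, and the example reuses the subscriptions from the counting-algorithm data-structure slide (S1: p1, p2, p4; S2: p1, p3; S3: p3, p5).

```python
from typing import Dict, Iterable, List


def to_mask(pred_ids: Iterable[int]) -> int:
    """Set bit i for each predicate ID i."""
    mask = 0
    for i in pred_ids:
        mask |= 1 << i
    return mask


def bitvector_match(pred_bits: int, sub_masks: Dict[str, int]) -> List[str]:
    """A subscription matches iff all of its predicate bits are set in the bit vector."""
    return sorted(sid for sid, m in sub_masks.items() if (pred_bits & m) == m)


sub_masks = {"S1": to_mask([1, 2, 4]), "S2": to_mask([1, 3]), "S3": to_mask([3, 5])}
pred_bits = to_mask([1, 2, 3, 4])        # p1-p4 satisfied, p5 not
print(bitvector_match(pred_bits, sub_masks))  # ['S1', 'S2']
```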

SLIDE 46

Clustering Algorithm

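[Figure: subscriptions S1: p1, p2, p4; S2: p1, p3, p5; S3: p1, p4 share the access predicate p1 and are reached through the indexes on A, B, and C. The access predicate's list of clusters groups subscriptions by the number of remaining predicates: the 2-predicate cluster holds s1 (p2, p4) and s2 (p3, p5); the 1-predicate cluster holds s3 (p4). The event E: (A,5),(B,6),(C,3) satisfies p1, p2, p3, p4, which are set in the predicate bit vector]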
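The clustering idea can be sketched as follows, assuming each subscription's access predicate is given (here p1 for all three, as in the figure) and that predicate matching has already produced the set of satisfied predicate IDs. Function names are hypothetical, and clusters are kept in plain dictionaries keyed by access predicate and then by the number of remaining predicates.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, List, Set, Tuple

Cluster = List[Tuple[str, FrozenSet[str]]]        # (subscription ID, remaining predicates)
Clusters = Dict[str, Dict[int, Cluster]]          # access predicate -> size -> cluster


def build_clusters(subs: Dict[str, Set[str]], access: Dict[str, str]) -> Clusters:
    """Group subscriptions by access predicate, then by number of remaining predicates."""
    clusters: DefaultDict[str, DefaultDict[int, Cluster]] = defaultdict(
        lambda: defaultdict(list)
    )
    for sid, preds in subs.items():
        a = access[sid]
        rest = frozenset(preds - {a})
        clusters[a][len(rest)].append((sid, rest))
    return clusters


def cluster_match(clusters: Clusters, satisfied: Set[str]) -> List[str]:
    """Only clusters whose access predicate is satisfied are examined."""
    result: List[str] = []
    for a, by_size in clusters.items():
        if a not in satisfied:
            continue                       # skip this access predicate's clusters entirely
        for members in by_size.values():
            for sid, rest in members:
                if rest <= satisfied:      # all remaining predicates hold
                    result.append(sid)
    return sorted(result)


subs = {"S1": {"p1", "p2", "p4"}, "S2": {"p1", "p3", "p5"}, "S3": {"p1", "p4"}}
clusters = build_clusters(subs, {sid: "p1" for sid in subs})
print(cluster_match(clusters, {"p1", "p2", "p3", "p4"}))  # ['S1', 'S3']
```

For an event satisfying p1 through p4, s1 and s3 match while s2 is rejected because p5 does not hold; if p1 were not satisfied, none of the three subscriptions would be examined at all.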