SLIDE 1

Random walks on graphs Hidden Markov Models and prediction

Applications of Machine Learning to Performance Evaluation

Daniel Sadoc Menasche (1), Edmundo de Souza e Silva (2)

Federal University of Rio de Janeiro

(1) Computer Science Department, Institute of Mathematics
(2) Systems Engineering and Computer Science Department, COPPE

2012

  • D. Menasche, E. de Souza e Silva
SLIDE 2

Motivation

Large amounts of data are produced and consumed every day:
  • social networks
  • online video streaming
  • microblogging
  • genome information

How to obtain insights from data?

[Figure: data → ? → insight]

SLIDE 3

Which insights are we interested in?

We would like to recommend:
  • for search engine users, what is the most important webpage?
  • for clients of an online store, which book to suggest based on browsing history?
  • when distributing vaccines, which person to immunize first?
  • when watching a movie, which resolution to use?

SLIDE 4

Obtaining insights from traces

  • Machine learning = automatic pattern recognition
  • Performance evaluation = model building and analysis
  • In this talk: how machine learning tools can help to solve performance evaluation problems (and vice versa)

[Figure: diagram relating traces, models (performance evaluation), machine learning, and recommendations]

SLIDE 5

What do the traces provide?

Traces provide:
  • structural (relational) information about entity relationships
      • structure encoded in graphs (networks)
      • graph = set of vertices and edges (links)
  • propositional information about individual entities
      • user preferences (e.g., rating given by a user to a movie)
      • user characteristics (e.g., sex of a user)
      • path properties (e.g., loss probability and delay in a network path)

Structural and propositional information may or may not vary over time.

SLIDE 6

Social networks

[Figures: school dating network; sexual contact network]

Data: connections between people
Recommendation: who is the most influential person?

SLIDE 7

Tuberculosis network

[Figure legend: black = clinical disease, pink = dormant virus, green = exposed to virus, gray = unknown]

Data: connections between people/virus exposure
Recommendation: whom to immunize first?

SLIDE 8

Collaboration networks

Data: connections between researchers
Recommendation: who is the most relevant researcher in field A from the perspective of field B?

SLIDE 9

Yeast protein network

Data: connections between interacting proteins
Recommendation: which proteins to use for diagnosis?

SLIDE 10

Who bought what network

Link between books A and B if users that bought A also bought B

[Figure legend: red = republican, blue = democrat, magenta = neutral]

Data: items bought together
Recommendation: which item to recommend next?

SLIDE 11

Internet network

Data: connections between routers
Recommendation: which routers should be more protected?

SLIDE 12

Website network

Data: connections between webpages
Recommendation: which webpage to suggest to a web user?

SLIDE 13

For more examples

Newman’s website

http://www-personal.umich.edu/~mejn/networks/

Krebs’s website

http://www.orgnet.com/

SLIDE 14

Tutorial goal

  • Provide a performance evaluation perspective on machine learning problems
  • Focus on problems rather than details about tools and techniques

SLIDE 15

Questions addressed by tutorial

  • How to find the most influential node in a network?
  • How to rank nodes?
  • How to devise better recommendation mechanisms?
  • What are good generative models for traffic load?
  • How to predict behavior?

SLIDE 16

The big picture

[Figure: diagram linking traces with structural and propositional information (e.g., 1) links between nodes, 2) user preferences, 3) loss probabilities), Markov chain based models, inference, and recommendations]

SLIDE 17

Outline

1. Random walks on graphs
2. Hidden Markov Models and prediction

SLIDE 18

Recall: Goal

Given a trace, such as
  • a social network
  • a movie correlation matrix
  • web page links
the questions are:
  • how to efficiently infer the most central (influential) node?
  • how to rank nodes?
  • given item A and items B and C, which item to recommend after A?

But how to define metrics that allow us to answer the questions above?

SLIDE 19

Centrality metrics

Classic centrality metrics
  • Degree: number of links incident upon a node
  • Betweenness: fraction of shortest paths between pairs of nodes that pass through a node

Random walk based centrality metrics
  • Random walk betweenness
  • PageRank
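
The two classic metrics can be sketched as follows. This is a minimal illustration, not the presenters' code: it assumes an unweighted, undirected graph stored as an adjacency dictionary, uses Brandes' accumulation scheme for betweenness, and normalizes by the number of ordered node pairs. The graph `toy` and all function names are hypothetical.

```python
from collections import deque

def degree_centrality(adj):
    """Number of links incident upon each node of an undirected graph."""
    return {v: len(neigh) for v, neigh in adj.items()}

def betweenness_centrality(adj):
    """Fraction of shortest paths between node pairs that pass through each node."""
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    pairs = (n - 1) * (n - 2)  # ordered pairs (s, t) that exclude the node itself
    return {v: (score[v] / pairs if pairs else 0.0) for v in nodes}

# Hypothetical toy graph: the path a - b - c - d.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
deg = degree_centrality(toy)
btw = betweenness_centrality(toy)
```

On the path graph, the two inner nodes have the highest degree and the highest betweenness, which is the kind of agreement between metrics that the later slides on "degree as alias" rely on.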

SLIDE 20

Key abstraction: random surfer (or random walker)

What is a random walk? An experiment performed by a random surfer (e.g., a user navigating through web pages):
  • Start from a random node
  • Choose any link on that node at random
  • Follow that link, choose any link on the new node at random, and so on...
  • Record how often each node was visited
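
The experiment can be sketched in a few lines. This is an illustrative simulation under stated assumptions: the function name, the `seed` parameter, and the dictionary layout are hypothetical, and the four-page graph is read off the transition matrix of the deck's simple example (p1 links to p2, p3, p4; p2 to p4; p3 to p1 and p4; p4 to p3).

```python
import random
from collections import Counter

def random_surfer(links, steps, seed=0):
    """Simulate a random walk and return the fraction of visits per node.

    `links` maps each node to the list of nodes it links to; every node
    is assumed to have at least one outgoing link.
    """
    rng = random.Random(seed)
    node = rng.choice(list(links))      # start from a random node
    visits = Counter({node: 1})
    for _ in range(steps):
        node = rng.choice(links[node])  # follow a link chosen uniformly at random
        visits[node] += 1
    total = sum(visits.values())
    return {v: visits[v] / total for v in links}

# Four-page graph matching the transition matrix of the simple example.
web = {"p1": ["p2", "p3", "p4"], "p2": ["p4"], "p3": ["p1", "p4"], "p4": ["p3"]}
fractions = random_surfer(web, steps=100_000)
```

For a long walk the visit fractions approach the long-run probabilities reported in the example (about 0.20, 0.07, 0.40, and 0.33 for p1 to p4).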

SLIDE 21

Simple example

[Figure: four-page graph p1..p4 with edge probabilities 1/3, 1/3, 1/3, 1, 1, 1/2, 1/2]

Random walker:

Page   # of Visits   Fraction of Visits   Probability of Visit
1      1             1                    1
2      0             0                    0
3      0             0                    0
4      0             0                    0

Power method, transition matrix P:

0      0.33   0.33   0.33
0      0      0      1
0.5    0      0      0.5
0      0      1      0

P^0:

1      0      0      0
0      1      0      0
0      0      1      0
0      0      0      1

SLIDE 22

Simple example

[Figure: four-page graph p1..p4 with edge probabilities 1/3, 1/3, 1/3, 1, 1, 1/2, 1/2]

Random walker:

Page   # of Visits   Fraction of Visits   Probability of Visit
1      1             0.5                  0
2      0             0                    0.33
3      0             0                    0.33
4      1             0.5                  0.33

Power method, transition matrix P:

0      0.33   0.33   0.33
0      0      0      1
0.5    0      0      0.5
0      0      1      0

P^n for n = 1:

0      0.33   0.33   0.33
0      0      0      1
0.5    0      0      0.5
0      0      1      0

SLIDE 23

Simple example

[Figure: four-page graph p1..p4 with edge probabilities 1/3, 1/3, 1/3, 1, 1, 1/2, 1/2]

Random walker:

Page   # of Visits   Fraction of Visits   Probability of Visit
1      1             0.33                 0.1667
2      0             0                    0
3      1             0.33                 0.3333
4      1             0.33                 0.5

Power method, transition matrix P:

0      0.33   0.33   0.33
0      0      0      1
0.5    0      0      0.5
0      0      1      0

P^n for n = 2:

0.16   0      0.33   0.5
0      0      1      0
0      0.16   0.66   0.16
0.5    0      0      0.5

SLIDE 24

Simple example

[Figure: four-page graph p1..p4 with edge probabilities 1/3, 1/3, 1/3, 1, 1, 1/2, 1/2]

Random walker:

Page   # of Visits   Fraction of Visits   Probability of Visit
1      1             0.25                 0.1667
2      0             0                    0.0556
3      1             0.25                 0.5556
4      2             0.5                  0.2222

Power method, transition matrix P:

0      0.33   0.33   0.33
0      0      0      1
0.5    0      0      0.5
0      0      1      0

P^n for n = 3:

0.16   0.05   0.55   0.22
0.5    0      0      0.5
0.33   0      0.16   0.5
0      0.16   0.66   0.16

SLIDE 25

Simple example

[Figure: four-page graph p1..p4 with edge probabilities 1/3, 1/3, 1/3, 1, 1, 1/2, 1/2]

Random walker:

Page   # of Visits   Fraction of Visits   Probability of Visit
1      200           0.20                 0.20
2      67            0.06                 0.06
3      400           0.40                 0.40
4      333           0.33                 0.33

Power method, transition matrix P:

0      0.33   0.33   0.33
0      0      0      1
0.5    0      0      0.5
0      0      1      0

P^n for n = 1000:

0.2    0.06   0.4    0.33
0.2    0.06   0.4    0.33
0.2    0.06   0.4    0.33
0.2    0.06   0.4    0.33
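
The "Power Method" column of the example can be reproduced with a short sketch. The function name and list-of-lists layout are assumptions made for illustration; the matrix is the one printed on the slides, with 0.33 written as the exact fraction 1/3.

```python
def power_method(P, start, steps):
    """Return the state distribution after `steps` transitions, i.e. start * P^steps."""
    dist = list(start)
    n = len(P)
    for _ in range(steps):
        # One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j].
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist

# Transition matrix of the four-page example (rows and columns ordered p1..p4).
P = [
    [0.0, 1 / 3, 1 / 3, 1 / 3],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
start = [1.0, 0.0, 0.0, 0.0]            # the walker starts at p1
two_steps = power_method(P, start, 2)    # first row of P^2
limit = power_method(P, start, 1000)     # long-run visit probabilities
```

After two steps the distribution is (1/6, 0, 1/3, 1/2), matching the "Probability of Visit" column for n = 2, and after 1000 steps it settles at (1/5, 1/15, 2/5, 1/3), which the slides round to 0.2, 0.06, 0.4, 0.33.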

SLIDE 26

PageRank Centrality

A vertex is important if it is connected to many vertices or to a few important vertices.
  • A = adjacency matrix: A_{ij} = 1 if i is connected to j, A_{ij} = 0 otherwise
  • d_i = out degree of vertex i
  • \pi_i = centrality of vertex i:

$$\pi_i = \sum_j A_{ij} \frac{\pi_j}{d_j}$$

SLIDE 28

PageRank Centrality

\pi_i = centrality of vertex i:

$$\pi_i = \sum_j A_{ij} \frac{\pi_j}{d_j}$$

  • source = node that has in degree zero
  • sink = node that has out degree zero (dangling node)

What is the centrality of sources and sinks?

SLIDE 29

What is centrality of sources and sinks?

Recall that
  • citation networks are usually acyclic
  • web page networks may have sources and sinks (e.g., pdf files)

[Figure: four-page graph p1..p4 with edge probabilities 1/3, 1/3, 1/3, 1, 1/2, 1/2]

The random surfer is stuck

SLIDE 30

Possible solutions to dangling nodes

  • After entering a dangling node: hyperlink to any page (with equal probability)
  • There is a set of trusted web pages:
      • with probability 1 − d the random surfer does not follow hyperlinks
      • instead, the surfer jumps to a trusted page (with equal probability)

[Figure: four-page graph p1..p4 with teleportation; edge labels (1-d)/2 + d/3, d/3, d/2, d, and (1-d)/2]

SLIDE 31

Notation

Let
  • π be the vector of page ranks for the entire web
  • T be the set of trusted pages
  • n_T = |T| be the cardinality of the set of trusted pages
  • T (as a vector, with entries T[i]) be a vector which has all zeros except for non-zero values in the positions corresponding to the trusted pages; in those positions the value is 1/n_T

[Figure: four-page graph p1..p4 with teleportation; edge labels (1-d)/2 + d/3, d/3, d/2, d, and (1-d)/2]

SLIDE 32

Google’s equation

Google’s equation:

$$\pi_i = d \sum_j \pi_j \frac{1}{c_j} + (1 - d)\, T[i] \quad \forall i$$

Every day, new applications of the random surfer abstraction:
  • Rank web pages [Brin and Page, 1998]
  • Recommend movies [Bogers et al., CARS 2010]
  • Control and planning [Mahadevan, AAAI 2010]
  • Cure cancer [Winter et al., PLOS 2012]

Heinrich Hertz (regarding Maxwell’s equations):

"One cannot escape the feeling that these mathematical formulas have an independent existence and an intelligence of their own, that they are wiser than we are, wiser even than their discoverers, that we get more out of them than was originally put into them."

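
A fixed-point iteration for Google's equation can be sketched as below. Several points are assumptions rather than statements from the slides: the sum is read as running over the pages j that link to page i, c_j is read as the number of outgoing links of page j, d = 0.85 is a commonly quoted default, and dangling pages use the "hyperlink to any page with equal probability" fix. The graph `web` and the trusted set are hypothetical.

```python
def pagerank(links, trusted, d=0.85, iterations=200):
    """Iterate pi_i = d * sum_{j -> i} pi_j / c_j + (1 - d) * T[i]."""
    pages = list(links)
    # T[i] = 1 / n_T for trusted pages and 0 elsewhere.
    T = {p: (1 / len(trusted) if p in trusted else 0.0) for p in pages}
    pi = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        nxt = {p: (1 - d) * T[p] for p in pages}
        for j in pages:
            if links[j]:
                share = d * pi[j] / len(links[j])   # c_j = number of out-links of j
                for i in links[j]:
                    nxt[i] += share
            else:
                # Dangling page: behave as if it linked to every page equally.
                for i in pages:
                    nxt[i] += d * pi[j] / len(pages)
        pi = nxt
    return pi

# Hypothetical example: four pages, with p2 a dangling page; p1 and p3 are trusted.
web = {"p1": ["p2", "p3", "p4"], "p2": [], "p3": ["p1", "p4"], "p4": ["p3"]}
ranks = pagerank(web, trusted={"p1", "p3"})
```

Each iteration preserves the total probability mass, so the ranks remain a probability distribution; the surfer is never stuck because every page has somewhere to go.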
SLIDE 34

Random walks on graphs

Until now: steady state analysis of Markov chains

Applications of transient analysis:
  • Movement models for mobile computing
  • Obtaining a relevance score between two nodes (one of the fundamental problems in data mining)
  • Spectral partitioning of a network into clusters
  • Given a node A, how closely related are B and C to A?

SLIDE 37

Relevance scores

[Figure: the four-page graph p1..p4 (edge probabilities 1/3, 1/3, 1/3, 1, 1, 1/2, 1/2) and two modified versions of it, one per target page]

Initial page: p1
  • mean number of transitions to reach p3 = 1/π3 = 3
  • mean number of transitions to reach p2 = 1/π2 = 11
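
One way to arrive at the values 3 and 11 is sketched below. It rests on an assumption about the modified graphs in the figure: the target page's outgoing links are replaced by a single link back to the initial page p1, so that 1/π of the target in the modified chain is the mean length of a cycle from the target through p1 and back. The function names and the lazy power iteration are illustrative choices.

```python
def stationary(P, iterations=5000):
    """Stationary distribution of an irreducible chain via lazy power iteration."""
    n = len(P)
    pi = [1 / n] * n
    for _ in range(iterations):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Averaging with the previous vector (a "lazy" chain) keeps the same
        # stationary distribution and avoids oscillation on periodic chains.
        pi = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
    return pi

def return_time_relevance(P, start, target):
    """1 / pi_target in a copy of P where `target` jumps back to `start` w.p. 1."""
    Q = [row[:] for row in P]
    Q[target] = [1.0 if j == start else 0.0 for j in range(len(P))]
    return 1 / stationary(Q)[target]

# Transition matrix of the four-page example (indices 0..3 stand for p1..p4).
P = [
    [0.0, 1 / 3, 1 / 3, 1 / 3],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
to_p3 = return_time_relevance(P, start=0, target=2)
to_p2 = return_time_relevance(P, start=0, target=1)
```

Under this reading, p3 (score 3) is more closely related to p1 than p2 (score 11), since a smaller number of transitions means a more relevant page.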

SLIDE 38

Summary: random walks on graphs

Abstract model: graph model, probabilities associated with edges

Useful theory
  • reversible Markov chain theory
  • aggregation/disaggregation theory
  • transient analysis of Markov chains
applied to new problems

SLIDE 39

Dynamics

How to account for the time component?
  • Graph is discovered online over time
  • Graph changes over time

SLIDE 40

Dynamics: online graph discovery

In real networks
  • Network might be too large to compute exact centralities
  • Network structure might be hidden from public view

Question: how to efficiently identify the top k most central nodes without complete access to the entire network?

Local versus global computations
  • Degree centrality can be obtained locally
  • Other centrality metrics require global knowledge

SLIDE 43

Methodology

[Figure: original network, sampling step, sampled network, identification step]

  • A node among the k most central nodes might not be sampled by the sampling algorithm
  • The estimated k most central nodes can be incorrect, i.e., differ from the top k in the actual network

SLIDE 44

Devices, collaborations and social networks

Rank correlation between degree and other centralities

Set           Type            # of nodes   # of edges   Description
AS-Snapshot   Device          22,963       48,436       Snapshot of Internet at level of AS
ca-CondMat    Collaboration   23,133       186,936      ArXiv Condensed Matter
ca-HepPh      Collaboration   12,008       237,010      ArXiv High Energy Physics
email-Enron   Social          36,692       367,662      Email network from Enron

SLIDE 45

Sampling and identification options

Sampling phase
  • Random-walk sampling

Identification phase
  • Recalculation in sampled network
  • Degree as alias to other centralities

SLIDE 46

Random-walk sampling

  • Start from a randomly selected node
  • Visit the next node uniformly at random
  • Assume the random walker can query the degrees of visited nodes

[Figure: walk visiting nodes 1, 2, 3]

Sampling information:

Node    Queried degree
node1   5
node2   6
node3   4
. . .
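
The sampling phase can be sketched as follows. This is an illustration under assumptions, not the procedure from the cited paper: `neighbors` is a hypothetical query that returns the neighbor list of a visited node (so the walker never needs the full graph), the start node is passed in explicitly, and `budget` is the number of nodes the walker may visit.

```python
import random

def random_walk_sample(neighbors, start, budget, seed=0):
    """Walk the graph for `budget` visits and record the degree of each visited node."""
    rng = random.Random(seed)
    node = start
    queried_degree = {}
    for _ in range(budget):
        neigh = neighbors(node)
        queried_degree[node] = len(neigh)   # degree query on the visited node
        node = rng.choice(neigh)            # next node chosen uniformly at random
    return queried_degree

# Hypothetical toy network: a hub with four neighbors and a short tail.
graph = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"],
    "d": ["hub", "e"], "e": ["d"],
}
sample = random_walk_sample(graph.__getitem__, start="a", budget=50)
```

Because a random walk visits nodes in proportion to their degree, well-connected nodes such as `hub` tend to enter the sample early, which is the effect the next slide reports.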

SLIDE 47

Performance of random-walk sampling

  • The random walker quickly includes the desired top k nodes in the sampled set
  • Metric: fraction of top k nodes in the sampled set

[Figure: ca-CondMat; the random walk (RW) efficiently collects the most central nodes with a small sampled fraction]

SLIDE 49

Identification strategy: recalculation

  • Centralities are recalculated on the sampled network
  • Recalculated centralities are approximations of the original centralities

[Figure: centrality calculation on the sampled network]

SLIDE 50

Identification strategy: degree as alias

  • Visited nodes are sorted according to their queried degrees
  • The top k nodes in the sorted list are taken as the top k nodes for the other centralities

[Figure: use side information for identifying the most central nodes]

SLIDE 51

Performance of identification strategies

  • How accurately can we identify the desired top k nodes in the sampled network?
  • Metric: overlap ratio between the identified top k nodes and the original top k nodes

[Figure: ca-CondMat, degree as alias (Deg-alias) versus recalculation (Recalc.)]

Facebook: more than 500 million users
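
The degree-as-alias strategy and the overlap ratio can be sketched together. The function names are hypothetical, ties are broken by node name only to make the output deterministic, and the "true" top k used in the usage line is made up for illustration; the queried degrees are the ones shown in the sampling table.

```python
def top_k_by_degree(queried_degree, k):
    """Degree as alias: the k visited nodes with the highest queried degree."""
    ranked = sorted(queried_degree, key=lambda v: (-queried_degree[v], str(v)))
    return ranked[:k]

def overlap_ratio(identified, true_top):
    """Fraction of the true top-k nodes that also appear in the identified top-k."""
    true_top = set(true_top)
    return len(true_top & set(identified)) / len(true_top)

queried = {"node1": 5, "node2": 6, "node3": 4}   # degrees from the sampling table
identified = top_k_by_degree(queried, k=2)
# Hypothetical ground truth: node7 was never sampled, so it cannot be identified.
ratio = overlap_ratio(identified, true_top=["node2", "node7"])
```

The usage line illustrates the first failure mode named in the methodology: a truly central node that the walker never visits lowers the overlap ratio no matter how good the identification step is.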

SLIDE 52

For further details

Online Estimating the k Central Nodes of a Network, Y. Lim, D. Menasche, B. Ribeiro, D. Towsley, P. Basu, IEEE Network Science Workshop, West Point, 2011

SLIDE 53

Dynamics: graph changes over time

How to infer the most influential nodes in a changing world?

Break for advertisement from the authors: related work (main conference)

Characterizing Continuous Time Random Walks on Time Varying Graphs, Daniel Figueiredo (Federal University of Rio de Janeiro - UFRJ), Philippe Nain (INRIA), Bruno Ribeiro (UMass Amherst), Edmundo de Souza e Silva (UFRJ) and Don Towsley (UMass Amherst)

SLIDE 56

Outline

1. Random walks on graphs
2. Hidden Markov Models and prediction

SLIDE 57

Hidden Markov Models and prediction

Use of HMMs

  • speech recognition
  • signal processing
  • artificial intelligence
  • computational biology
  • image processing
  • finance
  • medical diagnosis
  • . . .

SLIDE 58

Hidden Markov Models

Use of HMMs

Given a time series, how to parameterize a model to predict future values?
  • inferring customer behavior
  • modeling network channel losses
  • modeling traffic
  • generating workload
  • . . .

Note: we have traces of time series of one or more variables. Is there a structure behind the data?

SLIDE 66

Foundation

Performance/Availability

  • Analyst understands how the system works
  • Markovian models:
      • Important issue: choice of state variables
      • Models parameterized from some prior knowledge of the system behavior
  • Key point: the system state is directly observable

SLIDE 71

Foundation

Hidden Markov Models

  • System state is assumed not to be directly observable.
  • But... we can observe values that are a probabilistic function of the state of the underlying Markov process.

SLIDE 73

Hidden Markov Models

Example

  • Customer browsing through the web pages of an online bookstore
  • Problem: determine the probability that a specific customer is ready to order an item based on their past behavior
  • Assume that we have access to data that includes the intention of a customer

SLIDE 76

Hidden Markov Models

Example

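[Figure: Markov chain over the states JB, IP, RO, LV; transition probabilities appearing in the figure: 0.02, 0.2, 0.3, 0.6, 0.18, 0.3, 0.2, 0.5, 0.4, 1.0, 0.3, 0.1]

  • JB: just browsing
  • IP: interested in product
  • RO: ready to order
  • LV: leaving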

SLIDE 77

Hidden Markov Models

Example

  • But... the user’s state of intent is not directly observable
  • Suppose only the types of pages a customer visits are observable:
      • O: product overview
      • D: product details
      • C: set of products within a category
      • S: shopping cart
      • P: purchase
      • E: exit

Trace: O C D C D C D O C D C D S O C D S P E

SLIDE 81

Hidden Markov Models

Example

States are correlated with the sequence of observations

[Figure: hidden states JB, IP, RO, LV, each emitting the symbols O, D, C, S, P, E]

  • JB: just browsing
  • IP: interested in product
  • RO: ready to order
  • LV: leaving
  • 6 observable symbols (page types)

SLIDE 82

Hidden Markov Models

Example

Note: in most problems we do not know
  • how many states the hidden chain contains
  • the interpretation of the hidden states (e.g., the “states of intent”)

[Figure: hidden states JB, IP, RO, LV, each emitting the observable symbols (page types) O, D, C, S, P, E]

  • JB: just browsing
  • IP: interested in product
  • RO: ready to order
  • LV: leaving

  • D. Menasche, E. de Souza e Silva
slide-85
SLIDE 85

61/82

Random walks on graphs Hidden Markov Models and prediction

Hidden Markov Models

HMM Elements

The HMM elements are:

a set of hidden states
a set of symbols
the state transition probability matrix
the probabilities of emitting each symbol at each state

[Figure: the four hidden states JB (just browsing), IP (interested in product), RO (ready to order) and LV (leaving), each emitting the symbols O, D, C, S, P, E]
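As a rough illustration, the elements listed above could be written down as plain Python data. Every number below is invented for the sketch and does not come from the slides; the initial state distribution `PI` is not listed on the slide and is an added assumption, and `is_stochastic` is a hypothetical helper.

```python
# Hypothetical encoding of the HMM elements for the page-type example.
# All probabilities are made-up illustrative values.
STATES = ["JB", "IP", "RO", "LV"]          # hidden states
SYMBOLS = ["O", "D", "C", "S", "P", "E"]   # observable page types

# Initial state distribution (assumption: every session starts in JB).
PI = [1.0, 0.0, 0.0, 0.0]

# A[i][j] = P[next state is STATES[j] | current state is STATES[i]]
A = [
    [0.70, 0.20, 0.00, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.00, 0.20, 0.50, 0.30],
    [0.00, 0.00, 0.00, 1.00],
]

# B[i][k] = P[emit SYMBOLS[k] | state is STATES[i]]
B = [
    [0.40, 0.20, 0.35, 0.03, 0.01, 0.01],
    [0.10, 0.50, 0.30, 0.08, 0.01, 0.01],
    [0.05, 0.15, 0.05, 0.45, 0.29, 0.01],
    [0.02, 0.02, 0.02, 0.02, 0.02, 0.90],
]


def is_stochastic(matrix, tol=1e-9):
    """Return True if every row is a probability distribution."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )
```

Checking that each row of `A` and `B` sums to one is a cheap sanity test before running any inference on the model.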

  • D. Menasche, E. de Souza e Silva
slide-86
SLIDE 86

62/82

Random walks on graphs Hidden Markov Models and prediction

Questions

What is the most probable hidden state (the customer intent), given the observed sequence of pages visited?
What are the model parameters that maximize the probability that the observed sequence is generated by the model?
Assume 2 types of customers: young and mature. Given a user session and the sequence of page clicks (but not the customer type), what is the probability that the customer is young versus mature?

  • D. Menasche, E. de Souza e Silva
slide-90
SLIDE 90

63/82

Random walks on graphs Hidden Markov Models and prediction

Problems

Problem 1

Given the observation sequence $O^T = O_1 O_2 \ldots O_T$ and a model $M$, compute the probability of observing the output sequence $O^T$ given the underlying model $M$, i.e., $P[O^T \mid M]$.

[Figure: the trace O O O D C S P E P E and the Hidden Markov Model (hidden states JB, IP, RO, LV; symbols O, D, C, S, P, E) are the inputs; the output is a probability, 0.85 in the illustration]

  • D. Menasche, E. de Souza e Silva
slide-91
SLIDE 91

64/82

Random walks on graphs Hidden Markov Models and prediction

Problems

Problem 2

Given the observation sequence $O^T = O_1 O_2 \ldots O_T$ and a model $M$, determine a state sequence $Q^T = q_1, \ldots, q_T$ that best explains the output sequence $O^T$.

[Figure: the trace O O O D C S P E P E and the Hidden Markov Model are the inputs; the output is a sequence of states, JB JB IP RO LV JB in the illustration]

  • D. Menasche, E. de Souza e Silva
slide-92
SLIDE 92

65/82

Random walks on graphs Hidden Markov Models and prediction

Problems

Two versions of problem 2

Given the observation sequence $O^T = O_1 O_2 \ldots O_T$:

1. For each time $t$, what is the most probable state of the underlying Markov chain (MC)?
2. What is the most probable sequence of states?

[Figure: version 1. For the trace O O O D C S P E P E and the Hidden Markov Model, the most probable state is chosen separately for each time t. Illustrated output: JB JB IP LV IP LV IP]

The two versions can give different answers. If $p_{IP,LV} = 0$:
in (1), we could still have $s_t = IP$ and $s_{t+1} = LV$, because each state is chosen separately;
in (2), the sequence of states <IP, LV> could never occur.
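The difference between the two versions can be checked by brute force on a tiny model. The sketch below is not from the slides: the three-state model, its probabilities, and the names `joint` and `decode_both` are all invented for illustration, and enumerating every state sequence is exponential in the trace length, so it only suits toy cases. The single symbol is deliberately uninformative, so the result depends on the chain alone.

```python
from itertools import product

# Hypothetical toy model in which the transition IP -> LV has probability 0.
STATES = ["IP", "JB", "LV"]
PI = [0.40, 0.35, 0.25]            # assumed initial distribution
A = [
    [0.5, 0.5, 0.0],               # from IP: never directly to LV
    [0.0, 0.0, 1.0],               # from JB: always to LV
    [0.0, 0.0, 1.0],               # from LV: stays in LV
]
B = [[1.0], [1.0], [1.0]]          # one uninformative symbol


def joint(path, obs):
    """Return P[state path, observations | model]."""
    p = PI[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p


def decode_both(obs):
    """Return (per-time most probable states, most probable sequence)."""
    n, T = len(STATES), len(obs)
    paths = list(product(range(n), repeat=T))
    # Version 2: the single most probable sequence of states.
    best_path = max(paths, key=lambda path: joint(path, obs))
    # Version 1: for each time t, the state with the largest marginal.
    per_time = []
    for t in range(T):
        marginal = [
            sum(joint(path, obs) for path in paths if path[t] == i)
            for i in range(n)
        ]
        per_time.append(marginal.index(max(marginal)))
    return [STATES[i] for i in per_time], [STATES[i] for i in best_path]


per_time, best = decode_both([0, 0])
print(per_time)  # ['IP', 'LV']: a pair that has zero joint probability
print(best)      # ['JB', 'LV']
```

Here version 1 picks IP at the first step (marginal 0.40) and LV at the second (marginal 0.60), although the pair can never occur together, while version 2 returns a sequence the chain can actually follow.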

  • D. Menasche, E. de Souza e Silva
slide-93
SLIDE 93

66/82

Random walks on graphs Hidden Markov Models and prediction

Problems

Two versions of problem 2

[Figure: version 2 of the problem. Given the trace O O O D C S P E P E and the Hidden Markov Model, what is the most probable sequence of states? Illustrated output: JB JB IP LV IP LV JB]

  • D. Menasche, E. de Souza e Silva
slide-94
SLIDE 94

67/82

Random walks on graphs Hidden Markov Models and prediction

Problems

Problem 3

Given the observation sequence $O^T = O_1 O_2 \ldots O_T$, construct an underlying model $M$ such that $P[O^T \mid M]$ is maximized.

[Figure: what are the best model parameters? The trace O O O D C S P P E P P E is the input; the parameters of the Hidden Markov Model (hidden states JB, IP, RO, LV; symbols O, D, C, S, P, E) are unknown, shown as question marks]

  • D. Menasche, E. de Souza e Silva
slide-95
SLIDE 95

68/82

Random walks on graphs Hidden Markov Models and prediction

Problems

Problem 4

Given several possible models $M_i$, $1 \le i \le K$, and a sequence of observed values $O$, what is the probability that $M_j$ is the actual model, i.e., $P[M_j \mid O]$?

[Figure: what is the most likely model? The trace O O O D C S P P E P P E is compared against two candidate Hidden Markov Models, each with hidden states JB, IP, RO, LV and symbols O, D, C, S, P, E]

  • D. Menasche, E. de Souza e Silva
slide-96
SLIDE 96

69/82

Random walks on graphs Hidden Markov Models and prediction

Algorithms

Problem 1

There is an iterative algorithm for computing $P[O^T \mid M]$ with complexity $O(N^2 T)$, where $N$ is the number of hidden states and $T$ is the length of the observation sequence.
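The iterative algorithm for Problem 1 is commonly known as the forward algorithm. A minimal sketch follows, assuming the model is given as an initial distribution `pi`, a transition matrix `A` and an emission matrix `B`. The two-state example values and the names `forward_likelihood`, `PI_EX`, `A_EX` and `B_EX` are hypothetical. No scaling is applied, so very long traces would underflow.

```python
def forward_likelihood(obs, pi, A, B):
    """Return P[obs | M] for an HMM M = (pi, A, B).

    obs is a list of symbol indices, A[i][j] a transition probability
    and B[i][k] the probability of emitting symbol k in state i.
    """
    n = len(pi)
    # alpha[i] = P[O_1 .. O_t, q_t = i | M], started at t = 1
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # One step costs O(N^2); T steps give O(N^2 T) overall.
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical two-state, two-symbol model used only for illustration.
PI_EX = [0.6, 0.4]
A_EX = [[0.7, 0.3], [0.4, 0.6]]
B_EX = [[0.9, 0.1], [0.2, 0.8]]

print(forward_likelihood([0, 1, 1], PI_EX, A_EX, B_EX))
```

A useful sanity check is that the likelihoods of all possible traces of a fixed length sum to one.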

  • D. Menasche, E. de Souza e Silva
slide-97
SLIDE 97

70/82

Random walks on graphs Hidden Markov Models and prediction

Algorithms

Problem 2

The forward-backward algorithm
solves a specific version of Problem 2
computes $P[q_T = s_i \mid M, O^T]$
has complexity $O(N^2 T)$

[Figure: for the trace O O O D C S P E P E and the Hidden Markov Model, at time T, what is the most probable state? Illustrated output: JB]
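A compact sketch of the forward-backward recursion is given below. It returns the state posterior for every time step, of which the slide's $P[q_T = s_i \mid M, O^T]$ is the last row. The function name `state_posteriors` and the two-state example values are assumptions made for the illustration, and the recursion is unscaled, so it is only suitable for short traces.

```python
def state_posteriors(obs, pi, A, B):
    """Return gamma, where gamma[t][i] = P[q_t = i | obs, M]."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    # Forward pass: alpha[t][i] = P[O_1 .. O_t, q_t = i | M]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = B[j][obs[t]] * sum(
                alpha[t - 1][i] * A[i][j] for i in range(n)
            )
    # Backward pass: beta[t][i] = P[O_{t+1} .. O_T | q_t = i, M]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)
            )
    likelihood = sum(alpha[T - 1])
    return [
        [alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
        for t in range(T)
    ]


# Hypothetical two-state, two-symbol model used only for illustration.
PI_FB = [0.6, 0.4]
A_FB = [[0.7, 0.3], [0.4, 0.6]]
B_FB = [[0.9, 0.1], [0.2, 0.8]]

gamma = state_posteriors([0, 1, 1], PI_FB, A_FB, B_FB)
most_probable_last_state = gamma[-1].index(max(gamma[-1]))
```

Taking the largest entry of each row of `gamma` answers the per-time version of Problem 2.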

  • D. Menasche, E. de Souza e Silva
slide-98
SLIDE 98

71/82

Random walks on graphs Hidden Markov Models and prediction

Algorithms

Problem 3

Procedure for adjusting the model parameters, based on:
the maximum likelihood estimation (MLE) method
a technique derived from the Expectation-Maximization (EM) algorithm, known as Baum-Welch: an iterative procedure that obtains a local maximum of the likelihood function.
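One Baum-Welch re-estimation step for a single observation sequence might look like the sketch below. It is an unscaled textbook-style version for short traces, not the authors' implementation. The choice of two hidden states, the starting values `PI0`, `A0`, `B0`, and the function names are all assumptions. `OBS` encodes the example trace from the earlier slides.

```python
def forward_backward(obs, pi, A, B):
    """Return (alpha, beta, likelihood) for an HMM (pi, A, B)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = B[j][obs[t]] * sum(
                alpha[t - 1][i] * A[i][j] for i in range(n))
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta, sum(alpha[T - 1])


def baum_welch_step(obs, pi, A, B):
    """Return (new_pi, new_A, new_B, likelihood of obs under the inputs)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, like = forward_backward(obs, pi, A, B)
    # gamma[t][i] = P[q_t = i | obs, M]
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(n)]
             for t in range(T)]
    # xi_sum[i][j] = expected number of transitions from i to j
    xi_sum = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                   for t in range(T - 1)) / like
               for j in range(n)] for i in range(n)]
    new_pi = gamma[0][:]
    new_A = [[xi_sum[i][j] / sum(xi_sum[i]) for j in range(n)]
             for i in range(n)]
    new_B = []
    for i in range(n):
        visits = sum(gamma[t][i] for t in range(T))
        new_B.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / visits
                      for k in range(m)])
    return new_pi, new_A, new_B, like


SYMBOLS = "ODCSPE"
OBS = [SYMBOLS.index(c) for c in "OCDCDCDOCDCDSOCDSPE"]
# Hypothetical starting point with two hidden states.
PI0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.3, 0.3, 0.2, 0.1, 0.05, 0.05], [0.1, 0.1, 0.2, 0.2, 0.2, 0.2]]

pi, A, B = PI0, A0, B0
for _ in range(5):
    pi, A, B, like = baum_welch_step(OBS, pi, A, B)
```

Each step should not decrease the likelihood, but the procedure only reaches a local maximum, so in practice several starting points are usually tried.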

  • D. Menasche, E. de Souza e Silva
slide-99
SLIDE 99

72/82

Random walks on graphs Hidden Markov Models and prediction

Algorithms

Problem 4

This is essentially a classification problem: one simply applies Bayes' theorem.

[Figure: what is the most likely model? The trace O O O D C S P P E P P E is compared against two candidate Hidden Markov Models]
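Written out, Bayes' theorem gives $P[M_j \mid O] = P[O \mid M_j]\,P[M_j] / \sum_{i=1}^{K} P[O \mid M_i]\,P[M_i]$, where each $P[O \mid M_i]$ is the solution of Problem 1. The sketch below applies this to two hypothetical customer models. The `YOUNG` and `MATURE` parameters, the equal priors, and the function names are invented for illustration.

```python
def likelihood(obs, model):
    """Return P[obs | model] with the forward recursion; model = (pi, A, B)."""
    pi, A, B = model
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)


def model_posteriors(obs, models, priors):
    """Return [P[M_j | obs] for each model] using Bayes' theorem."""
    joint = [prior * likelihood(obs, m) for m, prior in zip(models, priors)]
    total = sum(joint)
    return [p / total for p in joint]


# Hypothetical models: two hidden states, two symbols (0 and 1).
UNIFORM_PI = [0.5, 0.5]
UNIFORM_A = [[0.5, 0.5], [0.5, 0.5]]
YOUNG = (UNIFORM_PI, UNIFORM_A, [[0.9, 0.1], [0.8, 0.2]])   # mostly symbol 0
MATURE = (UNIFORM_PI, UNIFORM_A, [[0.2, 0.8], [0.1, 0.9]])  # mostly symbol 1

# Assumed prior: both customer types are equally likely.
posteriors = model_posteriors([0, 0, 0], [YOUNG, MATURE], [0.5, 0.5])
```

For a trace made only of symbol 0, almost all the posterior mass goes to the `YOUNG` model, as expected from its emission probabilities.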

  • D. Menasche, E. de Souza e Silva
slide-100
SLIDE 100

73/82

Random walks on graphs Hidden Markov Models and prediction

Examples in Performance Evaluation

Traffic Modeling

Build accurate traffic models for the packet flow generated by different applications, such as SMTP, HTTP, network games, and instant messaging.
Observable data: packet inter-arrival time and packet size.
Application: traffic generators, capacity planning.

Traffic models for aggregate traffic.
Application: traffic generators, capacity planning.

Traffic classification.
Application: identifying distinct applications by observing traffic. This is an example of Problem 4.

  • D. Menasche, E. de Souza e Silva
slide-103
SLIDE 103

74/82

Random walks on graphs Hidden Markov Models and prediction

Examples in Performance Evaluation

Loss process in a channel

Objective: model the loss process of a communication channel.
Application: include the model in simulation studies.
Hierarchical HMMs have also been used.
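To make the simulation use case concrete, the sketch below draws a synthetic loss trace from a small HMM. The two-state "good/bad" channel, its probabilities, and the names `generate_loss_trace`, `START`, `TRANS` and `LOSS_PROB` are assumptions chosen for illustration. In practice the parameters would be fitted to measured traces.

```python
import random


def generate_loss_trace(length, pi, A, loss_prob, seed=0):
    """Return a list of 0/1 values (1 = packet lost) sampled from an HMM.

    pi is the initial state distribution, A the transition matrix and
    loss_prob[i] the probability of losing a packet while in state i.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    n = len(pi)
    state = rng.choices(range(n), weights=pi)[0]
    trace = []
    for _ in range(length):
        # Emit a loss indicator from the current hidden state.
        trace.append(1 if rng.random() < loss_prob[state] else 0)
        # Move the hidden chain one step.
        state = rng.choices(range(n), weights=A[state])[0]
    return trace


# Hypothetical channel: state 0 is "good", state 1 is "bad".
START = [0.9, 0.1]
TRANS = [[0.95, 0.05], [0.30, 0.70]]
LOSS_PROB = [0.01, 0.60]

trace = generate_loss_trace(1000, START, TRANS, LOSS_PROB, seed=1)
loss_rate = sum(trace) / len(trace)
```

Because the hidden state persists over several packets, the generated losses come in bursts, which an independent loss model with the same average rate would not reproduce.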

  • D. Menasche, E. de Souza e Silva
slide-106
SLIDE 106

75/82

Random walks on graphs Hidden Markov Models and prediction

Examples in Performance Evaluation

Predictors of events

Use HMMs as predictors of future events. Applications:

Select one of two or more distinct paths to communicate with the application peer.
Estimate future channel loss statistics so that applications can better adapt their encoding algorithms.
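One simple way to use a trained HMM as a predictor is to filter the hidden state over the observed history and then push that belief one step ahead. The sketch below computes the distribution of the next symbol, $P[O_{T+1} = k \mid O^T, M]$. It is only an illustration, not the forecast algorithm of any particular tool, and the channel parameters and function names are hypothetical.

```python
def normalize(values):
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def predict_next_symbol(obs, pi, A, B):
    """Return [P[O_{T+1} = k | obs, M] for each symbol k]."""
    n, m = len(pi), len(B[0])
    # filtered[i] = P[q_t = i | O_1 .. O_t], renormalized at every step
    filtered = normalize([pi[i] * B[i][obs[0]] for i in range(n)])
    for o in obs[1:]:
        filtered = normalize([
            sum(filtered[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ])
    # Belief about the hidden state one step into the future.
    next_state = [sum(filtered[i] * A[i][j] for i in range(n))
                  for j in range(n)]
    return [sum(next_state[j] * B[j][k] for j in range(n)) for k in range(m)]


# Hypothetical loss channel: symbol 0 = received, symbol 1 = lost.
PI_CH = [0.9, 0.1]
A_CH = [[0.95, 0.05], [0.30, 0.70]]
B_CH = [[0.99, 0.01], [0.40, 0.60]]

after_losses = predict_next_symbol([1, 1, 1], PI_CH, A_CH, B_CH)
after_successes = predict_next_symbol([0, 0, 0], PI_CH, A_CH, B_CH)
```

With these values the predicted loss probability is higher after a run of losses than after a run of successful deliveries, which is the kind of signal an application could use to switch paths or adapt its encoding.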

  • D. Menasche, E. de Souza e Silva
slide-109
SLIDE 109

76/82

Random walks on graphs Hidden Markov Models and prediction

Examples in Performance Evaluation

Predictors of events

[Figure: timeline of model training and prediction measurement, showing training instants, training samples and training intervals, together with prediction instants, history samples, prediction windows and prediction intervals (symbols H, F, T, τ, ψ)]

  • D. Menasche, E. de Souza e Silva
slide-110
SLIDE 110

77/82

Random walks on graphs Hidden Markov Models and prediction

Examples in Performance Evaluation

Workload Generation

Objective: generate synthetic workload at the application level, instead of the packet level.
For anomaly detection, the model is trained using a sequence of requests made by normal web users.
Workload at the application level may involve different time scales.
Example: in the ON state, the browser sends requests for a page and its in-line objects; in the OFF state, the user reads the page contents.
Hierarchical HMMs are useful tools to represent different time scales.

  • D. Menasche, E. de Souza e Silva
slide-113
SLIDE 113

78/82

Random walks on graphs Hidden Markov Models and prediction

Examples in Performance Evaluation

Workload Generation

Behavioral model of a student accessing a video lecture.
Observed symbols: play, jump forward, rewind, pause, end of session, next slide.
Goal: generate a synthetic workload.

  • D. Menasche, E. de Souza e Silva
slide-115
SLIDE 115

79/82

Random walks on graphs Hidden Markov Models and prediction

A Tool

Tangram-II

Break for advertisement from the authors:
Tangram-II has a specific module that allows users to work with regular or hierarchical HMMs (HHMMs).
It supports three different types of HHMM.
From a data trace, the user can, for instance, train the model or calculate the log-likelihood of the observation sample.
The tool implements a forecast algorithm.

  • D. Menasche, E. de Souza e Silva
slide-121
SLIDE 121

81/82

Random walks on graphs Hidden Markov Models and prediction

The big picture

[Figure: overview diagram of Markov-chain-based models, with the labels: traces, random surfer model, behavior information, random walk model, hidden Markov model, inference, content centrality, user state, and recommendations, insights, ...]

  • D. Menasche, E. de Souza e Silva
slide-122
SLIDE 122

82/82

Random walks on graphs Hidden Markov Models and prediction

References

http://www.land.ufrj.br/~sadoc/tutorial/

Thanks!

  • D. Menasche, E. de Souza e Silva