
Sequence Data Mining: Techniques and Applications

Sunita Sarawagi IIT Bombay

http://www.it.iitb.ac.in/~sunita


What is a sequence?

  • Ordered set of elements: s = a1, a2, ..., an
  • Each ai could be
    – Categorical: domain a finite set of symbols Σ, |Σ| = m
    – Numerical
    – Multiple attributes

  • The length n of a sequence is not fixed
  • Order determined by time or position and could be regular or irregular
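As a quick illustration (the data values here are invented), the element types above can be written down directly:

```python
# Categorical: elements drawn from a finite alphabet Sigma; here |Sigma| = m = 4
dna = ["A", "C", "C", "T", "G"]

# Numerical: e.g. readings ordered by time
readings = [0.8, 1.1, 0.9, 1.4]

# Multiple attributes per element: e.g. (event, value) pairs
trace = [("open", 0), ("read", 512), ("close", 0)]

# The length n is not fixed across sequences
assert {len(dna), len(readings), len(trace)} == {5, 4, 3}
```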



Motivation

  • Several real-life mining applications on sequence data

  • Classical applications

– Speech, language, and handwriting recognition all involve complex sequences

  • Newer applications

– Bio-informatics: DNA and proteins
– Telecommunication: network alarms, network packet data
– Retail data mining: customer behavior


Outline

  • Three case studies

– Intrusion detection
– Information Extraction
– Bio-informatics: protein classification

  • Sequence mining operators
  • Approaches to sequence mining
  • Conclusions and future work


Case study: intrusion detection

  • Intrusions could be detected at
    – Host level: attacks on privileged programs like lpr, sendmail
    – Network level: denial of service attacks, port scans etc.
  • Method
    – Signature-based: match signature of previous attacks
      • Cannot detect new intrusions
    – Anomaly-based: model normal usage and detect deviation
  • Automatic vs Manual:
    – Manual:
      • Might miss patterns; may not evolve as normal usage patterns slowly drift
    – Automated:
      • Use historical audit trails and a learning algorithm
      • May not provide full coverage

Host-level attacks on privileged programs

  • Attacks exploit a loophole in the program to do illegal actions
    – Example: buffer overflows to run user code
  • What to monitor of an executing privileged program to detect attacks?

[figure: example system-call traces, e.g. open, lseek, ..., execve]
  • Sequence of system calls
    – The set of all possible system calls ≈ 200
  • Mining problem: given traces of previous normal execution, monitor a new execution and flag attack or normal
  • Challenge: is it possible to do this given widely varying normal conditions?



Bio-informatics

  • Many recent advances in sequence analysis due to bio-informatics
  • Two main kinds of sequences:
    – Genes:
      • Sequence of 4 possible nucleotides, |Σ| = 4 (a string over {A, C, G, T})
    – Proteins:
      • Sequence of 20 possible amino acids, |Σ| = 20
      • Length of a sequence varies between 100s to 10,000
  • Sequence analysis in bio-informatics: rich and varied; we will concentrate on one problem
    – Protein family classification


Protein family classification

  • Protein families characterized by common occurrence of a few scattered amino acids in a background of other unrelated symbols

  • Example: three aligned sequences of a family


Information extraction

Sequence: text string with elements as words

Example: addresses, bibliographic records

Mining problem: Given a set of tags (labels) e.g. address fields, classify parts of the sequence to different labels

[figure: an example address string and an example bibliographic record, with parts of each sequence tagged with labels such as House No., City, Zip / Author, Title, Journal, Year]

Outline

  • Three case studies
  • Sequence mining operators

– Whole sequence classification
– Partial sequence classification (Tagging)
– Predicting next symbol of a sequence
– Clustering sequences
– Finding repeated patterns in a sequence

  • Approaches to sequence mining
  • Conclusion and future work


Classification of whole sequences

Given:

– a set of classes C, and
– a number of example instances in each class c,

train a model so that for an unseen sequence we can say to which class it belongs

Example:

– Given a set of protein families, find the family of a new protein
– Given a sequence of packets, predict whether a session is an intrusion or not
– Given several utterances of a set of words, classify a new utterance to the right word


Existing methods of classification

  • Generative classifiers
  • Discriminatory classifiers
  • Distance-based classifiers (e.g. nearest neighbor)
  • Kernel-based classifiers


Generative models

  • For each class i,
    – train a generative model Mi to maximize likelihood over all training sequences in the class i
  • Find Pr(ci) as fraction of training instances in class i
  • For new sequence x,
    – find Pr(x|ci) for each i
    – choose i with largest value of Pr(x|ci) * Pr(ci)

[figure: a new sequence x scored against each class model: Pr(x|c1)*Pr(c1), Pr(x|c2)*Pr(c2), Pr(x|c3)*Pr(c3)]

Need a generative model for sequence data
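A minimal sketch of this recipe, using a Laplace-smoothed independent (per-symbol) model as the per-class generative model — the model family, class names and smoothing here are illustrative assumptions, not the tutorial's prescription:

```python
import math
from collections import Counter

def train_independent(seqs, alphabet, alpha=1.0):
    # Independent model: one probability per symbol, Laplace-smoothed.
    counts = Counter(sym for seq in seqs for sym in seq)
    total = sum(counts.values())
    return {s: (counts[s] + alpha) / (total + alpha * len(alphabet))
            for s in alphabet}

def log_lik(model, seq):
    # log Pr(x | c_i) under the class model
    return sum(math.log(model[s]) for s in seq)

def classify(x, models, priors):
    # choose i with the largest Pr(x | c_i) * Pr(c_i), computed in log space
    return max(models, key=lambda c: log_lik(models[c], x) + math.log(priors[c]))
```

For real sequence data the independent model would be replaced by one of the sequence models discussed later (Markov chains, HMMs); only the inner `log_lik` changes.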


Discriminatory methods

  • Treat training data as points in n-dimensional space
  • Create boundaries such that all points in the same region are in the same class
  • Examples:
    – Decision trees
    – Neural networks
    – Regression methods

Need to embed sequence data in a fixed coordinate space



Kernel-based classifiers

  • Define function K(xi, x) that intuitively defines similarity between two sequences and satisfies two properties
    – K is symmetric: K(xi, xj) = K(xj, xi)
    – K is positive definite
  • Each class c computes f(x, c) = Σi wi,c K(xi, x) + bc, where xi is a training sequence
  • Predicted class is c with highest value f(x, c)
  • Well-known kernel classifiers
    – Nearest neighbor classifiers
    – Support vector machines
    – Radial basis functions

Need to define similarity functions between sequences that also satisfy kernel properties
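One concrete similarity with both properties is a k-spectrum kernel: an explicit inner product over k-gram counts, hence symmetric and positive semi-definite. This particular choice is an illustration, not necessarily the kernel used in the case studies:

```python
from collections import Counter

def kgrams(seq, k):
    # Count vector of all length-k substrings of the sequence
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s1, s2, k=3):
    # K(s1, s2) = dot product of the two k-gram count vectors
    g1, g2 = kgrams(s1, k), kgrams(s2, k)
    return sum(c * g2[g] for g, c in g1.items())
```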


Partial sequence classification (Tagging)

  • The tagging problem:
    – Given:
      • A set of tags T
      • Training examples of sequences showing the breakup of the sequence into the set of tags
    – Learn to breakup a sequence into tags
      (classification of parts of sequences)
  • Examples:
    – Text segmentation
      • Break a sequence of words forming an address string into subparts like Road, City, Name etc.
    – Continuous speech recognition
      • Identify words in continuous speech


Approaches used for tagging

  • Rule-based local models
  • Adapt state-based generative models
    – Separate model per tag
    – Combined model with states labeled with tags
    – Normal generative models
    – Special conditional models [Collins]

Sequence clustering

  • Given a set of sequences, create groups such that similar sequences are in the same group
  • Three kinds of clustering algorithms
    – Distance-based:
      • K-means
      • Various hierarchical algorithms
    – Model-based algorithms
      • Expectation maximization algorithm
    – Density-based algorithms

Distance-based methods need a similarity function; model-based methods need generative models; density-based methods need a dimensional embedding



Outline

  • Three case studies
  • Sequence mining operators
  • Approaches to sequence mining: Three primitives
    – Embed sequence in a fixed dimensional space
      • All conventional record mining techniques will apply
    – Distance between two sequences
      • Sequence classification: SVM and NN
      • Clustering sequences: distance-based approach
    – Generative models for sequence
      • Sequence classification: whole and partial
      • Clustering sequences: model-based approach
  • Conclusion and future work

Embedding sequences in fixed dimensional space

  • Extract aggregate features
    – Real-valued elements: Fourier coefficients, wavelet coefficients, auto-regressive coefficients
    – Categorical data: number of symbol changes
  • Ignore order, each symbol a dimension
    – extensively used in text classification and clustering
  • Sliding window techniques (k: window size)
    – Define a coordinate for each possible k-gram α
      • Value: number of times α appears in the sequence
      • Mismatch score: number of k-grams in the sequence with at most m mismatches with α
    – Define a coordinate for each of the k positions
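A sketch of the k-gram coordinate construction (the alphabet and sequence below are toy examples):

```python
from collections import Counter
from itertools import product

def kgram_embedding(seq, k, alphabet):
    # One coordinate per possible k-gram over the alphabet; value = number
    # of times that k-gram occurs in the sequence (sliding window of size k).
    counts = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    return [counts[g] for g in product(alphabet, repeat=k)]
```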



Sliding window examples

[figure: a short system-call trace (open, lseek, ..., execve) embedded in four ways]

One symbol per column; sliding window: window-size 3
One row per trace
Multiple rows per trace
Mis-match scores: m = 1


Detecting attacks on privileged programs

  • Short sequences of system calls made during normal execution are very consistent, yet different from the sequences of its abnormal executions

  • Each execution a trace of system calls:

– ignore online traces for the moment

  • Two approaches

    – STIDE
      • Create dictionary of unique k-windows in normal traces
      • Count what fraction of the k-windows of a new trace occur in the dictionary, and threshold

    – IDS
      • next ...
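The STIDE scheme above can be sketched roughly as follows (function names and the toy traces are invented):

```python
def stide_train(normal_traces, k):
    # Dictionary of unique k-windows seen in normal traces
    db = set()
    for trace in normal_traces:
        for i in range(len(trace) - k + 1):
            db.add(tuple(trace[i:i + k]))
    return db

def stide_score(trace, db, k):
    # Fraction of k-windows of a new trace NOT in the dictionary;
    # flag the trace as an attack when this exceeds a threshold.
    windows = [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]
    if not windows:
        return 0.0
    return sum(w not in db for w in windows) / len(windows)
```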


Classification models on k-grams trace data

  • When both normal and abnormal data available
    – class label = normal/abnormal
  • When only normal traces,
    – class-label = k-th system call

[figure: example k-grams of system calls with their class labels]

Learn rules to predict class-label [RIPPER]

✳✵✴✷✶ ✴✹✸✑✴✻✺✽✼

Examples of output RIPPER rules

  • Both-traces:
    – if the 2nd system call is ... and the 7th is ..., then the sequence is normal
    – if the ...th system call is ... and the ...th is ..., then the sequence is normal
    – if none of the above, then the sequence is abnormal
  • Only-normal:
    – if the 3rd system call is ... and the ...th is ..., then the ...th is ...
    – if none of the above, then the 7th is ...

Experimental results on sendmail

[table: anomaly scores on sendmail traces — the intrusion traces (sscp-1/2/3, syslog-remote-1/2, syslog-local-1/2, decode-1/2, sm565a, sm5x) score well above the normal sendmail traces]
  • The output rule sets contain ~250 rules, each with 2 or 3 attribute tests
  • Score each trace by counting the fraction of mismatches and thresholding

Summary: Only normal traces are sufficient to detect intrusions


More realistic experiments

  • Different programs need different thresholds
  • Simple methods [stide] work as well
  • Results sensitive to window size
  • Is it possible to do better with sequence-specific methods?

                RIPPER                 STIDE
                %false-pos  threshold  %false-pos  threshold
    Site-1 lpr  0.0016      3          0.0         12
    Site-2 lpr  0.0265      4          0.0013      12
    named       0.0         10         0.0019      20
    xlock       0.0         10         0.00008     20


Outline

  • Three case studies
  • Sequence mining operators
  • Approaches to sequence mining: Three primitives
    – Embed sequence in a fixed dimensional space
    – Distance between two sequences
    – Generative models for sequence
      • Sequence classification: whole and partial
      • Clustering sequences: model-based approach
  • Conclusion and future work

Modeling sequences

  • Most sequences are naturally generated and may not follow a well-defined statistical model
  • Complete modeling not possible
  • Approximate modeling still possible in many applications because
    – Sequences have short-term memory
    – A partial aspect of the sequence might need to be modeled



Probabilistic models for sequences

  • Independent model
  • One-level dependence (Markov chains)
  • Fixed memory (Order-l Markov chains)
  • Variable memory models
  • More complicated models
    – Hidden Markov Models
  • Model structure
    – A parameter for each symbol in Σ
  • Probability of a sequence s being generated from the model
    – Example: Pr(AACA) = Pr(A) Pr(A) Pr(C) Pr(A) = 0.1 * 0.1 * 0.9 * 0.1
  • Training: probabilities estimated from counts
    – Data T: set of training sequences
    – count(σ ∈ T): number of times symbol σ appears in the training data T
    – Pr(σ) = count(σ ∈ T) / length(T)

Independent model: Pr(A) = 0.1, Pr(C) = 0.9


  • Model structure
    – A state for each symbol in Σ
    – Edges between states with transition probabilities
  • Probability of a sequence s being generated from the model
    – Example: Pr(AACA) = Pr(A) Pr(A|A) Pr(C|A) Pr(A|C)
  • Training transition probabilities between states
    – Pr(σ|s) = count(sσ ∈ T) / count(s ∈ T)

Markov chains (Order 1)

[figure: two-state chain over A and C with the four transition probabilities]
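The probability computation and the count-based training above can be sketched as follows; the numeric values of `init` and `trans` are hypothetical, since the figure's numbers are unreadable in this copy:

```python
from collections import Counter

def sequence_prob(seq, init, trans):
    # Pr(s) = Pr(s1) * product over i of Pr(s_i | s_{i-1})
    p = init[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= trans[a][b]
    return p

def train_trans(seqs, alphabet):
    # Pr(sigma | s) = count(s followed by sigma in T) / count(s in T),
    # counting every occurrence of s that has a following symbol
    pairs = Counter(seq[i:i + 2] for seq in seqs for i in range(len(seq) - 1))
    singles = Counter(sym for seq in seqs for sym in seq[:-1])
    return {a: {b: pairs[a + b] / singles[a] if singles[a] else 0.0
                for b in alphabet}
            for a in alphabet}

# Hypothetical parameters for the two-symbol chain:
init = {"A": 0.5, "C": 0.5}
trans = {"A": {"A": 0.1, "C": 0.9}, "C": {"A": 0.4, "C": 0.6}}
```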

l = memory of sequence

  • Model
    – A state for each possible suffix of length l ⇒ |Σ|^l states
    – Edges between states with probabilities and single symbols
  • Probability of a sequence, e.g. with l = 2:
    – P(ACAACA) = P(A|AC) P(A|CA) P(C|AA) P(A|AC) = 0.7 * 0.4 * 0.9 * 0.7
  • Training model
    – Pr(σ|s) = count(sσ ∈ T) / count(s ∈ T)

Higher order Markov Chains

[figure: l = 2 chain over states AA, AC, CA, CC with labeled transition probabilities]


Variable Memory models

  • Probabilistic Suffix Automata (PSA)
  • Model
    – States represent variable-length suffixes of the past symbols
    – On each symbol, move to the state that is the longest suffix present in the model
  • Calculating Pr(AACA):
    – Pr(A) Pr(A|A) Pr(C|A) Pr(A|AC), each conditional taken from the deepest matching suffix state
  • Training: not straight-forward
    – Based on prediction suffix trees
    – A PST can be converted to a PSA after training

[figure: PSA with states A, AC, CC and labeled transitions]
❝✴❞ ❡ ❥ ❤ ❞ ❡ ❦ ❤ ❞ ❡ ❥ ❧✆♠✞♥ ♠✠♦✡♠☞♣✍q

Prediction Suffix Trees (PST)

  • Suffix trees with emission probabilities of observations attached with each tree node
  • Linear time algorithms exist for constructing such PSTs from training data [Apostolico 2000]

[figure: PST with nodes e (empty context), A, C, AC, CC, each carrying next-symbol emission probabilities]

P(AACA)=0.28*0.3*0.7*0.1
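The longest-suffix lookup behind this product can be sketched as below; the tree's probability values are hypothetical, chosen only so that the product matches the slide's 0.28*0.3*0.7*0.1:

```python
def pst_prob(seq, pst):
    # pst maps a context string to next-symbol probabilities; each position
    # is predicted by the deepest suffix of its history present in the tree.
    p = 1.0
    for i, sym in enumerate(seq):
        ctx = seq[:i]
        while ctx not in pst:      # back off to shorter suffixes
            ctx = ctx[1:]
        p *= pst[ctx][sym]
    return p

# Hypothetical tree: root "" (empty context) plus contexts A, C, AC, CC
pst = {
    "":   {"A": 0.28, "C": 0.72},
    "A":  {"A": 0.3, "C": 0.7},
    "C":  {"A": 0.1, "C": 0.9},
    "AC": {"A": 0.1, "C": 0.9},
    "CC": {"A": 0.8, "C": 0.2},
}
```

For "AACA" the contexts used are "", "A", "A" (backed off from "AA"), and "AC", giving 0.28 * 0.3 * 0.7 * 0.1.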



Hidden Markov Models

  • Doubly stochastic models
  • Efficient dynamic programming algorithms exist for
    – Finding Pr(s) given the model
    – Finding the state sequence that maximizes the probability of s (Viterbi)
  • Training model
    – Baum-Welch algorithm

[figure: four-state HMM; each state has outgoing transition probabilities and emission probabilities for symbols A and C]
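A sketch of the first of these dynamic programs, the forward algorithm for Pr(s); the two-state model at the bottom is an invented stand-in for the slide's figure:

```python
def forward_prob(seq, init, trans, emit):
    # alpha[j] = Pr(prefix emitted so far, chain currently in state j)
    states = list(init)
    alpha = {j: init[j] * emit[j][seq[0]] for j in states}
    for sym in seq[1:]:
        alpha = {j: sum(alpha[i] * trans[i][j] for i in states) * emit[j][sym]
                 for j in states}
    return sum(alpha.values())  # Pr(s) = sum of alpha over final states

# Invented 2-state HMM over symbols A and C:
init = {0: 0.6, 1: 0.4}
trans = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
emit = {0: {"A": 0.9, "C": 0.1}, 1: {"A": 0.2, "C": 0.8}}
```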

Discriminative training of HMMs

  • Models trained to maximize likelihood of data might perform badly when
    – Model not representative of data
    – Training data insufficient

  • Alternatives to Maximum-likelihood/EM

– Objective functions:
  • Minimum classification error
  • Maximum posterior probability of the actual label Pr(c|x)
  • Maximum mutual information with the class
– The above objectives are harder to train; a number of alternatives to EM have been proposed
  • Generalized probabilistic descent [Katagiri]
  • Deterministic annealing [Rao]


HMMs for profiling system calls

  • Training:

– Initial number of states = 40 (roughly equals the number of distinct system calls)

– Train using Baum Welch on normal traces

  • Methods of testing:

– Need to handle variable length and online data
– For each call, find the total probability of outputting it given all calls before it
  • If the probability is below a threshold, the call is abnormal
– Trace is abnormal if the fraction of abnormal calls is high


More realistic experiments

  • HMMs
    – Take a long time to train
    – Less sensitive to thresholds; no window parameter
    – Best overall performance
  • VMM and Sparse Markov Transducers also shown to perform significantly better than fixed window methods [Eskin 01]

[table: thresholds and false-positive rates for stide, RIPPER and HMM on the lpr, named and xlock traces]

[from Warrender 99]



Case study: classifying protein sequences

  • Classifying proteins into their functional/structural classes based on their sequence of amino acids

  • Methods proposed

– Nearest neighbor classifiers based on pair-wise sequence alignment as the distance measure
– Consensus patterns using motifs
– Profile Hidden Markov Models
– Support Vector Machines with various kernels
  • Fisher's kernel
  • k-gram based string kernels

Profile Hidden Markov Models

  • Protein families characterized by common occurrence of a few scattered amino acids in a background of other unrelated symbols



Profile HMM

Profile HMM of a family has for each aligned symbol three kinds of states:

– Match states: visited when the symbol appears in a sequence
– Delete states: to allow occasional drop of that symbol
– Insert states: to allow insertion of multiple symbols between aligned states

[figure: profile HMM architecture with match, delete and insert states]

SVMs on Fisher’s kernel

  • Train a HMM for the positive class
    – θ: set of all parameters of the HMM
    – θ*: the trained values of the parameters
  • Fisher's score for each sequence s is the gradient vector w.r.t. θ, that is, ∇θ Pr(s|θ) evaluated at θ = θ*
  • For two sequences s1, s2, the kernel is K(s1, s2) = similarity between their Fisher's scores
  • Train SVM using this kernel
  • Combines biological information in the HMM with the discriminatory power of SVMs
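To make the construction concrete, here is the Fisher score for a toy model where the gradient has a closed form — an independent per-symbol model standing in for the slide's HMM, with a plain dot product as the similarity (both substitutions are simplifying assumptions):

```python
from collections import Counter

def fisher_score(seq, theta, alphabet):
    # For an independent model Pr(s|theta) = product of theta[sym], the
    # gradient of log Pr(s|theta) w.r.t. theta[a] is count_a(s) / theta[a].
    counts = Counter(seq)
    return [counts[a] / theta[a] for a in alphabet]

def fisher_kernel(s1, s2, theta, alphabet):
    # K(s1, s2) = inner product of the two Fisher score vectors
    u1 = fisher_score(s1, theta, alphabet)
    u2 = fisher_score(s2, theta, alphabet)
    return sum(x * y for x, y in zip(u1, u2))
```

With an HMM, the gradient would instead be computed from the expected state/emission counts produced by the forward-backward pass.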



HMMs for information extraction

[figure: two HMM structures for segmenting text such as addresses]

Naïve model: one state per element
Nested model: each element is itself another HMM


Summary

  • Several applications of sequence mining
  • Record mining techniques on sequence data may not be effective
  • Many interesting options for sequence-specific generative models
  • Case studies on three applications:
    – Intrusion detection
    – Protein classification
    – Information Extraction
  • Future work: practical general purpose data mining tools for handling sequence data



References

S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25:3389–3402, 1997.

A. Apostolico and G. Bejerano. Optimal amnesic probabilistic automata or how to learn and classify proteins in linear time and space. In Proceedings of RECOMB 2000.

R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, 1998.

E. Eskin, W. Lee, and S. J. Stolfo. Modeling system calls for intrusion detection with dynamic window sizes. In Proceedings of DISCEX II, June 2001.

IDS publications: http://www.cs.columbia.edu/ids/publications/

T. Jaakkola, M. Diekhans, and D. Haussler. A discriminative framework for detecting remote protein homologies. Journal of Computational Biology, 2000.

D. Haussler. Convolution kernels on discrete structures. Technical report, UC Santa Cruz, 1999.

W. Lee and S. Stolfo. Data mining approaches for intrusion detection. In Proceedings of the Seventh USENIX Security Symposium (SECURITY '98), San Antonio, TX, January 1998.

A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia. SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology, 247:536–540, 1995.

L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In A. Waibel and K.-F. Lee, editors, Readings in Speech Recognition, pages 267–296. Morgan Kaufmann, 1990.

D. Ron, Y. Singer, and N. Tishby. The power of amnesia: learning probabilistic automata with variable memory length. Machine Learning, 25:117–149, 1996.

C. Warrender, S. Forrest, and B. Pearlmutter. Detecting intrusions using system calls: alternative data models. In 1999 IEEE Symposium on Security and Privacy.