In the name of God, the Most Gracious, the Most Merciful
Combining Machine and Automata Learning for Network Traffic Classification
Zeynab Sabahi, Fatemeh Ghassemi, and Zahra Alimadadi
TTCS 2020, Tehran, Iran

Network Traffic Classification, What & Why?
For a given interleaved packet trace, we want to detect which applications are running.
This supports network management tasks:
- Anomaly detection
- Balancing bandwidth usage
- Firewalling, gateways, etc.

Network Traffic Classification, How?
- Port-based classification: inefficient (applications use random or non-standard ports)
- Payload inspection: useless on encrypted traffic
- Statistical methods: use flow/packet statistical features; fast but less accurate; ignore temporal relations among flows
- Behavioral classification: specific to the category of application

Our solution
Intuition:
- A network application is a program calling different well-known protocols such as HTTP, TCP, SSL, and TLS. (Figure labels: User, HTTP, TLS, TCP.)
- Each application has its own network communication language when calling these well-known protocols.

Research Goals
- Automatically learning the network language (a k-TSS language) of each application whose source code is not available
- Classifying interleaved packet traces of applications according to the learned languages

- Introduction
- Preliminary: k-TSS Language
- NeTLang Framework
- Evaluation
- Conclusion

Formal Foundation: k-TSS Language
- A k-Testable language in the Strict Sense (k-TSS) is a regular language defined by a window of size k. Its learning is decidable.
- Words are determined by three sets of allowed prefixes, suffixes, and segments.

Example: sliding a window of size 3 over the word abaaabababb
- Segments = {aba, baa, aaa, aab, bab, abb}
- Prefixes = {ab}
- Suffixes = {bb}
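The window example can be checked with a few lines of Python. This is only a minimal sketch: the function name `k_windows` is made up for illustration, and the word is treated as a plain character string.

```python
def k_windows(word, k):
    """Collect the prefix, suffix and segments seen by a size-k window.

    The prefix and suffix have length k-1; segments have length k.
    """
    n = k - 1
    prefixes = {word[:n]} if len(word) >= n else set()
    suffixes = {word[len(word) - n:]} if len(word) >= n else set()
    segments = {word[i:i + k] for i in range(len(word) - k + 1)}
    return prefixes, suffixes, segments


prefixes, suffixes, segments = k_windows("abaaabababb", 3)
print("Prefixes:", sorted(prefixes))
print("Suffixes:", sorted(suffixes))
print("Segments:", sorted(segments))
```

Run on the slide's word with k = 3, it yields the same three sets as listed above.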

Formal Definition of k-TSS Language
- Definition 1 (k-test vector). Let k > 0. A k-test vector is a 5-tuple $Z = \langle \Sigma, I, F, T, C \rangle$ where $\Sigma$ is the alphabet and:
  - $I \subseteq \Sigma^{k-1}$ is a set of allowed prefixes
  - $F \subseteq \Sigma^{k-1}$ is a set of allowed suffixes
  - $T \subseteq \Sigma^{k}$ is a set of allowed segments
  - $C \subseteq \Sigma^{<k}$ is a set of allowed short strings
- Definition 2 (k-TSS language). Let $Z = \langle \Sigma, I, F, T, C \rangle$ be a k-test vector, for some k > 0. The language defined by Z is
  $$L(Z) = \left[ (I\Sigma^{*} \cap \Sigma^{*}F) - \Sigma^{*}(\Sigma^{k} - T)\Sigma^{*} \right] \cup C$$

Open question for the alphabet $\Sigma$: what is it, and how should it be defined for the network domain?
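Definition 2 translates directly into a membership test: a word is accepted if it is an allowed short string, or if it starts with an allowed prefix, ends with an allowed suffix, and contains only allowed segments. The sketch below follows that reading; the class `KTestVector`, the helper `tuples`, and the function `accepts` are illustrative names, not part of the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KTestVector:
    k: int
    prefixes: frozenset              # I: tuples of length k-1
    suffixes: frozenset              # F: tuples of length k-1
    segments: frozenset              # T: tuples of length k
    short: frozenset = frozenset()   # C: tuples shorter than k


def accepts(z, word):
    """Return True if `word` belongs to L(Z) as given in Definition 2."""
    w = tuple(word)
    if w in z.short:                 # the "∪ C" part
        return True
    n = z.k - 1
    if len(w) < n:                   # too short to carry a prefix and suffix
        return False
    starts_ok = w[:n] in z.prefixes                  # word in IΣ*
    ends_ok = w[len(w) - n:] in z.suffixes           # word in Σ*F
    windows_ok = all(w[i:i + z.k] in z.segments      # no forbidden segment
                     for i in range(len(w) - z.k + 1))
    return starts_ok and ends_ok and windows_ok


def tuples(*words):
    """Helper (assumption): turn strings into a frozenset of symbol tuples."""
    return frozenset(tuple(w) for w in words)


# Test vector read off the window example (k = 3).
z = KTestVector(k=3,
                prefixes=tuples("ab"),
                suffixes=tuples("bb"),
                segments=tuples("aba", "baa", "aaa", "aab", "bab", "abb"))
print(accepts(z, "ababb"), accepts(z, "abba"))
```

Symbols are stored as tuples so that the same code works when a symbol is a multi-character name rather than a single letter.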

Translating Network Concepts to Automata Learning
Intuition: some packets always appear together, due to the control phase of protocols or the specific functionality of an application.
- A sequence of related packets: a symbol of the alphabet
- A packet trace of an application: a word of the language
- From the set of all packet traces of an application, its k-TSS language can be learned.

NeTLang Framework
- Network Traffic Language Learner: NeTLang
- Architectural view (figure): three modules, (1) Trace Generator, (2) Language Learner, (3) Classifier

1) Trace Generator
- In the figure, different colors indicate different protocols.
- The clustering algorithm is k-means++.
- Stats are statistical features based on the length, number, and inter-arrival time (IAT) of packets.
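As a rough illustration of what such per-flow statistics might look like, the sketch below computes a packet count, a mean packet length, and a mean inter-arrival time. The slide does not say which statistics are used, so the choice of means, the `(timestamp, length)` input format, and the name `flow_stats` are all assumptions.

```python
from statistics import mean


def flow_stats(packets):
    """Summarize one flow.

    `packets` is a non-empty, time-ordered list of
    (timestamp_seconds, length_bytes) pairs (hypothetical input format).
    """
    times = [t for t, _ in packets]
    lengths = [n for _, n in packets]
    # Inter-arrival times: gaps between consecutive packets.
    iats = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "count": len(packets),
        "mean_length": mean(lengths),
        "mean_iat": mean(iats) if iats else 0.0,
    }


print(flow_stats([(0.0, 100), (0.5, 300), (1.5, 200)]))
```

Feature vectors of this kind would then be the input to the clustering step.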

2) Language Learner
- The k-test vector is learned by sliding a window of size k over the traces.
- For the running example (k = 3):
  Σ = {H-2, SL-2, SL-3, SL-4, SL-5, T-1, T-10, TL-2, U-0, U-1}
  T = {SL-2 T-1 U-0, SL-4 SL-2 T-1, T-1 SL-5 H-2, T-1 U-0 U-1, T-10 TL-2 T-1, TL-2 T-1 SL-5, U-0 U-1 SL-3}
  I = {SL-4 SL-2, T-10 TL-2}
  F = {SL-5 H-2, U-1 SL-3}
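A minimal sketch of such a learner over several symbolic traces follows. The two input traces are not given on the slide; they are an assumption, chosen because they are consistent with the Σ, T, I, and F listed above. The function name `learn_k_test_vector` is likewise illustrative.

```python
def learn_k_test_vector(traces, k):
    """Build (alphabet, prefixes, suffixes, segments, short) from traces.

    Each trace is a sequence of symbols; results are sets of symbol tuples.
    """
    alphabet, prefixes, suffixes, segments, short = set(), set(), set(), set(), set()
    n = k - 1
    for trace in traces:
        w = tuple(trace)
        alphabet.update(w)
        if len(w) < k:
            short.add(w)                      # allowed short string
        if len(w) >= n:
            prefixes.add(w[:n])               # first k-1 symbols
            suffixes.add(w[len(w) - n:])      # last k-1 symbols
        for i in range(len(w) - k + 1):
            segments.add(w[i:i + k])          # every size-k window
    return alphabet, prefixes, suffixes, segments, short


# Assumed training traces (not shown on the slide).
traces = [
    "SL-4 SL-2 T-1 U-0 U-1 SL-3".split(),
    "T-10 TL-2 T-1 SL-5 H-2".split(),
]
alphabet, prefixes, suffixes, segments, short = learn_k_test_vector(traces, 3)
print(len(alphabet), len(segments), sorted(prefixes), sorted(suffixes))
```

Taking the union over all traces is what lets a single test vector describe every observed behavior of the application.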

3) Classifier
- Inputs: the automata of the applications (App1, App2, ...) and the interleaved packet trace.
- The trace generator module is used to divide the interleaved trace into symbolic sub-traces by timing features.
- Automata word inclusion is not a suitable approach, because sub-traces are incomplete and the network is noisy.
- Window-based similarity: for a sub-trace s1, compare $Z(s_1) = \langle \Sigma, I, F, T \rangle$ with $Z(App_1) = \langle \Sigma_1, I_1, F_1, T_1 \rangle$ using a percentage-change metric:
  $$\Delta T = \frac{|T - T_1|}{|T|}, \quad \Delta T_1 = \frac{|T_1 - T|}{|T_1|}, \quad \Delta\Sigma = \frac{|\Sigma - \Sigma_1|}{|\Sigma|}, \quad \Delta I = \frac{|I - I_1|}{|I|}, \quad \Delta F = \frac{|F - F_1|}{|F|}$$
- distance(s1, App1) is computed from the components $(\Delta T, \Delta T_1, \Delta\Sigma, \Delta I, \Delta F)$.
- In general, $D(Z(w), Z(App_i))$ is computed from $(\Delta T, \Delta T_i, \Delta\Sigma, \Delta I, \Delta F)$.
- The sub-trace is assigned to the closest application: Class(s1) = App_j, where distance(s1, App_j) is the minimum of distance(s1, App1), distance(s1, App2), ...
- In general:
  $$\mathrm{Class}(w) = j \quad \text{if} \quad D(L(w), L(App_j)) = \min_{App_i \in A} D(L(w), L(App_i))$$

Classifier Result for the Running Example

Z(w = SL-4 SL-2 T-10 TL-2 T-1 U-2):
  Σ = {SL-2, TL-2, T-1, U-2, SL-4, T-10}
  T = {SL-4 SL-2 T-10, SL-2 T-10 TL-2, T-10 TL-2 T-1, TL-2 T-1 U-2}
  I = {SL-4 SL-2}
  F = {T-1 U-2}

Z(App):
  Σ' = {H-2, SL-2, SL-3, SL-4, SL-5, T-1, T-10, TL-2, U-0, U-1}
  T' = {SL-2 T-1 U-0, SL-4 SL-2 T-1, T-1 SL-5 H-2, T-1 U-0 U-1, T-10 TL-2 T-1, TL-2 T-1 SL-5, U-0 U-1 SL-3}
  I' = {SL-4 SL-2, T-10 TL-2}
  F' = {SL-5 H-2, U-1 SL-3}

Components: ΔT = 0.75, ΔT' = 0.85 (the App-side component), ΔΣ = 0.16, ΔI = 0, ΔF = 1
D(L(w), L(App)) = 7484160099 (value as extracted)
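The five components of the running example can be recomputed with plain set operations. The sketch below assumes each component is the share of a set's elements that are missing from the other side, a reading that matches the slide's values (0.85 and 0.16 then appear to be truncations of 6/7 and 1/6). The names `change` and `components` are illustrative, and the way the components are combined into a single distance is deliberately left out because the slide text does not show it.

```python
def change(own, other):
    """Share of `own` elements that do not occur in `other` (0.0 if `own` is empty)."""
    return len(own - other) / len(own) if own else 0.0


def components(z_w, z_app):
    """Percentage-change components between a sub-trace and an application."""
    return {
        "dT": change(z_w["T"], z_app["T"]),
        "dT_app": change(z_app["T"], z_w["T"]),   # the App-side segment component
        "dSigma": change(z_w["Sigma"], z_app["Sigma"]),
        "dI": change(z_w["I"], z_app["I"]),
        "dF": change(z_w["F"], z_app["F"]),
    }


# Sets copied from the running example; a segment is kept as one string.
z_w = {
    "Sigma": {"SL-2", "TL-2", "T-1", "U-2", "SL-4", "T-10"},
    "T": {"SL-4 SL-2 T-10", "SL-2 T-10 TL-2", "T-10 TL-2 T-1", "TL-2 T-1 U-2"},
    "I": {"SL-4 SL-2"},
    "F": {"T-1 U-2"},
}
z_app = {
    "Sigma": {"H-2", "SL-2", "SL-3", "SL-4", "SL-5", "T-1", "T-10", "TL-2", "U-0", "U-1"},
    "T": {"SL-2 T-1 U-0", "SL-4 SL-2 T-1", "T-1 SL-5 H-2", "T-1 U-0 U-1",
          "T-10 TL-2 T-1", "TL-2 T-1 SL-5", "U-0 U-1 SL-3"},
    "I": {"SL-4 SL-2", "T-10 TL-2"},
    "F": {"SL-5 H-2", "U-1 SL-3"},
}
for name, value in components(z_w, z_app).items():
    print(name, round(value, 3))
```

Only one segment of the sub-trace (T-10 TL-2 T-1) also occurs in the application's segment set, which is why both segment components are high.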

Dataset Description
We divided the pcaps into three sets:
- Train: 65%
- Validation: 15%
- Test: 20%
Metrics:
- Precision (P)
- Recall (R)
- F1-Measure: $F_1 = \frac{2 \cdot P \cdot R}{P + R}$
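For reference, the three metrics can be computed from per-class counts as in the short sketch below. The function name and the `tp`/`fp`/`fn` count arguments are illustrative; the presentation does not describe how the counts were gathered.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```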

Evaluation Results
The Best Configurations on the Validation Set

Parameter            Application Identification    Traffic Characterization
Session Threshold    15                            15
Inactive Timeout     5                             15
Flow Duration        10                            10
k                    3                             3

Comparison with Statistical Classifiers: Application Identification
(Figure: Precision, Recall, and F1-Measure charts.)
