Combining Machine and Automata Learning for Network Traffic Classification

Zeynab Sabahi, Fatemeh Ghassemi, and Zahra Alimadadi. TTCS 2020, Tehran, Iran.


slide-1
SLIDE 1

Combining Machine and Automata Learning for Network Traffic Classification

Zeynab Sabahi, Fatemeh Ghassemi, and Zahra Alimadadi
TTCS 2020, Tehran, Iran

In the name of God, the Most Gracious, the Most Merciful

slide-2
SLIDE 2

Network Traffic Classification, What & Why?

Given an interleaved packet trace, we want to detect which applications are running.

This supports network management tasks such as:

  • Anomaly detection,
  • Balancing bandwidth usage,
  • Firewalling,

...

[Figure: a bit stream (010011011001111) observed at a gateway]

slide-3
SLIDE 3

Network Traffic Classification, How?

 Port-based classification:

  • Ineffective (applications use random or non-standard ports)

 Payload inspection:

  • Useless for encrypted traffic

 Statistical methods (flow/packet statistical features):

  • Fast but less accurate
  • Ignore temporal relations among flows

 Behavioral classification:

  • Specific to the category of application

slide-4
SLIDE 4

Our solution

 Intuition:

  • A network application is a program that calls well-known protocols such as HTTP, TCP, SSL, and TLS.
  • Each application has its own network communication language, determined by how it calls these protocols.

[Figure: User; HTTP, TLS, TCP]

slide-5
SLIDE 5

Research Goals

 Automatically learning the network language of each application, without access to its source code

 Classifying interleaved packet traces of applications according to the learned languages

slide-6
SLIDE 6

Research Goals

The network language of each application is modeled as a k-TSS language.

slide-7
SLIDE 7

 Introduction
 Preliminary: k-TSS Language
 NeTLang Framework
 Evaluation
 Conclusion

slide-8
SLIDE 8

Formal Foundation: k-TSS Language

 A k-testable language in the strict sense (k-TSS) is a regular language defined by a sliding window of size k. Learning it is decidable.

 Its words are determined by three sets: allowed prefixes, allowed suffixes, and allowed segments.

slide-9
SLIDE 9

Formal Foundation: k-TSS Language

Example: sliding a window of size 3 over the word abaaabababb.

Window position 1: [aba]aabababb

 Segments = {aba}
 Prefixes = {}
 Suffixes = {}

slide-10
SLIDE 10

Formal Foundation: k-TSS Language

Example: sliding a window of size 3 over the word abaaabababb.

Window position 2: a[baa]abababb

 Segments = {aba, baa}
 Prefixes = {}
 Suffixes = {}

slide-11
SLIDE 11

Formal Foundation: k-TSS Language

Example: sliding a window of size 3 over the word abaaabababb.

Window position 3: ab[aaa]bababb

 Segments = {aba, baa, aaa}
 Prefixes = {}
 Suffixes = {}

slide-12
SLIDE 12

Formal Foundation: k-TSS Language

Example: sliding a window of size 3 over the word abaaabababb.

Once the window has passed over the whole word, all segments are collected. The first k-1 = 2 symbols give the prefix: [ab]aaabababb

 Segments = {aba, baa, aaa, aab, bab, abb}
 Prefixes = {ab}
 Suffixes = {}

slide-13
SLIDE 13

Formal Foundation: k-TSS Language

Example: sliding a window of size 3 over the word abaaabababb.

The last k-1 = 2 symbols give the suffix: abaaababa[bb]

 Segments = {aba, baa, aaa, aab, bab, abb}
 Prefixes = {ab}
 Suffixes = {bb}
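The sliding-window walk-through can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `learn_ktss`, the tuple representation of symbols, and the handling of words shorter than k (stored as allowed short strings) are choices made here, not taken from the slides.

```python
def learn_ktss(words, k):
    """Collect alphabet, prefixes, suffixes, segments and short strings from sample words.

    Hypothetical helper: each word is any sequence of symbols (a string works too).
    """
    alphabet, prefixes, suffixes, segments, short = set(), set(), set(), set(), set()
    for word in words:
        word = tuple(word)
        alphabet.update(word)
        if len(word) < k:
            # Assumption: words too short for a full window are kept as short strings.
            short.add(word)
        if len(word) >= k - 1:
            prefixes.add(word[:k - 1])                # first k-1 symbols
            suffixes.add(word[len(word) - (k - 1):])  # last k-1 symbols
        for i in range(len(word) - k + 1):
            segments.add(word[i:i + k])               # every window of size k
    return alphabet, prefixes, suffixes, segments, short


alphabet, prefixes, suffixes, segments, short = learn_ktss(["abaaabababb"], 3)
print(sorted("".join(s) for s in segments))  # ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab']
print(sorted("".join(p) for p in prefixes))  # ['ab']
print(sorted("".join(s) for s in suffixes))  # ['bb']
```

Passing several words accumulates the sets across all of them, which is how a vector for a whole set of traces would be built under the same assumptions.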

slide-14
SLIDE 14

Formal Definition of k-TSS Language

 Definition 1 (k-test vector)

Let $k > 0$. A k-test vector is a 5-tuple $Z = \langle \Sigma, I, F, T, C \rangle$ where:

   $I \subseteq \Sigma^{k-1}$ is a set of allowed prefixes
   $F \subseteq \Sigma^{k-1}$ is a set of allowed suffixes
   $T \subseteq \Sigma^{k}$ is a set of allowed segments
   $C \subseteq \Sigma^{<k}$ is a set of allowed short strings

 Definition 2 (k-TSS Language)

Let $Z = \langle \Sigma, I, F, T, C \rangle$ be a k-test vector, for some $k > 0$. The language defined by $Z$ is:

$$L(Z) = \left[ (I\Sigma^{*} \cap \Sigma^{*}F) - \Sigma^{*}(\Sigma^{k} - T)\Sigma^{*} \right] \cup C$$
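Definition 2 can be read as a membership test: a word belongs to the language if it is an allowed short string, or if it starts with an allowed prefix, ends with an allowed suffix, and contains no window of size k outside the allowed segments. The Python sketch below follows that reading; the function name `accepts` and the tuple encoding are assumptions made for illustration, and the example vector is the one from the window-of-size-3 example.

```python
def accepts(word, k, prefixes, suffixes, segments, short):
    """Check membership of `word` in L(Z) for Z = <Sigma, I, F, T, C> (hypothetical helper)."""
    word = tuple(word)
    if word in short:            # the "union C" part of the definition
        return True
    if len(word) < k - 1:        # too short to start with a prefix of length k-1
        return False
    starts_ok = word[:k - 1] in prefixes                  # word is in I.Sigma*
    ends_ok = word[len(word) - (k - 1):] in suffixes      # word is in Sigma*.F
    # No forbidden segment: every window of size k must be an allowed segment.
    windows_ok = all(word[i:i + k] in segments for i in range(len(word) - k + 1))
    return starts_ok and ends_ok and windows_ok


# Vector learned from the word abaaabababb with k = 3.
I = {tuple("ab")}
F = {tuple("bb")}
T = {tuple(s) for s in ["aba", "baa", "aaa", "aab", "bab", "abb"]}
C = set()

print(accepts("abaaabababb", 3, I, F, T, C))  # True
print(accepts("ababb", 3, I, F, T, C))        # True
print(accepts("abab", 3, I, F, T, C))         # False
```

The second call shows the generalization typical of window languages: ababb was never observed, yet all of its windows, its prefix, and its suffix are allowed. The third call fails because the suffix ab is not allowed.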

slide-15
SLIDE 15

Formal Definition of k-TSS Language

In the k-test vector $Z = \langle \Sigma, I, F, T, C \rangle$, what is the alphabet $\Sigma$? How should it be defined for the network domain?

slide-16
SLIDE 16

 Introduction
 Preliminary: k-TSS Language
 NeTLang Framework
 Evaluation
 Conclusion

slide-17
SLIDE 17

Translating Network Concepts to Automata Learning

Intuition: some packets always appear together, due to the control phase of protocols or the specific functionality of an application.

  • A sequence of related packets: a symbol of the alphabet
  • A packet trace of an application: a word of the language

Given the set of all packet traces of an application, its k-TSS language can be learned.

slide-18
SLIDE 18

NeTLang Framework

 Network Traffic Language Learner: NeTLang
 Architectural view:

[Figure: architectural view of NeTLang]

slide-19
SLIDE 19

NeTLang Framework

[Figure: architectural view of NeTLang with three numbered modules: 1) Trace Generator, 2) Language Learner, 3) Classifier]

slide-20
SLIDE 20

1) Trace Generator

  • In the figure, different colors denote different protocols.
  • The clustering algorithm is K-means++.
  • Stats denotes statistical features based on the length, number, and inter-arrival time (IAT) of packets.
slide-21
SLIDE 21

2) Language Learner

 The k-TSS vector is learned by sliding a window of size k over the traces.

 For the running example (k=3):

  • Σ = {H-2, SL-2, SL-3, SL-4, SL-5, T-1, T-10, TL-2, U-0, U-1}
  • T = {SL-2 T-1 U-0, SL-4 SL-2 T-1, T-1 SL-5 H-2, T-1 U-0 U-1, T-10 TL-2 T-1, TL-2 T-1 SL-5, U-0 U-1 SL-3}
  • I = {SL-4 SL-2, T-10 TL-2}
  • F = {SL-5 H-2, U-1 SL-3}

slide-22
SLIDE 22

3) Classifier

[Figure: the interleaved packet trace and the automata of the applications App1, App2, ...]

slide-23
SLIDE 23

3) Classifier

The trace generator module divides the interleaved packet trace into symbolic sub-traces using timing features.

[Figure: the interleaved packet trace and the automata of the applications App1, App2, ...]
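One simple way to picture a timing-based split is to cut the symbol stream wherever the gap between consecutive items exceeds a timeout. The Python sketch below is purely illustrative: the function `split_by_inactivity`, the `(timestamp, symbol)` input format, and the use of a single gap threshold are assumptions, and the slides do not specify how the timing features are actually applied.

```python
def split_by_inactivity(items, inactive_timeout):
    """Split (timestamp, symbol) pairs into sub-traces at long idle gaps.

    Hypothetical helper: `items` must be sorted by timestamp, and
    `inactive_timeout` uses the same time unit as the timestamps.
    """
    sub_traces, current, last_ts = [], [], None
    for ts, symbol in items:
        if last_ts is not None and ts - last_ts > inactive_timeout:
            sub_traces.append(current)   # close the sub-trace at the idle gap
            current = []
        current.append(symbol)
        last_ts = ts
    if current:
        sub_traces.append(current)
    return sub_traces


items = [(0.0, "SL-4"), (0.5, "SL-2"), (1.0, "T-1"), (20.0, "T-10"), (20.3, "TL-2")]
print(split_by_inactivity(items, inactive_timeout=5))
# [['SL-4', 'SL-2', 'T-1'], ['T-10', 'TL-2']]
```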

slide-24
SLIDE 24

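3) Classifier

[Figure: sub-trace s1 and the automata of the applications App1, App2, ...]

Automata word inclusion (checking whether a sub-trace is accepted by an application's automaton) is not a suitable approach, because sub-traces are incomplete and network traffic is noisy.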

slide-25
SLIDE 25

3) Classifier

[Figure: sub-trace s1 and the automata of the applications App1, App2, ...]

Window-based similarity compares the k-test vectors:

$Z(\mathrm{App}_1) = \langle \Sigma_1, I_1, F_1, T_1 \rangle$

$Z(s_1) = \langle \Sigma, I, F, T \rangle$

slide-26
SLIDE 26

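3) Classifier

[Figure: sub-trace s1 and the automata of the applications App1, App2, ...]

Window-based similarity between $Z(\mathrm{App}_1) = \langle \Sigma_1, I_1, F_1, T_1 \rangle$ and $Z(s_1) = \langle \Sigma, I, F, T \rangle$, using a percentage change metric:

$$\Delta T = \frac{|T - T_1|}{|T|}, \quad \Delta T_1 = \frac{|T_1 - T|}{|T_1|}, \quad \Delta\Sigma = \frac{|\Sigma - \Sigma_1|}{|\Sigma|}, \quad \Delta I = \frac{|I - I_1|}{|I|}, \quad \Delta F = \frac{|F - F_1|}{|F|}$$

Here $-$ is set difference and $|\cdot|$ is set cardinality. The distance is formed from the five terms, written side by side in this order:

$$\mathrm{distance}(s_1, \mathrm{App}_1) = \Delta T \;\, \Delta T_1 \;\, \Delta\Sigma \;\, \Delta I \;\, \Delta F$$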

slide-27
SLIDE 27

3) Classifier

In general, for a word $w$ and an application $\mathrm{App}_i$ with $Z(\mathrm{App}_i) = \langle \Sigma_i, I_i, F_i, T_i \rangle$, each term is computed against the components of $Z(\mathrm{App}_i)$:

$$D(Z(w), Z(\mathrm{App}_i)) = \Delta T \;\, \Delta T_i \;\, \Delta\Sigma \;\, \Delta I \;\, \Delta F$$

slide-28
SLIDE 28

3) Classifier

[Figure: sub-trace s1 and the automata of the applications App1, App2, ...]

distance(s1, App1), distance(s1, App2), ...

Class(s1) = App_j, where distance(s1, App_j) is the minimum of these distances.

slide-29
SLIDE 29

3) Classifier

In general, a word $w$ is assigned to the application at minimum distance:

$$\mathrm{Class}(w) = j \quad \text{if} \quad D(L(w), L(\mathrm{App}_j)) = \min_{\mathrm{App}_i \in A} D(L(w), L(\mathrm{App}_i))$$
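The minimum-distance rule itself is independent of how the distance is computed. The Python sketch below shows the selection step only. The names `classify` and `missing_segment_ratio` are hypothetical, and the stand-in distance uses just the share of segments missing from the application, which is a simplification chosen for brevity rather than the full five-term metric.

```python
def classify(sub_trace_vector, app_vectors, distance):
    """Return the name of the application with the smallest distance (hypothetical helper)."""
    return min(app_vectors, key=lambda name: distance(sub_trace_vector, app_vectors[name]))


def missing_segment_ratio(segments, app_segments):
    """Stand-in distance (assumption): share of segments not allowed by the application."""
    return len(segments - app_segments) / len(segments)


# Toy segment sets, not taken from the slides.
apps = {
    "App1": {("a", "b"), ("b", "c")},
    "App2": {("x", "y"), ("y", "z")},
}
sub_trace = {("a", "b"), ("b", "d")}

print(classify(sub_trace, apps, missing_segment_ratio))  # App1
```

When two applications are equally close, `min` returns the first one in iteration order, so a real classifier would need an explicit tie-breaking policy.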

slide-30
SLIDE 30

Classifier Result for the Running Example

 Z(w = SL-4 SL-2 T-10 TL-2 T-1 U-2):

  • Σ = {SL-2, TL-2, T-1, U-2, SL-4, T-10}
  • T = {SL-4 SL-2 T-10, SL-2 T-10 TL-2, T-10 TL-2 T-1, TL-2 T-1 U-2}
  • I = {SL-4 SL-2}
  • F = {T-1 U-2}

 Z(App):

  • Σ’ = {H-2, SL-2, SL-3, SL-4, SL-5, T-1, T-10, TL-2, U-0, U-1}
  • T’ = {SL-2 T-1 U-0, SL-4 SL-2 T-1, T-1 SL-5 H-2, T-1 U-0 U-1, T-10 TL-2 T-1, TL-2 T-1 SL-5, U-0 U-1 SL-3}
  • I’ = {SL-4 SL-2, T-10 TL-2}
  • F’ = {SL-5 H-2, U-1 SL-3}

 Result:

  • ΔT = 0.75, ΔT’ = 0.85, ΔΣ = 0.16, ΔI = 0, ΔF = 1
  • D(L(w), L(App)) = 7484160099
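The numbers of the running example can be reproduced with a short Python sketch. The slides do not spell out how the five terms are combined into one number; the sketch assumes that each term is scaled to an integer between 0 and 99 (the floor of 99 times the ratio) and that the two-digit results are concatenated in the order ΔT, ΔT’, ΔΣ, ΔI, ΔF. Under that assumption the result matches the value on the slide. The function names and the dictionary layout are illustrative.

```python
def delta(own, other):
    """Share of `own` missing from `other`, scaled to 0..99 (scaling is an assumption)."""
    return (99 * len(own - other)) // len(own)   # fails on an empty `own` set


def distance(z_w, z_app):
    """Concatenate the five two-digit terms in the order used on the slides."""
    terms = [
        delta(z_w["T"], z_app["T"]),          # Delta T
        delta(z_app["T"], z_w["T"]),          # Delta T'
        delta(z_w["Sigma"], z_app["Sigma"]),  # Delta Sigma
        delta(z_w["I"], z_app["I"]),          # Delta I
        delta(z_w["F"], z_app["F"]),          # Delta F
    ]
    return int("".join(f"{t:02d}" for t in terms))


z_w = {
    "Sigma": {"SL-2", "TL-2", "T-1", "U-2", "SL-4", "T-10"},
    "T": {"SL-4 SL-2 T-10", "SL-2 T-10 TL-2", "T-10 TL-2 T-1", "TL-2 T-1 U-2"},
    "I": {"SL-4 SL-2"},
    "F": {"T-1 U-2"},
}
z_app = {
    "Sigma": {"H-2", "SL-2", "SL-3", "SL-4", "SL-5", "T-1", "T-10", "TL-2", "U-0", "U-1"},
    "T": {"SL-2 T-1 U-0", "SL-4 SL-2 T-1", "T-1 SL-5 H-2", "T-1 U-0 U-1",
          "T-10 TL-2 T-1", "TL-2 T-1 SL-5", "U-0 U-1 SL-3"},
    "I": {"SL-4 SL-2", "T-10 TL-2"},
    "F": {"SL-5 H-2", "U-1 SL-3"},
}

print(distance(z_w, z_app))  # 7484160099
```

The underlying ratios are 3/4, 6/7, 1/6, 0/1, and 1/1, which agree with the rounded values 0.75, 0.85, 0.16, 0, and 1 listed for the example. Concatenation makes the comparison lexicographic, so a smaller ΔT dominates every later term.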
slide-31
SLIDE 31

 Introduction
 Preliminary: k-TSS Language
 NeTLang Framework
 Evaluation
 Conclusion

slide-32
SLIDE 32

Dataset Description

We divided the pcaps into three sets:

  • Train: 65%
  • Validation: 15%
  • Test: 20%

Metrics:

  • Precision (P),
  • Recall (R),
  • F1-Measure: $F_1 = \frac{2 \cdot P \cdot R}{P + R}$
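For reference, the three metrics can be computed from raw counts as in the Python sketch below. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for this illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # assumption: 0.0 when undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy counts, not taken from the evaluation.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```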

slide-33
SLIDE 33

Evaluation Results

The best configurations on the validation set:

Parameter           Application Identification   Traffic Characterization
Session Threshold   15                           15
Inactive Timeout    5                            15
Flow Duration       10                           10
k                   3                            3

slide-34
SLIDE 34

Comparison with Statistical Classifiers: Application Identification

[Figure: precision, recall, and F1-measure for application identification]

slide-35
SLIDE 35

Comparison with Statistical Classifiers: Traffic Characterization

[Figure: precision, recall, and F1-measure for traffic characterization]

slide-36
SLIDE 36

 Introduction
 Preliminary: k-TSS Language
 NeTLang Framework
 Evaluation
 Conclusion

slide-37
SLIDE 37

Conclusion

 We have combined unsupervised machine learning and automata learning techniques

  • Advantages of using automata learning

     Using a window language to handle partially observed network traffic
     Taking into account the temporal relations among flows and packets

  • Advantages of using machine learning

     Automatically generating the alphabet of the automata (K-means++ and the elbow method)
     Enhancing word acceptance with a new proximity metric

 NeTLang outperforms the state-of-the-art methods

  • More accurate, faster, more noise-tolerant, finer-grained, and not application-specific

slide-38
SLIDE 38

Future work

 Evaluating NeTLang on a public dataset
 Taking protocol phases into account in network unit extraction

slide-39
SLIDE 39

Thank You!

z.sabahi@ut.ac.ir

40

slide-40
SLIDE 40

Statistical Features

Stats #   Features
Group 1   TotalPktf/b, TotalLf/b, MinLf/b, MeanLf/b, MaxLf/b, StdLf/b
Group 2   MinLf/b, MeanLf/b, MaxLf/b, StdLf/b, PktCntRf/b, DtSizeRf/b, AvgInvalf/b

slide-41
SLIDE 41

Statistical Features

Feature       Description
MinLf/b       Minimum length of packets sent/received within a network unit
MeanLf/b      Average length of packets sent/received within a network unit
MaxLf/b       Maximum length of packets sent/received within a network unit
StdLf/b       Standard deviation of the length of packets sent/received within a network unit
PktCntRf/b    Ratio of TotalPktf/b to the total number of packets within a network unit
DtSizeRf/b    Ratio of TotalLf/b to the sum of all packet lengths within a network unit
AvgInvalf/b   Average time interval between sent/received packets within a network unit
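The Group 2 features can be computed per network unit roughly as in the Python sketch below. Several details are assumptions: the `(timestamp, length, direction)` input format, reading f and b as the sent and received directions, the use of the population standard deviation, and measuring intervals between consecutive packets of the same direction. Empty directions yield 0 by convention here.

```python
from statistics import mean, pstdev


def unit_features(packets):
    """Group 2 features for one network unit (hypothetical helper).

    `packets` is a list of (timestamp, length, direction), direction being "f" or "b".
    """
    total_count = len(packets)
    total_length = sum(length for _, length, _ in packets)
    features = {}
    for d in ("f", "b"):
        lengths = [length for _, length, direction in packets if direction == d]
        times = sorted(ts for ts, _, direction in packets if direction == d)
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        features[f"MinL{d}"] = min(lengths, default=0)
        features[f"MeanL{d}"] = mean(lengths) if lengths else 0
        features[f"MaxL{d}"] = max(lengths, default=0)
        features[f"StdL{d}"] = pstdev(lengths) if lengths else 0   # population std (assumption)
        features[f"PktCntR{d}"] = len(lengths) / total_count if total_count else 0
        features[f"DtSizeR{d}"] = sum(lengths) / total_length if total_length else 0
        features[f"AvgInval{d}"] = mean(gaps) if gaps else 0
    return features


# Toy unit with two sent and two received packets.
unit = [(0.0, 100, "f"), (0.1, 1500, "b"), (0.3, 200, "f"), (0.4, 500, "b")]
for name, value in unit_features(unit).items():
    print(name, value)
```

Feature vectors of this kind are what a clustering step such as K-means++ would group into alphabet symbols.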

slide-42
SLIDE 42

Evaluation Results

Parameter           Application Identification   Traffic Characterization
Session Threshold   15                           15
Inactive Timeout    5                            15
Flow Duration       10                           10
Feature Group       2                            2
k                   3                            3

slide-43
SLIDE 43

Comparison with a Behavioral Classifier

slide-44
SLIDE 44

Confusion Matrices

[Figure: confusion matrices for application identification and traffic characterization]