
Statistical Modeling of UNIX Users and Processes With Application to Computer Intrusion Detection



  1. Statistical Modeling of UNIX Users and Processes With Application to Computer Intrusion Detection
     Wen-Hua Ju

  2. Acknowledgement
     • Yehuda Vardi (Rutgers)
     • Matthias Schonlau (RAND)
     • William DuMouchel (AT&T Labs)
     • Alan F. Karr (NISS)
     • Allan Wilks (AT&T Labs)
     • Daryl Pregibon (AT&T Labs)

  3. How a Statistician Got Involved …
     • Refine techniques, developed by AT&T Labs Statistics Research for detecting telephone fraud, so that they detect intrusion into networked computer systems.
     • But …
       – Multiple intruder motives
       – Hard-to-quantify losses
       – Massive data
     • Something simpler: characterization of, and differentiation among, users of a computer system

  4. Outline
     • Experiments and Data
       – UNIX users
       – UNIX processes
     • Models for finite-state discrete stochastic processes
       – Hybrid High-order Markov Chain
       – Rarity of Occurrence
     • Results and Discussion

  5. Computer Intrusion And Intrusion Detection
     • Computer Intrusion: A sequence of related actions by a malicious adversary that results in the occurrence of unauthorized security threats to a target computing or networking domain. (Edward Amoroso, 1999)

  6. Experiments And Data
     • UNIX Users: Detecting Masquerades
       – Command sequences (AT&T Labs)
       – Collected by the UNIX acct auditing mechanism

  7. Experiments And Data
     • UNIX Users: Detecting Masquerades
       – 70 users, 15,000 commands each
         • 50 users: normal users (intrusion targets)
         • 20 users: masqueraders
       – Simplifying assumption
         • Blocks of 100 commands
       – Blocks are randomly chosen from the masqueraders and inserted into the normal users' data
       – Data available at http://www.schonlau.net/intrusion.html

  8. Experiments And Data
     • UNIX Processes:
       – System-call traces (Computer Immune System Research, University of New Mexico)
       – Normal data: synthetic and live
       – Intrusion data: real intrusion

  9. High-order Markov Chain Model
     • High-order vs. regular Markov model
     • Problem: huge parameter space
     • Mixture Transition Distribution (MTD) (Raftery 85; Raftery and Tavaré 94)
       – Auto-regressive
       – Only one extra parameter is added to the model for each extra lag

  10. High-order Markov Chain Model
      MTD Model
      $$P(X_t = s_{i_0} \mid X_{t-1} = s_{i_1}, \ldots, X_{t-l} = s_{i_l}) = \sum_{j=1}^{l} \lambda_j \, r(s_{i_0} \mid s_{i_j}), \quad t = l+1, l+2, \ldots$$
      where $R = \{r(s_i \mid s_j)\}$ and $\Lambda = \{\lambda_i\}$ satisfy
      $$r(s_i \mid s_j) \ge 0 \quad \text{and} \quad \sum_{i=1}^{K} r(s_i \mid s_j) = 1, \quad \forall j = 1, \ldots, K$$
      $$\lambda_i \ge 0, \quad \sum_{i=1}^{l} \lambda_i = 1$$
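
The MTD formula above can be evaluated directly once R and the lag weights are known. The following is a minimal sketch, assuming states are encoded as integers 0..K-1 and that `r[i, j]` stores r(s_i | s_j); the function name `mtd_conditional` and all numbers are hypothetical.

```python
import numpy as np

def mtd_conditional(r, lam, history):
    """Return the vector P(X_t = . | X_{t-1} = history[0], ..., X_{t-l} = history[l-1]).

    r[i, j] holds r(s_i | s_j), so every column of r sums to 1.
    lam[j] is the weight of lag j + 1; the weights sum to 1.
    """
    return sum(lam[j] * r[:, history[j]] for j in range(len(lam)))

# Toy example with K = 3 states and l = 2 lags (made-up numbers).
r = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
lam = np.array([0.8, 0.2])

# Previous state s_0, state before that s_2.
p = mtd_conditional(r, lam, history=[0, 2])
print(p)
```

With K states, a full order-l chain needs K^l (K - 1) free parameters, whereas the MTD form needs K (K - 1) for R plus l - 1 for the lag weights, which is the "one extra parameter per extra lag" property mentioned on the previous slide.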

  11. High-order Markov Chain Model
      MTD Model: Parameter estimation via MLE
      $$\log L(x_1, \ldots, x_T) = \sum_{i_0=1}^{K} \cdots \sum_{i_l=1}^{K} N(s_{i_0}, \ldots, s_{i_l}) \log\left( \sum_{j=1}^{l} \lambda_j \, r(s_{i_0} \mid s_{i_j}) \right)$$
      • Direct maximization: sequential quadratic programming algorithm, but …
      • Alternating maximization
        – Fix r(·|·): easy
        – Fix λ: still too many parameters
          $$\sum_k a_k \log b_k, \quad \text{where } \sum_k a_k = T - l \text{ and } \sum_k b_k = K^l$$
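
As a rough illustration of the log-likelihood above, the sketch below sums the log of the MTD conditional probability over time positions t = l+1, ..., T. This is equivalent to the count-weighted sum, taking N(·) to be the number of times each (l+1)-tuple of states occurs in the sequence. The function name and toy values are assumptions, with states encoded as integers.

```python
import math
import numpy as np

def mtd_log_likelihood(seq, r, lam):
    """Log-likelihood of an integer-coded sequence under an MTD model.

    r[i, j] holds r(s_i | s_j); lam[j] is the weight of lag j + 1.
    The first l observations are conditioned on, not modeled.
    """
    l = len(lam)
    total = 0.0
    for t in range(l, len(seq)):
        # Mixture over lags: sum_j lambda_j * r(x_t | x_{t-j})
        p = sum(lam[j] * r[seq[t], seq[t - 1 - j]] for j in range(l))
        total += math.log(p)
    return total

r = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
lam = np.array([0.8, 0.2])
ll = mtd_log_likelihood([0, 1, 2, 0], r, lam)
print(ll)
```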

  12. High-order Markov Chain Model
      MTD Model: MLE
      It is equivalent to solving the following linear system for b (or λ):
      $$\hat{b}_k = \frac{a_k}{\sum_k a_k} \sum_k b_k = \frac{a_k}{T - l} K^l, \quad \forall k$$
      $$\sum_{j=1}^{l} \lambda_j \, \hat{r}(s_{i_0} \mid s_{i_j}) = \frac{K^l}{T - l} N(s_{i_0}, \ldots, s_{i_l}), \quad \forall (i_0, \ldots, i_l)$$
      Can be "solved" efficiently using the EM algorithm, in the sense of minimizing the K-L distance
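
The slide does not spell out the EM iteration. The sketch below assumes the standard multiplicative EM update for a nonnegative linear system A x ≈ y, the kind of iteration used in emission tomography, which never increases the K-L distance between y and A x. The function name `em_nonneg_solve` and the toy system are hypothetical, and this is not claimed to be the author's exact algorithm.

```python
import numpy as np

def em_nonneg_solve(A, y, n_iter=500):
    """Multiplicative EM iteration for A x ~= y with x >= 0.

    Assumes A and y are nonnegative and A has no all-zero column.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    col_sums = A.sum(axis=0)
    # Strictly positive starting point, so every iterate stays positive.
    x = np.full(A.shape[1], y.sum() / A.sum())
    for _ in range(n_iter):
        x = x / col_sums * (A.T @ (y / (A @ x)))
    return x

# Toy consistent system with a known positive solution (made-up numbers).
A = np.array([[1.0, 0.5],
              [0.5, 1.0],
              [1.0, 1.0]])
x_true = np.array([2.0, 3.0])
y = A @ x_true
x_hat = em_nonneg_solve(A, y)
print(x_hat)
```

The update keeps every component nonnegative without explicit constraints, which is convenient when the unknowns are probabilities or mixture weights.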

  13. High-order Markov Chain Model
      Application to Command Data
      • Exhaustive Command Space (ECS) Model:
        – Treat all commands as Markov chain states
      • Partial Command Space (PCS) Model:
        – Treat frequently used commands as Markov chain states, and use "other" to represent the rest
      • Modification for "other"
        – r(other | ·) are small
        – r(· | other) are equal
      • Use the parameter estimates as the user profile
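
The PCS state reduction can be sketched as follows. The slide does not say how many commands count as "frequently used", so the cutoff `n_states` is an assumption, as is the function name.

```python
from collections import Counter

def to_partial_command_space(commands, n_states=3, other="other"):
    """Keep the n_states most frequent commands; map the rest to `other`."""
    frequent = {cmd for cmd, _ in Counter(commands).most_common(n_states)}
    return [cmd if cmd in frequent else other for cmd in commands]

# Toy command stream: "ls" and "cd" are frequent, the rest are rare.
cmds = ["ls", "cd", "ls", "vi", "ls", "cd", "gcc", "awk"]
mapped = to_partial_command_space(cmds, n_states=2)
print(mapped)
```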

  14. High-order Markov Chain Model
      Application to Command Data
      • Hypothesis testing as a decision rule
        – $H_0$: Command blocks are from user u
        – $H_1$: Command blocks are NOT from user u
      • Likelihood-ratio-like test
        $$X_u = X(c_1, \ldots, c_T \mid \hat{\Lambda}_1, \ldots, \hat{\Lambda}_U, \hat{R}_1, \ldots, \hat{R}_U) = \log\left[ \frac{\max_{v \ne u} L(c_1, \ldots, c_T \mid \hat{\Lambda}_v, \hat{R}_v)}{L(c_1, \ldots, c_T \mid \hat{\Lambda}_u, \hat{R}_u)} \right]$$
        Reject $H_0$ if $X_u > w$
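
Given each user's log-likelihood for a command block, the test statistic is a difference of logs. A minimal sketch follows; the user names, log-likelihood values, and threshold are made up for illustration.

```python
def likelihood_ratio_statistic(loglik_by_user, u):
    """X_u = max over v != u of log L_v, minus log L_u, for one command block."""
    best_other = max(ll for v, ll in loglik_by_user.items() if v != u)
    return best_other - loglik_by_user[u]

# Hypothetical log-likelihoods of one block under three user profiles.
logliks = {"alice": -120.0, "bob": -150.0, "carol": -135.0}

x_alice = likelihood_ratio_statistic(logliks, "alice")
x_bob = likelihood_ratio_statistic(logliks, "bob")

w = 0.0  # illustrative threshold only; the slides do not give a value
print(x_alice, x_alice > w)
print(x_bob, x_bob > w)
```

A block that fits another user's profile better than the claimed user's own profile yields a positive statistic, which is what pushes the test toward rejecting $H_0$.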

  15. Hybrid High-order Markov Chain Model
      • In case of no or not enough training data: independence model
        $$P(c_1, \ldots, c_T \mid \text{user } u) = \prod_{t=1}^{T} P(c_t \mid \text{user } u) = \prod_{t=1}^{T} q_{u c_t}$$
      • Estimate the q's using modified user/command counts
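
The slide does not say how the counts are modified. The sketch below assumes simple additive (pseudocount) smoothing, so that commands a user never typed still get nonzero probability; the function names and the pseudocount value are hypothetical.

```python
import math
from collections import Counter

def estimate_q(commands, vocabulary, pseudocount=1.0):
    """Smoothed estimate of q_{uc} = P(command c | user u) from u's history."""
    counts = Counter(commands)
    total = len(commands) + pseudocount * len(vocabulary)
    return {c: (counts[c] + pseudocount) / total for c in vocabulary}

def independence_loglik(block, q):
    """log P(c_1, ..., c_T | user u) under the independence model."""
    return sum(math.log(q[c]) for c in block)

vocab = ["ls", "cd", "vi"]
q_u = estimate_q(["ls", "ls", "cd"], vocab)
ll_block = independence_loglik(["ls", "vi"], q_u)
print(q_u, ll_block)
```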

  16. Hybrid High-order Markov Chain Application to Command Data
      • Test statistics
        $$X_{1u}(c_1, \ldots, c_T \mid \text{user } u) = \log\left[ \frac{\max_{v \ne u} L(c_1, \ldots, c_T \mid \hat{\Lambda}_v, \hat{R}_v)}{L(c_1, \ldots, c_T \mid \hat{\Lambda}_u, \hat{R}_u)} \right]$$
        $$X_{2u}(c_1, \ldots, c_T \mid \text{user } u) = \log\left[ \frac{\max_{v \ne u} \prod_{i=1}^{T} \hat{q}_{v c_i}}{\prod_{i=1}^{T} \hat{q}_{u c_i}} \right]$$
        $$\hat{\rho} = \frac{X_{1u}}{X_{2u}}$$
        $$X'_{2u} = \begin{cases} \tau_1 X_{2u}, & \text{if } \hat{\rho} < \tau_1 \\ \hat{\rho} X_{2u}, & \text{if } \tau_1 \le \hat{\rho} \le \tau_2 \\ \tau_2 X_{2u}, & \text{if } \hat{\rho} > \tau_2 \end{cases}$$
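
The rescaling of the independence-model statistic amounts to clamping a scale factor to the interval [τ1, τ2]. The sketch below assumes the scale factor is the ratio of the two statistics and that the second statistic is nonzero; the function name and threshold values are hypothetical.

```python
def rescale_x2(x1, x2, tau1, tau2):
    """Return X'_2u: X_2u multiplied by the scale factor clamped to [tau1, tau2].

    Assumes the scale factor is x1 / x2 and that x2 != 0.
    """
    rho = x1 / x2
    rho_clamped = min(max(rho, tau1), tau2)
    return rho_clamped * x2

# Illustrative thresholds tau1 = 0.5, tau2 = 2.0.
print(rescale_x2(6.0, 2.0, 0.5, 2.0))  # ratio above tau2, so tau2 * x2
print(rescale_x2(3.0, 2.0, 0.5, 2.0))  # ratio inside the interval
print(rescale_x2(0.5, 2.0, 0.5, 2.0))  # ratio below tau1, so tau1 * x2
```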

  17. Hybrid High-order Markov Chain Application to Command Data
      • Hybrid test statistic
        $$X_u = \begin{cases} X_{1u}, & \text{if } s/T \le \xi_1 \\ \dfrac{\xi_2 - s/T}{\xi_2 - \xi_1} X_{1u} + \dfrac{s/T - \xi_1}{\xi_2 - \xi_1} X'_{2u}, & \text{if } \xi_1 \le s/T \le \xi_2 \\ X'_{2u}, & \text{if } s/T > \xi_2 \end{cases}$$
        $s$: number of "other" in $\{c_1, \ldots, c_T\}$
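
The hybrid statistic interpolates linearly between the Markov-chain statistic and the rescaled independence statistic, according to the fraction of "other" commands in the block. A minimal sketch, with hypothetical argument names and illustrative thresholds:

```python
def hybrid_statistic(x1, x2_prime, s, T, xi1, xi2):
    """Blend X_1u and X'_2u according to the share s / T of "other" commands."""
    frac_other = s / T
    if frac_other <= xi1:
        return x1            # few unknown commands: trust the Markov model
    if frac_other > xi2:
        return x2_prime      # many unknown commands: use the independence model
    w = (xi2 - frac_other) / (xi2 - xi1)
    return w * x1 + (1.0 - w) * x2_prime

# Block of T = 100 commands, thresholds xi1 = 0.2 and xi2 = 0.4 (made up).
low = hybrid_statistic(10.0, 20.0, s=10, T=100, xi1=0.2, xi2=0.4)
mid = hybrid_statistic(10.0, 20.0, s=30, T=100, xi1=0.2, xi2=0.4)
high = hybrid_statistic(10.0, 20.0, s=50, T=100, xi1=0.2, xi2=0.4)
print(low, mid, high)
```

The weights on the two statistics sum to one, so the blended value changes continuously as the share of "other" commands crosses either threshold.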

  18. Rarity of Occurrence Model
      • Motivation: depend not only on frequency
        – Schonlau and Theus (2000)
      • Rarity of command(s)
        – Popular and frequently used
        – Popular but not frequently used
        – Rare or unique
      • Define the rarity index of a command based on the number of users who used this command

  19. Rarity of Occurrence Model
      • Rarity index example:
        – Total of 50 users
        – A command used by only 1 user: 50/50
        – A command used by all 50 users: 1/50
        – A command used by no users: ½(?)
        – Defined for both an individual command and a short sequence of commands
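
The slide gives only two endpoints of the rarity index (1 user: 50/50, all 50 users: 1/50). The sketch below assumes a linear rule, (N - n + 1) / N for a command used by n of N users, which matches both endpoints. Other rules, such as 1/n, match them too, so the formula and the function names here are assumptions rather than the author's definition. The unused-command case is left out because the slide itself marks it as uncertain.

```python
def users_per_command(history_by_user):
    """Count, for each command, how many distinct users have used it."""
    counts = {}
    for commands in history_by_user.values():
        for cmd in set(commands):
            counts[cmd] = counts.get(cmd, 0) + 1
    return counts

def rarity_index(n_users_with_command, n_users_total=50):
    """Assumed linear rarity index: (N - n + 1) / N for 1 <= n <= N."""
    if not 1 <= n_users_with_command <= n_users_total:
        raise ValueError("n must be between 1 and the total number of users")
    return (n_users_total - n_users_with_command + 1) / n_users_total

# Toy histories for three hypothetical users.
histories = {"u1": ["ls", "cd", "ls"], "u2": ["ls", "vi"], "u3": ["ls"]}
n_by_cmd = users_per_command(histories)
print(n_by_cmd)
print(rarity_index(1), rarity_index(50))
```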

  20. Rarity of Occurrence Model
      • Anomaly signal of user u's short command sequence $(c_{k_1}, \ldots, c_{k_l})$, defined as the weighted rarity index
        – Weight (+/-) depends on frequency
        – Case 1: User u has used $P_u$
        – Case 2: User u didn't use $P_u$, but has used all the commands
        – Case 3: User u didn't use all the commands
      • Test score is defined as a weighted sum of anomaly signals

  21. Rarity of Occurrence Model
      • Entropy model (only tried on the system call data)
        – Motivation
        – Shannon's entropy of distribution $\{p_i\}$
        – Small entropy indicates abnormality
        – Test score is defined as the sum of weighted entropies
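
Shannon's entropy of a distribution {p_i} is H = -Σ p_i log p_i. The sketch below computes it in bits, both for an explicit distribution and for the empirical distribution of a short trace. The system-call names are made up, and the weighting used in the test score is not specified on the slide, so it is not modeled here.

```python
import math
from collections import Counter

def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def empirical_entropy(items):
    """Entropy of the empirical distribution of the items in a trace."""
    counts = Counter(items)
    n = len(items)
    return shannon_entropy(c / n for c in counts.values())

uniform = shannon_entropy([0.5, 0.5])
degenerate = shannon_entropy([1.0])
trace = empirical_entropy(["open", "read", "open", "read"])
print(uniform, degenerate, trace)
```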

  22. Unix command result


  24. Discussion
      • Hybrid High-order Markov Chain Model
        – Multi-layer defense scheme
        – Computational demand
        – Likelihood ratio
      • Rarity of Occurrence Model
        – Good performance
        – Global information is important
      • Future study
        – Utilizing more information
        – Relaxing the experimental limitations
        – Other audit data formats

  25. Conclusion
