
Modeling filter configuration and introduction to data exploration

Tyler Moore
Computer Science & Engineering Department, SMU, Dallas, TX
October 4, 2012


  1. Outline
       1. Optimal filter configuration
            ROC curves
            An economic model of optimal filter configuration
       2. Data exploration overview
            Introduction
            Data exploration with R

     Some housekeeping
       Grade distribution change
       - Assignments (50%)
       - Exam (20%)
       - Project (30%)
       In order to reward progress in learning that occurs over the course of the semester, I will let students replace their lowest score on an assignment with their score on the final exam, provided that the final exam grade is higher than the lowest-graded assignment.
       For example, suppose you make an 82%, 88%, 90%, and 92% on the homework assignments and receive an 89% on the final exam. The 82% assignment grade is replaced by 89%, and the final exam is also treated as 89%.

     Some housekeeping
       - No class next Tuesday 10/9
       - Will announce an ungraded R exercise on Blackboard
       Modified office hours
       - NO office hours Tuesday 10/9-10/10
       - Office hours Friday 10/5 2pm-3pm
       - Office hours Thursday 10/11 11am-12pm
       - Office hours Friday 10/12 9am-10am

  2. Homework 2
       - Posted on Blackboard
       - Download from http://lyle.smu.edu/~tylerm/courses/econsec/assign/hw2.pdf
       - Due next Friday Oct 12 at 5pm

     Domain-specific models
       - Up to now we have modeled security investment at a very high level
       - Map costs to benefits, assume diminishing marginal returns to investment, etc.
       - Useful when justifying security budgets compared to non-security expenditures
       - Not useful for deciding how best to allocate a given security budget
       - Today, we discuss a model for a tactical security investment decision: configuring a filter to balance false positives and false negatives

     Binary classification is a recurring problem in CS
       Common task: distill many observations to a binary signal
       - S = {0, 1}: communications theory
       - S = {undervalued, overvalued}: stock trading
       - S = {reject, accept}: research hypothesis
       - S = {benign, malicious}: security filter
       Such simplification inevitably leads to errors compared to reality (aka ground truth)

     Filter defense mechanism
                      Reality
       Signal         no attack    attack
       benign         1 − α        β
       malicious      α            1 − β
       α: false positive rate, β: false negative rate
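The two error rates of the filter can be estimated from observations whose ground truth is known. The following is a minimal sketch, not part of the original slides: the function name `error_rates` and the toy data are invented for illustration, and `True` is assumed to mean "attack" (reality) or "flagged malicious" (signal).

```python
def error_rates(truth, signal):
    """Estimate (alpha, beta) from paired observations.

    truth[i]  is True when observation i really is an attack.
    signal[i] is True when the filter flags observation i as malicious.
    alpha = share of non-attacks flagged malicious (false positive rate).
    beta  = share of attacks passed as benign (false negative rate).
    """
    if len(truth) != len(signal):
        raise ValueError("truth and signal must have the same length")
    pairs = list(zip(truth, signal))
    no_attack = sum(1 for t, _ in pairs if not t)
    attack = len(pairs) - no_attack
    if no_attack == 0 or attack == 0:
        raise ValueError("need at least one attack and one non-attack")
    false_pos = sum(1 for t, s in pairs if not t and s)
    false_neg = sum(1 for t, s in pairs if t and not s)
    return false_pos / no_attack, false_neg / attack


# Invented toy data: 5 benign items followed by 5 attacks.
truth = [False] * 5 + [True] * 5
signal = [False, False, False, False, True, True, True, True, True, False]

alpha, beta = error_rates(truth, signal)
print(f"alpha = {alpha}, beta = {beta}")
```

In this toy data one of the five benign items is flagged and one of the five attacks is missed, so both estimated rates come out at one in five.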

  3. Receiver operating characteristic
       [Figure: ROC curves, one dashed and one solid. x-axis: false positive rate α, from 0 to 1; y-axis: detection rate 1 − β, from 0 to 1. The 45° diagonal is drawn, and the line α = β marks the equal error rate (EER) of the dashed and solid curves.]

     Model for optimal filter configuration
       - Binary classifiers are imperfect
       - Finding the optimal trade-off, say for an IDS or spam filter, is hard
       - Can be framed as an economic trade-off between opportunity cost of false positives and losses incurred by false negatives

     Model for optimal filter configuration
       - We can see from ROCs that β can be expressed as a function of α
       - β : [0, 1] → [0, 1] defines the false negative rate as a function of the false positive rate α
       - β(0) = 1, β(1) = 0
       - We assume β′(x) < 0 and β″(x) ≥ 0
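To make the assumptions on β concrete, here is a small sketch using one hypothetical curve, β(α) = (1 − √α)², chosen only because it meets the stated conditions (β(0) = 1, β(1) = 0, decreasing and convex on the interior of [0, 1]). The slides do not specify any particular functional form, and the helper names below are invented. The sketch checks the conditions numerically on a grid and locates the equal error rate, the point where α = β, by bisection.

```python
import math


def beta(alpha):
    """Hypothetical false negative rate as a function of alpha (an assumption)."""
    return (1.0 - math.sqrt(alpha)) ** 2


def satisfies_assumptions(f, steps=1000):
    """Check f(0)=1, f(1)=0, strictly decreasing and convex on an even grid."""
    xs = [i / steps for i in range(steps + 1)]
    ys = [f(x) for x in xs]
    endpoints_ok = abs(ys[0] - 1.0) < 1e-12 and abs(ys[-1]) < 1e-12
    decreasing = all(y1 < y0 for y0, y1 in zip(ys, ys[1:]))
    # A non-negative second difference is the discrete analogue of f'' >= 0.
    convex = all(y0 - 2 * y1 + y2 >= -1e-12
                 for y0, y1, y2 in zip(ys, ys[1:], ys[2:]))
    return endpoints_ok and decreasing and convex


def equal_error_rate(f, tol=1e-12):
    """Find alpha with f(alpha) = alpha by bisection (f must be decreasing)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > mid:
            lo = mid   # beta still exceeds alpha: the crossing lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


eer = equal_error_rate(beta)
print(satisfies_assumptions(beta), round(eer, 6))
```

For this particular curve the crossing can also be found by hand: α = (1 − √α)² gives √α = 1/2, so the equal error rate is 0.25.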

  4. Model for optimal filter configuration
       Suppose we rely on a filter to scan incoming email attachments for malware
       - a: cost of false positive (blocking a benign email)
       - b: cost of false negative (delivering malicious email)
       - p: probability of email containing malware
       Cost C(α) = p · β(α) · b + (1 − p) · α · a
       Suppose p = 0.1, a = $250, b = $500, α = 0.1, β = 0.2
       C(α) = 0.1 · 0.2 · 500 + 0.9 · 0.1 · 250 = $32.50

     Model for optimal filter configuration
       \[ \alpha^* = \arg\min_{\alpha} \; p \cdot \beta(\alpha) \cdot b + (1 - p) \cdot \alpha \cdot a \]
       which has first-order condition (FOC)
       \[ 0 = \frac{\partial}{\partial \alpha} \Big( p \cdot \beta(\alpha) \cdot b + (1 - p) \cdot \alpha \cdot a \Big) \Big|_{\alpha = \alpha^*} = p \cdot b \cdot \beta'(\alpha^*) + (1 - p) \cdot a \]
       after rearranging, we obtain:
       \[ \beta'(\alpha^*) = - \frac{1 - p}{p} \cdot \frac{a}{b} \]

     Optimal filter configuration (continuous ROC curves)
       [Figure: indifference curves of slope (1 − p)a / (p · b) in ROC space, with optimal operating points α*_A and α*_B. x-axis: false positive rate α, from 0 to 1; y-axis: detection rate 1 − β, from 0 to 1.]

     Optimal filter configuration (continuous ROC curves)
       [Figure: ROC curves A and B with EER_A = EER_B on the line α = β and AUC_A = AUC_B; 45° diagonal shown. Same axes.]
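The cost model and its first-order condition can be checked numerically. The sketch below first reproduces the $32.50 example from the slide, then solves for α* under an assumed ROC shape, β(α) = (1 − √α)², which is a hypothetical choice and not one given in the slides. For that curve β′(α) = 1 − 1/√α, so the condition β′(α*) = −(1 − p)a / (p·b) gives α* = 1 / (1 + s)² with s = (1 − p)a / (p·b). Note that the slide's example point (α = 0.1, β = 0.2) does not lie on this assumed curve.

```python
import math


def expected_cost(alpha, beta_value, p, a, b):
    """C = p * beta * b + (1 - p) * alpha * a, as defined on the slide."""
    return p * beta_value * b + (1 - p) * alpha * a


def beta(alpha):
    """Hypothetical ROC shape (an assumption, not from the slides)."""
    return (1.0 - math.sqrt(alpha)) ** 2


def optimal_alpha(p, a, b):
    """Closed-form alpha* for the assumed beta, derived from the FOC."""
    s = (1 - p) * a / (p * b)      # magnitude of the required slope of beta
    return 1.0 / (1.0 + s) ** 2


p, a, b = 0.1, 250.0, 500.0

# Slide example: alpha = 0.1 and beta = 0.2 are given directly.
slide_cost = expected_cost(0.1, 0.2, p, a, b)

# Optimum on the assumed curve, cross-checked by brute-force grid search.
alpha_star = optimal_alpha(p, a, b)
grid = [i / 10000 for i in range(10001)]
grid_best = min(grid, key=lambda x: expected_cost(x, beta(x), p, a, b))

print(f"slide example cost: {slide_cost:.2f}")
print(f"alpha* (closed form): {alpha_star:.4f}, grid search: {grid_best:.4f}")
```

With these parameters s = 4.5, so a false positive is expensive relative to the expected loss from a miss, and the optimum on the assumed curve sits at a low false positive rate of roughly 3%.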

  5. Optimal filter configuration (continuous ROC curves)
       [Figure: ROC curves A and B with an indifference line of slope (1 − p)a / (p · b) and optimal operating points α*_A and α*_B; 45° diagonal shown. x-axis: false positive rate α, from 0 to 1; y-axis: detection rate 1 − β, from 0 to 1.]

     Optimal filter configuration (discrete ROC curves)
       [Figure: piecewise-linear ROC curve with labeled points C, D, E, F, an indifference line of slope (1 − p)a / (p · b), and the optimum α* marked at D; 45° diagonal shown. Same axes.]

     Optimal filter configuration example (discrete ROC curves)
       [Figure: the same piecewise-linear ROC curve with segment slopes 2, 1, and 1/3; segment lengths labeled 0.2 and 0.4, 0.5 and 0.5, 0.3 and 0.1; ticks at α = 0.2 and α = 0.7. Same axes.]
       \[ \alpha^* = 0.2 \quad \text{if} \quad 1 \le \frac{(1 - p)\, a}{p \cdot b} \le 2 \]

     Onto the third phase of the class
       1. Introduction to economics and information security
       2. Security metrics and investment models
       3. Cybercrime econometrics
       4. Modeling strategic interaction using game theory
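Returning to the discrete ROC example: when the filter offers only a handful of operating points, the optimum can be found by evaluating the cost at each one. The sketch below is an illustration under assumptions. The coordinates are read off the garbled figure labels (segment slopes 2, 1, and 1/3, ticks at 0.2 and 0.7), and the mapping of the letters C, D, E, F to these points is a best guess rather than something the slides state in text.

```python
def expected_cost(alpha, beta, p, a, b):
    """C = p * beta * b + (1 - p) * alpha * a, as defined on the slides."""
    return p * beta * b + (1 - p) * alpha * a


# (false positive rate alpha, detection rate 1 - beta); assumed coordinates.
POINTS = {
    "C": (0.0, 0.0),
    "D": (0.2, 0.4),   # segment C-D has slope 0.4 / 0.2 = 2
    "E": (0.7, 0.9),   # segment D-E has slope 0.5 / 0.5 = 1
    "F": (1.0, 1.0),   # segment E-F has slope 0.1 / 0.3 = 1/3
}


def best_point(p, a, b, points=POINTS):
    """Return the name of the operating point with the lowest expected cost."""
    def cost(name):
        alpha, detection = points[name]
        return expected_cost(alpha, 1.0 - detection, p, a, b)
    return min(points, key=cost)


# With p = 0.5 the ratio (1 - p) * a / (p * b) reduces to a / b.
for a, b in [(450.0, 100.0), (150.0, 100.0), (50.0, 100.0), (20.0, 100.0)]:
    ratio = a / b
    print(f"ratio {ratio}: best point {best_point(0.5, a, b)}")
```

The loop sweeps the ratio (1 − p)a / (p·b) across the segment slopes. Point D, where α = 0.2, wins when the ratio lies between 1 and 2, which agrees with the condition stated on the slide.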

  6. Cybercrime econometrics
       - Cybercrime generates an empirical record of security threats
       - We will discuss common methods of cybercrime
       - We will learn techniques for analyzing data on cybercrime
       - Data on security incidents can be very hard to acquire
       - We will work with several datasets gathered by other researchers

     Cybercrime econometrics
       First step after acquiring data: exploration
       Goals for security data
       1. More reliably estimate probabilities of attack and their costs
       2. Look for relationships in the data to better understand relationship between attackers and targets

     Our first data source
       Source: http://www.privacyrights.org/data-breach

     Introduction to data exploration with R
       - Download a copy of the database from http://lyle.smu.edu/~tylerm/courses/econsec/data/databreaches-prc-2012-10-01.csv
       - Download a copy of R code from http://lyle.smu.edu/~tylerm/courses/econsec/code/initial_explore_PRC.R
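A first pass over a breach dataset usually means counting incidents by category and summing magnitudes. The course does this in R with the script named above, whose contents are not reproduced here. The sketch below uses Python's standard library instead, purely as a self-contained illustration. The column names (`year`, `breach_type`, `records`) and the sample rows are hypothetical and do not reflect the layout of the actual CSV file.

```python
import csv
import io
from collections import Counter

# Invented stand-in for a breach CSV; real column names may differ.
SAMPLE = """year,breach_type,records
2011,type_a,1000
2011,type_b,250
2012,type_a,40
"""


def count_by(rows, column):
    """Count how many rows share each value of the given column."""
    return Counter(row[column] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

by_type = count_by(rows, "breach_type")
by_year = count_by(rows, "year")
total_records = sum(int(row["records"]) for row in rows)

print(by_type.most_common())
print(by_year.most_common())
print(total_records)
```

To run the same counts on a downloaded file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle, with the column names adjusted to whatever the file's header row contains.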
