Modeling filter configuration and introduction to data exploration

Tyler Moore
Computer Science & Engineering Department, SMU, Dallas, TX
October 4, 2012

Outline

1. Optimal filter configuration
   - ROC curves
   - An economic model of optimal filter configuration
2. Data exploration overview
   - Introduction
   - Data exploration with R

Some housekeeping: grade distribution change

- Assignments (50%)
- Exam (20%)
- Project (30%)

In order to reward progress in learning that occurs over the course of the semester, I will let students replace their lowest score on an assignment with their score on the final exam, provided that the final exam grade is higher than the lowest-graded assignment.

For example, suppose you score 82%, 88%, 90%, and 92% on the homework assignments and receive an 89% on the final exam. The 82% assignment grade is replaced by 89%, and the final exam still counts as 89%.

Some housekeeping: schedule

- No class next Tuesday 10/9
- Will announce an ungraded R exercise on Blackboard
- Modified office hours:
  - No office hours Tuesday 10/9-10/10
  - Office hours Friday 10/5 2pm-3pm
  - Office hours Thursday 10/11 11am-12pm
  - Office hours Friday 10/12 9am-10am

Homework 2

- Posted on Blackboard
- Download from http://lyle.smu.edu/~tylerm/courses/econsec/assign/hw2.pdf
- Due next Friday Oct 12 at 5pm

Optimal filter configuration: domain-specific models

- Up to now we have modeled security investment at a very high level
  - Map costs to benefits, assume diminishing marginal returns to investment, etc.
  - Useful when justifying security budgets compared to non-security expenditures
  - Not useful for deciding how best to allocate a given security budget
- Today, we discuss a model for a tactical security investment decision: configuring a filter to balance false positives and false negatives

Binary classification is a recurring problem in CS

Common task: distill many observations to a binary signal S

- S = {0, 1}: communications theory
- S = {undervalued, overvalued}: stock trading
- S = {reject, accept}: research hypothesis
- S = {benign, malicious}: security filter

Such simplification inevitably leads to errors compared to reality (aka ground truth).

Filter defense mechanism

                 Reality
Signal       no attack    attack
benign       1 − α        β
malicious    α            1 − β

α: false positive rate, β: false negative rate
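As a rough illustration of the table above, α and β can be estimated from a labeled sample by counting how often the filter's signal disagrees with reality. This is only a sketch: the function name `error_rates` and the eight-message sample are made up for illustration and do not come from the lecture.

```python
def error_rates(truth, signal):
    """Estimate (alpha, beta) from parallel lists of booleans.

    truth[i]  is True when message i really was an attack.
    signal[i] is True when the filter flagged message i as malicious.
    """
    no_attack = sum(1 for t in truth if not t)
    attack = sum(1 for t in truth if t)
    # False positive: no attack in reality, but the signal says malicious.
    false_pos = sum(1 for t, s in zip(truth, signal) if not t and s)
    # False negative: attack in reality, but the signal says benign.
    false_neg = sum(1 for t, s in zip(truth, signal) if t and not s)
    return false_pos / no_attack, false_neg / attack


# Hypothetical sample: four benign messages, then four attacks.
truth = [False, False, False, False, True, True, True, True]
signal = [False, False, False, True, True, True, True, False]

alpha, beta = error_rates(truth, signal)
print(alpha, beta)  # 0.25 0.25
```

Each column of the table sums to one, which is why only α and β need to be estimated.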

Receiver operating characteristic

[Figure: ROC curves plotting detection rate 1 − β (vertical axis, 0 to 1) against false positive rate α (horizontal axis, 0 to 1), with the 45° diagonal. The line α = β marks the equal error rate (EER) of a dashed and a solid curve.]

Model for optimal filter configuration

- Binary classifiers are imperfect
- Finding the optimal trade-off, say for an IDS or spam filter, is hard
- Can be framed as an economic trade-off between the opportunity cost of false positives and the losses incurred by false negatives

We can see from ROCs that β can be expressed as a function of α.

- β : [0, 1] → [0, 1] defines the false negative rate as a function of the false positive rate α
- β(0) = 1, β(1) = 0
- We assume β′(x) < 0 and β″(x) ≥ 0
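To make the assumptions on β concrete, the sketch below uses one hypothetical curve, β(α) = 1 − √α, which satisfies β(0) = 1, β(1) = 0, β′ < 0 and β″ ≥ 0 on (0, 1). The lecture does not prescribe this functional form; it is chosen here only because it is easy to check by hand. The code locates the equal error rate, the point where α = β(α), by bisection.

```python
import math


def beta(alpha):
    """Hypothetical trade-off curve (an assumption, not from the slides)."""
    return 1.0 - math.sqrt(alpha)


def equal_error_rate(beta_fn, tol=1e-9):
    """Find alpha with alpha == beta_fn(alpha) by bisection on [0, 1].

    Because beta_fn is decreasing, alpha - beta_fn(alpha) is increasing,
    so it crosses zero exactly once.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid - beta_fn(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


eer = equal_error_rate(beta)
print(round(eer, 4))  # 0.382
```

For this particular curve the EER can also be solved in closed form as (3 − √5)/2, which the bisection result matches.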

Model for optimal filter configuration

Suppose we rely on a filter to scan incoming email attachments for malware.

- a: cost of a false positive (blocking a benign email)
- b: cost of a false negative (delivering a malicious email)
- p: probability of an email containing malware

Cost:

\[ C(\alpha) = p \cdot \beta(\alpha) \cdot b + (1 - p) \cdot \alpha \cdot a \]

Suppose p = 0.1, a = $250, b = $500, α = 0.1, β = 0.2. Then

C(α) = 0.1 · 0.2 · 500 + 0.9 · 0.1 · 250 = $32.50

The optimal configuration is

\[ \alpha^* = \arg\min_{\alpha} \; p \cdot \beta(\alpha) \cdot b + (1 - p) \cdot \alpha \cdot a \]

which has first-order condition (FOC)

\[ 0 = \frac{\partial}{\partial \alpha} \Big( p \cdot \beta(\alpha) \cdot b + (1 - p) \cdot \alpha \cdot a \Big) \Big|_{\alpha = \alpha^*} = p \cdot b \cdot \beta'(\alpha^*) + (1 - p) \cdot a \]

After rearranging, we obtain:

\[ \beta'(\alpha^*) = -\frac{1 - p}{p} \cdot \frac{a}{b} \]

Optimal filter configuration (continuous ROC curves)

[Figure: detection rate 1 − β against false positive rate α. Indifference curves with slope (1 − p)a / (p · b) touch the ROC curves at the optimal false positive rates α*_A and α*_B.]

[Figure: two ROC curves A and B with the 45° diagonal and the line α = β. The curves share the same equal error rate and area under the curve: EER_A = EER_B and AUC_A = AUC_B.]
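The cost model and its first-order condition can be checked numerically. The sketch below first reproduces the $32.50 example, then minimizes the cost for a hypothetical curve β(α) = 1 − √α (an assumption made for illustration; the slides leave β unspecified). For that curve, β′(α) = −1/(2√α), so the FOC gives α* = (p·b / (2(1 − p)·a))², and a brute-force grid search should land on the same value.

```python
import math


def cost(alpha, beta_value, p, a, b):
    """Expected cost per email: C = p*beta*b + (1 - p)*alpha*a."""
    return p * beta_value * b + (1 - p) * alpha * a


def beta(alpha):
    """Hypothetical trade-off curve (an assumption, not from the slides)."""
    return 1.0 - math.sqrt(alpha)


p, a, b = 0.1, 250, 500

# Worked example from the slide: alpha = 0.1, beta = 0.2.
slide_cost = cost(0.1, 0.2, p, a, b)
print(f"{slide_cost:.2f}")  # 32.50

# Closed form from the FOC for the hypothetical curve above.
alpha_closed = (p * b / (2 * (1 - p) * a)) ** 2
print(f"{alpha_closed:.4f}")  # 0.0123

# Brute-force check on a grid of 100,001 candidate values of alpha.
grid = [i / 100000 for i in range(100001)]
alpha_grid = min(grid, key=lambda x: cost(x, beta(x), p, a, b))
```

With these parameters the ratio (1 − p)a / (p·b) is 4.5, so the optimum sits where the ROC curve is steep, at a very low false positive rate.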

Optimal filter configuration (continuous ROC curves)

[Figure: ROC curves A and B with the 45° diagonal. An indifference line of slope (1 − p)a / (p · b) selects different optimal false positive rates α*_A and α*_B on the two curves.]

Optimal filter configuration (discrete ROC curves)

[Figure: piecewise-linear ROC curve through operating points C, D, E, F, with the 45° diagonal. An indifference line of slope (1 − p)a / (p · b) selects the optimal false positive rate α* at point D.]

Optimal filter configuration example (discrete ROC curves)

[Figure: piecewise-linear ROC curve through C, D, E, F. Segment C to D has slope 2 (rise 0.4 over run 0.2), segment D to E has slope 1 (rise 0.5 over run 0.5), and segment E to F has slope 1/3 (rise 0.1 over run 0.3). Axis ticks: α at 0.2 and 0.7; detection rate 1 − β at 0.4 and 0.9.]

α* = 0.2 if 1 ≤ (1 − p)a / (p · b) ≤ 2

On to the third phase of the class

1. Introduction to economics and information security
2. Security metrics and investment models
3. Cybercrime econometrics
4. Modeling strategic interaction using game theory
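Returning to the discrete ROC example: with only a handful of operating points, the optimum can be found by evaluating the cost at each point. The coordinates below (C at (0, 0), D at (0.2, 0.4), E at (0.7, 0.9), F at (1, 1)) are read off the figure's tick labels and segment slopes, so treat them as an assumption. The parameter set p = 0.5, a = 300, b = 200 is hypothetical, picked so that (1 − p)a / (p·b) = 1.5 falls between slopes 1 and 2.

```python
def cost(alpha, detection, p, a, b):
    """Expected cost at an operating point (alpha, detection rate)."""
    beta = 1.0 - detection  # false negative rate
    return p * beta * b + (1 - p) * alpha * a


# Operating points as (false positive rate, detection rate).
# Assumed coordinates, inferred from the figure's ticks and slopes.
POINTS = {
    "C": (0.0, 0.0),
    "D": (0.2, 0.4),
    "E": (0.7, 0.9),
    "F": (1.0, 1.0),
}


def best_point(points, p, a, b):
    """Return the name of the operating point with the lowest cost."""
    def point_cost(name):
        alpha, detection = points[name]
        return cost(alpha, detection, p, a, b)
    return min(points, key=point_cost)


# Hypothetical parameters with slope ratio 1.5: the optimum is D (alpha = 0.2).
print(best_point(POINTS, p=0.5, a=300, b=200))  # D

# Parameters from the earlier email example, slope ratio 4.5: the optimum is C.
print(best_point(POINTS, p=0.1, a=250, b=500))  # C
```

When the ratio exceeds the steepest segment slope, as in the second call, false positives are too expensive relative to misses and the cheapest choice is the corner point C.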

Cybercrime econometrics

- Cybercrime generates an empirical record of security threats
- We will discuss common methods of cybercrime
- We will learn techniques for analyzing data on cybercrime
- Data on security incidents can be very hard to acquire
- We will work with several datasets gathered by other researchers

First step after acquiring data: exploration

Goals for security data:

1. More reliably estimate probabilities of attack and their costs
2. Look for relationships in the data to better understand the relationship between attackers and targets

Our first data source

Source: http://www.privacyrights.org/data-breach

Introduction to data exploration with R

- Download a copy of the database from http://lyle.smu.edu/~tylerm/courses/econsec/data/databreaches-prc-2012-10-01.csv
- Download a copy of the R code from http://lyle.smu.edu/~tylerm/courses/econsec/code/initial_explore_PRC.R
