Limits of Learning-based Signature Generation with Adversaries

Shobha Venkataraman, Carnegie Mellon University
Avrim Blum, Carnegie Mellon University
Dawn Song, University of California, Berkeley

Signatures
- Signature: a function that acts as a classifier
  - Input: byte string
  - Output: is the byte string malicious or benign?
- e.g., signature for the Lion worm: “\xFF\xBF” && “\x00\x00\xFA” (written as “aaaa” && “bbbb” for short)
  - If both patterns are present in the byte string: MALICIOUS
  - If either one is not present: BENIGN
- This talk: focus on signatures that are sets of byte patterns
  - i.e., the signature is a conjunction of byte patterns
  - Our results for conjunctions imply results for more complex functions, e.g., regexps of byte patterns
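
A conjunction signature of this kind can be sketched in a few lines of Python. This is only an illustration: the function name `matches` and the sample payloads are made up here, and the last byte of the second Lion pattern is read as `\xFA`, which is an assumption because the slide text is garbled at that point.

```python
def matches(signature, payload):
    """Classify payload as MALICIOUS (True) only if every byte pattern occurs in it."""
    return all(pattern in payload for pattern in signature)


# Lion worm signature as read from the slide (last byte assumed to be \xFA).
LION = (b"\xff\xbf", b"\x00\x00\xfa")

# Hypothetical payloads, invented for this sketch.
payload_with_both = b"GET /" + b"\xff\xbf" + b"padding" + b"\x00\x00\xfa"
payload_with_one = b"GET /" + b"\xff\xbf" + b"padding"

is_malicious = matches(LION, payload_with_both)   # both patterns present
is_benign = not matches(LION, payload_with_one)   # one pattern missing
```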

Automatic Signature Generation
- Generating signatures automatically is important:
  - Signatures need to be generated quickly
  - Manual analysis is slow and error-prone
- Pattern-extraction techniques for generating signatures
[Figure: a training pool of malicious strings and normal strings feeds a signature generator, which outputs a signature for usage, e.g., ‘aaaa’ && ‘bbbb’]

History of Pattern-Extraction Techniques
[Figure: timeline from 2003 to 2007 with two columns]
- Signature generation systems: Earlybird [SEVS], Autograph [KK], Honeycomb [KC], Polygraph [NKS], Hamsa [LSCCK], Anagram [WPS], …
- Evasion techniques: polymorphic worms, malicious noise injection [PDLFS], Paragraph [NKS], allergy attacks [CM], …
Our work: lower bounds on how quickly ALL such algorithms converge to the signature in the presence of adversaries

Learning-based Signature Generation
[Figure: a training pool of malicious and normal samples feeds the signature generator; the resulting signature is applied to a test pool]
- Signature generator’s goal: learn as quickly as possible
- Adversary’s goal: force as many errors as possible

Our Contributions
Formalize a framework for analyzing the performance of pattern-extraction algorithms under adversarial evasion
- Show fundamental limits on the accuracy of pattern-extraction algorithms under adversarial evasion
  - Generalizes earlier work (e.g., [PDLFS], [NKS], [CM]) focused on individual systems
- Analyze when the fundamental limits are weakened
  - The kinds of exploits for which pattern-extraction algorithms may work
- Applies to other learning-based algorithms using similar adversarial information (e.g., COVERS [LS])

Outline
- Introduction
- Formalizing Adversarial Evasion
- Learning Framework
- Results
- Conclusions

Strategy for Adversarial Evasion
[Figure: malicious and normal samples feed the signature generator. True signature: ‘aaaa’ && ‘bbbb’. With spurious patterns added, the possible signatures are ‘aaaa’ && ‘bbbb’, ‘aaaa’ && ‘dddd’, ‘cccc’ && ‘bbbb’, and ‘cccc’ && ‘dddd’]
- Increase the resemblance between tokens in the true signature and spurious tokens
  - e.g., the adversary can add infrequent tokens (i.e., red herrings [NKS]), change token distributions (i.e., pool poisoning [NKS]), or mislabel samples (i.e., noise injection [PDLFS])
- Could generate high false positives or high false negatives

Definition: Reflecting Set
Reflecting sets: sets of resembling tokens
- Critical token: a token in the true signature S, e.g., ‘aaaa’, ‘bbbb’
- Reflecting set of a critical token i for a signature generator: all tokens as likely to be in S as critical token i, for the current signature generator
  - e.g., reflecting set for ‘aaaa’: ‘aaaa’, ‘cccc’
[Figure: S, the true signature, is ‘aaaa’ && ‘bbbb’. The reflecting set of ‘aaaa’ is {‘aaaa’, ‘cccc’} and the reflecting set of ‘bbbb’ is {‘bbbb’, ‘dddd’}. T, the set of potential signatures, is ‘aaaa’ && ‘bbbb’, ‘aaaa’ && ‘dddd’, ‘cccc’ && ‘bbbb’, ‘cccc’ && ‘dddd’]

Reflecting Sets and Algorithms
Reflecting sets are specific to the family of algorithms under consideration
- Signature Generator 1, e.g., coarse-grained: all tokens infrequent in normal traffic (say, first-order statistics)
  - R1 = {‘aaaa’, ‘cccc’, ‘eeee’, ‘gggg’}, R2 = {‘bbbb’, ‘dddd’, ‘ffff’, ‘hhhh’}
- Signature Generator 2, e.g., fine-grained: all tokens such that individual tokens and pairs of tokens are infrequent
  - R1 = {‘aaaa’, ‘cccc’}, R2 = {‘bbbb’, ‘dddd’}
By definition of the reflecting set, to the signature-generation algorithm the true signature appears to be drawn at random from R1 × R2
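
The set of potential signatures is the Cartesian product of the reflecting sets, which a short Python sketch can enumerate. The function name `candidate_signatures` is hypothetical, and the token sets are the two-token and four-token examples from the slide figures; which generator each set belongs to does not matter for the count.

```python
from itertools import product


def candidate_signatures(reflecting_sets):
    """Pick one token from each reflecting set; every choice is a potential signature."""
    return [frozenset(combo) for combo in product(*reflecting_sets)]


# Two critical tokens (n = 2), each with a reflecting set of size r = 2.
small = candidate_signatures([["aaaa", "cccc"], ["bbbb", "dddd"]])

# Same n, but larger reflecting sets (r = 4), as in the larger figure.
large = candidate_signatures([
    ["aaaa", "cccc", "eeee", "gggg"],
    ["bbbb", "dddd", "ffff", "hhhh"],
])

# The number of candidates is r**n in both cases.
small_count, large_count = len(small), len(large)
```

Larger reflecting sets leave the signature generator with more candidates that look equally plausible, which is what the lower bounds later in the talk are stated in terms of.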

Learning-based Signature Generation
[Figure: malicious samples carrying the tokens ‘aaaa’, ‘cccc’, ‘bbbb’, ‘dddd’ and normal samples feed the signature generator]
- Problem: learning a signature when a malicious adversary constructs reflecting sets for each critical token
- Lower bounds depend on the size of the reflecting set, which is determined by:
  - the power of the adversary,
  - the nature of the exploit,
  - the algorithms used for signature generation


Framework: Online Learning Model
[Figure: a training pool of malicious and normal samples feeds the signature generator; the signature is applied to a test pool, and feedback from the test pool returns to the signature generator]
- Signature generator’s goal: learn as quickly as possible
  - Optimal to update with new information in the test pool
- Adversary’s goal: force as many errors as possible
  - Optimal to present only one new sample before each update
Equivalent to the mistake-bound model of online learning [LW]

Learning Framework: Problem
Mistake-bound model of learning (signature generator, after initial training):
1. Byte string
2. Predicted label
3. Correct label
Notation:
- n: number of critical tokens
- r: size of the reflecting set for each critical token
- Assumption: the true signature is a conjunction of tokens
- Number of potential signatures: r^n
Goal: find the true signature from the r^n potential signatures
- Minimize mistakes in prediction while learning the true signature
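
The three-step protocol above (byte string, predicted label, correct label) can be sketched with the halving algorithm, a standard mistake-bound learner that is not named in the slides and is swapped in here purely for illustration. It predicts by majority vote over the candidate signatures still consistent with the feedback, so every mistake discards at least half of them and the mistake count stays within log2(r^n) = n log2 r. Samples are modelled as sets of tokens, and all names are hypothetical; enumerating all r^n candidates is only feasible for toy sizes.

```python
from itertools import product
from math import log2


def halving_learner(reflecting_sets, samples, true_signature):
    """Run the predict/feedback loop; return (mistakes, surviving candidates)."""
    version_space = [frozenset(combo) for combo in product(*reflecting_sets)]
    mistakes = 0
    for tokens in samples:
        # Step 2: predict MALICIOUS if a strict majority of candidates match.
        votes = sum(h <= tokens for h in version_space)
        predicted = 2 * votes > len(version_space)
        # Step 3: the correct label arrives as feedback.
        correct = true_signature <= tokens
        if predicted != correct:
            mistakes += 1
        # Keep only the candidates that agree with the correct label.
        version_space = [h for h in version_space if (h <= tokens) == correct]
    return mistakes, version_space


R1 = ["aaaa", "cccc"]
R2 = ["bbbb", "dddd"]
true_signature = frozenset({"aaaa", "bbbb"})
samples = [{"aaaa", "bbbb"}, {"cccc", "dddd"}, {"aaaa", "dddd"}]

mistakes, remaining = halving_learner([R1, R2], samples, true_signature)
bound = len([R1, R2]) * log2(len(R1))  # n * log2(r), with log base 2 assumed
```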

Learning Framework: Assumptions
- Signature generation algorithms used
  - The algorithm can learn any function for the signature
  - Not necessary to learn only conjunctions
- Adversary knowledge
  - Knows the algorithms/systems/features used to generate the signature
  - Does not necessarily know how the system/algorithm is tuned
- No mislabeled samples
  - No mislabeling, either due to noise or malicious injection
  - e.g., use host-monitoring techniques [NS] to achieve this

Outline
- Introduction
- Formalizing Adversarial Evasion
- Learning Framework
- Results:
  - General Adversarial Model
  - Can General Bounds be Improved?
- Conclusions

Deterministic Algorithms
Theorem: For any deterministic algorithm, there exists a sequence of samples such that the algorithm is forced to make at least n log r mistakes.
Additionally, there exists an algorithm (Winnow) that achieves a mistake bound of n(log r + log n).
Practical implication: For arbitrary exploits, any pattern-extraction algorithm can be forced into making at least n log r mistakes:
- even if extremely sophisticated pattern-extraction algorithms are used
- even if all labels are accurate, e.g., if TaintCheck [NS] is used

Randomized Algorithms
Theorem: For any randomized algorithm, there exists a sequence of samples such that the algorithm is forced to make at least ½ n log r mistakes in expectation.
Practical implication: For arbitrary exploits, any pattern-extraction algorithm can be forced into making at least ½ n log r mistakes in expectation:
- even if extremely sophisticated pattern-extraction algorithms are used
- even if all labels are accurate (e.g., if TaintCheck [NS] is used)
- even if the algorithm is randomized

One-Sided Error: False Positives
Theorem: Let t < n. Any algorithm forced to have fewer than t false positives can be forced to make at least (n − t)(r − 1) mistakes on malicious samples.
Practical implication: Algorithms that are allowed only a few false positives make significantly more mistakes than general algorithms
e.g., at t = 0:
- bounded false positives: n(r − 1)
- general case: n log r
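
To see where a count like n(r − 1) can come from, consider the classic elimination learner for conjunctions, which is not taken from the slides and is used here only as an illustration. It starts from the conjunction of all nr candidate tokens and drops a token whenever a malicious sample lacks it. Critical tokens are never dropped, so it makes no false positives, and each missed malicious sample removes at least one of the n(r − 1) spurious tokens. All names below are hypothetical.

```python
def elimination_learner(candidate_tokens, samples, true_signature):
    """Never flags a benign sample; returns (false_positives, false_negatives, hypothesis)."""
    hypothesis = set(candidate_tokens)  # most specific conjunction: every candidate token
    false_positives = 0
    false_negatives = 0
    for tokens in samples:
        predicted = hypothesis <= tokens      # MALICIOUS only if all kept tokens occur
        correct = true_signature <= tokens    # label given as feedback
        if predicted and not correct:
            false_positives += 1
        elif correct and not predicted:
            false_negatives += 1
        if correct:
            # A malicious sample rules out every token it does not contain.
            hypothesis &= tokens
    return false_positives, false_negatives, hypothesis


candidates = {"aaaa", "cccc", "bbbb", "dddd"}   # n = 2 critical tokens, r = 2
true_signature = {"aaaa", "bbbb"}
samples = [
    {"aaaa", "bbbb", "cccc"},
    {"aaaa", "bbbb", "dddd"},
    {"aaaa", "bbbb"},
    {"cccc", "dddd"},
]

fp, fn, learned = elimination_learner(candidates, samples, true_signature)
```

On this toy sequence the learner misses two malicious samples, which equals n(r − 1) for n = 2 and r = 2, and it ends with exactly the true signature.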

One-Sided Error: False Negatives
Theorem: Let t < n. Any algorithm forced to have fewer than t false negatives can be forced to make at least r^(n/(t+1)) − 1 mistakes on non-malicious samples.
Practical implication: Algorithms allowed to have bounded false negatives have far worse bounds than general algorithms
e.g., at t = 0:
- bounded false negatives: r^n − 1
- general algorithms: n log r
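
The gap between the general bound and the two one-sided bounds is easier to appreciate with numbers. The sketch below evaluates the three lower bounds stated in the theorems; the function names and the values n = 5, r = 10 are made up for illustration, and the logarithm is assumed to be base 2 since the slides do not state the base.

```python
from math import log2


def general_lower_bound(n, r):
    """n log r mistakes for any deterministic algorithm (log base 2 assumed)."""
    return n * log2(r)


def bounded_fp_lower_bound(n, r, t):
    """(n - t)(r - 1) mistakes on malicious samples when false positives are bounded by t."""
    return (n - t) * (r - 1)


def bounded_fn_lower_bound(n, r, t):
    """r^(n/(t+1)) - 1 mistakes on non-malicious samples when false negatives are bounded by t."""
    return r ** (n / (t + 1)) - 1


n, r, t = 5, 10, 0  # hypothetical values
general = general_lower_bound(n, r)
with_fp_limit = bounded_fp_lower_bound(n, r, t)
with_fn_limit = bounded_fn_lower_bound(n, r, t)
```

With these values the general bound is under 17 mistakes, the bounded-false-positive bound is 45, and the bounded-false-negative bound is 10^5 − 1.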

Different Bounds for False Positives & Negatives!
e.g., learning: what is a flower?
- Bounded false positives: Ω(r(n − t))
  - Learning from positive data only
  - No mistakes allowed on negatives
  - Adversary forces mistakes with positives
- Bounded false negatives: Ω(r^(n/(t+1)))
  - Learning from negative data only
  - No mistakes allowed on positives
  - Adversary forces mistakes with negatives
- Much more “information” about the signature in a malicious sample
[Figure: flower example, with one panel labeled “Positive data only” and one labeled “Negative data only”]

