
Estimating Min-Entropy For Large Output Spaces
Darryl Buller, Aaron Kaufer
Information Assurance Directorate, National Security Agency

Overview
• Background
• Our goal
• Using Bayesian Networks
• Optimizing with a Genetic Algorithm
• Computing Min-Entropy
• Examples
Information Assurance Directorate // Confidence in Cyberspace

Background

Background
• A source of randomness can be quantified by its entropy
• Min-entropy measurement is useful for cryptographic applications
• $H_\infty = -\log_2 \max_i \{\Pr[S = s_i]\}$
• Corresponds to the cost of an optimal guessing attack
• Other measures of entropy can be misleading for these applications
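The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are made up for this example, and the plug-in estimator assumes IID samples from a small output space, which is exactly the case the rest of the talk moves away from. The Shannon entropy function is included only to show how another entropy measure can overstate the guessing cost.

```python
import math
from collections import Counter


def min_entropy(probabilities):
    """H_inf = -log2 of the largest outcome probability."""
    return -math.log2(max(probabilities))


def shannon_entropy(probabilities):
    """Shannon entropy in bits, shown only for comparison."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def naive_min_entropy_estimate(samples):
    """Plug-in estimate from observed frequencies.

    Assumption for illustration: samples are IID and the output space is
    small enough that the most common value is observed many times.
    """
    counts = Counter(samples)
    p_max = max(counts.values()) / len(samples)
    return -math.log2(p_max)


dist = [0.5, 0.25, 0.25]
print(min_entropy(dist))      # 1.0 bit: one guess succeeds half the time
print(shannon_entropy(dist))  # 1.5 bits: larger than the min-entropy
```

For any distribution the Shannon entropy is at least the min-entropy, so using it would never underestimate, and can overestimate, the work needed by an optimal guesser.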

Background
When data from an entropy source is processed by a mixing function, we focus our analysis on the raw entropy source data
[Diagram: Entropy Source → Mixing Function → Random Output]

Background
• Suppose we have sample data from an entropy source
• We wish to find a statistical model and estimate the source's min-entropy
• Sample data is a sequence $\{s_1, s_2, \dots, s_L\}$, each $s_i$ an n-bit value sampled from an output space X

Background
• SP 800-90B has techniques for typical cases that satisfy the following two assumptions:
– Output space X is reasonably small
– Sample size L is large enough to detect non-IID properties (if they exist)

Background
• What if the output space is very large, e.g., each $s_i$ is dozens or hundreds of bits?
• Example: n = 50 bits, where the 15th bit tends to match the 43rd bit, or the 17th bit is influenced by the 3rd, 8th, and 31st bits, etc.
• Feasible sample sizes are far too small for us to fully understand the source and search for non-IID properties

Our Goal

Our Goal
Given n bit positions having an unknown joint distribution on $2^n$ possible values:
1. Compactly represent the essence of the joint distribution
2. Identify dependencies among bit positions
3. Estimate the probability of the most likely n-bit value; this lets us estimate min-entropy

Bayesian Networks

Bayesian Networks
• Definition: Directed acyclic graph (DAG) whose nodes are random variables and whose edges indicate dependence
• Variables can depend on multiple other variables (in our case, each bit is a variable)

Bayesian Networks
Example:
• Suppose X consists of 4-bit outputs
• A possible BN would be: $\Pr[x_1, x_2, x_3, x_4] = \Pr[x_2]\,\Pr[x_3]\,\Pr[x_1 \mid x_2, x_3]\,\Pr[x_4 \mid x_1, x_3]$
[Graph: $x_2 \to x_1$, $x_3 \to x_1$, $x_1 \to x_4$, $x_3 \to x_4$]
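To make the factorization concrete, the sketch below evaluates the joint probability of a 4-bit value as a product of one factor per bit. The parent structure is the one from the example; the probability values in `CPT` are hypothetical numbers chosen for illustration and do not come from the presentation.

```python
from itertools import product

# Parents of each bit in the example network: x1 <- (x2, x3), x4 <- (x1, x3).
PARENTS = {1: (2, 3), 2: (), 3: (), 4: (1, 3)}

# Hypothetical values: CPT[bit][parent values] = Pr[bit = 1 | parents].
CPT = {
    2: {(): 0.5},
    3: {(): 0.7},
    1: {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.9},
    4: {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.8, (1, 1): 0.3},
}


def joint_probability(x):
    """x maps bit index (1..4) to 0 or 1; multiply one factor per bit."""
    p = 1.0
    for bit, parents in PARENTS.items():
        p_one = CPT[bit][tuple(x[q] for q in parents)]
        p *= p_one if x[bit] == 1 else 1.0 - p_one
    return p


# Sanity check: the 16 joint probabilities should sum to 1.
total = sum(
    joint_probability(dict(zip((1, 2, 3, 4), bits)))
    for bits in product((0, 1), repeat=4)
)
print(total)
```

The compactness argument is visible here: the full joint distribution on 4 bits has 15 free parameters, while this network stores only 10.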

Bayesian Networks
• Given sample data, we want to find a BN that best explains the sample data
• Use the resulting BN to estimate min-entropy
• But how do we find the best BN given our data?

Genetic Algorithms

Genetic Algorithms
• Optimization technique inspired by biology
• Represent a candidate solution as a "genome" (BN in our case)
• Maintain a sequence of populations of candidate solutions
• Define a fitness function that measures the quality of a particular genome

Genetic Algorithms
• The process:
1. Randomly generate an initial population of candidate solutions
2. Repeatedly create a new generation based on the previous generation
• The goal is to eventually find the best-scoring candidate solution
• How does this work?

Genetic Algorithms
• In biology, crossover and mutation result in changes that affect fitness
• Increased fitness is rewarded by selection: the population increasingly resembles the optimal solution
• Decreased fitness is penalized: candidates are less likely to influence subsequent generations

Genetic Algorithms
• Our implementation …

Genetic Algorithms
Genome: Encodes the details of a specific candidate solution
– Each candidate is a binary n×n adjacency matrix
– A(i, j) = 1 iff bit j is statistically dependent on bit i
[Graph: $x_2 \to x_1$, $x_3 \to x_1$, $x_1 \to x_4$, $x_3 \to x_4$]
$$A = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
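A short sketch of how such a genome might be handled in code: reading the parents of a bit from a column of the matrix, and checking that the matrix really is a DAG. The helper names are illustrative, indices are 0-based (so row 0 is $x_1$), and the acyclicity test uses Kahn's topological sort, which is a standard choice rather than anything stated in the slides.

```python
# Example genome from the slide; row i, column j set means edge x_(i+1) -> x_(j+1).
A = [
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]


def parents_of(adj, j):
    """Return the 0-based indices i with adj[i][j] == 1 (the parents of bit j)."""
    return [i for i in range(len(adj)) if adj[i][j]]


def is_acyclic(adj):
    """Kahn's algorithm: a graph is a DAG iff every node can be peeled off
    in an order where all of its parents were removed first."""
    n = len(adj)
    indegree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ready = [j for j in range(n) if indegree[j] == 0]
    visited = 0
    while ready:
        i = ready.pop()
        visited += 1
        for j in range(n):
            if adj[i][j]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
    return visited == n


print(parents_of(A, 0))  # [1, 2]: x1 depends on x2 and x3
print(parents_of(A, 3))  # [0, 2]: x4 depends on x1 and x3
print(is_acyclic(A))     # True
```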

Genetic Algorithms
• Build conditional probability tables from the sample data as specified by the adjacency matrix
• For this example, we need:
– 1x2 table for $\Pr[x_2]$
– 1x2 table for $\Pr[x_3]$
– 4x2 table for $\Pr[x_1 \mid x_2, x_3]$
– 4x2 table for $\Pr[x_4 \mid x_1, x_3]$
[Graph: $x_2 \to x_1$, $x_3 \to x_1$, $x_1 \to x_4$, $x_3 \to x_4$]
$$A = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
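One way to build these tables is by counting. The sketch below assumes plain relative frequencies (maximum-likelihood estimates with no smoothing); the slides do not say how the tables are estimated, so treat that choice, the function name, and the toy samples as assumptions. Each table has one row per observed parent configuration and two columns, matching the "1x2" and "4x2" shapes listed above.

```python
from collections import defaultdict


def build_cpts(samples, adj):
    """samples: list of bit tuples; adj[i][j] == 1 means bit j depends on bit i.

    Returns a list where cpts[j][parent_values] = (Pr[bit j = 0], Pr[bit j = 1]).
    Parent configurations never seen in the samples get no row.
    """
    n = len(adj)
    cpts = []
    for j in range(n):
        parents = [i for i in range(n) if adj[i][j]]
        counts = defaultdict(lambda: [0, 0])
        for s in samples:
            key = tuple(s[i] for i in parents)
            counts[key][s[j]] += 1
        table = {
            key: (c0 / (c0 + c1), c1 / (c0 + c1))
            for key, (c0, c1) in counts.items()
        }
        cpts.append(table)
    return cpts


A = [
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
# Toy data for illustration only, written as (x1, x2, x3, x4).
toy_samples = [(1, 1, 1, 0), (1, 1, 1, 1), (0, 0, 0, 0), (0, 0, 1, 0)]
cpts = build_cpts(toy_samples, A)
print(cpts[1])  # table for Pr[x2]: a single row keyed by the empty tuple
```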

Genetic Algorithms
Crossover: produces two offspring by combining features of two parents
– Randomly pick a crossover point
– Join the top part of one adjacency matrix and the bottom part of the other, and vice versa
Parents: $\begin{pmatrix} A_1 \\ A_2 \end{pmatrix}$ and $\begin{pmatrix} B_1 \\ B_2 \end{pmatrix}$
Children: $\begin{pmatrix} A_1 \\ B_2 \end{pmatrix}$ and $\begin{pmatrix} B_1 \\ A_2 \end{pmatrix}$
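A minimal sketch of this crossover on adjacency matrices stored as lists of rows. It assumes the crossover point is a cut between two rows and is drawn uniformly so that both parts are non-empty; the slides only say the point is picked randomly, and the function name is illustrative.

```python
import random


def crossover(parent_a, parent_b, rng=random):
    """Swap the bottom rows of two n x n adjacency matrices at a random cut.

    Returns two children: (top of A + bottom of B) and (top of B + bottom of A).
    The children may contain cycles and still need to be repaired.
    """
    n = len(parent_a)
    cut = rng.randint(1, n - 1)  # at least one row from each parent
    child_1 = [row[:] for row in parent_a[:cut]] + [row[:] for row in parent_b[cut:]]
    child_2 = [row[:] for row in parent_b[:cut]] + [row[:] for row in parent_a[cut:]]
    return child_1, child_2
```

Copying each row (`row[:]`) keeps the children independent of the parents, so a later mutation of a child does not silently alter a parent that is still in the population.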

Genetic Algorithms
• Note that crossover often results in an invalid BN due to cycles
• Need a "de-cycling" step; children still contain characteristics of both parents
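The slides do not describe how the de-cycling step works. The sketch below is one plausible approach, offered purely as an assumption: re-insert the child's edges in random order and skip any edge that would close a cycle. Every surviving edge comes from the child, so characteristics inherited from both parents are kept where possible.

```python
import random


def reaches(adj, src, dst):
    """True if a directed path from src to dst exists (depth-first search)."""
    stack, seen = [src], {src}
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v, edge in enumerate(adj[u]):
            if edge and v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def decycle(adj, rng=random):
    """Hypothetical repair step: keep a random maximal acyclic subset of edges."""
    n = len(adj)
    edges = [(i, j) for i in range(n) for j in range(n) if adj[i][j] and i != j]
    rng.shuffle(edges)
    result = [[0] * n for _ in range(n)]
    for i, j in edges:
        # Adding i -> j creates a cycle exactly when j can already reach i.
        if not reaches(result, j, i):
            result[i][j] = 1
    return result
```

Because the edge order is shuffled, repeated repairs of the same cyclic matrix can drop different edges, which adds a little extra diversity to the population.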

Genetic Algorithms
Mutation: A random change in a candidate's adjacency matrix
1. Add an edge
2. Remove an edge
3. Move an edge destination
4. Move an edge origin
5. Reverse an edge
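The five operators can be sketched as follows. Choosing the operator uniformly at random, and the fallback of leaving an edge in place when it has nowhere to move, are assumptions made for this illustration; the slides list the operators but not how they are chosen. A mutated matrix may contain a cycle, so it would need the same validity check as a crossover child.

```python
import random

OPERATORS = ["add", "remove", "move_destination", "move_origin", "reverse"]


def mutate(adj, rng=random):
    """Apply one randomly chosen mutation; return (new matrix, operator name)."""
    n = len(adj)
    child = [row[:] for row in adj]
    edges = [(i, j) for i in range(n) for j in range(n) if child[i][j]]
    free = [(i, j) for i in range(n) for j in range(n) if i != j and not child[i][j]]
    op = rng.choice(OPERATORS)
    if op == "add":
        if free:
            i, j = rng.choice(free)
            child[i][j] = 1
        return child, op
    if not edges:
        return child, op  # nothing to remove, move, or reverse
    i, j = rng.choice(edges)
    child[i][j] = 0
    if op == "move_destination":  # i -> j becomes i -> k
        targets = [k for k in range(n) if k not in (i, j) and not child[i][k]]
        k = rng.choice(targets) if targets else j
        child[i][k] = 1
    elif op == "move_origin":  # i -> j becomes k -> j
        sources = [k for k in range(n) if k not in (i, j) and not child[k][j]]
        k = rng.choice(sources) if sources else i
        child[k][j] = 1
    elif op == "reverse":  # i -> j becomes j -> i
        child[j][i] = 1
    return child, op
```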

Genetic Algorithms
Selection: Rewards high-fitness candidates by giving them a higher chance of being selected to influence the next generation:
1. Elitist selection: Directly copy the most fit candidate to the next generation
2. Fill the remainder of the next generation using rank selection to choose pairs of parents for crossover
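A sketch of one generation step with elitism and rank selection. The linear rank weights (n for the best candidate down to 1 for the worst) are an assumption, as are the names `score` and `breed`, which stand in for the fitness function and for crossover followed by any repair and mutation. Lower scores are treated as better, matching the BIC convention used for fitness.

```python
import random


def rank_select_pair(ranked, rng=random):
    """Pick two parents from a best-first list, weighting by rank, not raw score.

    Sampling is with replacement, so the same candidate can be drawn twice.
    """
    n = len(ranked)
    weights = [n - r for r in range(n)]  # best gets n, worst gets 1
    return rng.choices(ranked, weights=weights, k=2)


def next_generation(population, score, breed, rng=random):
    """Build a new population of the same size.

    score(candidate) returns a number where smaller is better.
    breed(a, b) returns an iterable of children.
    """
    ranked = sorted(population, key=score)
    new_population = [ranked[0]]  # elitist selection: the best survives unchanged
    while len(new_population) < len(population):
        a, b = rank_select_pair(ranked, rng)
        new_population.extend(breed(a, b))
    return new_population[: len(population)]
```

Rank selection depends only on the ordering of scores, so a single candidate with a far better score cannot dominate the parent pool the way it could under selection proportional to fitness.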

Genetic Algorithms
• Fitness function: allows comparison of candidate solutions
• We use the Bayesian Information Criterion (BIC)
• BIC rewards larger likelihood and simpler models; a smaller BIC is better (fitness-wise)
$\mathrm{BIC} = k \ln N - 2 \ln L$
– k: number of free parameters
– N: number of sample outputs
– L: likelihood of the observed samples given the BN

Genetic Algorithms
• For the following BN:
– 1x2 table for $\Pr[x_2]$
– 1x2 table for $\Pr[x_3]$
– 4x2 table for $\Pr[x_1 \mid x_2, x_3]$
– 4x2 table for $\Pr[x_4 \mid x_1, x_3]$
• k = 1 + 1 + 4 + 4 = 10
[Graph: $x_2 \to x_1$, $x_3 \to x_1$, $x_1 \to x_4$, $x_3 \to x_4$]
$$A = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
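The parameter count and the BIC score can be sketched directly from an adjacency matrix and a list of samples. The sketch assumes the likelihood is evaluated at the relative-frequency (maximum-likelihood) table entries, which the slides do not state explicitly; function names are illustrative. A binary bit with m parents has $2^m$ table rows and one free parameter per row, which reproduces k = 10 for the example network.

```python
import math


def free_parameters(adj):
    """k: each binary bit with m parents contributes 2**m free parameters."""
    n = len(adj)
    return sum(2 ** sum(adj[i][j] for i in range(n)) for j in range(n))


def log_likelihood(samples, adj):
    """ln L of the samples under the BN, using relative-frequency tables."""
    n = len(adj)
    total = 0.0
    for j in range(n):
        parents = [i for i in range(n) if adj[i][j]]
        counts = {}
        for s in samples:
            key = tuple(s[i] for i in parents)
            counts.setdefault(key, [0, 0])[s[j]] += 1
        for c0, c1 in counts.values():
            for c in (c0, c1):
                if c:  # a zero count contributes nothing to the sum
                    total += c * math.log(c / (c0 + c1))
    return total


def bic(samples, adj):
    """BIC = k ln N - 2 ln L; smaller is better."""
    k = free_parameters(adj)
    return k * math.log(len(samples)) - 2.0 * log_likelihood(samples, adj)


A = [
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
print(free_parameters(A))  # 10
```

On toy two-bit data where the second bit always equals the first, the model with an edge between the bits scores a lower BIC than the model with no edges, even though it has one more parameter: the gain in likelihood outweighs the complexity penalty.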

Min-Entropy

Min-Entropy
• Use the Max-Product Variable Elimination algorithm to find the MAP (maximum a posteriori, i.e., most probable) assignment of a BN
• Generalization of the Viterbi algorithm
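A compact sketch of max-product variable elimination on the 4-bit example network. It returns only the probability of the most likely value, which is all the min-entropy estimate needs. The table values in `P_ONE` are hypothetical, the elimination order is chosen by hand, and the factor representation is an illustrative choice, not the authors' implementation.

```python
import math
from itertools import product

# Example structure: x1 <- (x2, x3), x4 <- (x1, x3).
PARENTS = {1: (2, 3), 2: (), 3: (), 4: (1, 3)}
# Hypothetical values: P_ONE[bit][parent values] = Pr[bit = 1 | parents].
P_ONE = {
    2: {(): 0.5},
    3: {(): 0.7},
    1: {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.9},
    4: {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.8, (1, 1): 0.3},
}


def make_factors():
    """One factor per bit: (variables, table) with the bit listed first."""
    factors = []
    for bit, parents in PARENTS.items():
        table = {}
        for values in product((0, 1), repeat=len(parents)):
            p1 = P_ONE[bit][values]
            table[(0,) + values] = 1.0 - p1
            table[(1,) + values] = p1
        factors.append(((bit,) + parents, table))
    return factors


def max_product(factors, order):
    """Eliminate each variable in `order` by maximizing it out of the
    product of the factors that mention it. `order` must list every variable."""
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        rest = tuple(sorted({v for vs, _ in touching for v in vs} - {var}))
        table = {}
        for values in product((0, 1), repeat=len(rest)):
            assignment = dict(zip(rest, values))
            best = 0.0
            for x in (0, 1):
                assignment[var] = x
                p = 1.0
                for vs, t in touching:
                    p *= t[tuple(assignment[v] for v in vs)]
                best = max(best, p)
            table[values] = best
        factors.append((rest, table))
    result = 1.0
    for _, t in factors:
        result *= t[()]  # every remaining factor has an empty scope
    return result


p_max = max_product(make_factors(), order=[4, 1, 2, 3])
h_min = -math.log2(p_max)
print(p_max, h_min)
```

Each elimination step touches only the factors that mention the variable being removed, so the cost is governed by the size of the largest intermediate table rather than by the $2^n$ possible outputs. That is what makes the MAP computation feasible for blocks of dozens of bits when the learned network is sparse.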

Examples

Example 1
• 32-bit blocks; sample size 15,000
• Bits 4-8, 11-15, 18-22, 25-29 follow a biased joint distribution on 5-bit values
• All other bits unbiased and independent
• Actual min-entropy is 17.2877…
