
What deep generative models can do for you: Opportunities, challenges, and open questions - PowerPoint PPT Presentation



  1. What deep generative models can do for you: Opportunities, challenges, and open questions Giulia Fanti Carnegie Mellon University 1

  2. Zinan Lin, Hao Liang, Alankar Jain, Kiran Thekumparampil, Chen Wang, Sewoong Oh, Vyas Sekar 2

  3. Common ML tools in networking: classification (classifying network traffic), reinforcement learning (traffic engineering), unsupervised methods (clustering signals) 3

  4. This talk: Generative models • What are generative models? • Why are they relevant now? • How can they be useful in networking? • What are the limitations? 4

  5. What is a generative model? • Models the joint probability distribution p(x) of a dataset • Example (time series): x[t] = f(x[0:t-1], θ) + n[t], where f is the learned model, θ its parameters, and n[t] noise • How do we pick f? How do we combine the noise? 5
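To make the x[t] = f(x[0:t-1], θ) + n[t] template concrete, here is a minimal sketch where f is a small learned model over a fixed window; the window size, noise scale, and linear form of f are assumptions for illustration, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_series(f, theta, length, window=5, noise_std=0.1):
    """Sample x[t] = f(x[t-window:t], theta) + n[t], one step at a time."""
    x = np.zeros(length)
    for t in range(window, length):
        x[t] = f(x[t - window:t], theta) + rng.normal(0, noise_std)
    return x

# A hypothetical learned model: a linear filter whose weights theta
# would normally be fit to real data (e.g., by least squares).
theta = np.array([0.1, 0.1, 0.2, 0.2, 0.4])
f = lambda window_vals, theta: float(window_vals @ theta)

synthetic = sample_series(f, theta, length=200)
```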

  6. How are they used in the networking community? Use domain knowledge to extract high-level insights, design a parametric model to capture those insights, then use data to populate the model parameters. Example: network traffic has temporal patterns with a period of 1 day, so x[t] = sin(θt) + n[t]. Melamed (1993), Denneulin et al (2004), Swing, BURSE, Hierarchical bundling, Di et al (2014), …

  7. Problems with this approach • Poor flexibility: requires a new design for every type of data • Poor fidelity: doesn’t capture properties that were not explicitly modeled 7

  8. Deep generative models: design a neural network to produce data of the right dimensionality, then use data to populate the parameters θ. 8

  9. Generative Adversarial Networks (GANs): Breakthrough in generative modeling • Prior approaches • Likelihood-based • Heavily rely on domain knowledge • GANs • Adversarial learning • Limited a priori assumptions 9

  10. Generative Adversarial Networks (GANs): noise z → Generator G → Discriminator D → real / fake decision 10
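Below is a minimal PyTorch sketch of the z → G → D training loop shown on this slide; the network sizes, optimizer settings, and the shape of `real` batches are placeholders I am assuming, not part of the talk.

```python
import torch
import torch.nn as nn

z_dim, x_dim = 16, 32                      # latent and data dimensionality (assumed)
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(real):                      # real: (batch, x_dim) tensor
    batch = real.size(0)
    z = torch.randn(batch, z_dim)
    fake = G(z)

    # Discriminator: label real samples 1, generated samples 0.
    d_loss = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make D label generated samples as real.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

losses = train_step(torch.randn(8, x_dim))   # stand-in for a real data batch
```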

  11. How can we use these tools in networking? • Sharing synthetic data • Discovering malicious inputs to black-box systems • Understanding complex datasets 11

  12. Use case 1: Sharing synthetic data. Zinan Lin, Vyas Sekar, Alankar Jain, Chen Wang. github.com/fjxmlzn/DoppelGANger 12

  13. Key stumbling block: access to data. Enterprises (Division A, Division B): collaborative opportunities go untapped. Researchers: unreproducible research, limited potential. 13

  14. (Not a new) idea: synthetic data models. Enterprises and researchers share generative models via a data clearinghouse (ISAC, ISAO). 14

  15. Two main problems: Fidelity (generated data vs. real data) and Privacy (business secrets, user data). 15

  16. Existing methods on the privacy vs. fidelity tradeoff: expert-designed parametric models, machine-learned models, anonymized raw data. DoppelGANger: generating synthetic time series data with GANs. 16

  17. What kinds of data are we interested in? Multi-dimensional time series with metadata (e.g., U.S., mobile traffic). 17

  18. Datasets: Networking, security, and systems • Cluster traces • Google: task resource usage logs from 12.5k machines (2011) • IBM: resource usage measurements from 100k containers • Traffic measurements • Wikipedia web traffic: # daily views of Wikipedia articles (2016) • FCC Measuring Broadband America: Internet traffic and performance measurements from consumer devices around the country 18

  19. DoppelGANger: Time series generation [Architecture diagram: noise feeds a metadata generator (MLP) producing (A_1, …, A_m), a min/max generator (MLP) producing (min ± max)/2, and an RNN producing measurements R_1, …, R_S, …, R_{T-s+1}, …, R_T; an auxiliary discriminator and a discriminator each output 1: real / 0: fake] 19

  20. Part I: RNN + batched generation [Figure: unbatched vs. batched generation] 20
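One way to implement batched generation is to let each RNN step emit several consecutive samples instead of one, shortening the unrolled sequence. The sketch below assumes this reading; layer sizes, the GRU cell, and the batch factor s are illustrative choices, not the speaker's exact design.

```python
import torch
import torch.nn as nn

class BatchedRNNGenerator(nn.Module):
    """Each RNN step emits `s` consecutive samples instead of one,
    so a length-T series needs only T/s unrolled steps."""
    def __init__(self, noise_dim=8, hidden=64, s=5):
        super().__init__()
        self.s = s
        self.noise_dim = noise_dim
        self.rnn = nn.GRU(noise_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, s)                 # s samples per step

    def forward(self, batch, T):
        steps = T // self.s
        z = torch.randn(batch, steps, self.noise_dim)   # fresh noise each step
        h, _ = self.rnn(z)                              # (batch, steps, hidden)
        x = self.out(h)                                 # (batch, steps, s)
        return x.reshape(batch, steps * self.s)         # flatten to a length-T series

gen = BatchedRNNGenerator()
series = gen(batch=4, T=50)                             # 4 synthetic series of length 50
```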

  21. Challenge: Training on high-dynamic-range time series [Figure: example time series; x-axis in days] 21

  22. Part II: Auto-normalization • Standard normalization: normalize by the global min/max • DoppelGANger: normalize each time series individually • Store each series’ (min, max) as “fake” metadata 22
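A minimal sketch of per-series auto-normalization as described on the slide: each series is rescaled by its own min/max, and that (min, max) pair is kept as extra "fake" metadata so the scale can be generated and inverted later. The [-1, 1] target range and epsilon are my assumptions.

```python
import numpy as np

def auto_normalize(series):
    """Rescale one time series to [-1, 1] using its own min/max,
    returning (min, max) so it can be attached as 'fake' metadata."""
    lo, hi = series.min(), series.max()
    scaled = 2 * (series - lo) / (hi - lo + 1e-8) - 1
    return scaled, (lo, hi)

def auto_denormalize(scaled, lo, hi):
    """Invert the per-series scaling using the generated (min, max) metadata."""
    return (scaled + 1) / 2 * (hi - lo) + lo

# Each series is normalized by its own range, unlike standard
# normalization, which would use one global min/max for all of them.
data = [np.random.rand(100) * scale for scale in (1, 100, 10000)]
normalized, fake_metadata = zip(*(auto_normalize(x) for x in data))
```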

  23. [Figure-only slide] 23

  24. Challenge: Complex relationships in metadata • Need to capture the relation between metadata and time series • E.g., cable vs. mobile users • Straw man: joint generator of metadata and time series • Problem: too hard for a single generator [Figure: with a single generator, histogram of time series min values (count vs. min value)] 24

  25. Part III: Decoupled Generation, Auxiliary Discriminator • Two stage decoupling • Generate metadata (using a standard MLP) • Generate measurements conditioned on metadata • Auxiliary discriminator for metadata alone 25
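A sketch of the two-stage decoupling plus auxiliary discriminator described above: an MLP generates metadata, an RNN generates measurements conditioned on that metadata, and the metadata also gets its own discriminator. The exact wiring, sizes, and conditioning scheme here are assumptions for illustration.

```python
import torch
import torch.nn as nn

meta_dim, noise_dim, hidden, T = 4, 8, 64, 50

# Stage 1: a standard MLP generates the metadata.
meta_gen = nn.Sequential(nn.Linear(noise_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, meta_dim))

# Stage 2: an RNN generates measurements conditioned on the metadata
# (the metadata is concatenated with the per-step noise).
rnn = nn.GRU(noise_dim + meta_dim, hidden, batch_first=True)
readout = nn.Linear(hidden, 1)

# The main discriminator sees (metadata, measurements) jointly; the
# auxiliary discriminator sees the metadata alone, giving the metadata
# generator its own, undiluted training signal.
disc = nn.Sequential(nn.Linear(meta_dim + T, hidden), nn.ReLU(), nn.Linear(hidden, 1))
aux_disc = nn.Sequential(nn.Linear(meta_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

def generate(batch):
    meta = meta_gen(torch.randn(batch, noise_dim))         # (batch, meta_dim)
    z = torch.randn(batch, T, noise_dim)
    cond = meta.unsqueeze(1).expand(-1, T, -1)              # repeat metadata per step
    h, _ = rnn(torch.cat([z, cond], dim=-1))
    series = readout(h).squeeze(-1)                         # (batch, T)
    return meta, series

meta, series = generate(batch=16)
score = disc(torch.cat([meta, series], dim=-1))             # joint real/fake score
aux_score = aux_disc(meta)                                  # metadata-only score
```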

  26. [Figure: histogram of the min value per time series, without vs. with the auxiliary discriminator (count vs. time series min value)] 26

  27. Putting it together [Architecture diagram as on slide 19: metadata generator (MLP), min/max generator (MLP), RNN measurement generator, auxiliary discriminator, and discriminator] 27

  28. Temporal Correlations Microbenchmark 28

  29. Downstream task: predicting job failures in a compute cluster • Train on synthetic, test on real 29

  30. Evaluating privacy • Protecting business secrets • Aggregate functions of the data • User privacy • Differential privacy • Robustness against membership inference 30

  31. Differentially-private SGD kills fidelity in GANs 31
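For context on why DP-SGD hurts fidelity: every update is built from per-example gradients that are clipped and then perturbed with Gaussian noise. The sketch below is a generic, conceptual version of that step (in GAN training it would typically be applied to the discriminator); the model, loss, and hyperparameters are placeholders, not the talk's setup.

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, lr=0.01,
                clip_norm=1.0, noise_mult=1.1):
    """One DP-SGD step: clip each example's gradient, add Gaussian noise.
    Both operations distort the updates, which is one reason fidelity drops."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(batch_x, batch_y):                 # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in params]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-8), max=1.0)   # clip to clip_norm
        for s, g in zip(summed, grads):
            s += g * scale

    n = len(batch_x)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.normal(0.0, noise_mult * clip_norm, size=p.shape)
            p -= lr * (s + noise) / n                  # noisy, averaged update

# Example usage with a tiny stand-in model (illustrative only):
model = torch.nn.Linear(10, 1)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(8, 10), torch.randn(8, 1)
dp_sgd_step(model, loss_fn, x, y)
```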

  32. Open questions: Synthetic data generation • Fidelity • Long sequences of data • Stateful protocols • Privacy • Differentially-private GANs • New privacy metrics? 32

  33. Use case 2: Identifying malicious inputs to black-box systems. Zinan Lin, Hao Liang, Vyas Sekar 33

  34. Black-box devices and systems abound: IoT devices, control units in vehicles / manufacturing, servers / routers. NO source code / binary / protocol format / design doc. (Towards Oblivious Network Analysis using GANs, HotNets ’19, 11/14/2019) 34

  35. Identifying attack packets is hard. The attacker sends packets; we want to identify attack packets, but do NOT have the source code or a system description. 35

  36. Motivating example: packet classification • Vamanan et al [SIGCOMM 2010] • Singh et al [SIGCOMM 2013] • Yingchareonthawornchai et al [TON 2018] • Liang et al [SIGCOMM 2019] • Rashelbach et al [SIGCOMM 2020] • Many more… Can an attacker identify many packets with high classification times? [Figure: distribution of classification time] 36

  37. Random packet generation • NeuroCuts, Liang et al [SIGCOMM 2019] • Can we generate many, diverse slow packets? [Figure: histogram of 2,000 random packets, number of packets vs. classification time (ms); a threshold separates fast packets from slow packets] 37

  38. Common approaches • Fuzzing tools • Random sampling • Optimization of black-box functions • Bayesian optimization • Genetic algorithms • Simulated annealing GANs can help! 38

  39. Approach 1: Vanilla GAN • Challenge: too little training data [Diagram: random packets → classification decision tree → “fast” packets / “slow” packets (1%); the “slow” packets form the GAN’s training dataset] 39

  40. AmpGAN: Training with feedback [Diagram: random packets → classification decision tree → “fast” / “slow” packets → training dataset; AmpGAN generates packets with condition=“slow”, which are classified and fed back into the training dataset] 40
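A schematic sketch of the feedback loop above: a conditional generative model is trained on labeled packets, asked for "slow" samples, and the black box's labels on those samples are folded back into the training set. The class names, threshold, and stub components here are hypothetical stand-ins, not the actual AmpGAN implementation.

```python
import random

# Hypothetical stand-ins for the real components: the black-box classifier
# (we can only observe its classification time) and a conditional GAN.
SLOW_THRESHOLD_MS = 1.0

def random_packet():
    return [random.random() for _ in range(16)]

def classification_time(packet):
    """Placeholder for measuring the black box on one packet."""
    return random.random() * 2.0          # pretend time in ms

class ConditionalGenerator:
    """Placeholder for a GAN conditioned on the fast/slow label."""
    def fit(self, labeled_packets):
        pass
    def sample(self, condition, n):
        return [random_packet() for _ in range(n)]

def feedback_loop(rounds=5, per_round=1000):
    gen = ConditionalGenerator()
    dataset = [(p, classification_time(p) > SLOW_THRESHOLD_MS)
               for p in (random_packet() for _ in range(per_round))]
    for _ in range(rounds):
        gen.fit(dataset)                                   # train on all labeled packets
        candidates = gen.sample(condition="slow", n=per_round)
        # Feedback: label generated packets with the black box and fold them
        # back into the training set, amplifying the rare "slow" class.
        dataset += [(p, classification_time(p) > SLOW_THRESHOLD_MS)
                    for p in candidates]
    return [p for p, is_slow in dataset if is_slow]

slow_packets = feedback_loop()
```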

  41. Results [Figure: number of random packets and number of AmpGAN packets vs. classification time (ms), with the slow-packet threshold marked] 41

  42. Results [Figure: fraction of “slow” packets vs. system calls for AmpGAN, genetic algorithms, simulated annealing, generalized SA, and Bayesian optimization; AmpGAN shows 2.5x and 10x jumps over the baselines] 42

  43. Open questions • Sequences of inputs • Can we use this to optimize systems as well as to find attacks? • E.g., CherryPick [NSDI 2017] 43

  44. Use case 3: Extracting insights from unstructured data. Kiran Thekumparampil, Sewoong Oh, Zinan Lin. github.com/fjxmlzn/InfoGAN-CR 44

  45. Disentangled GANs • The generator maps input noise to k factors of variation (e.g., hair color, rotation, background, bangs) • How do the latent dimensions z_i control the factors? • Vanilla GANs: each factor depends on many z_i’s; disentangled GANs: each latent code c_i controls a single factor • Code & Paper: https://github.com/fjxmlzn/InfoGAN-CR 45
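A minimal InfoGAN-style sketch of the latent-code idea that InfoGAN-CR builds on: the generator consumes unstructured noise z plus codes c, and an auxiliary network tries to recover c from the output, tying each code to a factor. The sizes, MSE recovery loss, and omission of the contrastive regularizer are simplifying assumptions.

```python
import torch
import torch.nn as nn

noise_dim, code_dim, x_dim = 16, 4, 64      # assumed sizes

# Generator takes both unstructured noise z and latent codes c; the goal of
# disentanglement is for each c_i to control one factor of variation.
G = nn.Sequential(nn.Linear(noise_dim + code_dim, 128), nn.ReLU(),
                  nn.Linear(128, x_dim))

# Auxiliary network Q tries to recover the codes from the generated sample;
# making this recovery easy ties each code to a visible property of the output.
Q = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, code_dim))

def info_loss(batch=32):
    z = torch.randn(batch, noise_dim)
    c = torch.rand(batch, code_dim) * 2 - 1        # continuous codes in [-1, 1]
    x = G(torch.cat([z, c], dim=-1))
    return nn.functional.mse_loss(Q(x), c)         # encourage Q(G(z, c)) ≈ c

loss = info_loss()   # added to the usual GAN losses during training
```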
