Embe mbedding dding as a To Tool ol for Al Algorithm orithm De - PowerPoint PPT Presentation

Embe mbedding dding as a To Tool ol for Al Algorithm orithm De Design sign Le Song College of Computing Center for Machine Learning Georgia Institute of Technology

What is machine learning (ML) Design algorithms and systems that can improve their performance with data The best The best design pattern design pattern for big data? for big data? Embedding Embedding structures structures 2

Ex 1: Prediction for structured data Drug/materials effective/ineffective? Information spread viral/non-viral? code graphs benign/ malicious? Natural language positive/negative? 3

Big dataset, explosive feature space 2.3 million organic materials Structure Level 1 Level 2 elements … Feature … … … … 4 1 1 0 0 0 2 1 vector metho hod dimen mensio ion MAE Reduce model Reduce model Predict size by size by Level 6 1.3 billion 0.096 10,000 times! 10,000 times! Efficiency (PCE) (0 -12 %) Embedding 0.1 million 0.085 4

Ex 2: Social information network modeling who and when will do what? Jacob ob David vid Alice ce Christine ine 5

Complex behavior not well modeled ~2 million internet TV views 7,100 385 users programs Deal with Deal with Predict future Predict future no data? no data? event? event? No difference? No difference? Book Alice 𝑈 time How long? How long? 𝑢 1 𝑢 3 𝑢 2 Shoe David Epoch 1 Epoch 2 Epoch 3 Reduce error Reduce error by 5 folds! by 5 folds! Hours 100 ≈ + … user 10 1 time tensor factorization Return Time (MAE) 6

Ex 3: Combinatorial optimizations over graphs App pplicati lication on Optimiz imizat ation ion Problem oblem Influence maximization Minimum vertex/set cover Community discovery Maximum cut Resource scheduling Traveling salesman NP-hard problems 7

Simple heuristics do not exploit data Decision not 2 2 - app pproxim oximat ation ion for or minim nimum um vertex ex cover ver data-driven. Repeat till all edges covered: Can we learn 1. Select uncovered edge with largest total degree from data? 1.3 Learn to be Learn to be 1.2 near optimal! near optimal! 1.1 1 approximation ratio 8

Fundamental problems Structure attribute/ attribute/ 𝑌 5 𝑌 5 𝑌 6 𝑌 6 𝜓 raw info. raw info. 𝑌 1 𝑌 1 𝑌 4 𝑌 4 𝑌 2 𝑌 2 𝑌 3 𝑌 3 How to describe node? How to describe entire structure? How to incorporate various info.? How to do it efficiently? 9

Represent structure as latent variable model (LVM) Structure LVM Continuous Continuous 𝑌 5 𝑌 5 𝑌 6 𝑌 6 𝜓 𝐻 = (𝒲, ℇ) Latent Latent 𝑌 6 𝑌 6 𝑌 5 𝑌 5 Ψ 𝑓 𝐼 𝑗 , 𝐼 Ψ 𝑓 𝐼 𝑗 , 𝐼 𝑘 𝑘 Represent Represent 𝑌 1 𝑌 1 𝑌 4 𝑌 4 Ψ 𝑤 𝐼 𝑗 , 𝑌 𝑗 Ψ 𝑤 𝐼 𝑗 , 𝑌 𝑗 𝑌 4 𝑌 4 𝑌 1 𝑌 1 𝑌 2 𝑌 2 𝑌 3 𝑌 3 Categorical / Categorical / 𝑌 2 𝑌 2 𝑌 3 𝑌 3 Continuous/ Continuous/ Raw features Raw features Joint likelihood 𝑞 𝐼 𝑗 , 𝑌 𝑗 ∝ Ψ 𝑤 𝐼 𝑗 , 𝑌 𝑗 |𝜄 𝑤 Ψ 𝑓 (𝐼 𝑗 , 𝐼 𝑘 |𝜄 𝑓 ) 𝑗∈𝒲 𝑗,𝑘 ∈ℇ Nonnegative Nonnegative node potential edge potential 10 [Dai, Dai & Song 2016]

Posterior distribution as features 𝑞 𝐼 1 𝑦 𝑘 Features of nodes LVM 𝐻 = (𝒲, ℇ) 𝜈 1 (𝜓, 𝑋) 𝑌 6 𝑌 6 𝑌 5 𝑌 5 + posterior 𝑞 𝐼 2 𝑦 𝑘 𝜈 2 (𝜓, 𝑋) + 𝑌 4 𝑌 4 𝑌 1 𝑌 1 ⋮ 𝑌 2 𝑌 2 𝑌 3 𝑌 3 = 𝜈 𝑏 (𝜓, 𝑋) 𝑞 𝐼 𝑘 , 𝑦 𝑘 𝑏𝑚𝑚 𝐼 𝑘 𝑓𝑦𝑑𝑓𝑞𝑢 𝐼 𝑗 Features of the 𝑞 𝐼 𝑗 𝑦 𝑘 = 𝑞 𝑦 𝑘 entire structure Capture both nodal and topological info. Aggregate information from distant nodes 11 [Dai, Dai & Song 2016]

Mean field algorithm aggregates information 𝑟 5 (𝐼 5 ) 𝑟 6 (𝐼 6 ) 𝑟 1 (𝐼 1 ) Approximate posterior 𝑞 𝐼 𝑗 𝑦 𝑘 ≈ 𝑟 𝑗 (𝐼 𝑗 ) 𝑌 6 𝑌 6 𝑌 5 𝑌 5 Ψ 𝑓 𝐼 𝑗 , 𝐼 Ψ 𝑓 𝐼 𝑗 , 𝐼 𝑘 𝑘 Ψ 𝑤 𝐼 𝑗 , 𝑌 𝑗 Ψ 𝑤 𝐼 𝑗 , 𝑌 𝑗 𝑟 2 (𝐼 2 ) via fixed point update 𝑌 4 𝑌 4 𝑌 1 𝑌 1 1. Initialize 𝑟 𝑗 𝐼 𝑗 , ∀ 𝑗 𝑌 2 𝑌 2 𝑌 3 𝑌 3 2. Iterate many times 𝑟 𝑗 𝐼 𝑗 ← Ψ 𝑤 𝐼 𝑗 , 𝑌 𝑗 ⋅ exp 𝑟 𝑘 𝐼 𝑘 log Ψ 𝑓 𝐼 𝑗 , 𝐼 𝑒𝐼 , ∀ 𝑗 𝑘 𝑘 𝓘 𝑘∈𝓞 𝑗 𝓤 ∘ 𝑌 𝑗 , 𝑟 𝑘 (𝐼 𝓤 ∘ 𝑌 𝑗 , 𝑟 𝑘 (𝐼 𝑘 ) 𝑘∈𝒪 𝑗 𝑘 ) 𝑘∈𝒪 𝑗 [Song et al. 11a,b] 12 [Song et al. 10a,b]

Embedding of distribution 𝑌 Density 𝑞(𝑌) Feature 𝑌 2 𝔽 𝑞 𝜚 𝑌 𝜚 𝑌 = space 𝑌 3 space ⋮ 𝜈 𝑌 Mean, Variance, 𝑟(𝑌) higher 𝔽 𝑟 𝜚 𝑌 order ′ 𝜈 𝑌 moment ⋮ Injective for rich nonlinear feature 𝜚(𝑦) 𝜈 𝑌 is a sufficient statistic of 𝑞(𝑌) Operator View ∘ 𝜈 𝑌 𝓤 ∘ 𝑞 𝑦 = 𝓤 13 [Smola, Gretton, Song & Scholkopf. 2007]

Structure2vec (S2V): embedding mean field (0) 𝜈 6 (0) 𝜈 5 Approximate embedding of (0) 𝜈 1 (0) 𝜈 4 𝑞 𝐼 𝑗 𝑦 𝑘 ↦ 𝜈 𝑗 𝑌 6 𝑌 6 𝑌 5 𝑌 5 (0) 𝜈 2 via fixed point update 𝑌 4 𝑌 4 𝑌 1 𝑌 1 1. Initialize 𝜈 𝑗 , ∀ 𝑗 (0) 𝜈 3 𝑌 2 𝑌 2 𝑌 3 𝑌 3 2. Iterate many times ∘ 𝑌 𝑗 , 𝜈 𝑘 𝑘∈𝒪 𝑗 𝜈 𝑗 ← 𝓤 ∘ 𝑌 𝑗 , 𝜈 𝑘 𝑘∈𝒪 𝑗 𝜈 𝑗 ← 𝓤 , ∀ 𝑗 , ∀ 𝑗 14

Structure2vec (S2V): embedding mean field (1) 𝜈 6 (1) 𝜈 5 Approximate embedding of (1) 𝜈 1 (1) 𝜈 4 𝑞 𝐼 𝑗 𝑦 𝑘 ↦ 𝜈 𝑗 𝑌 6 𝑌 6 𝑌 5 𝑌 5 (1) 𝜈 2 via fixed point update 𝑌 4 𝑌 4 𝑌 1 𝑌 1 1. Initialize 𝜈 𝑗 , ∀ 𝑗 (1) 𝜈 3 𝑌 2 𝑌 2 𝑌 3 𝑌 3 2. Iterate many times ∘ 𝑌 𝑗 , 𝜈 𝑘 𝑘∈𝒪 𝑗 𝜈 𝑗 ← 𝓤 ∘ 𝑌 𝑗 , 𝜈 𝑘 𝑘∈𝒪 𝑗 𝜈 𝑗 ← 𝓤 , ∀ 𝑗 , ∀ 𝑗 15

Structure2vec (S2V): embedding mean field (2) 𝜈 6 (2) 𝜈 5 Approximate embedding of (2) 𝜈 1 (2) 𝜈 4 𝑞 𝐼 𝑗 𝑦 𝑘 ↦ 𝜈 𝑗 𝑌 6 𝑌 6 𝑌 5 𝑌 5 (2) 𝜈 2 via fixed point update 𝑌 4 𝑌 4 𝑌 1 𝑌 1 1. Initialize 𝜈 𝑗 , ∀ 𝑗 (2) 𝜈 3 𝑌 2 𝑌 2 𝑌 3 𝑌 3 2. Iterate many times ∘ 𝑌 𝑗 , 𝜈 𝑘 𝑘∈𝒪 𝑗 𝜈 𝑗 ← 𝓤 ∘ 𝑌 𝑗 , 𝜈 𝑘 𝑘∈𝒪 𝑗 𝜈 𝑗 ← 𝓤 , ∀ 𝑗 , ∀ 𝑗 ? ? How to parametrize 𝓤 How to parametrize 𝓤 Depends on unknown Ψ 𝑤 𝐼 𝑗 , 𝑌 𝑗 and Ψ 𝑓 𝐼 𝑗 , 𝐼 Depends on unknown Ψ 𝑤 𝐼 𝑗 , 𝑌 𝑗 and Ψ 𝑓 𝐼 𝑗 , 𝐼 𝑘 𝑘 16

Directly parameterize nonlinear mapping ∘ 𝑌 𝑗 , 𝜈 𝑘 𝑘∈𝒪 𝑗 𝜈 𝑗 ← 𝓤 Any universal nonlinear function will do Eg. assume 𝜈 𝑗 ∈ 𝓢 𝑒 , 𝑌 𝑗 ∈ 𝓢 𝑜 , neural network parameterization 𝜈 𝑗 ← 𝜏 𝑋 1 𝑌 𝑗 + 𝑋 2 𝜈 𝑘 𝑘∈𝒪 𝑗 max 0,⋅ tanh (⋅) 𝑒 × 𝑜 𝑒 × 𝑒 sigmoid (⋅) matrix matrix Learn with supervision, unsupervised Learn with supervision, unsupervised learning, or reinforcement learning learning, or reinforcement learning 17

Embedding belief propagation Approximate 𝑞 𝐼 𝑗 𝑦 𝑘 , 𝜄 as 𝐻 = (𝒲, ℇ) 𝑟 𝑗 𝐼 𝑗 = Ψ 𝑤 𝐼 𝑗 , 𝑦 𝑗 |𝜄 ⋅ 𝑌 6 𝑌 6 𝑌 5 𝑌 5 𝑛 𝑘𝑗 (𝐼 𝑗 ) Ψ 𝑓 𝐼 𝑗 , 𝐼 Ψ 𝑓 𝐼 𝑗 , 𝐼 Ψ 𝑤 𝐼 𝑗 , 𝑌 𝑗 Ψ 𝑤 𝐼 𝑗 , 𝑌 𝑗 𝑘 𝑘 𝑘∈𝓞 𝑗 𝑌 1 𝑌 1 𝑌 4 𝑌 4 with iterative messages updates 𝓤′ ∘ 𝑌 𝑗 , 𝑛 ℓ𝑗 (𝐼 𝑗 ) ℓ∈𝒪 𝑗 𝓤′ ∘ 𝑌 𝑗 , 𝑛 ℓ𝑗 (𝐼 𝑗 ) ℓ∈𝒪 𝑗 1. Initialize 𝑛 𝑗𝑘 𝐼 𝑘 , ∀𝑗, 𝑘 𝑌 2 𝑌 2 𝑌 3 𝑌 3 2. Iterate many times 𝑛 𝑗𝑘 𝐼 𝑘 ← Ψ 𝑤 (𝐼 𝑗 , 𝑌 𝑗 |𝜄)Ψ 𝑓 𝐼 𝑗 , 𝐼 𝑘 |𝜄 ⋅ 𝑛 ℓ𝑗 𝐼 𝑗 𝑒𝐼 𝑗 , ∀𝑗, 𝑘 𝓘 ℓ∈𝒪 𝑗 \𝑘 𝓤 ∘ 𝑌 𝑗 , 𝑛 ℓ𝑗 (𝐼 𝑗 ) ℓ∈𝒪 𝑗 \𝑘 𝓤 ∘ 𝑌 𝑗 , 𝑛 ℓ𝑗 (𝐼 𝑗 ) ℓ∈𝒪 𝑗 \𝑘 [Song et al. 11a,b] [Song et al. 10a,b] 18

Ex 1: Prediction for structured data Drug/materials effective/ineffective? Information spread viral/non-viral? code graphs benign/ malicious? Natural language positive/negative? 19

Algorithm learning Given 𝑛 data points 𝜓 1 , 𝜓 2 , … , 𝜓 𝑛 And their labels 𝑧 1 , 𝑧 2 , … , 𝑧 𝑛 Estimate parameters 𝑋 and 𝑊 via 𝑛 𝑊,𝑋 𝑀 𝑊, 𝑋 : = 𝑧 𝑗 − 𝑊 ⊤ 𝜈 𝑏 (𝑋, 𝜓 𝑗 ) 2 min 𝑗=1 Comp omputation tion Ope perati ation on Similar ar to to Objective A sequence of nonlinear Graphical model inference 𝑀 𝑊, 𝑋 mappings over graph Gradient Back propagation in deep Chain rule of derivatives in 𝜖𝑀 learning 𝜖𝑋 reverse order 20

10,000x smaller model but accurate prediction Harvard clean energy project: predict material efficiency (0-12) 2.3 million organic molecules 90% for training, 10% data for testing Test t MAE Test RMSE MSE # pa parame amete ters Mean predictor 1.986 2.406 1 WL level-3 0.143 0.204 1.6 m WL level-6 0.096 0.137 1.3 b S2V-MF 0.091 0.125 0.1 m S2V-BP 0.085 0.117 0.1 m ~4% relative error ~4% relative error 21

Ex 2: Social information network modeling who and when will do what? Jacob ob David vid Alice ce Christine ine 22

Embe mbedding dding as a To Tool ol for Al Algorithm orithm De - PowerPoint PPT Presentation

Embe mbedding dding as a To Tool ol for Al Algorithm orithm De Design sign Le Song College of Computing Center for Machine Learning Georgia Institute of Technology What is machine learning (ML) Design algorithms and systems that can

Pe onie s a re in de ma nd for we dding s a nd e ve nts Ric ha rd Uva , PhD Se a b e rry F a

Ve nue : Upsc ale we dding ve nue or Booth Pac kage s inc lude Pr int lar ge r ve nue

SynAthina Onli line Tools 1. . A mapping tool 2. A Community Tool 3. An Archive Tool 3. An

A A La Carte Emb mbedding: Ch Cheap but Effective Induction on of of Se Semantic Feature

PolyViNE: Pol ic y based Vi rtual N etwork E mbedding Across Multiple Domains Presented by Fady

Da hlia is in de ma nd for we dding s a nd e ve nts Ric ha rd Uva , PhD Se a b e rry F a rm

A dding V S T S upport to L inux A udio A pplications Paul Davis Lin ux Audio System

Odds Algorithm An Online Algorithm Group Fibonado 20. Dec 2016 Group Fibonado Odds Algorithm

PT Lippo po Ka Karawaci ci Tbk 9M19 9 Resul sults ts Presentation sentation Novem embe

Ho How w to o Bui Build & Secur cure a RISC-V Em Embe bedde ded System HARDWEAR.IO,

Citizens Committee Nov ovem embe ber r 19, 2012 Introduction Largest petroleum

2 Q 20 2019 19/20 BTS GROUP EAR ARNIN INGS PR PRESENTATIO ION 19 19 Nov ovem embe ber r

PEM EMBE BERTON ON TWP SCHO HOOLS OLS TENTATIVE 2019-2020 BUDGET ADOPTION FOR SUBMISSION TO

Proposed Riverfront Development District NOV EMBE R 9, 2 017 Pendleton Riverfront Development

So Southeas east t Mid iddle le Vi Virt rtual al Op Open en Hou ouse Se Septem embe

Black Box Scanning Tool + White Box Testing Tool Toshis Black Box Scanning Tool Same

BeyOND Unleashing BOND Thomas Bernecker, Franz Graf, Hans-Peter Kriegel, , , g , Christian

Definition of Stochastic Processes Definition of Stochastic Processes st Order Density

Security proofs for continuous-variable quantum key distribution Anthony Leverrier Inria Paris

Lecture 13 : The Exponential Distribution 0/ 19 Definition A continuous random variable X is

Geometric losses for distributional learning Arthur Mensch (1) , Mathieu Blondel (2) , Gabriel

Key Management and Distribution Symmetric with Asymmetric Public Keys CSS322: Security and

The Climate-G testbed: issues, requirements and results S. Fiore, Ph.D. SPACI and University of

GEO-Software R. Prez, G. Gonzlez, J. Becedas, F. Pedrera, M. J. Latorre Jonathan Becedas, PhD

Embe mbedding dding as a To Tool ol for Al Algorithm orithm De - PowerPoint PPT Presentation

Embe mbedding dding as a To Tool ol for Al Algorithm orithm De Design sign Le Song College of Computing Center for Machine Learning Georgia Institute of Technology What is machine learning (ML) Design algorithms and systems that can

Pe onie s a re in de ma nd for we dding s a nd e ve nts Ric ha rd Uva , PhD Se a b e rry F a

Ve nue : Upsc ale we dding ve nue or Booth Pac kage s inc lude Pr int lar ge r ve nue

SynAthina Onli line Tools 1. . A mapping tool 2. A Community Tool 3. An Archive Tool 3. An

A A La Carte Emb mbedding: Ch Cheap but Effective Induction on of of Se Semantic Feature

PolyViNE: Pol ic y based Vi rtual N etwork E mbedding Across Multiple Domains Presented by Fady

Da hlia is in de ma nd for we dding s a nd e ve nts Ric ha rd Uva , PhD Se a b e rry F a rm

A dding V S T S upport to L inux A udio A pplications Paul Davis Lin ux Audio System

Odds Algorithm An Online Algorithm Group Fibonado 20. Dec 2016 Group Fibonado Odds Algorithm

PT Lippo po Ka Karawaci ci Tbk 9M19 9 Resul sults ts Presentation sentation Novem embe

Ho How w to o Bui Build &amp; Secur cure a RISC-V Em Embe bedde ded System HARDWEAR.IO,

Citizens Committee Nov ovem embe ber r 19, 2012 Introduction Largest petroleum

2 Q 20 2019 19/20 BTS GROUP EAR ARNIN INGS PR PRESENTATIO ION 19 19 Nov ovem embe ber r

PEM EMBE BERTON ON TWP SCHO HOOLS OLS TENTATIVE 2019-2020 BUDGET ADOPTION FOR SUBMISSION TO

Proposed Riverfront Development District NOV EMBE R 9, 2 017 Pendleton Riverfront Development

So Southeas east t Mid iddle le Vi Virt rtual al Op Open en Hou ouse Se Septem embe

Black Box Scanning Tool + White Box Testing Tool Toshis Black Box Scanning Tool Same

BeyOND Unleashing BOND Thomas Bernecker, Franz Graf, Hans-Peter Kriegel, , , g , Christian

Definition of Stochastic Processes Definition of Stochastic Processes st Order Density

Security proofs for continuous-variable quantum key distribution Anthony Leverrier Inria Paris

Lecture 13 : The Exponential Distribution 0/ 19 Definition A continuous random variable X is

Geometric losses for distributional learning Arthur Mensch (1) , Mathieu Blondel (2) , Gabriel

Key Management and Distribution Symmetric with Asymmetric Public Keys CSS322: Security and

The Climate-G testbed: issues, requirements and results S. Fiore, Ph.D. SPACI and University of

GEO-Software R. Prez, G. Gonzlez, J. Becedas, F. Pedrera, M. J. Latorre Jonathan Becedas, PhD

Ho How w to o Bui Build & Secur cure a RISC-V Em Embe bedde ded System HARDWEAR.IO,