SLIDE 1

Computational and Statistical Learning Theory

TTIC 31120

  • Prof. Nati Srebro

Lecture 7:

Computational Complexity of Learning: Hardness of Improper Learning (continued); Agnostic Learning

SLIDE 2

Hardness of Learning via Crypto

$(K, x) \mapsto f_K(x)$ easy

$(K, y) \mapsto f_K^{-1}(y)$ very hard: no poly-time algorithm succeeds for a non-negligible fraction of $(K, y)$
⇒ hard to learn $\mathcal{H} = \{\, h_K : (y, i) \mapsto [f_K^{-1}(y)]_i \,\}$

$(D_K, y) \mapsto f_K^{-1}(y)$ easy (e.g. poly-time) given the decryption key $D_K$
⇒ hard to learn poly-time functions

Easy to generate random key pairs $(K, D_K)$

SLIDE 3

Hardness of Learning via Crypto

  • Assumption (Discrete Cube Root): there is no poly-time algorithm computing $\sqrt[3]{y} \bmod K$ that works for a non-negligible fraction of $(y, K)$, where $K = pq$ for primes $p, q$ with $3 \nmid (p-1)(q-1)$

$(K, x) \mapsto x^3 \bmod K$ easy

$(K, y) \mapsto \sqrt[3]{y} \bmod K$ very hard: no poly-time algorithm succeeds for a non-negligible fraction of $(K, y)$
⇒ hard to learn $\mathcal{H} = \{\, h_K : (y, i) \mapsto [\sqrt[3]{y} \bmod K]_i \,\}$

$(D_K, y) \mapsto y^{D_K} \bmod K = \sqrt[3]{y} \bmod K$ easy (poly-time) given the secret exponent $D_K$
⇒ hard to learn poly-time functions

$\forall_K\ h_K \in \mathcal{H}$ ⇒ hard to learn $\mathcal{H}$

$y \mapsto y^{D_K} \bmod K$ is computable by a log-depth logic circuit, and hence by a log-depth neural net
⇒ hard to learn log-depth circuits; hard to learn log-depth NNs

SLIDE 4

Hardness of Learning via Crypto

  • Public-key crypto is possible
⇒ hard to learn poly-time functions

  • Hardness of Discrete Cube Root
⇒ hard to learn log(n)-depth logic circuits
⇒ hard to learn log(n)-depth poly-size neural networks

  • Hardness of breaking RSA
⇒ hard to learn poly-length logical formulas
⇒ hard to learn poly-size automata (equivalently, regexps)
⇒ hard to learn push-down automata
⇒ for some depth d, hard to learn poly-size depth-d threshold circuits

(a threshold unit outputs one iff the number of its input units that are one exceeds its threshold)

  • Hardness of lattice-shortest-vector based cryptography
⇒ hard to learn intersections of $n^{\epsilon}$ halfspaces (for any $\epsilon > 0$)

Michael Kearns
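To make the trapdoor structure concrete, here is a minimal Python sketch of the discrete cube root family with toy primes. The helper names are mine, and the formula for the secret exponent ($D_K = 3^{-1} \bmod \phi(K)$) is a standard RSA fact that the slides do not spell out: cubing mod $K$ is easy for anyone, inverting is easy given the exponent, and inverting from $(K, y)$ alone is assumed hard at cryptographic sizes.

```python
# A toy instance of the discrete cube root trapdoor function.
# Toy primes with 3 coprime to (p-1)(q-1), so cubing mod K is a bijection;
# real instances use primes hundreds of digits long.
p, q = 11, 29                 # (p-1)(q-1) = 280, and 3 does not divide 280
K = p * q
phi = (p - 1) * (q - 1)
d_K = pow(3, -1, phi)         # secret exponent: inverse of 3 mod phi(K)
                              # (modular-inverse pow needs Python 3.8+)

def cube(x, K):
    """Forward direction (K, x) -> x^3 mod K: easy given only the key K."""
    return pow(x, 3, K)

def cube_root(y, K, d_K):
    """Inverse direction: easy with the trapdoor d_K, assumed hard without."""
    return pow(y, d_K, K)

x = 42
assert cube_root(cube(x, K), K, d_K) == x
```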

SLIDE 5

Intersections of Halfspaces

β„‹π‘œ

𝑙(π‘œ) =

𝑦 β†¦βˆ§π‘—=1

𝑙 π‘œ

π‘₯𝑗, 𝑦 > 0 | π‘₯1, … , π‘₯𝑙 π‘œ ∈ β„π‘œ 𝑃 π‘œ1.5 βˆ’ π‘£π‘‡π‘Šπ‘„ βˆ‰ 𝑆𝑄 ⇓ Lattice-based cryptosystem is secure ⇓ For any 𝑠 > 0, hard to learn πΌπ‘œ

𝑙 π‘œ =π‘œπ‘ 

⇓ Hard to learn 2-layer NN with π‘œπ‘  hidden units

The unique shortest lattice vector problem:

  • SVP 𝑀1, 𝑀2, … , π‘€π‘œ ∈ β„π‘œ = arg min𝑏1,𝑏2,…,π‘π‘œβˆˆβ„€ 𝑏1𝑀1 + 𝑏2𝑀2 + β‹― + π‘π‘œπ‘€π‘œ
  • 𝑃 π‘œ1.5 βˆ’ π‘£π‘‡π‘Šπ‘„: only required to return SVP if next-shortest is

𝑃 π‘œ1.5 times longer Sasha Sherstov Adam Klivans

SLIDE 6

Hardness of Learning via Crypto

$(K, x) \mapsto f_K(x)$ easy

$(K, y) \mapsto f_K^{-1}(y)$ very hard: no poly-time algorithm succeeds for a non-negligible fraction of $(K, y)$
⇒ hard to learn $\mathcal{H} = \{\, h_K : (y, i) \mapsto [f_K^{-1}(y)]_i \,\}$

$(D_K, y) \mapsto f_K^{-1}(y)$ easy (e.g. poly-time) given the decryption key $D_K$
⇒ hard to learn poly-time functions

Easy to generate random key pairs $(K, D_K)$

SLIDE 7

Hardness of Learning via Crypto

$(K, x) \mapsto f_K(x)$ easy

$(K, y) \mapsto f_K^{-1}(y)$ very hard: no poly-time algorithm succeeds for a non-negligible fraction of $(K, y)$
⇒ hard to learn $\mathcal{H} = \{\, h_K : (y, i) \mapsto [f_K^{-1}(y)]_i \,\}$

$(D_K, y) \mapsto f_K^{-1}(y)$ easy (e.g. poly-time) given the decryption key $D_K$
⇒ hard to learn poly-time functions

Easy to generate random key pairs $(K, D_K)$

No poly-time algorithm for all $K$ and almost all $y$

SLIDE 8

Hardness of Learning: Take II

  • Recall how we proved hardness of proper learning:
  • Reduction from deciding consistency with $\mathcal{H}$
  • If we had an efficient proper learner, we could train it and find a consistent hypothesis in $\mathcal{H}$ whenever one exists

  • Problem: if learning is not proper, the learner might return a good hypothesis not in $\mathcal{H}$, even though the sample is not consistent with $\mathcal{H}$

  • Instead: reduction from deciding between two possibilities:
  • The sample is consistent with $\mathcal{H}$
  • For every consistent sample, return 1 w.p. $\ge 3/4$ (over the randomization in the algorithm)
  • The sample comes from a random "unpredictable" distribution
  • E.g. sampled such that the labels $y$ are independent of the instances $x$
  • For all but a negligible fraction of samples $S \sim D^m$, return 0 w.p. $\ge 3/4$

Amit Daniely

SLIDE 9

Hardness Relative to RSAT

  • RSAT assumption: for some $f(K) = \omega(1)$, there is no poly-time randomized algorithm that gets as input a K-SAT formula over $n$ variables with $n^{f(K)}$ constraints, and:
  • If the input is satisfiable, then w.p. $\ge 3/4$ (over the randomization in the algorithm) it outputs 1
  • If each constraint is generated independently and uniformly at random, then with probability approaching 1 (as $n \to \infty$) over the formula, w.p. $\ge 3/4$ (over the randomization in the algorithm) it outputs 0

  • Theorem: Under the RSAT assumption,
  • Poly-length DNFs are not efficiently PAC learnable
(e.g. $h(x) = (x_1 \wedge x_7 \wedge x_{15} \wedge x_{17}) \vee (x_2 \wedge x_{24}) \vee \cdots$)
  • Intersections of $\omega(\log n)$ halfspaces are not efficiently PAC learnable
⇒ 2-layer neural networks with $O(\log^{1.1} n)$ hidden units are not efficiently PAC learnable

Amit Daniely
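The two input distributions in the RSAT assumption are easy to sample; below is a minimal sketch. Planting a hidden assignment is one standard way to produce the satisfiable side; that detail, and all names here, are my own illustration rather than anything stated on the slide.

```python
import random

random.seed(0)

def random_clause(n, K):
    """K distinct variables, each negated with probability 1/2."""
    return [v if random.random() < 0.5 else -v
            for v in random.sample(range(1, n + 1), K)]

def uniform_formula(n, K, m):
    """m constraints drawn independently and uniformly at random:
    the side on which the algorithm must output 0."""
    return [random_clause(n, K) for _ in range(m)]

def planted_formula(n, K, m):
    """Satisfiable by construction: keep only clauses satisfied by a
    hidden planted assignment (the side on which it must output 1)."""
    a = {v: random.random() < 0.5 for v in range(1, n + 1)}
    formula = []
    while len(formula) < m:
        c = random_clause(n, K)
        if any((lit > 0) == a[abs(lit)] for lit in c):
            formula.append(c)
    return formula

n, K = 100, 3
m = n ** 2            # playing the role of n^{f(K)} constraints
F0 = uniform_formula(n, K, m)
F1 = planted_formula(n, K, m)
```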

SLIDE 10

Hardness of Learning

  • Axis-aligned rectangles in $n$ dimensions: efficiently properly learnable
  • Halfspaces in $n$ dimensions: efficiently properly learnable
  • Conjunctions on $n$ variables: efficiently properly learnable
  • 3-term DNFs: efficiently learnable, but not properly
  • DNF formulas of size poly(n): not efficiently learnable
  • Generic logical formulas of size poly(n): not efficiently learnable
  • Neural nets with at most poly(n) units: not efficiently learnable
  • Functions computable in poly(n) time: not efficiently learnable

SLIDE 11

Realizable vs Agnostic

  • Definition: A family $\mathcal{H}_n$ of hypothesis classes is efficiently properly PAC-learnable if there exists a learning rule $A$ such that $\forall n\ \forall \epsilon, \delta > 0$ there is a sample size $m(n, \epsilon, \delta)$ such that for every $D$ with $L_D(h) = 0$ for some $h \in \mathcal{H}_n$, w.p. $\ge 1 - \delta$ over $S \sim D^{m(n, \epsilon, \delta)}$:

$L_D(A(S)) \le \epsilon$

and $A(S)(x)$ can be computed in time $\mathrm{poly}(n, \frac{1}{\epsilon}, \log\frac{1}{\delta})$, and $A$ always outputs a predictor in $\mathcal{H}_n$

  • Definition: A family $\mathcal{H}_n$ of hypothesis classes is efficiently properly agnostically PAC-learnable if there exists a learning rule $A$ such that $\forall n\ \forall \epsilon, \delta > 0$ there is a sample size $m(n, \epsilon, \delta)$ such that for every $D$, w.p. $\ge 1 - \delta$ over $S \sim D^{m(n, \epsilon, \delta)}$:

$L_D(A(S)) \le \inf_{h \in \mathcal{H}_n} L_D(h) + \epsilon$

and $A(S)(x)$ can be computed in time $\mathrm{poly}(n, \frac{1}{\epsilon}, \log\frac{1}{\delta})$, and $A$ always outputs a predictor in $\mathcal{H}_n$

SLIDE 12

Conditions for Efficient Agnostic Learning

$\mathrm{ERM}_{\mathcal{H}}(S) = \arg\min_{h \in \mathcal{H}} L_S(h)$

  • Claim: If
  • $\mathrm{VCdim}(\mathcal{H}_n) \le \mathrm{poly}(n)$, and
  • each $h \in \mathcal{H}_n$ is computable in time poly(n), and
  • there is a poly-time (in the size of its input) algorithm for $\mathrm{ERM}_{\mathcal{H}}$ (i.e. one that returns some empirical risk minimizer),
then $\mathcal{H}_n$ is efficiently agnostically properly PAC learnable.

$\mathrm{AGREEMENT}_{\mathcal{H}}(S, k) = 1 \text{ iff } \exists_{h \in \mathcal{H}}\ L_S(h) \le 1 - \tfrac{k}{|S|}$

  • Claim: If $\mathcal{H}_n$ is efficiently properly agnostically PAC learnable, then $\mathrm{AGREEMENT}_{\mathcal{H}} \in \mathrm{RP}$

SLIDE 13

What is Properly Agnostically Learnable?

  • Poly-time functions? No! (not even in the realizable case)
  • Poly-length logical formulas? No! (not even in the realizable case)
  • Poly-size depth-2 neural networks? No! (not even in the realizable case)
  • Halfspaces (linear predictors)? No!
  • $\mathcal{X}_n = \{0,1\}^n$, $\mathcal{H}_n = \{\, x \mapsto [\langle w, x \rangle > 0] \mid w \in \mathbb{R}^n \,\}$
  • Claim: $\mathrm{AGREEMENT}_{\mathcal{H}}$ is NP-hard (optional HW problem)
  • Conclusion: if $\mathrm{NP} \neq \mathrm{RP}$, halfspaces are not efficiently properly agnostically learnable
  • Conjunctions? No! Also NP-hard!
  • Unions of segments on the line? Yes!
  • $\mathcal{X}_n = [0,1]$, $\mathcal{H}_n = \{\, x \mapsto \vee_{j=1}^{n} [a_j \le x \le b_j] \mid a_j, b_j \in [0,1] \,\}$
  • Efficiently properly agnostically PAC learnable! (see the ERM sketch below)
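Why is this last class learnable when the others are not? Because here ERM itself is tractable: after sorting the sample, a left-to-right dynamic program over "how many segments are open so far" finds the minimum-error union of at most $k$ segments. A minimal sketch, where the DP formulation and all names are mine:

```python
def erm_union_of_segments(S, k):
    """Minimum 0/1 errors over predictors x -> OR_j [a_j <= x <= b_j]
    with at most k segments. DP state: (segments opened, inside one?)."""
    S = sorted(S)                              # sweep the sample by x
    INF = float("inf")
    best = [[INF] * 2 for _ in range(k + 1)]   # best[b][inside] = min errors
    best[0][0] = 0
    for _, y in S:
        new = [[INF] * 2 for _ in range(k + 1)]
        for b in range(k + 1):
            for inside in (0, 1):
                cur = best[b][inside]
                if cur == INF:
                    continue
                # predict negative here: stay/step outside all segments
                new[b][0] = min(new[b][0], cur + (1 if y else 0))
                # predict positive: stay in the segment, or open a new one
                nb = b if inside else b + 1
                if nb <= k:
                    new[nb][1] = min(new[nb][1], cur + (0 if y else 1))
        best = new
    return min(min(row) for row in best)

S = [(0.1, True), (0.2, False), (0.3, True), (0.4, False), (0.5, True)]
print(erm_union_of_segments(S, k=2))           # -> 1 (three segments needed for 0)
```

The sweep visits $O(|S| \cdot k)$ states, so ERM is poly-time; together with $\mathrm{VCdim}(\mathcal{H}_n) = O(n)$, this gives efficient proper agnostic learnability by the claim on Slide 12.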

SLIDE 14

Source of the Hardness

$\min_{h \in \mathcal{H}} \sum_j \ell(h_w(x_j); y_j)$, where $h_w(x) = \langle w, x \rangle \in \mathbb{R}$

$\ell_{01}(h(x); y) = [y\, h(x) \le 0]$

[Figure: $\ell_{01}(h(x); y = -1)$ and $\ell_{\mathrm{sq}}(h(x); y = -1)$ plotted as functions of $h(x) \in \mathbb{R}$]

SLIDE 15

Convexity

  • Definition (convex set):

A set $C$ in a vector space is convex if $\forall u, v \in C$ and $\forall \alpha \in [0,1]$: $\alpha u + (1 - \alpha) v \in C$

SLIDE 16

Convexity

  • Definition (convex function):

A function $f : C \to \mathbb{R}$ is convex if $\forall u, v \in C$ and $\forall \alpha \in [0,1]$: $f(\alpha u + (1 - \alpha) v) \le \alpha f(u) + (1 - \alpha) f(v)$

[Figure: between $u$ and $v$, the graph of $f$ lies below the chord connecting $(u, f(u))$ and $(v, f(v))$]

SLIDE 17

Using a surrogate loss

$\min_{h \in \mathcal{H}} \sum_j \ell(h_w(x_j); y_j)$

  • Instead of $\ell_{01}(z; y)$, use a surrogate $\ell(z; y)$ s.t.:
  • $\forall y$: $\ell(z; y)$ is convex in $z$
  • $\forall z, y$: $\ell_{01}(z; y) \le \ell(z; y)$
  • E.g.
  • $\ell_{\mathrm{sq}}(z; y) = (y - z)^2$
  • $\ell_{\mathrm{logistic}}(z; y) = \log(1 + \exp(-yz))$
  • $\ell_{\mathrm{hinge}}(z; y) = [1 - yz]_+ = \max\{0, 1 - yz\}$
SLIDE 18

Minimizing a Surrogate Loss Does Not Minimize 0/1 Loss!

$\ell_{01}(z; y) = [yz \le 0] \le [1 - yz]_+ = \ell_{\mathrm{hinge}}(z; y)$

  • Realizable case:

$\exists_w\ L_S^{01}(x \mapsto \langle w, x \rangle) = 0$
$\Rightarrow L_S^{\mathrm{hinge}}\big(x \mapsto \tfrac{1}{\gamma} \langle w, x \rangle\big) = 0$, where $\gamma = \min_j y_j h_w(x_j) > 0$
$\Rightarrow L_S^{\mathrm{hinge}}(\mathrm{ERM}^{\mathrm{hinge}}(S)) = 0$
$\Rightarrow L_S^{01}(\mathrm{ERM}^{\mathrm{hinge}}(S)) \le L_S^{\mathrm{hinge}}(\mathrm{ERM}^{\mathrm{hinge}}(S)) = 0$

  • Non-realizable case:
  • What can we ensure by minimizing the surrogate loss?
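A minimal NumPy check of the realizable-case chain above, on random separable data of my own construction: rescaling a separating $w$ by $1/\gamma$ drives the hinge loss to exactly zero, and zero hinge loss implies zero 0/1 loss.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 50
w_true = rng.standard_normal(n)
X = rng.standard_normal((m, n))
y = np.sign(X @ w_true)                  # realizable: w_true separates S

margins = y * (X @ w_true)
gamma = margins.min()                    # gamma = min_j y_j <w_true, x_j> > 0
w_scaled = w_true / gamma                # rescale so every margin is >= 1

hinge = np.maximum(0.0, 1.0 - y * (X @ w_scaled))
assert np.all(hinge == 0.0)              # L_S^hinge(w_scaled) = 0
assert np.all(y * (X @ w_scaled) > 0)    # hence L_S^01 = 0 as well
```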
SLIDE 19

Improper Learning?

  • Halfspaces are not efficiently properly agnostically PAC learnable
  • What about improper learning?

… we'll use boosting to reduce learning intersections of halfspaces to agnostically learning halfspaces
SLIDE 20

Why Study Hardness?

  • Understand why machine learning is essentially a computational problem
  • Understand why we must sometimes take a non-exact/heuristic approach, and that it cannot be exact (e.g. use a surrogate loss)
  • Understand what we can never guarantee, and not try to guarantee it (e.g. we cannot learn with a large NN just because there is a small NN that completely explains the data)
  • Understand, and be able to argue about, sample complexity gaps between the statistical limit (using any learning rule) and the computational limit (using a tractable learning rule)

SLIDE 21

"Weak" vs "Strong" Learning

  • Recall the definition of (realizable) PAC learning of $\mathcal{H}$ using rule $A(\cdot)$:

For any $D$ s.t. $\inf_{h \in \mathcal{H}} L_D(h) = 0$, and any $\epsilon, \delta > 0$, using an $m(\epsilon, \delta)$ sample: w.p. $\ge 1 - \delta$ over $S \sim D^{m(\epsilon, \delta)}$, $L_D(A(S)) < \epsilon$

  • $A(\cdot)$ is a weak learner for $\mathcal{H}$ if:

there exist $\epsilon < \frac{1}{2}$, $\delta < 1$, and $m$, s.t. for any $D$ with $\inf_{h \in \mathcal{H}} L_D(h) = 0$: w.p. $\ge 1 - \delta$ over $S \sim D^m$, $L_D(A(S)) < \epsilon$ (e.g. $\epsilon = 0.49$ and $1 - \delta = 0.01$)

  • If $\mathcal{H}$ is weakly learnable, is it also strongly learnable?
  • Yes: $\mathcal{H}$ is weakly learnable ⇒ $\mathrm{VCdim}(\mathcal{H}) < \infty$ ⇒ $\mathcal{H}$ is (strongly) learnable
  • If $\mathcal{H}_n$ is efficiently weakly learnable, is it also efficiently strongly learnable?
  • If we have access to an (efficient) weak learner $A(\cdot)$, can we use it to build an (efficient) strong learner?

SLIDE 22

The Boosting Problem

  • Boosting the Confidence:

If the learning algorithm works only with some very small fixed probability $1 - \delta_0$ (e.g. $1 - \delta_0 = 0.01$), can we construct a new algorithm that works with arbitrarily high probability $1 - \delta$ (for any $\delta > 0$)?

  • Boosting the Error:

If the learning algorithm only returns a predictor that is guaranteed to be slightly better than chance, i.e. has error $\epsilon_0 = \frac{1}{2} - \gamma < \frac{1}{2}$ (for some fixed $\gamma > 0$), can we construct a new algorithm that achieves arbitrarily low error $\epsilon$?

SLIDE 23

Boosting the Confidence

  • For any πœ€:
  • Claim: w.p. β‰₯ 1 βˆ’ πœ€, 𝑀

β„Ž ≀ πœ—0 + πœ—

  • Total samples used: 𝑃 𝑛0 πœ—0 β‹… log 1

πœ€ + log1

πœ€

πœ—2

  • Efficient algorithm for some πœ€0 < 1 and all πœ— > 0 with runtime and sample

complexity π‘žπ‘π‘šπ‘§(π‘œ, πœ—0)  efficient algorithm for any πœ€ > 0 with runtime π‘žπ‘π‘šπ‘§(π‘œ, πœ—, log

1 πœ€)

  • 1. For i=1..k: 𝑙 =

log 2 πœ€ log 1 πœ€0

Collect 𝑛0 independent samples 𝑇𝑗 β„Žπ‘— = 𝐡(𝑇𝑗)

  • 2. Collect 𝑛val =

4 log

4𝑙 πœ€

πœ—2

additional independent samples π‘‡π‘€π‘π‘š

  • 3. Return

β„Ž = arg min

β„Ž1,…,β„Žπ‘™ π‘€π‘‡π‘€π‘π‘š β„Žπ‘—

w.p. β‰₯ 1 βˆ’ πœ€, inf

𝑗 𝑀 β„Žπ‘— ≀ πœ—0

ERM from class of size 𝑙
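A sketch of the three-step procedure, assuming access to a base learner A that succeeds only with probability $1 - \delta_0$ and to a sampler for $D$; all names here are mine.

```python
import math

def boost_confidence(A, sample, m0, eps, delta, delta0):
    """Boosting the confidence of a base learner A.

    A      : learning rule, maps a list of (x, y) pairs to a predictor h
    sample : draws one labeled i.i.d. example (x, y) from D
    m0     : sample size the base learner A needs
    """
    # Step 1: k independent runs. Each fails w.p. at most delta0, so all
    # fail w.p. at most delta0^k = delta/2 for this choice of k.
    k = math.ceil(math.log(2 / delta) / math.log(1 / delta0))
    hs = [A([sample() for _ in range(m0)]) for _ in range(k)]

    # Step 2: fresh validation set of size m_val = 4 log(4k/delta) / eps^2.
    m_val = math.ceil(4 * math.log(4 * k / delta) / eps ** 2)
    S_val = [sample() for _ in range(m_val)]

    # Step 3: ERM over the finite class {h_1, ..., h_k} on S_val.
    def val_err(h):
        return sum(h(x) != y for x, y in S_val) / m_val
    return min(hs, key=val_err)
```

Step 3 is ERM over a class of size $k$, which is why a validation set of size $O(\log(k/\delta)/\epsilon^2)$ suffices to lose at most $\epsilon$ beyond $\epsilon_0$.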

SLIDE 24

Boosting the Error?

  • What if we can only find a predictor with relatively high excess error $\epsilon$?
  • We can always find a predictor with error $\le \frac{1}{2}$
  • What if we have an algorithm that, for any source distribution $D$ s.t. $\inf_h L_D(h) = 0$, finds $A(S)$ with $L_D(A(S)) \le \frac{1}{2} - \gamma$?
  • Can we use $A(\cdot)$ to find a predictor with arbitrarily low error?