Generalization Bounds for Distance-Based Learning with High-Dimensional Domains and Codomains

Cyrus Cousins, with Eli Upfal
Brown University BigData Group, Spring 2019
Web: bigdata.cs.brown.edu
Mail: cyrus_cousins@brown.edu

Outline: Distance-Based Learning, k-Nearest Representatives, Uniform Convergence, The Curse of Dimensionality

Supervised Learning in Metric Spaces

Domain X, with metric ∆(x₁, x₂) : X × X → [0, ∞)
  - R^d with Euclidean, L_p, or Mahalanobis distance
  - Graph with shortest-path distance
  - Strings with edit distance

Codomain Y
  - Probabilistic classification: $Y = \mathbb{S}_n = \{\, y : \|y\|_1 = 1,\ 0 \le y \,\}$ (the probability simplex)
  - Regression: $Y = \mathbb{R}^c$

Training set z drawn from X, with labels from Y
  - Assume z drawn i.i.d. from distribution D over Z = X × Y

Underlying assumption: nearby points usually have similar labels.

The k-Nearest Neighbors Classifier

1. Model defined by k and training set z ∼ D
2. Identify the k nearest neighbors to the query
3. Winner-takes-all vote over neighbor labels

Lazy learner
  - No training procedure
  - Nearest-neighbor queries on the training set

Control for overfitting by adjusting k
  - Low k ⇒ high variance
  - High k ⇒ high bias

Want to bound the true error R(D)
  - Have leave-one-out cross-validation error $\hat{R}_{\mathrm{loocv}}(z)$
  - Want $\mathbb{P}\bigl[R(D) \le \hat{R}_{\mathrm{loocv}}(z) + \epsilon\bigr] \ge 1 - \delta$
  - Quantify the degree of overfitting
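As a concrete reference point for this baseline, here is a minimal NumPy sketch of the k-NN classifier described above (brute-force neighbor search, winner-takes-all vote); the function and variable names are illustrative, not from the talk.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Brute-force k-NN: winner-takes-all vote over the k nearest training labels."""
    preds = []
    for q in X_query:
        dists = np.linalg.norm(X_train - q, axis=1)   # Euclidean distances to all training points
        nearest = np.argsort(dists)[:k]               # indices of the k nearest neighbors
        votes = np.bincount(y_train[nearest])         # tally neighbor labels
        preds.append(np.argmax(votes))                # winner takes all (ties -> lowest label)
    return np.array(preds)

# Toy usage: two Gaussian blobs in R^2
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(knn_predict(X, y, np.array([[0.0, 0.0], [3.0, 3.0]]), k=5))
```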

Problems with k-Nearest Neighbors

Statistics: probability 1 − δ tail bounds on model loss

Hypothesis stability bound:
$$R(D) \le \hat{R}_{\mathrm{loocv}}(z) + \sqrt{\frac{1 + 24\sqrt{k/2\pi}}{2m\delta}}$$

An exponential stability bound:
  - Assume Euclidean distance in R^d
  - γ_d ≐ maximum kissing number, exponential in d
  - κ ≥ 1.271
$$R(D) \le \hat{R}_{\mathrm{loocv}}(z) + \gamma_d k\sqrt{\frac{512 e \kappa \ln(2/\delta)}{m}} + \frac{2\sqrt{2k}}{\sqrt{\pi m}}$$

Stability bounds scale poorly with δ, d, and k, carry large constants, and are metric-specific
  - Should improve with k, which smooths predictions
  - Both the γ_d and k terms are due to the proof technique (max number of z_j such that z_i is a k-nearest neighbor)

Computational efficiency
  - Approximate k-NN queries computationally difficult
  - High storage cost
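To make the scaling concrete, here is a small sketch that plugs sample numbers into the hypothesis-stability bound as reconstructed above; treat the formula and the example values of m, k, and δ as illustrative assumptions rather than figures from the talk.

```python
import math

def knn_stability_bound(loocv_error, m, k, delta):
    # Hypothesis-stability tail bound (as reconstructed above):
    # R(D) <= R_loocv + sqrt((1 + 24*sqrt(k/(2*pi))) / (2*m*delta))
    slack = math.sqrt((1 + 24 * math.sqrt(k / (2 * math.pi))) / (2 * m * delta))
    return loocv_error + slack

# Illustrative numbers: note the 1/sqrt(delta) dependence, much worse than the
# ln(1/delta) dependence of the Rademacher-based bounds discussed later.
for delta in (0.1, 0.01):
    print(delta, knn_stability_bound(loocv_error=0.10, m=10000, k=10, delta=delta))
```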

The k-Nearest Representatives Classifier

Training:
1. Select a parliament: draw a parliament p of unlabeled i.i.d. points from D
2. Vote: draw a training set z of m labeled i.i.d. points from D; associate each z_i with its k-nearest representatives (from p)
3. Decide the election: label each representative by winner-takes-all vote; resolve ties arbitrarily

Classification:
1. Identify the k-nearest representatives to the query point
2. Average the associated labels to produce a soft classification
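A minimal NumPy sketch of this training and classification procedure, assuming Euclidean distance and hard (winner-takes-all) representative labels; the helper names are mine, not from the talk, and here the parliament is subsampled from the data rather than drawn fresh from D.

```python
import numpy as np

def knr_fit(parliament, X_train, y_train, k, n_classes):
    """Label each representative by winner-takes-all vote over the training
    points that list it among their k nearest representatives."""
    votes = np.zeros((len(parliament), n_classes))
    for x, y in zip(X_train, y_train):
        dists = np.linalg.norm(parliament - x, axis=1)
        for j in np.argsort(dists)[:k]:            # x votes at its k nearest representatives
            votes[j, y] += 1
    labels = np.argmax(votes, axis=1)              # winner-takes-all (ties resolved arbitrarily)
    return np.eye(n_classes)[labels]               # one-hot label matrix L, shape (|p|, c)

def knr_predict(parliament, L, X_query, k):
    """Soft classification: average the labels of the k nearest representatives."""
    out = []
    for q in X_query:
        dists = np.linalg.norm(parliament - q, axis=1)
        out.append(L[np.argsort(dists)[:k]].mean(axis=0))
    return np.array(out)                           # rows lie in the probability simplex

# Toy usage on two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
p = rng.choice(len(X), size=20, replace=False)     # parliament subsampled from the data (a simplification)
L = knr_fit(X[p], X, y, k=3, n_classes=2)
print(knr_predict(X[p], L, np.array([[0.0, 0.0], [3.0, 3.0]]), k=3))
```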

The 1-NR Classifier:

(Figure sequence: 1. draw parliament, 2. draw training set, 3. label parliament, 4. classification)

The Gaussian Checkerboard Dataset

$x_i \sim \mathcal{N}(0, \mathrm{diag}(1, 1))$; $y_i$ determined by $x_i$ and the checkerboard.

Figures:
  - 1-NR, m = 10000, |p| = 1000
  - 2-NR, m = 10000, |p| = 2000
  - 5-NR, m = 10000, |p| = 5000
  - 10-NR, m = 10000, |p| = 10000
  - 10-NN versus 10-NR
  - 10-NN versus 10-NR (saturated)

The k-Nearest Representatives Model

Problem: need many samples at each representative to learn
Solution: average over the labels of the k-nearest representatives
  - Averaging mitigates the impact of outliers
  - Similar to the k parameter in k-NN

k-NR hypothesis class
  - Domain X, distance metric ∆, c classes, codomain $Y = \mathbb{S}_c$
  - Take $P_i(x) = \frac{1}{k}$ if $p_i$ is a $\{1, \dots, k\}$-nearest representative of $x$, else $0$
  - $H \doteq \bigl\{\, h(x) = \sum_{i=1}^{|p|} P_i(x)\, Y_i \;:\; Y \in \mathbb{S}_c^{|p|} \,\bigr\}$
  - $Y_{i,\cdot}$ is the label vector associated with representative $p_i$
  - Weighted k-NR: $P_i(x)$ a function of the relative distances of the k-nearest representatives

Other learning problems
  - Regression: allow Y = R^d; learn $Y \in Y^{|p| \times d}$
  - Analysis easier with Y restricted to the ball B(0, 1)

Training the k-NR

Given label y ∈ {1, . . . , c} and predicted label ŷ, require a loss function ℓ
  - Quadratic or Brier loss (estimates means/probabilities): $\ell_2(\hat{y}, y) \doteq \|\mathbb{1}_y - \hat{y}\|_2^2$
  - Absolute loss (estimates geometric medians; robust): $\ell_1(\hat{y}, y) \doteq \|\mathbb{1}_y - \hat{y}\|_1$
  - Cross-entropy loss: $\ell_H(\hat{y}, y) \doteq -\ln(\hat{y}_y)$

Risk of distribution D and empirical risk of sample z ∼ D^m:
$$\hat{R}_\ell(z) \doteq \frac{1}{m}\sum_{i=1}^{m} \ell\bigl(h(x_i), y_i\bigr), \qquad R_\ell(D) \doteq \mathbb{E}_{(x,y)\sim D}\bigl[\ell\bigl(h(x), y\bigr)\bigr]$$

Learn ĥ ∈ H via empirical risk minimization: $\hat{h} \doteq \mathrm{argmin}_{h \in H}\, \hat{R}_\ell(z)$

Overfitting:
  - How well does R̂ approximate R?
  - Often $\hat{R}_\ell(z) < R_\ell(D)$, due to selection bias in choosing ĥ
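A small sketch of these loss functions and the empirical risk, for soft predictions ŷ in the probability simplex; this is a plain illustration of the definitions above, with names of my own choosing.

```python
import numpy as np

def one_hot(y, c):
    return np.eye(c)[y]

def brier_loss(y_hat, y, c):          # l2: squared distance to the one-hot label
    return np.sum((one_hot(y, c) - y_hat) ** 2)

def absolute_loss(y_hat, y, c):       # l1: robust, estimates geometric medians
    return np.sum(np.abs(one_hot(y, c) - y_hat))

def cross_entropy_loss(y_hat, y):     # -ln of the probability assigned to the true class
    return -np.log(y_hat[y])

def empirical_risk(loss, predictions, labels):
    return np.mean([loss(p, y) for p, y in zip(predictions, labels)])

# Toy usage: two soft predictions over c = 3 classes
preds = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
ys = np.array([0, 2])
print(empirical_risk(lambda p, y: brier_loss(p, y, 3), preds, ys))
print(empirical_risk(cross_entropy_loss, preds, ys))
```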

The Rademacher Average

Function family F ⊆ Z → R; distribution D over Z; sample z ∼ D^m; Rademacher sequence σ, i.i.d. uniform on ±1.

Empirical mean (mean estimate):
$$\frac{1}{m}\sum_{i=1}^{m} f(z_i)$$

Empirical supremum (max estimate):
$$\sup_{f \in F} \frac{1}{m}\sum_{i=1}^{m} f(z_i)$$

Empirical Rademacher average (sample-σ correlation):
$$\hat{R}_m(F, z) \doteq \mathbb{E}_{\sigma}\left[\sup_{f \in F} \frac{1}{m}\sum_{i=1}^{m} \sigma_i f(z_i)\right]$$

Rademacher average (distribution-σ correlation):
$$R_m(F, D) \doteq \mathbb{E}_{z \sim D^m}\,\mathbb{E}_{\sigma}\left[\sup_{f \in F} \frac{1}{m}\sum_{i=1}^{m} \sigma_i f(z_i)\right]$$
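Here is a small Monte Carlo sketch of the empirical Rademacher average for a finite function family, represented simply by its value matrix F(z); approximating the inner expectation over σ by sampling is an assumption of this illustration, not a prescription from the talk.

```python
import numpy as np

def empirical_rademacher(values, n_sigma=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher average R_hat_m(F, z).

    values: array of shape (|F|, m), where values[j, i] = f_j(z_i),
            i.e. each row is one function evaluated on the sample z.
    """
    n_funcs, m = values.shape
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_sigma):
        sigma = rng.choice([-1.0, 1.0], size=m)   # Rademacher sequence
        total += np.max(values @ sigma) / m       # sup over f of (1/m) sum_i sigma_i f(z_i)
    return total / n_sigma

# Toy usage: 8 random [0, 1]-valued functions evaluated on m = 200 sample points
rng = np.random.default_rng(1)
vals = rng.random((8, 200))
print(empirical_rademacher(vals))
```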

Generalization Bounds

Let F ⊆ X → [0, 1], z ∼ D^m, and let δ ∈ (0, 1).

Symmetrization (a consequence of Jensen's inequality):
$$\mathbb{E}_{z}\left[\sup_{f \in F}\left(\mathbb{E}_{z \sim D}[f] - \frac{1}{m}\sum_{i=1}^{m} f(z_i)\right)\right] \le 2\,\mathbb{E}_{z, \sigma}\left[\sup_{f \in F} \frac{1}{m}\sum_{i=1}^{m} \sigma_i f(z_i)\right] = 2 R_m(F, D)$$

Exponential tail bounds: with probability ≥ 1 − δ over the choice of z,
$$\sup_{f \in F}\left(\mathbb{E}_{z \sim D}[f] - \frac{1}{m}\sum_{i=1}^{m} f(z_i)\right) \le 2 R_m(F, D) + \sqrt{\frac{\ln(1/\delta)}{2m}}$$
$$\sup_{f \in F}\left(\mathbb{E}_{z \sim D}[f] - \frac{1}{m}\sum_{i=1}^{m} f(z_i)\right) \le 2 \hat{R}_m(F, z) + 3\sqrt{\frac{\ln(1/\delta)}{2m}}$$

These bounds hold simultaneously for all f ∈ F
  - Addresses the multiple comparisons problem
  - The sample value is probably approximately equal to the true value
  - Therefore, picking the best f won't overfit
  - Bound on the Rademacher average ⇒ bound on the probability of overfitting
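The data-dependent version of this bound is easy to evaluate once an empirical Rademacher average is in hand; a minimal sketch, using the 2R̂ + 3√(ln(1/δ)/2m) form stated above with illustrative numbers.

```python
import math

def uniform_deviation_bound(rhat_m, m, delta):
    # With probability >= 1 - delta, every f in F satisfies
    #   E_D[f] <= (1/m) sum_i f(z_i) + 2*rhat_m + 3*sqrt(ln(1/delta) / (2m)).
    return 2 * rhat_m + 3 * math.sqrt(math.log(1 / delta) / (2 * m))

# Illustrative numbers: a small empirical Rademacher average yields a small
# uniform deviation, so empirical risk minimization cannot overfit by much.
print(uniform_deviation_bound(rhat_m=0.05, m=10000, delta=0.01))
```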

Massart’s Finite Family Inequality

$$\hat{R}_m(F, z) \le \sup_{f \in F}\sqrt{\frac{1}{m}\sum_{i=1}^{m} f^2(z_i)} \cdot \sqrt{\frac{2\ln|F|}{m}}$$

A bound on the Rademacher complexity in terms of:
  - The maximum L2 average over the sample z
  - The cardinality of the function family |F|
  - Empirical shattering coefficient: use |F(z)|

With centralization, for image(F) ⊆ [0, 1]:
  - Wimpy variance: $\hat{\sigma}^2 \doteq \sup_{f \in F} \frac{1}{m}\sum_{i=1}^{m}\bigl(f(z_i) - \hat{\mathbb{E}}_z[f]\bigr)^2$
  - $\hat{R}_m(F, z) \le \frac{1}{\sqrt{m}} + \sqrt{\frac{\hat{\sigma}^2 \ln|F(z)|}{m}}$

An intuitive result
  - Subexponential F can't correlate with all possible σ
  - Proved by bounding $\exp\bigl(\lambda \hat{R}_m(F, z)\bigr)$
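A quick numerical check of Massart's inequality against a Monte Carlo estimate of the empirical Rademacher average, for a random finite family of [0, 1]-valued functions; the family is synthetic and purely illustrative.

```python
import numpy as np

def massart_bound(values):
    """Massart's finite-family bound: sup_f sqrt(mean f^2) * sqrt(2 ln|F| / m)."""
    n_funcs, m = values.shape
    max_l2 = np.sqrt(np.mean(values ** 2, axis=1)).max()
    return max_l2 * np.sqrt(2 * np.log(n_funcs) / m)

# Synthetic finite family: |F| = 8 functions evaluated on m = 200 points.
rng = np.random.default_rng(2)
vals = rng.random((8, 200))

# Monte Carlo estimate of the empirical Rademacher average, for comparison.
sigmas = rng.choice([-1.0, 1.0], size=(2000, vals.shape[1]))
mc_estimate = np.mean(np.max(sigmas @ vals.T, axis=1)) / vals.shape[1]

print("Massart bound:", massart_bound(vals))
print("MC estimate:  ", mc_estimate)
```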

Covering Numbers

How many towers are required to cover every point?
  - Each point must be covered by at least 1 tower
  - Towers are L2 balls in a subset of R^2

γ-cover of X w.r.t. distance metric ∆:
  - A set $\hat{x} \subseteq X$ such that every $x' \in X$ has $\inf_{x \in \hat{x}} \Delta(x', x) \le \gamma$

Covering number N(X, ∆, γ) ≐ the minimum cardinality of any γ-cover of X w.r.t. ∆
  - How many balls are required to cover the space?
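A greedy construction gives a simple upper bound on the covering number of a finite point set; this sketch is mine and is only one of many ways to build a γ-cover.

```python
import numpy as np

def greedy_cover(points, gamma):
    """Greedily pick centers until every point is within gamma of some center.
    The number of centers upper-bounds N(points, Euclidean, gamma)."""
    centers = []
    uncovered = np.ones(len(points), dtype=bool)
    while uncovered.any():
        c = points[np.argmax(uncovered)]          # first still-uncovered point becomes a center
        centers.append(c)
        dists = np.linalg.norm(points - c, axis=1)
        uncovered &= dists > gamma                # everything within gamma of c is now covered
    return np.array(centers)

rng = np.random.default_rng(3)
X = rng.random((500, 2))                          # points in the unit square
for g in (0.5, 0.25, 0.1):
    print(g, len(greedy_cover(X, g)))             # cover size grows as gamma shrinks
```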

Rademacher Averages in “Approximately Finite” Families

Suppose F̂ is such that F̂(z) is an ℓ2 γ-cover of F(z)
  - Want to bound the complexity of F in terms of F̂, with γ additive error

N₂(F, z, ε): an empirical ℓ2-covering number (w.r.t. z) of F
  - The number of ℓ2 balls of radius ε required to cover F(z)

$$\hat{R}_m(F, z) \le \sqrt{\frac{2\ln(2)}{m}} + \sqrt{\frac{\hat{\sigma}^2 \ln N_2(F, z, \epsilon)}{m}} + \epsilon$$

Massart's inequality is not sensitive to similarity between functions in F
  - N₂(F, z, ε) ≤ |F(z)| captures this information
  - N₂(F, z, ε) is often finite, even for infinite F

Bounding k-NR Rademacher Averages

k-NR hypothesis class
  - P maps x to a stochastic vector that is 0 except at the k-nearest representatives (from p)
  - Label matrix L (of size |p| × c) maps representatives to classes
  - $H_P \doteq \bigl\{\, h(x) \doteq P(x) L \;:\; L \in \mathbb{S}_c^{|p|} \,\bigr\} \subseteq X \to \mathbb{S}_c$

Contraction inequalities
  - Suppose ℓ : R → R is a λ-Lipschitz function and take F = ℓ ◦ H; then $\hat{R}_m(F, z) \le \lambda \hat{R}_m(H, z)$
  - Similar multivariate results (> 2 classes, regression in R^c)
  - ℓ2(·, y) and ℓ1(·, y) are Lipschitz for bounded inputs
  - Bounds for ℓ ◦ H (nonlinear) follow from bounds for H (linear)

Only the convex hull F⋄ = ConvHull(F) matters:
$$\hat{R}_m(F, z) = \mathbb{E}_{\sigma}\left[\sup_{f \in F} \frac{1}{m}\sum_{i=1}^{m} \sigma_i f(x_i)\right] = \mathbb{E}_{\sigma}\left[\sup_{f \in F^{\diamond}} \frac{1}{m}\sum_{i=1}^{m} \sigma_i f(x_i)\right] = \hat{R}_m(F^{\diamond}, z)$$
  - Convenient for linear H⋄ and the linear loss ℓ1

Bounding k-NR Rademacher Averages: Counting

Hypothesis class: $H_P \doteq \bigl\{\, h(x) \doteq P(x) L : L \in \mathbb{S}_c^{|p|} \,\bigr\} \subseteq X \to \mathbb{S}_c$
Loss family: $\ell \circ H_P \doteq \bigl\{\, f(x, y) \doteq \ell(P(x) L, y) : L \in \mathbb{S}_c^{|p|} \,\bigr\}$

Goal: bound $\bigl|\mathrm{ConvHull}\bigl(H(x)\bigr)\bigr|$; Massart's lemma + contraction inequalities ⇒ k-NR generalization bounds

Counting bound (data-independent): size of the convex hull of H_P
  - Each label $L_{i,\cdot}$ can be $\mathbb{1}_1, \mathbb{1}_2, \dots, \mathbb{1}_c$ in the convex hull of H_P; |p| total labels
$$\ln\bigl|\mathrm{ConvHull}(H_P)\bigr| = \ln \prod_{i=1}^{|p|} c = |p|\ln(c)$$

By Massart's inequality:
$$\hat{R}_m(\ell_1 \circ H, z) \le \sqrt{\frac{2\ln(2)}{m}} + \sqrt{\frac{\hat{\sigma}^2\, |p| \ln(c)}{m}}$$

Data-Dependent Counting

Running example (figure): a 1-NR Voronoi partition with c = 5 classes, cell i maps to p_i, |p| = 6 × 5 = 30; some cells are empty, others are low-frequency.

Counting bound:
$$\hat{R}_m(\ell_1 \circ H, z) \le \sqrt{\frac{c \ln|p|}{2m}} \approx 0.493$$

Projection size bound (pay for what you use):
  - If no z_i is mapped to P_j, the label $L_{j,\cdot}$ doesn't matter
  - If no z_i mapped to P_j has label a or b, then swapping $L_{j,a}$ and $L_{j,b}$ doesn't matter
$$\ln\bigl|H^{\diamond}_P(x)\bigr| = \sum_{j=1}^{|p|} \ln\Bigl(\min\bigl(c,\ 1 + |\{y_i : P_j(x_i) > 0\}|\bigr)\Bigr) \le \underbrace{|\{j : P_j(x) \ne 0\}|}_{\text{\# of nonempty reps}} \cdot \ln\min\Bigl(c,\ 1 + \underbrace{\max_j |\{y_i : P_j(x_i) > 0\}|}_{\text{\# of classes at } p_j}\Bigr)$$
Data-dependent bound: $\hat{R}_m(\ell_1 \circ H, z) \le 0.402$

Sensitive to label existence, but not frequency
  - Improve with covering number bounds: ignore low-frequency events
  - With certainty (pigeonhole principle): $\exists\, J \subseteq \{1, \dots, |p|\}$ s.t. $\sum_{j \in J}\sum_{(x,y) \in z} P_j(x) \le \frac{m|J|}{|p|}$
  - Covering bound: $\hat{R}_m(\ell_1 \circ H, z) \le 0.392$
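The data-dependent quantities in the projection-size bound are easy to read off a fitted 1-NR model; the following sketch just counts nonempty representatives and the classes observed at each one, under my reconstruction of the formula above, so treat it as illustrative rather than the talk's exact procedure.

```python
import numpy as np

def projection_log_size(assignments, labels, n_reps, c):
    """ln of the projected hypothesis-class size for a hard-assignment 1-NR model.

    assignments[i]: index of the representative that training point i maps to.
    labels[i]:      class of training point i.
    A representative with no points contributes ln(1) = 0; otherwise it
    contributes ln(min(c, 1 + number of classes observed there)).
    """
    total = 0.0
    for j in range(n_reps):
        classes_at_j = len(set(labels[assignments == j]))
        if classes_at_j > 0:
            total += np.log(min(c, 1 + classes_at_j))
    return total

# Toy usage: 6 representatives, 5 classes, 30 points, two reps left empty.
rng = np.random.default_rng(4)
assignments = rng.integers(0, 4, size=30)   # reps 4 and 5 receive no points
labels = rng.integers(0, 5, size=30)
print(projection_log_size(assignments, labels, n_reps=6, c=5))
```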

The Curse of Dimensionality

Suppose X = R^d for large d. Distance-based methods are often ineffective:
  - Distances usually dominated by noise
  - Local properties more meaningful

Assumption:
  - There exists a low-dimensional subspace X′ ≐ MX
  - Distances in X′ are more meaningful than distances in X
  - Rank-constrained Mahalanobis metric: $\Delta_M(x, x') = \sqrt{(x - x')^{\top} M M^{\top} (x - x')}$

We don't know M a priori! Can we learn it?

The appropriate distance metric is task-specific:
  - Select p in the ℓ_p distance
  - Feature selection
  - Normalize features of different scales
  - Rotate and identify predictive linear combinations of features
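A rank-constrained Mahalanobis distance is just the Euclidean distance after projecting by M⊤; a minimal sketch, with a random M standing in for a learned one.

```python
import numpy as np

def mahalanobis(x, x2, M):
    """Rank-constrained Mahalanobis distance: sqrt((x - x2)^T M M^T (x - x2)),
    equivalently the Euclidean norm of M^T (x - x2)."""
    return np.linalg.norm(M.T @ (x - x2))

d, r = 100, 5
rng = np.random.default_rng(5)
M = rng.normal(size=(d, r))                     # rank-r projection; a learned M would go here
x, x2 = rng.normal(size=d), rng.normal(size=d)
print(mahalanobis(x, x2, M))
print(np.linalg.norm(M.T @ x - M.T @ x2))       # same value: Euclidean distance after projecting by M^T
```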

Learning Distance Metrics

Modify k-NR to optimize over distance metrics
  - A set $\mathbf{\Delta}$ of distance metrics
  - Take $P_j(x; \Delta) = \frac{1}{k}$ if $p_j$ is a $\{1, \dots, k\}$-nearest representative of $x$ (w.r.t. ∆), else $0$

Define the hypothesis class
$$H_{P,\mathbf{\Delta}} \doteq \bigl\{\, h(x) \doteq P(x; \Delta)\, Y \;:\; Y \in \mathbb{S}_c^{|p|},\ \Delta \in \mathbf{\Delta} \,\bigr\}$$

How to select $\mathbf{\Delta}$?
  - Randomly select projection matrices M (Johnson-Lindenstrauss lemma); see the sketch below
  - Choose between domain-specific expert guesses
  - Do whatever you feel like doing
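One cheap way to populate the metric set, along the lines of the random-projection option above, is to draw a few Gaussian projection matrices and take the Euclidean distance in each projected space; a sketch, with all sizes chosen arbitrarily.

```python
import numpy as np

def random_projection_metrics(d, r, n_metrics, seed=0):
    """Return candidate distance functions, each induced by a random
    d x r projection matrix (Johnson-Lindenstrauss style)."""
    rng = np.random.default_rng(seed)
    metrics = []
    for _ in range(n_metrics):
        M = rng.normal(size=(d, r)) / np.sqrt(r)            # scaled Gaussian projection
        metrics.append(lambda a, b, M=M: np.linalg.norm(M.T @ (a - b)))
    return metrics

cands = random_projection_metrics(d=100, r=5, n_metrics=3)
rng = np.random.default_rng(6)
a, b = rng.normal(size=100), rng.normal(size=100)
print([round(dist(a, b), 3) for dist in cands])              # each candidate metric gives its own distance
```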

Generalization Bounds with Learned Metrics

Does selecting ∆ cause overfitting?

Counting argument:
$$\hat{R}_m(\ell_1 \circ H_{P,\mathbf{\Delta}}, z) \le \frac{1}{\sqrt{m}} + 2\hat{\sigma}\sqrt{\frac{\ln|\mathbf{\Delta}| + |p|\ln(c)}{2m}}$$

Covering argument:
  - Compute N₁ = the γ₁ empirical covering number of { P(x; ∆) : ∆ ∈ $\mathbf{\Delta}$ }
  - Similar ∆(x, x′) ⇒ similar k-NR
  - Upper-bound N₂ = the γ₂ covering number of k-NR with any fixed ∆
  - γ terms additive, log cover sizes additive
$$\hat{R}_m(\ell_1 \circ H_{P,\mathbf{\Delta}}, z) \le \frac{1}{\sqrt{m}} + 2\hat{\sigma}\sqrt{\frac{\ln(N_1) + \ln(N_2)}{2m}} + \gamma_1 + \gamma_2$$

High-Dimensional Codomains

Unrealizable learning: conflicting noisy observations at each representative

Classification: many possible classes (NLP, image recognition, . . . )

                Unfactored (classes)                      Factored (factors)
        Housecat  Leopard  Cheetah  Dog  Wolf              Feline  Canine
  p1        8        6        3      2    10                 17      12
  p2       18       13       15      9     6                 46      12
  p3       20        5        1     16    19                 26      35

Regression: high-dimensional noisy observations
  - Meaningless at full rank (w.h.p. signal overcome by noise)
  - Signal lies on a low-dimensional subspace

Factorization learns global patterns
  - We have enough data globally to learn the factoring
  - Classification: the factoring is groups of co-occurring classes
  - Regression: the factoring is a low-dimensional subspace of the signal
  - Knowing global patterns makes learning local patterns easier

Factorization with Rank Constraints

Define the rank-constrained hypothesis class
$$H_{P,r} \doteq \bigl\{\, h(x) \doteq P(x; \Delta)\, Y \;:\; \Delta \in \mathbf{\Delta},\ \mathrm{rank}(Y) \le r \,\bigr\}$$

Why rank constraints
  - The factors in Y are globally learned (see the factorization sketch below)
  - Labels for each representative are chosen locally, in the low-dimensional factored space
  - More sophisticated techniques for full-rank prediction

Generalization bounds
  - Take N₁ ≥ the size of any γ₁-cover of the space of rank-r label matrices
  - Take N₂ ≥ the size of any γ₂-cover of the space of r-class k-NR
  - $N_2(H_{P,r}, \gamma_1 + \gamma_2) \le N_1 N_2$: multiply cover sizes, add cover granularities
$$R_m\bigl(\ell_1 \circ H_{P,r}, D\bigr) \le \frac{1}{\sqrt{m}} + 2\hat{\sigma}\sqrt{\frac{\ln(N_1) + \ln(N_2)}{2m}} + \gamma_1 + \gamma_2$$
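For intuition, here is how a rank-r label matrix can be produced from per-representative class counts via a truncated SVD; this is just one way to impose the rank constraint, not necessarily the procedure used in the talk.

```python
import numpy as np

def low_rank_labels(counts, r):
    """Project per-representative class frequencies onto a rank-r factorization.

    counts: (|p|, c) matrix of observed class counts at each representative.
    Returns a rank-r approximation of the row-normalized label matrix
    (entries need not remain an exact probability vector after truncation).
    """
    Y = counts / counts.sum(axis=1, keepdims=True)   # empirical class distribution per representative
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]               # global factors Vt[:r]; local coefficients U[:, :r] * s[:r]

# Toy counts echoing the feline/canine example: 3 representatives, 5 classes.
counts = np.array([[8, 6, 3, 2, 10],
                   [18, 13, 15, 9, 6],
                   [20, 5, 1, 16, 19]], dtype=float)
Y2 = low_rank_labels(counts, r=2)
print(np.round(Y2, 2))
print(np.linalg.matrix_rank(Y2))                     # 2: labels constrained to a 2-dimensional factor space
```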

Allowing Full-Rank Prediction

The factored classifier captures only general patterns
  - Fails to distinguish between "similar" classes, even with sufficient information
  - Some representatives represent more training points than others

Idea: relax the constraints on Y
  - Approximately low rank: $H_{P,r,\alpha}$ constrains Y s.t. $\exists\, Y'$ with $\mathrm{rank}(Y') \le r$ and $\|Y - Y'\|_1 \le \alpha$
  - Rademacher averages linearly interpolate:
$$\underbrace{\hat{R}_m(\ell_1 \circ H_{P,r,\alpha}, z)}_{\text{approximately low rank}} \le (1 - \alpha)\underbrace{\hat{R}_m(\ell_1 \circ H_{P,r}, z)}_{\text{low rank}} + \alpha\underbrace{\hat{R}_m(\ell_1 \circ H_P, z)}_{\text{full rank}}$$
  - Trace-norm constraints: constrain Y s.t. $\|Y\|_{\mathrm{Tr}} \le \alpha$; low trace norm ⇒ approximated by a low-rank Y′

Bias-variance tradeoff
  - Replace regularity constraints with a regularization penalty
  - Can learn a complicated model, but only if the data supports it
  - Analysis: structural risk minimization

A Brief Recapitulation

Construct a distance-based hypothesis class
  - Optimize for classification or regression
  - Analyze with uniform convergence theory

Problems with high-dimensional X or Y
  - Handle high-dimensional X with metric learning
  - Handle high-dimensional Y with factorization

Generalization bounds reflect the difficulty of learning in high-dimensional spaces
  - Dependence on the metric space cover
  - Dependence on the label factorization rank