  1. Generalization Bounds for Distance-Based Learning with High-Dimensional Domains and Codomains
     Cyrus Cousins, with Eli Upfal
     Brown University BigData Group, Spring 2019
     Web: bigdata.cs.brown.edu
     Mail: cyrus_cousins@brown.edu
     Outline: Distance-Based Learning, k-Nearest Representatives, Uniform Convergence, The Curse of Dimensionality

  2-5. Supervised Learning in Metric Spaces
     Domain X, with metric ∆(x1, x2) : X × X → [0, ∞)
       - R^d with Euclidean, L_p, or Mahalanobis distance
       - Graphs with shortest-path distance
       - Strings with edit distance
       (a minimal sketch of these example metrics follows this slide)
     Codomain Y
       - Probabilistic classification: Y = S_n = { y : ‖y‖_1 = 1, 0 ≤ y } (the probability simplex)
       - Regression: Y = R^c
     Training set z drawn from X, with labels from Y
       - Assume z is drawn i.i.d. from distribution D over Z = X × Y
     Underlying assumption: nearby points usually have similar labels
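
A minimal sketch (not from the slides) of a few of the example metrics above as plain Python functions. The names euclidean, minkowski, mahalanobis, and edit_distance are illustrative, and the Mahalanobis example assumes a user-supplied positive-definite matrix M given as a list of rows.

    import math

    def euclidean(x1, x2):
        # L2 distance on R^d
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

    def minkowski(x1, x2, p=2):
        # L_p distance on R^d
        return sum(abs(a - b) ** p for a, b in zip(x1, x2)) ** (1.0 / p)

    def mahalanobis(x1, x2, M):
        # sqrt((x1 - x2)^T M (x1 - x2)) for a positive-definite matrix M
        d = [a - b for a, b in zip(x1, x2)]
        n = len(d)
        return math.sqrt(sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n)))

    def edit_distance(s, t):
        # Levenshtein distance between strings, by dynamic programming
        prev = list(range(len(t) + 1))
        for i, a in enumerate(s, 1):
            cur = [i]
            for j, b in enumerate(t, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
            prev = cur
        return prev[-1]

Any of these can serve as the ∆ of the previous slide, since the learners below only ever query pairwise distances.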

  6-11. The k-Nearest Neighbors Classifier
     Training set z ∼ D
     1. Model defined by k and the training set
     2. Identify the k nearest neighbors of the query
     3. Winner-takes-all vote over the neighbors' labels
     Lazy learner
       - No training procedure
       - Nearest-neighbor queries on the training set
     Control overfitting by adjusting k
       - Low k ⇒ high variance; high k ⇒ high bias
     Want to bound the true error R(D)
       - Have the leave-one-out cross-validation estimate R̂_loocv(z)
       - Want P[ R(D) ≤ R̂_loocv(z) + ε ] ≥ 1 − δ
       - Quantifies the degree of overfitting
     (a minimal sketch of k-NN prediction and the LOO estimate follows this slide)
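
A minimal sketch (my own, not the authors' code) of winner-takes-all k-NN prediction and the leave-one-out estimate R̂_loocv under 0-1 loss. The names knn_predict and loocv_error are illustrative; data and labels are plain Python lists, and dist is any of the metrics sketched earlier.

    from collections import Counter

    def knn_predict(query, data, labels, k, dist):
        # Indices of the k training points nearest to the query
        nearest = sorted(range(len(data)), key=lambda i: dist(query, data[i]))[:k]
        # Winner-takes-all vote over the neighbors' labels (ties broken arbitrarily)
        return Counter(labels[i] for i in nearest).most_common(1)[0][0]

    def loocv_error(data, labels, k, dist):
        # Leave-one-out cross-validation estimate of the 0-1 error:
        # classify each training point with a model built from the remaining m - 1 points
        errors = 0
        for i in range(len(data)):
            rest_x = data[:i] + data[i + 1:]
            rest_y = labels[:i] + labels[i + 1:]
            if knn_predict(data[i], rest_x, rest_y, k, dist) != labels[i]:
                errors += 1
        return errors / len(data)

For example, loocv_error(data, labels, 3, euclidean) estimates R̂_loocv(z) for the 3-NN rule; the bounds on the next slide quantify how far this estimate can be from the true error R(D).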

  12-15. Problems with k-Nearest Neighbors
     Statistics: probability 1 − δ tail bounds on model loss
       - Hypothesis stability:
           R(D) ≤ R̂_loocv(z) + √( (1 + 24√(k/(2π))) / (2mδ) )
       - An exponential stability bound, assuming Euclidean distance in R^d, with γ_d the maximum kissing number (exponential in d) and a constant κ ≥ 1.271:
           R(D) ≤ R̂_loocv(z) + ( √(512eκ ln(2/δ)) + 2√(2k) ) / √(πm) + γ_d k / m
       - Stability bounds scale poorly with δ, d, and k, carry large constants, and are metric-specific
       - They should instead improve with k, which smooths predictions
       - Both the γ_d and k terms are due to the proof technique: bounding the maximum number of points z_j for which z_i can be a k-nearest neighbor
     Computational efficiency
       - Even approximate k-NN queries are computationally difficult
       - High storage cost
     (a short numeric illustration of the hypothesis-stability slack follows this slide)
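
To make the δ-dependence concrete, here is a small sketch (my own) that evaluates the hypothesis-stability slack term as reconstructed above; hs_slack is an illustrative name, and the exact constants should be read from the original bound rather than from this snippet.

    import math

    def hs_slack(m, k, delta):
        # Slack of the hypothesis-stability bound: R(D) <= R_loocv(z) + hs_slack(m, k, delta)
        return math.sqrt((1 + 24 * math.sqrt(k / (2 * math.pi))) / (2 * m * delta))

    # The 1/sqrt(delta) dependence is the "scales poorly with delta" problem:
    # hs_slack(10000, 5, 0.05) is roughly 0.15, while hs_slack(10000, 5, 0.001) exceeds 1.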

  16-19. The k-Nearest Representatives Classifier
     Training:
     1. Select a parliament
        - Draw a parliament p of unlabeled i.i.d. points from D
     2. Vote
        - Draw a training set z of m labeled i.i.d. points from D
        - Associate each z_i with its k nearest representatives (from p)
     3. Decide the election
        - Label each representative by winner-takes-all vote
        - Resolve ties arbitrarily
     Classification:
     1. Identify the k nearest representatives of the query point
     2. Average the associated labels to produce a soft classification
     (a minimal sketch of this procedure follows this slide)
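
A minimal sketch (my own, not the authors' implementation) of the k-NR training and soft-classification steps listed above. The names k_nearest, knr_train, and knr_predict are illustrative; dist is any metric, and classes is the finite set of possible labels.

    from collections import Counter

    def k_nearest(point, points, k, dist):
        # Indices of the k elements of `points` nearest to `point`
        return sorted(range(len(points)), key=lambda j: dist(point, points[j]))[:k]

    def knr_train(parliament, data, labels, k, dist):
        # Vote: associate each labeled training point with its k nearest representatives
        votes = [Counter() for _ in parliament]
        for x, y in zip(data, labels):
            for j in k_nearest(x, parliament, k, dist):
                votes[j][y] += 1
        # Decide the election: winner-takes-all label per representative
        # (ties broken arbitrarily; representatives with no votes are left as None here)
        return [v.most_common(1)[0][0] if v else None for v in votes]

    def knr_predict(query, parliament, rep_labels, k, dist, classes):
        # Average the labels of the k nearest representatives: a soft classification over classes
        counts = Counter(rep_labels[j] for j in k_nearest(query, parliament, k, dist))
        return {c: counts[c] / k for c in classes}

Unlike k-NN, the learned model here is the labeled parliament alone, so a query touches only the (typically much smaller) set of representatives rather than the full training set.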

  20. The 1-NR Classifier
     1. Draw a parliament
