
Dimensionality Reduction: Theoretical Analysis of Practical Measures
Nova Fandina, Hebrew University
Joint work with Yair Bartal (Hebrew University) and Ofer Neiman (Ben Gurion University)

Outline
• Measuring the Quality of Embedding
  - in theory: worst case distortion analysis
  - in practice: average case distortion measures
  - in between: theoretical analysis of practical measures (for dimensionality reduction methods)
• Our Results
  - upper bounds
  - lower bounds
  - approximating optimal embedding

Measuring the Quality of Embedding: in theory

Basic question in metric embedding theory (informally): given metric spaces $X$ and $Y$, embed $X$ into $Y$ with small error on the distances. How well can it be done?

In theory, "well" traditionally means minimizing the distortion of the worst pair.

Definition. For an embedding $f: X \to Y$ and a pair of points $u \neq v \in X$:
• $\mathrm{expans}_f(u,v) = \dfrac{d_Y(f(u),f(v))}{d_X(u,v)}$, $\quad \mathrm{contr}_f(u,v) = \dfrac{d_X(u,v)}{d_Y(f(u),f(v))}$
• $\mathrm{distortion}(f) = \max_{u \neq v \in X} \{\mathrm{expans}_f(u,v)\} \cdot \max_{u \neq v \in X} \{\mathrm{contr}_f(u,v)\}$
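The definition can be checked on a toy example. The following is a minimal sketch, assuming both spaces are Euclidean and the embedding is given as a list of image points aligned with the input points; the function name `worst_case_distortion` and the sample data are made up for illustration.

```python
import math
from itertools import combinations

def worst_case_distortion(points, images):
    """Return (max expansion, max contraction, distortion) of points[i] -> images[i]."""
    max_expans = 0.0
    max_contr = 0.0
    for i, j in combinations(range(len(points)), 2):
        d_x = math.dist(points[i], points[j])   # distance in the source space
        d_y = math.dist(images[i], images[j])   # distance between the images
        max_expans = max(max_expans, d_y / d_x)
        # Two distinct points with the same image give infinite contraction.
        max_contr = max(max_contr, d_x / d_y if d_y > 0 else math.inf)
    distortion = math.inf if math.isinf(max_contr) else max_expans * max_contr
    return max_expans, max_contr, distortion

# Hypothetical data: four planar points and a linear map R^2 -> R^1.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
fX = [(x + 0.5 * y,) for x, y in X]

print(worst_case_distortion(X, fX))  # distortion equals sqrt(10) for this data
```

Here the largest expansion comes from one pair and the largest contraction from another, which is exactly why the worst-case notion multiplies two separate maxima.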

Measuring the Quality of Embedding: in practice

Demanding a worst case guarantee is too strong: the quality of a method in practical applications is its average performance over all pairs.

• Y. Shavitt and T. Tankel. Big-bang simulation for embedding network distances in Euclidean space. IEEE/ACM Trans. Netw., 12(6), 2004.
• P. Sharma, Z. Xu, S. Banerjee, and S. Lee. Estimating network proximity and latency. Computer Communication Review, 36(3), 2006.
• P. J. F. Groenen, R. Mathar, and W. J. Heiser. The majorization approach to multidimensional scaling for Minkowski distances. Journal of Classification, 12(1), 1995.
• J. F. Vera, W. J. Heiser, and A. Murillo. Global optimization in any Minkowski metric: A permutation-translation simulated annealing algorithm for multidimensional scaling. Journal of Classification, 24(2), 2007.
• A. Censi and D. Scaramuzza. Calibration by correlation using metric embedding from nonmetric similarities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(10), 2013.
• C. Lumezanu and N. Spring. Measurement manipulation and space selection in network coordinates. The 28th International Conference on Distributed Computing Systems, 2008.
• S. Chatterjee, B. Neff, and P. Kumar. Instant approximate 1-center on road networks via embeddings. In Proceedings of the 19th International Conference on Advances in Geographic Information Systems, GIS '11, 2011.
• S. Lee, Z. Zhang, S. Sahu, and D. Saha. On suitability of Euclidean embedding for host-based network coordinate systems. IEEE/ACM Trans. Netw., 18(1), 2010.
• L. Chennuru Vankadara and U. von Luxburg. Measures of distortion for machine learning. Advances in Neural Information Processing Systems, Curran Associates, Inc., 2018.

Just a small sample from a googolplex of such studies.

Measuring the Quality of Embedding: in practice
Moments of Distortion and Relative Error

For $f: X \to Y$ and a pair $u \neq v \in X$: $\mathrm{dist}_f(u,v) := \max\{\mathrm{expans}_f(u,v), \mathrm{contr}_f(u,v)\}$

$\ell_q$-distortion [ABN06]. For $f: X \to Y$, a distribution $\Pi$ over pairs of $X$, and $q \geq 1$:
$$\ell_q\text{-dist}^{(\Pi)}(f) = \left( E_\Pi\left[ \left(\mathrm{dist}_f(u,v)\right)^q \right] \right)^{1/q}$$

Relative Error Measure [in many papers]:
$$\mathrm{REM}_q^{(\Pi)}(f) = \left( E_\Pi\left[ \left|\mathrm{dist}_f(u,v) - 1\right|^q \right] \right)^{1/q}$$
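A minimal sketch of the two moment-based measures, assuming Euclidean distances and taking $\Pi$ to be the uniform distribution over unordered pairs; the function names and the three-point example are illustrative only.

```python
import math
from itertools import combinations

def pair_distortions(points, images):
    """dist_f(u, v) = max(expansion, contraction) for every unordered pair."""
    out = []
    for i, j in combinations(range(len(points)), 2):
        d_x = math.dist(points[i], points[j])
        d_y = math.dist(images[i], images[j])
        out.append(math.inf if d_y == 0 else max(d_y / d_x, d_x / d_y))
    return out

def lq_distortion(dists, q):
    """(E[dist^q])^(1/q) under the uniform distribution over pairs."""
    return (sum(t ** q for t in dists) / len(dists)) ** (1.0 / q)

def rem(dists, q):
    """(E[|dist - 1|^q])^(1/q) under the uniform distribution over pairs."""
    return (sum(abs(t - 1.0) ** q for t in dists) / len(dists)) ** (1.0 / q)

# Hypothetical data: three points on a line; one pair is expanded,
# one is preserved, one is contracted.
X = [(0.0,), (1.0,), (3.0,)]
fX = [(0.0,), (2.0,), (3.0,)]

dists = pair_distortions(X, fX)
print(dists, lq_distortion(dists, 2), rem(dists, 2))
```

Because dist takes the maximum of expansion and contraction, an expanded pair and a contracted pair are penalized symmetrically, and a perfectly preserved pair contributes 1 to the $\ell_q$-distortion and 0 to REM.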

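Measuring the Quality of Embedding: in practice
Additive Distortion Measures
[MDS: optimally embed a given finite $X$ into a $k$-dim Euclidean space, for a given $k$]

For a pair $u \neq v \in X$: $d_{uv} = d_X(u,v)$, $\hat{d}_{uv} = d_Y(f(u), f(v))$

$$\mathrm{Stress}_q(f) = \left( \frac{E_\Pi\left[ |d_{uv} - \hat{d}_{uv}|^q \right]}{E_\Pi\left[ (d_{uv})^q \right]} \right)^{1/q} \qquad \mathrm{Stress}^*_q(f) = \left( \frac{E_\Pi\left[ |d_{uv} - \hat{d}_{uv}|^q \right]}{E_\Pi\left[ (\hat{d}_{uv})^q \right]} \right)^{1/q}$$

$$\mathrm{Energy}_q(f) = \left( E_\Pi\left[ \left( \frac{|\hat{d}_{uv} - d_{uv}|}{d_{uv}} \right)^q \right] \right)^{1/q} \qquad \mathrm{REM}_q(f) = \left( E_\Pi\left[ \left( \frac{|\hat{d}_{uv} - d_{uv}|}{\min\{d_{uv}, \hat{d}_{uv}\}} \right)^q \right] \right)^{1/q}$$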
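The four additive measures differ only in how the error $|d_{uv} - \hat{d}_{uv}|$ is normalized. A minimal sketch, assuming Euclidean distances, the uniform distribution over pairs for $\Pi$, and an embedding that does not collapse any pair (otherwise REM divides by zero); the function name and data are hypothetical.

```python
import math
from itertools import combinations

def additive_measures(points, images, q=2):
    """Stress_q, Stress*_q, Energy_q and REM_q under the uniform distribution over pairs."""
    pairs = list(combinations(range(len(points)), 2))
    d = [math.dist(points[i], points[j]) for i, j in pairs]      # original distances
    d_hat = [math.dist(images[i], images[j]) for i, j in pairs]  # embedded distances

    def mean(values):
        return sum(values) / len(pairs)

    err_q = mean([abs(a - b) ** q for a, b in zip(d, d_hat)])
    return {
        # error normalized by the q-th moment of the original distances
        "stress": (err_q / mean([a ** q for a in d])) ** (1.0 / q),
        # error normalized by the q-th moment of the embedded distances
        "stress_star": (err_q / mean([b ** q for b in d_hat])) ** (1.0 / q),
        # per-pair error relative to the original distance
        "energy": mean([(abs(a - b) / a) ** q for a, b in zip(d, d_hat)]) ** (1.0 / q),
        # per-pair error relative to the smaller of the two distances
        "rem": mean([(abs(a - b) / min(a, b)) ** q for a, b in zip(d, d_hat)]) ** (1.0 / q),
    }

# Hypothetical data: three points on a line and a non-isometric image.
X = [(0.0,), (1.0,), (3.0,)]
fX = [(0.0,), (2.0,), (2.5,)]
print(additive_measures(X, fX, q=2))
```

On this data the strongly contracted pair dominates REM but barely affects Stress, which illustrates why the measures can rank the same embedding very differently.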

Measuring the Quality of Embedding: in practice
$\sigma$-distortion (ML motivated, in [VvL18])

$$\sigma\text{-dist}^{(\Pi)}_{q,r}(f) = \left( E_\Pi\left[ \left| \frac{\mathrm{expans}_f(u,v)}{\ell_r^{(U)}\text{-expans}(f)} - 1 \right|^q \right] \right)^{1/q}$$

• $\ell_r^{(U)}\text{-expans}(f) = E_U\left[ (\mathrm{expans}_f(u,v))^r \right]$
• $\ell_r^{(U)}\text{-contr}(f) = E_U\left[ (\mathrm{contr}_f(u,v))^r \right]$

"Necessary properties for ML applications":
• translation invariance
• scale invariance
• monotonicity
• robustness (outliers, noise)
• incorporation of probability

➢ Many heuristics for optimizing these measures
➢ Almost nothing is known in terms of rigorous analysis
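A minimal sketch of $\sigma$-distortion for the special case $r = 1$, where the normalizer is simply the mean expansion. It assumes Euclidean distances and that both $\Pi$ and $U$ are the uniform distribution over pairs; the function name and data are illustrative.

```python
import math
from itertools import combinations

def sigma_distortion(points, images, q=2):
    """sigma-distortion with r = 1: expansions are normalized by their mean."""
    expans = [
        math.dist(images[i], images[j]) / math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    ]
    mean_expans = sum(expans) / len(expans)  # normalizer for r = 1
    moments = [abs(e / mean_expans - 1.0) ** q for e in expans]
    return (sum(moments) / len(moments)) ** (1.0 / q)

# Hypothetical data: three points on a line, an embedding, and a rescaled copy of it.
X = [(0.0,), (1.0,), (3.0,)]
fX = [(0.0,), (2.0,), (3.0,)]
fX_scaled = [(10.0 * y[0],) for y in fX]

print(sigma_distortion(X, fX), sigma_distortion(X, fX_scaled))
```

The two printed values agree up to rounding: rescaling the embedding rescales every expansion and the normalizer by the same factor, which is the scale invariance listed among the desired properties.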

Measuring the Quality of Embedding: in between
Bridging the gap between theory and practice: outlook

$\alpha(k,q)$-Dimension Reduction
Given a dimension bound $k \geq 1$ and $q \geq 1$, what is the least $\alpha(k,q)$ such that every finite subset of Euclidean space embeds into $k$ dim. with $\mathrm{Measure}_q \leq \alpha(k,q)$?

General Metrics (MDS)
For a given finite $X$ and $k \geq 1$, compute the optimal embedding of $X$ into $k$-dim Euclidean space, minimizing a particular $\mathrm{Measure}_q$.
[CD06] Optimizing is NP-hard for $\mathrm{Stress}_q$ and $k = 1$.

General Metrics: Approximating the Optimal Embedding
For a given finite $X$ and for $k \geq 1$, compute an embedding of $X$ into $k$-dim Euclidean space that approximates the best possible embedding, for a given $\mathrm{Measure}_q$.

Our Results: upper bounds
previous results

$\alpha(k,q)$-Dimension Reduction
Given a dimension bound $k \geq 1$ and $q \geq 1$, what is the least $\alpha(k,q)$ such that every finite subset of Euclidean space embeds into $k$ dim. with $\mathrm{Measure}_q \leq \alpha(k,q)$?

Previous results: worst case distortion
[JL84] Every $n$-point $X \in \ell_2^d$ embeds into $\ell_2^k$ with distortion $O\left( n^{2/k} \sqrt{(\log n)/k} \right)$.
[Mat90] There is $V \in \ell_2^{k+1}$ such that any $f: V \to \ell_2^k$ must have distortion $n^{\Omega(1/k)}$.

▪ $\mathrm{distortion}(f) \leq \left( \ell_\infty\text{-dist}(f) \right)^2$
▪ For every $f: X \to Y$ (with $Y$ scalable) there is $g: X \to Y$ with $\ell_\infty\text{-dist}(g) = \sqrt{\mathrm{distortion}(f)}$

What about the $\mathrm{Measure}_q$ guarantees for $q < \infty$?
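The relation between worst-case distortion and $\ell_\infty$-dist follows from a short scaling argument. A sketch of the derivation, writing $E$ and $C$ (names introduced here only for illustration) for the largest expansion and the largest contraction of $f$, and assuming the target space is scalable so that $g = c \cdot f$ is a valid embedding for every $c > 0$:

```latex
\begin{align*}
\mathrm{distortion}(f) &= E \cdot C \;\le\; \max\{E, C\}^2 = \left(\ell_\infty\text{-dist}(f)\right)^2, \\
\max_{u \neq v} \mathrm{expans}_g(u,v) &= cE, \qquad
\max_{u \neq v} \mathrm{contr}_g(u,v) = C/c, \\
\ell_\infty\text{-dist}(g) &= \max\{\, cE,\; C/c \,\}, \\
c = \sqrt{C/E} \;\Longrightarrow\; \ell_\infty\text{-dist}(g) &= \sqrt{EC} = \sqrt{\mathrm{distortion}(f)}.
\end{align*}
```

The choice $c = \sqrt{C/E}$ balances the two terms of the maximum, so no other scaling of $f$ gives a smaller $\ell_\infty$-dist.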

Our Results: upper bounds
JL transform: IM implementation

The answer to the $\alpha(k,q)$-Dim. Reduction question is, essentially, the JL transform.

[JL84] Projection onto a random subspace of dim. $k = O(\log n / \epsilon^2)$: with const. prob., $\mathrm{dist}(f) = 1 + \epsilon$ [tight, LN16].
[IM98] $T$ is a matrix of size $k \times d$ with indep. entries sampled from $N(0,1)$. The embedding $f: X \to \ell_2^k$ is $f(x) = \frac{1}{\sqrt{k}} \cdot T(x)$.

• The JL transform of IM98 provides constant upper bounds for all $\mathrm{Measure}_q$. The bounds are almost optimal.
• Other popular implementations of JL do not work for $\ell_q$-dist and for $\mathrm{REM}_q$.
• PCA may produce an embedding of extremely poor quality for all the measures (this does not happen with the JL transform).
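The Gaussian construction is short enough to write out. A minimal sketch in plain Python, assuming points are given as equal-length sequences of floats; the function name `jl_transform` and the `seed` parameter are illustrative choices, not part of the cited work.

```python
import math
import random

def jl_transform(points, k, seed=0):
    """Map points in R^d to R^k via f(x) = (1/sqrt(k)) * T x,
    where T is a k x d matrix with i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    d = len(points[0])
    T = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [
        tuple(scale * sum(t * x for t, x in zip(row, p)) for row in T)
        for p in points
    ]

# Hypothetical data: a few random points in R^20, embedded into R^5.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(6)]
Y = jl_transform(X, k=5)

# Ratio of embedded to original distance for one pair; it fluctuates around 1.
print(math.dist(Y[0], Y[1]) / math.dist(X[0], X[1]))
```

The $1/\sqrt{k}$ factor makes the expected squared length of $f(x)$ equal to the squared length of $x$; for small $k$ individual pairs can still be far from 1, which is where average-case measures become the relevant yardstick.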

Our Results: upper bounds
other implementations of JL

[Achl03] The entries of $T$ are uniform indep. from $\{\pm 1\}$.
[DKS10, KN10, AL10] Sparse/Fast: particular distr. from $\{\pm 1, 0\}$.

Constant bounds cannot be achieved using the above implementations.

Observation. If a linear transformation $T: \mathbb{R}^d \to \mathbb{R}^k$ samples its entries from a discrete set of values of size $s < d^{1/k}$, then applying it to the standard basis of $\mathbb{R}^d$ results in $\ell_q$-dist, $\mathrm{REM}_q = \infty$.

▪ $\ell_q\text{-dist}^{(\Pi)}(f) = \left( E_\Pi\left[ (\mathrm{dist}_f(u,v))^q \right] \right)^{1/q}$, $\quad \mathrm{REM}_q^{(\Pi)}(f) = \left( E_\Pi\left[ |\mathrm{dist}_f(u,v) - 1|^q \right] \right)^{1/q}$
▪ $\mathrm{dist}_f(u,v) = \max(\mathrm{expans}_f(u,v), \mathrm{contr}_f(u,v))$

➢ $T(\{e_1, \dots, e_d\}) = \{\text{columns of } T\}$. The number of different columns is at most $s^k < d$, so two distinct basis vectors are mapped to the same point and their contraction is infinite.
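The observation is a pigeonhole argument, which a few lines of code make concrete. A minimal sketch using $\pm 1$ entries ($s = 2$), with $k = 3$ and $d = 2^3 + 1 = 9$ chosen only for illustration; the helper names are hypothetical.

```python
import random

def sign_matrix(k, d, seed=0):
    """k x d matrix with independent uniform entries from {-1, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(k)]

def has_collision(T):
    """True if two distinct standard basis vectors have the same image under T,
    i.e. if two columns of T coincide."""
    columns = list(zip(*T))  # T e_i is the i-th column of T
    return len(set(columns)) < len(columns)

k, s = 3, 2
d = s ** k + 1                 # more basis vectors than possible columns
T = sign_matrix(k, d)
print(has_collision(T))        # True for every seed, by pigeonhole
```

Since the scaling factor applied after $T$ does not separate identical columns, the collapsed pair has zero embedded distance regardless of normalization, so any measure built on $\mathrm{dist}_f$ is infinite whenever $\Pi$ gives that pair positive probability.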

Our Results: upper bounds
limitation of PCA

PCA/c-MDS: for a given finite $X \in \ell_2^d$ and a given integer $k \geq 1$, computes the best rank-$k$ approx. to $X$: a projection $P$ onto the $k$-dim subspace spanned by the eigenvectors of the covariance matrix with the largest eigenvalues, with the smallest $\sum_{u \in X} \| u - P(u) \|^2$.

▪ $f: X \to \ell_2^k$ with optimal $\sum_{u \neq v \in X} \left( d_{uv}^2 - \hat{d}_{uv}^2 \right)$ over all projections
▪ Often misused: "minimizing $\mathrm{Stress}_2$ over all embeddings into $k$-dim"
▪ Actually, PCA does not minimize any of the mentioned measures
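A toy illustration of how PCA can collapse a pair of points. This is a made-up four-point example, not one taken from the talk, and it assumes NumPy is available; `pca_project` is a hypothetical helper that implements PCA through the SVD of the centered data.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Two points far apart along the x-axis dominate the variance;
# the other two differ only along the y-axis.
X = np.array([[10.0, 0.0], [-10.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
Y = pca_project(X, k=1)

original = np.linalg.norm(X[2] - X[3])   # distance 2 in the plane
collapsed = np.linalg.norm(Y[2] - Y[3])  # numerically zero after projection
print(original, collapsed)
```

The projection keeps the high-variance direction and maps the last two points to the same location, so their contraction is unbounded and measures such as $\ell_q$-dist and $\mathrm{REM}_q$ blow up, even though the sum of squared projection errors is as small as possible.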


