Introduction to Sparsity in Modeling and Learning


SLIDE 1

Introduction to Sparsity in Modeling and Learning

SLIDE 2


Introduction to Sparsity in Modeling and Learning

The Curse of Dimensionality
Ockham's Razor
Notions of Simplicity
Conclusion

SLIDE 3


The Curse of Dimensionality

SLIDE 4

The Curse of Dimensionality

High-dimensionality is (can be) a mess.

SLIDE 5


What is this Curse Anyway?

A definition:

Various phenomena that arise when analyzing and organizing data in high-dimensional spaces.

Term coined by Richard E. Bellman

(1920–1984: dynamic programming, differential equations, shortest path)

What is (not) the cause?

not an intrinsic property of the data
depends on the representation
depends on how data is analyzed

SLIDE 6

Combinatorial Explosion

Suppose

you have d entities, each of which can be in 2 states

Then

there are 2^d combinations to consider/test/evaluate

Happens when considering

all possible subsets of a set (2^d)
all permutations of a list (d!)
all assignments of d entities to labels (k^d, with k labels)

[Figure: the lattice of all subsets of {a, b, c, d}, from { } up to {a, b, c, d}]
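To make the growth concrete, here is a minimal Python sketch (my own illustration, not part of the slides); the choice of k = 3 labels and the particular values of d are arbitrary:

```python
# A minimal sketch (not from the slides): how fast the three counts above grow.
from math import factorial

def explosion_counts(d, k=3):
    """Counts for d entities: subsets, orderings, and assignments to k labels."""
    return 2 ** d, factorial(d), k ** d

for d in (5, 10, 20, 30):
    subsets, orderings, labelings = explosion_counts(d)
    print(f"d={d:2d}  2^d={subsets:,}  d!={orderings:.2e}  k^d (k=3)={labelings:.2e}")
```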

SLIDE 7


Regular Space Coverage

Analogous to combinatorial explosion, in continuous spaces
Happens when considering

histograms
density estimation
anomaly detection
...
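A hedged illustration of why regular space coverage breaks down (my own sketch; the 10 bins per axis and 10,000 samples are arbitrary choices): the fraction of histogram cells that receive even a single sample collapses as d grows.

```python
# Sketch: fraction of histogram cells reached by n uniform samples in [0, 1]^d.
import numpy as np

rng = np.random.default_rng(0)
n, bins = 10_000, 10          # 10,000 samples, 10 bins per axis (arbitrary choices)

for d in (1, 2, 3, 6):
    cells = bins ** d                        # total number of histogram cells
    x = rng.random((n, d))                   # n uniform points in [0, 1]^d
    idx = np.floor(x * bins).astype(int)     # bin index of each point along each axis
    occupied = len({tuple(row) for row in idx})
    print(f"d={d}: {cells:.0e} cells, {occupied / cells:.1%} occupied by {n} samples")
```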

SLIDE 8


In Modeling and Learning

The world is complicated

a state with a huge number of variables (dimensions)
possibly noisy observations
e.g., a 1-megapixel color image has 3 million dimensions

Hughes phenomenon, 1968 paper (which is wrong, it seems)

given a (small) number of training samples, additional feature measurements may reduce the performance of a statistical classifier

Learning would need observations for each possible state

it would require too many examples
need for an “interpolation” procedure, to avoid overfitting
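A rough numerical illustration of this effect (my own sketch, not the 1968 experiment; the data-generating process and the 1-nearest-neighbour classifier are arbitrary choices): with a fixed, small training set, padding one informative feature with more and more pure-noise features degrades test accuracy.

```python
# Sketch: a fixed small training set + more and more noise features -> accuracy drops.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, d_noise):
    """One informative feature (class mean 0 or 1) plus d_noise pure-noise features."""
    y = rng.integers(0, 2, n)
    informative = y[:, None] + 0.5 * rng.standard_normal((n, 1))
    noise = rng.standard_normal((n, d_noise))
    return np.hstack([informative, noise]), y

def nn_accuracy(d_noise, n_train=20, n_test=500):
    Xtr, ytr = make_data(n_train, d_noise)
    Xte, yte = make_data(n_test, d_noise)
    # 1-nearest-neighbour prediction, brute-force Euclidean distances
    dists = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    return (ytr[dists.argmin(axis=1)] == yte).mean()

for d_noise in (0, 5, 20, 100):
    print(f"{d_noise:3d} noise dimensions -> test accuracy {nn_accuracy(d_noise):.2f}")
```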

SLIDE 9


A Focus on Distances/Volumes

Considering a d-dimensional space
About volumes

volume of the cube: C_d(r) = (2r)^d
volume of a sphere with radius r: S_d(r) = π^(d/2) / Γ(d/2 + 1) · r^d
(Γ is the continuous generalization of the factorial)

ratio: S_d(r) / C_d(r) → 0 as d → ∞ (linked to space coverage)
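These formulas are easy to check numerically; here is a minimal sketch (my own, not from the slides) using Python's math.gamma for Γ:

```python
# Sketch: the sphere-to-cube volume ratio S_d(r) / C_d(r) for growing d (r = 1).
from math import pi, gamma

def sphere_over_cube(d, r=1.0):
    sphere = pi ** (d / 2) / gamma(d / 2 + 1) * r ** d   # S_d(r)
    cube = (2 * r) ** d                                  # C_d(r)
    return sphere / cube

for d in (1, 2, 3, 10, 20):
    print(f"d={d:2d}  S_d / C_d = {sphere_over_cube(d):.2e}")
```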

SLIDE 10


A Focus on Distances/Volumes (cont'd)

About distances

average (Euclidean) distance between two random points?
everything becomes almost equally “far”

Happens when considering

radial distributions (multivariate normal, etc.)
k-nearest neighbors (hubness problem)
other distance-based algorithms
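A quick Monte-Carlo sketch of this concentration effect (my own illustration; the number of points and the uniform distribution on [0, 1]^d are arbitrary choices): as d grows, the spread of pairwise distances becomes tiny relative to their mean.

```python
# Sketch: pairwise Euclidean distances between random points concentrate as d grows.
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    x = rng.random((100, d))                        # 100 uniform points in [0, 1]^d
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    dists = dists[np.triu_indices(100, k=1)]        # keep each distinct pair once
    print(f"d={d:4d}  mean distance {dists.mean():.2f}  "
          f"relative spread {dists.std() / dists.mean():.3f}")
```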
SLIDE 11

The Curse of Dimensionality

Many things degenerate in high dimensions
It is a problem of: approach + data representation
We have to hope that there is no curse

SLIDE 12


Introduction to Sparsity in Modeling and Learning

The Curse of Dimensionality
Ockham's Razor
Notions of Simplicity
Conclusion

SLIDE 13

Ockham's Razor

Shave unnecessary assumptions.

SLIDE 14


Ockham's Razor

Term from 1852, in reference to William of Ockham (14th century)
lex parsimoniae, the law of parsimony
Prefer the simplest hypothesis that fits the data.
Formulations by Ockham, but also earlier and later
More a concept than a rule

simplicity
parsimony
elegance
shortness of explanation
shortness of program (Kolmogorov complexity)
falsifiability (scientific method)

According to Jürgen Schmidhuber, the appropriate mathematical theory of Occam's razor already exists, namely, Solomonoff's theory of optimal inductive inference.

SLIDE 15


Notions of Simplicity

SLIDE 16


Simplicity of Data: subspaces

Data might be high-dimensional, but we have hope

that there is an organization or regularity in the high-dimensional data
that we can guess it
or, that we can learn/find it

Approaches: dimensionality reduction, manifold learning

PCA, kPCA, *PCA, SOM, Isomap, GPLVM, LLE, NMF, …
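As a small sketch of the subspace idea (my own example; scikit-learn's PCA is used as one representative of the techniques listed above, and the latent/mixing construction is made up for illustration): data embedded in 50 dimensions but generated from a 3-dimensional latent space is captured by a handful of principal components.

```python
# Sketch: 50-dimensional data that actually lives near a 3-dimensional subspace.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 3))                        # true 3-D structure
mixing = rng.standard_normal((3, 50))                         # linear embedding into 50-D
X = latent @ mixing + 0.01 * rng.standard_normal((500, 50))   # plus a little noise

pca = PCA(n_components=5).fit(X)
# The first 3 components carry essentially all of the variance; the rest is noise.
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))
```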

SLIDE 17


Simplicity of Data: compressibility

Idea

data can be high-dimensional but compressible
i.e., there exists a compact representation

Program that generates the data (Kolmogorov complexity)
Sparse representations

wavelets (JPEG 2000), Fourier transform
sparse coding, representation learning

Minimum description length

size of the “code” + size of the encoded data
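A minimal sketch of the compressibility idea (my own example, using the Fourier transform mentioned above; the test signal and the number of kept coefficients are arbitrary): a 1,000-sample signal is reconstructed almost exactly from its 10 largest Fourier coefficients.

```python
# Sketch: a signal that is dense in time but sparse in the Fourier basis.
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

coeffs = np.fft.rfft(signal)
k = 10                                           # keep only the 10 largest coefficients
keep = np.argsort(np.abs(coeffs))[-k:]
compressed = np.zeros_like(coeffs)
compressed[keep] = coeffs[keep]
reconstruction = np.fft.irfft(compressed, n=len(signal))

error = np.linalg.norm(signal - reconstruction) / np.linalg.norm(signal)
print(f"kept {k} of {len(coeffs)} coefficients, relative error = {error:.2e}")
```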

SLIDE 18


Simplicity of Models: information criteria

Used to select a model
Penalizes the fit by the number k of free parameters

AIC (Akaike Information Criterion)

penalizes the Negative-Log-Likelihood by k

BIC (Bayesian IC)

penalizes the NLL by (k/2) log(n) (for n observations)

BPIC (Bayesian Predictive IC), DIC (Deviance IC), FIC (Focused IC), Hannan-Quinn IC, TIC (Takeuchi IC)

Sparsity of the parameter vector (l0 norm)

penalizes the number of non-zero parameters
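A small worked example (my own sketch; it uses the standard conventions AIC = 2·NLL + 2k and BIC = 2·NLL + k·log n with a Gaussian noise model, and the quadratic test data is made up): fitting polynomials of increasing degree to noisy quadratic data, both criteria favour the true degree over the over-parameterized fits.

```python
# Sketch: AIC / BIC for polynomial models of increasing degree (Gaussian noise).
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.linspace(-1, 1, n)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.3 * rng.standard_normal(n)   # true degree is 2

for degree in (1, 2, 5, 10):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    sigma2 = residuals.var()                           # maximum-likelihood noise variance
    nll = 0.5 * n * (np.log(2 * np.pi * sigma2) + 1)   # Gaussian negative log-likelihood
    k = degree + 2                                     # polynomial coefficients + noise variance
    aic = 2 * nll + 2 * k
    bic = 2 * nll + k * np.log(n)
    print(f"degree {degree:2d}: AIC = {aic:6.1f}   BIC = {bic:6.1f}")
```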

SLIDE 19

Take-home Message

SLIDE 20

Thank You! Questions?