On the Computation of Distances between 2D-Histograms by Minimum Cost Flows (presentation)

On the Computation of Distances between 2D-Histograms by Minimum Cost Flows
Stefano Gualandi, Federico Bassetti and Marco Veneroni
Università di Pavia, Dipartimento di Matematica
email: stefano.gualandi@unipv.it, twitter: @famo2spaghi


Let $X = \{x_1, \dots, x_n\}$ and $Y = \{y_1, \dots, y_m\}$ be two discrete spaces. Let
- $\mu = (\mu(x_1), \dots, \mu(x_n))$ be a probability vector on $X$,
- $\nu = (\nu(y_1), \dots, \nu(y_m))$ be a probability vector on $Y$,
- $c : X \times Y \to \mathbb{R}_+$ be a cost function.

Definition 1 (Kantorovich-Rubinshtein functional (San15; LS17)). The Kantorovich-Rubinshtein functional in the discrete setting is the following LP problem (a special case of the Hitchcock problem):

$$
\begin{aligned}
W_c(\mu, \nu) = \min \quad & \sum_{x \in X} \sum_{y \in Y} c(x, y)\, \pi(x, y) \\
\text{s.t.} \quad & \sum_{y \in Y} \pi(x, y) = \mu(x) && \forall x \in X \\
& \sum_{x \in X} \pi(x, y) = \nu(y) && \forall y \in Y \\
& \pi(x, y) \ge 0.
\end{aligned}
$$
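To make the constraints of Definition 1 concrete, here is a minimal Python sketch on toy data that checks whether a matrix $\pi$ is feasible and evaluates the objective. The helper names `is_coupling` and `transport_cost` are hypothetical, not taken from the slides. Any feasible $\pi$ gives an upper bound on $W_c(\mu, \nu)$, and the independent coupling $\pi(x, y) = \mu(x)\nu(y)$ is always feasible.

```python
# Feasibility check and objective value for the LP of Definition 1.
# mu, nu, c and the couplings below are illustrative toy data.

def is_coupling(pi, mu, nu, tol=1e-12):
    """True if pi satisfies the marginal and nonnegativity constraints."""
    rows_ok = all(abs(sum(pi[x]) - mu[x]) <= tol for x in range(len(mu)))
    cols_ok = all(
        abs(sum(pi[x][y] for x in range(len(mu))) - nu[y]) <= tol
        for y in range(len(nu))
    )
    nonneg = all(v >= 0 for row in pi for v in row)
    return rows_ok and cols_ok and nonneg


def transport_cost(pi, c):
    """Objective value: sum over x, y of c(x, y) * pi(x, y)."""
    return sum(c[x][y] * pi[x][y] for x in range(len(pi)) for y in range(len(pi[x])))


mu = [0.5, 0.5]
nu = [0.25, 0.25, 0.5]
c = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0]]

# Independent coupling pi(x, y) = mu(x) * nu(y): always feasible,
# so its cost is an upper bound on W_c(mu, nu).
pi_indep = [[m * v for v in nu] for m in mu]
print(is_coupling(pi_indep, mu, nu), transport_cost(pi_indep, c))    # True 1.0

# A cheaper feasible coupling for the same data.
pi_better = [[0.25, 0.25, 0.0],
             [0.0, 0.0, 0.5]]
print(is_coupling(pi_better, mu, nu), transport_cost(pi_better, c))  # True 0.75
```

Solving the LP means searching over all such feasible $\pi$ for the one of least cost.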

Definition 2 (Wasserstein distance (San15)). When $X = Y$ and $c(x, y) = d^p(x, y)$, where $d$ is a ground distance on $X$, we define the Wasserstein distance of order $p$ as

$$W_p(\mu, \nu) := W_{d^p}(\mu, \nu)^{\min(1/p,\, 1)},$$

which is a distance on the simplex of probability vectors on $X$.

OUR CONTRIBUTION: for Wasserstein distances of order $p = 1$ with the following ground distances:
- $d_1(x, y) = \|x - y\|_1$ → exact method (easy)
- $d_2(x, y) = \|x - y\|_2$ → exact and approximation methods (tricky)
- $d_\infty(x, y) = \|x - y\|_\infty$ → exact method (easy)

All the methods rely on solving an uncapacitated min cost flow problem (but on different networks).
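A small sketch of the three ground distances and of the exponent $\min(1/p, 1)$ in Definition 2, assuming points are given as coordinate tuples. The names `ground_distance` and `wasserstein_from_cost` are illustrative only. For $p = 1$ the exponent is 1, so $W_1$ is simply the optimal LP value computed with cost $d$.

```python
import math


def ground_distance(x, y, h):
    """d_h(x, y) = ||x - y||_h for h = 1, 2 or math.inf."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if h == math.inf:
        return max(diffs)
    return sum(d ** h for d in diffs) ** (1.0 / h)


def wasserstein_from_cost(optimal_cost, p):
    """W_p = (W_{d^p})^{min(1/p, 1)}, given the LP optimum computed with cost d^p."""
    return optimal_cost ** min(1.0 / p, 1.0)


x, y = (0, 0), (3, 4)
print(ground_distance(x, y, 1), ground_distance(x, y, 2), ground_distance(x, y, math.inf))
# 7.0 5.0 4
```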

2D Histograms
2D histograms can be seen as discrete measures on a finite set of points in $\mathbb{R}^2$. To represent 2D histograms with $N \times N$ equally spaced bins, we take

$$X = L_N := \{\, i = (i_1, i_2) : i_1 = 0, \dots, N-1,\ i_2 = 0, \dots, N-1 \,\}.$$

One can think of each point $(i_1, i_2)$ as the center of a bin.
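One way to turn an $N \times N$ histogram into a probability vector on $L_N$, assuming the input is a nested list of nonnegative bin counts indexed as `counts[i1][i2]`. The row-major ordering and the helper names `lattice` and `to_probability_vector` are choices made for this sketch.

```python
def lattice(N):
    """L_N = {(i1, i2) : i1, i2 = 0, ..., N-1}, listed in row-major order."""
    return [(i1, i2) for i1 in range(N) for i2 in range(N)]


def to_probability_vector(counts):
    """Flatten an N x N grid of bin counts (row-major) and normalize it to sum 1."""
    flat = [float(v) for row in counts for v in row]
    total = sum(flat)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [v / total for v in flat]


counts = [[1, 1],
          [0, 2]]
print(list(zip(lattice(2), to_probability_vector(counts))))
# [((0, 0), 0.25), ((0, 1), 0.25), ((1, 0), 0.0), ((1, 1), 0.5)]
```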


Computing Wasserstein Distance of Order 1
Computing $W_1$ distances between 2D histograms with $n = N^2$ bins reduces to an uncapacitated min cost flow problem on a bipartite graph with $2n$ nodes and $n^2$ arcs, which can be solved in $O(n^3 \log n)$ time (Orl93; GTT89). The problem is closely related to the Earth Mover's Distance (RTG00; PW09; Cut13).

Ground distance: $d_2(i, j) = \sqrt{(i_1 - j_1)^2 + (i_2 - j_2)^2}$.
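The bipartite network described above can be built explicitly for small $N$. This sketch (the helper `bipartite_network` is hypothetical) creates one arc from every bin of $\mu$ to every bin of $\nu$ with cost $d_2$, including the zero-cost arcs between the two copies of the same bin. It only illustrates the $2n$ nodes and $n^2$ arcs; it is not an efficient solver.

```python
import math


def bipartite_network(N):
    """Bipartite transportation network for two N x N histograms.

    Returns (number of nodes, list of arcs (i, j, cost)), where arc (i, j)
    goes from bin i of mu to bin j of nu and costs d_2(i, j).
    """
    bins = [(i1, i2) for i1 in range(N) for i2 in range(N)]
    arcs = [
        (i, j, math.hypot(p[0] - q[0], p[1] - q[1]))
        for i, p in enumerate(bins)
        for j, q in enumerate(bins)
    ]
    return 2 * len(bins), arcs


nodes, arcs = bipartite_network(3)
print(nodes, len(arcs))  # 18 81
```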

$W_1$ distances: recomputation from the literature (SSG17)
Best results with the Network Simplex of the LEMON Graph Library v1.3.1; similar results with CPLEX 12.7 and Gurobi 7.0.

| Histogram size | 32x32 | 64x64 | 128x128 | 256x256 | 512x512 |
|---|---|---|---|---|---|
| Vertices | 2 048 | 8 192 | 32 768 | 131 072 | 524 288 |
| Arcs | 1 048 576 | 16 777 216 | 268 435 456 | 4 294 967 296 | 68 719 476 736 |
| Runtime | 0.4 s | 10.9 s | out-of-memory | out-of-memory | out-of-memory |
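The vertex and arc counts in the table follow from $2n$ and $n^2$ with $n = N^2$. The sketch below recomputes them. The memory figure it prints is a crude lower bound under an assumption made here (one 8-byte number per arc cost and nothing else), not a measurement of LEMON, CPLEX or Gurobi, but it shows how the dense formulation grows as $N^4$.

```python
def bipartite_size(N):
    """(vertices, arcs) of the bipartite network: (2n, n^2) with n = N^2."""
    n = N * N
    return 2 * n, n * n


BYTES_PER_ARC = 8  # assumption: one 8-byte double per arc cost, nothing else

for N in (32, 64, 128, 256, 512):
    vertices, arcs = bipartite_size(N)
    gib = arcs * BYTES_PER_ARC / 2**30
    print(f"{N}x{N}: vertices={vertices:,} arcs={arcs:,} arc costs >= {gib:g} GiB")
```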

Min Cost Flow Formulation on $K_n$
To compute $W_{d_h}$ distances between two probability vectors $\mu, \nu$ on $X$, we define the complete flow network $K_n = (V, A)$:
- Nodes: $V = L_N$
- Arcs: $A = \{(i, j) \mid i, j \in V,\ i \neq j\}$
- Arc costs: $c_{ij} = d_{ij} = \|i - j\|_h$, $\forall (i, j) \in A$
- Flow balance: $b_i = \mu(x_i) - \nu(y_i)$, $\forall i \in V$
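A small sketch of the $K_n$ formulation: it builds the complete network on $L_N$ with costs $\|i - j\|_h$ and balances $b_i = \mu_i - \nu_i$, then solves it with a textbook successive-shortest-path algorithm using Bellman-Ford. This is a swapped-in method, not the solver behind the slides' results (which report LEMON's network simplex). It assumes integer bin counts with equal totals so that every augmentation is integral, it is only practical for tiny grids, and the function name `w1_complete_network` is hypothetical.

```python
import math


def w1_complete_network(mu, nu, N, h, eps=1e-12):
    """W_1 between two N x N histograms via min cost flow on K_n.

    mu, nu: integer bin counts over L_N in row-major order, equal totals.
    h: 1, 2 or math.inf (ground distance ||i - j||_h).
    """
    V = [(i1, i2) for i1 in range(N) for i2 in range(N)]  # nodes V = L_N
    n = len(V)

    def dist_h(p, q):
        dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
        if h == 1:
            return float(dx + dy)
        if h == 2:
            return math.hypot(dx, dy)
        return float(max(dx, dy))

    c = [[dist_h(V[i], V[j]) for j in range(n)] for i in range(n)]
    b = [m - v for m, v in zip(mu, nu)]  # flow balance b_i = mu_i - nu_i
    flow = [[0] * n for _ in range(n)]   # flow on arc (i, j), i != j
    total = 0.0
    while any(e > 0 for e in b):
        s = next(i for i in range(n) if b[i] > 0)
        d = [math.inf] * n
        pred = [None] * n  # (previous node, True if forward arc)
        d[s] = 0.0
        for _ in range(n - 1):  # Bellman-Ford on the residual network
            for u in range(n):
                if d[u] == math.inf:
                    continue
                for v in range(n):
                    if u == v:
                        continue
                    # Arc (u, v) of K_n: uncapacitated, cost c[u][v].
                    if d[u] + c[u][v] < d[v] - eps:
                        d[v], pred[v] = d[u] + c[u][v], (u, True)
                    # Residual arc u -> v that cancels flow on (v, u).
                    if flow[v][u] > 0 and d[u] - c[v][u] < d[v] - eps:
                        d[v], pred[v] = d[u] - c[v][u], (u, False)
        t = min((i for i in range(n) if b[i] < 0), key=lambda i: d[i])
        # Bottleneck: remaining supply, remaining demand, cancellable flow.
        delta, v = min(b[s], -b[t]), t
        while v != s:
            u, forward = pred[v]
            if not forward:
                delta = min(delta, flow[v][u])
            v = u
        v = t
        while v != s:  # push delta units along the shortest path
            u, forward = pred[v]
            if forward:
                flow[u][v] += delta
            else:
                flow[v][u] -= delta
            v = u
        b[s] -= delta
        b[t] += delta
        total += delta * d[t]
    return total / sum(mu)


print(w1_complete_network([1, 0, 0, 0], [0, 0, 0, 1], 2, 1))  # 2.0
```

Because every pair of distinct bins is connected, this network still has $n(n-1)$ arcs; the sketch only shows how the costs $c_{ij}$ and the balances $b_i$ enter the flow problem.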
