SUGAR: Geometry Based Data Generation
O. Lindenbaum, J.S. Stanley, G. Wolf, S. Krishnaswamy
Yale University, 2018
Lindenbaum et al. (Yale) SUGAR 2018 1 / 14

Acknowledgements
This work was done in collaboration with: Jay Stanley, Guy Wolf, Smita Krishnaswamy.
Research partially funded by a grant from the CZI.

Introduction & motivation
Traditional models: density based data generation
Generative models typically infer distribution from collected data, and sample it to generate more data.
Biased by sampling density
May miss rare populations
Does not preserve the geometry

Introduction & motivation
New approach: geometry based data generation

Diffusion geometry
Manifold learning with random walks
Local affinities $g(x, y)$ ⇒ transition probs. $\Pr[x \rightsquigarrow y] = \frac{g(x, y)}{\|g(x, \cdot)\|_1}$
Markov chain/process ⇒ random walks on data manifold
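The normalisation on this slide can be sketched in a few lines of Python. This is only an illustration: the Gaussian form of $g$ with bandwidth `sigma`, the toy `points`, and the helper names `gaussian_affinities` and `markov_matrix` are assumptions, not taken from the slides.

```python
import math

def gaussian_affinities(points, sigma):
    """Assumed affinity: g(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return [[math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2)) for y in points]
            for x in points]

def markov_matrix(affinities):
    """Row-normalise so that Pr[x -> y] = g(x, y) / ||g(x, .)||_1."""
    return [[g / sum(row) for g in row] for row in affinities]

# Toy data (hypothetical): three nearby points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0)]
P = markov_matrix(gaussian_affinities(points, sigma=0.5))
```

Each row of `P` sums to one, and a step from the first point is far more likely to reach a nearby point than the distant one.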

Diffusion geometry
Random walks reveal intrinsic neighborhoods
$p_t(x, y) = \Pr[x \rightsquigarrow y \text{ in } t \text{ steps}]$
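For a finite data set, the t-step probabilities are the entries of the t-th power of the one-step transition matrix. A minimal sketch, assuming a hypothetical 3-state chain `P` and helper names `mat_mul` and `diffusion_power` chosen for illustration:

```python
def mat_mul(A, B):
    """Plain matrix product of two square lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def diffusion_power(P, t):
    """Return P^t, whose (x, y) entry is the probability of reaching y from x in t steps."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, P)
    return result

# Hypothetical chain on a path of three states: 0 - 1 - 2.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
P2 = diffusion_power(P, 2)
```

State 2 cannot be reached from state 0 in one step, but `P2[0][2]` is positive: longer walks connect points through their shared neighbours.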

Data generation with diffusion
Walk toward the data manifold from randomly generated points
Generate random points: (figure)
Walk towards the data manifold with diffusion: $x \mapsto \sum_{y \in \text{data}} y \cdot p_t(x, y)$
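The map on this slide replaces each random point by a probability-weighted average of data points. The sketch below is one possible reading, assuming a Gaussian affinity, a first step from the new point to the data followed by `t - 1` steps within the data, and toy data on a line segment. The names `affinity` and `walk_to_manifold` are hypothetical.

```python
import math
import random

def affinity(x, y, sigma):
    """Assumed Gaussian affinity between two points."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def walk_to_manifold(x, data, sigma, t=1):
    """Map x to sum_y y * p_t(x, y), with y ranging over the data points."""
    row = [affinity(x, y, sigma) for y in data]
    total = sum(row)
    row = [r / total for r in row]            # first step: new point -> data
    if t > 1:
        P = []
        for z in data:                        # data -> data transition matrix
            g = [affinity(z, y, sigma) for y in data]
            s = sum(g)
            P.append([v / s for v in g])
        for _ in range(t - 1):                # remaining t - 1 steps
            row = [sum(row[k] * P[k][j] for k in range(len(data)))
                   for j in range(len(data))]
    return tuple(sum(p * y[d] for p, y in zip(row, data)) for d in range(len(x)))

# Toy "manifold" (hypothetical): points on the segment from (0, 0) to (1, 0).
data = [(i / 10, 0.0) for i in range(11)]
random.seed(0)
starts = [(x + random.gauss(0, 0.3), y + random.gauss(0, 0.3)) for x, y in data]
walked = [walk_to_manifold(s, data, sigma=0.2, t=2) for s in starts]
```

Because every output is a convex combination of data points, the noisy starting points are pulled back onto the segment.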

Data generation with diffusion
Correct density with MGC kernel (Bermanis et al., ACHA 2016)
Separate density/geometry with new kernel: $k(x, y) = \sum_{r \in \text{data}} \frac{g(x, r)\, g(y, r)}{\text{density}(r)}$
Use new diffusion process $p(x, y) = \frac{k(x, y)}{\|k(x, \cdot)\|_1}$ to walk to the manifold
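A small sketch of how such a density-corrected kernel could be computed. The Gaussian form of $g$, the choice of density(r) as the row sum of $g$ over the data, and the names `gaussian`, `mgc_kernel` and `row_normalise` are assumptions made for illustration.

```python
import math

def gaussian(x, y, sigma):
    """Assumed Gaussian affinity g(x, y)."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def mgc_kernel(points, data, sigma):
    """k(x, y) = sum over r in data of g(x, r) * g(y, r) / density(r)."""
    # Assumption: density(r) is the total affinity of r to all data points.
    density = [sum(gaussian(r, s, sigma) for s in data) for r in data]
    G = [[gaussian(x, r, sigma) for r in data] for x in points]
    return [[sum(gx * gy / d for gx, gy, d in zip(row_x, row_y, density))
             for row_y in G] for row_x in G]

def row_normalise(K):
    """p(x, y) = k(x, y) / ||k(x, .)||_1."""
    return [[k / sum(row) for k in row] for row in K]

# Toy data (hypothetical): a dense cluster of four points and one isolated point.
data = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05), (0.05, 0.05), (1.0, 0.0)]
K = mgc_kernel(data, data, sigma=0.5)
P = row_normalise(K)
```

Dividing by density(r) down-weights reference points in crowded regions, so the resulting walk depends less on how densely each region was sampled.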

Data generation with diffusion
Fill sparse areas to create uniform distribution
Question: How should we initialize new points to end up with uniform sampling from the data manifold?
Answer: For each $x \in \text{data}$, initialize $\hat{\ell}(x)$ points sampled from $N(x, \Sigma_x)$; set $\hat{\ell}$ as the mid-point between the upper & lower bounds in the following proposition.
Proposition: The generation level $\hat{\ell}(x)$ required to equalize density is bounded by
$$\det\left(I + \frac{\Sigma_x}{2\sigma^2}\right)^{\frac{1}{2}} \frac{\max(\hat{d}(\cdot)) - \hat{d}(x)}{\hat{d}(x) + 1} - 1 \;\le\; \hat{\ell}(x) \;\le\; \det\left(I + \frac{\Sigma_x}{2\sigma^2}\right)^{\frac{1}{2}} \left[\max(\hat{d}(\cdot)) - \hat{d}(x)\right],$$
where $\sigma$ is a scale used when defining Gaussian neighborhoods $g(x, y)$ for the diffusion geometry, and $\hat{d}(x) = \|g(x, \cdot)\|_1$ estimates local density.
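The bounds in the proposition are simple to evaluate once the local density and covariance are known. The sketch below is illustrative only: the input values, the clipping of the mid-point at zero, and the helper names `det`, `generation_bounds` and `generation_level` are assumptions.

```python
import math

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, result = len(A), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return result

def generation_bounds(d_hat, d_max, cov, sigma):
    """Lower and upper bounds on the generation level for one data point."""
    n = len(cov)
    M = [[(1.0 if i == j else 0.0) + cov[i][j] / (2 * sigma ** 2)
          for j in range(n)] for i in range(n)]
    scale = math.sqrt(det(M))                 # det(I + Sigma_x / (2 sigma^2))^(1/2)
    lower = scale * (d_max - d_hat) / (d_hat + 1) - 1
    upper = scale * (d_max - d_hat)
    return lower, upper

def generation_level(d_hat, d_max, cov, sigma):
    """Mid-point of the bounds; clipping at zero is an added assumption."""
    lower, upper = generation_bounds(d_hat, d_max, cov, sigma)
    return max(0.0, (lower + upper) / 2)

# Hypothetical inputs: a sparse point (density 1) in a data set whose densest point has density 5.
cov = [[1.5, 0.0], [0.0, 1.5]]
lower, upper = generation_bounds(1.0, 5.0, cov, sigma=0.5)
level_sparse = generation_level(1.0, 5.0, cov, sigma=0.5)
level_dense = generation_level(5.0, 5.0, cov, sigma=0.5)
```

With these inputs the scale factor is 4, so the bounds are 7 and 16 and the mid-point is 11.5, while the densest point receives no new points.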

Applications & results
Alleviating class imbalance in classification

         k-NN                     SVM                      RUSBoost
         Orig    SMOTE   SUGAR    Orig    SMOTE   SUGAR
  ACP    0.67    0.76    0.78     0.77    0.77    0.78     0.75
  ACR    0.64    0.73    0.77     0.78    0.78    0.84     0.81
  MCC    0.66    0.74    0.78     0.78    0.78    0.84     0.80

Average class precision/recall (ACP/ACR), and Matthews correlation coefficient (MCC) over 61 imbalanced datasets (10-fold cross validation).
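For readers unfamiliar with these metrics, the sketch below shows one common way to compute them for a binary problem. The labels are made up for illustration, the function names are hypothetical, and the slides do not state how the metrics were computed for multi-class datasets.

```python
import math

def class_precision_recall(y_true, y_pred, label):
    """Precision and recall of a single class (0.0 when undefined)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    actual = sum(t == label for t in y_true)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

def average_class_precision_recall(y_true, y_pred):
    """ACP and ACR: unweighted means of per-class precision and recall."""
    labels = sorted(set(y_true))
    scores = [class_precision_recall(y_true, y_pred, c) for c in labels]
    acp = sum(p for p, _ in scores) / len(labels)
    acr = sum(r for _, r in scores) / len(labels)
    return acp, acr

def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for labels in {0, 1}."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Made-up imbalanced example: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
acp, acr = average_class_precision_recall(y_true, y_pred)
mcc = binary_mcc(y_true, y_pred)
```

Averaging per class, rather than per sample, keeps a small class from being swamped by a large one, which is why these metrics suit imbalanced data.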

Applications & results
Density correction improves clustering
(Figure panels: Spectral Clustering; Rand index of k-Means. Based on 115 datasets.)

Applications & results
Illuminate hypothetical cell types in single-cell data from Velten et al. 2017
Recovering originally-undersampled lineage in early hematopoiesis:
(Figure panels: B-cell maturation trajectory enhanced by SUGAR; SUGAR equalizes the total cell distribution.)

Applications & results
Recover gene-gene relationships in single-cell data from Velten et al. 2017
SUGAR improves module correlation and MI identified by Velten et al.
Velten et al., Nature Cell Biology, 19 (2017)

Applications & results
Recover gene-gene relationships in single-cell data from Velten et al. 2017
Generated cells also follow canonical marker correlations
Li et al., Nature Communications, 7 (2016)

Conclusion
Generate data over intrinsic geometry rather than distribution
Alleviate sampling bias in supervised & unsupervised learning
Enable exploration of sparse (or "hypothetical") data regions
