One-shot learning and big data with n = 2

Lee Dicker
Department of Statistics, Rutgers University
Joint work with Dean Foster

DIMACS, May 16, 2013
Introduction and overview
One-shot learning
Humans are able to correctly recognize and understand objects based on very few training examples, e.g. images or words.

[Figure: one training image labeled "Flamingo", followed by three test images, each asking "Flamingo?"]

Vast literature in cognitive science (Tenenbaum et al., 2006; Kemp et al., 2007), language acquisition (Carey et al., 1978; Xu et al., 2007), and computer vision (Fink, 2005; Fei-Fei et al., 2006).
One-shot learning
Successful one-shot learning requires the learner to incorporate strong contextual information into the learning algorithm.

- Image recognition: information about object categories. Objects tend to be categorized by shape, color, etc.
- Word-learning: common function words are often used in conjunction with a novel word and referent. "This is a KOBA." Since "this," "is," and "a" are function words that often appear with nouns, KOBA is likely the new referent.

Many recent statistical approaches to one-shot learning are based on hierarchical Bayesian models, which have been effective in a variety of examples.
One-shot learning
We propose a simple factor model for one-shot learning with continuous outcomes. It is highly idealized, but amenable to theoretical analysis.

We derive novel risk approximations for:
(i) assessing the performance of one-shot learning methods, and
(ii) gaining insight into the significance of various parameters for one-shot learning.

The methods considered here are variants of principal component regression (PCR).

One-shot asymptotic regime: fixed n, large d, strong contextual information. See work by Hall, Jung, Marron, and co-authors on "high dimension, low sample size" data (especially work on PCA and classification).

New insights into PCR:
- The classical PCR estimator is generally inconsistent in the one-shot regime.
- Bias-correction via expansion.
Outline
- Statistical setting
- Principal component regression
- Weak consistency and big data with n = 2
- Risk approximations and consistency
- Numerical results
- Conclusions and future directions
Statistical setting
The model
The observed data consist of (y_1, x_1), ..., (y_n, x_n), where y_i ∈ R is a scalar outcome and x_i ∈ R^d is an associated d-dimensional "context" vector.

We suppose that y_i and x_i are related via

    y_i = h_i θ + ξ_i,         h_i ∼ N(0, η²),  ξ_i ∼ N(0, σ²),
    x_i = h_i γ√d u + ε_i,     ε_i ∼ N(0, τ² I).

NB: h_i, ξ_i ∈ R and ε_i ∈ R^d, 1 ≤ i ≤ n, are all assumed to be independent. h_i is a latent factor linking y_i and x_i; ξ_i and ε_i are random noise.

The unit vector u ∈ R^d and real numbers θ, γ ∈ R are non-random. It is implicit in our normalization that the "x-signal" ||h_i γ√d u||² ≍ d is quite strong.

To simplify notation, we let y = (y_1, ..., y_n) and X = (x_1, ..., x_n)ᵀ.
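To make the data-generating process concrete, here is a minimal simulation sketch, assuming numpy; the function and variable names are ours, not from the talk (the parameter values match the simulation study later in the deck):

```python
import numpy as np

def simulate(n, d, theta, eta2, sigma2, gamma2, tau2, u, rng):
    """Draw (y, X) from y_i = h_i*theta + xi_i, x_i = h_i*gamma*sqrt(d)*u + eps_i."""
    h = rng.normal(0.0, np.sqrt(eta2), size=n)           # latent factors h_i ~ N(0, eta^2)
    xi = rng.normal(0.0, np.sqrt(sigma2), size=n)        # y-noise xi_i ~ N(0, sigma^2)
    eps = rng.normal(0.0, np.sqrt(tau2), size=(n, d))    # x-noise eps_i ~ N(0, tau^2 I)
    y = h * theta + xi                                   # y = (y_1, ..., y_n)
    X = np.outer(h, np.sqrt(gamma2 * d) * u) + eps       # rows of X are the x_i
    return y, X

rng = np.random.default_rng(0)
d = 500
u = np.zeros(d); u[0] = 1.0                              # a unit vector u in R^d
y, X = simulate(n=2, d=d, theta=4.0, eta2=4.0, sigma2=0.1,
                gamma2=0.25, tau2=1.0, u=u, rng=rng)
```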
Predictive risk
Observe that (y_i, x_i) ∼ N(0, V) are jointly normal with

    V = ( θ²η² + σ²      θγη²√d uᵀ       )
        ( θγη²√d u       τ²I + η²γ²d uuᵀ )        (†)

Goal: Given the data (y, X), devise prediction rules ŷ : R^d → R so that the risk

    R_V(ŷ) = E_V{ŷ(x_new) − y_new}² = E_V{ŷ(x_new) − h_new θ}² + σ²

is small, where (y_new, x_new) = (h_new θ + ξ_new, h_new γ√d u + ε_new) has the same distribution as (y_i, x_i) and is independent of (y, X).

R_V(ŷ) is a measure of predictive risk, which is completely determined by ŷ and the parameter matrix V, given in (†).
One-shot asymptotic regime
We are primarily interested in identifying methods ŷ that perform well in the one-shot asymptotic regime.

Key features of the one-shot asymptotic regime:
(i)   n is fixed                  (small n, large d)
(ii)  d → ∞
(iii) σ² → 0                      (abundant contextual information)
(iv)  inf η²γ²/τ² > 0

NB: σ² is the noise level for the "y-data"; η²γ²/τ² is the signal-to-noise ratio for the "x-data."
Principal component regression
Linear prediction rules
By assumption, the data are multivariate normal. Thus,

    E_V(y_i | x_i) = x_iᵀ β,    where β = θγη²√d u / (τ² + η²γ²d).

This suggests studying linear prediction rules of the form ŷ(x) = xᵀβ̂ for some estimator β̂ of β.
Principal component regression
Let l_1 ≥ ⋯ ≥ l_{n∧d} ≥ 0 denote the n∧d largest eigenvalues of XᵀX, in decreasing order, and let û_1, ..., û_{n∧d} denote corresponding unit-length eigenvectors; û_1, ..., û_{n∧d} are the principal components of X.

Let U_k = (û_1 ⋯ û_k) be the d × k matrix with columns û_1, ..., û_k, for 1 ≤ k ≤ n∧d. In its most basic form, principal component regression involves regressing y on XU_k for some (typically small) k, and taking

    β̂ = U_k (U_kᵀ XᵀX U_k)⁻¹ U_kᵀ Xᵀ y.

In the problem considered here, Cov(x_i) = τ²I + η²γ²d uuᵀ has a single eigenvalue larger than τ², and the corresponding eigenvector is parallel to β. Thus, it is natural to take k = 1 and consider the principal component regression (PCR) estimator

    β̂_pcr = (û_1ᵀ Xᵀ y / û_1ᵀ XᵀX û_1) û_1 = (1/l_1) û_1ᵀ Xᵀ y û_1.
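As a concrete illustration, here is a minimal numpy sketch of the k = 1 PCR estimator; the function name and the use of the SVD (rather than an explicit eigendecomposition of XᵀX) are our own choices:

```python
import numpy as np

def pcr_estimator(X, y):
    """k = 1 PCR: beta_hat_pcr = (1/l_1) * (u_1' X' y) * u_1 (illustrative sketch)."""
    # The SVD of X gives the eigenpairs of X'X: l_j = s_j^2, eigenvectors = rows of Vt.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    l1 = s[0] ** 2          # largest eigenvalue of X'X
    u1 = Vt[0]              # corresponding unit-length eigenvector (first principal component)
    return (u1 @ X.T @ y / l1) * u1
```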
Weak consistency and big data with n = 2
PCR with n = 2
As a warm-up for the general-n setting, we consider the special case where n = 2.

When n = 2, the PCR estimator β̂_pcr has an especially simple form, because the largest eigenvalue of XᵀX and its corresponding eigenvector are given explicitly by

    l_1 = (1/2) [ ||x_1||² + ||x_2||² + √( (||x_1||² − ||x_2||²)² + 4(x_1ᵀx_2)² ) ],

    û_1 ∝ ((l_1 − ||x_2||²) / (x_1ᵀx_2)) x_1 + x_2.

(A quick numerical check of these formulas appears in the sketch below.) Recall that x_i = h_i γ√d u + ε_i. Using the large-d approximations

    ||x_i||² ≈ h_i²γ²d + τ²d,    x_1ᵀx_2 ≈ h_1h_2 γ²d

leads to...
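A minimal sanity check of the n = 2 closed form, assuming numpy; the random test matrix is purely illustrative (the identity holds for any 2 × d matrix):

```python
import numpy as np

# Verify the n = 2 closed form for l_1 and u_1_hat against a direct SVD of X.
rng = np.random.default_rng(1)
X = rng.normal(size=(2, 500))
x1, x2 = X

l1 = 0.5 * (x1 @ x1 + x2 @ x2
            + np.sqrt((x1 @ x1 - x2 @ x2) ** 2 + 4 * (x1 @ x2) ** 2))
u1 = (l1 - x2 @ x2) / (x1 @ x2) * x1 + x2      # unnormalized top eigenvector of X'X
u1 /= np.linalg.norm(u1)

_, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.isclose(l1, s[0] ** 2)               # same largest eigenvalue of X'X
assert np.isclose(abs(u1 @ Vt[0]), 1.0)        # same eigenvector, up to sign
```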
Inconsistency and PCR
Large-d approximation:

    ŷ_pcr(x_new) = x_newᵀ β̂_pcr ≈ [γ²(h_1² + h_2²) / (γ²(h_1² + h_2²) + τ²)] h_new θ + e_pcr,

where e_pcr = o_P(1) as d → ∞ and σ² → 0.

Thus,

    ŷ_pcr(x_new) − y_new ≈ −[τ² / (γ²(h_1² + h_2²) + τ²)] h_new θ + e_pcr − ξ_new
                         → −[τ² / (γ²(h_1² + h_2²) + τ²)] h_new θ
                         ≠ 0,

as d → ∞ and σ² → 0. In other words, ŷ_pcr is inconsistent in the one-shot regime.
Bias-corrected PCR
To obtain a consistent method, we multiply the PCR estimator β̂_pcr by

    l_1 / (l_1 − l_2) ≈ [γ²(h_1² + h_2²) + τ²] / [γ²(h_1² + h_2²)] > 1.

The bias-corrected estimator is

    β̂_bc = (l_1 / (l_1 − l_2)) β̂_pcr = (1 / (l_1 − l_2)) û_1ᵀ Xᵀ y û_1.

When d is large and σ² is small,

    ŷ_bc(x_new) − y_new ≈ ([γ²(h_1² + h_2²) + τ²] / [γ²(h_1² + h_2²)]) e_pcr + ξ_new = o_P(1).

It follows that |ŷ_bc(x_new) − y_new| → 0 in probability; that is, ŷ_bc is weakly consistent.

On the other hand, R_V(ŷ_bc) = ∞, because E_V(h_1² + h_2²)⁻¹ = ∞. To obtain finite risk, we must take n a little bit larger.
Risk approximations and consistency
Bias-corrected PCR
When n = 2, we found that β̂_pcr is inconsistent in the one-shot regime; to remedy this, we introduced the bias-corrected PCR estimator.

A similar phenomenon occurs for arbitrary fixed n ≥ 2. For d ≥ n ≥ 2, define the bias-corrected PCR estimator

    β̂_bc = (l_1 / (l_1 − l_n)) β̂_pcr = (1 / (l_1 − l_n)) û_1ᵀ Xᵀ y û_1.

Note that

    ||β̂_bc|| = (l_1 / (l_1 − l_n)) ||β̂_pcr|| ≥ ||β̂_pcr||;

β̂_bc is obtained from β̂_pcr by expansion.
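Continuing the earlier sketch, a minimal numpy version of the general-n bias-corrected estimator (again, the function name is ours; this assumes n ≤ d, so that the SVD returns all n eigenvalues of XᵀX):

```python
import numpy as np

def bc_pcr_estimator(X, y):
    """Bias-corrected PCR: beta_hat_bc = (1/(l_1 - l_n)) * (u_1' X' y) * u_1 (illustrative)."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    l1, ln = s[0] ** 2, s[-1] ** 2    # largest and smallest of the n eigenvalues of X'X
    u1 = Vt[0]                        # first principal component
    return (u1 @ X.T @ y / (l1 - ln)) * u1
```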
Risk approximations
If n = 2, then R_V(ŷ_bc) = ∞ (inverse moments of χ² random variables). When n is larger, there are "enough" degrees of freedom and R_V(ŷ_bc) is finite.

Theorem. Suppose that η²γ²/τ² > c for some constant c > 0.

(a) If n ≥ 9 and d ≥ 1, then

    R_V(ŷ_pcr) = σ² + θ²η² (η²γ²d / (η²γ²d + τ²))² E_V[ ((uᵀû_1)² − 1)² ] + (smaller terms).

(b) If d ≥ n ≥ 9, then

    R_V(ŷ_bc) = σ² + θ²η² (η²γ²d / (η²γ²d + τ²))² E_V[ ((l_1/(l_1 − l_n)) (uᵀû_1)² − 1)² ] + (smaller terms).
Risk approximations
Proposition. Let W_n ∼ χ²_n be a chi-squared random variable with n degrees of freedom. If n ≥ 9 is fixed, d → ∞, and η²γ²/τ² > c for some constant c > 0, then

    E_V[ ((uᵀû_1)² − 1)² ] → E[ (τ² / (η²γ²W_n + τ²))² ],
    E_V[ ((l_1/(l_1 − l_n)) (uᵀû_1)² − 1)² ] → 0.

Corollary. If n ≥ 9 is fixed, then

    R_V(ŷ_pcr) → θ²η² E[ (τ² / (η²γ²W_n + τ²))² ],    R_V(ŷ_bc) → 0

in the one-shot regime, where d → ∞, σ² → 0, and inf η²γ²/τ² > 0. In particular, ŷ_pcr is inconsistent, but ŷ_bc is consistent.
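The limiting PCR risk in the corollary is straightforward to approximate by Monte Carlo. A minimal sketch, assuming numpy; the function name, parametrization, and draw count are our own choices:

```python
import numpy as np

def limiting_pcr_risk(n, theta2, eta2, gamma2, tau2, n_draws=1_000_000, seed=0):
    """Monte Carlo estimate of theta^2 * eta^2 * E[(tau^2 / (eta^2*gamma^2*W_n + tau^2))^2]."""
    rng = np.random.default_rng(seed)
    W = rng.chisquare(n, size=n_draws)    # W_n ~ chi^2 with n degrees of freedom
    return theta2 * eta2 * np.mean((tau2 / (eta2 * gamma2 * W + tau2)) ** 2)

# With the parameters of the simulation study below (theta = 4, eta^2 = 4, gamma^2 = 1/4, tau^2 = 1):
print(limiting_pcr_risk(n=9, theta2=16.0, eta2=4.0, gamma2=0.25, tau2=1.0))
```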
Numerical results
We conducted a simulation study to compare the performance of ŷ_pcr and ŷ_bc.

We fixed θ = 4, σ² = 1/10, η² = 4, γ² = 1/4, τ² = 1, and u = (1, 0, ..., 0) ∈ R^d. NB: σ² = 1/10 is fairly small; η²γ²/τ² = 1 is reasonably large.

We simulated 1000 independent datasets for various d and n, and computed:
- Empirical prediction error.
- Theoretical prediction error (as given by the leading terms in our risk approximations).
- Relative error, [(Empirical PE) − (Theoretical PE)] / (Empirical PE) × 100%.

(A sketch of one cell of the study appears below.)
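A self-contained numpy sketch of one (d, n) cell of the study, written from the model and estimator definitions above; this is our own illustrative code, not the authors' implementation:

```python
import numpy as np

theta, sigma2, eta2, gamma2, tau2 = 4.0, 0.1, 4.0, 0.25, 1.0
d, n, n_reps = 500, 9, 1000
u = np.zeros(d); u[0] = 1.0
rng = np.random.default_rng(0)

se_pcr, se_bc = [], []
for _ in range(n_reps):
    # Draw n training points and one test point from the factor model.
    h = rng.normal(0.0, np.sqrt(eta2), n + 1)
    y = h * theta + rng.normal(0.0, np.sqrt(sigma2), n + 1)
    X = np.outer(h, np.sqrt(gamma2 * d) * u) + rng.normal(0.0, np.sqrt(tau2), (n + 1, d))
    Xtr, xnew, ytr, ynew = X[:n], X[n], y[:n], y[n]

    # PCR and bias-corrected PCR fits on the training data.
    _, s, Vt = np.linalg.svd(Xtr, full_matrices=False)
    l1, ln, u1 = s[0] ** 2, s[-1] ** 2, Vt[0]
    b_pcr = (u1 @ Xtr.T @ ytr / l1) * u1
    b_bc = (u1 @ Xtr.T @ ytr / (l1 - ln)) * u1

    se_pcr.append((xnew @ b_pcr - ynew) ** 2)
    se_bc.append((xnew @ b_bc - ynew) ** 2)

print("empirical PE, PCR:", np.mean(se_pcr))
print("empirical PE, bias-corrected PCR:", np.mean(se_bc))
```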
d = 500

  n     Estimator             Empirical PE   Theoretical PE (Rel. Err.)
  2     PCR                   17.9710        ? (?)
  2     Bias-corrected PCR     4.6898        ∞ (∞)
  4     PCR                    7.0684        ? (?)
  4     Bias-corrected PCR     1.0616        ? (?)
  9     PCR                    1.4555        1.3959 (4.10%)
  9     Bias-corrected PCR     0.3565        0.2175 (38.98%)
  20    PCR                    0.4485        0.4330 (3.45%)
  20    Bias-corrected PCR     0.2737        0.1399 (48.89%)

d = 5000

  n     Estimator             Empirical PE   Theoretical PE (Rel. Err.)
  2     PCR                   18.1134        ? (?)
  2     Bias-corrected PCR     1.7101        ∞ (∞)
  4     PCR                    6.0708        ? (?)
  4     Bias-corrected PCR     0.2378        ? (?)
  9     PCR                    1.3257        1.2737 (3.92%)
  9     Bias-corrected PCR     0.1395        0.1306 (6.40%)
  20    PCR                    0.3229        0.3127 (3.17%)
  20    Bias-corrected PCR     0.1237        0.1115 (9.84%)
Conclusions and future directions
Conclusions:
- We've proposed a simple factor model and a relevant asymptotic regime for one-shot learning with continuous outcomes.
- Identified consistent methods.
- Gained new insights into PCR: bias-correction via expansion may lead to improved performance.

Future directions:
- Classification: flexible classification methods based on probit/latent variable models and the techniques discussed here.
- Sparsity: sparsity is a major topic in high-dimensional data analysis. How does sparsity fit into one-shot learning? If u is sparse, then effective one-shot learning may be possible with a smaller x-data signal-to-noise ratio.