


SLIDE 1

Noisy-input classification of Fermi-LAT unidentified point-like sources

Bryan Zaldivar (IFT/UAM, Madrid), with Javier Coronado-Blázquez, Viviana Gammaldi, Miguel A. Sánchez-Conde

Machine Learning Group

work in progress with: Carlos Villacampa-Calvo, Eduardo Garrido Merchán, Daniel Hernández-Lobato

SLIDE 2

Motivation and Data origin

The 4FGL Fermi catalog contains circa 5000 point-like sources, out of which ~1500 are unidentified (unID). Can we classify those unIDs à la supervised learning?

[Figure: spectrum of a particular blazar, with a log-parabola fit]

Available data contains 3 known classes (pulsars, quasars, blazars) and 4 features, among them the improvement in fitting a log-parabola vs. a power-law, and the significance of detection. What if some of the unIDs are better classified as dark matter? Include the dark matter class into the β-plane!


SLIDE 3

Data visualization

credits to Javier Coronado-Blázquez

[Plot: feature space, with the dark matter class highlighted]


SLIDE 4

Data visualization

[Scatter plot: identified vs. unidentified sources; point size encodes the feature value]

  • unIDs seem to be distributed similarly to the IDs
  • error bars on the inputs are partially correlated with the input values themselves

credits to Javier Coronado-Blázquez


SLIDE 5

Machine Learning procedure, Step #1

SLIDE 6

Standard classification without input uncertainties

Warm-up: we want to know what the simplest approach can give us. Considered classifiers:

  • Naive Bayes
  • Logistic regression
  • Random Forest

Work in progress, but conceptually trivial... (see the sketch below)
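As an illustration of this warm-up step (not the authors' actual pipeline), the three classifiers can be run with scikit-learn; the feature and label arrays below are random placeholders standing in for the catalog data.

```python
# Warm-up sketch: three standard classifiers on the catalog features,
# ignoring the error bars entirely. X and y are random placeholders for
# the 4 features and 3 known classes (pulsar/quasar/blazar) described above.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))        # placeholder: 4 catalog features
y = rng.integers(0, 3, size=1000)     # placeholder: 3 known classes

for clf in (GaussianNB(),
            LogisticRegression(max_iter=1000),
            RandomForestClassifier(n_estimators=200)):
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{type(clf).__name__}: mean CV accuracy = {scores.mean():.3f}")
```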

SLIDE 7

Next steps:

  • search for an out-of-the-box classifier dealing with noisy inputs
  • search for a paper addressing classification with noisy inputs
  • call your ML-expert colleagues, ask them for references!
  • do it ourselves!!
SLIDE 8

Machine Learning procedure, Step #2: incorporating input uncertainties

SLIDE 9

Bayesian classification with parametric models

In the Bayesian approach, we build the predictive distribution for a new point x_*, marginalizing over the model parameters.

Parametric models assume a specific form for the likelihood of the data (the softmax function, whose negative log is the cross-entropy) and a specific form for the function f(x, θ) (e.g. a neural network); y_i denotes the one-hot encoding of the class of point i.
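In standard notation (a reconstruction sketch; the slide's own symbols were lost in extraction), the predictive distribution and the softmax likelihood read:

```latex
% Predictive distribution for a new point x_*, marginalizing the parameters
p(y_* \mid x_*, \mathcal{D}) = \int p\big(y_* \mid f(x_*, \theta)\big)\, p(\theta \mid \mathcal{D})\, \mathrm{d}\theta
% Softmax likelihood; its negative log over the data is the cross-entropy
p(y = k \mid f(x, \theta)) = \frac{\exp f_k(x, \theta)}{\sum_{k'} \exp f_{k'}(x, \theta)}
```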


SLIDE 10

Gaussian Process

The GP approach is non-parametric: there is no predefined form for the function f. Instead you have a Gaussian distribution over functions (in the case of regression).

  • Rasmussen & Williams, 2006
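A minimal numpy sketch of "a Gaussian distribution over functions": samples drawn from a zero-mean GP prior with an RBF kernel (the grid and hyperparameters are arbitrary illustration choices).

```python
# Draw sample functions from a zero-mean GP prior with an RBF kernel.
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

x = np.linspace(-5.0, 5.0, 200)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))   # jitter for numerical stability
rng = np.random.default_rng(0)
f_samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)  # 3 draws
```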


SLIDE 11

Classification with noisy input using Gaussian Processes

  • As usual: introduce one output latent variable per point i and per class k
  • NEW: introduce one input latent variable per point i

The predictive distribution for a class at a test point is intractable: the non-Gaussian likelihood makes the posterior non-Gaussian, so we resort to Variational Inference; since this is costly, we use a Sparse GP. The joint model factorizes into the usual GP term and a new input-noise term, as sketched below.
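Schematically (a reconstruction under the stated assumptions, since the slide's equation was lost): the observed input x̃_i is treated as a noisy realization of the latent input x_i, and the joint augments the usual GP term with an input-noise term.

```latex
% Input-noise term: observed inputs are noisy realizations of latent ones,
% with \Sigma_i built from the catalog error bars
p(\tilde{x}_i \mid x_i) = \mathcal{N}(\tilde{x}_i \mid x_i, \Sigma_i)
% Joint: "usual term" (GP likelihood and prior) times the "new term"
p(\mathbf{y}, \mathbf{f}, X, \tilde{X}) =
  \underbrace{\prod_i p(y_i \mid \mathbf{f}_i)\; p(\mathbf{f} \mid X)}_{\text{usual term}}
  \;\underbrace{\prod_i p(\tilde{x}_i \mid x_i)\, p(x_i)}_{\text{new term}}
```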

SLIDE 12

Sparse Gaussian Process

If the inducing values were sufficient statistics for the latent function values, we would be left with an equivalent but cheaper inference problem.

  • here inspired by Titsias (2009)

Exact inference involves inverting an N×N matrix, at cost O(N³). The idea is to make inference on a smaller set of M function points, which approximately represent the entire posterior over the N function points. Cost: O(NM²).
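A numpy sketch of the inducing-point idea (a Nyström-style illustration of the cost saving, not the variational construction itself): M ≪ N inducing inputs summarize the N×N kernel, replacing the O(N³) inverse with O(NM²) work.

```python
# Nystrom-style illustration of the sparse-GP cost saving: the N x N
# kernel is approximated through M << N inducing inputs Z.
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

N, M = 2000, 50
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-5.0, 5.0, N))         # training inputs
Z = np.linspace(-5.0, 5.0, M)                  # inducing inputs

Kmm = rbf_kernel(Z, Z) + 1e-8 * np.eye(M)      # M x M: cheap to handle
Knm = rbf_kernel(X, Z)                         # N x M
K_approx = Knm @ np.linalg.solve(Kmm, Knm.T)   # approximates the N x N kernel
```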

SLIDE 13

Variational Inference

The idea is to approximate the exact posterior distribution by an easier one (e.g. a Gaussian), according to the variational principle: minimize the Kullback-Leibler divergence between the approximation and the exact posterior w.r.t. the variational parameters.

  • Jordan, Ghahramani, Jaakkola & Saul, 1999
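In formulas (standard variational inference, matching the cited reference rather than the lost slide equation): minimizing the KL divergence to the exact posterior is equivalent to maximizing a lower bound on the evidence.

```latex
% KL divergence between the approximation q and the exact posterior
\mathrm{KL}\big[q(f)\,\|\,p(f \mid y)\big]
  = \mathbb{E}_{q}\!\left[\log \frac{q(f)}{p(f \mid y)}\right] \;\ge\; 0
% Equivalent evidence lower bound (ELBO) that is maximized in practice
\log p(y) \;\ge\; \mathbb{E}_{q(f)}\big[\log p(y \mid f)\big]
  - \mathrm{KL}\big[q(f)\,\|\,p(f)\big]
```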

SLIDE 14

Likelihood of the model

Common form in parametric models: the "generalized Bernoulli" (categorical) likelihood, the −log of which is the cross-entropy; e.g. with 3 classes, a predicted probability vector might be (0.05, 0.80, 0.15).

Here, instead, the labelling rule is that the label at point i is the class whose latent function value is largest, and the likelihood for the label is an indicator of that rule (noiseless). "Misclassification noise" is included through the prior for the latent functions, and labelling noise (with probability ε) is also included.

  • D. Hernández-Lobato, J.M. Hernández-Lobato & P. Dupont, 2011
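A sketch of the likelihood in the spirit of that reference (reconstructed, since the slide's formula was lost): with probability ε the label escapes the argmax labelling rule and is assigned to one of the other C − 1 classes uniformly.

```latex
% Noiseless labelling rule: the label is the class with the largest latent value
p(y_i \mid \mathbf{f}_i) = \mathbb{I}\!\left[\, y_i = \arg\max_k f_i^k \,\right]
% With labelling noise \epsilon (C classes)
p(y_i \mid \mathbf{f}_i) = (1 - \epsilon)\, \mathbb{I}\!\left[\, y_i = \arg\max_k f_i^k \,\right]
  + \frac{\epsilon}{C - 1} \left(1 - \mathbb{I}\!\left[\, y_i = \arg\max_k f_i^k \,\right]\right)
```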


SLIDE 15

Results (Python + TensorFlow)

SLIDE 16

Results on toy data

Err. rate, mean over the synthetic datasets (spread in parentheses):

Input noise level | Noiseless model | Rasmussen-like | This work
0.10 | 0.113 (0.76) | 0.109 (0.321) | 0.108 (0.259)
0.25 | 0.164 (1.54) | 0.158 (0.53) | 0.158 (0.37)
0.50 | 0.218 (0.77) | 0.209 (0.50) | 0.210 (1.14)

  • Found no published model against which to compare!
  • we compare with a standard GP classifier without input noise
  • we modify an existing GP noise model for regression (McHutchon & Rasmussen, 2011)

We generate a set (~100) of synthetic datasets to evaluate average performance (a sketch of the protocol follows below).

[Plot: example of a synthetic dataset]
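A sketch of that evaluation protocol; the generator here (three Gaussian blobs, the noise levels, and the sizes) is a hypothetical stand-in for the actual toy data.

```python
# Evaluation-protocol sketch: ~100 synthetic multiclass datasets whose
# inputs are corrupted with Gaussian noise of a chosen level; each model's
# test error rate is then averaged over the datasets.
import numpy as np

def make_dataset(rng, n=300, noise_level=0.1):
    centers = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])   # 3 classes
    y = rng.integers(0, 3, size=n)
    x_true = centers[y] + rng.normal(scale=0.3, size=(n, 2))   # clean inputs
    x_noisy = x_true + rng.normal(scale=noise_level, size=(n, 2))
    return x_noisy, y

rng = np.random.default_rng(0)
datasets = [make_dataset(rng, noise_level=0.25) for _ in range(100)]
```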


SLIDE 17

Conclusions/work in progress

  • Unidentified point-like sources can be classified among predefined known classes (including the potential dark matter class)
  • Interestingly, including the dark matter class into the well-known β-plane for point-like sources results in reasonably good separability
  • The only non-straightforward issue with this problem: inputs come with their own error bars, surprisingly not yet explicitly addressed in the context of ML classification!
  • A warm-up classification exercise w/o error bars is being conducted
  • Error bars are incorporated in a Gaussian Process model for multiclass classification, by treating the input as a noisy realization of extra latent variables to be learned
  • Very satisfactory preliminary results with synthetic data
  • Time to apply it to real Fermi-LAT data!

Thank you!

SLIDE 18

Backup

SLIDE 19

Classification with error bars in the input (parametric approach)

Suppose you have data {(x̃_i, y_i)}, where y_i is the one-hot encoding of the class: e.g. y_i = (0, 1, 0) if point i is in class 2. The observed x̃_i are noisy samples from unknown means x_i; assume x̃_i ~ N(x_i, Σ_i). Then the (−log) joint likelihood of the data can be written as the cross-entropy evaluated at the latent means plus the Gaussian noise term, with f e.g. a linear model or a NN model.
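A minimal TensorFlow sketch of this parametric approach under the assumptions above (the network size, noise scale, and data are placeholders, not the slide's actual setup): the latent means are free variables, and the loss is the cross-entropy at the latent means plus the Gaussian term tying them to the observed inputs.

```python
# -log joint likelihood = cross-entropy of the classifier evaluated at the
# latent means x_latent + Gaussian penalty tying x_latent to the observed
# noisy inputs x_obs (per-point error bars sigma). All data are placeholders.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
x_obs = rng.normal(size=(500, 4)).astype("float32")   # observed noisy inputs
sigma = np.full_like(x_obs, 0.1)                      # error bars (placeholder)
y = rng.integers(0, 3, size=500)                      # class labels

x_latent = tf.Variable(x_obs)                         # unknown true means
model = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                             tf.keras.layers.Dense(3)])  # logits f(x)
ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
opt = tf.keras.optimizers.Adam(1e-2)

for step in range(500):
    with tf.GradientTape() as tape:
        loss = ce(y, model(x_latent))                 # classification term
        loss += 0.5 * tf.reduce_mean(                 # Gaussian noise term
            tf.reduce_sum(((x_obs - x_latent) / sigma) ** 2, axis=1))
    variables = model.trainable_variables + [x_latent]
    opt.apply_gradients(zip(tape.gradient(loss, variables), variables))
```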