
Slide 1

Some results on Imprecise discriminant analysis

11th Workshop on Principles and Methods of Statistical Inference with Interval Probability

CARRANZA-ALARCON Yonatan-Carlos

Ph.D. Candidate in Computer Science

DESTERCKE Sébastien

Ph.D. Supervisor

30 July 2018 to 01 August 2018

Slide 2

Classification · Imprecise Classification · Future work · Conclusions · References

Overview


  • Classification
    ❍ Decision Making
    ❍ Discriminant Analysis
  • Imprecise Classification
    ❍ Imprecise Decision
    ❍ Imprecise Linear Discriminant Analysis
  • Future work
  • Conclusions

11th Workshop on Principles and Methods of Statistical Inference with Interval Probability



Slide 5


Classification - Setting

A classic classification problem is composed of:

  • Training data D = {(x_i, y_i)}_{i=1}^N such that:
    ❍ (Input) x_i ∈ X are regressors or features (often X = R^p).
    ❍ (Output) y_i ∈ K is a categorical response variable, with K = {m_1, ..., m_K}.

Objective

Given training data D = {(x_i, y_i)}_{i=1}^N, we need to learn a classification rule φ : X → Y in order to predict the class of a new observation: φ(x*).
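As an illustration, a classification rule φ can be learned and applied as follows. This is a minimal sketch, not the method of the talk: the nearest-centroid rule and the toy data below are hypothetical placeholders for a generic φ : X → Y.

```python
import numpy as np

# Hypothetical toy data: two well-separated classes in R^2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(3.0, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

def fit_centroids(X, y):
    """Learn a rule phi: X -> Y summarised by one centroid per class."""
    return {k: X[y == k].mean(axis=0) for k in np.unique(y)}

def phi(x_new, centroids):
    """Predict the class whose centroid is nearest to x_new."""
    return min(centroids, key=lambda k: np.linalg.norm(x_new - centroids[k]))

centroids = fit_centroids(X, y)
print(phi(np.array([2.8, 3.1]), centroids))  # → 1
```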



Slide 9


Classification - Outline (Example)

Getting training data.

Learning a classification rule φ : X → Y.

Predicting the class of new instances:

  • y* := φ(x* | X, y)

But: how can we learn the "classification rule" (model) from training data?


Slide 10


Decision Making in Statistics

  • In statistics, the classification rule is often seen as a decision-making problem under the risk of misclassification:

    φ := argmin_{ϕ(X) ∈ K} E_{X×Y}[L(y, ϕ(X))]   (1)

  • Under the 0/1 loss function L, minimizing the risk R is equivalent to:

    φ(x* | X, y) := argmax_{m_k ∈ K} P(y = m_k | X = x*)   (2)

  • Where:
    1. The predicted class y* = φ(x* | X, y) is the most probable one (equation (2)).
    2. Equation (2) is also known as the Bayes classifier [1, pp. 21].
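Equation (2) is simply an argmax over the posterior class probabilities. A minimal sketch, where the posterior values are hypothetical placeholders:

```python
import numpy as np

# Bayes classifier of equation (2): predict the most probable class.
# The posterior probabilities below are hypothetical.
classes = ["m1", "m2", "m3"]
posterior = np.array([0.2, 0.5, 0.3])   # P(y = m_k | X = x*), sums to 1

y_star = classes[int(np.argmax(posterior))]
print(y_star)  # → m2
```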



Slide 12


Decision Making in Statistics

Definition (Preference ordering [5, pp. 47])
With a general loss L(·,·), m_a is preferred to m_b, denoted m_a ≻ m_b, if and only if:

  E_P[L(·, m_a) | x*] < E_P[L(·, m_b) | x*]

In the particular case where L(·,·) is the 0/1 loss function, we get:

  m_a ≻ m_b ⇔ P(y = m_a | X = x*) / P(y = m_b | X = x*) > 1

where P(y = m_a | X = x*) is the class probability. We then take the maximal element of the complete order ≻, i.e.

  m_{i_K} ≻ m_{i_{K−1}} ≻ ... ≻ m_{i_1} ⇔ P(y = m_{i_K} | x*) ≥ ... ≥ P(y = m_{i_1} | x*)


Slide 13


(Precise) Discriminant Analysis

Applying Bayes' rule to P(y = m_k | X = x*):

  P(y = m_k | X = x*) = P(X = x* | y = m_k) P(y = m_k) / Σ_{m_l ∈ K} P(X = x* | y = m_l) P(y = m_l)

where π_k := P(y = m_k), such that Σ_{j=1}^K π_j = 1, and G_k := P_{X|y=m_k} ∼ N(μ_k, Σ_k).

A frequentist point estimation:

  • π̂_k = n_k / N
  • μ̂_k = (1/n_k) Σ_{i=1}^{n_k} x_{i,k}
  • Σ̂_k = (1/(N − n_k)) Σ_{i=1}^{n_k} (x_{i,k} − x̄_k)(x_{i,k} − x̄_k)^t
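The class proportions and class means above are one-liners to compute. A small sketch on hypothetical data (only π̂_k and μ̂_k are shown):

```python
import numpy as np

# Frequentist point estimates for discriminant analysis; data hypothetical.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = np.array([0] * 10 + [1] * 20)
N = len(y)

pi_hat = {k: np.sum(y == k) / N for k in np.unique(y)}       # pi_k = n_k / N
mu_hat = {k: X[y == k].mean(axis=0) for k in np.unique(y)}   # class means
print(pi_hat[0], pi_hat[1])  # → 0.333... 0.666...
```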



Slide 15


Decision Making in Imprecise Probabilities

Definition (Partial ordering by the maximality criterion)
Let P be a set of probabilities; then m_a is preferred to m_b if exchanging m_b for m_a has a positive lower expected gain:

  m_a ≻_M m_b ⇔ inf_{P ∈ P} E_P[L(·, m_b) − L(·, m_a) | x*] > 0

If L(·,·) is the 0/1 loss function, this becomes:

  m_a ≻_M m_b ⇔ inf_{P ∈ P} P(y = m_a | X = x*) / P(y = m_b | X = x*) > 1
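Under 0/1 loss the maximality test only needs a lower bound on the posterior ratio. A sketch with hypothetical probability intervals, where the infimum of the ratio is approximated conservatively by lower(m_a)/upper(m_b):

```python
# Hypothetical lower/upper posterior probabilities for three classes.
intervals = {"ma": (0.35, 0.55), "mb": (0.05, 0.20), "mc": (0.30, 0.50)}

def dominates(a, b, intervals):
    """m_a >_M m_b when inf P(m_a|x*)/P(m_b|x*) > 1 (conservative bound)."""
    low_a = intervals[a][0]
    up_b = intervals[b][1]
    return low_a / up_b > 1

print(dominates("ma", "mb", intervals))  # → True  (0.35 / 0.20 > 1)
print(dominates("ma", "mc", intervals))  # → False (0.35 / 0.50 < 1)
```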


Slide 16


Decision Making in Imprecise Probabilities

Applying Bayes' theorem to P(y = m_a | X = x*):

  m_a ≻_M m_b ⇔ inf_{P_{X|y} ∈ P_1, P_y ∈ P_2} [P(x* | y = m_a) P(y = m_a)] / [P(x* | y = m_b) P(y = m_b)] > 1

The resulting set of cautious decisions is the set of undominated classes:

  Y_M = {m_a ∈ K | ∄ m_b : m_b ≻_M m_a}

For instance, if K = {m_a, m_b, m_c} with m_a ≻_M m_b, m_c ≻_M m_b, and m_a, m_c incomparable (m_a ⊁⊀_M m_c), then:

  • Y_M = {m_a, m_c}
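The cautious prediction Y_M collects every undominated class. A sketch with hypothetical probability intervals and a conservative dominance test (lower/upper ratio):

```python
# Hypothetical lower/upper posterior probabilities for K = {ma, mb, mc}.
intervals = {"ma": (0.35, 0.55), "mb": (0.05, 0.20), "mc": (0.30, 0.50)}

def dominates(a, b):
    """Conservative maximality test: lower(a) / upper(b) > 1."""
    return intervals[a][0] / intervals[b][1] > 1

# Keep every class that no other class dominates.
Y_M = [m for m in intervals
       if not any(dominates(other, m) for other in intervals if other != m)]
print(Y_M)  # → ['ma', 'mc'] : mb is dominated by both ma and mc
```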


Slide 17


Imprecise Linear Discriminant Analysis (ILDA)

Objective: make the mean parameter μ_k of each Gaussian distribution family G_k := P_{X|y=m_k} ∼ N(μ_k, Σ) imprecise.

Assumptions:

  • Covariances precisely estimated, with homoscedasticity, i.e. Σ_k = Σ:

    Σ̂ = (1/(N − K)) Σ_{k=1}^K Σ_{i=1}^{n_k} (x_{i,k} − x̄_k)(x_{i,k} − x̄_k)^t

  • Prior probabilities precisely estimated: π̂_k = n_k / N


Slide 18


Decision Making in ILDA

Taking the maximality criterion above together with these assumptions:

  m_a ≻_M m_b ⇔ inf_{P_{X|y} ∈ P_1, P_y ∈ P_2} [P(x* | y = m_a) P(y = m_a)] / [P(x* | y = m_b) P(y = m_b)] > 1   (3)
             ⇔ inf_{P_{X|y} ∈ P_1} [P(x* | y = m_a) π_a] / [P(x* | y = m_b) π_b] > 1

Given that the families G_k := P_{X|y=m_k} ∼ N(μ_k, Σ) are independent:

  ⇔ [inf_{P ∈ G_a} P(x* | y = m_a) · π_a] / [sup_{P ∈ G_b} P(x* | y = m_b) · π_b] > 1


Slide 19


Decision Making in ILDA (cont.)

The problem then reduces to two optimisation problems:

  P̲(x* | y = m_a) = inf_{P ∈ G_a} P(x* | y = m_a)   (4)
  P̄(x* | y = m_b) = sup_{P ∈ G_b} P(x* | y = m_b)   (5)

As P_{X|y=m_k} ∼ N(μ_k, Σ) with a common Σ, this amounts to optimising the Gaussian log-density over the mean:

  P̲(x* | y = m_a) ⇔ μ_a = arg inf_{P ∈ G_a} −(1/2)(x* − μ_a)^T Σ^{-1} (x* − μ_a)   (6)
  P̄(x* | y = m_b) ⇔ μ_b = arg sup_{P ∈ G_b} −(1/2)(x* − μ_b)^T Σ^{-1} (x* − μ_b)   (7)


Slide 20


Imprecise Linear Discriminant Analysis

Now, the question is: how can we make the unknown mean parameter μ_k imprecise?

  • Confidence intervals
  • Neighbourhoods around μ_k
  • P-boxes
  • Robust Bayesian inference
  • ...

We use robust Bayesian inference with conjugate distributions for exponential families.


Slide 21


Imprecise Linear Discriminant Analysis

Bayesian inference context
Classic Bayesian inference is based on two components:

  • the distribution of the observed data conditional on its unknown parameters (the likelihood);
  • the expert's belief information (the prior distribution).

These are combined to build posterior inferences on the unknown parameter, here μ_k:

  p(μ_k | X, y = m_k) ∝ p(X | μ_k, y = m_k) p(μ_k)   (8)

where p(μ_k) may belong to a set of prior distributions P_{μ_k}.


Slide 22


Imprecise Linear Discriminant Analysis

We propose to use a set of prior distributions based on the near-ignorance approach of [6, eq. 16]:

  M_0^μ = { p(μ | ℓ) ∝ exp(ℓ^T μ), μ ∈ R^d, ℓ = [ℓ_1, ..., ℓ_d]^T ∈ L }   (9)

where the hyper-parameter ℓ belongs to the convex space L:

  L = { ℓ ∈ R^d : ℓ_i ∈ [−c_i, c_i], c_i > 0, i = 1, ..., d }

Remark
M_0^μ satisfies the four minimal properties that a model of prior ignorance requires: invariance, near-ignorance, learning and convergence (see [6] for details).


Slide 23


Imprecise Linear Discriminant Analysis

Applying Bayes' rule (8) (or [6, eq. 17]), we get a set of posterior distributions:

  M_{n_k}^{μ_k} = { μ_k | x̄_{n_k}, ℓ ∼ N( (ℓ + n_k x̄_{n_k}) / n_k , Σ / n_k ) }   (10)

where x̄_{n_k} = (1/n_k) Σ_{i=1}^{n_k} x_{i,k} and ℓ ∈ L, and, componentwise with c = [c_1, ..., c_d]^T:

  inf_{M_{n_k}^{μ_k}} E[μ_k | x̄_{n_k}, ℓ] = (−c + n_k x̄_{n_k}) / n_k   (11)

  sup_{M_{n_k}^{μ_k}} E[μ_k | x̄_{n_k}, ℓ] = (c + n_k x̄_{n_k}) / n_k   (12)
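Equations (11)-(12) give componentwise bounds on the posterior mean. A small numeric sketch, where c, the sample mean and the class size are hypothetical:

```python
import numpy as np

# Bounds (11)-(12) on the posterior mean of mu_k, componentwise:
# E[mu_k] in [(-c + n_k * xbar) / n_k, (c + n_k * xbar) / n_k].
# c, xbar and n_k below are hypothetical.
c = np.array([1.0, 1.0])
xbar = np.array([2.0, -1.0])
n_k = 10

lower = (-c + n_k * xbar) / n_k
upper = (c + n_k * xbar) / n_k
print(lower, upper)  # → [ 1.9 -1.1] [ 2.1 -0.9]
```

Note how the interval shrinks around the sample mean as n_k grows: the prior imprecision c is washed out by the data, which is the "learning" property of the near-ignorance model.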


Slide 24


Imprecise Linear Discriminant Analysis

The two last estimations describe a convex set (a box) around μ_k:

  G_k = { μ_k ∈ R^d : μ_{i,k} ∈ [ (−c_i + n_k x̄_{i,n_k}) / n_k , (c_i + n_k x̄_{i,n_k}) / n_k ], ∀i = 1, ..., d }

which we use as the constraint in our two optimisation problems:

  P̲(x* | y = m_a) ⇔ μ_a = argmax_{μ_a ∈ G_a} (1/2) μ_a^T Σ^{-1} μ_a − x*^T Σ^{-1} μ_a   (NPQB)

  P̄(x* | y = m_b) ⇔ μ_b = argmin_{μ_b ∈ G_b} (1/2) μ_b^T Σ^{-1} μ_b − x*^T Σ^{-1} μ_b   (PQB)

The first problem is non-convex and is solved through a branch-and-bound (B&B) method.
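A small numeric sketch of the two box-constrained quadratic problems, with hypothetical Σ, x* and box. Since (NPQB) maximises a convex quadratic over a box, its optimum lies at a vertex, so in low dimension it can be brute-forced over the 2^d vertices (a stand-in for the B&B method, not the talk's implementation); for (PQB) we deliberately choose x* inside the box so that the unconstrained minimiser μ = x* is feasible:

```python
import itertools
import numpy as np

# Hypothetical covariance, observation and credal box G_k.
Sigma_inv = np.linalg.inv(np.array([[2.0, 0.5], [0.5, 1.0]]))
x_star = np.array([1.0, 1.0])
box = [(-0.5, 1.5), (0.5, 2.0)]          # componentwise bounds on mu

def q(mu):
    """Objective (1/2) mu^T S^-1 mu - x*^T S^-1 mu of (NPQB)/(PQB)."""
    return 0.5 * mu @ Sigma_inv @ mu - x_star @ Sigma_inv @ mu

# (NPQB): maximise the convex quadratic -> optimum at a vertex of the box.
vertices = [np.array(v) for v in itertools.product(*box)]
mu_a = max(vertices, key=q)

# (PQB): x* lies inside the box, so the convex minimum is attained at x*.
mu_b = x_star

print(q(mu_a) >= q(mu_b))  # → True (the maximum dominates the minimum)
```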


Slide 25


Example


Slide 26


Example (cont.)


Slide 27


Another example with 3 classes


Slide 28


Experiments

Average utility-discounted accuracy measure of [4]:

  u(y, Y_M) = 0 if y ∉ Y_M,  α/|Y_M| − β/|Y_M|² otherwise

where u65 uses (α, β) = (1.6, 0.6) and u80 uses (α, β) = (2.2, 1.2).

  #   Name    # Obs.   # Regr.   # Classes
  a   iris    150      4         3
  b   seeds   210      7         3
  c   glass   214      9         6

  #   LDA     ILDA u65   ILDA u80   Inference time
  a   0.961   0.969      0.975      0.56 sec.
  b   0.959   0.959      0.962      1.50 sec.
  c   0.594   0.589      0.642      8.66 sec.
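The utility-discounted accuracy can be sketched as follows; the discount is read as α/|Y| − β/|Y|², which gives u65 = 0.65 and u80 = 0.80 for a correct two-class prediction, consistent with the measures' names:

```python
# Utility-discounted accuracy u(y, Y_M) of [4] for a set-valued prediction.
def u(y, Y, alpha, beta):
    if y not in Y:
        return 0.0
    k = len(Y)
    return alpha / k - beta / k ** 2

print(round(u("ma", {"ma"}, 1.6, 0.6), 4))        # → 1.0  (correct precise prediction)
print(round(u("ma", {"ma", "mc"}, 1.6, 0.6), 4))  # → 0.65 (correct pair under u65)
print(round(u("ma", {"ma", "mc"}, 2.2, 1.2), 4))  # → 0.8  (correct pair under u80)
```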



Slide 30


Imprecise Quadratic Discriminant Analysis

(1) Release the homoscedasticity assumption, i.e. allow Σ_k ≠ Σ:

  μ_a = argmax (1/2) μ_a^T Σ_k^{-1} μ_a − x*^T Σ_k^{-1} μ_a
  s.t. (−c_j + n x̄_{j,n}) / n ≤ μ_{j,a} ≤ (c_j + n x̄_{j,n}) / n, ∀j = 1, ..., d   (NPQB)

(2) Make P(y = m_a) imprecise, i.e. P(y = m_a) ∈ [P̲(y = m_a), P̄(y = m_a)], and solve:

  inf_{P_{X|y} ∈ P_1, P_y ∈ P_2} [P(x* | y = m_a) P(y = m_a)] / [P(x* | y = m_b) P(y = m_b)] > 1


Slide 31


Imprecise Quadratic Discriminant Analysis

Convex space of matrices S_+^n
(2) Make the covariance matrix (i.e. Σ_k or Σ) imprecise by using a prior Wishart distribution:

  Σ̲_k = inf_{Ω ∈ S_+^n} E[Σ_k | X, y = m_k, τ_0, Ω]   (13)

  Σ̲_k = inf_{Ω ∈ S_+^n} (Ω + (n − 1) Σ_k^MLE) / (n + τ_0)   (14)

where Σ_k^MLE is the maximum likelihood estimator of the covariance matrix Σ_k and S_+^n is a convex space of positive semi-definite matrices.
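For one fixed prior scale Ω, the inner expectation in (14) is a simple shrinkage formula. A numeric sketch, where Σ_k^MLE, Ω, n and τ_0 are hypothetical:

```python
import numpy as np

# Posterior expectation of Sigma_k for a fixed prior scale Omega:
# E[Sigma_k | ...] = (Omega + (n - 1) * Sigma_MLE) / (n + tau0).
# Sigma_MLE, Omega, n and tau0 below are hypothetical.
Sigma_mle = np.array([[1.0, 0.2], [0.2, 0.5]])
Omega = np.eye(2)
n, tau0 = 50, 2.0

Sigma_hat = (Omega + (n - 1) * Sigma_mle) / (n + tau0)
print(Sigma_hat[0, 0])  # → (1 + 49) / 52 ≈ 0.9615...
```

Taking the infimum over a set of Ω then yields the lower covariance estimate of (13)-(14).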


Slide 32


Imprecise Quadratic Discriminant Analysis

Convex space of matrices S_+^n
In [2] we can find good intuitions for minimising this optimisation problem, where Φ_ε is a perturbation in the neighbourhood of the prior parameter value Ω_0, and ||·||_F is the Frobenius norm:

  argmin_{Ω_0 ∈ S_+^n}  Σ = (Ω_0 + (n − 1) Σ_e) / (n + τ_0)
  s.t.  Σ ⪯ X_i, ∀X_i ∈ S_+^n, i = 1, ..., m

  S_+^n = { Ω_0 : ||Ω_0 − Φ_ε||_F ≤ ||Ω_0||_F ≤ ||Ω_0 + Φ_ε||_F }


Slide 33


Imprecise Quadratic Discriminant Analysis

Convex space of eigenvalues or eigenvectors
(3) Make the eigenvalues and eigenvectors of Σ_k imprecise. We propose to use the Ω estimation of [3, §3], i.e. Ω = (tr(Σ_k^MLE)/d) I, and then apply the spectral decomposition Σ_k^MLE = Σ_{j=1}^d λ_j u_j u_j^t:

  (Ω + (n − 1) Σ_k^MLE) / (n + τ_0)
    = ( tr(Σ_{j=1}^d λ_j u_j u_j^t) / (d(n + τ_0)) ) I + ((n − 1)/(n + τ_0)) Σ_{j=1}^d λ_j u_j u_j^t   (15)
    = (1/(n + τ_0)) Σ_{j=1}^d λ_j [ (tr(u_j u_j^t)/d) I + (n − 1) u_j u_j^t ]   (16)
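The rewriting (15)-(16) can be checked numerically: with Ω = (tr(S)/d) I and the spectral decomposition S = Σ_j λ_j u_j u_j^t, both sides coincide. S, n and τ_0 below are hypothetical:

```python
import numpy as np

# Numerical check of the spectral rewriting (15)-(16); data hypothetical.
rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
S = A @ A.T                      # plays the role of Sigma_k^MLE (PSD)
n, tau0, d = 40, 1.0, 3

lam, U = np.linalg.eigh(S)       # S = sum_j lam_j u_j u_j^T
lhs = (np.trace(S) / d * np.eye(d) + (n - 1) * S) / (n + tau0)
rhs = sum(
    lam[j] * (np.trace(np.outer(U[:, j], U[:, j])) / d * np.eye(d)
              + (n - 1) * np.outer(U[:, j], U[:, j]))
    for j in range(d)
) / (n + tau0)
print(np.allclose(lhs, rhs))  # → True
```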


Slide 34


Imprecise Quadratic Discriminant Analysis

Convex space of eigenvalues or eigenvectors
In [3], it has been shown that eigenvalue estimates are either biased high (overestimated) or biased low (underestimated) for small and noisy samples. We can then assume that the variability of the directions (i.e. the eigenvectors) is "correctly" estimated, and optimise over the eigenvalues only:

  λ̂ = argmax_{λ ∈ S_+^n} Σ_{j=1}^d λ_j [ (tr(u_j u_j^t)/d) I + (n − 1) u_j u_j^t ]
  s.t. S_+^n = { Σ = Σ_{j=1}^d λ_j v_j v_j^t : Σ̲ ⪯ Σ ⪯ Σ̄ }



Slide 36


Conclusions

Imprecise Discriminant Analysis Classification

  • Increasing the imprecision on the estimators has allowed us to be more cautious in case of doubt and to improve the classification predictions [7].
  • More experiments with all the imprecise components are needed.
  • New imprecise statistical models should be built for sensitivity analysis and more cautious (robust) predictions.


Slide 37


References

[1] Jerome Friedman, Trevor Hastie and Robert Tibshirani. The Elements of Statistical Learning. Vol. 1. Springer Series in Statistics, New York, 2001.
[2] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[3] Santosh Srivastava. Bayesian Minimum Expected Risk Estimation of Distributions for Statistical Learning. University of Washington, 2007.
[4] Marco Zaffalon, Giorgio Corani and Denis Mauá. "Evaluating credal classifiers by utility-discounted predictive accuracy". International Journal of Approximate Reasoning 53.8 (2012), pp. 1282-1301.
[5] James O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer Science & Business Media, 2013.
[6] Alessio Benavoli and Marco Zaffalon. "Prior near ignorance for inferences in the k-parameter exponential family". Statistics 49.5 (2015), pp. 1104-1140.
[7] Yonatan-Carlos Carranza-Alarcon and Sébastien Destercke. "Analyse discriminante imprécise basée sur l'inférence bayésienne robuste" [Imprecise discriminant analysis based on robust Bayesian inference]. 27èmes rencontres francophones sur la logique floue et ses applications, 2018.
