Imprecise probability models for inference in exponential families - PowerPoint PPT Presentation



SLIDE 1

Imprecise probability models for inference in exponential families

Erik Quaeghebeur & Gert de Cooman SYSTeMS research group


SLIDES 2-6

Who we are & what we do

Erik Quaeghebeur, PhD student of Gert de Cooman, SYSTeMS research group, Ghent University, Belgium.

Current research interests:
- extreme lower probabilities;
- (partition) exchangeability;
- exponential families.
SLIDE 7

Socratic dialogue


SLIDE 8

Our poster: the technical details

Theory on the left... examples on the right. Let me guide you through. Ask me questions.

Imprecise probability models for inference in exponential families

Erik Quaeghebeur & Gert de Cooman

SYSTeMS Research Group, Department of Electrical Energy, Systems & Automation, Ghent University, Technologiepark 914, B-9052 Zwijnaarde, Belgium
{Erik.Quaeghebeur,Gert.deCooman}@UGent.be

Exponential families

An exponential family

Consider taking i.i.d. samples x (sample space X) of a random variable that is distributed according to an exponential family with probability function of the form

\[ \mathrm{Ef}(x \mid \psi) = a(x) \exp\bigl(\langle \psi, \tau(x) \rangle - b(\psi)\bigr), \]

with functions a : X → R⁺ and b : Ψ → R, canonical parameter ψ ∈ Ψ, and sufficient statistic τ : X → T.

The conjugate family

By looking at Ef(x | ·) as a likelihood function L_x : Ψ → R⁺, we can write down the probability density function of the corresponding family of conjugate distributions,

\[ \mathrm{CEf}(\psi \mid n, y) = c(n, y) \exp\bigl(n [\langle \psi, y \rangle - b(\psi)]\bigr), \]

with normalization factor c and two parameters that can be given specific interpretations: a (pseudo)count n ∈ R⁺ and an average sufficient statistic y ∈ Y = co(T).

The predictive family

The probability function of the corresponding family of predictive distributions can be derived by combining L_x and CEf(· | n, y):

\[ \mathrm{PEf}(x \mid n, y) = \int_\Psi \mathrm{CEf}(\cdot \mid n, y)\, L_x = \frac{c(n, y)\, a(x)}{c\bigl(n + 1, \frac{n y + \tau(x)}{n + 1}\bigr)}. \]

Example: multinomial sampling

In this case, the one-sample likelihood function is a multivariate Bernoulli Br(x | θ), the conjugate density function is a Dirichlet Di(θ | ny, ny₀), and the predictive mass function is a Dirichlet-multinomial DiMn(x | ny, ny₀), where

\[ x \in \{0, 1\}^d,\ \textstyle\sum_i x_i \le 1; \quad \theta \in (0, 1)^d,\ \textstyle\sum_i \theta_i < 1,\ \theta_0 = 1 - \textstyle\sum_i \theta_i; \quad \tau(x) = x; \quad \psi(\theta) = \bigl(\ln(\theta_i / \theta_0)\bigr)_{i=1}^d; \]
\[ y \in (0, 1)^d,\ \textstyle\sum_i y_i < 1,\ y_0 = 1 - \textstyle\sum_i y_i; \quad a = 1; \quad b(\psi(\theta)) = -\ln(\theta_0); \quad c(n, y) = \frac{\Gamma(n)}{\Gamma(n y_0) \prod_i \Gamma(n y_i)}. \]

Example: normal sampling

Now, the one-sample likelihood function is a Normal N(x | µ, σ), the conjugate density function is a Normal-gamma

\[ \mathrm{N}\bigl(\mu \mid y_1, n\lambda\bigr)\, \mathrm{Ga}\bigl(\lambda \mid \tfrac{n + 3}{2}, \tfrac{n (y_2 - y_1^2)}{2}\bigr), \]

and the predictive density function is a Student t

\[ \mathrm{St}\bigl(x \mid y_1, \tfrac{n + 3}{n + 1} \tfrac{1}{y_2 - y_1^2}, n + 3\bigr), \]

where

\[ x \in \mathbb{R}; \quad \mu \in \mathbb{R},\ \lambda \in \mathbb{R}^+,\ \sigma^2 = \tfrac{1}{\lambda}; \quad \tau(x) = (x, x^2); \quad \psi(\lambda, \mu) = \bigl(\lambda \mu, -\tfrac{1}{2}\lambda\bigr); \quad y \in \mathbb{R} \times \mathbb{R}^+,\ y_2 - y_1^2 > 0; \]
\[ a = \tfrac{1}{\sqrt{2\pi}}; \quad b(\psi(\mu, \lambda)) = \tfrac{\lambda \mu^2 - \ln(\lambda)}{2}; \quad c(n, y) = \frac{2 \sqrt{n}}{\sqrt{2\pi}}\, \frac{\bigl[\tfrac{n (y_2 - y_1^2)}{2}\bigr]^{\frac{n + 3}{2}}}{\Gamma\bigl(\tfrac{n + 3}{2}\bigr)}. \]
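As a quick numerical illustration of the predictive-family formula, the sketch below evaluates PEf for the multinomial example via the ratio of normalization factors. The function names `c` and `predictive` are ours, not from the poster. For a single observation, the Dirichlet-multinomial predictive probability of category i reduces to y_i (and to y₀ for the baseline outcome x = 0):

```python
from math import gamma

def c(n, y):
    """Normalization factor of the conjugate (Dirichlet) density:
    c(n, y) = Gamma(n) / (Gamma(n*y0) * prod_i Gamma(n*y_i)), y0 = 1 - sum(y)."""
    y0 = 1.0 - sum(y)
    denom = gamma(n * y0)
    for yi in y:
        denom *= gamma(n * yi)
    return gamma(n) / denom

def predictive(x, n, y):
    """PEf(x | n, y) = a(x) * c(n, y) / c(n+1, (n*y + tau(x)) / (n+1)),
    with a(x) = 1 and tau(x) = x for multinomial sampling."""
    y_next = [(n * yi + xi) / (n + 1) for yi, xi in zip(y, x)]
    return c(n, y) / c(n + 1, y_next)

# Predictive probabilities of the three possible outcomes (d = 2);
# they recover y_1, y_2, and y_0 respectively.
print(round(predictive([1, 0], 2.0, [0.3, 0.5]), 6))  # 0.3
print(round(predictive([0, 1], 2.0, [0.3, 0.5]), 6))  # 0.5
print(round(predictive([0, 0], 2.0, [0.3, 0.5]), 6))  # 0.2
```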

Inference models

The conjugate model

The conjugate model for inference in an exponential family is a lower prevision, defined as the lower envelope of a set of linear previsions that correspond to members of the conjugate family:

\[ \underline{P}_C(f \mid n_k, Y_k) = \inf_{y \in Y_k} P_C(f \mid n_k, y), \quad \text{where} \quad P_C(f \mid n_k, y) = \int_\Psi \mathrm{CEf}(\cdot \mid n_k, y)\, f, \quad f \in \mathcal{L}(\Psi). \]

Here, L(Ψ) is the set of all measurable gambles (bounded functions) on Ψ and Y_k is some subset of Y.

The predictive model

The predictive model for inference in an exponential family is defined similarly:

\[ \underline{P}_P(f \mid n_k, Y_k) = \inf_{y \in Y_k} P_P(f \mid n_k, y), \quad \text{where} \quad P_P(f \mid n_k, y) = \int_X \mathrm{PEf}(\cdot \mid n_k, y)\, f, \quad f \in \mathcal{L}(X). \]

Updating and imprecision

A prior choice of n₀ and a bounded subset Y₀ of Y must be made for the parameters of these models. When k samples are taken, with sufficient statistic τ_k ∈ T, these can be used to update the models (Bayes' rule) by obtaining posterior parameters

\[ n_k = n_0 + k, \qquad Y_k = \Bigl\{ \frac{n_0 y + \tau_k}{n_0 + k} : y \in Y_0 \Bigr\} \subset Y. \]

The imprecision of the inferences of these models is proportional to the volume of co(Y_k), so the imprecision decreases with k at a rate that decreases with n₀.

[Figures: examples of updating. For multinomial sampling, Y_{k−1} (n_{k−1} = 2) shrinks to Y_k (n_k = 3) after "1" is observed; for normal sampling, Y_{k−1} in the (y₁, y₂)-plane shrinks toward τ(x) after "x" is observed.]
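The update rule is straightforward to sketch in code. In this hypothetical example, y is a scalar (Bernoulli-style sampling) and the convex set Y₀ is represented by its two interval endpoints; the function name `update` is ours:

```python
def update(n0, Y0, k, tau_k):
    """Map prior parameters (n0, Y0) to posterior parameters (nk, Yk)
    after k observations with total sufficient statistic tau_k:
    nk = n0 + k,  Yk = {(n0*y + tau_k) / (n0 + k) : y in Y0}."""
    nk = n0 + k
    Yk = [(n0 * y + tau_k) / nk for y in Y0]
    return nk, Yk

# Prior interval Y0 = [0.2, 0.8] with pseudocount n0 = 2;
# after k = 3 samples containing 2 successes:
nk, Yk = update(2.0, [0.2, 0.8], 3, 2.0)
print(nk, Yk)         # posterior count 5.0; Yk is approximately [0.48, 0.72]
# The width of Yk is the width of Y0 scaled by n0 / (n0 + k) = 2/5,
# illustrating that imprecision shrinks with k at a rate set by n0.
print(Yk[1] - Yk[0])  # approximately 0.24 = 0.6 * (2 / 5)
```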

Application: credal classification

Credal classification

A classifier maps attribute values a ∈ A to one or more classes c ∈ C. In a credal classifier, a conditional lower prevision P(· | A) on L(C) is used to make pairwise comparisons of classes c′ and c″, given attribute values a. The criterion used is

\[ c' \succ c'' \iff \underline{P}(I_{c'} - I_{c''} \mid a) > 0. \]

The maximal elements of the resulting strict partial order are the output of the classifier. The computational complexity of the optimization problem that has to be solved for comparing two classes c′ and c″ depends strongly on the type of attributes that are used.

Creating a credal classifier

We derive P(· | A) by conditioning a joint lower prevision E on L(C × A). E is the marginal extension of a class model P on L(C) and an attribute model P(· | C) on L(A). When the number of classes is finite and the attribute values are distributed according to an exponential family, we can use predictive models P_P(· | n_C, Y_C) and P_P(· | n_{A|C}, Y_{A|C}) for the class and attribute models.

Example optimization problem: multiple discrete attributes

\[ c' \succ c'' \iff \inf_{y \in Y_C} \Bigl[ y_{c'} \prod_i \inf_{y_{A_i|c'} \in Y_{A_i|c'}} y_{a_i|c'} - y_{c''} \prod_i \sup_{y_{A_i|c''} \in Y_{A_i|c''}} y_{a_i|c''} \Bigr] > 0. \]

The inf/sup over Y_{A_i|c} of y_{a_i|c} are simple functions of y_c that guarantee the convexity of the objective function, so this problem can easily be solved numerically.

Example optimization problem: one normal attribute

The criterion is the same as above, but with the attribute terms replaced by the inf/sup over y_{A|c} ∈ Y_{A|c} of

\[ \sqrt{\frac{n_{A|c}}{n_{A|c} + 1}}\, \frac{\Gamma\bigl(\frac{n_{A|c} + 4}{2}\bigr)}{\Gamma\bigl(\frac{n_{A|c} + 3}{2}\bigr)}\, \frac{\bigl[n_{A|c} y_{A|c,2} - n_{A|c} y_{A|c,1}^2\bigr]^{\frac{n_{A|c} + 3}{2}}}{\bigl[n_{A|c} y_{A|c,2} + a^2 - \frac{1}{n_{A|c} + 1} (n_{A|c} y_{A|c,1} + a)^2\bigr]^{\frac{n_{A|c} + 4}{2}}}. \]

It is not yet clear if and how this problem can be solved.
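The final step, returning the maximal elements of the strict partial order, can be sketched as follows. The dominance test here uses simple interval dominance on made-up per-class probability intervals as a stand-in for the poster's optimization criterion; all names and numbers are hypothetical:

```python
def maximal_classes(classes, dominates):
    """Output of a credal classifier: the maximal elements of the strict
    partial order induced by the pairwise dominance relation."""
    return [c for c in classes
            if not any(dominates(d, c) for d in classes if d != c)]

# Toy probability intervals (lower, upper) per class.
intervals = {"a": (0.5, 0.7), "b": (0.1, 0.4), "c": (0.3, 0.6)}

def dominates(c1, c2):
    """Interval dominance: c1 beats c2 if its lower probability exceeds
    c2's upper probability (a simple surrogate for P(I_c1 - I_c2 | a) > 0)."""
    low1, _ = intervals[c1]
    _, high2 = intervals[c2]
    return low1 > high2

# "b" is dominated by "a" (0.5 > 0.4); "a" and "c" are incomparable,
# so the classifier returns both.
print(maximal_classes(list(intervals), dominates))  # ['a', 'c']
```

Returning a set of classes rather than a single class is the characteristic behavior of a credal classifier: when the model is too imprecise to rank two classes, both are kept.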


SLIDE 9

Time for questions!

