SLIDE 1

A model selection algorithm for mixture experiments including process variables
Hugo Maruri and Eva Riccomagno
Department of Statistics, London School of Economics, and Dipartimento di Matematica, Università di Genova
mODa 8, Almagro, Spain

marurimoda8.tex 1 June 2, 2007

SLIDE 2

Abstract

Experiments with mixture and process variables are often constructed as the cross product of a mixture design and a factorial design. Often it is not possible to implement all the runs of the cross product design, or the cross product model is too large to be of practical interest. We propose a methodology to select a model with a given number of terms and minimal condition number. The search methodology is based on weighted term orderings and can be extended to consider other statistical criteria.

SLIDE 3

Contents of the talk

  • 1. Mixture experiments with process variables and their models
  • 2. Homogeneous supports for mixtures
  • 3. An algorithm for model selection
  • 4. Examples
  • 5. Conclusions

SLIDE 4

Mixture experiments with process variables

  • The response is assumed to depend on other factors apart from the mixture components (Cornell, 2002).
  • Mixture factors are $x = (x_1, \dots, x_k)$ and process variables are $z = (z_1, \dots, z_q)$.
  • For instance, one of the $z$ could be the amount of material used, hence the name mixture-amount experiments.
  • The design is a finite set of points $D \subset \mathbb{R}^{k+q}$. The projection of $D$ over the $x$-space is $D_x$, and $D_z$ is the projection over the $z$-space.
  • Often $D_x$ is a simplex centroid or simplex lattice design, while $D_z$ is a full factorial design.

SLIDE 5

Example 1: The bread data set (Næs et al., 1998)

  • Three types of wheat flour $(x_1, x_2, x_3)$ and two process factors $(z_1, z_2)$.
  • Response: loaf volume after baking.
  • $D_x$ is a simplex lattice design $\{3, 3\}$ and $D_z$ a $3^2$ factorial. The design $D = D_x \times D_z$ has 90 runs.

[Figure: the designs $D_x$ (axes $x_1$, $x_2$, $x_3$), $D_z$ (axes $z_1$, $z_2$) and $D_x \times D_z$]
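
The run count can be checked by building the cross product directly. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the coded factorial levels -1, 0, 1 are an assumption, since the slides do not state the levels used.

```python
from fractions import Fraction
from itertools import product


def simplex_lattice(k, m):
    """Points of the {k, m} simplex lattice: coordinates in {0, 1/m, ..., 1} that sum to 1."""
    points = []
    for counts in product(range(m + 1), repeat=k):
        if sum(counts) == m:
            points.append(tuple(Fraction(c, m) for c in counts))
    return points


def full_factorial(levels, q):
    """All combinations of the given levels for q factors."""
    return list(product(levels, repeat=q))


Dx = simplex_lattice(3, 3)           # mixture design for the three flours
Dz = full_factorial((-1, 0, 1), 2)   # 3^2 factorial; coded levels are an assumption
D = [x + z for x, z in product(Dx, Dz)]  # each run is (x1, x2, x3, z1, z2)

print(len(Dx), len(Dz), len(D))  # 10 9 90
```

Exact fractions are used so that the constraint x1 + x2 + x3 = 1 holds without rounding error.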

SLIDE 6

Models for the combined effect of the factors (Prescott, 2004)

Additive regression model
\[ y(x, z) = f(x) + g(z) + \varepsilon. \tag{1} \]
Complete cross product model
\[ y(x, z) = f(x)\,g(z) + \varepsilon. \tag{2} \]
Intermediate models
\[ y(x, z) = f(x) + g(z) + \sum_{i=1}^{q} \sum_{j=1}^{k} f_{ij}(x_i, z_j) + \varepsilon. \tag{3} \]
Often $f$ is taken to be a Scheffé quadratic or cubic polynomial model, in a relevant parametrization, and $g$ is a quadratic or cubic model.

SLIDE 7

Models for the combined effect of the factors 2

A mixture amount model of the form
\[ y(x, m) = f_0(x) + m f_1(x) + \dots + m^p f_p(x) + \varepsilon \]
is suggested in (Cornell, 2002), with
\[ f_p(x) = \sum_{i} \gamma_{i}^{(p)} x_i + \sum_{i<j} \gamma_{ij}^{(p)} x_i x_j + \dots + \sum_{i_1<\dots<i_l} \gamma_{1,\dots,l}^{(p)} x_{i_1} \cdots x_{i_l}, \]
where $p$ is a positive integer, $l \le q$ and the $\gamma^{(p)}$ are regression parameters. The errors $\varepsilon$ are assumed i.i.d.

Proposal: search for a submodel of the complete cross product model using
  • a hierarchy (divisibility) condition
  • a statistical criterion (minimal condition number)

SLIDE 8

Homogeneous models in mixtures with CCA (Maruri et al., 2006)

See every $d \in D$ as a point in $P^{k-1}(\mathbb{R})$, i.e. $C_d = \{\alpha d\}$.
1) We construct the homogeneous ideal $I(C_D)$.
2) Given a term order $\tau$, we use GB-driven CoCoA code to obtain a model of degree $s$.

[Figure: a design $D$ and its cone $C_D$, axes $x_1$ and $x_2$; labels $s$ and LT]

Example 2: Simplex lattice $\{3, 2\}$. A model with $s = 2$ for any $\tau$ is $\{x_1^2, x_2^2, x_3^2, x_1x_2, x_1x_3, x_2x_3\}$.

⇒ Linear relation with K-models (Draper, 1998) and S-models (Scheffé, 1958).

Design (cone) ideal: all polynomials that vanish on the design (cone).
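
The linear relation can be written out for Example 2. The derivation below is a short sketch that uses only the mixture constraint $x_1 + x_2 + x_3 = 1$, which holds at every design point; it takes the S-model support to be $\{x_1, x_2, x_3, x_1x_2, x_1x_3, x_2x_3\}$ (the Scheffé quadratic).

```latex
\begin{aligned}
x_1 &= x_1(x_1 + x_2 + x_3) = x_1^2 + x_1x_2 + x_1x_3, \\
x_2 &= x_2(x_1 + x_2 + x_3) = x_1x_2 + x_2^2 + x_2x_3, \\
x_3 &= x_3(x_1 + x_2 + x_3) = x_1x_3 + x_2x_3 + x_3^2 .
\end{aligned}
```

Each linear term of the S-model is therefore a linear combination of terms of the homogeneous model, and conversely $x_i^2 = x_i - \sum_{j \ne i} x_i x_j$ on the design, so the two supports span the same functions on the simplex.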

SLIDE 9

An algorithm for model selection

Cross product support

Consider a product design $D = D_x \times D_z$ with no replicated runs. Let $E_x = \{x^\alpha : \alpha \in L_x\}$ and $E_z = \{z^\alpha : \alpha \in L_z\}$ be sets of linearly independent monomials in $\mathbb{R}[x_1, \dots, x_k]/I(D_x)$ and $\mathbb{R}[z_1, \dots, z_q]/I(D_z)$, respectively. Let $E_x \otimes E_z$ be the Kronecker product of $E_x$ and $E_z$. Then $E_x \otimes E_z$ is a set of linearly independent monomials in $\mathbb{R}[x, z]/I(D)$. Moreover, if $D_z$ and $D_x$ also have no replicated points, then it is an $\mathbb{R}$-vector space basis, and the space has dimension $n_x n_z$, where $n_i$ is the number of points in $D_i$, $i = z, x$.

  • Typically $E_x$ and $E_z$ have a simple structure derived from the designs $D_x$ and $D_z$.
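
As an illustration, the Kronecker product of two monomial sets can be formed on exponent vectors. This is a minimal sketch with hypothetical helper names; it uses the supports $E_x = \{x_1^2, x_1x_2, x_2^2\}$ and $E_z = \{1, m\}$ that appear later in the talk (Example 3) and represents a monomial by its tuple of exponents.

```python
from itertools import product


def kronecker(Ex, Ez):
    """Every product x^a * z^b, stored as the concatenated exponent vector a + b."""
    return [a + b for a, b in product(Ex, Ez)]


def show(alpha, names):
    """Readable form of a monomial given its exponent vector."""
    factors = [n if e == 1 else f"{n}^{e}" for n, e in zip(names, alpha) if e > 0]
    return "*".join(factors) if factors else "1"


Ex = [(2, 0), (1, 1), (0, 2)]  # x1^2, x1*x2, x2^2
Ez = [(0,), (1,)]              # 1, m
E = kronecker(Ex, Ez)

print([show(alpha, ("x1", "x2", "m")) for alpha in E])
# ['x1^2', 'x1^2*m', 'x1*x2', 'x1*x2*m', 'x2^2', 'x2^2*m']
```

The number of terms is the product of the sizes of the two supports, here 3 × 2 = 6.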

SLIDE 10

An algorithm for model selection 2

Minimal condition number

The condition number is defined as
\[ \lambda = \frac{\lambda_{\max}}{\lambda_{\min}}, \tag{4} \]
where $\lambda_{\max}$ and $\lambda_{\min} \ge 0$ are the maximum and minimum eigenvalues of the information matrix $X_L^T X_L$, and $X_L$ is the design-model matrix for the model $L$.

  • Large values of $\lambda$ indicate that $X_L^T X_L$ is close to singular, i.e. $\lambda_{\min} \approx 0$.
  • A small condition number $\lambda$ indicates more stable least squares estimates and smaller variance inflation factors than a large condition number.
  • Useful when searching among homogeneous models as it favours Kronecker models, which are conjectured robust to misspecification of the information matrix in mixtures (Prescott et al., 2002).
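
A small worked example may help fix ideas. The two information matrices below are illustrative numbers chosen for easy arithmetic and are not taken from the talk; both have eigenvectors $(1, 1)$ and $(1, -1)$.

```latex
M_1 = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}:
\quad \lambda_{\max} = 3,\ \lambda_{\min} = 1,
\quad \lambda = \frac{3}{1} = 3;
\qquad
M_2 = \begin{pmatrix} 1 & 0.99 \\ 0.99 & 1 \end{pmatrix}:
\quad \lambda_{\max} = 1.99,\ \lambda_{\min} = 0.01,
\quad \lambda = \frac{1.99}{0.01} = 199 .
```

The nearly collinear columns behind $M_2$ push $\lambda_{\min}$ towards zero, which is exactly the situation the criterion penalises.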

SLIDE 11

An algorithm for model selection 3

Algorithm

Input: A fraction $F \subseteq D_x \times D_z$; the designs $D_x$ and $D_z$ and the supports $E_x$ and $E_z$; $n$ = number of final terms. For identifiability, $n \le \#F$ must hold.

Output: A submodel $L_0$ with minimal condition number $\lambda_0$, formed with the smallest terms of $E_x \times E_z$ with respect to a weighted order.

Technique: Generate candidate submodels by ordering $E_x \times E_z$ (complete cross product) with weight vectors $w \in W^+$, and look for the candidate with the smallest condition number.

  • The search is driven by a finite set of weights $W^+$, so it terminates.
  • The model $L_0$ respects a hierarchical structure.
  • Arbitrary supports $E_x$ and $E_z$ can be used.
  • The Algorithm is of order O((nxnz)2(qk−1)n2) = poly(nxnz).
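
The search described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: a candidate is simply the first n terms in the weighted order (ties broken by input order, no check for linear dependence on the fraction), eigenvalues are computed with a plain Jacobi rotation routine to stay within the standard library, and the design points and weight vectors in the usage part are made up.

```python
import math


def eigenvalues_symmetric(A, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(A)
    A = [row[:] for row in A]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                diff = A[q][q] - A[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [A[i][i] for i in range(n)]


def monomial(point, alpha):
    """Value of x^alpha at a design point."""
    value = 1.0
    for coord, power in zip(point, alpha):
        value *= coord ** power
    return value


def condition_number(points, model):
    """lambda_max / lambda_min of X^T X for the given support; inf if numerically singular."""
    X = [[monomial(p, a) for a in model] for p in points]
    n = len(model)
    info = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    eig = eigenvalues_symmetric(info)
    lo, hi = min(eig), max(eig)
    return math.inf if lo <= 1e-10 * hi else hi / lo


def select_model(points, terms, weights, n):
    """Return (lambda0, L0, w0): the best of the candidates generated by the weights."""
    best = (math.inf, None, None)
    for w in weights:
        ordered = sorted(terms, key=lambda a: sum(wi * ai for wi, ai in zip(w, a)))
        candidate = ordered[:n]  # the n smallest terms in the w-order
        lam = condition_number(points, candidate)
        if lam < best[0]:
            best = (lam, candidate, w)
    return best


# Hypothetical fraction with runs (x1, x2, m); not a design from the talk.
points = [(1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (0.5, 0.5, 1.0),
          (1.0, 0.0, 2.0), (0.0, 1.0, 2.0), (0.5, 0.5, 2.0)]
terms = [(2, 0, 0), (1, 1, 0), (0, 2, 0), (2, 0, 1), (1, 1, 1), (0, 2, 1)]
weights = [(1, 2, 3), (2, 1, 3), (3, 1, 2), (1, 3, 2)]  # assumed stand-in for W+
lam0, L0, w0 = select_model(points, terms, weights, n=4)
print(L0, w0, lam0)
```

Passing the criterion as a separate function (`condition_number`) keeps the search loop unchanged if another statistical criterion is substituted.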

SLIDE 12

Example 3: Mixture amount design

Factors $x = (x_1, x_2)$, $z = (m)$, listed as $(x_1, x_2, m)$.

[Table: design points and model terms, with columns $x_1$, $x_2$, $m$, $x_1^2$, $x_1x_2$, $x_2^2$, $mx_1^2$, $mx_1x_2$, $mx_2^2$; extracted entries: 1 1 1 1 2 2 4 8 1 1 2 1 1 1 2 2 2 2 2 4 8]

We have $E_x = \{x_1^2, x_1x_2, x_2^2\}$ and $E_z = \{1, m\}$. The algorithm returns the support for a mixture amount model $L_0 = \{x_1^2, x_2^2, x_1x_2, mx_2^2\}$ for $w = (1, 2, 3)$.

  • The set of representatives $W^+$ can be expensive to compute; an approximate set $\tilde{W}^+$, simulated over the $(q + k - 1)$-simplex, is used instead.

SLIDE 13

Example 1 (cont.): Bread data set

Analysis in (Prescott, 2004).

  • Final model with 15 terms, $R^2 = 0.998$, $\hat{\sigma} = 21.04$.
  • Condition number $\lambda = 86.83$.
  • Fitted model
\[
\begin{aligned}
\hat{Y} = {} & x_1(522.8 + 13.0z_1 + 56.3z_2 - 39.4z_1^2 - 10.2z_2^2) \\
& + x_2(448.1 + 1.7z_1 + 37.2z_2 + 3.7z_1^2 - 28.4z_2^2) \\
& + x_3(599.3 + 54.3z_1 + 73.8z_2 - 46.0z_1^2 + 1.0z_2^2),
\end{aligned}
\]
    i.e. a mixture of predictive models for every type of flour.
  • Symmetric support

SLIDE 14

Example 1 (cont.): Bread data set (using the Algorithm)

Factors listed as $(x_1, x_2, x_3, z_1, z_2)$.

  • Scheffé model $E_x = \{x_1, x_2, x_3, x_1^2, x_2^2, x_3^2, x_1x_2, x_1x_3, x_2x_3\}$ and full product model $E_z = \{1, z_1, z_2, z_1^2, z_1z_2, z_2^2\}$.
  • Model with $\lambda_0 = 47.47$ and support
\[ L_0 = \{x_1, x_2, x_3\} \otimes \{1, z_1, z_2\} \cup \{x_2, x_3\} \otimes \{z_1^2, z_1z_2, z_2^2\} \]
    for $w = (17, 12, 10, 3, 2) \in \tilde{W}$.
  • Fitted model with $\hat{\sigma} = 22.7$ and $R^2 = 0.998$:
\[
\begin{aligned}
\hat{Y} = {} & x_1(489.7 + 13.0z_1 + 56.3z_2) + x_2(467.9 + 1.7z_1 + 37.1z_2) \\
& + x_3(619.1 + 54.2z_1 + 73.8z_2) \\
& + x_2(-19.9z_1^2 + 3.6z_1z_2 - 34.6z_2^2) \\
& + x_3(-69.6z_1^2 + 13.3z_1z_2 - 5.1z_2^2)
\end{aligned}
\]
  • Slight asymmetry allows for reduction in condition number.
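
The reported support can be checked against the reported weight vector. The sketch below is not the authors' code; it computes the weight $w \cdot \alpha$ of every term of $E_x \otimes E_z$ for $w = (17, 12, 10, 3, 2)$ and compares the terms inside and outside $L_0$.

```python
from itertools import product

w = (17, 12, 10, 3, 2)  # weights for (x1, x2, x3, z1, z2), as reported on the slide


def weight(alpha):
    """Weighted degree w . alpha of the monomial with exponent vector alpha."""
    return sum(wi * ai for wi, ai in zip(w, alpha))


# Exponent vectors of the Scheffe support Ex and of the full product support Ez.
Ex = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 0, 0), (0, 2, 0), (0, 0, 2),
      (1, 1, 0), (1, 0, 1), (0, 1, 1)]
Ez = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
terms = [a + b for a, b in product(Ex, Ez)]

# L0 = {x1, x2, x3} (x) {1, z1, z2}  union  {x2, x3} (x) {z1^2, z1z2, z2^2}
L0 = {a + b for a, b in product(Ex[:3], Ez[:3])}
L0 |= {a + b for a, b in product(Ex[1:3], Ez[3:])}

largest_inside = max(weight(t) for t in L0)
smallest_outside = min(weight(t) for t in terms if t not in L0)
print(len(terms), len(L0), largest_inside, smallest_outside)  # 54 15 20 20
```

All 15 terms of $L_0$ have weight at most 20 and every excluded term has weight at least 20, so $L_0$ is a set of smallest terms in this w-order. The terms $x_1z_1$ and $x_3^2$ tie at weight 20; the slides do not say how such ties are resolved.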

SLIDE 15

Final comments

  • The Algorithm blends the change of basis (Faugère et al., 1993) with a statistical criterion.
  • The search space of the Algorithm presented is much smaller than that of a full search.
  • It can be adapted to consider other criteria or even composite criteria. For example, it could be used for hierarchical model selection (Peixoto, 1987) (Bates et al., 2003).
  • The set of weights $W$ is expensive to compute, but an approximate set $\tilde{W}$ allows a fast search. The stopping rule is still empirical.
  • A possible drawback is the potential exclusion of symmetric models. This is inherent in the use of term orders (w-orders), e.g. there is no term order such that $x_1^2 \succ x_2^2 \succ x_1x_2$.
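
The last claim can be verified directly for an order induced by a weight vector $w = (w_1, w_2)$, where a monomial $x^\alpha$ is ranked by $w \cdot \alpha$. A short derivation:

```latex
\begin{aligned}
x_1^2 \succ x_2^2 \;&\Longrightarrow\; 2w_1 > 2w_2 \;\Longrightarrow\; w_1 > w_2, \\
x_2^2 \succ x_1x_2 \;&\Longrightarrow\; 2w_2 > w_1 + w_2 \;\Longrightarrow\; w_2 > w_1 .
\end{aligned}
```

The two requirements contradict each other, so no weight vector places the mixed term $x_1x_2$ below both squares.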

SLIDE 16

References

Bates et al. (2003). Technometrics 45, 246-255.
Cornell (2002). Experiments with mixtures.
Draper, Pukelsheim (1998). JSPI 71(1-2), 303-311.
Faugère et al. (1993). Jour. Symb. Comp. 16(4), 329-344.
Maruri, Notari, Riccomagno (2006). Statistica Sinica (in print).
Næs et al. (1998). Chem. Int. Lab. Syst. 41, 221-235.
Peixoto (1987). Am. Stat. 41(4), 311-313.
Prescott et al. (2002). Technometrics 44(3), 260-268.
Prescott (2004). Qual. Tech. & Qual. Manag. 1(1), 87-103.
Scheffé (1958). JRSS B 20, 344-360.

SLIDE 17

Weighted orders (w-orders)

Let $T^{k+q}$ be the set of all monomials in $k + q$ indeterminates. Two monomials $x^\alpha, x^\beta$ in $T^{k+q}$ are w-ordered by $w \in \mathbb{Z}^{k+q}$ as $x^\alpha \succeq x^\beta$ if $w \cdot (\alpha - \beta) \ge 0$. When ordering a finite set $A \subset T^{k+q}$, the vector $w$ forms a total order $\succ$ in $A$ if $w \cdot (\alpha - \beta) \ne 0$ for all distinct $x^\alpha, x^\beta \in A$. The vectors that create the same total w-ordering in $A$ form an equivalence class.

Example 4: The set $A = \{x_1^2, x_1x_2, x_2^2\}$ can only be totally w-ordered in two ways: $x_1^2 \succ x_1x_2 \succ x_2^2$ and $x_2^2 \succ x_1x_2 \succ x_1^2$, e.g. for $w = (2, 1)$ we have $(2, 0) \cdot (2, 1) = 4 > (1, 1) \cdot (2, 1) = 3 > (0, 2) \cdot (2, 1) = 2$.

Let $W^+ = W^+(A)$ be the set of representatives for w-orderings, one for each equivalence class.
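
Example 4 is easy to reproduce numerically. The following minimal sketch (the function names are hypothetical) ranks the three monomials by $w \cdot \alpha$ and reports when a weight vector fails to give a total order.

```python
def w_value(w, alpha):
    """Dot product w . alpha used to rank the monomial x^alpha."""
    return sum(wi * ai for wi, ai in zip(w, alpha))


def w_order(w, A):
    """Monomial names in A from largest to smallest; None if w leaves a tie."""
    values = {name: w_value(w, alpha) for name, alpha in A.items()}
    if len(set(values.values())) < len(values):
        return None  # w . (alpha - beta) = 0 for some pair, so the order is not total
    return sorted(values, key=values.get, reverse=True)


A = {"x1^2": (2, 0), "x1x2": (1, 1), "x2^2": (0, 2)}

print([w_value((2, 1), alpha) for alpha in A.values()])  # [4, 3, 2]
print(w_order((2, 1), A))  # ['x1^2', 'x1x2', 'x2^2']
print(w_order((1, 2), A))  # ['x2^2', 'x1x2', 'x1^2']
print(w_order((1, 1), A))  # None
```

The vector $w = (1, 1)$ gives every monomial of degree two the same value, so it does not define a total order on $A$.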

SLIDE 18

Polyhedral geometry for existence results of the set of representatives $W^+(A)$

  • Duality between vector and hyperplane.
  • Equivalence classes arise from hyperplane arrangements.
  • Equivalence classes correspond to vertices of a symmetric polytope (the zonotope of the differences in the set of exponents).

Example 3 (cont.): Set $A = \{x_1^2, x_1x_2, x_2^2, mx_1^2, mx_1x_2, mx_2^2\}$, with 30 pairwise differences, $30 = 2\binom{6}{2}$. The zonotope $Z(A)$ has 32 facets and 48 vertices, 6 of which have cones in $\mathbb{R}^3_{\ge 0}$ ⇒ $W^+(A)$ has six elements.
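
For a set this small, the count of six classes can be cross-checked by brute force. The sketch below is an assumption-laden stand-in for the polyhedral computation, not the method of the talk: it scans positive integer weight vectors on a small grid (range 1 to 5, an arbitrary choice that happens to suffice here) and collects the distinct total orderings they induce.

```python
from itertools import product

# Exponent vectors of A in the variables (x1, x2, m).
A = [(2, 0, 0), (1, 1, 0), (0, 2, 0), (2, 0, 1), (1, 1, 1), (0, 2, 1)]


def total_order(w, A):
    """A sorted increasingly by w . alpha, or None when two monomials tie."""
    values = [sum(wi * ai for wi, ai in zip(w, a)) for a in A]
    if len(set(values)) < len(A):
        return None
    return tuple(a for _, a in sorted(zip(values, A)))


# One representative weight vector per distinct total ordering.
representatives = {}
for w in product(range(1, 6), repeat=3):
    order = total_order(w, A)
    if order is not None:
        representatives.setdefault(order, w)  # keep the first w found

n_differences = sum(1 for a in A for b in A if a != b)  # ordered pairs
print(n_differences, len(representatives))  # 30 6
```

The orderings depend only on the sign of $w_1 - w_2$ and on how $w_3$ compares with $|w_1 - w_2|$ and $2|w_1 - w_2|$, which gives three cases for each sign and hence six classes.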
