SLIDE 1

Strong Consistency of the AIC, BIC, Cp and KOO Methods in High-Dimensional-Response Regression

Jiang Hu∗

(Joint work with Zhidong Bai∗ and Yasunori Fujikoshi† )

∗Northeast Normal University, P. R. China †Hiroshima University, Japan

December, 2019

Jiang Hu (NENU) AIC, BIC, Cp and KOO Methods December, 2019 1 / 38

SLIDE 2

Outline

1 Model selection
    Linear regression model
    Classical selection criteria

2 Asymptotic properties
    Low-dimensional
    Large-dimension and small-model

3 Main results
    Assumptions and notations
    Strong consistency of AIC, BIC and Cp
    KOO methods based on the AIC, BIC, and Cp
    General KOO methods

4 Proof strategy

5 Simulation

SLIDE 4

Linear regression model

Consider the multi-response linear regression model:

\[ \underset{1\times p}{y} = \underset{1\times k}{x}\cdot\underset{k\times p}{\Theta} + \underset{1\times p}{e}\cdot\underset{p\times p}{\Sigma^{1/2}} \qquad (1) \]

Aim: find the TRUE model if it exists.

References:

[1] Alan Miller. Subset Selection in Regression, Second Edition. Chapman and Hall/CRC, 2002.
[2] Gerda Claeskens, Nils Lid Hjort. Model Selection and Model Averaging. Vol. 330. Cambridge University Press, Cambridge, 2008.

SLIDE 5

Overview of classical model selection criteria

From the point of view of the statistical performance of a method and the intended context of its use, there are only two distinct classes of methods, labeled efficient and consistent.

Generally there are two main approaches:

(I) Optimization of some selection criteria:
    (1) Criteria based on some form of mean squared error (e.g., Mallows’s Cp, Mallows 1973) or mean squared prediction error (e.g., PRESS, Allen 1970);
    (2) Criteria that are estimates of Kullback-Leibler (K-L) information or distance (e.g., AIC, AICc, and QAICc);
    (3) Criteria that are consistent estimators of the “true model” (e.g., BIC).

(II) Tests of hypotheses.

SLIDE 6

Notation

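Observations: $Y : n \times p$ and $X_\omega = (x_1, \dots, x_k) : n \times k$.

Notation: $\omega = \{1, \dots, k\}$, $\mathbf{j}_* \subseteq \omega$, $\mathbf{j} \subseteq \omega$, $k_{\mathbf{j}}$ = the cardinality of $\mathbf{j}$.

Full model $\omega$: $Y = X_\omega \cdot \Theta_\omega + E \cdot \Sigma^{1/2}$.
True model $\mathbf{j}_*$: $Y = X_{\mathbf{j}_*} \cdot \Theta_{\mathbf{j}_*} + E \cdot \Sigma^{1/2}$.
Candidate model $\mathbf{j}$: $Y = X_{\mathbf{j}} \cdot \Theta_{\mathbf{j}} + E \cdot \Sigma^{1/2}$.

$\Theta_{\mathbf{j}} = (\theta_{ji},\ j \in \mathbf{j},\ i = 1, \dots, p)$
$X_{\mathbf{j}} = (x_j,\ j \in \mathbf{j})$
$P_{\mathbf{j}} = X_{\mathbf{j}}(X_{\mathbf{j}}' X_{\mathbf{j}})^{-1} X_{\mathbf{j}}'$
$\hat\Sigma_{\mathbf{j}} = n^{-1} Y'(I_n - P_{\mathbf{j}})Y$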
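The two matrices defined last on this slide can be computed directly. The following is a minimal NumPy sketch, not the authors' code: the helper name `residual_covariance`, the toy dimensions and the random data are all assumptions made for illustration.

```python
import numpy as np

def residual_covariance(Y, X, cols):
    """Return (P_j, Sigma_hat_j) for the candidate model built from the columns in `cols`."""
    n = Y.shape[0]
    Xj = X[:, list(cols)]
    # P_j = X_j (X_j' X_j)^{-1} X_j'; solve() avoids forming the inverse explicitly.
    Pj = Xj @ np.linalg.solve(Xj.T @ Xj, Xj.T)
    # Sigma_hat_j = n^{-1} Y' (I_n - P_j) Y
    Sigma_hat = Y.T @ (np.eye(n) - Pj) @ Y / n
    return Pj, Sigma_hat

# Toy data (assumed, for illustration only).
rng = np.random.default_rng(0)
n, p, k = 50, 3, 4
X = rng.uniform(1.0, 5.0, size=(n, k))
Y = rng.standard_normal((n, p))

P, S = residual_covariance(Y, X, cols=[0, 2])
print(P.shape, S.shape, round(float(np.trace(P)), 6))
```

Since $P_{\mathbf{j}}$ is an orthogonal projection, it is symmetric and idempotent, and its trace equals $k_{\mathbf{j}}$; these properties are a convenient sanity check for any implementation.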

SLIDE 7

Classical selection criteria

Akaike’s information criterion (AIC, Akaike (1973, 1974)):

\[ \mathrm{AIC}_{\mathbf{j}} = n \log|\hat\Sigma_{\mathbf{j}}| + 2 k_{\mathbf{j}} p \quad\text{and}\quad \hat{\mathbf{j}}_A = \arg\min \mathrm{AIC}_{\mathbf{j}} \]

Key: Kullback-Leibler information/distance

Kullback-Leibler Information: The Kullback-Leibler information between density functions f and g is defined for continuous functions as

\[ I(f, g) = \int f(x) \log\left(\frac{f(x)}{g(x)}\right) dx. \]

The notation I(f, g) denotes the “information lost when g is used to approximate f.” As a heuristic interpretation, I(f, g) is the distance from g to f.
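To make the integral concrete, here is a small sketch that evaluates I(f, g) numerically for two univariate normal densities and compares it with the standard closed form for that case. The choice of normal densities, the function names and the integration grid are assumptions for illustration; they do not come from the slides.

```python
import math

def log_normal_pdf(x, mu, sd):
    """Log density of N(mu, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd**2) - (x - mu) ** 2 / (2 * sd**2)

def kl_numeric(mu_f, sd_f, mu_g, sd_g, lo=-20.0, hi=20.0, steps=40000):
    """Trapezoidal approximation of I(f, g) = integral of f(x) log(f(x)/g(x)) dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        lf = log_normal_pdf(x, mu_f, sd_f)
        lg = log_normal_pdf(x, mu_g, sd_g)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(lf) * (lf - lg)
    return total * h

def kl_closed_form(mu_f, sd_f, mu_g, sd_g):
    """Closed form of I(f, g) when f and g are both univariate normal."""
    return math.log(sd_g / sd_f) + (sd_f**2 + (mu_f - mu_g) ** 2) / (2 * sd_g**2) - 0.5

forward = kl_numeric(0.0, 1.0, 1.0, 2.0)    # information lost when g approximates f
backward = kl_numeric(1.0, 2.0, 0.0, 1.0)   # roles of f and g swapped
print(forward, backward)
```

The two directions give different values, which is why I(f, g) is a "distance from g to f" only in a heuristic sense: it is not symmetric.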

SLIDE 9

Classical selection criteria

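Bayesian information criterion (BIC, Schwarz (1978), Akaike (1977, 1978)):

\[ \mathrm{BIC}_{\mathbf{j}} = n \log|\hat\Sigma_{\mathbf{j}}| + \log(n) k_{\mathbf{j}} p \quad\text{and}\quad \hat{\mathbf{j}}_B = \arg\min \mathrm{BIC}_{\mathbf{j}} \]

Key: Consistency

Consistency: As $n \to \infty$, under some conditions, $\hat{\mathbf{j}}_B \to \mathbf{j}_*$ almost surely.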

SLIDE 11

Classical selection criteria

Mallows’s Cp (Cp, Mallows (1973)):

\[ \mathrm{Cp}_{\mathbf{j}} = (n - k)\,\mathrm{tr}(\hat\Sigma_\omega^{-1}\hat\Sigma_{\mathbf{j}}) + 2 p k_{\mathbf{j}} \quad\text{and}\quad \hat{\mathbf{j}}_C = \arg\min \mathrm{Cp}_{\mathbf{j}} \]

Key: Mean squared error

Remark 1: Atilgan (1996) provides a relationship between AIC and Mallows’s Cp, shows that under some conditions AIC selection behaves like minimum mean squared error selection, and notes that AIC and Cp are somewhat equivalent criteria.
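The three classical criteria share the same ingredients, so they can be evaluated together. The sketch below is one possible NumPy implementation under assumed toy data (standard normal predictors, two active predictors with coefficient 2); the function names are hypothetical. It also runs the exhaustive search over all non-empty subsets, which is only feasible for small k.

```python
from itertools import combinations
import numpy as np

def sigma_hat(Y, X, cols):
    """n^{-1} Y'(I_n - P_j)Y, computed from least-squares residuals."""
    Xj = X[:, list(cols)]
    coef, *_ = np.linalg.lstsq(Xj, Y, rcond=None)
    resid = Y - Xj @ coef
    return resid.T @ resid / Y.shape[0]

def criteria(Y, X, cols):
    """Return (AIC_j, BIC_j, Cp_j) for the candidate model `cols`."""
    n, p = Y.shape
    k = X.shape[1]
    kj = len(cols)
    S_j = sigma_hat(Y, X, cols)
    S_full = sigma_hat(Y, X, range(k))
    logdet = np.linalg.slogdet(S_j)[1]
    aic = n * logdet + 2 * kj * p
    bic = n * logdet + np.log(n) * kj * p
    cp = (n - k) * np.trace(np.linalg.solve(S_full, S_j)) + 2 * p * kj
    return aic, bic, cp

# Toy data (assumed): predictors 0 and 1 are active, 2 and 3 are noise.
rng = np.random.default_rng(0)
n, p, k = 100, 2, 4
X = rng.standard_normal((n, k))
Theta = np.zeros((k, p))
Theta[:2] = 2.0
Y = X @ Theta + rng.standard_normal((n, p))

subsets = [s for r in range(1, k + 1) for s in combinations(range(k), r)]
best_bic = min(subsets, key=lambda s: criteria(Y, X, s)[1])
print("BIC choice:", best_bic)
```

The search visits $2^k - 1$ subsets, which is the computational burden that motivates the knock-one-out methods later in the talk.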

SLIDE 13

Low-dimensional

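Assume k and p are fixed (Fujikoshi, 1985; Fujikoshi and Veitch, 1979).

If $\mathbf{j}$ is an over-specified model, i.e., $\mathbf{j}_* \subset \mathbf{j}$,

\[ P(\mathrm{AIC}_{\mathbf{j}} - \mathrm{AIC}_{\mathbf{j}_*} < 0) \sim P(\chi^2_{k_{\mathbf{j}} - k_{\mathbf{j}_*}} > 2(k_{\mathbf{j}} - k_{\mathbf{j}_*})) > 0 \]
\[ P(\mathrm{BIC}_{\mathbf{j}} - \mathrm{BIC}_{\mathbf{j}_*} < 0) \sim P(\chi^2_{k_{\mathbf{j}} - k_{\mathbf{j}_*}} > \log(n)(k_{\mathbf{j}} - k_{\mathbf{j}_*})) \to 0 \]
\[ P(\mathrm{Cp}_{\mathbf{j}} - \mathrm{Cp}_{\mathbf{j}_*} < 0) \sim P(\chi^2_{k_{\mathbf{j}} - k_{\mathbf{j}_*}} > 2(k_{\mathbf{j}} - k_{\mathbf{j}_*})) > 0 \]

If $\mathbf{j}$ is an under-specified model, i.e., $\mathbf{j}_* \not\subset \mathbf{j}$,

\[ \mathrm{AIC}_{\mathbf{j}} - \mathrm{AIC}_{\mathbf{j}_*} = O(n) \to +\infty \]
\[ \mathrm{BIC}_{\mathbf{j}} - \mathrm{BIC}_{\mathbf{j}_*} = O(n) \to +\infty \]
\[ \mathrm{Cp}_{\mathbf{j}} - \mathrm{Cp}_{\mathbf{j}_*} = O(n) \to +\infty \]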
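The contrast between the AIC/Cp and BIC tail probabilities can be checked numerically. The sketch below uses the closed form of the chi-square survival function for even degrees of freedom (a finite Poisson sum); restricting to even degrees of freedom and the helper name `chi2_sf_even` are simplifications assumed here so that only the standard library is needed.

```python
import math

def chi2_sf_even(x, df):
    """P(chi^2_df > x) for positive even df, via exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i
        total += term
    return math.exp(-half) * total

d = 2  # number of redundant predictors, k_j - k_{j*}
aic_tail = chi2_sf_even(2 * d, d)  # does not depend on n
bic_tails = {n: chi2_sf_even(math.log(n) * d, d) for n in (10**2, 10**4, 10**6)}
print(aic_tail, bic_tails)
```

For d = 2 the AIC-type tail stays at exp(-2) regardless of n, while the BIC-type tail equals 1/n and vanishes, which is the classical fixed-dimension picture summarized on this slide.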

SLIDE 14

Large-dimension and small-model

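Assume $\mathbf{j}_* \subseteq \omega$ is the true model, k is fixed and $p/n \to c \in (0, 1)$.

Theorem 4.1 in (Fujikoshi et al., 2014): If $c \in (0, c_a)$, where $c_a \approx 0.797$ solves $\log(1 - c_a) + 2c_a = 0$, and for any $\mathbf{j}_* \not\subset \mathbf{j}$ with $k_{\mathbf{j}} - k_{\mathbf{j}_*} \le 0$,

\[ \lim \log(|I + \Phi_{\mathbf{j}}|) > (k_{\mathbf{j}_*} - k_{\mathbf{j}})[2c + \log(1 - c)], \]

where

\[ \Phi_{\mathbf{j}} = \frac{1}{n}\Sigma^{-1/2}\Theta_{\mathbf{j}_*}' X_{\mathbf{j}_*}'(P_\omega - P_{\mathbf{j}})X_{\mathbf{j}_*}\Theta_{\mathbf{j}_*}\Sigma^{-1/2}, \]

then

\[ \lim_{p/n \to c} P(\hat{\mathbf{j}}_A = \mathbf{j}_*) = 1. \]

Otherwise,

\[ \lim_{p/n \to c} P(\hat{\mathbf{j}}_A = \mathbf{j}_*) \ne 1. \]

What about BIC?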
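The constant $c_a$ is defined only implicitly, as the nonzero root of $\log(1 - c) + 2c = 0$. A short bisection sketch recovers the quoted value; the bracket [0.5, 0.99] and the function names are choices made here, not part of the cited theorem.

```python
import math

def h(c):
    """Left-hand side of the defining equation log(1 - c) + 2c = 0."""
    return math.log(1.0 - c) + 2.0 * c

def solve_ca(lo=0.5, hi=0.99, tol=1e-12):
    """Bisection on [lo, hi]; h(lo) > 0 > h(hi) and h is decreasing on this bracket."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c_a = solve_ca()
print(round(c_a, 3))
```

The function h vanishes at c = 0, increases up to c = 1/2 and decreases afterwards, so the bracket above isolates the unique root in (0, 1).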

SLIDE 16

Large-dimension and small-model

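Assume $\mathbf{j}_* \subseteq \omega$ is the true model, k is fixed and $p/n \to c \in (0, 1)$.

Theorem 4.1 in (Fujikoshi et al., 2014): If $c \in (0, 1/2)$ and for any $\mathbf{j}_* \not\subset \mathbf{j}$ with $k_{\mathbf{j}} - k_{\mathbf{j}_*} \le 0$,

\[ \mathrm{tr}(\Phi_{\mathbf{j}}) > (k_{\mathbf{j}_*} - k_{\mathbf{j}})\,c(1 - 2c), \]

where

\[ \Phi_{\mathbf{j}} = \frac{1}{n}\Sigma^{-1/2}\Theta_{\mathbf{j}_*}' X_{\mathbf{j}_*}'(P_\omega - P_{\mathbf{j}})X_{\mathbf{j}_*}\Theta_{\mathbf{j}_*}\Sigma^{-1/2}, \]

then

\[ \lim_{p/n \to c} P(\hat{\mathbf{j}}_C = \mathbf{j}_*) = 1. \]

Otherwise,

\[ \lim_{p/n \to c} P(\hat{\mathbf{j}}_C = \mathbf{j}_*) \ne 1. \]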

SLIDE 18

Assumptions and notations

A1: The true model $\mathbf{j}_*$ is a subset of the set $\omega$ and $k_* := k_{\mathbf{j}_*}$ is fixed.

A2: $E = \{e_{ij}\}$ are i.i.d. with zero means, unit variances and finite fourth moments.

A3: $X'X$ is (non-random) positive definite uniformly.

A4: As $\{k, p, n\} \to \infty$, $p/n \to c \in (0, 1)$, $k/n \to \alpha \in [0, 1 - c)$.

A5: $\Phi := \frac{1}{n}\Sigma^{-1/2}\Theta_{\mathbf{j}_*}' X_{\mathbf{j}_*}' X_{\mathbf{j}_*}\Theta_{\mathbf{j}_*}\Sigma^{-1/2}$ is bounded uniformly.

A5': As $\{k, p, n\} \to \infty$, $\Phi_{\mathbf{j}} := \frac{1}{n}\Sigma^{-1/2}\Theta_{\mathbf{j}_*}' X_{\mathbf{j}_*}'(P_\omega - P_{\mathbf{j}})X_{\mathbf{j}_*}\Theta_{\mathbf{j}_*}\Sigma^{-1/2} \to \infty$.

SLIDE 19

Assumptions and notations

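Define two bivariate functions

\[ \phi(\alpha, c) = 2c\alpha + \log\left(\frac{(1 - c)^{1-c}(1 - \alpha)^{1-\alpha}}{(1 - c - \alpha)^{1-c-\alpha}}\right), \]

\[ \psi(\alpha, c) = \frac{c(\alpha - 1)}{1 - \alpha - c} + 2c. \]

For an under-specified model $\mathbf{j}$ with $k_{\mathbf{j} \cap \mathbf{j}_*^c} = m \ge 0$ and $k_{\mathbf{j} \cap \mathbf{j}_*} = s > 0$, we denote

\[ \tau_{n\mathbf{j}} := (1 - \alpha_m)^{s-p}\,|(1 - \alpha_m)I_p + \Phi_{\mathbf{j}}| \to \tau_{\mathbf{j}} \le \infty, \]

\[ \kappa_{n\mathbf{j}} := \mathrm{tr}(\Phi_{\mathbf{j}}) \to \kappa_{\mathbf{j}} \le \infty. \]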
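The two functions are easy to evaluate, which helps when checking whether a given pair (α, c) falls in the regions that appear in the theorems. This is a plain transcription of the definitions into Python; the domain check and the sample points (taken from the ratios used in the simulation settings) are choices made here.

```python
import math

def phi(alpha, c):
    """phi(alpha, c) = 2*c*alpha + log[(1-c)^(1-c) (1-alpha)^(1-alpha) / (1-c-alpha)^(1-c-alpha)]."""
    if not (0 < c < 1 and 0 <= alpha < 1 - c):
        raise ValueError("need c in (0, 1) and alpha in [0, 1 - c)")
    return (2 * c * alpha
            + (1 - c) * math.log(1 - c)
            + (1 - alpha) * math.log(1 - alpha)
            - (1 - c - alpha) * math.log(1 - c - alpha))

def psi(alpha, c):
    """psi(alpha, c) = c*(alpha - 1)/(1 - alpha - c) + 2*c."""
    if not (0 < c < 1 and 0 <= alpha < 1 - c):
        raise ValueError("need c in (0, 1) and alpha in [0, 1 - c)")
    return c * (alpha - 1) / (1 - alpha - c) + 2 * c

for alpha, c in [(0.1, 0.4), (0.1, 0.6)]:
    print(alpha, c, phi(alpha, c) > 0, psi(alpha, c) > 0)
```

Rearranging the definition shows that ψ(α, c) > 0 is equivalent to 1 − α < 2(1 − α − c), so the sign of ψ can change with c even when φ stays positive.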

SLIDE 20

Strong consistency of AIC, BIC and Cp

Theorem 1 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A5) hold.
(1) $\phi(\alpha, c) > 0 \Leftrightarrow$ AIC is almost surely not over-specified;
(2) If $\phi(\alpha, c) > 0$: for any under-specified candidate model $\mathbf{j}$, $\log(\tau_{\mathbf{j}}) > (s - m)(\log(1 - c) + 2c) \Leftrightarrow$ AIC is almost surely not under-specified.

Theorem 2 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A5) hold. Then BIC is almost surely under-specified.

SLIDE 24

Strong consistency of AIC, BIC and Cp

Theorem 3 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A5) hold.
(1) $\psi(\alpha, c) > 0 \Leftrightarrow$ Cp is almost surely not over-specified;
(2) If $\psi(\alpha, c) > 0$: for any under-specified model $\mathbf{j}$, $\kappa_{\mathbf{j}} > (s - m)\psi(\alpha, c)(1 - \alpha - c)/(1 - \alpha) \Leftrightarrow$ Cp is almost surely not under-specified.

SLIDE 25

Figure: 3D plots for φ(α, c) > 0 and ψ(α, c) > 0.

SLIDE 26

Strong consistency of AIC, BIC and Cp

Theorem 4 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A4) and (A5') hold.
(1) $\phi(\alpha, c) > 0 \Leftrightarrow$ AIC is almost surely not over-specified;
(2) AIC is almost surely not under-specified.

Theorem 5 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A4) and (A5') hold.
(1) For any under-specified model $\mathbf{j}$, $\lim_{n,p}\left[\log(\tau_{n\mathbf{j}}) - c(s - m)\log(n)\right] > (s - m)\log(1 - c) \Leftrightarrow$ BIC is almost surely not under-specified;
(2) BIC is almost surely not over-specified.

SLIDE 27

Strong consistency of AIC, BIC and Cp

Theorem 6 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A4) and (A5') hold.
(1) $\psi(\alpha, c) > 0 \Leftrightarrow$ Cp is almost surely not over-specified;
(2) Cp is almost surely not under-specified.

Remark 2: Under the condition $\phi(\alpha, c) > 0$, if the BIC is strongly consistent, then the AIC is strongly consistent, but not vice versa.

SLIDE 28

KOO methods based on the AIC, BIC, and Cp

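Knock-one-out (KOO) methods, introduced by Nishii et al. (1988), avoid the well-known computational problem of the AIC, BIC and Cp, which require evaluating the criterion for every candidate subset of $\omega$. Denote

\[ \tilde A_j := \frac{1}{n}(\mathrm{AIC}_{\omega\setminus j} - \mathrm{AIC}_\omega) = \log|\hat\Sigma_{\omega\setminus j}| - \log|\hat\Sigma_\omega| - 2p/n, \]

\[ \tilde B_j := \frac{1}{n}(\mathrm{BIC}_{\omega\setminus j} - \mathrm{BIC}_\omega) = \log|\hat\Sigma_{\omega\setminus j}| - \log|\hat\Sigma_\omega| - \log(n)p/n, \]

\[ \tilde C_j := \frac{1}{n}(\mathrm{Cp}_{\omega\setminus j} - \mathrm{Cp}_\omega) = (1 - k/n)\,\mathrm{tr}(\hat\Sigma_\omega^{-1}\hat\Sigma_{\omega\setminus j}) - (n - k + 2)p/n. \]

Choose the model:

\[ \tilde{\mathbf{j}}_A = \{j \in \omega \mid \tilde A_j > 0\}, \quad \tilde{\mathbf{j}}_B = \{j \in \omega \mid \tilde B_j > 0\}, \quad \tilde{\mathbf{j}}_C = \{j \in \omega \mid \tilde C_j > 0\}. \]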
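The three KOO statistics need only k + 1 model fits, one for the full model and one for each knocked-out predictor. The following NumPy sketch follows the definitions above; the helper names, the toy dimensions and the data-generating choices (standard normal predictors, two active predictors) are assumptions for illustration.

```python
import numpy as np

def logdet_and_sigma(Y, X, cols):
    """Return (log|Sigma_hat|, Sigma_hat) for the model using the columns in `cols`."""
    Xj = X[:, cols]
    coef, *_ = np.linalg.lstsq(Xj, Y, rcond=None)
    resid = Y - Xj @ coef
    S = resid.T @ resid / Y.shape[0]
    return np.linalg.slogdet(S)[1], S

def koo_statistics(Y, X):
    """Return the arrays (A_tilde, B_tilde, C_tilde), one entry per predictor."""
    n, p = Y.shape
    k = X.shape[1]
    full = list(range(k))
    logdet_full, S_full = logdet_and_sigma(Y, X, full)
    A, B, C = np.empty(k), np.empty(k), np.empty(k)
    for j in full:
        rest = [i for i in full if i != j]
        logdet_j, S_j = logdet_and_sigma(Y, X, rest)
        diff = logdet_j - logdet_full
        A[j] = diff - 2 * p / n
        B[j] = diff - np.log(n) * p / n
        C[j] = (1 - k / n) * np.trace(np.linalg.solve(S_full, S_j)) - (n - k + 2) * p / n
    return A, B, C

# Toy data (assumed): predictors 0 and 1 are active.
rng = np.random.default_rng(0)
n, p, k = 200, 3, 6
X = rng.standard_normal((n, k))
Theta = np.zeros((k, p))
Theta[:2] = 2.0
Y = X @ Theta + rng.standard_normal((n, p))

A, B, C = koo_statistics(Y, X)
selected = {name: np.flatnonzero(stat > 0).tolist() for name, stat in zip("ABC", (A, B, C))}
print(selected)
```

Each predictor is kept or dropped on its own statistic, so the selected set is read off in a single pass instead of a search over subsets.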

SLIDE 29

KOO methods based on the AIC, BIC, and Cp

Note that for testing $\theta_j = 0$ vs. $\theta_j \ne 0$:

(1) the $-2\log$ likelihood ratio statistic under normality can be expressed as
\[ n\left[\log(|\hat\Sigma_{\omega\setminus j}|) - \log(|\hat\Sigma_\omega|)\right]; \]

(2) the Lawley-Hotelling trace statistic under normality can be expressed as
\[ (n - k)\,\mathrm{tr}(\hat\Sigma_\omega^{-1}\hat\Sigma_{\omega\setminus j}); \]

(3) $\tilde A_j$ ($\tilde B_j$, $\tilde C_j$) is regarded as a measure that expresses the degree of contribution of $x_j$ based on the AIC (BIC, Cp).

As such, the KOO methods may also be referred to as test-based methods.

SLIDE 30

KOO methods based on the AIC, BIC, and Cp

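Theorem 7 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A5) hold.
(1) $\log\left(\frac{1-\alpha}{1-\alpha-c}\right) < 2c \Leftrightarrow \tilde{\mathbf{j}}_A$ is almost surely not over-specified;
(2) If $\log\left(\frac{1-\alpha}{1-\alpha-c}\right) < 2c$: for any $j \in \mathbf{j}_*$, $\log(\tau_{\omega\setminus j}) > \log(1 - \alpha - c) + 2c \Leftrightarrow \tilde{\mathbf{j}}_A$ is almost surely not under-specified.

Theorem 8 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A5) hold. Then $\tilde{\mathbf{j}}_B$ is almost surely under-specified.

Theorem 9 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A5) hold.
(1) $(1 - \alpha) < 2(1 - \alpha - c) \Leftrightarrow \tilde{\mathbf{j}}_C$ is almost surely not over-specified;
(2) If $(1 - \alpha) < 2(1 - \alpha - c)$: for any $j \in \mathbf{j}_*$, $\kappa_{\omega\setminus j} > \frac{c(1 - \alpha - 2c)}{1 - \alpha} \Leftrightarrow \tilde{\mathbf{j}}_C$ is almost surely not under-specified.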

SLIDE 31

KOO methods based on the AIC, BIC, and Cp

Theorem 10 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A4) and (A5') hold. Then $\log\left(\frac{1-\alpha}{1-\alpha-c}\right) < 2c \Leftrightarrow \tilde{\mathbf{j}}_A$ is almost surely consistent.

Theorem 11 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A4) and (A5') hold.
(1) For any $j \in \mathbf{j}_*$, $[\log(\tau_{\omega\setminus j}) - \log(n)c] > \log(1 - \alpha - c) \Leftrightarrow \tilde{\mathbf{j}}_B$ is almost surely not under-specified;
(2) $\tilde{\mathbf{j}}_B$ is almost surely not over-specified.

Theorem 12 (Bai, Fujikoshi and H. (2019)): Suppose (A1)-(A4) and (A5') hold. Then $(1 - \alpha) < 2(1 - \alpha - c) \Leftrightarrow \tilde{\mathbf{j}}_C$ is almost surely consistent.

SLIDE 32

General KOO methods

Recall the KOO AIC: $\log(|\hat\Sigma_{\omega\setminus j}|) - \log(|\hat\Sigma_\omega|) - 2p/n\ (> 0)$;

Figure: We chose a Gaussian sample with p = 750, n = 1500, k = 450 and k∗ = 5. Hence, c = 0.4 and α = 0.3. The histogram represents the distribution of the k values of $\log(|\hat\Sigma_{\omega\setminus j}|) - \log(|\hat\Sigma_\omega|) - 2p/n$. $M_1 = \log\left(\frac{1-\alpha}{1-\alpha-c}\right) - 2c$ and $Z_1 = 0$.

SLIDE 33

General KOO methods

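Denoting

\[ \breve A_j := \log(|\hat\Sigma_{\omega\setminus j}|) - \log(|\hat\Sigma_\omega|) \quad\text{and}\quad \breve C_j := \mathrm{tr}(\hat\Sigma_{\omega\setminus j}\hat\Sigma_\omega^{-1}), \]

and a fixed value $\vartheta \in (0, \min_{j \in \mathbf{j}_*}\{\kappa_{\omega\setminus j}\})$, choose the model

\[ \breve{\mathbf{j}}_A = \left\{j \in \omega \,\middle|\, \breve A_j > \log\left(\frac{1 - \alpha + \vartheta}{1 - \alpha - c}\right)\right\}, \quad \breve{\mathbf{j}}_C = \left\{j \in \omega \,\middle|\, \breve C_j > \frac{\vartheta + c}{1 - \alpha - c} + p\right\}. \]

Then, we have the following theorem.

Theorem 13: Suppose that assumptions (A1) through (A4) hold and that for any $j \in \mathbf{j}_*$, $\kappa_{\omega\setminus j} > 0$. Then, for any fixed value $\vartheta \in (0, \min_{j \in \mathbf{j}_*}\{\kappa_{\omega\setminus j}\})$, as $n, p \to \infty$,

\[ \breve{\mathbf{j}}_A \xrightarrow{a.s.} \mathbf{j}_* \quad\text{and}\quad \breve{\mathbf{j}}_C \xrightarrow{a.s.} \mathbf{j}_*. \]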

SLIDE 35

General KOO methods

Remark 3: The condition in this theorem is much weaker than those required by the AIC, BIC, and Cp and by the KOO methods based on the AIC, BIC, and Cp. Although $\kappa_{\omega\setminus j}$ is not estimable for $j \in \mathbf{j}_*$, the general KOO methods essentially amount to detecting univariate outliers, so many well-developed methods, such as the standard deviation (SD) method, Z-score method, Tukey’s method, and median absolute deviation method, can be used to determine the value of ϑ in applications.

SLIDE 37

Proof strategy

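(1) Sylvester’s determinant theorem:

\[ |n\hat\Sigma_j| = |Y'Q_{j-1}Y - Y'a_1a_1'Y| = |n\hat\Sigma_{j-1}|\,(1 - a_1'Y(Y'Q_{j-1}Y)^{-1}Y'a_1), \]

e.g. $\breve A_j := \log(|\hat\Sigma_{\omega\setminus j}|) - \log(|\hat\Sigma_\omega|)$ and $\breve C_j := \mathrm{tr}(\hat\Sigma_{\omega\setminus j}\hat\Sigma_\omega^{-1})$.

(2) Stieltjes transform: as a function of z, the quadratic form

\[ n^{-1}a_t'Y(n^{-1}Y'Q_{j-t}Y - zI)^{-1}Y'a_t \]

maps $\mathbb{C}^+ \to \mathbb{C}^+$.

(3) Vitali’s convergence theorem: for any fixed $z \in \mathbb{C}^+$, the quantity in (2) converges almost surely to its limit; then let $z \downarrow 0 + 0i$.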
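The determinant identity in step (1) can be verified numerically. In the sketch below, `Q_prev` is taken to be the projector onto the orthogonal complement of the columns already in the model and `a1` the normalized part of a new predictor orthogonal to them; this reading of $Q_{j-1}$ and $a_1$ is an assumption made here, as are the toy dimensions.

```python
import numpy as np

def complement_projector(X):
    """I_n - X (X'X)^{-1} X', the projector onto the orthogonal complement of span(X)."""
    n = X.shape[0]
    return np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

rng = np.random.default_rng(1)
n, p = 30, 4
Y = rng.standard_normal((n, p))
X = rng.standard_normal((n, 3))
x_new = rng.standard_normal(n)

Q_prev = complement_projector(X)
a1 = Q_prev @ x_new
a1 /= np.linalg.norm(a1)                      # unit vector orthogonal to span(X)
Q_next = complement_projector(np.column_stack([X, x_new]))

G = Y.T @ Q_prev @ Y                          # Y' Q_prev Y
u = Y.T @ a1                                  # Y' a1
lhs = np.linalg.det(Y.T @ Q_next @ Y)         # |Y' Q_prev Y - Y' a1 a1' Y|
rhs = np.linalg.det(G) * (1.0 - u @ np.linalg.solve(G, u))
print(lhs, rhs)
```

Adding one predictor therefore changes the determinant by a scalar factor built from a single quadratic form, which is the kind of quantity handled in steps (2) and (3).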

SLIDE 39

Simulation

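Setting I: Fix $k_* = 5$, $p/n \in \{0.2, 0.4, 0.6\}$ and $k/n \in \{0.1, 0.2\}$ with several different values of n. Set $X = (x_{ij})_{n\times k}$, $\Theta_{\mathbf{j}_*} = \sqrt{n}\,1_5\theta_*$ and $\Theta = (\Theta_{\mathbf{j}_*}, 0)$, where $\{x_{ij}\}$ are i.i.d. generated from the continuous uniform distribution U(1, 5), $1_5$ is a five-dimensional vector of ones and $\theta_* = ((-0.5)^0, \dots, (-0.5)^{p-1})$.

Setting II: This setting is the same as Setting I, except $\Theta_{\mathbf{j}_*} = n\,1_5\theta_*$.

Here, we use the 2 SD method to choose the critical points in the general KOO methods:

\[ \breve{\mathbf{j}}_A = \left\{j \in \omega \,\middle|\, \breve A_j > \log\left(\frac{1 - \alpha}{1 - \alpha - c}\right) + 2\,\mathrm{sd}_A\right\} \quad\text{and}\quad \breve{\mathbf{j}}_C = \left\{j \in \omega \,\middle|\, \breve C_j > \frac{c}{1 - \alpha - c} + p + 2\,\mathrm{sd}_C\right\}, \]

where $\mathrm{sd}_A$ and $\mathrm{sd}_C$ are the sample standard deviations of $\{\breve A_j\}$ and $\{\breve C_j\}$, respectively.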
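A small-scale version of Setting I with the 2 SD rule for $\breve A_j$ can be sketched as follows. Several choices are assumptions, not taken from the slides: Gaussian errors with $\Sigma = I_p$, the sizes n = 200, p = 40, k = 20 (so that p/n = 0.2 and k/n = 0.1), the plug-in values α = k/n and c = p/n, and all function names.

```python
import numpy as np

def simulate_setting_one(n, p, k, k_star=5, seed=0):
    """Generate (X, Y) following Setting I, with Gaussian errors and Sigma = I assumed."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(1.0, 5.0, size=(n, k))
    theta_star = (-0.5) ** np.arange(p)            # ((-0.5)^0, ..., (-0.5)^(p-1))
    Theta = np.zeros((k, p))
    Theta[:k_star] = np.sqrt(n) * np.outer(np.ones(k_star), theta_star)
    E = rng.standard_normal((n, p))
    return X, X @ Theta + E

def residual_logdet(Y, X, cols):
    """log|Sigma_hat| for the model using the columns in `cols`."""
    Xj = X[:, cols]
    coef, *_ = np.linalg.lstsq(Xj, Y, rcond=None)
    resid = Y - Xj @ coef
    return np.linalg.slogdet(resid.T @ resid / Y.shape[0])[1]

def general_koo_aic(Y, X):
    """Return (A_breve, threshold, selected indices) under the 2 SD rule."""
    n, p = Y.shape
    k = X.shape[1]
    alpha, c = k / n, p / n                        # plug-in ratios (assumption)
    full = list(range(k))
    base = residual_logdet(Y, X, full)
    A = np.array([residual_logdet(Y, X, [i for i in full if i != j]) - base for j in full])
    threshold = np.log((1 - alpha) / (1 - alpha - c)) + 2 * A.std(ddof=1)
    return A, threshold, np.flatnonzero(A > threshold)

X, Y = simulate_setting_one(n=200, p=40, k=20)
A, threshold, selected = general_koo_aic(Y, X)
print(np.round(A, 2), round(float(threshold), 2), selected.tolist())
```

The first five entries of `A` correspond to the true predictors, so comparing them with the remaining entries and with `threshold` shows how the rule separates signal from noise in a given run.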

SLIDE 40

                 c = .2                     c = .4                     c = .6
          V1    V2    V3    V4       V1    V2    V3    V4       V1    V2    V3    V4
α = .1   .15   .50   .87   1.49     .21   .10   .81   1.56     .10  −.30   .92   1.80
α = .2   .11   .40   .91   1.32     .11    0    .92   1.43    −.19  −.40  1.21   1.72

Table: Values of $V_1 := 2c - \log\left(\frac{1-\alpha}{1-\alpha-c}\right)$, $V_2 := 2(1 - \alpha - c) - (1 - \alpha)$, $V_3 := \log(\tau_{\omega\setminus\{1\}}) - \log(1 - \alpha - c) - 2c$, and $V_4 := \mathrm{tr}(\Phi_{\omega\setminus j}) - \frac{c(1 - \alpha - 2c)}{1 - \alpha}$.
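The columns V1 and V2 depend only on α and c, so they can be recomputed from their definitions. The sketch below does this for the six (α, c) pairs in the table; V3 and V4 are left out because they depend on the design and coefficient matrices. The function names are chosen here for illustration.

```python
import math

def v1(alpha, c):
    """V1 = 2c - log((1 - alpha)/(1 - alpha - c)); positive iff the KOO-AIC over-specification condition holds."""
    return 2 * c - math.log((1 - alpha) / (1 - alpha - c))

def v2(alpha, c):
    """V2 = 2(1 - alpha - c) - (1 - alpha); positive iff the KOO-Cp over-specification condition holds."""
    return 2 * (1 - alpha - c) - (1 - alpha)

table = {
    (alpha, c): (round(v1(alpha, c), 2), round(v2(alpha, c), 2))
    for alpha in (0.1, 0.2)
    for c in (0.2, 0.4, 0.6)
}
for key, values in table.items():
    print(key, values)
```

A negative entry marks a configuration where the corresponding KOO method is not guaranteed to avoid over-specification, according to Theorems 7 and 9.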

SLIDE 41

(a) Setting I

SLIDE 42

(b) Setting I

SLIDE 43

(c) Setting II

SLIDE 44

(d) Setting II

SLIDE 45

Conclusion

We show the necessary and sufficient conditions for the strong consistency of variable selection methods based on the AIC, BIC, and Cp in high-dimensional-response regression;

We examine the strong consistency of the knock-one-out methods based on the AIC, BIC, and Cp;

On the basis of the KOO methods, we propose two general KOO methods that not only remove the penalty terms but also relax the conditions on the dimensions and sizes of the predictors.

Random matrix theory is introduced to the high-dimensional-response regression model.

Thank you!
