Learning and Reasoning With Incomplete Data: Foundations and Algorithms
Manfred Jaeger
Machine Intelligence Group, Aalborg University
Tutorial UAI 2010
Outline
Part 1: Coarsened At Random
◮ Introduction
◮ Coarse Data
◮ The CAR Assumption
◮ ... coin rolled off the table?
◮ ... one observer does not know whether “harp” is heads or tails of the Irish Euro?
◮ Partly observed values: X3 = high
◮ Constraints on multiple variables: X1 = true or X2 = true
◮ More general than missing values
◮ Same as partial information in probability updating
◮ cf. prize ∈ {1, 3}
◮ Simplifies theoretical analysis
◮ Finite set of states (possible worlds): W = {x1, . . . , xn}
◮ Complete data variable X with values in W, governed by distribution Pθ (θ ∈ Θ).
◮ Incomplete data variable Y with values in 2^W, governed by the conditional distribution λ(U | x) = P(Y = U | X = x) (the coarsening mechanism), with λ(U | x) > 0 only if x ∈ U.
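A minimal sketch of this coarse-data model in code. The state space, distribution, and coarsening probabilities below are invented for illustration; the only structural requirement is that every set U with λ(U | x) > 0 contains x.

```python
import random

# Toy instance of the coarse-data model (illustrative values):
# W is the state space, p_theta the complete-data distribution,
# lam[x] the coarsening mechanism P(Y = U | X = x).
W = ["a", "b", "c"]
p_theta = {"a": 0.5, "b": 0.3, "c": 0.2}
lam = {
    "a": {frozenset({"a"}): 0.4, frozenset({"a", "b"}): 0.6},
    "b": {frozenset({"b"}): 0.7, frozenset({"a", "b"}): 0.3},
    "c": {frozenset({"c"}): 1.0},
}

def sample_coarse(rng):
    # Draw the complete value X, then coarsen it to an observed set Y.
    x = rng.choices(W, weights=[p_theta[w] for w in W])[0]
    sets = list(lam[x])
    y = rng.choices(sets, weights=[lam[x][u] for u in sets])[0]
    return x, y

rng = random.Random(0)
for _ in range(100):
    x, y = sample_coarse(rng)
    assert x in y  # the observation never excludes the true state
```

Only the coarse set Y is visible to the learner; the complete value X is latent.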
The likelihood of incomplete observations U1, . . . , Un:

L(θ, λ) = ∏_{i=1}^n P(Y = Ui) = ∏_{i=1}^n Σ_{x∈Ui} λ(Ui | x) Pθ(x)

Under CAR, λ(Ui | x) = λi for all x ∈ Ui, and the likelihood factorizes:

L(θ, λ) = ∏_{i=1}^n λi · ∏_{i=1}^n Σ_{x∈Ui} Pθ(x)

The second factor is the face-value likelihood ∏_i Pθ(Ui).
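Under CAR the full likelihood splits into a λ-part times the face-value likelihood. A numeric check of this factorization, with an assumed s-CAR mechanism (λ depends on the set only, and the λ-values covering each state sum to 1):

```python
import math

# Illustrative s-CAR mechanism on W = {a, b, c}: lam(U | x) = lam(U)
# for all x in U (values assumed for this sketch).
p_theta = {"a": 0.5, "b": 0.3, "c": 0.2}
lam = {
    frozenset({"a"}): 0.7,
    frozenset({"b"}): 0.7,
    frozenset({"c"}): 1.0,
    frozenset({"a", "b"}): 0.3,
}
# Consistency: for every state x, the lam(U) over U containing x sum to 1.
for x in p_theta:
    assert math.isclose(sum(l for U, l in lam.items() if x in U), 1.0)

observations = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"c"})]

full = math.prod(sum(lam[U] * p_theta[x] for x in U) for U in observations)
lam_part = math.prod(lam[U] for U in observations)
face_value = math.prod(sum(p_theta[x] for x in U) for U in observations)

# CAR: full likelihood = lam-part times face-value likelihood.
assert math.isclose(full, lam_part * face_value)
```

Because the λ-part does not involve θ, maximizing the face-value likelihood alone recovers the θ-maximizer of the full likelihood.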
◮ Learning by maximization of face-value likelihood (EM algorithm)
◮ Belief updating by conditioning
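A minimal EM sketch for maximizing the face-value likelihood of a categorical distribution from coarse data (state space and observations invented for illustration): the E-step distributes each coarse observation over its states in proportion to the current θ, the M-step re-normalizes the expected counts.

```python
import math

# Made-up coarse observations over W = {a, b, c}.
W = ["a", "b", "c"]
data = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"a", "b"}),
        frozenset({"b"}), frozenset({"c"})]

def face_value_loglik(theta):
    return sum(math.log(sum(theta[x] for x in U)) for U in data)

theta = {x: 1.0 / len(W) for x in W}  # uniform start
prev = face_value_loglik(theta)
for _ in range(50):
    # E-step: fractional imputation proportional to the current theta.
    expected = {x: 0.0 for x in W}
    for U in data:
        z = sum(theta[x] for x in U)
        for x in U:
            expected[x] += theta[x] / z
    # M-step: re-normalize the expected counts.
    theta = {x: expected[x] / len(data) for x in W}
    cur = face_value_loglik(theta)
    assert cur >= prev - 1e-12  # EM never decreases the likelihood
    prev = cur
```

On this symmetric toy data EM converges to θ = (0.4, 0.4, 0.2), the face-value maximum-likelihood estimate.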
max_{θ,λ} L(θ, λ) = max_λ ∏_{i=1}^n λi · max_θ ∏_{i=1}^n Σ_{x∈Ui} Pθ(x)
◮ Only if the domain of λ-maximization is independent of θ
◮ “Parameter distinctness” [1]
◮ The domain of λ-maximization must not depend on support(Pθ)
◮ If we assume only weak CAR, the domain of λ-maximization does depend on support(Pθ)
◮ Need the stronger s-CAR assumption
◮ Observed empirical distribution of Y
◮ Learned distribution of X under s-CAR assumption
◮ s-CAR assumption

◮ Observed empirical distribution of Y
◮ Learned distribution of X under w-CAR assumption
◮ w-CAR assumption
◮ The marginal for Y is P
◮ The joint is s-CAR
◮ Set-of-support analysis
◮ Likelihood-based tests
◮ Compare to canonical models
◮ P(X) has a given set of support. For simplicity: all of W
◮ Linear and affine relationships between rows of the CARacterizing matrix
◮ ‘Graphical’ criteria on the evidence hypergraph
◮ In particular: nested edges ⇒ not CAR
◮ Which types of data-coarsening processes or protocols generate CAR data?
◮ Is there a general process model that can explain all and only CAR data?
◮ A partition W of W
◮ P(Y = U | X) = 1 if X ∈ U ∈ W, 0 otherwise
◮ Partitions W1, . . . , Wk of W
◮ Probabilities λ1, . . . , λk
◮ P(Y = U | X) = Σ_{i : X∈U∈Wi} λi
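A small CCAR sketch (partitions and probabilities assumed for illustration): choose partition Wi with probability λi and report the cell of Wi containing X. The check below confirms that P(Y = U | X = x) does not depend on which x ∈ U occurred, i.e. the mechanism is s-CAR.

```python
import math

# Assumed CCAR model on W = {1, 2, 3, 4}: two partitions, chosen with
# probabilities 0.6 and 0.4.
W = [1, 2, 3, 4]
partitions = [
    [frozenset({1, 2}), frozenset({3, 4})],
    [frozenset({1}), frozenset({2, 3, 4})],
]
lams = [0.6, 0.4]

def p_y_given_x(U, x):
    # P(Y = U | X = x): sum lam_i over partitions in which U is the
    # cell covering x.
    return sum(l for part, l in zip(partitions, lams)
               if U in part and x in U)

cells = {U for part in partitions for U in part}
# s-CAR: for each possible observation U, P(Y = U | X = x) is the
# same positive value for every x in U.
for U in cells:
    vals = {p_y_given_x(U, x) for x in U}
    assert len(vals) == 1 and vals.pop() > 0
# The conditional distributions are properly normalized.
for x in W:
    assert math.isclose(sum(p_y_given_x(U, x) for U in cells), 1.0)
```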
◮ Random choice of one of k available sensors or tests
◮ Report “coarse measurement” from chosen sensor/test
◮ Randomized Monotone Coarsening [3]
◮ CARgen [4]
◮ Both equivalent to CCAR
◮ Generates s-CAR data
◮ Can generate non-MGD
◮ Cannot generate all s-CAR data
◮ Relies on uniform sampling over W
◮ CARgen* [4]
◮ Propose & Test [6]
◮ Data generated by a P & T process where Σ_{U : x∈U} λ(U) is the same for all x ∈ W
◮ ⇒ Data is w-CAR
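A sketch of a Propose & Test mechanism with the constant cover-sum condition (proposal weights assumed): propose a set U with probability prop(U), accept if the true state x lies in U, otherwise re-propose. When the acceptance probability Σ_{U: x∈U} prop(U) is the same for every x, the accepted-set distribution prop(U)/cover(x) does not depend on which x ∈ U occurred, so the data is w-CAR.

```python
import math

# Assumed proposal distribution over subsets of W = {a, b, c}: all
# three two-element sets, each with probability 1/3.
W = ["a", "b", "c"]
prop = {
    frozenset({"a", "b"}): 1 / 3,
    frozenset({"b", "c"}): 1 / 3,
    frozenset({"a", "c"}): 1 / 3,
}

# Acceptance probability ("cover sum") for each state.
cover = {x: sum(p for U, p in prop.items() if x in U) for x in W}

# The condition from the slide: the cover sum is constant in x.
assert all(math.isclose(cover[x], 2 / 3) for x in W)

# Hence P(Y = U | X = x) = prop(U) / cover(x) is the same for all
# x in U, i.e. the mechanism is w-CAR.
for U, p in prop.items():
    vals = {p / cover[x] for x in U}
    assert len(vals) == 1
```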
◮ A similar parameter condition is included in the CARgen* model.
◮ A CAR procedure is robust
◮ A CAR procedure is CCAR (i.e., MGD)
◮ Uniform sampling over a complex combinatorial space (multicovers)
◮ Profile likelihood: L^pr(θ) = max_λ L(θ, λ)
◮ Maximization of L^pr(θ) via min_{c∈C(U)} CE(Pc, Pθ)
◮ C(U): space of fractional completions of data U
◮ Pc: empirical distribution defined by c ∈ C(U)
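An illustrative computation of min_{c∈C(U)} CE(Pc, Pθ) for a fixed θ (data and parameter values made up). Since CE(Pc, Pθ) is linear in Pc, and Pc is affine in c, the minimum over fractional completions is attained at a deterministic completion; it imputes each observed set to its most probable state under Pθ.

```python
import math
from itertools import product

# Assumed parameter and coarse observations.
p_theta = {"a": 0.5, "b": 0.3, "c": 0.2}
data = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b", "c"})]
n = len(data)

def cross_entropy(p_c):
    # CE(Pc, Ptheta) = -sum_x Pc(x) log Ptheta(x)
    return -sum(p * math.log(p_theta[x]) for x, p in p_c.items() if p > 0)

# Brute force over all deterministic completions (vertices of C(U)).
best = math.inf
for choice in product(*[sorted(U) for U in data]):
    p_c = {x: choice.count(x) / n for x in set(choice)}
    best = min(best, cross_entropy(p_c))

# Shortcut: impute each set to the argmax of p_theta within it.
greedy = -sum(math.log(max(p_theta[x] for x in U)) for U in data) / n
assert math.isclose(best, greedy)
```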
[Diagram: from the data, obtain the empirical distribution P(Y) and learn P(X) using a parametric model for P(X) and the CAR assumption; then test whether the pair P(X), P(Y) is CAR-compatible]
◮ Computation of likelihood ratios
◮ Analysis of distribution of test statistic
◮ Learning from incomplete data with EM
◮ Belief updating by conditioning

◮ Support analysis
◮ Canonical models
◮ Most (all?) natural CAR models are CCAR, i.e. MGD

◮ Maximize profile likelihood under NO assumptions using AI&M
◮ Can be the basis for quantitative statistical CAR tests.