
Integrating inconsistent data in a probabilistic model, Jiří Vomlel



  1. Integrating inconsistent data in a probabilistic model. Jiří Vomlel. This presentation is available at http://www.utia.cas.cz/vomlel/

  2. Knowledge integration
     • Discrete random variables $X_i$ indexed by natural numbers from $V = \{1, \ldots, n\} \subset \mathbb{N}$.
     • Low-dimensional probability distributions $P_j$, $j = 1, \ldots, k$, defined on variables $\{X_\ell\}_{\ell \in E_j}$, $E_j \subseteq V$.
     • Knowledge integration is the process of building a joint probability distribution $Q(X_1, \ldots, X_n)$ from a set of low-dimensional probability distributions $\mathcal{P} = \{P_1, \ldots, P_k\}$.

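  3. [Figure: three binary variables $X_1$, $X_2$, $X_3$. Each pairwise marginal table has entries $\frac{1}{2} - \alpha$ and $\alpha$; the entries of the joint distribution are unknown (marked "?").]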

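  4. [Figure: the same diagram over $X_1$, $X_2$, $X_3$, with marginal entries $\frac{1}{2} - \alpha$ and $\alpha$, annotated with inequalities in $\alpha$ on the unknown joint entries. The recoverable annotations are bounds of the form "$\le \alpha$" and the condition $\frac{1}{6} \le \alpha$.]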

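  5. Consistent case, $\alpha = \frac{4}{20}$
     [Figure: the pairwise marginal tables have entries $\frac{6}{20}$ and $\frac{4}{20}$. The joint distribution over $X_1$, $X_2$, $X_3$ shown has six entries equal to $\frac{3}{20}$ and two equal to $\frac{1}{20}$.]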

  6. • input set: $\mathcal{P} = \{P_1, \ldots, P_k\}$
     • set of all distributions having $P_j$ as its marginal: $\mathcal{S}_j = \{Q : Q^{E_j} = P_j\}$
     • set of all distributions having $\{P_1, \ldots, P_k\}$ as its marginals: $\mathcal{S} = \bigcap_{j=1}^{k} \mathcal{S}_j$
     • I-projection of $Q_0$ to $\mathcal{S}$: $\pi(Q_0, \mathcal{S}) = \arg\min_{Q \in \mathcal{S}} I(Q \,\|\, Q_0)$
     • Kullback-Leibler divergence: $I(P \,\|\, Q) = \sum_{x} P(X = x) \log \frac{P(X = x)}{Q(X = x)}$
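As a minimal sketch of the Kullback-Leibler divergence $I(P \,\|\, Q)$ defined above, assuming both distributions are given as NumPy arrays of the same shape: the function name `kl_divergence` is hypothetical, and the conventions $0 \log 0 = 0$ and $I = \infty$ when $Q$ vanishes where $P$ does not are the usual ones, not something the slides state.

```python
import math

import numpy as np


def kl_divergence(p, q):
    """I(P || Q) = sum_x P(x) log(P(x) / Q(x)), natural logarithm."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    support = p > 0                      # terms with P(x) = 0 contribute 0
    if np.any(q[support] == 0):          # P puts mass where Q has none
        return math.inf
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


# The divergence is zero for identical distributions and is not symmetric.
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))
print(kl_divergence([0.5, 0.5], [0.25, 0.75]))
print(kl_divergence([0.25, 0.75], [0.5, 0.5]))
```

The I-projection $\pi(Q_0, \mathcal{S})$ is then the member of $\mathcal{S}$ that minimizes this quantity with $Q_0$ in the second argument.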

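  7. Iterative Proportional Fitting Procedure (IPFP), Deming & Stephan, 1940
     [Figure: starting from $Q_{(0)}$, successive I-projections onto $\mathcal{S}_1$, $\mathcal{S}_2$, $\mathcal{S}_3$ in turn: $Q_{(1)} = \pi(Q_{(0)}, \mathcal{S}_1)$, $Q_{(2)} = \pi(Q_{(1)}, \mathcal{S}_2)$, $Q_{(3)} = \pi(Q_{(2)}, \mathcal{S}_3)$, $Q_{(4)}$, ...]
     $\pi(Q, \mathcal{S}_j) = Q \cdot \frac{P_j}{Q^{E_j}}$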
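The update $\pi(Q, \mathcal{S}_j) = Q \cdot P_j / Q^{E_j}$ can be sketched for the three-variable example as follows. Several things here are assumptions rather than facts from the slides: the joint is stored as a 2×2×2 array indexed `[x1, x2, x3]`, the input sets are taken to be $E_1 = \{1, 2\}$, $E_2 = \{2, 3\}$, $E_3 = \{1, 3\}$, each pairwise table is taken to carry $\alpha$ where the two variables are equal and $\frac{1}{2} - \alpha$ where they differ, and the starting point is the uniform distribution. The helper names are hypothetical.

```python
import numpy as np


def pair_table(alpha):
    """Assumed layout: alpha for equal values, 1/2 - alpha for different ones."""
    return np.array([[alpha, 0.5 - alpha],
                     [0.5 - alpha, alpha]])


def ipfp_step(q, p, pair):
    """One I-projection onto S_j: multiply Q by P_j / Q^{E_j}."""
    other = tuple(a for a in range(q.ndim) if a not in pair)
    m = q.sum(axis=other, keepdims=True)          # marginal Q^{E_j}
    target = p.reshape(m.shape)                   # P_j aligned with the axes in E_j
    ratio = np.divide(target, m, out=np.zeros_like(m), where=m > 0)
    return q * ratio


alpha = 0.2                                       # the consistent case, 4/20
p = pair_table(alpha)
pairs = [(0, 1), (1, 2), (0, 2)]                  # assumed E_1, E_2, E_3 (0-based axes)

q = np.full((2, 2, 2), 1.0 / 8.0)                 # uniform starting point (assumption)
for _ in range(200):                              # 200 sweeps over P1, P2, P3
    for pair in pairs:
        q = ipfp_step(q, p, pair)

print(np.round(q.ravel() * 20, 4))                # joint entries in units of 1/20
```

Under these assumptions the iterates should approach a joint with $\frac{1}{20}$ at $x = 000$ and $x = 111$ and $\frac{3}{20}$ elsewhere, which has all three prescribed marginals.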

  8. IPFP on the consistent input
     [Plot: probability value (0 to 0.2) against iteration number (0 to 15), for IPFP with ordering P1,P2,P3 and IPFP with ordering P1,P3,P2.]

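  9. Inconsistent case, $\alpha = \frac{1}{10}$
     [Figure: the pairwise marginal tables have entries $\frac{4}{10}$ and $\frac{1}{10}$. The entries of the joint distribution over $X_1$, $X_2$, $X_3$ are unknown (marked "?").]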

  10. IPFP on the inconsistent input
      [Plot: probability value (0 to 0.35) against iteration number (0 to 21), for IPFP with ordering P1,P2,P3 and IPFP with ordering P1,P3,P2.]

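  11. Let $r = \sqrt{-3\alpha^2 + 2\alpha}$, $\beta = 0.5\,(1 - \alpha - r)$, and $\gamma = 0.5\,(-\alpha + r)$.

      The limit cycle for the ordering $P_1, P_2, P_3$:

      | $x$ | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
      |---|---|---|---|---|---|---|---|---|
      | $\lim_{n \to \infty} Q_{3n+1}(x)$ | 0 | $\alpha$ | $\gamma$ | $\beta$ | $\beta$ | $\gamma$ | $\alpha$ | 0 |
      | $\lim_{n \to \infty} Q_{3n+2}(x)$ | 0 | $\gamma$ | $\beta$ | $\alpha$ | $\alpha$ | $\beta$ | $\gamma$ | 0 |
      | $\lim_{n \to \infty} Q_{3n+3}(x)$ | 0 | $\beta$ | $\alpha$ | $\gamma$ | $\gamma$ | $\alpha$ | $\beta$ | 0 |
      | arithm. average | 0 | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | 0 |

      The limit cycle for the ordering $P_1, P_3, P_2$:

      | $x$ | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
      |---|---|---|---|---|---|---|---|---|
      | $\lim_{n \to \infty} Q_{3n+1}(x)$ | 0 | $\alpha$ | $\beta$ | $\gamma$ | $\gamma$ | $\beta$ | $\alpha$ | 0 |
      | $\lim_{n \to \infty} Q_{3n+2}(x)$ | 0 | $\gamma$ | $\alpha$ | $\beta$ | $\beta$ | $\alpha$ | $\gamma$ | 0 |
      | $\lim_{n \to \infty} Q_{3n+3}(x)$ | 0 | $\beta$ | $\gamma$ | $\alpha$ | $\alpha$ | $\gamma$ | $\beta$ | 0 |
      | arithm. average | 0 | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | 0 |

      For $\alpha = 0.1$ we get $\beta \doteq 0.244$ and $\gamma \doteq 0.156$.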
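The limit cycle for the ordering $P_1, P_2, P_3$ can be checked numerically. The sketch below rests on assumptions that the flattened slide does not confirm: states $x = x_1 x_2 x_3$ are stored in a 2×2×2 array indexed `[x1, x2, x3]`, the two zero entries sit at $x = 000$ and $x = 111$ (the placement for which each row reproduces one pairwise marginal), the input sets are $E_1 = \{1, 2\}$, $E_2 = \{2, 3\}$, $E_3 = \{1, 3\}$, and each pairwise table carries $\alpha$ where the two variables are equal. Under these assumptions one IPFP step maps each row of the cycle to the next.

```python
import math

import numpy as np

alpha = 0.1
r = math.sqrt(-3 * alpha**2 + 2 * alpha)
beta = 0.5 * (1 - alpha - r)
gamma = 0.5 * (-alpha + r)
print(round(beta, 3), round(gamma, 3))            # the slide reports 0.244 and 0.156

# Assumed pairwise table: alpha for equal values, 1/2 - alpha otherwise.
p = np.array([[alpha, 0.5 - alpha],
              [0.5 - alpha, alpha]])


def ipfp_step(q, pair):
    """I-projection onto S_j: Q * P_j / Q^{E_j} (marginals are positive here)."""
    other = tuple(a for a in range(3) if a not in pair)
    m = q.sum(axis=other, keepdims=True)
    return q * p.reshape(m.shape) / m


def cube(values):
    """Entries listed for x = 000, 001, ..., 111, stored as [x1, x2, x3]."""
    return np.array(values, dtype=float).reshape(2, 2, 2)


a, b, g = alpha, beta, gamma
q1 = cube([0, a, g, b, b, g, a, 0])               # after fitting P1 on (X1, X2)
q2 = cube([0, g, b, a, a, b, g, 0])               # after fitting P2 on (X2, X3)
q3 = cube([0, b, a, g, g, a, b, 0])               # after fitting P3 on (X1, X3)

print(np.allclose(ipfp_step(q1, (1, 2)), q2))     # Q_{3n+1} -> Q_{3n+2}
print(np.allclose(ipfp_step(q2, (0, 2)), q3))     # Q_{3n+2} -> Q_{3n+3}
print(np.allclose(ipfp_step(q3, (0, 1)), q1))     # Q_{3n+3} -> Q_{3n+1}
print(np.round(((q1 + q2 + q3) / 3).ravel(), 4))  # arithmetic average of the cycle
```

The closure of the cycle depends on the identity $\gamma^2 = \alpha \beta$, which follows directly from the definitions of $r$, $\beta$, and $\gamma$.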

  12. Inconsistent input set $\mathcal{P} = \{P_1, \ldots, P_k\}$
      It means that $\mathcal{S} = \bigcap_{j=1}^{k} \mathcal{S}_j = \emptyset$. $Q(X_1, \ldots, X_n)$ is required to:
      • minimize a distance aggregate with respect to $\mathcal{P}$: $\sum_{P_j \in \mathcal{P}} w_j \cdot d(P_j \,\|\, Q^{E_j})$
      • factorize with respect to $\mathcal{E} = \{E_1, \ldots, E_k\}$: there exist potentials $\psi_{E_i} : \mathbb{X}_{E_i} \mapsto \mathbb{R}$, $i = 1, 2, \ldots, k$, such that for all $x \in \mathbb{X}$: $Q(x) = \prod_{E_i \in \mathcal{E}} \psi_{E_i}(x_{E_i})$.

  13. Distance
      • measured by the Kullback-Leibler divergence: $d(P_j \,\|\, Q^{E_j}) = I(P_j \,\|\, Q^{E_j}) = \sum_{x_{E_j}} P_j(x_{E_j}) \log \frac{P_j(x_{E_j})}{Q^{E_j}(x_{E_j})}$
      • measured by the total variance: $d(P_j \,\|\, Q^{E_j}) = |P_j - Q^{E_j}| = \sum_{x_{E_j}} |P_j(x_{E_j}) - Q^{E_j}(x_{E_j})|$
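A minimal sketch of the distance aggregate $\sum_j w_j \cdot d(P_j \,\|\, Q^{E_j})$ with both distances, evaluated on the inconsistent example with $\alpha = 0.1$. The function names, the unit weights $w_j = 1$, the array layout `[x1, x2, x3]`, the choice $E_1 = \{1, 2\}$, $E_2 = \{2, 3\}$, $E_3 = \{1, 3\}$, and the placement of $\alpha$ on the equal-value cells are all assumptions made for illustration. The candidate $Q$ puts $\frac{1}{6}$ on every state except $000$ and $111$.

```python
import math

import numpy as np


def marginal(q, pair):
    """Marginal Q^{E_j} of the joint q over the axes listed in pair."""
    other = tuple(a for a in range(q.ndim) if a not in pair)
    return q.sum(axis=other)


def kl(p, q):
    """Kullback-Leibler divergence I(P || Q), natural logarithm."""
    s = p > 0
    if np.any(q[s] == 0):
        return math.inf
    return float(np.sum(p[s] * np.log(p[s] / q[s])))


def abs_diff(p, q):
    """The slides' 'total variance': sum of absolute differences."""
    return float(np.abs(p - q).sum())


def aggregate(q, inputs, d, weights=None):
    """sum_j w_j * d(P_j, Q^{E_j}); weights default to 1 (assumption)."""
    if weights is None:
        weights = [1.0] * len(inputs)
    return sum(w * d(p, marginal(q, pair)) for w, (p, pair) in zip(weights, inputs))


alpha = 0.1
table = np.array([[alpha, 0.5 - alpha], [0.5 - alpha, alpha]])
inputs = [(table, (0, 1)), (table, (1, 2)), (table, (0, 2))]

q = np.full((2, 2, 2), 1.0 / 6.0)
q[0, 0, 0] = 0.0
q[1, 1, 1] = 0.0

print(aggregate(q, inputs, abs_diff))   # 3 * 4 * |0.1 - 1/6|, about 0.8
print(aggregate(q, inputs, kl))
```

Passing the distance as a function keeps the aggregate itself independent of which of the two measures is being minimized.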

  14. IPFP properties in the inconsistent case
      • "converges" to a limit cycle
      • distributions in the limit cycle are different for different orderings of the input set
      • in the example, the average of the distributions in the limit cycle does not depend on the ordering, but this is not true in general
      • in the example, the distributions in the limit cycles minimize the aggregate of the total variance, but whether this holds in general is not known
      • there are also other distributions that minimize the aggregate of the total variance and that are not computed by IPFP
      • generally, the distributions in the limit cycles do not minimize the aggregate of the Kullback-Leibler divergence
      • distributions computed within a finite number of iterations factorize with respect to $\mathcal{E} = \{E_1, \ldots, E_k\}$

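  15. GEMA
      [Figure: GEMA iteration scheme over the sets $\mathcal{S}_1$, $\mathcal{S}_2$, $\mathcal{S}_3$ and $\mathcal{S}'_1$, $\mathcal{S}'_2$, $\mathcal{S}'_3$. Labelled I-projections include $\pi(Q_{(0)}, \mathcal{S}_1)$, $\pi(Q_{(0)}, \mathcal{S}_2)$, $\pi(Q_{(0)}, \mathcal{S}_3)$, $\pi(Q_{(0)}, \mathcal{S}'_1)$, $\pi(Q_{(1)}, \mathcal{S}'_2)$, and $\pi(Q_{(2)}, \mathcal{S}'_3)$, with iterates $Q_{(0)}$, $Q_{(1)}$, $Q_{(2)}$, $Q_{(3)}$.]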

  16. GEMA on the consistent input
      [Plot: probability value (0 to 0.2) against iteration number (0 to 210) for GEMA.]

  17. GEMA on the inconsistent input set
      [Plot: probability value (0 to 0.2) against iteration number (0 to 210) for GEMA.]

  18. GEMA properties
      • converges also in the inconsistent case
      • the limit distribution satisfies the necessary condition for the local minima of the aggregate of the Kullback-Leibler divergence
      • the distributions computed within a finite number of iterations factorize with respect to $\mathcal{E} = \{E_1, \ldots, E_k\}$
