Fast Quantification of Uncertainty and Robustness with Variational Bayes
Tamara Broderick, ITT Career Development Assistant Professor, MIT
With: Ryan Giordano, Rachael Meager, Jonathan H. Huggins, Michael I. Jordan
Bayesian inference


  1. Variational Bayes
     • VB approximation
       • Approximation $q^*(\theta)$ for the posterior $p(\theta \mid x)$
       • Minimize Kullback-Leibler (KL) divergence: $\mathrm{KL}(q \,\|\, p(\cdot \mid x))$
     • VB practical success
       • point estimates and prediction
       • fast, streaming, distributed [Broderick, Boyd, Wibisono, Wilson, Jordan 2013]
     [Figure: densities $p(\theta \mid x)$ and $q^*(\theta)$]
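The KL minimization on this slide can be illustrated with a toy problem. The sketch below is not from the talk: it assumes a hypothetical one-dimensional target (an equal mixture of N(-4, 1) and N(4, 1)), restricts q to Gaussians N(m, s^2), and finds the minimizer of KL(q || p) by brute-force grid search with a Riemann-sum integral. The names `log_p`, `kl_q_p`, and `fit_gaussian_vb` are made up for illustration.

```python
import numpy as np

# Hypothetical target density: equal mixture of N(-4, 1) and N(4, 1).
def log_p(theta):
    log_norm = -0.5 * np.log(2 * np.pi)
    left = log_norm - 0.5 * (theta + 4.0) ** 2
    right = log_norm - 0.5 * (theta - 4.0) ** 2
    return np.logaddexp(left, right) + np.log(0.5)

def kl_q_p(m, s, theta):
    """Riemann-sum estimate of KL(q || p) for q = N(m, s^2) on a uniform grid."""
    log_q = -0.5 * np.log(2 * np.pi * s ** 2) - 0.5 * ((theta - m) / s) ** 2
    dx = theta[1] - theta[0]
    return np.sum(np.exp(log_q) * (log_q - log_p(theta))) * dx

def fit_gaussian_vb(theta, means, sds):
    """Grid search for the Gaussian q* that minimizes KL(q || p)."""
    best = (np.inf, None, None)
    for m in means:
        for s in sds:
            kl = kl_q_p(m, s, theta)
            if kl < best[0]:
                best = (kl, m, s)
    return best

theta = np.linspace(-25.0, 25.0, 5001)
kl_star, m_star, s_star = fit_gaussian_vb(
    theta, np.linspace(-6.0, 6.0, 61), np.linspace(0.3, 5.0, 48))
print(m_star, s_star, kl_star)
```

In this toy setting the fitted q* tends to sit on one of the two modes with a standard deviation near 1, well below the mixture's overall standard deviation of about 4.1. A practical VB implementation would use a model-specific optimizer rather than a grid.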

  9. What about uncertainty?
     • Variational Bayes: $\mathrm{KL}(q \,\|\, p(\cdot \mid x)) = \int_\theta q(\theta) \log \frac{q(\theta)}{p(\theta \mid x)} \, d\theta$
     • Mean-field variational Bayes (MFVB): $q(\theta) = \prod_{j=1}^{J} q(\theta_j)$
     • Underestimates variance (sometimes severely)
     • No covariance estimates
     [MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011]
     [Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2015]
     [Figure: $p(\theta \mid x)$ and $q^*(\theta)$ on axes $\theta_1$, $\theta_2$; Bishop 2006]
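The variance underestimation can be seen in the textbook case of a correlated Gaussian target, for which the mean-field coordinate updates are available in closed form. The following is a minimal sketch under that assumption (the target N(mu, Sigma) and the function name `mfvb_gaussian` are illustrative, not taken from the talk): each factor q(theta_j) is Gaussian with variance 1 / Lambda_jj, where Lambda is the precision matrix, and the means are found by coordinate ascent.

```python
import numpy as np

def mfvb_gaussian(mu, Sigma, n_iter=200):
    """Coordinate-ascent MFVB for a Gaussian target N(mu, Sigma).

    Returns the factor means and the factor variances 1 / Lambda_jj.
    """
    mu = np.asarray(mu, dtype=float)
    Lam = np.linalg.inv(np.asarray(Sigma, dtype=float))
    J = len(mu)
    m = np.zeros(J)
    for _ in range(n_iter):
        for i in range(J):
            others = [j for j in range(J) if j != i]
            # Optimal mean of q(theta_i) given the current means of the other factors.
            m[i] = mu[i] - Lam[i, others] @ (m[others] - mu[others]) / Lam[i, i]
    var = 1.0 / np.diag(Lam)
    return m, var

mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
m, var = mfvb_gaussian(mu, Sigma)
print("MFVB means:", m)
print("MFVB variances:", var, "exact variances:", np.diag(Sigma))
```

With correlation 0.9 the MFVB variances are 1 - 0.9^2 = 0.19 against exact marginal variances of 1, and the factorized q carries no covariance information at all, while the means are recovered.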

  19. Linear response
     • Cumulant-generating function: $C(t) := \log E\, e^{t^T \theta}$, with mean $= \frac{d}{dt} C(t) \big|_{t=0}$
     • Exact posterior covariance vs MFVB covariance:
       $\Sigma := \frac{d^2}{dt^T dt} C_{p(\cdot \mid x)}(t) \Big|_{t=0}$, $V := \frac{d^2}{dt^T dt} C_{q^*}(t) \Big|_{t=0}$
     • “Linear response”: $\log p_t(\theta) := \log p(\theta \mid x) + t^T \theta - C(t)$, with MFVB approximation $q_t^*$
     • The LRVB approximation:
       $\Sigma = \frac{d}{dt^T} \left[ \frac{d}{dt} C_{p(\cdot \mid x)}(t) \right] \Big|_{t=0} = \frac{d}{dt^T} E_{p_t} \theta \Big|_{t=0} \approx \frac{d}{dt^T} E_{q_t^*} \theta \Big|_{t=0} =: \hat{\Sigma}$
     [see also Opper, Winther 2003]
     [Figure: $p(\theta \mid x)$ and $q^*(\theta)$; Bishop 2006]
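The identity behind the LRVB approximation, that the derivative of the tilted mean $E_{p_t} \theta$ at $t = 0$ equals the covariance $\Sigma$, can be checked numerically. The sketch below assumes a hypothetical discrete distribution over a handful of support points (so that expectations are exact sums) and uses central finite differences; `tilted_mean`, `mean_jacobian`, and `weighted_cov` are illustrative names.

```python
import numpy as np

def tilted_mean(points, probs, t):
    """Mean of theta under p_t(theta) proportional to p(theta) * exp(t^T theta)."""
    logw = np.log(probs) + points @ t
    w = np.exp(logw - logw.max())   # subtract max for numerical stability
    w /= w.sum()                    # normalizing plays the role of -C(t)
    return w @ points

def mean_jacobian(points, probs, eps=1e-5):
    """Central-difference estimate of d/dt^T E_{p_t}[theta] at t = 0."""
    J = points.shape[1]
    jac = np.zeros((J, J))
    for j in range(J):
        e = np.zeros(J)
        e[j] = eps
        jac[:, j] = (tilted_mean(points, probs, e)
                     - tilted_mean(points, probs, -e)) / (2 * eps)
    return jac

def weighted_cov(points, probs):
    """Exact covariance of the discrete distribution."""
    centered = points - probs @ points
    return (centered * probs[:, None]).T @ centered

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 2))
probs = rng.dirichlet(np.ones(50))
print(mean_jacobian(points, probs))
print(weighted_cov(points, probs))
```

The two printed matrices should agree up to finite-difference error. LRVB applies the same derivative to the MFVB mean $E_{q_t^*} \theta$ instead of the exact tilted mean.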

  42. LRVB estimator
     • LRVB covariance estimate: $\hat{\Sigma} := \frac{d}{dt^T} E_{q_t^*} \theta \Big|_{t=0}$
     • Suppose $q_t$ is in an exponential family with mean parametrization $m_t$:
       $\hat{\Sigma} = \left( \frac{\partial^2 \mathrm{KL}}{\partial m \, \partial m^T} \right)^{-1} \Big|_{m = m^*} = (I - V H)^{-1} V$
     • Symmetric and positive definite at local min of KL
     • The LRVB assumption: $E_{p_t} \theta \approx E_{q_t^*} \theta$
     • LRVB estimate is exact when MFVB gives exact mean (e.g. multivariate normal)
     [Figure: $p(\theta \mid x)$ and $q^*(\theta)$; Bishop 2006]
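The multivariate normal case mentioned in the last bullet can be worked through numerically. The sketch below rests on assumptions not spelled out on the slide: the target is N(mu, Lambda^{-1}); each MFVB factor is Gaussian with sufficient statistics (theta_j, theta_j^2); V is the covariance of those statistics under q*; and H is read as the Hessian of the expected log posterior E_q[log p(theta | x)] with respect to the mean parameters. The function name `lrvb_gaussian` is illustrative.

```python
import numpy as np

def lrvb_gaussian(Lambda, mu):
    """LRVB covariance (I - V H)^{-1} V for a Gaussian target N(mu, Lambda^{-1}).

    Mean parameters are ordered as (E[theta_1..J], E[theta_1^2..J^2]).
    Returns the theta block of the LRVB estimate and the MFVB variances.
    """
    Lambda = np.asarray(Lambda, dtype=float)
    mu = np.asarray(mu, dtype=float)
    J = Lambda.shape[0]
    v = 1.0 / np.diag(Lambda)          # MFVB variances; MFVB means equal mu

    # V: covariance of (theta, theta^2) under the factorized Gaussian q*.
    V = np.zeros((2 * J, 2 * J))
    V[:J, :J] = np.diag(v)
    V[:J, J:] = V[J:, :J] = np.diag(2 * mu * v)
    V[J:, J:] = np.diag(2 * v ** 2 + 4 * mu ** 2 * v)

    # H: only the off-diagonal entries of -Lambda couple E[theta_i] and E[theta_j];
    # the expected log density is linear in E[theta_j^2], so those blocks are zero.
    H = np.zeros((2 * J, 2 * J))
    H[:J, :J] = np.diag(np.diag(Lambda)) - Lambda

    sigma_hat = np.linalg.solve(np.eye(2 * J) - V @ H, V)
    return sigma_hat[:J, :J], v

Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.0, -0.4],
                  [0.3, -0.4, 1.5]])
mu = np.array([0.5, -1.0, 2.0])
sigma_hat, v = lrvb_gaussian(np.linalg.inv(Sigma), mu)
print("MFVB variances:", v)
print("LRVB covariance:\n", sigma_hat)
```

Under these assumptions the theta block of the LRVB estimate reproduces Sigma, including the off-diagonal entries that MFVB alone cannot provide, while the raw MFVB variances stay below the exact ones.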

  52. Microcredit Experiment
     • Simplified from Meager (2016)
     • $K$ microcredit trials (Mexico, Mongolia, Bosnia, India, Morocco, Philippines, Ethiopia)
     • $N_k$ businesses in $k$th site (~900 to ~17K)
     • Profit of $n$th business at $k$th site:
       $y_{kn} \overset{indep}{\sim} N(\mu_k + T_{kn} \tau_k, \sigma_k^2)$, where $y_{kn}$ is profit and $T_{kn} = 1$ if microcredit
     • Priors and hyperpriors:
       $\begin{pmatrix} \mu_k \\ \tau_k \end{pmatrix} \overset{iid}{\sim} N\left( \begin{pmatrix} \mu \\ \tau \end{pmatrix}, C \right)$, $\begin{pmatrix} \mu \\ \tau \end{pmatrix} \overset{iid}{\sim} N\left( \begin{pmatrix} \mu_0 \\ \tau_0 \end{pmatrix}, \Lambda^{-1} \right)$
       $\sigma_k^{-2} \overset{iid}{\sim} \Gamma(a, b)$, $C \sim \mathrm{Sep\&LKJ}(\eta, c, d)$
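To make the hierarchy concrete, the generative process can be sketched as a simulation. This is an illustration only, with several assumptions that are not in the slides: all hyperparameter values are hypothetical, $\Gamma(a, b)$ is read as shape $a$ and rate $b$, treatment $T_{kn}$ is assigned by a fair coin flip, and the covariance $C$ is held fixed rather than drawn from the Sep&LKJ prior. The function name `simulate_microcredit` is made up.

```python
import numpy as np

def simulate_microcredit(rng, n_per_site, mu0_tau0, Lambda, C, a, b):
    """Draw synthetic data from the hierarchical model, with C held fixed."""
    K = len(n_per_site)
    # Global effects (mu, tau) ~ N((mu_0, tau_0), Lambda^{-1}).
    mu_tau = rng.multivariate_normal(mu0_tau0, np.linalg.inv(Lambda))
    # Site effects (mu_k, tau_k) ~ N((mu, tau), C), one row per site.
    site_effects = rng.multivariate_normal(mu_tau, C, size=K)
    # Site precisions sigma_k^{-2} ~ Gamma(a, b), assuming b is a rate.
    precisions = rng.gamma(shape=a, scale=1.0 / b, size=K)
    data = []
    for k in range(K):
        mu_k, tau_k = site_effects[k]
        T = rng.integers(0, 2, size=n_per_site[k])   # 1 if microcredit (assumed coin flip)
        y = rng.normal(mu_k + T * tau_k, 1.0 / np.sqrt(precisions[k]))
        data.append((T, y))
    return mu_tau, site_effects, precisions, data

rng = np.random.default_rng(0)
mu_tau, site_effects, precisions, data = simulate_microcredit(
    rng,
    n_per_site=[900, 1200, 5000],        # hypothetical site sizes
    mu0_tau0=np.zeros(2),                # hypothetical hyperparameters
    Lambda=np.eye(2),
    C=np.array([[1.0, 0.3], [0.3, 0.5]]),
    a=2.0, b=2.0)
print(site_effects)
```

Synthetic data of this kind is mainly useful for checking an inference routine against known parameter values before it is run on the real trials.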
