Final Exam Review



  1. FINAL EXAM REVIEW
  Will cover:
  • All content from the course (Units 1-5)
  • Most points concentrated on Units 3-5 (mixture models, HMMs, MCMC)
  Logistics
  • Take-home exam, maximum 2 hour time limit
  • Exam release late afternoon Fri 5/1
  • Exam due NOON (11:59am ET) on Fri 5/8
  • Can use: any notes, any textbook, any Python code (run locally)
  • Cannot use: the internet to search for answers, other people
  • We will provide most needed formulas or give a textbook reference

  2. Takeaway Messages
  1) When uncertain about a variable, don't condition on it, integrate it away!
  2) Model performance is only as good as your fitting algorithm, initialization, and hyperparameter selection.
  3) MCMC is a powerful way to estimate posterior distributions (and resulting expectations) even when the model is not analytically tractable.

  3. Takeaway 1: When uncertain about a parameter, better to INTEGRATE AWAY than CONDITION ON
  • OK: use a point estimate: p(x_* | \hat{w})
  • BETTER: integrate away w via the sum rule: p(x_* | X) = \int p(x_*, w | X) dw
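To make Takeaway 1 concrete, here is a minimal sketch (made-up counts, not from the slides) contrasting the two choices for a Beta-Bernoulli model, where both the plug-in and the integrated predictive have closed forms:

```python
# Hypothetical Beta-Bernoulli example (made-up counts, not from the slides)
a, b = 1.0, 1.0                 # Beta(a, b) prior on the heads probability w
heads, tails = 7, 3

# CONDITION ON a point estimate: plug the MAP of w into p(x_* = 1 | w)
w_map = (heads + a - 1) / (heads + tails + a + b - 2)
p_plugin = w_map

# INTEGRATE AWAY w: p(x_* = 1 | X) = \int p(x_* = 1 | w) p(w | X) dw,
# which for the Beta posterior is its mean
p_integrated = (heads + a) / (heads + tails + a + b)

print(p_plugin)       # 0.7
print(p_integrated)   # ~0.667
```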

  4. Takeaway 2
  • Initialization: remember CP3 (GMMs), as well as CP5 (coming!)
  • Algorithm: remember the difference between LBFGS and EM in CP3
  • Hyperparameter: remember the poor performance in CP2
  Example: the purple and blue models differ by 0.01 in average per-pixel log likelihood. Normalized over 400 pixels (20x20) per image, the purple model says the average validation set image is exp(0.01 * 400) ≈ 54.6 times more likely than under the blue model.
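A quick check of the arithmetic above, assuming the 0.01 gap is an average per-pixel log likelihood difference:

```python
import numpy as np

# Per-pixel log likelihood gap between two models, and pixels per image
gap_per_pixel = 0.01
n_pixels = 20 * 20

# Per-image likelihood ratio: exponentiate the summed per-pixel gap
ratio = np.exp(gap_per_pixel * n_pixels)
print(ratio)   # ~54.6
```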

  5. Takeaway 3
  • Can use MCMC to do posterior predictive estimation:
  p(x_* | X) = \int p(x_*, w | X) dw
             = \int p(x_* | w) p(w | X) dw
             ≈ (1/S) \sum_{s=1}^{S} p(x_* | w^s),   with w^s drawn i.i.d. from p(w | X)
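A minimal sketch of this Monte Carlo approximation; the Gaussian likelihood and the stand-in "MCMC" samples below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for MCMC output: pretend these are draws w^s ~ p(w | X)
posterior_samples = rng.normal(loc=1.0, scale=0.3, size=5000)

def likelihood(x_star, w, sigma=1.0):
    """p(x_* | w) for an assumed Normal(w, sigma^2) likelihood."""
    return np.exp(-0.5 * ((x_star - w) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Monte Carlo posterior predictive: average p(x_* | w^s) over the samples
x_star = 0.5
p_pred = np.mean(likelihood(x_star, posterior_samples))
print(p_pred)
```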

  6. You are capable of so many things now!
  Given a proposed probabilistic model, you can do:
  • ML estimation of parameters
  • MAP estimation of parameters
  • EM to estimate parameters
  • MCMC estimation of the posterior
  • Heldout likelihood computation
  • Hyperparameter selection via CV
  • Hyperparameter selection via evidence

  7. Unit 1
  Optimization Skills
  • Finding extrema by zeros of the first derivative
  • Handling constraints via Lagrange multipliers
  Probabilistic Analysis Skills
  • Discrete and continuous r.v.
  • Sum rule and product rule
  • Bayes rule (derived from the above)
  • Expectations
  • Independence
  Distributions
  • Bernoulli distribution
  • Beta distribution
  • Gamma function
  • Dirichlet distribution
  Data analysis
  • Beta-Bernoulli for binary data
    • ML estimation of "proba. heads"
    • MAP estimation of "proba. heads"
    • Estimating the posterior
    • Predicting new data
  • Dirichlet-Categorical for discrete data (see sketch below)
    • ML estimation of unigram probas
    • MAP estimation of unigram probas
    • Estimating the posterior
    • Predicting new data
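A small sketch of the Dirichlet-Categorical item above, using made-up word counts and an assumed symmetric Dirichlet concentration alpha = 2:

```python
import numpy as np

counts = np.array([5.0, 3.0, 0.0, 2.0])   # made-up counts for a 4-word vocabulary
alpha = 2.0                                # symmetric Dirichlet(alpha) prior

# ML estimate: relative frequencies (unseen words get probability zero)
p_ml = counts / counts.sum()

# MAP estimate: mode of the Dirichlet posterior, Dirichlet(counts + alpha)
p_map = (counts + alpha - 1) / (counts.sum() + len(counts) * (alpha - 1))

print(p_ml)    # [0.5  0.3  0.   0.2]
print(p_map)   # smoothed: no zero probabilities
```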

  8. Example Unit 1 Question
  a) True or False: Bayes Rule can be proved using the Sum Rule and the Product Rule.
  b) You're modeling the wins/losses of your favorite sports team with a Beta-Bernoulli model. You assume each game's binary outcome (win=1 / loss=0) is iid, and you observe 5 wins and 3 losses in preseason play.
    • Suggest a prior to use for the win probability.
    • Identify 2 or more assumptions of this model that may not be valid in the real world (with concrete reasons).

  9. Example Unit 1 Answer

  10. Unit 2
  Optimization Skills
  • Convexity and second derivatives
  • Finding extrema by zeros of the first derivative
  • First and second order gradient descent
  Probabilistic Analysis Skills
  • Joints, conditionals, marginals
  • Covariance matrices (pos. definite, symmetric) (see sketch below)
  • Gaussian conjugacy rules
  Linear Algebra Skills
  • Determinants
  • Positive definiteness
  • Invertibility
  Distributions
  • Univariate Gaussian distribution
  • Multivariate Gaussian distribution
  Data analysis
  • Gaussian-Gaussian for regression
    • ML estimation of weights
    • MAP estimation of weights
    • Estimating the posterior over weights
    • Predicting new data
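A small sketch of the covariance-matrix item above: a valid covariance must be symmetric and positive definite, which a Cholesky factorization can test:

```python
import numpy as np

def is_valid_covariance(S, tol=1e-10):
    """Check that S is symmetric and positive definite (a valid full-rank covariance)."""
    S = np.asarray(S, dtype=float)
    symmetric = np.allclose(S, S.T, atol=tol)
    try:
        np.linalg.cholesky(S)   # succeeds only for positive definite matrices
        pos_def = True
    except np.linalg.LinAlgError:
        pos_def = False
    return symmetric and pos_def

print(is_valid_covariance([[2.0, 0.3], [0.3, 1.0]]))   # True
print(is_valid_covariance([[1.0, 2.0], [2.0, 1.0]]))   # False (has a negative eigenvalue)
```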

  11. Example Unit 2 Question
  You are doing regression with the following model:
  • Normal prior on the weights
  • Normal likelihood: p(t_n | x_n) = NormPDF(w x_n, σ^2)
  a. Consider the following two estimators for t_*. What's the difference?
     \hat{t}_* = w_{MAP} x_*        \tilde{t}_* = E_{t ~ p(t | x_*, X)}[t]
  b. Suggest at least 2 ways to pick a value for the hyperparameter σ.
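A hedged sketch of this setup for 1-d inputs with no bias term, using assumed values for the prior precision alpha and the noise level σ; it computes the Gaussian posterior over w and contrasts the plug-in predictive with the posterior predictive:

```python
import numpy as np

# Assumed model: prior w ~ Normal(0, 1/alpha), likelihood t_n ~ Normal(w * x_n, sigma^2)
alpha, sigma = 1.0, 0.5
x = np.array([0.1, 0.5, 0.9, 1.3])          # made-up training data
t = np.array([0.2, 0.9, 1.7, 2.8])

# Gaussian conjugacy: posterior over w is Normal(m, v)
v = 1.0 / (alpha + (x @ x) / sigma**2)
m = v * (x @ t) / sigma**2                   # for a Gaussian posterior, MAP = mean = m

x_star = 2.0
# Plug-in predictive: condition on the point estimate w_MAP
plugin_mean, plugin_var = m * x_star, sigma**2
# Posterior predictive: integrate w away (extra variance from uncertainty about w)
pred_mean, pred_var = m * x_star, sigma**2 + v * x_star**2

print(plugin_mean, plugin_var)
print(pred_mean, pred_var)
```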

  12. Example Unit 2 Answer

  13. Unit 3: K-Means and Mixture Models
  Distributions
  • Mixtures of Gaussians (GMMs)
  • Mixtures in general: can use any likelihood (not just Gaussian)
  Optimization Skills
  • K-means objective and algorithm
  • Coordinate ascent / descent algorithms
  • Optimization objectives with hidden vars
    • Complete likelihood: p(x, z | \theta)
    • Incomplete likelihood: p(x | \theta)
    • Expectations of the complete likelihood
  • Expectation-Maximization algorithm
    • Lower bound objective
    • What the E-step does
    • What the M-step does
  Numerical Methods
  • logsumexp (see sketch below)
    • How to derive it
    • Why it is important
  Data analysis
  • K-means or GMM for a dataset
  • How to pick the K hyperparameter
  • Why multiple inits matter
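A minimal logsumexp sketch, showing both the derivation (shift by the max, then add it back) and why the trick matters numerically:

```python
import numpy as np

def logsumexp(log_vals):
    """log(sum(exp(a_k))) computed as m + log(sum(exp(a_k - m))) with m = max(a_k),
    so the largest exponentiated term is exp(0) = 1 and nothing overflows."""
    log_vals = np.asarray(log_vals, dtype=float)
    m = np.max(log_vals)
    return m + np.log(np.sum(np.exp(log_vals - m)))

a = np.array([-1000.0, -1001.0, -1002.0])
print(np.log(np.sum(np.exp(a))))   # naive version underflows to -inf
print(logsumexp(a))                # about -999.59
```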

  14. Example Unit 3 Question
  Consider two possible models for clustering 1-dim. data:
  • K-Means
  • Gaussian mixtures
  Name ways that the GMM is more flexible as a model:
  • How is the GMM's treatment of assignments more flexible?
  • How is the GMM's parameterization of a "cluster" more flexible?
  Under what limit does the GMM likelihood reduce to the K-means objective?
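One way to probe the assignment question numerically: a sketch (made-up data, not the official answer) of equal-weight, shared-variance GMM responsibilities as the variance shrinks:

```python
import numpy as np

def responsibilities(x, means, sigma):
    """E-step for an equal-weight 1-d GMM with shared variance sigma^2:
    r_nk proportional to exp(-(x_n - mu_k)^2 / (2 sigma^2))."""
    log_r = -0.5 * (x[:, None] - means[None, :]) ** 2 / sigma**2
    log_r -= log_r.max(axis=1, keepdims=True)       # stabilize before exponentiating
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

x = np.array([0.0, 0.9, 2.1])
means = np.array([0.0, 2.0])

print(responsibilities(x, means, sigma=1.0))    # soft assignments
print(responsibilities(x, means, sigma=0.05))   # nearly one-hot, like K-means
```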

  15. Example Unit 3 Answer

  16. Unit 4: Markov models and HMMs
  Probabilistic Analysis Skills
  • Markov conditional independence
  • Stationary distributions
  • Deriving independence properties (like HW4 problem 1)
  Algorithm Skills
  • Forward algorithm
  • Backward algorithm
  • Viterbi algorithm
  (all examples of dynamic programming)
  Linear Algebra Skills
  • Eigenvectors/eigenvalues for stationary distributions (see sketch below)
  Optimization Skills
  • EM for HMMs
    • E-step
    • M-step
  Distributions
  • Discrete Markov models
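A small sketch of the stationary-distribution item above, using a made-up transition matrix: the stationary distribution is the eigenvector of A^T with eigenvalue 1, normalized to sum to one.

```python
import numpy as np

A = np.array([[0.9, 0.1],      # A[i, j] = p(next state = j | current state = i)
              [0.3, 0.7]])

eigvals, eigvecs = np.linalg.eig(A.T)
k = np.argmin(np.abs(eigvals - 1.0))   # eigenvalue closest to 1
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()                     # normalize into a probability vector

print(pi)       # [0.75 0.25]
print(pi @ A)   # matches pi (up to numerical error), so it is stationary
```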

  17. Example Unit 4 Question
  Describe how the Viterbi algorithm is an instance of dynamic programming. Identify all the key parts:
  • What is the fundamental problem being solved?
  • How is the final solution built from solutions to smaller problems?
  • How to describe all the solutions as a big "table" that should be filled in?
  • What is the "base case" update (the simplest subproblem)?
  • What is the recursive update?
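A compact Viterbi sketch on a toy HMM with made-up parameters (not the official answer), making the DP table, base case, and recursive update explicit:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most likely hidden state path. The DP table delta[t, k] holds the best
    log joint probability of any state path ending in state k at time t."""
    T, K = len(obs), len(log_pi)
    delta = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)

    delta[0] = log_pi + log_B[:, obs[0]]            # base case: t = 0
    for t in range(1, T):                           # recursive update
        scores = delta[t - 1][:, None] + log_A      # scores[j, k]: come from j, go to k
        back[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0) + log_B[:, obs[t]]

    # Backtrack from the best final state to recover the full path
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 2-state, 2-symbol HMM (made-up parameters)
log_pi = np.log([0.6, 0.4])
log_A = np.log([[0.7, 0.3], [0.4, 0.6]])   # A[j, k] = p(z_t = k | z_{t-1} = j)
log_B = np.log([[0.9, 0.1], [0.2, 0.8]])   # B[k, v] = p(x_t = v | z_t = k)
print(viterbi(log_pi, log_A, log_B, obs=[0, 0, 1, 1]))
```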

  18. Example Unit 4 Answer
