Feb 6: Parametric distributions What is a notebook anyway? The - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Feb 6: Parametric distributions

slide-2
SLIDE 2

What is a notebook anyway?

https://jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html

The kernel stores the environment, representing variable names and their values. The notebook server listens for web requests and passes them to the kernel to execute Python code in an environment. The browser only stores the code input and the output of each cell. The notebook .ipynb file stores the same information to disk.
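To make the last point concrete, here is a minimal sketch of what ends up in a .ipynb file. The notebook below is hand-written for illustration (its single cell and its contents are assumptions, not part of the lecture) and follows the version 4 notebook JSON layout. It shows that the file holds the code input and the output text, not the variables themselves.

```python
import json

# A hand-written, minimal notebook in the nbformat 4 JSON layout.
# The cell contents are illustrative assumptions.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["x = 2 + 3\n", "x"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["5"]},
                }
            ],
        }
    ],
}

# Round-trip through JSON text, which is what gets written to disk.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)

cell = loaded["cells"][0]
code_input = "".join(cell["source"])
code_output = "".join(cell["outputs"][0]["data"]["text/plain"])
print("input: ", repr(code_input))
print("output:", code_output)
```

The file records the text "5" as an output, but the variable x itself lives only in the kernel. Restart the kernel and x is gone, even though the saved output is still displayed.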

slide-3
SLIDE 3

Assumptions make math easier

One of the most common statistical assumptions about events is that they are

  1. Independent (one event tells you nothing about the next)
  2. Identically distributed (probability is the same for each event)

If events are independent, we can simply multiply their probabilities to get the probability that all of them occur. If they are iid, we can just find the probability of one event and raise it to the power of the number of events.
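A small sketch of both shortcuts. The probabilities here are made-up illustration values, not data from the lecture.

```python
import math

# Independent but not identical: multiply the individual probabilities.
# These per-event probabilities are made-up illustration values.
probs = [0.8, 0.7, 0.9]
p_all = math.prod(probs)

# iid: every event has the same probability, so the product is a power.
p = 0.8
n = 3
p_all_iid = p ** n

print(p_all)
print(p_all_iid)
```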

slide-4
SLIDE 4

Test yourself

Which of these event sequences is iid, and which are not? Why?

  • Five players each take one penalty kick
  • Number of french fries in 10 servings from the same restaurant
  • A bus either does or does not pick up passengers at each stop
slide-5
SLIDE 5

My answers

Which of these event sequences is iid, and which are not? Why?

  • Five players each take one penalty kick

○ Independent, but not identically distributed

  • Number of french fries in 10 servings from the same restaurant

○ Restaurants try to maintain consistency, probably iid

  • A bus either does or does not pick up passengers at each stop

○ Not identical (some stops are popular) and also not independent: if it's raining, all stops will have higher probability, so a person at one stop may imply the presence of people at others

All of these are interpretations. You can probably argue convincingly that I'm wrong about any of these. What matters is whether the simplifying assumption is valid enough to make the simpler calculations accurate enough.

slide-6
SLIDE 6

We can define a distribution over N events with N-1 numbers

Event         1     2     3     4
Probability   0.3   0.1   0.4   ???

slide-7
SLIDE 7

We can define a distribution over N events with N-1 numbers

Event         1     2     3     4
Probability   0.3   0.1   0.4   ???

This number has to be 0.2, since the probabilities have to add up to exactly 1.0.
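The same bookkeeping as a short sketch, using the three probabilities from the table. The variable names are just for illustration.

```python
import math

# The three probabilities given in the table; event 4 is left unspecified.
known = {1: 0.3, 2: 0.1, 3: 0.4}

# Probabilities must sum to 1, so the last one is determined by the others.
p4 = 1.0 - sum(known.values())

distribution = {**known, 4: p4}
assert math.isclose(sum(distribution.values()), 1.0)
print(round(p4, 10))  # 0.2, up to floating-point rounding
```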

slide-8
SLIDE 8

Parametric models use functions to assign probability to many events with few parameters

The binomial distribution allows you to assign probability to N events using only two numbers. The catch: parametric distributions can only represent certain shapes of distributions. They are efficient in parameters, but inflexible. This is often a good property: see Occam's razor.
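As a rough sketch of "many events, few parameters": the function below is the standard binomial probability formula, and the values of n and p are illustrative assumptions. Two numbers go in, and a probability for every possible count comes out.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k ones among n iid 1/0 draws with P(1) = p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Two parameters (illustrative values) ...
n, p = 10, 0.3

# ... determine the probability of every outcome 0, 1, ..., n.
table = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(table):
    print(k, round(prob, 4))
```

The inflexibility shows up here too: whatever n and p you pick, the table always has the same single-peaked binomial shape. A table filled in by hand, like the four-event one earlier, can take any shape but needs one number per event.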

slide-9
SLIDE 9

Parametric distributions are all-purpose tools

  • Set of possible events
  • Probability function
  • Data generating process ("origin story")
  • Parameters that determine behavior

slide-10
SLIDE 10

Bernoulli distribution

Origin story: Sample a single value, 1 or 0, where 1 occurs with some probability p. What is the event space? What are the parameters?
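One way to act out this origin story in code. This is a sketch, not the only possible implementation; the helper name sample_bernoulli, the seed, and the value of p are assumptions chosen for illustration.

```python
import random

def sample_bernoulli(p: float, rng: random.Random) -> int:
    """Return 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)   # fixed seed so the run is repeatable
p = 0.3                  # illustrative value for the single parameter
draws = [sample_bernoulli(p, rng) for _ in range(10_000)]

print(set(draws))                # the distinct values that were sampled
print(sum(draws) / len(draws))   # fraction of 1s, which should be near p
```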

slide-11
SLIDE 11

Binomial distribution

Origin story: Sample a sequence of n values, each 1 or 0, where 1 occurs with some probability p, then add them up. What is the event space? How do we calculate the probability that the sum is 5? What are the parameters?
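A sketch that follows the origin story literally and then checks it against a direct calculation. The values n = 10 and p = 0.5, the seed, and the helper name sample_binomial are assumptions for illustration. The exact calculation leans on the iid assumption: each particular sequence with five 1s has the same probability, so we multiply that probability by the number of such sequences.

```python
import math
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
n, p = 10, 0.5           # illustrative parameter values
target = 5

def sample_binomial(n: int, p: float, rng: random.Random) -> int:
    """Origin story: draw n values 1/0, each 1 with probability p, then add."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))

# Estimate P(sum == 5) by repeating the origin story many times.
trials = 20_000
hits = sum(sample_binomial(n, p, rng) == target for _ in range(trials))
estimate = hits / trials

# Exact: any one sequence with five 1s has probability p**5 * (1-p)**(n-5)
# (iid, so multiply), and there are C(n, 5) such sequences.
exact = math.comb(n, target) * p**target * (1 - p) ** (n - target)

print(estimate, exact)
```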

Source: Wikipedia