SLIDE 25 MAT 141 ‐ Chapter 7 25
Cancer Test
Let be the event that you have a positive test and let be the event you have cancer. Then the question, “What is the probability that you actually have cancer (give you had a positive test)?” becomes | |′
What does this mean?
- So, our chance of cancer, given the test was positive, is about
7.8%.
- Interesting — a positive mammogram only means you have a
7.8% chance of cancer, rather than 80% (the supposed accuracy of the test). It might seem strange at first but it makes sense: the test gives a false positive 10% of the time, so there will be a ton of false positives in any given population. There will be so many false positives, in fact, that most of the positive test results will be wrong.
- If you take 100 people, only 1 person (1%) will have cancer.
Another 10 (~9.6%) will not have cancer but will get a false positive result. Getting a positive result means you only have a roughly 1/11 chance of being the person who really has cancer (7.8% to be exact).
Other Uses
One clever application of Bayes’ Theorem is in spam filtering. We have
Event A: The message is spam. Test X: The message contains certain words (X)
- Plugged into a more readable formula:
|
- Bayesian filtering allows us to predict the chance a message is really spam given
the “test results” (the presence of certain words). Clearly, words like “viagra” have a higher chance of appearing in spam messages than in normal ones.
- Spam filtering based on a blacklist is flawed — it’s too restrictive and false
positives are too great. But Bayesian filtering gives us a middle ground — we use
- probabilities. As we analyze the words in a message, we can compute the chance
it is spam (rather than making a yes/no decision). If a message has a 99.9% chance of being spam, it probably is. As the filter gets trained with more and more messages, it updates the probabilities that certain words lead to spam
- messages. Advanced Bayesian filters can examine multiple words in a row, as
another data point.
Bayes' Theorem
- We have been looking at two sets and their compliments (e.g.
Cancer/No Cancer and Positive/Not positive). However, we can analyze more complicated situations with the general form of Baye’s Theorem.
For example, we could ask the probability of a certain Age, given a person is in a car accident, event . Here “Age” can be broken down into many categories such as
- : 16 20
- : 21 30
- : 31 65
- : 66 99