---
title: Bayes’ Theorem
date: "2010-10-11T14:53:01Z"
categories:
  - data
wp_id: 2546
description: I explore how to use Bayes' Theorem to update a probability distribution iteratively. Starting with a flat prior, I show how successive coin toss results refine the Beta distribution to better estimate unknown likelihoods.
keywords: [bayes theorem, beta distribution, probability distribution, posterior update, prior distribution, inferential statistics]
---

I’ve tried understanding [Bayes’ Theorem](http://en.wikipedia.org/wiki/Bayes'_theorem) several times. I’ve always managed to get confused. Specifically, I’ve always wondered why it’s better than simply using the average estimate from the past. So here’s a little attempt to jog my memory the next time I forget.

Q: A coin shows 5 heads when tossed 10 times. What’s the probability of a heads?\
A: It’s not simply 0.5. That’s only the most likely estimate. The probability distribution is actually:

[![dbeta(x,5,5)](/blog/assets/bayesian1.webp)](/blog/assets/bayesian1.webp)

That’s because you don’t really know the probability with which the coin will throw a heads. It could be any number p between 0 and 1. So let’s say we have a probability distribution for it, f(p).

Initially, you don’t know what this probability distribution is. So assume every value of p is equally likely: a flat function, f(p) = 1.

[![dbeta(x,1,1)](/blog/assets/bayesian2.webp)](/blog/assets/bayesian2.webp)

Now, given this, let’s say a heads falls on the next toss. What’s the revised probability distribution? Bayes’ Theorem says:

f(p) ← f(p) \* probability(heads | p) / probability(heads)

The denominator, probability(heads), doesn’t depend on p. It’s just the constant that makes the new f(p) integrate to 1 (with the flat prior, it’s 1/2). So, up to that constant:

f(p) ∝ 1 \* (p^1 \* (1-p)^0) = p

[![dbeta(x,2,1)](/blog/assets/bayesian3.webp)](/blog/assets/bayesian3.webp)

Let’s say the next is again a heads.
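Before following that second toss, it may help to check the first update numerically. Here’s a minimal sketch in Python, assuming f(p) is approximated on a grid of 1,000 candidate values of p with integrals done as simple sums. The grid and the names `update`, `likelihood` and `beta_pdf` are my own illustration, not a standard API. It applies f(p) ← f(p) \* probability(heads | p) / probability(heads) to the flat prior and compares the result with the exact Beta(2, 1) density, 2p.

```python
import math

# Illustrative assumption: f(p) is stored on 1,000 midpoints of [0, 1],
# and integrals over p are approximated by Riemann sums.
N = 1000
GRID = [(i + 0.5) / N for i in range(N)]  # candidate values of p
DP = 1.0 / N                              # width of each grid cell


def likelihood(toss, p):
    """P(toss | p): p for heads ("H"), 1 - p for tails ("T")."""
    return p if toss == "H" else 1 - p


def update(prior, toss):
    """Bayes' rule: f(p) <- f(p) * P(toss | p) / P(toss).

    Returns the posterior density and P(toss), the evidence,
    which is the integral of f(p) * P(toss | p) over p.
    """
    unnormalised = [f * likelihood(toss, p) for f, p in zip(prior, GRID)]
    evidence = sum(unnormalised) * DP
    return [u / evidence for u in unnormalised], evidence


def beta_pdf(p, a, b):
    """Exact Beta(a, b) density, for comparison with the grid."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * p ** (a - 1) * (1 - p) ** (b - 1)


flat = [1.0] * N                     # f(p) = 1: the flat prior
posterior, p_heads = update(flat, "H")

print(round(p_heads, 3))             # P(heads) under the flat prior → 0.5
# After one heads the density is 2p, i.e. Beta(2, 1):
print(max(abs(f - beta_pdf(p, 2, 1)) for f, p in zip(posterior, GRID)) < 1e-9)  # → True
```

Feeding `posterior` back into `update` with `"H"` and then `"T"` continues the sequence in this post. After h heads and t tails from the flat prior, the grid should match `beta_pdf(p, h + 1, t + 1)` up to a small discretisation error.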
After that second heads, it becomes:

f(p) ← f(p) \* probability(heads | p) / probability(heads) ∝ p \* (p^1 \* (1-p)^0) = p^2

[![dbeta(x,3,1)](/blog/assets/bayesian4.webp)](/blog/assets/bayesian4.webp)

Now if it’s a tails, it becomes:

f(p) ← f(p) \* probability(tails | p) / probability(tails) ∝ p^2 \* (p^0 \* (1-p)^1) = p^2 \* (1-p)

[![dbeta(x,3,2)](/blog/assets/bayesian5.webp)](/blog/assets/bayesian5.webp)

… and so on. After h heads and t tails, starting from the flat prior, f(p) ∝ p^h \* (1-p)^t. (This happens to be called a Beta distribution: Beta(h+1, t+1).)

Now, instead of this being the probability of heads, it could be the probability of a person having high blood pressure, or of a document being spam. As you get more data, the probability distribution of the probability keeps getting revised, and it narrows as the evidence accumulates.

---

## Comments

- **Ram** _18 Jan 2012 1:59 pm_: Hi Anand,

  f(p) ← f(p) \* probability(heads | x) / probability(heads) = 1 \* (x^1 \* (1-x)^0) / 1 = x

  What I understand from the above notation is f(p) stands for probability distribution function (which is used recursively), x stands for the probability of getting a head (which is unknown) and 1-x stands for the complement of x. Can you please explain why the distribution function is defined as you have done? i.e., f(p) ← f(p) \* probability(heads | x) / probability(heads)