Rock Chalk Talk: Basketball
Anything pertaining to basketball: college, pro, HS, recruiting, TV coverage
Probability Explained
- CorpusJayhawk
- Topic Author
- Offline
- Platinum Member
- Posts: 1849
- Thank you received: 3650
2 years 9 months ago #28316
by CorpusJayhawk
Don't worry about the mules, just load the wagon!!
Did you ever wonder how the pundits come up with probabilities for games? Here is a general explanation. The first thing you need in order to calculate a probability is the projected scoring margin. How that is done will have to be a post for another time, so for now we will assume we already have a projected scoring margin. Once you have a scoring margin, you simply calculate the probability from a function, or curve equation. This curve equation is the magic behind the probability calculation. Here is an example of one such curve.
Now let me explain where this curve comes from. This particular curve comes from assessing every game from 1950 to present (through yesterday). We need two things from these analyses: the projected scoring margin and the actual scoring margin. Once you have calculated the projected scoring margin and you have the actual scoring margin, you can calculate the difference between the two. This is the "Margin Delta", or the difference between what you predicted the scoring margin to be and what it actually was. I have done this for every game from 1950 forward, over 260,000 games.
Then you can calculate two critical numbers that go into building this curve. These numbers are the mean and the standard deviation. Without getting into gory detail, the mean is just the numerical average of all 260,000+ of these Margin Deltas. The standard deviation is a metric that measures the consistency of these Margin Deltas. A team that varies little in its performance will have a low standard deviation, whereas a team that is very inconsistent will have a large standard deviation. Say a team always plays within 5 points or so of the projection. That team would be very consistent and have a low standard deviation.
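As a rough sketch of that step, the mean and standard deviation of the Margin Deltas could be computed as below. The five sample games are made up for illustration, and the sign convention (actual minus projected) and the name margin_deltas are assumptions, not something stated in the post.

```python
from statistics import mean, pstdev

# Hypothetical (projected_margin, actual_margin) pairs; not real game data.
games = [(5.0, 9.0), (-3.0, -10.0), (12.0, 8.0), (1.0, 4.0), (-7.0, -3.0)]

def margin_deltas(games):
    """Return actual minus projected margin per game (assumed sign convention)."""
    return [actual - projected for projected, actual in games]

deltas = margin_deltas(games)
delta_mean = mean(deltas)    # average miss of the projection
delta_sd = pstdev(deltas)    # population standard deviation of the misses

print(f"mean = {delta_mean:.2f}, standard deviation = {delta_sd:.2f}")
```

The same two calls work whether the list holds every game since 1950 or only one team's games from a single season.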
Now that we have these two numbers, mean and standard deviation, we can calculate the points and draw the graph above. The graph represents the probability of a team winning for a given projected scoring margin. If you look at it, you will see that if the projected scoring margin is 5 points, the probability of winning will be about 0.7 or 70%. Likewise, if the projected scoring margin is 12.5 points, the probability of winning would be around 90%. I hope that is pretty obvious. It is just reading the point off of the graph.
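One way to reproduce points like these is to treat the Margin Delta as normally distributed and read the win probability off the normal cumulative distribution. The post does not name the curve's formula, so the normal model, the function name win_probability, and the default standard deviation of 10 points (the value a reply further down reads off the graph) are assumptions here.

```python
import math

def win_probability(projected_margin, mean=0.0, sd=10.0):
    """P(actual margin > 0), assuming actual = projected + delta and
    delta ~ Normal(mean, sd). The normal model is an assumption."""
    z = (projected_margin + mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for margin in (0.0, 5.0, 12.5):
    p = win_probability(margin)
    print(f"projected margin {margin:>5.1f} -> win probability {p:.3f}")
```

Under these assumptions a 5-point margin gives roughly 0.69 and a 12.5-point margin roughly 0.89, close to the values read off the graph.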
So you could use this graph for every game if you like. If you have a projected scoring margin, you could simply find the corresponding point on the graph. I actually use this in my tables and it is labeled "Standard Probability". I don't use it for anything other than providing it for perusal.
Now here is where there is a twist. Every data set you choose will have its own standard deviation and mean. Take, for instance, the 2022 games. You can draw the same type of curve using the standard deviation and mean for the 2022 data. What I do is calculate this standard deviation and mean for every team, every season. In fact, my "Consistency Factor" is directly correlated to this standard deviation since it is a measure of consistency.
The graph below is the same as the one above, but I added the 2022 graphs for Auburn and Oregon. I chose Auburn and Oregon because Auburn is the 5th most consistent team (low standard deviation) and Oregon is the 2nd least consistent team (high standard deviation) in the country. Here is that graph.
Okay, think about this. If a team is very inconsistent, that means on any given night you are less sure what you are going to get. They may play 20 points better or 20 points worse. So what a high standard deviation does is flatten the probability graph. Note the green curve, the Oregon curve. Oregon is very inconsistent and thus their probability curve is flatter, meaning the probability of victory will be less for a given positive projected scoring margin. Likewise, the probability of loss will be less for a given negative projected scoring margin.
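The flattening can be seen numerically with a small sketch. It assumes a normal model for the Margin Delta with a mean of zero, and the three standard deviations are hypothetical values picked only to show the effect, not Auburn's or Oregon's actual numbers.

```python
import math

def win_probability(projected_margin, sd, mean=0.0):
    """P(actual margin > 0) under an assumed Normal(mean, sd) Margin Delta."""
    z = (projected_margin + mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical standard deviations: consistent, average, inconsistent.
for sd in (6.0, 10.0, 14.0):
    row = ", ".join(
        f"{m:+d}: {win_probability(m, sd):.2f}" for m in (-8, -4, 4, 8)
    )
    print(f"sd={sd:>4.1f} -> {row}")
```

For every positive margin the win probability drops as the standard deviation grows, and for every negative margin it rises, which is exactly what a flatter curve means.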
The second important factor is the mean. Note that on the composite graph the mean is zero, which means the graph will pass through 50% probability at a zero scoring margin. But note that Oregon has a negative mean of 1.2. That means that the entire curve is shifted upward, or to the left.
The other graph is for Auburn. Auburn is very consistent. So if they have a positive projected scoring margin, they will be more likely to win than a team with a higher standard deviation. Thus it steepens the curve. Also, Auburn has a positive mean, so the curve is shifted down, or to the right.
So let's look at a few points. Let's say the projected scoring margin is 4 points. All three curves would indicate a probability of winning of about 67% or so. But let's say the projected margin is only 1 point. Then we get wildly different probabilities. Auburn would have only a 44% probability of winning, whereas Oregon would have a 57% probability of winning and the standard curve would give 54%.
I know this is complicated but I thought if nothing else having the graph would give a little more insight. I hope this has helped a little to understand how probability is calculated.
The following user(s) said Thank You: hairyhawk, Bayhawk, Socalhawk
- asteroid
- Offline
- Platinum Member
- Posts: 601
- Thank you received: 3145
2 years 9 months ago #28318
by asteroid
Have you noticed any trend in the standard deviation with time? Years ago, when I first looked at that statistic, the national average was running right around 11 points, but that was a snapshot in time. Your graph has it at 10 points, based on 70 years of games. I note that the median for the Big 12 this season is right around 10.1 points.
- CorpusJayhawk
- Topic Author
- Offline
- Platinum Member
- Posts: 1849
- Thank you received: 3650
2 years 9 months ago #28319
by CorpusJayhawk
Don't worry about the mules, just load the wagon!!
The following user(s) said Thank You: Socalhawk
- hairyhawk
- Offline
- Platinum Member
- Posts: 1202
- Thank you received: 692
2 years 9 months ago #28320
by hairyhawk
So, as I expected, I do not really understand. In your scenario, if Auburn has a mean of 1.82 and Oregon has a mean of -1.22, I think that means that on average Auburn is playing 1.82 points better than predicted and Oregon is playing 1.22 points worse than predicted. If that is correct, then why would a predicted margin of, say, 1 point give Oregon a higher predicted winning % than Auburn with the same predicted margin of 1 point? I would have thought that since Auburn is, on average, playing better than predicted, they would have a higher chance of winning. Is this because the idea is that eventually everyone should play to the predicted value, so there is a higher chance that Auburn would play below the predicted value to bring the mean for Auburn closer to 0?
- CorpusJayhawk
- Topic Author
- Offline
- Platinum Member
- Posts: 1849
- Thank you received: 3650
2 years 9 months ago - 2 years 9 months ago #28321
by CorpusJayhawk
Don't worry about the mules, just load the wagon!!
Hairy, I think you understand it very well. The problem is in my typing. I typed in the wrong numbers. Now you know why I try to do everything programmatically. My typing is no bueno. Here is the graph with the signs reversed, as they should have been. Great catch BTW. You can see that for a projected 5-point scoring margin, Auburn would have a probability of winning of almost 90%, which is crazy. Oregon would have a probability of winning of only 62% or so with the same projected 5-point margin.
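A hedged sketch of how the mean shifts the curve, again assuming a normal model. The means of +1.82 for Auburn and -1.22 for Oregon come from this thread. The standard deviations of 5.3 and 12.4 are assumed values back-solved from the quoted percentages, not numbers given in the posts, and adding the mean to the projected margin is an assumed convention (a positive mean meaning the team plays better than projected).

```python
import math

def win_probability(projected_margin, mean, sd):
    """P(actual margin > 0), assuming actual = projected + delta and
    delta ~ Normal(mean, sd). The normal model is an assumption."""
    z = (projected_margin + mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Means are from the thread; standard deviations are assumed, not given.
teams = {
    "Auburn": {"mean": 1.82, "sd": 5.3},
    "Oregon": {"mean": -1.22, "sd": 12.4},
}

auburn = win_probability(5.0, **teams["Auburn"])
oregon = win_probability(5.0, **teams["Oregon"])
print(f"Auburn at +5: {auburn:.2f}")
print(f"Oregon at +5: {oregon:.2f}")
```

With these inputs Auburn comes out near 0.90 and Oregon near 0.62 for a projected 5-point margin, in line with the corrected graph.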
Last Edit: 2 years 9 months ago by CorpusJayhawk.
The following user(s) said Thank You: hairyhawk