Sample Size and Statistical Significance

Published on June 2, 2009

I'm not going to mention a certain Twins catcher by name, but I do want to draw a distinction between the two concepts above, because in our discussion of said catcher the two seem to have been confused.

When you're looking at a set of data - a player's at-bats, a pitcher's innings, whatever - and you see something in it that appears to be an outlier, you're faced with the question of whether there's a cause behind that anomaly or whether it's just random fluctuation. In other words, is Justin Verlander the same guy he was last year who, for one month, happened to face some combination of free swingers, favorable shadows over the batter's box and hitters who had faced Tim Wakefield the day before? Or is there a more central cause - that Verlander's stuff is harder, better located and/or nastier?

In considering that question, the sample size of the data is important because over a short period of time, large random swings are more likely to show up. Put differently, you might flip an evenly weighted coin 10 times and get 8 heads. You will never in a trillion years flip it 10,000 times and get 8,000 heads. Okay, most of us grasp that.
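To put rough numbers on the coin example, here is a minimal Python sketch. The function name `log10_tail_prob` is my own, not something from a library, and it assumes a perfectly fair coin. It counts the exact number of ways to get at least k heads in n flips and reports the probability as a power of 10, because the 10,000-flip figure is far too small to fit in an ordinary floating-point number.

```python
from math import comb, log10

def log10_tail_prob(n, k):
    """log10 of P(at least k heads in n flips of a fair coin)."""
    # Count the sequences with k..n heads exactly, using Python's big integers.
    favorable = sum(comb(n, i) for i in range(k, n + 1))
    # Each of the 2**n possible sequences is equally likely for a fair coin.
    return log10(favorable) - n * log10(2)

# Same 80-percent heads rate, growing sample size.
for n, k in [(10, 8), (100, 80), (1000, 800), (10000, 8000)]:
    print(f"{k} or more heads in {n} flips: about 10^{log10_tail_prob(n, k):.1f}")
```

For 10 flips the chance of 8 or more heads works out to 56 in 1,024, or about 5.5 percent. For 10,000 flips the exponent is below -800, which is what "never in a trillion years" looks like in numbers.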

The question is - at what point do we conclude that the coin is |STAR|not|STAR| evenly weighted? Well, it's not simply a matter of how many times you flip the coin, i.e., sample size. It's |STAR|also|STAR| a matter of the |STAR|magnitude of the deviation|STAR|. If you get 16 heads and 4 tails, I wouldn't be so sure the coin is unevenly weighted. But if you get 20 heads and no tails, it's almost certainly so (the odds of that with a fair coin are less than 1 in a million). In both cases, the sample size is quite small, but in the latter, it's more than sufficient. When the coin lands heads at an 80-percent clip, you need a larger sample to determine the coin is rigged because the magnitude of the deviation from the baseline (50/50) is smaller. And if the coin lands heads at a 55-percent clip, you need an even larger number of flips to determine whether it's rigged.
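The trade-off between magnitude and sample size can be sketched in a few lines of Python. This is an illustration under my own assumptions: the helper names `tail_prob` and `min_flips` are made up for the example, and the 1-in-1,000 cutoff for "probably rigged" is an arbitrary choice, not a standard. For a given heads rate, it finds the smallest number of flips at which a fair coin would produce a result that lopsided less often than the cutoff.

```python
from math import comb

def tail_prob(n, k):
    """P(at least k heads in n flips of a fair coin), computed exactly."""
    term = comb(n, k)              # sequences with exactly k heads
    favorable = 0
    for i in range(k, n + 1):
        favorable += term
        term = term * (n - i) // (i + 1)   # C(n, i+1) derived from C(n, i)
    return favorable / 2**n

def min_flips(heads_pct, cutoff=0.001, max_n=5000):
    """Smallest n where heads_pct percent heads (rounded up) is rarer than cutoff."""
    for n in range(1, max_n + 1):
        k = -(-heads_pct * n // 100)   # ceil(heads_pct * n / 100) in integer math
        if tail_prob(n, k) < cutoff:
            return n
    return None                        # cutoff not reached within max_n flips

print(f"16 or more heads in 20: {tail_prob(20, 16):.4f}")
print(f"20 heads in 20:         {tail_prob(20, 20):.2e}")
for pct in (100, 80, 55):
    print(f"{pct}% heads: {min_flips(pct)} flips to get under the cutoff")
```

With a fair coin, 16 or more heads in 20 comes up about 0.6 percent of the time, while 20 straight heads is 1 in 1,048,576. The number of flips needed to clear the cutoff grows quickly as the heads rate drops toward 50 percent, and a different cutoff would shift all of the numbers without changing that ordering.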

So understand that the sample size is only one of two factors in determining the significance of the outlier. The other is the magnitude.

That's why when you see Verlander strike out 60 batters in 44 IP or Joe Mauer - f|STAR||STAR||STAR| it, I'll mention him - hit 11 home runs, you cannot simply say, "It's only one month, I'm not a believer," without also considering the magnitude of the deviation.

Is 60 K in 44 IP a big enough magnitude to make one month significant? Is 11 home runs? In my opinion, yes. But whatever your opinion, you must address both factors if you're going to get a good gauge of whether it's dumb luck or a new baseline.