Wednesday, July 04, 2007

Projected Standings

In the early '80s, Bill James unveiled in one of his Baseball Abstracts what is called baseball's Pythagorean Winning Percentage. It is a model that estimates a team's winning percentage using runs scored and runs allowed as the inputs.

The formula is simple yet accurate:

Pythagorean Winning Percentage = Runs Scored^2 / (Runs Scored^2+Runs Allowed^2)
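
As a quick illustration, here is a minimal Python sketch of the formula above. The function name and the sample run totals are made up for the example; the exponent of 2 is the one given in the formula.

```python
def pythagorean_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Estimate winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team: 400 runs scored, 350 runs allowed.
estimate = pythagorean_wpct(400, 350)
print(round(estimate, 3))  # prints 0.566
```

A team that scores exactly as many runs as it allows comes out at .500, as you would expect.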

Using Retrosheet data, I have confirmed what many people had speculated: Pythagorean Winning Percentage is a better estimator of a team's future winning percentage than its actual winning percentage. To do this, I looked at all major league teams from 2000 to 2006:
1. I calculated WP% and Pythagorean Winning % as of July 1st of every year, as a proxy for the first half of the season.
2. From the above data I was able to derive the post-July 1st WP% and Pythagorean Winning %.
3. I then calculated the correlation coefficient between first-half winning % and second-half winning %.
4. I then calculated the correlation coefficient between first-half Pythagorean Winning % and second-half winning %.
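
The steps above can be sketched roughly as follows in Python. This is only an illustration of the comparison, not the actual script behind the numbers: the row layout, the helper names, and the team-season values are all made-up placeholders, and the correlation is assumed to be the ordinary Pearson coefficient.

```python
from math import sqrt


def win_pct(wins, losses):
    return wins / (wins + losses)


def pythagorean_wpct(runs_scored, runs_allowed):
    return runs_scored ** 2 / (runs_scored ** 2 + runs_allowed ** 2)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# One row per team-season:
# (first-half W, L, runs scored, runs allowed, second-half W, L).
# Placeholder numbers only, NOT real Retrosheet data.
team_seasons = [
    (45, 35, 420, 370, 44, 38),
    (38, 42, 380, 360, 43, 39),
    (50, 30, 400, 390, 40, 42),
    (30, 50, 340, 430, 33, 49),
]

first_wp = [win_pct(w, l) for w, l, _, _, _, _ in team_seasons]
first_pyth = [pythagorean_wpct(rs, ra) for _, _, rs, ra, _, _ in team_seasons]
second_wp = [win_pct(w2, l2) for *_, w2, l2 in team_seasons]

print("actual vs. second half:     ", pearson(first_wp, second_wp))
print("Pythagorean vs. second half:", pearson(first_pyth, second_wp))
```

With real data, each list would hold one entry per team per season from 2000 to 2006, and the two printed values would be the correlations being compared.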

The data can be found here.

The correlation between first half winning percentage and second half winning percentage was 0.47. The correlation between first half Pythagorean Winning % and second half winning percentage was 0.93.

With that out of the way, you can predict final standings for the season: