Description
The book Moneyball triggered a revolution in the analysis of performance statistics in professional sports by showing that data analytics could be used to increase a team's winning percentage. This course shows how to use Python to analyze data, test the claims that lie behind the Moneyball story, and examine how Moneyball statistics have evolved since the book was published. The learner is led through the process of calculating baseball performance statistics from publicly available datasets. The course progresses from the analysis of on base percentage and slugging percentage to more advanced measures derived using the run expectancy matrix, such as wins above replacement (WAR). By the end of this course the learner will be able to use these statistics to conduct their own team and player analyses.
What you will learn
Week 1
In this module we introduce the Moneyball story and explore the method used to test that story. We begin the process of replicating the Moneyball test by establishing the relationship between team winning and two performance statistics – on base percentage (OBP) and slugging percentage (SLG).
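As a rough illustration of the two statistics named above, the following minimal sketch computes OBP and SLG from season totals using their standard definitions. The function names and the team totals are hypothetical and are not taken from the course materials or from any real dataset.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = ab + bb + hbp + sf
    return (h + bb + hbp) / denominator if denominator else 0.0


def slugging_percentage(singles, doubles, triples, hr, ab):
    """SLG = total bases / at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab if ab else 0.0


# Hypothetical season totals for one team (illustrative numbers only).
team = {"ab": 5500, "h": 1400, "doubles": 280, "triples": 30,
        "hr": 200, "bb": 550, "hbp": 60, "sf": 45}

# Singles are hits that are not doubles, triples, or home runs.
singles = team["h"] - team["doubles"] - team["triples"] - team["hr"]

obp = on_base_percentage(team["h"], team["bb"], team["hbp"], team["ab"], team["sf"])
slg = slugging_percentage(singles, team["doubles"], team["triples"], team["hr"], team["ab"])

print(f"OBP = {obp:.3f}, SLG = {slg:.3f}")
```

With one such pair of values per team and season, the relationship to winning percentage could then be estimated by regression.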
Week 2
In this module we estimate the relationship between MLB player salaries and their performance statistics, OBP (on base percentage) and SLG (slugging). The results appear to confirm the Moneyball story – OBP was undervalued relative to SLG prior to the publication of Moneyball, while after publication the relative significance is reversed.
Week 3
This module updates the analysis of Hakes & Sauer and estimates the rewards to OBP and SLG over the period 1994-2015. In addition, it shows how rewards can be related to the individual components of OBP and SLG: walks, singles, doubles, triples, and home runs.
Week 4
This module introduces the concept of run expectancy and shows how to derive the run expectancy matrix and calculate run values from an MLB dataset of all events in the 2018 season. Run values are calculated by event type (walks, singles, doubles, etc.) and by player.
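The idea can be sketched in a few lines, assuming a simplified play-by-play format. Run expectancy for a state (outs, runners on base) is taken here as the average number of runs scored from that state to the end of the half-inning. The run value of a play is the change in run expectancy plus any runs scored on the play. The data layout, the function names, and the two toy half-innings are hypothetical. They are not drawn from the 2018 dataset, and the course's own implementation may differ.

```python
from collections import defaultdict

# Each play: (start_state, event, runs_on_play, end_state).
# A state is (outs, bases), with bases as a string such as "100" (runner on first).
# An end_state of None marks the third out, where run expectancy is zero.
half_innings = [
    [((0, "000"), "single", 0, (0, "100")),
     ((0, "100"), "out", 0, (1, "100")),
     ((1, "100"), "home_run", 2, (1, "000")),
     ((1, "000"), "out", 0, (2, "000")),
     ((2, "000"), "out", 0, None)],
    [((0, "000"), "out", 0, (1, "000")),
     ((1, "000"), "walk", 0, (1, "100")),
     ((1, "100"), "out", 0, (2, "100")),
     ((2, "100"), "out", 0, None)],
]


def run_expectancy(innings):
    """Average runs scored from each state to the end of the half-inning."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for plays in innings:
        remaining = sum(runs for _, _, runs, _ in plays)
        for start, _event, runs, _end in plays:
            totals[start] += remaining   # runs still to come, including this play
            counts[start] += 1
            remaining -= runs
    return {state: totals[state] / counts[state] for state in totals}


def run_values_by_event(innings, re_matrix):
    """Mean of RE(end) - RE(start) + runs on the play, grouped by event type."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for plays in innings:
        for start, event, runs, end in plays:
            re_end = re_matrix.get(end, 0.0) if end is not None else 0.0
            totals[event] += re_end - re_matrix[start] + runs
            counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}


re_matrix = run_expectancy(half_innings)
run_values = run_values_by_event(half_innings, re_matrix)
print(re_matrix)
print(run_values)
```

With only two toy half-innings the numbers carry no meaning. On a full season of events, the same grouping step could be keyed on a batter identifier instead of the event type to obtain run values by player.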