Foundations of Sports Analytics: Data, Representation, and Models in Sports

Description

This course provides an introduction to using Python to analyze team performance in sports. Learners will discover a variety of techniques that can be used to represent sports data and how to extract narratives based on these analytical techniques. The main focus of the introduction will be on the use of regression analysis to analyze team and player performance data, using examples drawn from the National Football League (NFL), the National Basketball Association (NBA), the National Hockey League (NHL), the English Premier League (EPL, soccer) and the Indian Premier League (IPL, cricket).

This course does not simply explain methods and techniques; it enables learners to apply them to sports datasets of interest so that they can generate their own results, rather than relying on the data processing performed by others. As a consequence, learners will be empowered to explore their own ideas about sports team performance, test them out using the data, and so become producers of sports analytics rather than consumers.
While the course materials have been developed using Python, code has also been produced to derive all of the results in R, for those who prefer that environment.

What you will learn

Introduction to Sports Performance and Data

This week introduces a simple example of sports analytics in practice: the calculation of the Pythagorean expectation to model winning in team sports. The same measure can also be used for prediction. Examples are developed for five different sports leagues: Major League Baseball (MLB), the National Basketball Association (NBA), the National Hockey League (NHL), the English Premier League (EPL, soccer) and the Indian Premier League (IPL, cricket).
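
As a rough illustration of the idea, the Pythagorean expectation estimates a team's winning percentage from what it scores and what it concedes. The sketch below uses the classic form with an exponent of 2; the function name, the default exponent and the season totals are assumptions made for illustration and are not taken from the course materials.

```python
def pythagorean_expectation(scored: float, allowed: float, exponent: float = 2.0) -> float:
    """Expected share of games won, given points (runs, goals) scored and allowed."""
    if scored < 0 or allowed < 0:
        raise ValueError("scored and allowed must be non-negative")
    s = scored ** exponent
    a = allowed ** exponent
    if s + a == 0:
        raise ValueError("at least one of scored or allowed must be positive")
    return s / (s + a)


# Hypothetical season totals, not real data.
runs_scored, runs_allowed, games = 800, 700, 162

expected_pct = pythagorean_expectation(runs_scored, runs_allowed)
expected_wins = expected_pct * games

print(f"Expected winning percentage: {expected_pct:.3f}")
print(f"Expected wins over {games} games: {expected_wins:.1f}")
```

Comparing this expected figure with a team's actual winning percentage is one simple way to ask whether its record is in line with its scoring.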

Introduction to Data Sources

This week uses NBA data to introduce the basic Python code needed for data cleaning and data preparation. It also covers summary and descriptive analysis, using statistics and graphs to understand the distribution of the data, the characteristics and patterns of individual variables, and the relationship between two variables. At the end of the week, we introduce correlation coefficients to summarize the linear relationship between two variables.
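
To make the last point concrete, here is a minimal sketch of the Pearson correlation coefficient written in plain Python. The function name and the per-game values are hypothetical and chosen only to show the calculation; the course itself may rely on a library routine instead.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-game values, not real NBA data.
field_goal_pct = [0.42, 0.45, 0.47, 0.50, 0.52]
points = [98, 101, 107, 110, 118]

r = pearson_r(field_goal_pct, points)
print(f"Correlation between field goal percentage and points: {r:.3f}")
```

A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship.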

Introduction to Sports Data and Plots in Python

This module introduces some ways of representing data, using examples from MLB, the NBA and the Indian Premier League (IPL). MLB data is used to analyze the spatial distribution of different hits. NBA data is used to generate heatmaps that illustrate the different ways in which players contribute. IPL data is used to show how team performances can be compared graphically.
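
One way to think about a heatmap is as a grid of counts: each event is assigned to a cell by its location, and the cell counts are then colored by a plotting library. The sketch below shows only the counting step in plain Python. The function name, the court dimensions and the shot coordinates are hypothetical and are not drawn from the course data.

```python
def bin_shots(shots, x_range, y_range, n_cols, n_rows):
    """Count (x, y) locations per grid cell; the result is indexed grid[row][col]."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in shots:
        # Ignore locations that fall outside the plotted area.
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Clamp so that points on the upper edge land in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * n_cols), n_cols - 1)
        row = min(int((y - y_min) / (y_max - y_min) * n_rows), n_rows - 1)
        grid[row][col] += 1
    return grid


# Hypothetical shot locations (x, y), not real NBA data.
shots = [(-20, 5), (-18, 7), (0, 2), (1, 3), (0, 1), (22, 6), (3, 25)]
grid = bin_shots(shots, x_range=(-25, 25), y_range=(0, 30), n_cols=5, n_rows=3)

for row in grid:
    print(" ".join(f"{count:2d}" for count in row))
```

Passing a grid like this to a plotting routine that maps counts to colors produces the heatmap itself.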

Introduction to Sports Data and Regression Using Python

This week introduces the fundamentals of regression analysis. We will discuss how to perform regression analysis using Python and how to interpret regression output. We will use NHL data to estimate multiple regression models that identify the team-level performance factors affecting a team’s winning percentage. We will also use cricket data from the Indian Premier League to run regression analyses that examine whether player performance impacts player salary.
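
For intuition, the sketch below fits an ordinary least squares line with a single predictor in plain Python. It is a simplified, one-variable stand-in for the multiple regression models described above, and the function name, variable names and values are hypothetical rather than taken from the course.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical team-level values, not real NHL data.
shot_share = [0.46, 0.48, 0.50, 0.52, 0.55]
win_pct = [0.40, 0.47, 0.50, 0.56, 0.61]

slope, intercept, r_squared = fit_simple_ols(shot_share, win_pct)
print(f"win_pct = {intercept:.3f} + {slope:.3f} * shot_share (R^2 = {r_squared:.3f})")
```

The slope is read as the estimated change in winning percentage associated with a one-unit change in the predictor; with several predictors, each coefficient is interpreted the same way while holding the others fixed.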

What’s included