Description
This course gives you a comprehensive introduction to both the theory and practice of machine learning. You will learn to use Python along with industry-standard libraries and tools, including Pandas, Scikit-learn, and TensorFlow, to ingest, explore, and prepare data for modeling and then train and evaluate models using a wide variety of techniques. Those techniques include linear regression with ordinary least squares, logistic regression, support vector machines, decision trees and ensembles, clustering, principal component analysis, hidden Markov models, and deep learning.
A key feature of this course is that you not only learn how to apply these techniques, you also learn the conceptual basis underlying them so that you understand how they work, why you are doing what you are doing, and what your results mean. The course also features real-world datasets, drawn primarily from the realm of public policy. It is based on an introductory machine learning course offered to graduate students at the University of Chicago and will serve as a strong foundation for deeper and more specialized study.
What you will learn
Machine Learning and the Machine Learning Pipeline
In this module, you will be introduced to the machine-learning pipeline and learn about the initial work you need to do on your data prior to modeling. You will learn how to ingest data using Pandas, a standard Python library for data exploration and preparation. Next, we turn to the first modeling approach that we explore in this class, linear regression with ordinary least squares.
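To make the first two steps concrete, here is a minimal sketch of ingesting a small table with Pandas and fitting a line by ordinary least squares. The inline CSV and its column names (hours, score) are hypothetical stand-ins for a real dataset, and NumPy's least squares solver is just one reasonable way to do the fit, not necessarily the tool the course uses.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical inline CSV standing in for a real data file.
csv_text = """hours,score
1,52
2,54
3,56
4,58
5,60
"""

# Ingest the data into a DataFrame and take a first look at it.
df = pd.read_csv(io.StringIO(csv_text))
print(df.describe())

# Build a design matrix with an intercept column, then solve the
# least squares problem: minimize the sum of squared residuals.
X = np.column_stack([np.ones(len(df)), df["hours"].to_numpy(dtype=float)])
y = df["score"].to_numpy(dtype=float)
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Because the toy data lie exactly on a line, the fitted intercept and slope recover that line; with real data the fit would leave nonzero residuals.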
Least Squares and Maximum Likelihood Estimation
In this module, you continue the work on linear regression that we began in the previous module. You will learn more about how to evaluate such models and how to select the important features and exclude the ones that are not statistically significant. You will also learn about maximum likelihood estimation, a probabilistic approach to estimating your models.
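One way to see the link between least squares and maximum likelihood is the following sketch. It assumes a linear model with Gaussian noise and uses synthetic data. Under that assumption the least squares coefficients also maximize the likelihood, and the maximum likelihood estimate of the noise variance is the mean squared residual. The function name gaussian_log_likelihood is chosen here for illustration.

```python
import numpy as np

# Synthetic data from a known linear model with Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.5 + 0.8 * x + rng.normal(0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])


def gaussian_log_likelihood(beta, sigma2):
    """Log-likelihood of y under y = X @ beta + noise, noise ~ N(0, sigma2)."""
    resid = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(resid**2) / sigma2


# Least squares coefficients; under Gaussian noise these are also the MLE.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# MLE of the noise variance: mean squared residual (divides by n, not n - 2).
sigma2_hat = np.mean((y - X @ beta_hat) ** 2)

ll_best = gaussian_log_likelihood(beta_hat, sigma2_hat)
ll_shifted = gaussian_log_likelihood(beta_hat + np.array([0.1, 0.0]), sigma2_hat)
ll_inflated = gaussian_log_likelihood(beta_hat, 1.5 * sigma2_hat)

print(ll_best, ll_shifted, ll_inflated)
```

Moving the coefficients or the variance away from their estimates lowers the log-likelihood, which is what it means for the estimates to be maximum likelihood.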
Basis Functions and Regularization
This module introduces you to basis functions and polynomial expansions in particular, which will allow you to use the same linear regression techniques that we have been studying so far to model non-linear relationships. Then, we learn about the bias-variance tradeoff, a key relationship in machine learning. Methods like polynomial expansion may help you train models that capture the relationship in your training data quite well, but those same models may perform badly on new data. You learn about different regularization methods that can help balance this tradeoff and create models that avoid overfitting.
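The ideas in this module can be sketched as follows, using synthetic data and NumPy only. The helper names (poly_features, fit_ridge, mse) are hypothetical, and ridge regression stands in for the regularization methods in general. For simplicity this sketch penalizes every coefficient, including the intercept, which practical implementations often avoid.

```python
import numpy as np

# Synthetic non-linear data (illustrative only).
rng = np.random.default_rng(1)
x_train = rng.uniform(-1, 1, size=30)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, size=30)
x_test = rng.uniform(-1, 1, size=200)
y_test = np.sin(3 * x_test) + rng.normal(0, 0.2, size=200)


def poly_features(x, degree):
    # Polynomial basis expansion: columns are x**0, x**1, ..., x**degree.
    return np.vander(x, degree + 1, increasing=True)


def fit_ridge(X, y, lam):
    # Ridge regression solved as an augmented least squares problem.
    # With lam = 0 this reduces to ordinary least squares.
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta


def mse(X, y, beta):
    return float(np.mean((y - X @ beta) ** 2))


results = {}
for degree, lam in [(1, 0.0), (9, 0.0), (9, 1e-2)]:
    beta = fit_ridge(poly_features(x_train, degree), y_train, lam)
    train_err = mse(poly_features(x_train, degree), y_train, beta)
    test_err = mse(poly_features(x_test, degree), y_test, beta)
    results[(degree, lam)] = (train_err, test_err)
    print(f"degree={degree} lambda={lam}: train={train_err:.4f} test={test_err:.4f}")
```

The linear regression machinery is unchanged; only the features are expanded. Comparing training and test error across the three settings shows the tradeoff: a more flexible model can never fit the training data worse, but its error on new data is what matters.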
Model Selection and Logistic Regression
In this module, you first learn more about evaluating and tuning your models. We look at cross-validation techniques that will help you get more accurate measurements of your model’s performance, and then you see how to use them along with pipelines and grid search to tune your models. Finally, we look at the theory and practice of our first technique for classification, logistic regression.
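A minimal sketch of how these pieces might fit together in Scikit-learn is shown below. The synthetic dataset, the step names ("scale", "clf"), and the candidate values of C are assumptions made for illustration. The sketch uses Scikit-learn's GridSearchCV class, which combines grid search with cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The pipeline refits the scaler inside each cross-validation fold,
# so no information from a held-out fold leaks into preprocessing.
pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Candidate regularization strengths; smaller C means stronger regularization.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validation over every candidate in the grid.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_C = search.best_params_["clf__C"]
test_accuracy = search.score(X_test, y_test)
print(f"best C={best_C}, held-out accuracy={test_accuracy:.3f}")
```

The held-out test set is touched only once, after tuning, so its accuracy is a fairer estimate of performance on new data than the cross-validation score used to pick C.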



