Supervised Machine Learning: Classification

Description

This course introduces you to one of the main types of modeling families of supervised Machine Learning: Classification. You will learn how to train predictive models to classify categorical outcomes and how to use error metrics to compare across different models. The hands-on section of this course focuses on using best practices for classification, including train and test splits, and handling data sets with unbalanced classes.

By the end of this course you should be able to:
-Differentiate uses and applications of classification and classification ensembles
-Describe and use logistic regression models
-Describe and use decision tree and tree-ensemble models
-Describe and use other ensemble methods for classification
-Use a variety of error metrics to compare and select the classification model that best suits your data
-Use oversampling and undersampling as techniques to handle unbalanced classes in a data set
 
Who should take this course?
This course targets aspiring data scientists interested in acquiring hands-on experience with Supervised Machine Learning Classification techniques in a business setting.
 
What skills should you have?
To make the most out of this course, you should have familiarity with programming on a Python development environment, as well as fundamental understanding of Data Cleaning, Exploratory Data Analysis, Calculus, Linear Algebra, Probability, and Statistics.

What you will learn

Logistic Regression

Logistic regression is one of the most studied and widely used classification algorithms, probably due to its popularity in regulated industries and financial settings. Although more modern classifiers might likely output models with higher accuracy, logistic regressions are great baseline models due to their high interpretability and parametric nature. This module will walk you through extending a linear regression example into a logistic regression, as well as the most common error metrics that you might want to use to compare several classifiers and select that best suits your business problem.

K Nearest Neighbors

K Nearest Neighbors is a popular classification method because they are easy computation and easy to interpret. This module walks you through the theory behind k nearest neighbors as well as a demo for you to practice building k nearest neighbors models with sklearn.

Support Vector Machines

This module will walk you through the main idea of how support vector machines construct hyperplanes to map your data into regions that concentrate a majority of data points of a certain class. Although support vector machines are widely used for regression, outlier detection, and classification, this module will focus on the latter.

Decision Trees

Decision tree methods are a common baseline model for classification tasks due to their visual appeal and high interpretability. This module walks you through the theory behind decision trees and a few hands-on examples of building decision tree models for classification. You will realize the main pros and cons of these techniques. This background will be useful when you are presented with decision tree ensembles in the next module.

What’s included