Description
This course introduces you to one of the main types of Machine Learning: Unsupervised Learning. You will learn how to draw insights from data sets that have no target or labeled variable. You will learn several clustering and dimension reduction algorithms for unsupervised learning as well as how to select the algorithm that best suits your data. The hands-on section of this course focuses on using best practices for unsupervised learning.
By the end of this course you should be able to:
Explain the kinds of problems suitable for Unsupervised Learning approaches
Explain the curse of dimensionality, and how it makes clustering difficult with many features
Describe and use common clustering and dimensionality-reduction algorithms
Try clustering points where appropriate, and compare the performance of per-cluster models
Understand metrics relevant for characterizing clusters
Who should take this course?
This course targets aspiring data scientists interested in acquiring hands-on experience with Unsupervised Machine Learning techniques in a business setting.
What skills should you have?
To make the most of this course, you should be familiar with programming in a Python development environment and have a fundamental understanding of Data Cleaning, Exploratory Data Analysis, Calculus, Linear Algebra, Probability, and Statistics.
What you will learn
Introduction to Unsupervised Learning and K Means
This module introduces Unsupervised Learning and its applications. One of the most common uses of Unsupervised Learning is clustering observations using k-means. In this module, you become familiar with the theory behind this algorithm and put it into practice in a demonstration.
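To give a feel for what k-means does, here is a minimal sketch of the standard assignment/update loop (Lloyd's algorithm) in NumPy. It is an illustration under stated assumptions, not the course's own demonstration: the function name `kmeans`, its parameters, and the synthetic two-blob data are all hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical helper: cluster the rows of `points` into k groups."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct observations at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    def assign(current):
        # Euclidean distance from every point to every centroid.
        diffs = points[:, None, :] - current[None, :, :]
        return np.linalg.norm(diffs, axis=2).argmin(axis=1)

    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        labels = assign(centroids)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment so labels match the returned centroids.
    return assign(centroids), centroids


# Synthetic data (an assumption for illustration): two separated blobs.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(data, k=2)
```

On data this well separated, each blob should end up with a single label. On real data the result depends on the initial centroids, so k-means is commonly run several times with different seeds.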
Distance Metrics & Computational Hurdles
Selecting a Clustering Algorithm
In this module, you become familiar with some of the computational hurdles around clustering algorithms, and how different clustering implementations try to overcome them. After a brief recapitulation of common clustering algorithms, you will learn how to compare them and select the clustering technique that best suits your data.
Dimensionality Reduction
This module introduces dimensionality reduction and Principal Component Analysis, which are powerful techniques for big data, imaging, and pre-processing data.
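As a rough illustration of Principal Component Analysis, the sketch below centers a data matrix, takes its singular value decomposition, and projects the data onto the leading directions. It is one common way to compute PCA, not necessarily the approach taught in the module; the function name `pca` and the synthetic three-feature data set are assumptions made for the example.

```python
import numpy as np


def pca(data, n_components):
    """Hypothetical helper: project `data` onto its top principal components."""
    # Center each feature so the components describe variance, not the mean.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Variance explained by each direction, as a share of the total.
    explained_variance = singular_values ** 2 / (len(data) - 1)
    ratio = explained_variance / explained_variance.sum()
    projected = centered @ components.T
    return projected, components, ratio[:n_components]


# Synthetic data (an assumption for illustration): three features that are
# all close to linear functions of one hidden variable x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    -x + rng.normal(scale=0.1, size=200),
])

projected, components, ratio = pca(data, n_components=1)
```

Because the three features here are nearly redundant, a single component should capture almost all of the variance, which is the situation in which reducing dimensionality before clustering or modeling tends to pay off.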