Description
Marketing data is often so large that humans cannot read or analyze even a representative sample of it to discover the insights it may contain. In this course, learners use unsupervised deep learning to train algorithms that extract topics and insights from text data. Learners work through a conceptual overview of unsupervised machine learning and explore real-world datasets in instructor-led Python tutorials. The course concludes with a major project.
This course uses Jupyter notebooks in Google Colab, a browser-based Jupyter notebook environment. Files are stored in Google Drive.
This course can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics. Learn more about the MS-DS program at https://www.coursera.org/degrees/master-of-science-data-science-boulder.
What you will learn
What is topic modeling?
In this module, we will cover the fundamental concepts of topic modeling, a form of unsupervised machine learning applied to unstructured text documents. We will contrast unsupervised methods with supervised ones and survey common applications of topic modeling.
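Many topic models, Latent Dirichlet Allocation (LDA) being a common example, rest on a generative story: each topic is a probability distribution over words, and each document is a mixture of topics. The minimal sketch below assumes two hypothetical topics over a toy vocabulary and runs that story forward to produce a synthetic review. Topic modeling works in the opposite direction: given only the documents, with no labels of any kind, it infers the topics and the mixtures. That absence of labels is what makes it unsupervised rather than supervised.

```python
import random

# Hypothetical topics: each one is a probability distribution over a tiny vocabulary.
TOPICS = {
    "shipping": {"box": 0.4, "arrived": 0.3, "late": 0.2, "battery": 0.1},
    "battery": {"battery": 0.5, "charge": 0.3, "hours": 0.15, "box": 0.05},
}


def generate_document(topic_mixture, length, rng):
    """Sample a bag of words from a mixture of topics (LDA-style generative story)."""
    names = list(topic_mixture)
    weights = [topic_mixture[name] for name in names]
    words = []
    for _ in range(length):
        # 1. Pick a topic for this word position according to the document's mixture.
        topic = rng.choices(names, weights=weights, k=1)[0]
        # 2. Pick a word from that topic's word distribution.
        vocab = list(TOPICS[topic])
        probs = [TOPICS[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=probs, k=1)[0])
    return words


rng = random.Random(0)  # fixed seed so the sample is reproducible
doc = generate_document({"shipping": 0.7, "battery": 0.3}, length=8, rng=rng)
print(doc)
```

The topic names are only there for readability; a fitted topic model outputs unnamed word distributions, and a person interprets them afterward.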
The Assumptions of a Topic Model, Bag of Words, and Natural Language Processing
In this module, we will look under the hood of a topic modeling approach to understand the assumptions that drive how a topic model fits the data. We will also uncover how bag-of-words approaches to topic modeling work and what natural language processing is required to produce meaningful topic modeling features.
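A bag-of-words representation keeps only how often each word occurs in a document and discards word order, which is the simplifying assumption many topic models make. The sketch below pairs it with a few basic natural language processing steps: lowercasing, tokenizing, and removing stop words. The stop-word list and the minimum token length are illustrative assumptions, not the course's settings.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real pipelines use a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "was", "to", "of", "this", "i"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stop words and tokens of 2 chars or fewer."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def bag_of_words(docs):
    """Return a sorted vocabulary and one word-count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for doc in tokenized:
        vec = [0] * len(vocab)
        for word, n in Counter(doc).items():
            vec[index[word]] = n  # count only; position in the text is lost
        vectors.append(vec)
    return vocab, vectors


docs = [
    "The battery died and the battery charger was late.",
    "Box arrived late; the box was damaged.",
]
vocab, vectors = bag_of_words(docs)
```

Here `vocab` is `["arrived", "battery", "box", "charger", "damaged", "died", "late"]` and `vectors` is `[[0, 2, 0, 1, 0, 1, 1], [1, 0, 2, 0, 1, 0, 1]]`. Real pipelines often add steps such as stemming or lemmatization so that related word forms share one feature.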
Prepping Amazon Review Data
In this module, we will cover how to parse JSON-like data and segment it into a corpus that is ready for topic modeling. We will also cover the structure and taxonomy of the data for your project.
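Review dumps are often stored one record per line, and some are only "JSON-like": they use Python-style single quotes, so a strict JSON parser rejects them. The sketch below assumes that layout and assumes the field names `reviewText` (the review body) and `overall` (the star rating), as found in some public Amazon review datasets; check the actual schema of your project data before relying on them. Segmenting by star rating is one hypothetical way to split the data into corpora.

```python
import ast
import json
from collections import defaultdict


def parse_line(line):
    """Parse one record; fall back to Python-literal syntax for 'JSON-like' lines."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Some dumps use single quotes (Python dict literals) rather than strict JSON.
        return ast.literal_eval(line)


def build_corpora(lines, text_field="reviewText", segment_field="overall"):
    """Group review texts into one corpus per segment (here: per star rating)."""
    corpora = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = parse_line(line)
        text = record.get(text_field)
        if text:  # skip records with missing or empty review text
            corpora[record.get(segment_field)].append(text)
    return dict(corpora)


sample = [
    '{"asin": "B001", "overall": 5.0, "reviewText": "Great battery life."}',
    "{'asin': 'B002', 'overall': 1.0, 'reviewText': 'Arrived broken.'}",
    '{"asin": "B003", "overall": 5.0, "reviewText": ""}',
]
corpora = build_corpora(sample)
```

With a real file, pass the open file object instead of `sample`, for example `with open(path, encoding="utf-8") as f: corpora = build_corpora(f)`, where `path` is hypothetical. `ast.literal_eval` only evaluates literals, so unlike `eval` it does not execute arbitrary code found in the data.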
Pre-Processing Text and Training a Topic Model
In this module, we will load Amazon review data into a corpus and preprocess it. We will then cover how to build topic models from the data and save them.
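Once reviews are tokenized, fitting a topic model means inferring topic-word distributions and per-document topic counts. Practical work usually relies on a tested library implementation; the sketch below instead swaps in a small, standard-library-only collapsed Gibbs sampler for LDA so the mechanics are visible, then saves the fitted model as JSON. The hyperparameters (`alpha`, `beta`, the number of iterations), the toy tokenized reviews, and the file name `lda_model.json` are assumptions for illustration.

```python
import json
import os
import random
import tempfile


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; each doc is a list of tokens."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic token totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | other assignments) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights, k=1)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed point estimates of each topic's word distribution.
    topics = [
        {vocab[v]: (topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)}
        for k in range(num_topics)
    ]
    return {"vocab": vocab, "topics": topics, "doc_topic_counts": doc_topic}


def top_words(model, k, n=3):
    """Return the n most probable words for topic k."""
    dist = model["topics"][k]
    return sorted(dist, key=dist.get, reverse=True)[:n]


def save_model(model, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)


def load_model(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


reviews = [  # toy, already-tokenized reviews
    ["battery", "charge", "hours", "battery"],
    ["charge", "battery", "hours", "died"],
    ["box", "arrived", "late", "damaged"],
    ["late", "box", "arrived", "box"],
]
model = train_lda(reviews, num_topics=2, iterations=100)
for k in range(2):
    print(k, top_words(model, k))
path = os.path.join(tempfile.mkdtemp(), "lda_model.json")  # hypothetical file name
save_model(model, path)
restored = load_model(path)
```

`top_words` is the usual way to inspect what each topic captured. Saving plain counts and probabilities as JSON keeps the file human-readable, at the cost of being larger and slower to load than a binary format once the vocabulary grows.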