Capstone: Retrieving, Processing, and Visualizing Data with Python

Description

In the capstone, students will build a series of applications to retrieve, process and visualize data using Python. The projects will involve all the elements of the specialization. In the first part of the capstone, students will do some visualizations to become familiar with the technologies in use and then will pursue their own project to visualize some other data that they have or can find. Chapters 15 and 16 from the book “Python for Everybody” will serve as the backbone for the capstone. This course covers Python 3.

What you will learn

Welcome to the Capstone

Congratulations to everyone for making it this far. Before you begin, please view the Introduction video and read the Capstone Overview. The Course Resources section contains additional course-wide material that you may want to refer to in future weeks.

Building a Search Engine

This week we will download and run a simple version of the Google PageRank Algorithm and practice spidering some content. The assignment is peer-graded, and the first of three optional Honors assignments in the course. This a continuation of the material covered in Course 4 of the specialization, and is based on Chapter 16 of the textbook.

Exploring Data Sources (Project)

The optional Capstone project is your opportunity to select, process, and visualize the data of your choice, and receive feedback from your peers. The project is not graded, and can be as simple or complex as you like. This week’s assignment is to identify a data source and make a short discussion forum post describing the data source and outlining some possible analysis that could be done with it. You will not be required to use the data source presented here for your actual analysis.

Spidering and Modeling Email Data

In our second optional Honors assignment, we will retrieve and process email data from the Sakai open source project. Video lectures will walk you through the process of retrieving, cleaning up, and modeling the data.

What’s included