Description
Large-scale biology projects such as the sequencing of the human genome and gene expression surveys using RNA-seq, microarrays and other technologies have created a wealth of data for biologists. However, the challenge facing scientists is accessing and analyzing these data to extract useful information pertaining to the system being studied. This highly hands-on course focuses on employing existing bioinformatic resources (mainly web-based programs and databases) to access these data and answer questions relevant to the average biologist.
Topics covered across the two parts of the course include multiple sequence alignments, phylogenetics, gene expression data analysis, and protein interaction networks.
The first part, Bioinformatic Methods I, dealt with databases, BLAST, multiple sequence alignments, phylogenetics, selection analysis and metagenomics.
This, the second part, Bioinformatic Methods II, will cover motif searching, protein-protein interactions, structural bioinformatics, gene expression data analysis, and cis-element predictions.
This pair of courses is useful to any student considering graduate school in the biological sciences, as well as students considering molecular medicine.
These courses are based on one taught at the University of Toronto to upper-level undergraduates who have some understanding of basic molecular biology. If you’re not familiar with basic molecular biology, an introductory course such as https://learn.saylor.org/course/bio101 might be helpful. No programming is required for this course, although the 5th module involves some command-line work (carried out within a web browser).
Bioinformatic Methods II is regularly updated, and was last updated for January 2023.
What you will learn
Protein Motifs
In this module we’ll be exploring conserved regions within protein families. Such regions can help us understand the biology of a sequence, in that they are likely important for biological function, and they can also be used to help ascribe function to sequences for which we can’t identify any homologs in the databases. There are various ways of describing conserved regions, ranging from simple regular expressions to profiles to profile hidden Markov models (HMMs).
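As a rough illustration of the simplest of these descriptions, the sketch below translates a PROSITE-style pattern into a Python regular expression and scans a sequence with it. The helper names `prosite_to_regex` and `find_motif` and the example sequence are made up for illustration, and only a subset of the pattern syntax is handled (`x`, `[..]`, `{..}` and repeat counts); it is not how any particular motif database implements its search.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Illustrative helper: handles x, [ABC], {ABC} and repeats such as x(2,4).
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as (2) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                      # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {P} means any residue except P
        # [ST] and single letters are already valid regex syntax.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


def find_motif(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched residues) for every hit.

    A lookahead is used so that overlapping hits are also reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern; the sequence is a made-up example.
hits = find_motif("N-{P}-[ST]-{P}", "MKNGTAANPSLNNTSW")
print(hits)
```

For this made-up sequence the scan reports hits at positions 3, 12 and 13 (`NGTA`, `NNTS`, `NTSW`); the `NPS` at position 8 is rejected because proline is excluded at the second position. Profiles and profile HMMs replace this all-or-nothing match with position-specific scores, which is why they can detect more divergent family members.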
Protein-Protein Interactions
In this module we’ll be exploring protein-protein interactions (PPIs). PPIs are important because proteins don’t act in isolation, and an examination of the interaction partners of a given protein (determined in an unbiased, perhaps high-throughput way) can often tell us a lot about its biology. We’ll talk about some different methods used to determine PPIs and go over their strengths and weaknesses. In the lab we’ll use three different tools and two different databases to examine interaction partners of BRCA2, a protein that we examined in last module’s lab. Finally, we’ll touch on a “foundational” concept, Gene Ontology (GO) term enrichment analysis, to get an overview of the proteins interacting with our example.
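GO term enrichment is commonly framed as a one-sided hypergeometric test: given a background of annotated genes, how surprising is it to see this many genes with a particular term among the interactors? The following is a minimal sketch under that assumption; the function name `enrichment_p_value` and all the counts are hypothetical, and real tools also correct for testing many terms at once.

```python
from math import comb


def enrichment_p_value(background: int, annotated: int,
                       sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    background: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     number of genes in our list (e.g. interaction partners)
    hits:       genes in our list carrying the GO term
    """
    total = comb(background, sample)
    upper = min(sample, annotated)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20000 background genes, 150 annotated with a
# DNA repair term, 40 interaction partners, 6 of which carry the term.
p = enrichment_p_value(20000, 150, 40, 6)
print(f"p = {p:.3g}")
```

A small p-value suggests the term is over-represented among the interactors relative to the background, which is the kind of summary that enrichment tools report for each GO term.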
Protein Structure
Determining a protein’s three-dimensional (tertiary) structure can tell us a lot about the biology of that protein. In this module’s mini-lecture, we’ll talk about some different methods used to determine a protein’s tertiary structure and cover the main database for protein structure data, the PDB. In the lab we’ll explore the PDB and VAST, an online tool for searching for structural (as opposed to sequence) similarity. We’ll then use a stand-alone program, PyMOL, to explore several protein structures in more detail.
Review: Protein Motifs, Protein-Protein Interactions, and Protein Structure