Plant Bioinformatics Capstone

Description

The past 15 years have been exciting ones in plant biology. Hundreds of plant genomes have been sequenced, RNA-seq has enabled transcriptome-wide expression profiling, and a proliferation of “-seq”-based methods has permitted protein-protein and protein-DNA interactions to be determined cheaply and in a high-throughput manner. These data sets in turn allow us to generate hypotheses at the click of a mouse or tap of a finger.

In Plant Bioinformatics on Coursera.org, we covered 33 plant-specific online tools, ranging from genome browsers to transcriptomic data mining to promoter and network analyses. In this Plant Bioinformatics Capstone, we’ll use these tools to hypothesize a biological role for a gene of unknown function and summarize the findings in a written lab report.
This course is part of a Plant Bioinformatics Specialization on Coursera. The specialization introduces core bioinformatic competencies and resources, such as NCBI’s GenBank, BLAST, multiple sequence alignments and phylogenetics, in Bioinformatic Methods I, followed by protein-protein interactions, structural bioinformatics and RNA-seq analysis in Bioinformatic Methods II. Plant-specific concepts and tools are introduced in Plant Bioinformatics and the Plant Bioinformatics Capstone.
This course/capstone was developed with funding from the University of Toronto’s Faculty of Arts and Science Open Course Initiative Fund (OCIF) and was implemented by Eddi Esteban, Will Heikoop and Nicholas Provart. Asher Pasha programmed a gene ID randomizer.

What you will learn

Exploring your gene of interest with online databases

In the Week 1 module, we are going to use an example gene of (mostly) unknown function from Arabidopsis, At3g20300, and see what online databases can tell us about that gene. Part A uses tools that we have explored in Plant Bioinformatics to gather information about the gene/gene product, such as its size, its homologs, its phylogenetic relationship to other sequences, its domains, and its subcellular localization. Part B explores gene expression databases to see where that gene is expressed. Often where and when a gene is expressed can give us clues as to its function.

Identifying genes related to your gene of interest

Often the function of genes that are coexpressed with a gene of unknown function can give us hints about the function of that gene. Researchers are now often using coexpression analyses as “primary screens” to identify “new” genes in biological pathways (a few examples are described in Usadel et al., 2009). Another interesting facet is whether the promoters of these sets of coexpressed genes contain any common cis-regulatory motifs. In Part A, we’ll explore the genes that are coexpressed with At3g20300, and in Part B, we’ll look for common regulatory motifs.
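
To make the idea of coexpression concrete, here is a minimal sketch that ranks genes by the Pearson correlation of their expression profiles with that of a query gene. It is an illustration only: the function names, the placeholder gene names (geneA, geneB, geneC) and all expression values are made up, and the online tools used in this course may compute coexpression differently.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile has no defined correlation; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_coexpressed(query_id, expression, top_n=10):
    """Return up to top_n (gene, r) pairs, most positively correlated first."""
    query = expression[query_id]
    scores = [
        (gene, pearson(query, profile))
        for gene, profile in expression.items()
        if gene != query_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical expression values across four samples (not real data).
expression = {
    "At3g20300": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}

for gene, r in rank_coexpressed("At3g20300", expression):
    print(f"{gene}\t{r:.2f}")
```

Genes near the top of such a list are the candidates whose known functions might hint at the role of the query gene.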

Analysis of the function of your gene of interest and its network of genes

Gene Ontology enrichment analysis for a set of coexpressed genes is often useful for figuring out what that group of genes is doing. By doing such analyses with a set of coexpressed genes, can we infer a role for our gene of unknown function? We’ll explore this aspect in Part A, along with investigating potential pathways the gene list is involved in. In Part B, we’ll use other network tools to investigate additional linkages to other genes, above and beyond those suggested by coexpression. It is sometimes useful to investigate these too! Again, we’ll be using At3g20300 as our example.
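
Enrichment analyses of this kind are commonly based on a hypergeometric (one-sided Fisher) test: given how many genes in the genome carry a GO term, how surprising is it to see at least this many in our gene list? The sketch below shows that calculation under stated assumptions. The function name and all counts are hypothetical, and the tools used in the course may apply a different test or a multiple-testing correction that is not shown here.

```python
from math import comb


def enrichment_p_value(genome_size, annotated_in_genome, list_size, annotated_in_list):
    """Probability of seeing at least `annotated_in_list` genes with a GO term
    in a random list of `list_size` genes drawn without replacement from a
    genome of `genome_size` genes, `annotated_in_genome` of which carry the term.
    """
    max_hits = min(annotated_in_genome, list_size)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(annotated_in_genome, k)
        * comb(genome_size - annotated_in_genome, list_size - k)
        for k in range(annotated_in_list, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 27,000 genes in the genome, 300 annotated with the term,
# and 8 of the 50 coexpressed genes carrying that annotation.
p = enrichment_p_value(27000, 300, 50, 8)
print(f"p = {p:.3g}")
```

A small p-value for a term suggests that the coexpressed set is associated with that process more often than expected by chance, which is the basis for inferring a role for the gene of unknown function.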

Lab report draft

Now we will take the above analyses and synthesize the information from them into a draft lab report/essay describing the putative function of our gene of interest with unknown function. We’ll draw on the literature to describe what is known about related genes, and propose some experiments to test our hypotheses about our gene’s potential function.

What’s included