Carnegie Mellon University
January 31, 2017

Shedding Light on Genomic Dark Matter

McManus Lab

Modern technology has made it very easy to sequence genomes. Interpreting genome sequences, however, remains a daunting challenge. While it is simple to find protein-coding genes, the vast majority of genomic regions do not encode proteins. These non-coding regions are considered the “dark matter” of the genome. Much of this dark matter is thought to be involved in regulating the expression of protein-coding genes. New NIH R01 funding will support molecular and computational studies in the McManus lab to reveal the secrets of genomic dark matter.

The specific studies that garnered grant support originated in a collaboration between the McManus lab and Rumi Naik, an alum of the Lane Fellow program in CMU's Computational Biology Department. The collaboration focused on dark matter elements called upstream open reading frames (uORFs). Located in mRNA leaders, uORFs are short open reading frames that can regulate the translation of the main protein-coding region carried on the mRNA. Because uORF expression obeys different rules from the familiar ones for protein-coding genes, it is challenging to find uORFs in genome sequences.
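To make the definition concrete, here is a minimal Python sketch of what a candidate uORF looks like in sequence terms: a start codon in the mRNA leader followed, in the same reading frame, by a stop codon before the leader ends. This is not the lab's method. The function name `find_candidate_uorfs`, the AUG-only start rule, and the example leader sequence are assumptions made for illustration, and a naive scan like this says nothing about whether a candidate is actually translated.

```python
# Illustrative only: a naive scan for candidate uORFs in an mRNA leader.
# Assumption: a candidate starts at ATG/AUG and ends at the first in-frame
# stop codon that lies entirely within the leader.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_candidate_uorfs(leader, start_codon="ATG"):
    """Return (start, end, sequence) tuples for ORFs contained in the leader.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    """
    # Accept RNA or DNA spelling by converting U to T.
    leader = leader.upper().replace("U", "T")
    hits = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] != start_codon:
            continue
        # Walk codon by codon in the frame set by the start codon.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                hits.append((start, pos + 3, leader[start:pos + 3]))
                break
    return hits


# Hypothetical leader sequence, made up for this example.
example_leader = "GGCAUGGCUUAAGGCACC"
candidates = find_candidate_uorfs(example_leader)
for start, end, seq in candidates:
    print(start, end, seq)
```

A scan of this kind yields only sequence-level candidates, which is part of why identifying functional uORFs is hard.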

Naik, McManus, and Ph.D. student Pieter Spealman developed a new machine-learning algorithm to identify uORFs. The machine-learning approach circumvented the need to work out the detailed mechanisms that dictate uORF activity before identifying the uORFs themselves. Spealman has used the algorithm to predict, for the first time, thousands of statistically significant uORFs in three yeast species. Spealman and MSCB student Myles Mao will now use a novel high-throughput reporter assay developed in the lab to determine the functions of the newly identified uORFs.

The combination of molecular and computational biology involved in the project is a great example of the interdisciplinary work for which CMU is known.

“The collaborative environment at CMU has really made this work possible,” Dr. McManus said. “We have access to excellent facilities through the Molecular Biosensor and Imaging Center that allow us to test thousands of uORFs simultaneously, as well as fantastic colleagues who are experts in developing predictive models.”

Ultimately, McManus will use the data to build computational models that predict uORF functions in all species. For example, more than two-thirds of mammalian genes have uORFs; McManus lab models will define new underpinnings of gene regulation in other species, including humans.