Carnegie Mellon University
December 13, 2018

Autism Risk Factors Identified in "Dark Matter" of Human Genome

By CMU's Department of Statistics and Data Science

Abby Simmons

Over the past decade, scientists have identified dozens of genes associated with autism by studying so-called "de novo" mutations — newly arising changes to the genome found in children but not their parents. To date, most de novo mutations linked to autism have been found in protein-coding genes. It has proven far more difficult for scientists to identify autism-associated mutations in noncoding regions of the genome.

"Protein-coding genes clearly play an important role in human disorders like autism, yet their expression is regulated by the 'noncoding' genome, which covers the remaining 98.5 percent of the genome and remains somewhat mysterious," said Carnegie Mellon's Kathryn Roeder, corresponding author and UPMC Professor of Statistics and Life Sciences in the Statistics and Data Science and Computational Biology departments. "Because the genome comprises 3 billion nucleotides, identifying which portions of the noncoding genome, when mutated, enhance the risk of autism is as challenging as looking for a needle in a haystack."

Using a novel bioinformatics framework, the researchers were able to compress the search from billions of nucleotides to tens of thousands of functional categories that potentially contribute to autism. Working with these categories, they used machine learning tools to build statistical models of autism risk from a subset of the families in the study. They then applied these models to an independent set of families and successfully predicted patterns of risk in the noncoding genome.