NSF Awards $9 Million to Researchers Using Language Technology Tools to Better Understand Structure, Function of Proteins in Human Cells
Carnegie Mellon News Online Edition



The National Science Foundation has awarded a $9 million, five-year grant to a collaboration of researchers from Carnegie Mellon, the University of Pittsburgh, the Massachusetts Institute of Technology, Boston University and the National Research Council of Canada to advance a new field called Computational Biolinguistics.

Computational Biolinguistics, which combines computational tools such as statistical language modeling, machine learning methods and high-level language processing, will allow scientists to better understand how proteins work inside cells.

Just as sequences of letters in a language fall into patterns that make them understandable, sequences of amino acids in proteins can be read to understand the proteins' structure, dynamics and function. Sequences of amino acids and their constituents can be thought of as syllables or words that have particular properties.
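To make the analogy concrete, the short Python sketch below treats a protein sequence as text and counts its overlapping "words" (n-grams of amino acids), the basic statistic behind many statistical language models. It is only an illustration of the idea, not the project's method: the function name `amino_acid_ngrams` and the toy sequence are hypothetical and do not come from the research described here.

```python
from collections import Counter


def amino_acid_ngrams(sequence: str, n: int = 3) -> Counter:
    """Count overlapping n-grams in a sequence of one-letter amino acid codes."""
    seq = sequence.upper()
    # Slide a window of width n across the sequence, one residue at a time.
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


# Hypothetical toy sequence for illustration only, not a real protein.
toy_sequence = "MKTLLVAGMKTLLA"
counts = amino_acid_ngrams(toy_sequence, n=3)

# Show the most frequent three-residue "words" in the toy sequence.
for ngram, count in counts.most_common(3):
    print(ngram, count)
```

Repeated n-grams play the role of frequently used words; a real analysis would compare such counts across many sequences rather than within a single short one.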

A deeper understanding of the relationship between protein structure, dynamics and function can help to extract information hidden in the gene sequences of genomes, which may, in turn, help develop drugs to fight disease. Today, there is great societal demand to understand and treat degenerative diseases, many of which are based on defective triggers for protein shape and interactions.

The project's principal investigators are Raj Reddy, Carnegie Mellon's Herbert A. Simon University Professor of Computer Science and Robotics, and Judith Klein-Seetharaman, assistant professor of pharmacology at the University of Pittsburgh Medical School, who also holds an appointment at Carnegie Mellon's Language Technologies Institute (LTI).

"The Human Genome Project and related genome sequencing efforts have provided a wealth of data, which has stirred great hopes for increasing our understanding and treating of disease or for mimicking nature's inventions in nanomachine design," said Klein-Seetharaman. "But the precise relationship between a primary sequence and the structure, dynamics and function of the encoded proteins is one of the most fundamental unanswered questions in biology.

"The Computational Biolinguistics Project promises to provide novel views and approaches to solving these challenges that would not be obvious without thinking in terms of the analogy between language and biology."

The team will apply computational tools and methods originally developed for the statistical analysis of human language to better understand the function of proteins in human cells and in those of other organisms.

Carnegie Mellon will be the central site for the Computational Biolinguistics Project. Its scientists will supply all of the necessary computational and language modeling technologies. Other partners will provide the bulk of biological and proteomic research and the laboratories where experimental work will take place.

There is also an industrial component to the project. The MathWorks, Inc., of Natick, Mass., will work with Carnegie Mellon scientists to enhance its MATLAB mathematical software to better support computational biolinguistics research. Medstory, Inc., of Burlingame, Calif., which deals with drug innovation informatics, will focus on the clinical and drug development relevance of computational discoveries made under this program.

Reddy and Klein-Seetharaman, together with LTI Director Jaime Carbonell, the Allen Newell Professor of Computer Science, and LTI associate professors Ronald Rosenfeld and Yiming Yang, have been doing preliminary work in computational biolinguistics for nearly two years.

The Computational Biolinguistics grant is one of more than 300 announced by the National Science Foundation as part of its Information Technology Research (ITR) program. This year, NSF awarded a total of $144 million in new grants under the program.


Anne Watzman
(09/26/02)

