NSF Awards $9 Million to Support Understanding of Proteins in Cells
Carnegie Mellon News Online Edition

Raj Reddy and Judith Klein-Seetharaman
The National Science Foundation has awarded a $9 million, five-year grant to a collaboration of researchers from Carnegie Mellon, the University of Pittsburgh, the Massachusetts Institute of Technology, Boston University and the National Research Council of Canada to advance a new field called Computational Biolinguistics.

Computational Biolinguistics, which draws on computational tools including statistical language modeling, machine learning methods and high-level language processing, will allow scientists to better understand how proteins work inside cells.

As in languages, where there are sequences of letters that fall into patterns that make them understandable, there are sequences of amino acids in proteins that can be read to understand their structure, dynamics and function. Sequences of amino acids and their constituents can be thought of as syllables or words that have particular properties.
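The language analogy can be made concrete with a short sketch. This is not the project's method. It only illustrates the general idea of statistical language modeling by treating overlapping runs of amino acids as "words" and estimating how often one residue follows another. The function names and the toy sequence are hypothetical and chosen for illustration.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count overlapping n-grams (amino-acid 'words') in a protein sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def bigram_probability(sequence, first, second):
    """Maximum-likelihood estimate of P(second | first) from a single sequence."""
    pairs = ngram_counts(sequence, 2)
    # Count only the occurrences of `first` that are followed by another residue.
    starts = Counter(sequence[:-1])
    if starts[first] == 0:
        return 0.0
    return pairs[first + second] / starts[first]


# Made-up toy sequence in one-letter amino-acid codes (assumption: not a real protein).
toy = "MKVLAAGLLKVLA"

print(ngram_counts(toy, 3).most_common(2))
print(bigram_probability(toy, "L", "A"))
```

In this toy sequence, L is followed by A in two of its four occurrences, so the estimate for that pair is 0.5. A real language model of proteins would be trained on many sequences and would smooth the counts rather than rely on raw frequencies.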

A deeper understanding of the relationship between protein structure, dynamics and function can help to extract information hidden in the gene sequences of genomes, which may, in turn, help develop drugs to fight disease. Today, there is great societal demand to understand and treat degenerative diseases, many of which are based on defective triggers for protein shape and interactions.

The project's principal investigators are Raj Reddy, Carnegie Mellon's Herbert A. Simon University Professor of Computer Science and Robotics, and Judith Klein-Seetharaman, assistant professor of pharmacology at the University of Pittsburgh Medical School, who also holds an appointment at Carnegie Mellon's Language Technologies Institute (LTI).

"The Human Genome Project and related genome sequencing efforts have provided a wealth of data, which has stirred great hopes for increasing our understanding and treating of disease or for mimicking nature's inventions in nanomachine design," said Klein-Seetharaman. "But the precise relationship between a primary sequence and the structure, dynamics and function of the encoded proteins is one of the most fundamental unanswered questions in biology.

"The Computational Biolinguistics Project promises to provide novel views and approaches to solving these challenges that would not be obvious without thinking in terms of the analogy between language and biology."

Carnegie Mellon will be the central site for the computational biolinguistics project. Its scientists will supply all of the necessary computational and language modeling technologies. Other partners will provide the bulk of biological and proteomic research and the laboratories where experimental work will take place.

There is also an industrial component to the project. The MathWorks, Inc., of Natick, Mass., will work with Carnegie Mellon scientists to enhance its MATLAB mathematical software to better support computational biolinguistics research. Medstory, Inc., of Burlingame, Calif., which deals with drug innovation informatics, will focus on the clinical and drug development relevance of computational discoveries made under this program.

The Computational Biolinguistics grant is one of more than 300 announced by the National Science Foundation as part of its Information Technology Research (ITR) program. This year, NSF awarded a total of $144 million in new grants under the program.

NSF Aids Million Book Project
The National Science Foundation's Information Technology Research program has also awarded a $3 million, three-year grant to the Million Book Project (MBP) to support digitization of core academic materials, technical reports, government documents and cultural treasures.

The project involves partners at Carnegie Mellon, Carnegie Library of Pittsburgh, Indiana University, the National Agricultural Library, OCLC, Penn State University, Stanford University, the University of California-Berkeley, the University of Washington, and 17 institutions in China and India. Principal investigators are Raj Reddy and university librarian Gloriana St. Clair.

The MBP will create a large testbed of academic resources of all types, in many languages, and make these materials available free for all to read on the Internet. The project is expected to be completed by 2007.

For more on the MBP, see www.library.cmu.edu/Libraries/MBP_FAQ.html

Anne Watzman
