2000 Merck Participants-Department of Biological Sciences - Carnegie Mellon University

2000 Merck-supported Participants

Jennifer Lin, Carnegie Mellon University
(Advisor: Dr. Robert F. Murphy)

Alternative Methods of Typical Image Selection and Image Database Development
As massive gene sequencing projects such as the Human Genome Project come to completion, the next challenge will be to characterize individual genes, their protein products, and their functions. Given the large number of proteins to analyze, the information gathered must be quantitative so that the process can be automated and systematic. One method of studying protein function is fluorescence microscopy. However, different fluorescence microscopes store images in different formats, and most formats include little information about image acquisition and sample preparation, which makes it very difficult to retrieve images. The tools that aid in interpreting collections of images are also very limited. A possible solution is an image database that systematically stores, sorts, and retrieves images for further analysis. Such a database would contain not only the fluorescence images themselves but also relevant contextual information. Data input would need to be convenient and automatic enough to allow large-scale data entry. Query methods must be flexible to make analysis and data retrieval as effective as possible; text-based query and Query by Image Content (QBIC) would be ideal. Incorporating our established typicality ranking method (TypIC) into the query process would also offer unique advantages over a typical database, which would be limited to recalling images blindly. Representative image selection would allow this database to summarize the general characteristics of a large set of images with just a few. Typicality methods may also help with data mining efforts by choosing representative images from a large collection of poorly understood data.
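
The idea of ranking images by typicality can be illustrated with a minimal sketch. The abstract does not describe how TypIC computes its ranking, so the code below is not that method: it assumes each image has already been reduced to a numeric feature vector and treats closeness to the per-feature mean (Euclidean distance) as a stand-in for typicality. The function name, image names, and feature values are all hypothetical.

```python
import math


def rank_by_typicality(features):
    """Return image names ordered from most to least typical.

    `features` maps an image name to a numeric feature vector.
    ASSUMPTION: typicality is approximated as closeness to the
    per-feature mean; the source does not specify the actual measure.
    """
    names = list(features)
    dims = len(features[names[0]])
    # Mean value of each feature across all images.
    mean = [sum(features[n][d] for n in names) / len(names) for d in range(dims)]

    def distance(name):
        return math.dist(features[name], mean)

    return sorted(names, key=distance)


# Hypothetical two-feature descriptions of four images.
images = {
    "img_a": [0.9, 0.1],
    "img_b": [1.0, 0.2],
    "img_c": [1.1, 0.3],
    "img_d": [3.0, 2.0],
}
ranking = rank_by_typicality(images)
print(ranking)
```

Taking the first few entries of such a ranking is one way a database could return a handful of representative images for a large query result.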

Anushka Nethisinghe, Carnegie Mellon University
(Advisor: Dr. Raul Valdes-Perez)

Analysis of Gene Expression Data
Many data mining techniques are used today to extract information from gene expression data. This research applies our own clustering algorithm and a revolutionary niche-finding algorithm to the analysis of gene expression data. The data were gathered from the Munich Information Center for Protein Sequences database for Saccharomyces cerevisiae, and the gene expression values were obtained from experiments conducted on yeast genes at MIT. The clustering program groups the genes by their attributes and outputs a hierarchical tree that lets the user easily visualize the genes and the groups to which they belong. The niche-finding method takes the same data and generates uniqueness statements about any gene. The results of these programs may reveal previously undiscovered properties of genes, individually or as a group, and so suggest further research possibilities.
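
To make the notion of a hierarchical tree of genes concrete, here is a minimal sketch of generic agglomerative clustering. The abstract does not specify the authors' clustering algorithm, so this is not their method: it assumes each gene is described by a short vector of expression values, uses Euclidean distance with average linkage, and represents the tree as nested tuples. The gene names and values are hypothetical.

```python
import math


def leaves(tree):
    """Return the gene names stored under a tree node."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def average_linkage(a, b, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    la, lb = leaves(a), leaves(b)
    total = sum(math.dist(profiles[x], profiles[y]) for x in la for y in lb)
    return total / (len(la) * len(lb))


def cluster(profiles):
    """Repeatedly merge the two closest clusters until one tree remains."""
    clusters = list(profiles)  # start with one cluster per gene
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: average_linkage(
            clusters[p[0]], clusters[p[1]], profiles))
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical expression profiles (three measurements per gene).
expression = {
    "gene1": [1.0, 1.1, 0.9],
    "gene2": [1.1, 1.0, 1.0],
    "gene3": [5.0, 5.2, 4.9],
    "gene4": [5.1, 5.0, 5.0],
}
tree = cluster(expression)
print(tree)
```

Each nested pair in the result is one branch of the tree, so genes with similar profiles end up under the same branch.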