2004 Merck Participants-Department of Biological Sciences - Carnegie Mellon University

2004 Merck-supported Participants

Leonard Apeltsin, Carnegie Mellon University
(Mentor: Dr. Russell Schwartz)

Testing the Ordinary Differential Equation Representation of Biochemical Reactions Through Lattice Monte Carlo Simulation of Linear Polymer Assembly
Traditional models of chemical reaction kinetics abstract chemical systems as ordinary differential equations. While such mathematical models have been effective for most chemical reactions, their validity for certain biochemical reactions inside the cell is open to question. They do not take into account the spatial interactions among the many proteins present in the cell at high concentrations. Hypothesizing that at high concentrations the rate constants used in these models would no longer remain constant, I created a new lattice Monte Carlo model of a simple biochemical system. This model accounted for the spatial interactions between reacting particles and could also simulate the presence of inert particles inside the cell. The biochemical system I chose to focus on was linear polymer assembly, an idealized representation of actin assembly. Because the behavior of this system is easily captured by traditional mathematical models, I could compare the behavior of my model with that of the traditional kinetic model to determine how well the traditional model fares under a variety of conditions. I ran my model while varying both the starting concentration of monomers and the concentration of inert particles in the system, and I also varied the mobility of certain particles. Using nonlinear least-squares regression, I fit the data generated from each simulated chemical system to its corresponding mathematical representation to obtain the rate of binding between the particles in the system. As predicted, the binding rates did not remain constant as particle concentrations increased. I also found that at high enough concentrations the mathematical models began to break down altogether and could no longer represent the dynamics of the system effectively.
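The fitting step can be illustrated with a minimal sketch. The abstract does not give the kinetic equations used, so the model here is an assumption: irreversible end-to-end assembly in which any two chains can join, so the chain count obeys dN/dt = -k N^2 with closed form N(t) = N0 / (1 + k N0 t). The function names are hypothetical, and a golden-section search over k stands in for a general nonlinear least-squares solver.

```python
import math


def chain_count(t, n0, k):
    """Closed-form solution of the assumed ODE dN/dt = -k * N**2 with N(0) = n0."""
    return n0 / (1.0 + k * n0 * t)


def sum_squared_error(k, times, counts, n0):
    """Residual sum of squares between observed chain counts and the assumed ODE model."""
    return sum((c - chain_count(t, n0, k)) ** 2 for t, c in zip(times, counts))


def fit_rate_constant(times, counts, n0, lo=1e-6, hi=10.0, tol=1e-10):
    """Estimate k by golden-section search on the least-squares objective.

    Assumes the objective has a single minimum inside [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc = sum_squared_error(c, times, counts, n0)
    fd = sum_squared_error(d, times, counts, n0)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new interior point d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = sum_squared_error(c, times, counts, n0)
        else:
            # Minimum lies in [c, b]; the old d becomes the new interior point c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = sum_squared_error(d, times, counts, n0)
    return (a + b) / 2.0


if __name__ == "__main__":
    # Synthetic, noise-free trajectory generated with k = 0.5 (illustrative only).
    times = [0.1 * i for i in range(11)]
    counts = [chain_count(t, 100.0, 0.5) for t in times]
    print(fit_rate_constant(times, counts, 100.0))
```

In a study like the one described, the same fit would be repeated for trajectories simulated at each monomer and crowder concentration; a fitted k that drifts with concentration, or a fit whose residuals stay large, is the signature of the ODE description failing.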

Gowtham Mahalingam, Carnegie Mellon University
(Mentor: Dr. Gordon Rule)

Assignment of Amino Acids to intra-residue NMR spectra such as TOCSY, in order to optimize the functionality of the protein assignment process in the MONTE software package
The assignment of nuclear magnetic resonances is currently a critical and widely used step in determining the structure of proteins. Because structure determination is so important to the study of proteins, numerous methods have been developed that use different sets of experimental data to obtain the sequence and structure of a protein. However, these methods can be time-consuming or difficult to perform, can yield erroneous or ambiguous data, or can characterize the protein only under a very limited set of conditions. NMR offers several advantages over other traditional structure-determination methods and can often be used where they have failed. It can provide high-resolution data and can be used to determine the kinetic parameters and ligand-binding properties of proteins. As a result, NMR assignment has become a useful method for obtaining the collective set of spin states associated with a protein. MONTE uses NMR data about this set of spin states and (semi-)randomly associates them with protein residues to yield candidate assignments. Each candidate assignment is scored, and the best-fitting one is taken as the most likely assignment of spin states to the residues of the protein. However, other rich sets of NMR data, such as TOCSY, can aid the intra-residue assignment of spin systems. Allowing MONTE to use TOCSY data will improve its ability to determine residue assignments unambiguously and will increase the accuracy and reliability of the resulting assignments. Furthermore, while it is not difficult to assign a protein given a rich set of experimental data, when data are limited or incomplete it becomes necessary to look elsewhere for additional or corroborating data. Using TOCSY data will make MONTE more efficient and more capable of assigning proteins.

The proposed TOCSY functionality was added to MONTE by building a classification schema that matches the spin system of a particular residue to a candidate amino acid. The fitness of a candidate assignment was measured as the total deviation from the expected values for that amino acid's spin system. The assignment functionality has so far been tested only on artificial data. Real data are often imperfect, and further testing and tuning of the matching process will be required to improve assignment of degenerate data sets.
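The fitness measure described above can be sketched as follows. MONTE's actual data structures and scoring are not described here, so everything below is an assumption: the shift table holds approximate random-coil 1H values for a few residues, simplified for illustration (for example, valine's two gamma methyls are merged into one peak), and the function names are hypothetical.

```python
import math

# Approximate random-coil 1H chemical shifts (ppm), simplified and illustrative only;
# a real implementation would use curated statistics rather than this small table.
EXPECTED_SHIFTS = {
    "GLY": [3.96],
    "ALA": [4.32, 1.39],
    "THR": [4.35, 4.22, 1.23],
    "VAL": [4.12, 2.08, 0.94],  # gamma methyls merged into a single peak here
}


def total_deviation(observed, expected):
    """Sum of absolute shift deviations after pairing peaks in sorted order.

    For equal-size sets of 1D values, sorted pairing minimizes the total absolute
    deviation. Spin systems with a different number of peaks are rejected."""
    if len(observed) != len(expected):
        return math.inf
    return sum(abs(o - e) for o, e in zip(sorted(observed), sorted(expected)))


def rank_candidates(observed, table=EXPECTED_SHIFTS):
    """Return (amino acid, total deviation) pairs, best match first."""
    scores = [(aa, total_deviation(observed, shifts)) for aa, shifts in table.items()]
    return sorted(scores, key=lambda item: item[1])
```

For example, a spin system with peaks at 0.95, 2.10 and 4.10 ppm scores about 0.05 against valine and about 2.65 against threonine, so valine ranks first. Degenerate real data would need tolerance for missing or overlapping peaks, which this sketch does not attempt.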

Andreas Pfenning, Carnegie Mellon University
(Mentors: Drs. Russell Schwartz and Alison Barth)

Computational Inference of Inducible Genes Related to Neural Plasticity
The electrical and pharmacological properties of neurons change in response to enhanced neural activity. Receptor and channel proteins involved in this change are up-regulated by transcription factors that bind to particular motifs in their promoters. The purpose of this project is to study activity-dependent changes in receptor and channel proteins by analyzing the transcription factor binding motifs in their promoters. We hypothesized that certain transcription factor binding motifs are statistically linked to plasticity-related genes. The first part of the project was to compile a list of brain-expressed genes to study, as well as a control set. Next, we analyzed the promoters in these data sets for the presence of specific transcription factor binding motifs linked to neural plasticity. The AP-1, CRE and Zif268 binding sites are regulated by transcription factors expressed during neurophysiological changes. Using this binding site information, we compiled a list of genes likely to be involved in neural plasticity. This list of candidate genes was further narrowed to receptor and channel proteins, which are both critical in brain development and amenable to testing for a role in neural plasticity. Only small differences in the presence of these transcription factor binding sites were found between the brain-expressed and the complete data sets. More significant was the difference in association between the binding sites: CREB and AP-1, as well as CREB and Zif268, were found together on the same promoter more often in the brain-expressed data set. This supports observations that the AP-1 and CRE transcription factors are often expressed in the same cells, and it suggests that they are involved in some of the same neural mechanisms. Further examination will, we hope, lead to a better understanding of how activity and learning alter the dynamic membrane properties of neurons.
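The abstract does not name the statistic used to compare motif co-occurrence. One standard option, shown here as an illustrative sketch rather than the project's method, is a one-sided Fisher exact test on a 2x2 table of promoter counts; the function and argument names are hypothetical.

```python
from math import comb


def cooccurrence_p_value(both, only_a, only_b, neither):
    """One-sided Fisher exact test for over-represented co-occurrence of two motifs.

    Arguments are promoter counts: promoters carrying both motifs, motif A only,
    motif B only, and neither. Returns the probability, under independence
    (hypergeometric null), of seeing at least `both` promoters with both motifs."""
    n = both + only_a + only_b + neither
    with_a = both + only_a
    with_b = both + only_b
    total = comb(n, with_a)
    upper = min(with_a, with_b)
    # Sum hypergeometric probabilities from the observed overlap up to the maximum possible.
    tail = sum(comb(with_b, k) * comb(n - with_b, with_a - k)
               for k in range(both, upper + 1))
    return tail / total
```

Run separately on the brain-expressed and complete promoter sets (for example, with A = CRE and B = AP-1), a smaller p-value in the brain-expressed set would correspond to the stronger association the abstract reports. With many motif pairs tested, the p-values would also need a multiple-testing correction.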

Marc Schaub, Carnegie Mellon University
(Mentor: Dr. Jeff Schneider)

Computational Analysis of Alternate Splicing in Different Types of Cancer
Alternative splicing is a major contributor to the complexity of the mammalian genome. Recent analyses have shown that 40-60% of human genes are alternatively spliced, and there is extensive evidence that alternative splicing plays a key role in several human diseases, including cancer. In the current work, we extended a previous study (Xu and Lee, 2003) that reported the existence of cancer-specific splice forms. In that study, the analysis was performed on combined data from all available tumor types. We instead looked at variations in splicing among different types of tumors using a large-scale computational approach. We took publicly available alternative splicing data based on aligned expressed sequence tags (ESTs) and grouped these by tissue type and histology. We then evaluated the significance of each alternative splicing event in order to identify splice forms that were more frequent in cancer in a specific tissue. We also extended the available database by grouping independent splice events found in the same EST and analyzed the frequency of these combinations in cancer and normal cells. We identified a statistically significant set of 34 splice forms that are specific to cancer in a given tissue (p < 0.001). However, our analysis also showed that the publicly available data are not sufficient to identify splice forms that are cancer-specific across different tissue types, because severe biases in the distribution of data among tissue types and in sample sizes tend to generate an excess of false positives. Our results suggest that cancer does involve modifications in splicing. However, the extent to which these modifications are a consequence of cellular changes induced by the tumor remains an open question. Furthermore, a causal relationship between changes in splicing and the mechanisms involved in tumorigenesis has yet to be established.
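The per-tissue significance test is not specified in the abstract. One plausible choice, sketched here purely as an assumption, is a one-sided binomial test. It asks whether a splice form is supported by more cancer-derived ESTs than expected, given the fraction of that tissue's ESTs that come from cancer libraries. The function and argument names are hypothetical.

```python
from math import comb


def cancer_enrichment_p_value(cancer_ests, total_ests, cancer_fraction):
    """One-sided binomial tail P(X >= cancer_ests), X ~ Binomial(total_ests, cancer_fraction).

    cancer_ests: ESTs supporting the splice form that come from cancer libraries.
    total_ests: all ESTs supporting the splice form in this tissue.
    cancer_fraction: share of this tissue's ESTs drawn from cancer libraries,
    used as the null expectation for any splice form."""
    p = cancer_fraction
    return sum(comb(total_ests, k) * p ** k * (1.0 - p) ** (total_ests - k)
               for k in range(cancer_ests, total_ests + 1))
```

For example, a splice form seen in 10 ESTs, all from cancer libraries, in a tissue whose ESTs are half cancer-derived gives p = 0.5^10, about 0.00098, just under a 0.001 threshold. Computing the null fraction per tissue partly accounts for differences in library composition between tissues. However, when thousands of splice forms are tested, small per-tissue sample sizes can still yield many false positives, which is consistent with the caution above.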

Ajay Surie, Carnegie Mellon University
(Mentor: Dr. Robert Murphy)

Building Generative Models for Images of Subcellular Localization Patterns
An important, evolving problem in proteomics is the accurate description of the location of proteins expressed in a given cell. Location patterns provide important information about the function of proteins and have other applications, such as distinguishing between healthy and diseased cells. Images of subcellular location patterns are currently described by various feature sets, and a newer approach decomposes patterns into the basic object types of which they are composed. Because a cell can contain a very large number of different proteins, an efficient way of modeling the distribution of different location patterns in a cell is needed. Such models will also provide useful ways of representing subcellular location data. The goal of this project is to create generative models that produce images representing the distribution of various protein location patterns in 2D images of HeLa cells. The models were initially developed by learning appropriate parameters from existing data for different image classes. These parameters and the overall model were then refined, based on feedback from the model's initial output, to improve the distribution of object positions in the generated images. To verify the results, a classifier trained on real images was used to classify both the real and the generated images, and the accuracies were compared. The results showed that the generated images represented the patterns learned from existing data fairly closely. Future models will allow less ambiguous descriptions of subcellular location patterns, as well as the representation of unknown patterns. The models will also be able to simulate changes in location patterns caused by changes in the internal or external environment, such as the movement of proteins or the presence of drugs.

Ruben Valas, Carnegie Mellon University
(Mentor: Dr. Christopher Langmead)

Data-Driven Prediction of Macromolecular Motions
Recent advances in Nuclear Magnetic Resonance (NMR) spectroscopy present new opportunities for investigating the conformational dynamics of proteins in solution. In particular, order parameters for motions relevant to biological function can be obtained via experimental measurement of Residual Dipolar Couplings (RDCs). These order parameters have been used by others to identify mobile regions within proteins. We extend these results and introduce the first technique for characterizing the nature of the motion (hinge, shear, etc.). Motion tensors are extracted from the RDCs and used to train a classifier for macromolecular motions. Using a set of 2,400 dynamic protein models spanning seven different classes of motion, our classifier achieves an accuracy of 91% using 10-fold cross-validation.
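Extracting motion tensors from RDCs is beyond a short example, but the evaluation protocol, 10-fold cross-validation, can be sketched. The classifier used in the project is not named here. A nearest-centroid classifier on generic feature vectors (for example, flattened motion tensors) stands in purely as an assumption, and all names are hypothetical.

```python
import random


def fit_centroids(features, labels):
    """Nearest-centroid training: average the feature vectors of each motion class."""
    sums, counts = {}, {}
    for x, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))


def k_fold_accuracy(features, labels, k=10, seed=0):
    """Fraction of samples classified correctly when each is held out exactly once."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        centroids = fit_centroids([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(centroids, features[i]) == labels[i] for i in fold)
    return correct / len(features)
```

Because every model is tested exactly once by a classifier that never saw it during training, the returned fraction is the cross-validated accuracy; the 91% figure above is a number of this kind, computed over the 2,400 models.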