Previous Work: Finding Conserved Protein Modules
A long-term goal of computational molecular biology is to extract, from large data sets, information about how proteins work together to carry out life processes at a cellular level. We are investigating protein-protein interaction (PPI) networks, in which the vertices are the proteins within a species and the edges indicate direct interactions between proteins. Our goal is to discover conserved protein modules: richly interacting sets of proteins whose patterns of interaction are conserved across two or more species. Such modules correspond to subgraphs of a PPI network that have many internal edges, relatively few external edges, and a structure that persists from one species to another. There are several existing software packages for finding conserved protein modules, and the algorithms developed in this project will be compared systematically with these packages. This research leads naturally to fundamental abstract problems in computational graph theory, such as the problem of partitioning a graph into vertex-disjoint subgraphs with a high density of edges, or the colorful subgraph problem, in which each vertex of a graph is assigned a color, and the goal is to find a small connected subgraph containing at least one vertex of each color. We will systematically derive and evaluate heuristic algorithms for these NP-hard problems.
Funding provided by NSF.