Book

This is a page about the book published by Atlantis Press (Springer), which is based on my PhD thesis.

You can buy it here!

9789462390966

Preface

The development of algorithms for enumerating all possible solutions of a specific combinatorial problem has a long history, which dates back to, at least, the sixties, when the problem of enumerating some specific graph-theoretic structures (such shortest paths and cycles) has been attacked. As already observed by David Eppstein in 1997, these enumeration problems have several applications, such as (1) looking for structures which satisfy some additional constraints which are hard to optimize, (2) evaluating the quality of a model for a specific problem, in terms of the number of incorrect structures, (3) computing how sensitive the structures are to variation of some problem’s parameters, and (4) examining not just the optimal structures, but a larger class of structures, to gain a better understanding of the problem. As a matter of fact, in the last fifty years a large variety of enumeration problems have been considered in the literature, ranging from geometry problems to graph and hypergraph problems, from order and permutation problems to logic problems, and from set problems to string problems. A very recent compendium has been compiled by Kunihiro Wasa, which includes 350 combinatorial problems and more than 230 references. Nevertheless, the research area of enumeration algorithms is still very active and still includes many interesting open problems. This is where this book comes into play, by first presenting an overview of the main computational issues related to the design and the analysis of enumeration algorithms, and by then contributing to this research area with several significant results, both theoretical and experimental.

Even though the emphasis of the book is on enumeration problems, it is worth noting that the original main application area of the thesis of Andrea Marino has been computational biology. Indeed, in the last years, biologists have accumulated a huge amount of information, at different levels of observation, from the molecular level to the population one. This information usually describes interactions or relationships among entities of biological nature, and they are often represented by means of networks (or, equivalently, graphs). Graphs allow researchers to abstract from the specific individual information: the complexity of a biological entity is enclosed into a vertex of the network and the complex interaction mechanisms between two entities are simply described by means of an arc. Clearly, the biological application determines the meaning of the nodes and of the arcs and influences the network topology: typical networks at the molecular level are gene regulation networks, protein interaction networks, and metabolic networks, while typical networks at the macroscopic level are, instead, phylogenetic networks and ecological networks. Reducing problems arising in biology to the analysis of networks allows us to take advantage of the many results and algorithmic techniques that have been developed in graph theory and, more recently, in the analysis of complex networks. In other words, the observation of biological phenomena is turned into the observation of the network, of its structure, and of its properties. The network becomes a tool to investigate the macromolecular interactions at the level of genes, metabolites and proteins to extract the cellular phenotypes, or the conglomerate of several cellular processes resulting from the expression of the genes and of the proteins.

The main goal of the present book is the application of algorithm design and complexity analysis techniques to the analysis of biological (and, more in general, of complex) networks, by focusing mainly on topological property computation and subnetwork extraction tasks. Several quantifiable tools of network theory offer unforeseen possibilities to understand biological network organization and evolution. Some well-known examples of these tools are measures like the degree distribution, the diameter (that is, the longest shortest path), and the clustering coefficient. These topological properties of biological networks can be seen as the result of a network evolution process: hence, one can formulate evolving network models for biological networks which produce networks consistent with the above topological properties. This implies that efficient algorithms have to be designed in order to compute these properties in a very little amount of time and (maybe more importantly) of space (note that, sometimes, even polynomial-time/space algorithms might turn out to be too expensive if a massive experimentation has to be done and/or if the size of the network is quite large). For what concern the second task, that is, subnetwork extraction, observe that, in general terms, this task consists in extracting a subgraph that best explains the relationships between a given set of nodes of interest in a graph. A typical example in communication networks of such a problem is the Steiner tree problem which consists in finding the lightest tree connecting a specific subset of vertices of the network. Subnetwork extraction is a common tool while studying biological networks: for example, in 2010, Faust et al. investigate six different approaches, all based on subnetwork extraction, to extract relevant pathways from metabolic networks. One of the main issues with the subgraph extraction approach is to determine the kind of subgraph to be extracted, which clearly has to be meaningful from a biological point of view. After that, even in this case it turns out that most of the times the extraction of desired subgraphs is a computationally difficult problem. Finally, as it is common in the bioinformatics research area, finding one subgraph is not usually enough: no clear optimization criterium is usually known, so that the problem becomes even more difficult since it requires to enumerate all possible subgraphs.

All the enumeration problems attacked in this book arise from real-world applications, either in the specific field of computational biology or in the more general field of complex networks. Some of these problems (such as the enumeration of cycles and the enumeration of diametral vertices) were already well-known and widely studied. Others (such as the enumeration of bubbles and the enumeration of stories) are closely related to previously known problems (such as the enumeration of cycles and the enumeration of feedback vertex sets). For all these problems, efficient algorithms and/or heuristics are proposed in order to deal with them: all these algorithms have been implemented and experimented, thus validating their usefulness in solving the original application problem. Indeed, one of the main beautiful characteristics of this book is the fact that it combines deep theoretical results and practical implementation and experimentation: thus significantly contributing both to the solution of (biological) very interesting practical questions and to the field of theoretical computer science. In particular, I would like to emphasize one of the most impressive theoretical results contained in this book, that is, the first optimal algorithm for enumerating cycles in an undirected graph. This result significantly improves the solution of a 40 year old problem! And I would also like to emphasize one of the most impressive experimental results contained in this book: that is, the design, analysis, and implementation of a new very efficient heuristic for enumerating diametral vertices in a graph. By making use of these new heuristic, for example, the diameter of a snapshot of a subgraph of the Facebook network, that contained approximately 150 millions of vertices and almost 16 billions of edges, has been computed in just twenty minutes (for the sake of curiosity, the diameter value is 41)!

In summary, I think that this book is a very cute example of how theory and practice should proceed together, by exploiting the “virtuous circle” in which practical problems (in this case, mostly biological ones) motivate significant and deep contributions to theoretical computer science, which in turn allow efficient, useful and practical solutions of the original problems.

Florence, January 2015

Pierluigi Crescenzi