I am very grateful to the Department of Computer Science in Pisa, which hosted me for the last four years and gave me the opportunity to grow in such a nice environment.

]]>It has been a very nice event, full of interesting short presentations of past and current PhD students. As Luca and Alessio (which are respectively current and past PhD students) were not in Italy, I presented a short summary of our works about community detection.

]]>Here you can find the and the details of the competition.

]]>The first paper is a joint work with Alessio Conte, Tiziano De Matteis, Daniele De Sensi, Roberto Grossi, and Luca Versari, with title “D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes”.

This paper studies k-plexes, a well known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in today’s real-world graphs which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a well known hard problem. We present D2K, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs, and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of D2K on real graphs with up to half a billion edges.

The second paper is a joint work with Alessio Conte, Gaspare Ferraro, Roberto Grossi, Kunihiko Sadakane, and Takeaki Uno with title “Node Similarity with q -Grams for Real-World Labeled Networks”

]]>We study node similarity in labeled networks, using the label sequences found in paths of bounded length q leading to the nodes. (This recalls the q-grams employed in document resemblance, based on the Jaccard distance.) When applied to networks, the challenge is two-fold: the number of q-grams generated from labeled paths grows exponentially with q, and their frequency should be taken into account: this leads to a variation of the Jaccard index known as Bray-Curtis index for multisets. We describe nSimGram, a suite of fast algorithms for node similarity with q-grams, based on a novel blend of color coding, probabilistic counting, sketches, and string algorithms, where the universe of elements to sample is exponential. We provide experimental evidence that our measure is effective and our running times scale to deal with large real-world networks.

It has been a great conference. Below the abstract of our paper.

]]>We investigate a decomposition technique for listing problems in graphs and set systems. It is based on the Cartesian product of some iterators, which list the solutions of simpler problems. Our ideas applies to several problems, and we illustrate one of them in depth, namely, listing all minimum spanning trees of a weighted graph G. Here iterators over the spanning trees for unweighted graphs can be obtained by a suitable modification of the listing algorithm by [Shioura et al., SICOMP 1997], and the decomposition of G is obtained by suitably partitioning its edges according to their weights. By combining these iterators in a Cartesian product scheme that employs Gray coding, we give the first algorithm which lists all minimum spanning trees of G in constant delay, where the delay is the time elapsed between any two consecutive outputs. Our solution requires polynomial preprocessing time and uses polynomial space.

]]>Given a connected graph G of m edges and n vertices, we consider the basic problem of listing all the choices of k vertex-disjoint st-paths, for any two input vertices s,t of G and a positive integer k. Our algorithms take O(m) time per solution, using O(m) space and requiring O(F_k(G)) setup time, where F_k(G) = O(m min{k, n^{2/3} log n, sqrt{m} log n} ) is the cost of running a max-flow algorithm on~G to compute a flow of size k. The proposed techniques are simple and apply to other related listing problems as discussed in the paper.

]]>An orientation of an undirected graph is obtained by assigning a direction to each of its edges. It is called cyclic when a directed cycle appears, and acyclic otherwise. We study efficient algorithms for enumerating the orientations of an undirected graph. To get the full picture, we consider both the cases of acyclic and cyclic orientations, under some rules specifying which nodes are the sources (i.e. their incident edges are all directed outwards). Our enumeration algorithms use linear space and provide new bounds for the delay, which is the maximum elapsed time between the output of any two consecutively listed solutions. We obtain a delay of O(m) for acyclic orientations and Õ(m) for cyclic ones. When just a single source is specified, these delays become O(m⋅n) and O(m⋅h+h3), respectively, where h is the girth of the graph without the given source. When multiple sources are specified, the delays are the same as in the single source case.