In traditional community detection approaches, a community is modelled as a (quasi-) clique, that is, each node in the community is assumed to be equally likely to be connected to every other node. An adjacency matrix of such graph can be seen in Figure 1.

Adjacency matrix of a community where the 1s (edges) are approximately equally distributed.
Figure 1. Adjacency matrix of an example community from the Youtube data.

Many communities, however, have more structure in them. In particular, if we order the rows and columns of the adjacency matrix in the degree order, we can see that the matrix has a clear structure: most edges (orange dots) apper under the green line in Figure 2.

Adjacency matrix of a community ordered in the degree order, with a green line separating the dense area from the sparse one
Figure 2. Adjacency matrix of the community from Figure 1 after re-ordering the rows and columns.

Figure 2 also shows a green line which is our hyperbolic community model for this community. In our model, only the edges under the green line are considered equally (and highly) likely, while the edges above it are unlikely, and not part of the community.

In [1], we developed three models for non-clique communities. The first is based on the hyperbolic equation: Let \(i,j\in\{0,1,\ldots,N-1\}\) be two nodes in \(N\)-node community. Then there is an edge (with high probability) if \((i+p)(j+p)\leq \theta\) for some \(p\) and \(\theta\), that are the community's parameters. In the second model, we define the curve by fixing two points on it, namely, were it crosses the point \(i=j\) and where it crosses the point \(i=N-1\). Our third model is a mixture model; a slightly simplified version of its inequality is \((1-x)(i\cdot j) + x(i + j) \leq \Sigma\), where \(x\) and \(\Sigma\) are the parameters.

Importantly, we show in [1] that the these three models are equivalent: given one, we can always find parameters for the other two that define exactly the same community. We also show how these different models allow for different interpretations of the community. And naturally, we also develop algorithms that are capable of finding good parameters for all of the communities in a graph, even when the communities overlap.

Source code

The algorithms are implemented with Matlab and C, and have been tested with Matlab 2015b on Linux. See the README file in the package for more information.

The software is given free of charge for academic use. If you use either package on your own work, you must cite [1].

References

  1. Saskia Metzler Stephan Günnemann Pauli Miettinen Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques. Proc. 2016 IEEE International Conference on Data Mining (ICDM '16), 2016, 330339.
    10.1109/ICDM.2016.0044
    [tech. rep. | manuscript | pdf (IEEE) | source code]
  2. Saskia Metzler Stephan Günnemann Pauli Miettinen Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques. arXiv:1602:04650 [cs.SI] .
    [pdf (arXiv)]