In order to test community detection algorithms, and graph mining algorithms in general, it is important that we can generate realistic random graphs. For community detection in particular, the generative process should be such that we know what the generated communities are; the so-called ground truth.

Some well-known graph generators with known community structure are the stochastic block model (SBM), its degree-corrected version (DC-SBM), the R-MAT model, and the Lancichinetti–Fortunato–Radicchi (LFR) model. None of these models and generators can generate graphs with communities that have hyperbolic structure, though.

In [1], we presented a random graph generator that is capable of generating hyperbolic communities. The communities are non-overlapping, similarly to SBM. Their sizes and shapes (namely, the size of the core \(\gamma\) and the thickness of the tail \(H\)) are drawn from distributions that are either learned from a given (real-world) graph, or defined by the user. In addition, some noise is applied to remove edges within communities and to add edges between them (again, the amount of noise is either learned or given by the user).

Our model [1] preserves many properties of the original graph. For example, the degree distribution and local and global clustering coefficients are preserved up to the effects of the noise. The resulting graphs also naturally preserve the shapes of the communities, while still generating highly random graphs.

Source code

The generator HyGen is implemented with Matlab, and has been tested with Matlab 2016b on Linux. See the README file in the package for more information.

The software is given free of charge for academic use. If you use either package on your own work, you must cite [1].

References

  1. Saskia Metzler Pauli Miettinen Random Graph Generators for Hyperbolic Community Structures. Proc. 7th International Conference on Complex Networks and Their Applications, , 680693.
    10.1007/978-3-030-05411-3_54
    [pdf (Springer) | manuscript | source code]