This is the source code for Saskia Metzler, Stephan Guennemann, and Pauli Miettinen: Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques. In Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), 2016.

The code is provided as-is and is free to use for scientific purposes. If you do use the code, please cite the above paper.

Before you can start using this MATLAB code, there are a few things to do and to consider. Any commands given here are just examples; the actual paths, versions, and file names on your machine may differ.

--------Environment setup----------

Compile the *.c files with your mex compiler. Remember to enable OpenMP, as the code is partially parallelised:

  mex CFLAGS="\$CFLAGS -std=c99 -fopenmp" LDFLAGS="\$LDFLAGS -lgomp" ... getElementsBelowHGammaParallel.c

If MATLAB complains at runtime that it cannot link any more libraries when you want to use the compiled functions, start it with OpenMP preloaded:

  LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/4.9.2/libgomp.so matlab

--------Data sets-----------

The datasets we used are from the Stanford Large Network Dataset Collection and the University of Florida Sparse Matrix Collection; you need to download your own copies. For the former, you need the graph and the community information as *.txt files. For the latter, the data in Matrix Market format is most convenient.

The code is written for the datasets discussed in the experiments and expects them in a folder called "data" on the same level as the folder containing the code, so using other data will require edits at the respective places in the code.

We provide a function "readCommunities" to create *.mat files from the Stanford data. The script "readDataSets" illustrates how to use this function.

--------Running experiments-----------

A good starting point for understanding how the experiments are done is the script "testImproveLLForMultipleCommunities". In the part for the non-synthetic data, it iterates over the different datasets and essentially calls the function "improveModel" with some initial naive model for each community.

"improveModel" is the central part of the algorithm. Given a graph and its communities (note that the communities are given as a struct array), it computes a good model and outputs the respective parameters for every community as well as the order in which the communities need to be considered.

To assess how good a model returned by "improveModel" is, the function "computeLLOfGraph" computes the respective global log-likelihood (LL).

To compare against HyCom models and block models, the script "testHyComModelsForRealWorldData" uses "computeLLOfGraph" and then performs the LL ratio test to compare the LLs. Of course, the HyCom models have to be created prior to this computation; the script "testImproveLLForMultipleCommunitiesUnderHyComModel" shows how this is done.

Existing community detection algorithms can be used in conjunction with our modelling approach. The scripts "tryBMF", "tryHyCom", and "trySpectCLust" show how this can be done. Note that only the spectral clustering code is part of this code package (function "spectclust"); implementations of the other algorithms are provided by the respective authors we cite.

-------Displaying communities--------

We provide a function "plotCommunity" that takes the struct of a community and makes the kind of plots you see in the paper.
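For instance, a minimal end-to-end session might look as follows. This is an illustrative sketch only: the data file, variable names, and exact argument lists are placeholders, so check the respective function headers for the actual interfaces.

  % Hypothetical data file and variable names; adapt to your setup.
  load('../data/example.mat', 'A', 'communities');  % adjacency matrix and community struct array

  % Fit the model: per-community parameters and the order in which
  % the communities are to be considered (output names are
  % placeholders; see the header of improveModel).
  [model, order] = improveModel(A, communities);

  % Assess the fit via the global log-likelihood.
  ll = computeLLOfGraph(A, model);

  % Plot one community as in the paper.
  plotCommunity(communities(1));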
The distributions of the parameters H, gamma, and x for a model of a graph can be visualised using the scripts "plotHGammaDist" and "plotXSigmaDist".
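As these are scripts rather than functions, they presumably expect the model variables to be present in the workspace. A hypothetical sketch (the variable names the scripts look for may differ; check the script headers):

  % Hypothetical: run after fitting a model so that the variables the
  % scripts expect exist in the workspace (see the script headers for
  % the exact names they use).
  [model, order] = improveModel(A, communities);
  plotHGammaDist   % distribution of the fitted H and gamma parameters
  plotXSigmaDist   % distribution of the fitted x and sigma parameters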