This is the source code for Saskia Metzler, Stephan Guennemann, and Pauli Miettinen: Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques. In Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), 2016.

The code is provided as-is and is free to use for scientific purposes. If you do use the code, please cite the above paper.

Before you can start using this MATLAB code, there are a few things to do and to consider. Any commands given here are just examples; the actual paths, versions, and file names on your machine may differ.

--------Environment setup----------

Compile the *.c files with your mex compiler. Remember to enable OpenMP, as the code is partially parallelised:

  mex CFLAGS="\$CFLAGS -std=c99 -fopenmp" LDFLAGS="\$LDFLAGS -lgomp" ... getElementsBelowHGammaParallel.c

If MATLAB complains at runtime that it cannot link any more libraries when you want to use the compiled functions, start it with OpenMP preloaded:

  LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/4.9.2/libgomp.so matlab

--------Data sets-----------

The datasets we used are from the Stanford Large Network Dataset Collection and the University of Florida Sparse Matrix Collection; you need to download your own copies. For the former, you need the graph and the community information as *.txt files. For the latter, the data in Matrix Market format is most convenient.

The code is written for the datasets discussed in the experiments and expects them in a folder called "data" on the same level as the folder containing the code, so using other data will require edits at the respective places in the code.

We provide a function "readCommunities" to create *.mat files from the Stanford data. The script "readDataSets" illustrates how to use this function.

--------Running experiments-----------

A good starting point for understanding how the experiments are done is the script "testImproveLLForMultipleCommunities". In the part for the non-synthetic data, it iterates over the different datasets and essentially calls the function "improveModel" with some initial naive model for each community.

"improveModel" is the central part of the algorithm. Given a graph and its communities (note that the communities are given as a struct array), it computes a good model and outputs the respective parameters for every community as well as the order in which the communities need to be considered.

To assess how good a model returned by "improveModel" is, the function "computeLLOfGraph" computes the respective global log-likelihood (LL).

To compare against HyCom models and block models, the script "testHyComModelsForRealWorldData" uses "computeLLOfGraph" and then performs the LL ratio test to compare the LLs. Of course, the HyCom models have to be created prior to this computation; the script "testImproveLLForMultipleCommunitiesUnderHyComModel" shows how this is done.

Existing community detection algorithms can be used in conjunction with our modelling approach. The scripts "tryBMF", "tryHyCom", and "trySpectCLust" show how this can be done. Note that only the spectral clustering code is part of this code package (function "spectclust"); implementations of the other algorithms are provided by the respective authors we cite.

-------Displaying communities--------

We provide a function "plotCommunity" that takes the struct of a community and makes the kind of plots you see in the paper.
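For instance, a minimal end-to-end session might look as follows. This is an illustrative sketch only: the data file, variable names, and exact argument lists are placeholders, so check the respective function headers for the actual interfaces.

  % Hypothetical data file and variable names; adapt to your setup.
  load('../data/example.mat', 'A', 'communities');  % adjacency matrix and community struct array

  % Fit the model: per-community parameters and the order in which
  % the communities are to be considered (output names are
  % placeholders; see the header of improveModel).
  [model, order] = improveModel(A, communities);

  % Assess the fit via the global log-likelihood.
  ll = computeLLOfGraph(A, model);

  % Plot one community as in the paper.
  plotCommunity(communities(1));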
The distributions of the parameters H, gamma, and x for a model of a graph can be visualised using the scripts "plotHGammaDist" and "plotXSigmaDist".
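As these are scripts rather than functions, they presumably expect the model variables to be present in the workspace. A hypothetical sketch (the variable names the scripts look for may differ; check the script headers):

  % Hypothetical: run after fitting a model so that the variables the
  % scripts expect exist in the workspace (see the script headers for
  % the exact names they use).
  [model, order] = improveModel(A, communities);
  plotHGammaDist   % distribution of the fitted H and gamma parameters
  plotXSigmaDist   % distribution of the fitted x and sigma parameters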