Boolean Tensor Factorization

Walk'n'Merge: A Scalable Algorithm for Boolean Tensor Factorization

Finding the best Boolean tensor decomposition of a given rank is NP-hard optimization problem, and hence heuristic algorithms are often employed. Yet, developing efficient heuristic algorithms for Boolean tensor decompositions is hard. To address this problem, we developed an algorithm called Walk'n'Merge [1, 2] for scalable Boolean tensor decompositions.

The algorithm is based on the view of the Boolean CP decomposition as a (Boolean) sum of rank-1 (binary) tensors. Rank-1 binary tensors, on the other hand, correspond to monochromatic (all-1s) sub-tensors of the input tensor. As we do not aim at finding exact decompositions, the rank-1 binary tensors are also only approximate. Hence, as the first step, the Walk'n'Merge algorithm tries to find dense sub-tensors. This is done via re-started random walks in specific type of a graph called fibre graph. In particular, every non-zero element of the tensor is one node in the fibre graph, and two nodes are connected if (in case of 3-way tensors) they are on the same slice of the tensor (i.e. they share two of the three indices). In a fibre graph, every rank-1 (i.e. monochromatic) sub-tensor of the input corresponds to a subgraph with radius 3 and approximate rank-1 sub-tensors are subgraphs with somewhat higher radius. The re-started random walks can be used to find the small-radius subgraphs by studying the frequency with which the walks visit each node.

The random walks can return almost the same sub-tensor multiple times; on the other hand, random walks can easily miss the very smallest subgraphs. To counter these problems, Walk'n'Merge first mines all small (say, 2-by-2-by-2) monochromatic sub-tensors using an exhaustive search, and then tries to merge any pair of sub-tensors obtained in either of these two phases.

After the merging phase is over, the algorithm has generated a Boolean CP decomposition of the input tensor: the decomposition is the Boolean sum of the rank-1 sub-tensors found. If the desired final result is also Boolean CP decomposition, the algorithm can now order the sub-tensors based on how much they reduce the residual error, and return the top-\(k\) sub-tensors for rank-\(k\) decomposition. Alternatively, we propose to use the MDL principle to select the rank automatically.

The Walk'n'Merge algorithm can be also used to find Boolean Tucker3 decomposition. For that, we use MDL to merge different modes of the sub-tensors. The resulting decompositions can be used, e.g. to find facts from surface data.

Later updates [3] have made the Walk'n'Merge algorithm even faster by utilizing the sparsity of the fibre graph.

Source code

The source code for the improved Walk'n'Merge algorithm called Split'n'Expand is available below. Please cite [1] and [3] if you use the code.

References

  1. Dóra Erdős Pauli Miettinen Walk'n'Merge: A Scalable Algorithm for Boolean Tensor Factorization. Proc. 13th IEEE International Conference on Data Mining (ICDM'13), , 10371042.
    10.1109/ICDM.2013.141
    [pdf (IEEE) | manuscript | tech. rep. | source code]
  2. Dóra Erdős Pauli Miettinen Scalable Boolean Tensor Factorizations using Random Walks. arXiv:1310.4843 [cs.DS] .
    [pdf (arXiv)]
  3. Helge Dombrowski Boolean tensor decomposition based on the Walk'n'Merge algorithm. Universität des Saarlandes (M.Sc. thesis).
    [pdf]