Boolean Tensor Factorization

Introduction

A tensor is a multi-dimensional (or multi-way) array. For example, a vector is a 1-way tensor and a matrix is a 2-way tensor. A Boolean tensor is a tensor that assumes binary values (0 and 1), and that is endowed with the Boolean algebra, that is, the semi-ring over \(\{0,1\}\) with the multiplication defined as usual and the addition defined as \(1+1=1\).

Boolean tensor factorization [1] factorizes a Boolean tensor into factor matrices and potential core tensor. It is the obvious generalization of Boolean matrix factorization to tensors. The two most prominent types of tensor factorization are the Tucker decomposition and the CP decomposition, also abbreviated as CANDECOMP or PARAFAC or CPD. Tucker3 decomposition represents a 3-way tensor \(\tens{T}\) using a core tensor \(\tens{G}\) and factor matrices \(A\), \(B\), and \(C\). CP decomposition represents the same tensor using just the factor matrices \(A\), \(B\), and \(C\). Figure 1 gives a schematic view on how these factorizations look like.

Schematic view of the Tucker3 decomposition Schematic view of the CP decomposition
Figure 1. Illustration of Tucker3 (above) and CP (below) tensor factorizations.

The element-wise definition of the Boolean CP decomposition of three-way tensor \(\tens{T}\) into factor matrices \(A\), \(B\), and \(C\) is \begin{equation} \label{eq:bcp} \tens{T}_{ijk} = \bigvee_{r=1}^R A_{ir}B_{jr}C_{kr}\; . \end{equation} That is, in the CP decomposition (Boolean or not), the number of the columns in the factor matrices must agree. This limitation does not hold true for the Boolean Tucker3, where the element-wise equation is \begin{equation} \label{eq:btucker} \tens{T}_{ijk} = \bigvee_{p=1}^P\bigvee_{r=1}^R\bigvee_{q=1}^Q \tens{G}_{prq}A_{ip}B_{jr}C_{kq}\; . \end{equation}

The Boolean tensor rank is defined as the minimum \(R\) for witch the equality \eqref{eq:bcp} holds. Like normal tensor rank, also the Boolean tensor rank is NP-hard to compute, and as a consequence, the minimum-error fixed-rank Boolean CP decomposition is impossible to approximate to within any polynomially computable factor [1].

Despite the complexity, heuristic algorithms for Boolean tensor decomposition exist. If the factorization is restricted, for instance, so that one of the factor matrices must have at most one non-zero in each row, the resulting Boolean tensor clustering problem can be approximated well. On the other hand, the Walk'n'Merge algorithm is an example of a heuristic algorithm for general Boolean CP and Tucker3 decompositions, tailored for sparse tensors [2, 4, 7]. It has been applied to open information extraction, especially to the task of finding facts from a collection of surface subject–predicate–object triples.

Boolean tensors are a natural way to express subject–predicate–object triples, and indeed, one can express the SPARQL queries as Boolean matrix and tensor operations.

Further reading

More information on the Walk'n'Merge algorithm, its applications to information extraction, on tensor clustering, and on defining SPARQL using Boolean tensors can be found on their respective pages. For general information on (non-Boolean) tensor factorizations, the survey of Kolda & Bader [2009] is highly recommendable.

References

  1. Pauli Miettinen Boolean Tensor Factorizations. Proc. 11th IEEE International Conference on Data Mining (ICDM2011), 447456.
    10.1109/ICDM.2011.28
    [pdf (IEEE) | manuscript | slides]
  2. Dóra Erdős Pauli Miettinen Walk'n'Merge: A Scalable Algorithm for Boolean Tensor Factorization. Proc. 13th IEEE International Conference on Data Mining (ICDM'13), , 10371042.
    10.1109/ICDM.2013.141
    [pdf (IEEE) | manuscript | tech. rep. | source code]
  3. Dóra Erdős Pauli Miettinen Discovering Facts with Boolean Tensor Tucker Decomposition. Proc. 2013 ACM International Conference on Infortmation and Knowledge Management (CIKM '13), , 15961572.
    10.1145/2505515.2507846
    [pdf (ACM) | manuscript | data]
  4. Dóra Erdős Pauli Miettinen Scalable Boolean Tensor Factorizations using Random Walks. arXiv:1310.4843 [cs.DS] .
    [pdf (arXiv)]
  5. Saskia Metzler Pauli Miettinen Clustering Boolean Tensors. Data Mining and Knowledge Discovery 29(5), , 13431373.
    10.1007/s10618-015-0420-3
    [tech. rep. | manuscript | pdf (Springer) | source code]
  6. Saskia Metzler Pauli Miettinen Clustering Boolean Tensors. arXiv:1501.00696 [cs.NA] .
    [pdf (arXiv)]
  7. Helge Dombrowski Boolean tensor decomposition based on the Walk'n'Merge algorithm. Universität des Saarlandes (M.Sc. thesis).
    [pdf]