# Boolean Tensor Factorization

## Defining SPARQL with Boolean Tensor Operations

The Resource Description Framework (RDF) represents information in the form of subject–predicate–object triples. We can represent such RDF data using 3-way binary tensors: An element $$(i,j,k)$$ of $$\tens{T}$$ is $$1$$ if and only if the respective subject-predicate-object triple $$(s_i,p_j,o_k)$$ is present in the RDF graph $$G$$.

An example for a simple SPARQL query that has a single triple pattern as basic graph pattern is SELECT * WHERE {?a $$G$$:p$$_j$$ $$G$$:o$$_k$$}. The keyword SELECT acts as a projection operator. It identifies the variables to appear in the query result. It matches all RDF triples of $$G$$ that have predicate p$$_j$$ and object o$$_k$$. If the RDF data is represented as a binary 3-way tensor $$\tens{T}$$, the triple pattern selects the fiber $$t_{:jk}$$. That is a vector of all subjects with predicate $$j$$ and object $$k$$. This vector has a $$1$$ at positions $$i$$ that correspond to an RDF triple $$(s_i,p_j,o_k)$$ present in the RDF graph $$G$$. A slice $$T_{:j:}$$ of $$\tens{T}$$ would be selected if only one mode was fixed by the query as in SELECT * WHERE {?a $$G$$:p$$_j$$ ?b}.

It turns out that most SPARQL operations, especially the joins, can be expressed using the Khatri–Rao product . Thus, insights on the computation of the Khatri–Rao can lead to direct real-world benefits. This re-interpretation also allows us to use techniques more familiar in relational data bases natively with RDF data.

## References

1. Join Size Estimation on Boolean Tensors of RDF Data. WWW 2015 Companion Volume, , 7778.
10.1145/2740908.2742738
[manuscript | tech. rep. | pdf (WWW) | pdf (ACM)]
2. On Defining SPARQL with Boolean Tensor Algebra. arXiv:1503.00301 [cs.DB]
[pdf (arXiv)]