Reductions are a standard way to study the computational complexity of problems. They allow us to impose a partial order on the problems based on the hardness of solving them. Reductions are not commonly used for many classical frequency-based pattern mining problems, though. Often, the argument is that “the output is potentially exponential” rendering standard polynomial reductions useless. But other models of complexity, such as counting complexity or enumeration complexity, can still be applied.

In this paper [1] we present two novel reductions, maximality-preserving reduction and maximality-preserving reductions for feasible frequent pattern problems. These allow us to study the complexity of maximal pattern mining problems in different domains, including labelled graphs, DAGs, trees, subsequences with no repeated symbols, and itemsets, to name a few. Extending the work of Kimelfeld and Kolaitis, we build a number of novel reductions between these problems (see Figure 1). Our results show that the complexity of these maximal pattern mining problems essentially collapses.

Reductions between various maximal pattern mining problems.
Figure 1. Reductions between various maximal pattern mining problems. MaxFS stands for maximal frequent subgraphs on labelled graphs, MaxSQS for maximal frequent subsequences with no repetitions, and MaxFIS for maximal frequent itemsets.

Furthermore, in [2] we show that these reductions can be used in practice to efficiently mine frequent subgraphs from labelled graphs using slightly modified Apriori algorithm.

Source code and sample data

The algorithms are implemented with Python. Use fsg2ffis.py --help for more information.

Sample data is provided together with the source code to try out the program. This is not the full data used in [2], but a small sample thereof.

The software is given free of charge for academic use. If you use the software on your own work, you must cite [1]. For the data, you must cite Stack Exchange as described here and [1] for data pre-processing.

References

  1. Stefan Neumann Pauli Miettinen Reductions for Frequency-Based Data Mining Problems. Proc. 2017 IEEE International Conference on Data Mining (ICDM '17), , 9971002.
    10.1109/ICDM.2017.128
    [manuscript | tech. rep. | source code | slides]
  2. Stefan Neumann Pauli Miettinen Reductions for Frequency-Based Data Mining Problems. arXiv:1709.00900 [cs.CC] .
    [pdf (arXiv) | source code]