
Automated Nonlinear Regression Modeling for HCI, Proc. CHI 2014


Antti Oulasvirta    
Max Planck Institute for Informatics     Saarland University    


Predictive models in HCI, such as models of user performance, are often expressed as multivariate nonlinear regressions. This approach has been preferred, because it is compact and allows scrutiny. However, existing modeling tools in HCI, along with the common statistical packages, are limited to predefined nonlinear models or support linear models only. To assist researchers in the task of identifying novel nonlinear models, we propose a stochastic local search method that constructs equations iteratively. Instead of predefining a model equation, the researcher defines constraints that guide the search process. Comparison of outputs to published baselines in HCI shows improvements in model fit in seven out of 11 cases. We present a few ways in which the method can help HCI researchers explore modeling problems. We conclude that the approach is particularly suitable for complex datasets that have many predictor variables.


PDF copy of the paper:

Automated Nonlinear Regression Modeling for HCI
Oulasvirta, A.
Proceedings of the 2014 Annual Conference on Human factors in Computing Systems (CHI'14), ACM Press (2014), to appear
Presentation slides in Slideshare


The method searches for nonlinear models for a given dataset. The implementation builds on ideas in the area of symbolic programming [6]. It starts from a linear model with all predictor variables. Two search modes are available: greedy local search and random local search. The method searches for as long as the researcher allows, printing improved models to the command prompt as the search progresses. The default fitness function is R-squared, but this can be changed in the code, where several other options are also offered.
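The search loop described above can be illustrated with a minimal sketch. This is not the released code: it is a simple stochastic hill climber written for Python 3 using only the standard library, and the transformation set, the function names (`solve`, `r_squared`, `random_local_search`), and the move rule (re-draw the transformation of one randomly chosen predictor) are all assumptions made for illustration. It assumes strictly positive predictor values so that the logarithm, square root, and inverse are defined, and it keeps one term per predictor rather than enforcing a separate limit on free parameters.

```python
import math
import random

# Illustrative transformation set (an assumption; all expect positive inputs).
TRANSFORMS = {
    "identity": lambda v: v,
    "log2": math.log2,
    "sqrt": math.sqrt,
    "square": lambda v: v * v,
    "inverse": lambda v: 1.0 / v,
}

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def r_squared(rows, y, choice):
    """Fit y = b0 + sum_j b_j * f_j(x_j) by least squares and return R^2."""
    design = [[1.0] + [TRANSFORMS[name](row[j]) for j, name in enumerate(choice)]
              for row in rows]
    m = len(design[0])
    # Normal equations: (D^T D) beta = D^T y
    ata = [[sum(d[i] * d[k] for d in design) for k in range(m)] for i in range(m)]
    aty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(m)]
    beta = solve(ata, aty)
    pred = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def random_local_search(rows, y, iterations=200, seed=0):
    """Start from the linear model and keep random changes that raise R^2."""
    rng = random.Random(seed)
    names = sorted(TRANSFORMS)
    current = ["identity"] * len(rows[0])  # linear model with all predictors
    best = r_squared(rows, y, current)
    for _ in range(iterations):
        candidate = current[:]
        candidate[rng.randrange(len(candidate))] = rng.choice(names)
        score = r_squared(rows, y, candidate)
        if score > best:  # accept strict improvements only
            current, best = candidate, score
            print(f"R^2={best:.4f} transforms={current}")
    return current, best
```

Here `rows` is a list of predictor rows and `y` the list of dependent values. A greedy variant would evaluate every single-predictor change at each step and take the best one instead of sampling a change at random.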

The models produced by the method were compared to 11 models reported in HCI literature. The task was to find a nonlinear regression model for the same input data as reported in a paper while allowing the same maximum number of free parameters.

In addition, the method was tried out in three other tasks: 1) a complex dataset with more predictors and larger input file, 2) finding a single model for multiple datasets, and 3) finding a model using a limited set of "theoretically motivated" transformations. Please see the paper for the results.


The original sources are described in Table 1 of the paper. Values are comma-separated. The first column is the dependent variable (often movement time, MT) and the remaining columns are predictors. Variable names for the predictors are given in Table 1 and described in more detail in the original papers.

1. dataset1.csv: Stylus tapping
2. dataset2.csv: Stylus tapping with W_e
3. dataset3.csv: Mouse pointing
4. dataset4.csv: Mouse pointing with W_e
5. dataset5.csv: Trackball dragging
6. dataset6.csv: Trackball dragging with W_e
7. dataset7.csv: Magic Lens pointing
8. dataset8.csv: Tactile guidance
9. dataset9.csv: Pointing, angular Exp. 2
10. dataset10.csv: Two thumb tapping
11. dataset11.csv: Menu selection
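As a rough illustration of the file layout, the following Python 3 sketch reads one of these files into a list of dependent values and a list of predictor rows. The function name `load_dataset` is hypothetical, and the sketch assumes the files contain only numeric rows with no header line, which the description above does not confirm.

```python
import csv

def load_dataset(path):
    """Return (y, rows): y from the first column, predictors from the rest."""
    y, rows = [], []
    with open(path, newline="") as handle:
        for record in csv.reader(handle):
            if not record:  # skip blank lines
                continue
            values = [float(cell) for cell in record]
            y.append(values[0])     # dependent variable, e.g. movement time
            rows.append(values[1:])  # predictor values for this observation
    return y, rows
```

For example, `y, rows = load_dataset("dataset1.csv")` would load the stylus tapping data if the file is in the working directory. If the files turn out to have a header row, that row would need to be skipped before the conversion to `float`.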


This is a proof-of-concept study. It shows that known techniques in symbolic programming can be successfully applied to modeling tasks in HCI. Performance can be vastly improved by considering more advanced techniques in the literature. Therefore, the code is (very) experimental. The version shared here is able to replicate the results reported in Table 1 in the paper. I recommend using the random search mode.

Installation: My present system is Mac OS X 10.7.5 with Python 2.7.6. To run the optimizer, you need the following modules: ols, array, csv, string, random, numpy, sys, re, math, scipy, utilities. Two of these modules are in the zip file. They must be placed in the same directory as the main file. You also need a folder named "logs" for the outputs.

Usage: From the command line, run python FILENAME number-of-free-params number-of-iterations. Note that the intercept counts as a parameter, so the number of free parameters left for the rest of the model is one less than the number given. The input file must be in the same directory. The outputs are stored in the logs folder.

Bibtex Citation

  title={Automated Nonlinear Regression Modeling for HCI},
  author={Oulasvirta, Antti},
  booktitle={Proceedings of the 2014 Annual Conference on Human factors in Computing Systems},
  organization={ACM Press}