Free-Viewpoint Video of Human Actors

Human Motion Capture

Body Model


Figure 1: Body model with underlying skeleton structure (left). Energy function for one camera view (right).

The body model used throughout the system is a generic model consisting of a hierarchical arrangement of 16 body segments (head, upper arm, torso, etc.). The model's kinematics are defined by an underlying skeleton of 17 joints connecting the bone segments. To accommodate variations in body shape, local scaling parameters and segment deformation parameters (Bézier parameters) are also available.
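As an illustration of such a hierarchical model, the following minimal sketch chains a few segments in a planar (2D) setting and accumulates joint rotations from the root down. The class name `Segment`, its fields, the function `end_point`, and the segment lengths are assumptions made for this example. The actual model is three-dimensional, and its Bézier deformation parameters are not modeled here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """One body segment attached to its parent by a joint (hypothetical layout)."""
    name: str
    length: float                       # bone length in the generic model
    angle: float = 0.0                  # joint rotation relative to the parent, in radians
    scale: float = 1.0                  # local scaling to fit a specific person
    parent: Optional["Segment"] = None
    children: List["Segment"] = field(default_factory=list)

    def attach(self, child: "Segment") -> "Segment":
        """Add a child segment below this one and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def end_point(seg: Segment) -> Tuple[float, float]:
    """End point of a segment, found by walking the chain from the root down."""
    chain = []
    node: Optional[Segment] = seg
    while node is not None:
        chain.append(node)
        node = node.parent
    x = y = total_angle = 0.0
    for s in reversed(chain):           # root first
        total_angle += s.angle          # rotations accumulate along the hierarchy
        x += s.scale * s.length * math.cos(total_angle)
        y += s.scale * s.length * math.sin(total_angle)
    return x, y


# A three-segment chain: upright torso, arm held out horizontally.
torso = Segment("torso", length=0.5, angle=math.pi / 2)
upper_arm = torso.attach(Segment("upper_arm", length=0.3, angle=-math.pi / 2))
lower_arm = upper_arm.attach(Segment("lower_arm", length=0.25))
```

Changing the `angle` of a parent moves every segment below it, while `scale` only stretches the segment it belongs to. This is the property that lets pose and scaling be treated as separate parameter groups.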

Initialization


Figure 2: Schematic overview of the initialization method.

At the beginning of each multi-view video sequence, the subject stands in an initialization pose. Using the silhouette information of this pose in each camera view, an initial set of pose parameters as well as a set of scaling and deformation parameters are found. The scaling and deformation parameters deform the a priori body model so that it optimally conforms to the recorded person. The initialization method is a multi-step optimization procedure whose components are depicted in Fig. 2. In each step, a non-linear optimization over a subset of the model parameters (deformation, scaling or pose parameters) is performed, using the silhouette overlap as the error metric.
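The idea of optimizing one parameter subset at a time can be sketched as follows. Everything here is an assumption made for illustration: the parameter names, the order of the stages, the simple pattern search used as the optimizer, and `toy_error`, a squared-distance stand-in for the silhouette-overlap error. The real step order is the one shown in Fig. 2.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]


def refine_subset(error: Callable[[Params], float], params: Params,
                  names: List[str], step: float = 0.5,
                  min_step: float = 1e-4) -> Params:
    """Pattern search over the named parameters only; all others stay fixed."""
    params = dict(params)
    best = error(params)
    while step > min_step:
        improved = False
        for name in names:
            for delta in (step, -step):
                trial = dict(params)
                trial[name] += delta
                e = error(trial)
                if e < best:
                    params, best, improved = trial, e, True
        if not improved:
            step *= 0.5                 # no move helped: search more finely
    return params


# Stand-in for the silhouette-overlap error (hypothetical target values).
TARGET = {"pose_shoulder": 0.75, "scale_arm": 1.25, "bezier_arm": -0.5}


def toy_error(p: Params) -> float:
    return sum((p[k] - TARGET[k]) ** 2 for k in TARGET)


# Assumed stage order; each stage touches one parameter subset.
SCHEDULE = [
    ("pose", ["pose_shoulder"]),
    ("scaling", ["scale_arm"]),
    ("deformation", ["bezier_arm"]),
    ("pose", ["pose_shoulder"]),
]

params: Params = {"pose_shoulder": 0.0, "scale_arm": 1.0, "bezier_arm": 0.0}
for _stage, names in SCHEDULE:
    params = refine_subset(toy_error, params, names)
```

Restricting each step to a subset keeps every individual search low-dimensional, at the cost of revisiting a subset (here the pose) after the others have changed.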

Motion Parameter Estimation

The criterion that guides our motion capture and initialization procedures is the overlap between the projected body model and the input silhouettes in each camera view. A quantitative measure for this overlap is the pixel-wise XOR between the projected model silhouette and the input image silhouette (Fig. 1). The error metric used during optimization is the sum of the XOR values over all camera views. We exploit consumer-level graphics hardware to compute this error metric efficiently.
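On the CPU, the same error metric can be written in a few lines. This is a minimal sketch assuming binary silhouette masks stored as nested lists; the function names `xor_count` and `silhouette_error` and the tiny example masks are made up for illustration, and the sketch does not reproduce the graphics-hardware implementation.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]          # 1 = silhouette pixel, 0 = background


def xor_count(model: Mask, image: Mask) -> int:
    """Number of pixels covered by exactly one of the two silhouettes."""
    return sum(
        bool(m) != bool(i)
        for model_row, image_row in zip(model, image)
        for m, i in zip(model_row, image_row)
    )


def silhouette_error(model_views: Sequence[Mask], image_views: Sequence[Mask]) -> int:
    """Sum of the per-view XOR counts over all camera views."""
    return sum(xor_count(m, i) for m, i in zip(model_views, image_views))


# Two tiny 2x3 "camera views": the first differs in two pixels, the second matches.
model_views = [[[0, 1, 1], [0, 1, 0]], [[1, 1, 0], [0, 0, 0]]]
image_views = [[[0, 1, 0], [1, 1, 0]], [[1, 1, 0], [0, 0, 0]]]
total = silhouette_error(model_views, image_views)
```

The error is zero only when the projected model covers exactly the same pixels as the input silhouette in every view, and it grows with each pixel that is covered by only one of the two.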
The motion parameters of the body model are found by performing a non-linear optimization in the pose parameter space. For the optimization we use a direction set method with a slightly modified line search step. To make the problem tractable, the search is performed hierarchically. To deal with fast body motion, a pre-selection step (grid search) is performed on the lower-dimensional parameter spaces of the limbs.
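The combination of a coarse grid search followed by a direction set refinement can be sketched as below. This is a textbook-style Powell iteration with a plain golden-section line search, not the modified line search mentioned above, and it ignores the hierarchical ordering of body parts. The two-parameter `toy_energy`, the grid values, and the search span are assumptions chosen for illustration.

```python
import itertools
import math
from typing import Callable, List, Sequence

Energy = Callable[[Sequence[float]], float]
INV_PHI = (math.sqrt(5) - 1) / 2


def line_minimize(f: Energy, x: List[float], d: List[float],
                  span: float = 2.0, tol: float = 1e-7) -> List[float]:
    """Golden-section search for the minimum of f along x + t*d, t in [-span, span]."""
    def along(t: float) -> List[float]:
        return [xi + t * di for xi, di in zip(x, d)]
    a, b = -span, span
    c, e = b - INV_PHI * (b - a), a + INV_PHI * (b - a)
    fc, fe = f(along(c)), f(along(e))
    while b - a > tol:
        if fc < fe:                      # minimum lies in [a, e]
            b, e, fe = e, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(along(c))
        else:                            # minimum lies in [c, b]
            a, c, fc = c, e, fe
            e = a + INV_PHI * (b - a)
            fe = f(along(e))
    best = along((a + b) / 2)
    return best if f(best) < f(x) else list(x)   # never accept a worse point


def direction_set(f: Energy, x0: Sequence[float], iters: int = 20) -> List[float]:
    """Basic Powell iteration: minimize along each direction, then replace one."""
    n = len(x0)
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(iters):
        start = list(x)
        for d in dirs:
            x = line_minimize(f, x, d)
        move = [a - b for a, b in zip(x, start)]
        norm = math.sqrt(sum(v * v for v in move))
        if norm < 1e-12:                 # no progress in this sweep
            break
        new_dir = [v / norm for v in move]
        dirs = dirs[1:] + [new_dir]      # drop the oldest direction
        x = line_minimize(f, x, new_dir)
    return x


def grid_search(f: Energy, axes: Sequence[Sequence[float]]) -> List[float]:
    """Pre-selection: best point on a coarse grid over a low-dimensional space."""
    return list(min(itertools.product(*axes), key=f))


def toy_energy(p: Sequence[float]) -> float:
    """Stand-in for the silhouette error of a two-parameter limb."""
    u, v = p[0] - 1.2, p[1] + 0.7
    return u * u + v * v + 0.5 * u * v


coarse = [-2.0, -1.0, 0.0, 1.0, 2.0]
start = grid_search(toy_energy, [coarse, coarse])
pose = direction_set(toy_energy, start)
```

The grid search only has to land near the right minimum. The direction set refinement then needs no derivatives of the energy, which suits an error that is computed by rendering.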
The energy function evaluation can be sped up in two ways. First, to reduce the amount of data transferred between GPU and CPU, the energy function is evaluated only on sub-windows of the image plane. Second, the rendering overhead during the XOR computation can be further reduced by rendering only the body parts under consideration. The whole problem lends itself to a parallel implementation using 5 CPUs and GPUs, which significantly improves the performance of the human motion capture algorithm (see references).
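The sub-window idea can be illustrated on the CPU as follows. The sketch assumes that the window is the bounding box of all foreground pixels in the two silhouettes being compared; how the windows are actually chosen is not stated here. The names `foreground_window` and `xor_in_window` and the example masks are hypothetical.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]
Window = Tuple[int, int, int, int]      # row_start, row_end, col_start, col_end (ends exclusive)


def foreground_window(*masks: Mask) -> Window:
    """Smallest window that contains every foreground pixel of the given masks."""
    rows = [r for m in masks for r, row in enumerate(m) if any(row)]
    cols = [c for m in masks for row in m for c, v in enumerate(row) if v]
    if not rows:
        return (0, 0, 0, 0)             # nothing to compare
    return (min(rows), max(rows) + 1, min(cols), max(cols) + 1)


def xor_in_window(model: Mask, image: Mask, window: Window) -> int:
    """XOR count restricted to the window; pixels outside are never touched."""
    r0, r1, c0, c1 = window
    return sum(
        bool(model[r][c]) != bool(image[r][c])
        for r in range(r0, r1)
        for c in range(c0, c1)
    )


model = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
window = foreground_window(model, image)
windowed_error = xor_in_window(model, image, window)
```

In this example the window covers 4 of the 24 pixels, yet the XOR count is unchanged because both masks are empty outside it.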
Fig. 3 shows four input camera frames from a multi-view video sequence (small images) together with the corresponding poses of the body model computed with our method (images in the center).


Figure 3: Input images and corresponding poses of the body model as they are found by the motion capture method.

Copyright © 1998-2002 by Max-Planck-Institut für Informatik. All rights reserved.
Page maintained by Christian Theobalt <theobalt@mpi-sb.mpg.de>
Document last changed on Tuesday, April 22 2003 - 15:00