Date: 06/September/2014 - morning
It is now possible to capture high-quality 3D scenes using cameras and depth sensors. Apart from substantial hardware innovations that pushed these technologies to the consumer market, significant algorithmic advances have also been reported in computer vision. However, the captured 3D scenes are often disconnected point clouds with no temporal coherence. Estimating 3D motion accurately at fine-scale detail remains a very challenging problem in the general non-rigid case. Further, the captured 3D scenes cannot be edited or manipulated by users in a semantically meaningful way.
Due to these challenges, computer vision technologies have yet to make a great impact in performance-critical areas such as graphics production for the entertainment industry and biomechanical modelling for medicine and sports. For these purposes, it is necessary to build accurate and editable 3D deformation models. Computer graphics has traditionally approached this requirement from the other side, using detailed hand-crafted models suited to a specific purpose. However, data-driven deformation models, inspired partly by advances in computer vision, are becoming increasingly popular due to their greater realism. With relatively cheap consumer-grade capture technologies, robust 3D deformation models can be built from the "big data" of captured 3D deformations. This is an exciting opportunity for computer vision researchers to contribute to several new real-world applications.
Robust deformable models are also a powerful tool for solving challenging computer vision problems, as they provide more accurate priors than can be obtained from the images themselves. However, knowledge of 3D surface deformation methods and 3D geometry processing is not as widespread in the computer vision community as it is in computer graphics. In this tutorial, we aim to bridge this gap.
In the course, we will first introduce the 3D rigging methods used in computer graphics and review the state of the art in 3D mesh deformation editing. Specifically, we will cover skeleton rigs for pose-editing of articulated meshes and blendshape models for facial deformation editing. We will also briefly review 3D surface deformation using Laplacian differential coordinates, which preserve local shape properties under deformation.
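To make the blendshape idea concrete, a deformed face is simply the neutral mesh plus a weighted sum of per-target offsets. The sketch below uses invented toy vertex data (real face rigs have thousands of vertices and dozens of sculpted targets); it only illustrates the linear structure of the model.

```python
import numpy as np

# Hypothetical toy data: a "mesh" of 4 vertices in 3D (real rigs are much larger).
neutral = np.zeros((4, 3))                      # neutral (rest) face
targets = np.array([                            # sculpted blendshape targets
    neutral + [0.0, 1.0, 0.0],                  # e.g. "brows up"
    neutral + [0.5, 0.0, 0.0],                  # e.g. "smile"
])

def blend(neutral, targets, weights):
    """Deformed mesh = neutral + sum_i w_i * (target_i - neutral)."""
    offsets = targets - neutral                 # per-target displacement fields
    return neutral + np.tensordot(weights, offsets, axes=1)

face = blend(neutral, targets, np.array([0.25, 0.5]))
print(face[0])  # 0.25*[0,1,0] + 0.5*[0.5,0,0] = [0.25, 0.25, 0.0]
```

Because the model is linear in the weights, editing a face reduces to sliding a small number of intuitive parameters, which is what makes blendshapes popular for facial animation.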
Then, we will describe the 3D performance capture pipeline, i.e., how to build a sequence of coherent 3D meshes from temporally disconnected sensor input, such that the mesh topology and vertex connectivity are preserved over time. We describe systems based on multi-view capture in an indoor studio, as well as simpler systems based on sparser set-ups and depth cameras. This produces a mesh sequence that is suitable for visualization and captures accurate 3D non-rigid motion detail, but is still unsuitable for direct manipulation and editing.
Next, we will describe a set of methods for converting raw mesh sequences into rigged models that can be manipulated and pose-edited. We describe methods for embedding a skeleton rig into an input mesh, as well as for converting a given mesh sequence into a rigged skeleton animation.
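Once a skeleton rig is embedded, the standard way to deform the mesh is linear blend skinning: each vertex follows a weighted mix of bone transforms. The sketch below (toy data; the bones, weights, and vertices are invented for illustration) shows the core formula, not any particular system from the tutorial.

```python
import numpy as np

def skin(vertices, bone_mats, weights):
    """Linear blend skinning: v' = sum_b w[v,b] * (M_b @ v)."""
    vh = np.hstack([vertices, np.ones((len(vertices), 1))])   # homogeneous coords
    per_bone = np.einsum('bij,vj->bvi', bone_mats, vh)[..., :3]
    return np.einsum('vb,bvi->vi', weights, per_bone)

# Toy rig: 3 vertices along a line, 2 bones; bone 2 translates by +1 in x.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
identity = np.eye(4)
translate = np.eye(4)
translate[0, 3] = 1.0
weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])      # smooth falloff
deformed = skin(verts, np.stack([identity, translate]), weights)
print(deformed)  # middle vertex moves halfway: [1.5, 0, 0]
```

The middle vertex, weighted half-and-half between the two bones, lands halfway between their two transforms; this blending is what gives skinned meshes smooth deformation near joints.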
Finally, we describe how to move beyond pre-defined motion rigs and build data-driven 3D deformation models from the bottom up. We describe a method for building a statistical muscle deformation model from captured videos, which captures variation due to human body shape, body pose, and external forces acting on the body. We also describe an automatic method for decomposing a given mesh animation into sparse, localized deformation components that can be intuitively manipulated and edited. We conclude by listing a variety of applications for statistical deformation models in computer vision.
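The structure underlying such decompositions can be sketched with a plain SVD/PCA factorization of a mesh animation: stack the frames into a matrix and factor it as a mean plus per-frame weights times deformation components. The methods covered in the tutorial add sparsity and locality constraints on top of this; the NumPy sketch below, on synthetic data, only illustrates the basic "animation = mean + weights x components" structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "mesh animation": F frames of V vertices in 3D, flattened per frame.
F, V = 20, 50
t = np.linspace(0.0, 2.0 * np.pi, F)
base = rng.normal(size=3 * V)                    # rest shape
mode = rng.normal(size=3 * V)                    # one true deformation mode
anim = base + np.outer(np.sin(t), mode)          # frames oscillate along the mode

# Center and factor: anim ≈ mean + weights @ components.
mean = anim.mean(axis=0)
U, S, Vt = np.linalg.svd(anim - mean, full_matrices=False)
weights = U[:, :1] * S[:1]                       # per-frame activations
components = Vt[:1]                              # recovered deformation component

recon = mean + weights @ components
print(np.max(np.abs(recon - anim)) < 1e-8)       # one component reconstructs all
```

Editing then amounts to changing the per-frame weights rather than individual vertices; sparsity-regularized variants make each component affect only a localized mesh region, which is what makes the edits intuitive.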
This is an introductory tutorial: we do not presuppose any prior knowledge from the attendees, apart from basic linear algebra. Researchers working on computer vision applications for computer graphics will find it most relevant, but we also hope to attract people working on fundamental problems such as 3D object tracking, shape-from-X, and statistical priors for low-level vision.
Kiran Varanasi is a researcher at Technicolor R&I in the domain of "acquisition and modelling". His research interests are in 3D computer vision and computer graphics, especially in the spatio-temporal modelling of dynamic 3D scenes. He obtained his PhD from INRIA in Grenoble, France in 2010. He later worked as a post-doctoral researcher at the Max Planck Institute for Informatics in Saarbrücken, Germany (http://www.mpi-inf.mpg.de/~varanasi).
Edilson de Aguiar is a professor of computer science at CEUNES, UFES, in Brazil. His main research interests are in 3D performance capture, motion capture and animation. He obtained his PhD from the Max Planck Institute for Informatics in Saarbrücken, Germany in 2008 (http://www.mpi-inf.mpg.de/~edeaguia). He spent part of his PhD at Stanford University, and later worked at the Disney Research lab in Pittsburgh, USA as a post-doctoral researcher before taking up a faculty position in Brazil.
Both tutorial organizers have jointly authored a chapter, "Rigging Captured Meshes", in the upcoming book "Real World Visual Computing", to be published by CRC Press in 2014. The book chapter has benefited from the editorial oversight of the book editors: Oliver Grau, Markus Magnor, Olga Sorkine-Hornung and Christian Theobalt. Prof. Christian Theobalt was particularly instrumental in developing the key ideas presented in the course, through his research supervision of both presenters during their work at the Max Planck Institute (MPI).