
Marc Habermann

Max-Planck-Institut für Informatik
Department 4: Computer Graphics
Graphics, Vision and Video
 office: Campus E1 4, Room 224
Saarland Informatics Campus
66123 Saarbrücken
Germany
 email: mhaberma@mpi-inf.mpg.de
 phone: +49 681 9325-4024
 fax: +49 681 9325-4099

Research Interests

  • Computer Vision and Computer Graphics

  • Human Performance Capture

  • Reconstruction of Non-Rigid Deformations from RGB Video

  • Texture-Based Descriptors

Publications

DeepCap: Monocular Human Performance Capture Using Weak Supervision

Marc Habermann   Weipeng Xu   Michael Zollhoefer   Gerard Pons-Moll   Christian Theobalt

CVPR 2020 (Oral), Best Student Paper Honorable Mention

Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or did not recover dense space-time coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision, completely removing the need for training data with 3D ground truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness.

[pdf], [video], [project page], [arxiv]
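The two-network split can be illustrated in a few lines. The following PyTorch sketch is a hypothetical simplification (the tiny encoder, the module names, and all sizes are assumptions, not the released DeepCap code): one network regresses 3D joint positions, a second regresses per-vertex displacements, and the pose branch is supervised only by reprojecting into calibrated views and comparing against 2D detections.

    import torch
    import torch.nn as nn

    N_JOINTS, N_VERTS = 24, 512      # assumed skeleton / template sizes

    def encoder(out_dim):
        # Tiny stand-in for a real image backbone
        return nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, out_dim))

    pose_net = encoder(N_JOINTS * 3)     # pose estimation branch
    deform_net = encoder(N_VERTS * 3)    # non-rigid surface deformation branch

    def project(points3d, P):
        # Pinhole projection with a 3x4 camera matrix P
        homo = torch.cat([points3d, torch.ones_like(points3d[..., :1])], dim=-1)
        uvw = homo @ P.T
        return uvw[..., :2] / uvw[..., 2:3]

    img = torch.randn(1, 3, 256, 256)
    joints3d = pose_net(img).view(1, N_JOINTS, 3)
    displacements = deform_net(img).view(1, N_VERTS, 3)  # supervised via dense multi-view terms in the paper

    # Weak supervision: reproject predicted joints into several calibrated views
    # and compare against 2D keypoint detections -- no 3D labels required.
    cams = [torch.randn(3, 4) for _ in range(4)]         # dummy camera matrices
    kp2d = [torch.randn(1, N_JOINTS, 2) for _ in cams]   # dummy 2D detections
    loss = sum(((project(joints3d, P) - k) ** 2).mean() for P, k in zip(cams, kp2d))
    loss.backward()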



Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation

Lingjie Liu   Weipeng Xu   Marc Habermann   Michael Zollhoefer   Florian Bernard   Hyeongwoo Kim   Wenping Wang   Christian Theobalt

TVCG 2020

Synthesizing realistic videos of humans using neural networks has been a popular alternative to the conventional graphics-based rendering pipeline due to its high efficiency. Existing works typically formulate this as an image-to-image translation problem in 2D screen space, which leads to artifacts such as over-smoothing, missing body parts, and temporal instability of fine-scale detail, such as pose-dependent wrinkles in the clothing. In this paper, we propose a novel human video synthesis method that addresses these limiting factors by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space. More specifically, our method relies on the combination of two convolutional neural networks (CNNs). Given the pose information, the first CNN predicts a dynamic texture map that contains time-coherent high-frequency details, and the second CNN conditions the generation of the final video on the temporally coherent output of the first CNN. We demonstrate several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively.

[pdf], [arxiv]
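As a rough picture of the two-CNN pipeline, the sketch below is a hypothetical miniature (the module names TexNet and TransNet, all sizes, and the replacement of the rendering step by a plain resize are assumptions): the first network maps a pose vector to a dynamic texture, the second translates the rendered result into the final frame.

    import torch
    import torch.nn as nn

    POSE_DIM, TEX_RES = 72, 64     # assumed pose-vector and texture-map sizes

    class TexNet(nn.Module):
        # First CNN: pose -> time-coherent dynamic texture map
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(POSE_DIM, 3 * TEX_RES * TEX_RES)
        def forward(self, pose):
            return torch.sigmoid(self.fc(pose)).view(-1, 3, TEX_RES, TEX_RES)

    class TransNet(nn.Module):
        # Second CNN: conditions the final frame on the rendered, textured template
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 3, 3, padding=1))
        def forward(self, render):
            return self.net(render)

    pose = torch.randn(1, POSE_DIM)
    texture = TexNet()(pose)
    # The real pipeline rasterizes the posed template with this texture; a
    # plain resize stands in for that rendering step here.
    render = nn.functional.interpolate(texture, size=(128, 128))
    frame = TransNet()(render)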



EventCap: Monocular 3D Capture of High-Speed Human Motions using an Event Camera

Lan Xu   Weipeng Xu   Vladislav Golyanik   Marc Habermann   Lu Fang   Christian Theobalt

CVPR 2020 (Oral)

A high frame rate is a critical requirement for capturing fast human motions. In this setting, existing markerless image-based methods are constrained by their lighting requirements, the high data bandwidth and the consequent high computation overhead. In this paper, we propose EventCap, the first approach for 3D capture of high-speed human motions using a single event camera. Our method combines model-based optimization and CNN-based human pose detection to capture high-frequency motion details and to reduce drift in the tracking. As a result, we can capture fast motions at millisecond resolution with significantly higher data efficiency than using high frame rate videos. Experiments on our new event-based fast human motion dataset demonstrate the effectiveness and accuracy of our method, as well as its robustness to challenging lighting conditions.

[pdf], [video], [project page], [arxiv]
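The hybrid of model-based optimization and CNN detections can be distilled into a toy least-squares problem. The 1-D numpy/scipy sketch below is purely illustrative (the trajectory model, the weights, and the synthetic data are all assumptions): a motion trajectory is optimized to explain dense event-like measurements while staying anchored to sparse keyframe detections.

    import numpy as np
    from scipy.optimize import least_squares

    t_events = np.linspace(0.0, 1.0, 200)        # dense event timestamps
    events = np.sin(2 * np.pi * t_events)        # stand-in for event-derived measurements
    t_key = np.array([0.0, 0.5, 1.0])            # sparse keyframes with CNN detections
    det = np.sin(2 * np.pi * t_key)

    n_ctrl = 10
    t_ctrl = np.linspace(0.0, 1.0, n_ctrl)

    def traj(x, t):
        # Piecewise-linear trajectory defined by control values x
        return np.interp(t, t_ctrl, x)

    def residuals(x):
        r_event = traj(x, t_events) - events     # dense event data term
        r_det = 5.0 * (traj(x, t_key) - det)     # CNN detection term (higher weight)
        r_smooth = 0.1 * np.diff(x, 2)           # temporal smoothness prior
        return np.concatenate([r_event, r_det, r_smooth])

    sol = least_squares(residuals, np.zeros(n_ctrl))
    print(sol.x)                                 # recovered control values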



Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data

Yuxiao Zhou   Marc Habermann   Weipeng Xu   Ikhsanul Habibie   Christian Theobalt   Feng Xu

CVPR 2020

We present a novel method for monocular hand shape and pose estimation at an unprecedented runtime of 100 fps and at state-of-the-art accuracy. This is enabled by a new learning-based architecture designed such that it can make use of all available sources of hand training data: image data with either 2D or 3D annotations, as well as stand-alone 3D animations without corresponding image data. It features a 3D hand joint detection module and an inverse kinematics module which regresses not only 3D joint positions but also maps them to joint rotations in a single feed-forward pass. This output makes the method more directly usable for applications in computer vision and graphics compared to only regressing 3D joint positions. We demonstrate that our architectural design leads to a significant quantitative and qualitative improvement over the state of the art on several challenging benchmarks. We will make our code publicly available for future research.

[pdf], [video], [project page], [arxiv]
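The detection-plus-inverse-kinematics design reads naturally as two stacked feed-forward modules. The PyTorch sketch below is a hypothetical miniature (layer sizes, the joint count, and the axis-angle output are assumptions): a detector predicts 3D joint positions, and an IK network maps them to per-joint rotations in the same pass.

    import torch
    import torch.nn as nn

    N_JOINTS = 21   # assumed hand skeleton size

    class JointDetector(nn.Module):
        # Image -> 3D joint positions
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 4, stride=4), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, N_JOINTS * 3))
        def forward(self, img):
            return self.net(img).view(-1, N_JOINTS, 3)

    class IKNet(nn.Module):
        # 3D joint positions -> per-joint rotations (axis-angle here)
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Flatten(), nn.Linear(N_JOINTS * 3, 128), nn.ReLU(),
                nn.Linear(128, N_JOINTS * 3))
        def forward(self, joints):
            return self.net(joints).view(-1, N_JOINTS, 3)

    img = torch.randn(1, 3, 128, 128)
    rotations = IKNet()(JointDetector()(img))   # single feed-forward pass
    print(rotations.shape)                      # torch.Size([1, 21, 3])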



LiveCap: Real-time Human Performance Capture from Monocular Video

Marc Habermann   Weipeng Xu   Michael Zollhoefer   Gerard Pons-Moll   Christian Theobalt

ACM ToG 2019 @ SIGGRAPH 2019

We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting nonlinear optimization problems per frame are solved with specially-tailored data-parallel Gauss-Newton solvers. In order to achieve real-time performance of over 25 Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture. Our method yields comparable accuracy with off-line performance capture techniques, while being orders of magnitude faster.

[pdf], [video], [project page], [arxiv]
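Since the abstract names data-parallel Gauss-Newton solvers as the workhorse, here is a minimal numpy Gauss-Newton loop on a toy curve-fitting problem (the toy problem and iteration count are assumptions; the paper solves far larger per-frame systems on the GPU, but the update rule is the same).

    import numpy as np

    def gauss_newton(residual, jacobian, x0, iters=10):
        x = x0.copy()
        for _ in range(iters):
            r, J = residual(x), jacobian(x)
            # Solve the normal equations J^T J dx = -J^T r for the update step
            dx = np.linalg.solve(J.T @ J, -J.T @ r)
            x += dx
        return x

    # Toy problem: fit y = exp(a * t) + b to synthetic data
    t = np.linspace(0.0, 1.0, 50)
    y = np.exp(1.5 * t) + 0.3

    def residual(x):
        a, b = x
        return np.exp(a * t) + b - y

    def jacobian(x):
        a, b = x
        return np.stack([t * np.exp(a * t), np.ones_like(t)], axis=1)

    print(gauss_newton(residual, jacobian, np.array([1.0, 0.0])))  # ~[1.5, 0.3]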



Neural Animation and Reenactment of Human Actor Videos

Lingjie Liu   Weipeng Xu   Michael Zollhoefer   Hyeongwoo Kim   Florian Bernard   Marc Habermann   Wenping Wang   Christian Theobalt

ACM ToG 2019 @ SIGGRAPH 2019

We propose a method for generating (near) video-realistic animations of real humans under user control. In contrast to conventional human character rendering, we do not require the availability of a production-quality photo-realistic 3D model of the human, but instead rely on a video sequence in conjunction with a (medium-quality) controllable 3D template model of the person. With that, our approach significantly reduces production cost compared to conventional rendering approaches based on production-quality 3D models, and can also be used to realistically edit existing videos. Technically, this is achieved by training a neural network that translates simple synthetic images of a human character into realistic imagery. For training our networks, we first track the 3D motion of the person in the video using the template model, and subsequently generate a synthetically rendered version of the video. These images are then used to train a conditional generative adversarial network that translates synthetic images of the 3D model into realistic imagery of the human. We evaluate our method for the reenactment of another person that is tracked in order to obtain the motion data, and show video results generated from artist-designed skeleton motion. Our results outperform the state-of-the-art in learning-based human image synthesis.

[pdf], [video], [project page], [arxiv]
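The core training signal is a conditional GAN on (synthetic rendering, real frame) pairs. The PyTorch sketch below is a deliberately tiny, hypothetical version of that image-to-image adversarial setup; both networks are placeholders, not the paper's architecture.

    import torch
    import torch.nn as nn

    # Generator: synthetic rendering -> realistic frame (placeholder network)
    G = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 3, 3, padding=1))
    # Discriminator: judges (rendering, frame) pairs (placeholder network)
    D = nn.Sequential(nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 3, padding=1))

    bce = nn.BCEWithLogitsLoss()
    render = torch.randn(1, 3, 64, 64)   # synthetic rendering of the 3D model
    real = torch.randn(1, 3, 64, 64)     # corresponding real video frame

    fake = G(render)
    d_fake = D(torch.cat([render, fake], dim=1))
    g_loss = bce(d_fake, torch.ones_like(d_fake))   # generator tries to fool D
    d_loss = bce(D(torch.cat([render, real], dim=1)), torch.ones_like(d_fake)) \
           + bce(D(torch.cat([render, fake.detach()], dim=1)), torch.zeros_like(d_fake))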



NRST: Non-rigid Surface Tracking from Monocular Video

Marc Habermann   Weipeng Xu   Helge Rhodin   Michael Zollhoefer   Gerard Pons-Moll   Christian Theobalt

German Conference on Pattern Recognition (GCPR) 2018 (Oral)

We propose an efficient method for non-rigid surface tracking from monocular RGB videos. Given a video and a template mesh, our algorithm sequentially registers the template non-rigidly to each frame. We formulate the per-frame registration as an optimization problem that includes a novel texture term specifically tailored towards tracking objects with uniform texture but fine-scale structure, such as the regular micro-structural patterns of fabric. Our texture term exploits the orientation information in the micro-structures of the objects, e.g., the yarn patterns of fabrics. This enables us to accurately track uniformly colored materials that have these high frequency micro-structures, for which traditional photometric terms are usually less effective. The results demonstrate the effectiveness of our method on both general textured non-rigid objects and monochromatic fabrics.

[pdf], [video], [project page]
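The orientation cue behind the texture term can be demonstrated with plain image gradients. The numpy toy below is an illustrative simplification (the synthetic stripe pattern and the angle-based residual are assumptions, not the paper's energy): gradient orientations remain informative on uniformly colored, finely structured materials where raw intensities are ambiguous.

    import numpy as np

    def grad_orientation(img):
        gy, gx = np.gradient(img)
        return np.arctan2(gy, gx + 1e-8)   # per-pixel gradient orientation

    def orientation_residual(template_warp, frame):
        d = grad_orientation(template_warp) - grad_orientation(frame)
        return np.arctan2(np.sin(d), np.cos(d))   # wrap difference to [-pi, pi]

    # Synthetic "fabric": identical intensity statistics, slightly rotated stripes
    xx, yy = np.meshgrid(np.linspace(0, 8 * np.pi, 64), np.linspace(0, 8 * np.pi, 64))
    template_warp = np.sin(xx)                             # vertical stripes
    frame = np.sin(np.cos(0.1) * xx + np.sin(0.1) * yy)    # rotated by ~0.1 rad
    print(np.abs(orientation_residual(template_warp, frame)).mean())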



Awards & Honors

  • CVPR 2020 Best Student Paper Honorable Mention (for DeepCap)

Teaching

  • April 2020 - August 2020
    Supervisor for Computer Vision and Machine Learning for Computer Graphics (lecturers: Prof. Dr. Christian Theobalt, Dr. Mohamed Elgharib, Dr. Vladislav Golyanik) at Saarland University, Saarbrücken, Germany

  • April 2019 - August 2019
    Supervisor for Computer Vision and Machine Learning for Computer Graphics (lecturers: Prof. Dr. Christian Theobalt, Dr. Mohamed Elgharib, Dr. Vladislav Golyanik) at Saarland University, Saarbrücken, Germany

  • April 2018 - August 2018
    Supervisor for 3D Shape Analysis (lecturers: Dr. Florian Bernard and Prof. Dr. Christian Theobalt) at Saarland University, Saarbrücken, Germany

  • September 2016 - June 2018
    Tutor for the seminar course 3D Modelling (Seminarfach 3D Modellierung) at the Leibniz Gymnasium/Albertus Magnus Gymnasium, Sankt Ingbert, Germany

  • July 2013 - September 2016
    Tutor for 3D Modelling at the Alte Schmelz, Sankt Ingbert, Germany

Education

  • September 2017 - present
    PhD student at the Max Planck Institute for Informatics in the GVV Group, Saarbrücken, Germany

  • April 2016 - November 2017
    Master's studies in Computer Science at Saarland University, Saarbrücken, Germany
    Title of Master's thesis: RONDA - Reconstruction of Non-rigid Surfaces from High Resolution Video (supervisor: Prof. Dr. Christian Theobalt) (PDF)

  • October 2012 - April 2016
    Bachelor's studies in Computer Science at Saarland University, Saarbrücken, Germany
    Title of Bachelor's thesis: Drone Path Planning (supervisor: Dr.-Ing. Tobias Ritschel) (PDF)

  • July 2012
    Abitur at the Albertus Magnus Gymnasium, Sankt Ingbert, Germany

Private

Hobbies

  • Photography

  • Bouldering

  • Reading Books