HiFECap: Monocular High-Fidelity and Expressive Capture of Human Performances

(BMVC 2022)



Monocular 3D human performance capture is indispensable for many applications in computer graphics and vision, as it enables immersive experiences. However, detailed capture of humans requires tracking multiple aspects: the skeletal pose, the dynamic surface (including clothing), hand gestures, and facial expressions. No existing monocular method allows joint tracking of all these components. To this end, we propose HiFECap, a new neural human performance capture approach that simultaneously captures human pose, clothing, facial expressions, and hands from a single RGB video. We demonstrate that our proposed network architecture, the carefully designed training strategy, and the tight integration of parametric face and hand models into a template mesh enable the capture of all these individual aspects. Importantly, our method also captures high-frequency details, such as deforming wrinkles on the clothes, better than previous works. Furthermore, we show that HiFECap outperforms state-of-the-art human performance capture approaches both qualitatively and quantitatively, while being the first to capture all of these aspects of the human.


  • Paper

  • Supplemental document

  • Main video


title = {HiFECap: Monocular High-Fidelity and Expressive Capture of Human Performances},
author = {Jiang, Yue and Habermann, Marc and Golyanik, Vladislav and Theobalt, Christian},
year = {2022},


For questions or clarifications, please get in touch with:
Yue Jiang
Marc Habermann
Vladislav Golyanik
