Download Video: HD (MP4, 107 MB)

Abstract

Human performance capture is an important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or did not recover dense, space-time-coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision, completely removing the need for training data with 3D ground-truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation step and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness.
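The disentanglement described above can be sketched in code. The following is a minimal, hypothetical illustration of the two-branch decomposition, not the paper's actual architecture: a pose branch predicts a skeletal pose that articulates a template mesh, and a deformation branch adds per-vertex non-rigid displacements on top of the articulated surface. All function names, shapes, and the stand-in random projections are assumptions made for illustration only.

```python
import numpy as np

def pose_branch(image_feat, num_joints=24):
    # Stand-in for a pose CNN: maps image features to per-joint
    # axis-angle rotations (3 values per joint). A fixed random
    # projection replaces the learned network here.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((image_feat.size, num_joints * 3)) * 0.01
    return (image_feat.ravel() @ W).reshape(num_joints, 3)

def deformation_branch(image_feat, num_vertices):
    # Stand-in for the non-rigid branch: predicts a 3D displacement
    # for every template vertex, again via a fixed random projection.
    rng = np.random.default_rng(1)
    W = rng.standard_normal((image_feat.size, num_vertices * 3)) * 0.001
    return (image_feat.ravel() @ W).reshape(num_vertices, 3)

def capture(image_feat, template_vertices, skinning_fn):
    # Two-step decomposition: first articulate the template with the
    # estimated pose, then add dense non-rigid surface offsets. Because
    # both steps deform one fixed template, the output is space-time
    # coherent with frame-to-frame vertex correspondences.
    pose = pose_branch(image_feat)                       # (J, 3)
    posed = skinning_fn(template_vertices, pose)         # (V, 3)
    offsets = deformation_branch(image_feat, len(template_vertices))
    return posed + offsets                               # (V, 3)
```

A caller would supply its own skinning function; for a trivial smoke test, identity skinning suffices: `capture(feat, template, lambda v, p: v)`.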

Downloads


Dataset


Citation

BibTeX, 1 KB

@inproceedings{deepcap,
    title = {DeepCap: Monocular Human Performance Capture Using Weak Supervision},
    author = {Habermann, Marc and Xu, Weipeng and Zollhoefer, Michael and Pons-Moll, Gerard and Theobalt, Christian},
    booktitle = {{IEEE} Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {jun},
    organization = {{IEEE}},
    year = {2020},
}
				

Acknowledgments

This work was funded by the ERC Consolidator Grant 4DRepLy (770784).

Contact

For questions or clarifications, please get in touch with:
Marc Habermann
mhaberma@mpi-inf.mpg.de
