In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations



Convolutional neural network (CNN) based approaches for monocular 3D human pose estimation usually require a large number of training images with 3D pose annotations. While it is feasible to provide 2D joint annotations for large corpora of in-the-wild images with humans, providing accurate 3D annotations for such in-the-wild corpora is hardly feasible in practice. Most existing 3D-labelled datasets are either synthetically created or consist of in-studio images. 3D pose estimation algorithms trained on such data often have limited ability to generalize to the diversity of real-world scenes. We therefore propose a new deep-learning-based method for monocular 3D human pose estimation that shows high accuracy and generalizes better to in-the-wild scenes. Its network architecture comprises a new disentangled hidden-space encoding of explicit 2D and 3D features, and it is supervised through a new learned projection model that maps the predicted 3D pose to 2D. Our algorithm can be jointly trained on image data with 3D labels and image data with only 2D labels. It achieves state-of-the-art accuracy on challenging in-the-wild data.
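The idea of jointly training on 3D-labelled and 2D-only data can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed weak-perspective projection (a scale and a 2D translation) in place of the learned projection model described above, uses plain Python lists instead of network tensors, and all names (`weak_perspective_project`, `mixed_batch_loss`, the sample keys) are hypothetical. Every sample contributes a 2D reprojection loss, and only samples that carry 3D labels add a direct 3D loss.

```python
from typing import Dict, List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def weak_perspective_project(pose3d: Sequence[Point3], scale: float,
                             tx: float, ty: float) -> List[Point2]:
    # Hypothetical stand-in for a learned projection: drop depth,
    # then scale and translate the x, y coordinates of each joint.
    return [(scale * x + tx, scale * y + ty) for x, y, _z in pose3d]


def mean_sq_error(a: Sequence[Sequence[float]],
                  b: Sequence[Sequence[float]]) -> float:
    # Mean over joints of the squared Euclidean distance.
    total = 0.0
    for p, q in zip(a, b):
        total += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return total / len(a)


def mixed_batch_loss(batch: List[Dict], w3d: float = 1.0,
                     w2d: float = 1.0) -> float:
    # Each sample holds a predicted 3D pose 'pred3d', projection
    # parameters 'proj' = (scale, tx, ty), 2D labels 'gt2d' and,
    # only for 3D-labelled images, 3D labels 'gt3d'.
    loss = 0.0
    for sample in batch:
        pred2d = weak_perspective_project(sample["pred3d"], *sample["proj"])
        loss += w2d * mean_sq_error(pred2d, sample["gt2d"])  # always available
        if sample.get("gt3d") is not None:  # 3D supervision only when labelled
            loss += w3d * mean_sq_error(sample["pred3d"], sample["gt3d"])
    return loss / len(batch)


if __name__ == "__main__":
    batch = [
        # In-the-wild image: 2D labels only.
        {"pred3d": [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)], "proj": (2.0, 0.5, 0.0),
         "gt2d": [(0.5, 0.0), (2.5, 1.0)]},
        # Studio image: both 2D and 3D labels.
        {"pred3d": [(0.0, 0.0, 0.0)], "proj": (1.0, 0.0, 0.0),
         "gt2d": [(0.0, 0.0)], "gt3d": [(0.0, 0.0, 1.0)]},
    ]
    print(mixed_batch_loss(batch))  # 0.75
```

Because the 2D term only needs the projected pose, images without 3D annotations still provide a training signal for the 3D prediction; the weights `w3d` and `w2d` are illustrative knobs, not values from the paper.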



BibTeX

Author = {Habibie, Ikhsanul and Xu, Weipeng and Mehta, Dushyant and Pons-Moll, Gerard and Theobalt, Christian},
Title = {In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations},
Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
Year = {2019},


This work was supported by the ERC Consolidator Grant 4DRepLy (770784).
Gerard Pons-Moll is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 409792180.


For questions or clarifications, please get in touch with:
Ikhsanul Habibie
