Demo of VNect
Real-time 3D Human Pose Estimation
with a Single RGB Camera

Now leaner, faster, and more accurate!


We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton.
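
The joint 2D/3D regression can be illustrated with a small readout step. The sketch below assumes the network emits, for each joint, a 2D heatmap plus three location maps holding root-relative x, y, and z values, and that the 3D position is read at the heatmap maximum, as the paper describes. The function name `read_joint`, the nested-list layout, and the toy values are hypothetical and are not taken from the demo code.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one H x W map, indexed as grid[row][col]


def read_joint(heatmap: Grid, x_map: Grid, y_map: Grid, z_map: Grid
               ) -> Tuple[Tuple[int, int], Tuple[float, float, float]]:
    """Return the 2D peak as (col, row) and the 3D values stored at that peak."""
    best_row, best_col = 0, 0
    # Scan the heatmap for its maximum; this is the joint's 2D location.
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > heatmap[best_row][best_col]:
                best_row, best_col = r, c
    # Read the 3D coordinates from the location maps at the same pixel.
    xyz = (x_map[best_row][best_col],
           y_map[best_row][best_col],
           z_map[best_row][best_col])
    return (best_col, best_row), xyz


# Toy 3 x 3 maps for a single joint (made-up values, assumed to be in mm).
heatmap = [[0.0, 0.1, 0.0],
           [0.2, 0.9, 0.1],
           [0.0, 0.1, 0.0]]
x_map = [[0.0, 0.0, 0.0], [0.0, 120.0, 0.0], [0.0, 0.0, 0.0]]
y_map = [[0.0, 0.0, 0.0], [0.0, -35.0, 0.0], [0.0, 0.0, 0.0]]
z_map = [[0.0, 0.0, 0.0], [0.0, 410.0, 0.0], [0.0, 0.0, 0.0]]

uv, xyz = read_joint(heatmap, x_map, y_map, z_map)
```

Because the readout is driven by the heatmap peak rather than by a fixed crop, a formulation of this kind does not need tightly cropped input frames. In the full system these per-frame estimates would then be passed to the kinematic skeleton fitting step, which is not shown here.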

The demo is based on an improved version of the system described in the SIGGRAPH'17 paper of the same name. The input resolution and the skeletal fitting have been optimized to make the system leaner and faster, enabling it to run on laptop GPUs. The system now runs at approximately 40 fps, which allows smoother and more accurate pose tracking. Accuracy is also higher because the network was trained with more data and with an additional loss term that prevents limb confusion.

Real-time Demo Examples


BibTeX

  author = {Mehta, Dushyant and Sridhar, Srinath and Sotnychenko, Oleksandr and Rhodin, Helge and Shafiei, Mohammad and Seidel, Hans-Peter and Xu, Weipeng and Casas, Dan and Theobalt, Christian},
  title = {VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera},
  journal = {ACM Transactions on Graphics},
  url = {},
  numpages = {14},
  month = July,
  year = {2017}


Dushyant Mehta