Real-time 3D Human Translation and Pose Estimation with Six Inertial Sensors


Motion capture is gaining new possibilities from inertial sensing technologies, which, unlike vision-based solutions, do not suffer from occlusion or restricted recording volumes. However, because the recorded signals are sparse and quite noisy, online performance and global translation estimation remain two key difficulties. In this paper, we present TransPose, a DNN-based approach that performs full motion capture (with both global translations and body poses) from only six Inertial Measurement Units (IMUs) at over 90 fps. For body pose estimation, we propose a multi-stage network that estimates leaf-to-full joint positions as intermediate results. This design makes pose estimation much easier, and thus achieves both better accuracy and lower computation cost. For global translation estimation, we propose a supporting-foot-based method and an RNN-based method, and robustly combine their outputs with a confidence-based fusion technique. Quantitative and qualitative comparisons show that our method outperforms state-of-the-art learning- and optimization-based methods by a large margin in both accuracy and efficiency. As a purely inertial-sensor-based approach, our method is not limited by environmental settings (e.g., fixed cameras), freeing the capture from common difficulties such as wide-range motion spaces and strong occlusion.
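The confidence-based fusion of the two translation estimates can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's actual code: the function name `fuse_translation` and the simple linear blend weighted by a foot-contact confidence are hypothetical stand-ins for the method described in the abstract.

```python
# Hypothetical sketch of confidence-based fusion: a supporting-foot-based
# translation estimate and an RNN-based estimate are blended per frame.
# The linear-blend form and all names here are assumptions for illustration.
import numpy as np

def fuse_translation(t_foot, t_rnn, foot_contact_prob):
    """Blend two global-translation estimates for one frame.

    t_foot: (3,) translation from the supporting-foot (contact) method
    t_rnn:  (3,) translation regressed by the RNN
    foot_contact_prob: scalar in [0, 1]; higher when a foot is firmly
        planted, so the contact-based estimate is trusted more.
    """
    w = float(np.clip(foot_contact_prob, 0.0, 1.0))
    return w * np.asarray(t_foot, dtype=float) + (1.0 - w) * np.asarray(t_rnn, dtype=float)
```

With full contact confidence the fused result follows the supporting-foot estimate; with zero confidence it falls back to the RNN output, and intermediate confidences interpolate between the two.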



@article{TransPose2021,
    author = {Yi, Xinyu and Zhou, Yuxiao and Xu, Feng},
    title = {TransPose: Real-time 3D Human Translation and Pose Estimation with Six Inertial Sensors},
    journal = {ACM Transactions on Graphics},
    year = {2021},
    month = {08},
    volume = {40},
    number = {4},
    articleno = {86},
    publisher = {ACM}
}


We would like to thank Yinghao Huang and the other DIP authors for providing the SMPL parameters for the TotalCapture dataset and the SIP/SOP results. We would also like to thank Associate Professor Yebin Liu for support with the IMU sensors. We also appreciate Hao Zhang, Dong Yang, Wenbin Lin, and Rui Qin for their extensive help with the live demos. We thank Chengwei Zheng for proofreading, and the reviewers for their valuable comments. This work was supported by the National Key R&D Program of China (2018YFA0704000), the NSFC (No. 61822111, 61727808), and the Beijing Natural Science Foundation (JQ19015). Feng Xu is the corresponding author.

Contact Us

Tsinghua University
Beijing, China