Yuqi Wang (Robert)
I am currently a researcher at ByteDance. I received my Ph.D. from NLPR, Institute of Automation, Chinese Academy of Sciences (CASIA), supervised by Prof. Zhaoxiang Zhang. Prior to that, I obtained my Bachelor's degree in Automation (Robotics) from the College of Control Science and Engineering at Zhejiang University (ZJU) in 2020.
Additionally, I interned at Meituan under the supervision of Fei Xia, and at BAAI, where I was mentored by Dr. Xinlong Wang.
My overarching research passion lies in embodied understanding and planning in open-world environments, with a particular focus on world models and their applications in the physical world. My research interests span computer vision, unsupervised learning, 3D perception, world models, and video generation, all aimed toward comprehensive open-world 3D scene perception and understanding.
Email / Google Scholar / GitHub / Curriculum Vitae
News
2025-06: Two papers are accepted to ICCV 2025.
2025-01: Two papers are accepted to ICLR 2025.
2024-09: One paper on driving world models is accepted to the NeurIPS 2024 Datasets and Benchmarks Track.
2024-07: One paper on indoor monocular occupancy prediction is accepted to ECCV 2024.
2024-02: Two papers, on driving world models and occupancy prediction, are accepted to CVPR 2024.
2023-12: One paper on multi-agent representation learning is accepted to National Science Review.
2023-07: One paper on unsupervised instance segmentation is accepted to TPAMI.
2023-02: One paper on 3D object perception is accepted to CVPR 2023.
2022-09: One paper on unsupervised object discovery is accepted to NeurIPS 2022.
Research
* indicates equal contribution
Unified Vision-Language-Action Model
Yuqi Wang, Xinghang Li, Wenxuan Wang, Junbo Zhang, Yingyan Li, Yuntao Chen, Xinlong Wang, Zhaoxiang Zhang
arXiv, 2025
[Paper] [Page] [Code]
A unified vision-language-action model for embodied intelligence.
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen, Yuqi Wang, Zhaoxiang Zhang
ICCV, 2025
[Paper] [Page]
Unifying world modeling and planning in autonomous driving.
End-to-End Driving with Online Trajectory Evaluation via BEV World Model
Yingyan Li*, Yuqi Wang*, Yang Liu, Jiawei He, Lue Fan, Zhaoxiang Zhang
ICCV, 2025
[Paper] [Code]
An end-to-end autonomous driving framework that leverages a BEV-based world model to predict future agent states, enabling online trajectory evaluation and selection.
FreeVS: Generative View Synthesis on Free Driving Trajectory
Qitai Wang, Lue Fan, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
ICLR, 2025
[Paper] [Page] [Code]
Generative view synthesis along free driving trajectories.
Enhancing End-to-End Autonomous Driving with Latent World Model
Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
ICLR, 2025
[Paper] [Code]
A latent world model as a self-supervised learning proxy for end-to-end autonomous driving.
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model
Yuqi Wang*, Ke Cheng*, Jiawei He*, Qitai Wang*, Hengchen Dai, Yuntao Chen, Fei Xia, Zhaoxiang Zhang
NeurIPS, 2024 (D&B Track)
[Paper] [Page] [Code]
The DrivingDojo dataset features video clips with a complete set of driving maneuvers, diverse multi-agent interplay, and rich open-world driving knowledge, laying a stepping stone for future world model development.
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu*, Xiaofeng Wang*, Wangbo Zhao*, Chen Min*, Nianchen Deng*, Min Dou*, Yuqi Wang*, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang
arXiv, 2024
[Paper] [Code]
A comprehensive survey on general world models, covering world models for video generation, autonomous driving, and autonomous agents.
Monocular Occupancy Prediction for Scalable Indoor Scenes
Hongxiao Yu, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
ECCV, 2024
[Paper] [Page] [Code]
ISO, a method for monocular occupancy prediction in indoor scenes.
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
Yuqi Wang*, Jiawei He*, Lue Fan*, Hongxin Li*, Yuntao Chen, Zhaoxiang Zhang
CVPR, 2024
[Paper] [Page] [Code]
Drive-WM, a pioneering multi-view world model for end-to-end autonomous driving.
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
Yuqi Wang, Yuntao Chen, Xingyu Liao, Lue Fan, Zhaoxiang Zhang
CVPR, 2024
[Paper] [Code]
PanoOcc, a method for camera-based 3D panoptic scene understanding.
Emergence of Machine Language: Towards Symbolic Intelligence with Neural Networks
Yuqi Wang, Xu-Yao Zhang, Cheng-Lin Liu, Tieniu Tan, Zhaoxiang Zhang
National Science Review (NSR), 2024
[Paper]
Studying the emergence of symbolic machine language in neural networks.
Object Affinity Learning: Towards Annotation-Free Instance Segmentation
Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
[Paper] [Code]
2D object discovery through depth and flow cues.
FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection
Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
CVPR, 2023
[Paper] [Code] [Bilibili]
FrustumFormer, enhancing vision-based 3D object detection through 2D priors.
4D Unsupervised Object Discovery
Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
NeurIPS, 2022 (Spotlight)
[Paper] [Code]
4D unsupervised object discovery using camera and LiDAR raw information.
Honors and Awards
2025 President Award of the Chinese Academy of Sciences / 中科院院长奖
2025 Outstanding Graduate of Beijing / 北京市优秀毕业生
2024 National Scholarship / 国家奖学金
2023 Zhu Li Yuehua Scholarship / 朱李月华奖学金
2020 Outstanding Graduate of Zhejiang Province / 浙江省优秀毕业生
2018 National Scholarship / 国家奖学金
2018 SUPCON Scholarship / 中控奖学金
© Yuqi Wang | Last updated: July 12, 2025