Planning under uncertainty for real-world robotics tasks, such as autonomous driving, requires reasoning in enormous high-dimensional belief spaces, which makes the problem computationally intensive. While parallelization offers scalability, existing hybrid CPU-GPU solvers face critical bottlenecks due to host-device synchronization latency and branch divergence on SIMT architectures, limiting their utility for real-time planning and hindering real-robot deployment. We present Vec-QMDP, a CPU-native parallel planner that aligns POMDP search with the SIMD architecture of modern CPUs, achieving a 227×-1073× speedup over state-of-the-art serial planners. Vec-QMDP adopts a Data-Oriented Design (DOD), refactoring scattered, pointer-based data structures into contiguous, cache-efficient memory layouts. We further introduce a hierarchical parallelism scheme that distributes sub-trees across independent CPU cores and SIMD lanes, enabling fully vectorized tree expansion and collision checking. UCB load balancing across trees and a vectorized STR-tree for coarse-level collision checking further improve efficiency. Evaluated on large-scale autonomous driving benchmarks, Vec-QMDP achieves state-of-the-art planning performance with millisecond-level latency, establishing CPUs as a high-performance computing platform for large-scale planning under uncertainty.
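The abstract names UCB load balancing across trees but does not spell out the rule. As a rough illustration only, the sketch below uses the standard UCB1 score to decide which sub-tree receives the next unit of search effort. The function names `ucb_scores` and `pick_tree`, the exploration constant `c`, and the use of plain UCB1 are assumptions made for this example, not details taken from Vec-QMDP.

```python
import math


def ucb_scores(mean_values, visit_counts, c=1.0):
    """Standard UCB1 score per tree (hypothetical helper, not the paper's code).

    mean_values[i]  : running mean value estimate of tree i
    visit_counts[i] : number of expansions already given to tree i
    c               : exploration constant (assumed value)
    """
    total = sum(visit_counts)
    scores = []
    for mean, n in zip(mean_values, visit_counts):
        if n == 0:
            # Unvisited trees are tried first.
            scores.append(math.inf)
        else:
            scores.append(mean + c * math.sqrt(math.log(total) / n))
    return scores


def pick_tree(mean_values, visit_counts, c=1.0):
    """Return the index of the tree with the highest UCB1 score."""
    scores = ucb_scores(mean_values, visit_counts, c)
    return max(range(len(scores)), key=scores.__getitem__)
```

Under this rule, a tree with few expansions gets a large exploration bonus, so effort shifts toward under-explored trees until their estimates become reliable.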
Vec-QMDP scales up Hi-Drive, a state-of-the-art POMDP planner for autonomous driving, by leveraging SIMD parallelism, demonstrating how belief tree search and belief-space trajectory optimization can be extensively vectorized for robotics tasks in complex dynamic environments. (a) Sample the belief into M × N scenarios in a Structure-of-Arrays (SoA) layout. (b) Vectorized QMDP search: after the first action, scenario trees run in parallel on M CPU threads; within each thread, SIMD global vectorization batches transition dynamics across scenarios, and SIMD local vectorization accelerates within-node collision checks. (c) Vectorized trajectory optimization: generate candidates and use block-diagonal cross-scenario evaluation within minibatches to select the optimal trajectory.
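The SoA layout in (a) and the batched dynamics in (b) can be illustrated with a small sketch. It is a layout demonstration only: the class `ScenarioBatch`, the functions `step_batch` and `qmdp_value`, and the constant-acceleration motion model are hypothetical names and assumptions, and the Python loop merely stands in for what a SIMD implementation would execute as one instruction over several lanes. The last function shows the general QMDP idea of scoring an action by the belief-weighted average of per-scenario action values.

```python
from array import array


class ScenarioBatch:
    """Structure-of-Arrays batch: one contiguous array per field.

    An Array-of-Structures layout would instead store one object per
    scenario, scattering each field across memory.
    """

    def __init__(self, xs, vs, weights):
        self.x = array("d", xs)       # positions of all scenarios
        self.v = array("d", vs)       # velocities of all scenarios
        self.w = array("d", weights)  # belief weights of all scenarios

    def __len__(self):
        return len(self.x)


def step_batch(batch, accel, dt):
    """Apply the same constant-acceleration update to every scenario.

    Each field is read and written sequentially, which is the access
    pattern that SIMD lanes and cache lines favor.
    """
    for i in range(len(batch)):
        batch.x[i] += batch.v[i] * dt + 0.5 * accel * dt * dt
        batch.v[i] += accel * dt


def qmdp_value(q_per_scenario, weights):
    """Belief-weighted average of per-scenario action values."""
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_per_scenario)) / total
```

For example, stepping two scenarios at positions 0 and 10 with velocities 1 and 2 under an acceleration of 2 for one second updates both in a single pass over `x` and `v`, with no per-scenario object lookups.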
| Type | Planner | Val14 (R) | Val14 (NR) | Test14-random (R) | Test14-random (NR) | Test14-hard (R) | Test14-hard (NR) | Inference / Planning Time (ms) ↓ |
|---|---|---|---|---|---|---|---|---|
| Expert | Log-replay | 80.32 | 93.53 | 75.86 | 94.03 | 68.80 | 85.96 | - |
| Learning-based | PLUTO | 78.11 | 88.89 | 78.62 | 89.90 | 59.74 | 70.03 | - |
| Learning-based | Diffusion Planner | 82.80 | 89.87 | 82.93 | 89.19 | 69.22 | 75.99 | 80 |
| Hybrid | PDM-Hybrid | 92.11 | 92.77 | 91.28 | 90.10 | 76.07 | 65.99 | 171 |
| Hybrid | PLUTO w/ refine. | 76.88 | 92.88 | 90.29 | 92.23 | 76.88 | 80.08 | - |
| Hybrid | Diff. Planner w/ refine. | 92.90 | 94.26 | 91.75 | 94.80 | 82.00 | 78.87 | >80 |
| Model-based | HiDrive | 93.15 | 93.62 | 92.31 | 93.71 | 83.18 | 81.41 | 92 |
| Model-based | VecQMDP (match, Ours) | 93.15±0.11 | 94.16±0.03 | 92.51±0.00 | 95.21±0.00 | 84.23±0.35 | 82.30±0.48 | 9 |
| Model-based | VecQMDP (best, Ours) | 93.22±0.06 | 94.36±0.02 | 93.04±0.05 | 95.21±0.00 | 84.23±0.35 | 82.84±0.11 | 14 |
Comparison of VecQMDP with state-of-the-art methods on the nuPlan dataset. Bold indicates best; underscored indicates second-best. Values show mean ± standard error. NR: non-reactive mode. R: reactive mode.
Tree construction throughput. (Left) Edges/ms vs. traffic density. (Right) Speedup over serial HiDrive (227×-1073×), increasing with density.
@article{
jin2026vec,
title={Vec-QMDP: Vectorized POMDP Planning on CPUs for Real-Time Autonomous Driving},
author={Jin, Xuanjin and Dong, Yanxin and Sun, Bin and Xu, Huan and Hao, Zhihui and Lang, XianPeng and Cai, Panpan},
journal={arXiv preprint arXiv:2602.08334},
year={2026},
eprint={2602.08334},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2602.08334}
}