Vec-QMDP: Vectorized POMDP Planning on CPUs for Real-Time Autonomous Driving

A teaser video introducing Vec-QMDP

Abstract

Planning under uncertainty for real-world robotics tasks, such as autonomous driving, requires reasoning in enormous, high-dimensional belief spaces, which makes the problem computationally intensive. While parallelization offers scalability, existing hybrid CPU-GPU solvers face critical bottlenecks from host-device synchronization latency and branch divergence on SIMT architectures, limiting their utility for real-time planning and hindering real-robot deployment. We present Vec-QMDP, a CPU-native parallel planner that aligns POMDP search with the SIMD architecture of modern CPUs, achieving a 227×-1073× speedup over state-of-the-art serial planners. Vec-QMDP adopts a Data-Oriented Design (DOD), refactoring scattered, pointer-based data structures into contiguous, cache-efficient memory layouts. We further introduce a hierarchical parallelism scheme that distributes sub-trees across independent CPU cores and SIMD lanes, enabling fully vectorized tree expansion and collision checking. Efficiency is further improved by UCB load balancing across trees and a vectorized STR-tree for coarse-level collision checking. Evaluated on large-scale autonomous driving benchmarks, Vec-QMDP achieves state-of-the-art planning performance with millisecond-level latency, establishing CPUs as a high-performance computing platform for large-scale planning under uncertainty.
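The abstract does not spell out how UCB balances load across trees. One plausible reading, assumed here purely for illustration, is the standard UCB1 rule deciding which sub-tree receives the next batch of expansions. The function name `ucb_select`, its parameters, and the exploration constant `c` are hypothetical and not taken from Vec-QMDP.

```python
import math
from typing import List


def ucb_select(mean_values: List[float], visit_counts: List[int], c: float = 1.0) -> int:
    """Return the index of the sub-tree with the highest UCB1 score.

    Assumption: each sub-tree keeps a running mean value and a visit count.
    Unvisited sub-trees are selected first so every tree gets some budget.
    """
    total_visits = sum(visit_counts)
    best_index = 0
    best_score = -math.inf
    for i, (q, n) in enumerate(zip(mean_values, visit_counts)):
        if n == 0:
            return i  # always try an unexplored sub-tree first
        # UCB1: exploitation term plus an exploration bonus that shrinks with visits
        score = q + c * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

With `c = 0` the rule reduces to picking the sub-tree with the best mean value; larger `c` shifts budget toward rarely expanded sub-trees.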

Contributions

  • 🏆 First CPU-Native Vectorized POMDP Planner
  • 🛠️ DOD-Driven "Global + Local" Acceleration
  • 🚀 1000× Speedup Unlocks 14ms SOTA Driving

Overview

Overview of Vec-QMDP architecture showing belief tree search and trajectory optimization

Vec-QMDP scales up Hi-Drive, a state-of-the-art POMDP planner for autonomous driving, by leveraging SIMD parallelism, demonstrating how belief tree search and belief-space trajectory optimization can be extensively vectorized for robotics tasks in complex dynamic environments. (a) Sample the belief into M × N scenarios stored in a structure-of-arrays (SoA) layout. (b) Vectorized QMDP search: after the first action, scenario trees run in parallel on M CPU threads; within each thread, SIMD global vectorization batches transition dynamics across scenarios, and SIMD local vectorization accelerates within-node collision checks. (c) Vectorized trajectory optimization: generate candidates and use block-diagonal cross-scenario evaluation within minibatches to select the optimal trajectory.
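The SoA layout and the two kinds of vectorization in (a) and (b) can be sketched as follows. This is a minimal pure-Python illustration, not the Vec-QMDP implementation: the `ScenarioBatch` fields, the constant-heading transition model, and the axis-aligned bounding-box (AABB) test are all assumptions chosen for brevity. In a compiled implementation, each per-scenario or per-box loop below would map onto SIMD lanes.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScenarioBatch:
    """Structure of arrays: one contiguous list per field, indexed by scenario."""
    x: List[float]
    y: List[float]
    heading: List[float]
    speed: List[float]


def step_batch(batch: ScenarioBatch, accel: float, dt: float) -> ScenarioBatch:
    """Apply one action to every scenario (hypothetical constant-heading model)."""
    n = len(batch.x)
    # The same arithmetic runs for every index, so there is no per-scenario branching.
    speed = [max(0.0, batch.speed[i] + accel * dt) for i in range(n)]
    x = [batch.x[i] + speed[i] * math.cos(batch.heading[i]) * dt for i in range(n)]
    y = [batch.y[i] + speed[i] * math.sin(batch.heading[i]) * dt for i in range(n)]
    return ScenarioBatch(x, y, list(batch.heading), speed)


def aabb_overlap_mask(
    ego: Tuple[float, float, float, float],
    min_x: List[float],
    min_y: List[float],
    max_x: List[float],
    max_y: List[float],
) -> List[bool]:
    """Coarse collision check: test one ego box (min_x, min_y, max_x, max_y)
    against many obstacle boxes stored as four parallel arrays."""
    ex0, ey0, ex1, ey1 = ego
    # Non-short-circuit `&` keeps the test free of data-dependent branches.
    return [
        (min_x[i] <= ex1) & (ex0 <= max_x[i]) & (min_y[i] <= ey1) & (ey0 <= max_y[i])
        for i in range(len(min_x))
    ]
```

Storing each field in its own array means a batch update reads memory sequentially, which is the property a data-oriented layout relies on for cache efficiency.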

Comparison Results

Driving Performance Comparison on nuPlan

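Type           | Planner                  | Val14 R    | Val14 NR   | Test14-random R | Test14-random NR | Test14-hard R | Test14-hard NR | Inference / Planning Time (ms) ↓
---------------|--------------------------|------------|------------|-----------------|------------------|---------------|----------------|---------------------------------
Expert         | Log-replay               | 80.32      | 93.53      | 75.86           | 94.03            | 68.80         | 85.96          | -
Learning-based | PLUTO                    | 78.11      | 88.89      | 78.62           | 89.90            | 59.74         | 70.03          | -
Learning-based | Diffusion Planner        | 82.80      | 89.87      | 82.93           | 89.19            | 69.22         | 75.99          | 80
Hybrid         | PDM-Hybrid               | 92.11      | 92.77      | 91.28           | 90.10            | 76.07         | 65.99          | 171
Hybrid         | PLUTO w/ refine.         | 76.88      | 92.88      | 90.29           | 92.23            | 76.88         | 80.08          | -
Hybrid         | Diff. Planner w/ refine. | 92.90      | 94.26      | 91.75           | 94.80            | 82.00         | 78.87          | >80
Model-based    | HiDrive                  | 93.15      | 93.62      | 92.31           | 93.71            | 83.18         | 81.41          | 92
Model-based    | VecQMDP (match, Ours)    | 93.15±0.11 | 94.16±0.03 | 92.51±0.00      | 95.21±0.00       | 84.23±0.35    | 82.30±0.48     | 9
Model-based    | VecQMDP (best, Ours)     | 93.22±0.06 | 94.36±0.02 | 93.04±0.05      | 95.21±0.00       | 84.23±0.35    | 82.84±0.11     | 14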

Comparison results of VecQMDP and state-of-the-art methods on nuPlan dataset.
Bold indicates best; underscored indicates second-best. Values show mean ± standard error. NR: non-reactive mode. R: reactive mode.

Computational Throughput Comparison

Figure panels: (a) Throughput (edges/ms) vs. traffic density. (b) Speedup over serial HiDrive.

Tree construction throughput. (Left) Edges/ms vs. traffic density. (Right) Speedup over serial HiDrive (227×-1073×), increasing with density.

Qualitative Results

BibTeX


            @article{jin2026vec,
            title={Vec-QMDP: Vectorized POMDP Planning on CPUs for Real-Time Autonomous Driving},
            author={Jin, Xuanjin and Dong, Yanxin and Sun, Bin and Xu, Huan and Hao, Zhihui and Lang, XianPeng and Cai, Panpan},
            journal={arXiv preprint arXiv:2602.08334},
            year={2026},
            eprint={2602.08334},
            archivePrefix={arXiv},
            primaryClass={cs.RO},
            url={https://arxiv.org/abs/2602.08334}
            }