Table 2. Offline evaluation across ablations and baselines. Best, 2nd.
| Method | Collision ↑ | Smoothness ↑ | Best of 1 ↓ | Best of 15 ↓ |
|---|---|---|---|---|
| Full System (Full Data) | | | | |
| GT | 97.6 | – | 0.0 | 0.0 |
| Ours | 91.4 | 4.82 | 0.76 | 0.39 |
| w/o DINOv3 features | 90.6 | 4.76 | 0.81 | 0.41 |
| Ours - Pilot | 89.2 | 2.04 | 0.87 | 0.47 |
| Component Ablations (Pilot Data) | | | | |
| w/o attention | 86.6 | 2.78 | 1.00 | 0.49 |
| w/o semantic | 84.1 | 4.17 | 0.91 | 0.53 |
| w/o VM or DINO (Traj only) | 82.5 | 2.04 | 1.19 | 0.48 |
| Baselines (Pilot Data) | | | | |
| Ours - Pilot | 89.2 | 2.04 | 0.87 | 0.47 |
| VAE-LSTM | 84.5 | 9.09 | 1.01 | N/A |
| CXA Transformer | 80.3 | 3.70 | 1.25 | N/A |