QuadVerse: An Integrated Framework Aligning Visual-Physical Reality for Quadruped Simulation

TL;DR

Compared with traditional simulators (e.g., Isaac Gym), QuadVerse adds three capabilities for quadruped locomotion and navigation with robust sim-to-real transfer: high-fidelity batched ego-view rendering built on geometry-constrained 3D Gaussian Splatting (3DGS), collision-ready semantic mesh extraction with prior-posterior contact calibration, and residual actuator compensation via trajectory replay.

Abstract

Simulation is central to robot learning, yet the sim-to-real gap remains a major bottleneck. Existing approaches often tackle visual or dynamic gaps separately, overlooking how these individual mismatches accumulate and propagate throughout the robot's state evolution.

In this paper, we introduce QuadVerse, an integrated framework that uses reconstructed scenes as a calibration substrate for aligning visual perception, physical interaction, and actuator dynamics. From captured RGB videos, we reconstruct geometry-constrained 3D Gaussian Splatting (3DGS) scenes that support batched photorealistic ego-view rendering and collision-ready semantic mesh extraction. The meshes further enable contact calibration by initializing spatially varying friction priors and refining them through trajectory-based posterior search. To address remaining actuator discrepancies, QuadVerse trains a residual dynamics compensator by replaying real-world trajectories on the contact-calibrated terrain, reducing the entanglement between terrain-induced contact errors and actuator non-idealities.

Experiments show that QuadVerse improves reconstruction quality and locomotion tracking over relevant baselines. Leveraging this foundation, we demonstrate robust zero-shot visual-navigation policy deployment without task-specific real-world rollouts.

Framework


Overview of QuadVerse. (1) Reconstruction and Calibration: QuadVerse reconstructs 3DGS scenes for batched ego-view rendering and extracts collision-ready semantic meshes for contact calibration. (2) Dynamics Compensation: A dynamics compensator is trained using RL by replaying real-world trajectories on the contact-calibrated terrain; the locomotion policy is then fine-tuned under the corrected dynamics. (3) Sim-to-Real Deployment: Policies trained in QuadVerse are deployed zero-shot to outdoor visual navigation tasks without task-specific real-world rollouts.

Experiments

Scene Reconstruction

We evaluate QuadVerse on diverse outdoor scenes, including ramps, grass, staircases, and mixed pavement. For each scene, we show (a) rendered RGB image, (b) rendered normal map, (c) extracted mesh, and (d) simulated robot-view image at quadruped eye level. These results illustrate high-fidelity visual rendering, coherent geometry reconstruction, and faithful simulation of the robot's egocentric perception.

Panels: (a) Rendered Image, (b) Rendered Normal, (c) Extracted Mesh, (d) Rendered Robot View.
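As a rough illustration of what rendering the robot view for a batch of robots involves on the pose side, the sketch below computes one camera pose per robot from its base pose. It is not the QuadVerse renderer: the function name `ego_camera_poses`, the mount offset, and the yaw-only orientation are assumptions made for this example, and the resulting axes follow the base frame (x forward, z up) rather than any particular camera convention.

```python
import numpy as np

def ego_camera_poses(base_xyz, base_yaw, cam_offset=(0.30, 0.0, 0.05)):
    """Return (N, 4, 4) camera-to-world transforms for N robots.

    base_xyz:   (N, 3) base positions in the world frame.
    base_yaw:   (N,) heading angles in radians (roll and pitch are ignored here).
    cam_offset: hypothetical camera mount position expressed in the base frame.
    """
    base_xyz = np.asarray(base_xyz, dtype=float)
    base_yaw = np.asarray(base_yaw, dtype=float)
    n = base_xyz.shape[0]
    c, s = np.cos(base_yaw), np.sin(base_yaw)

    # Batched rotation about the world z-axis, one matrix per robot.
    rot = np.zeros((n, 3, 3))
    rot[:, 0, 0], rot[:, 0, 1] = c, -s
    rot[:, 1, 0], rot[:, 1, 1] = s, c
    rot[:, 2, 2] = 1.0

    poses = np.tile(np.eye(4), (n, 1, 1))
    poses[:, :3, :3] = rot
    # Camera position = base position + mount offset rotated into the world frame.
    poses[:, :3, 3] = base_xyz + rot @ np.asarray(cam_offset, dtype=float)
    return poses

poses = ego_camera_poses([[0.0, 0.0, 0.3], [1.0, 2.0, 0.3]], [0.0, np.pi / 2])
```

Each 4x4 matrix could then be handed to a batched renderer as the viewpoint for one environment; a full implementation would also need roll, pitch, and the camera intrinsics.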

Contact Calibration

We evaluate contact fidelity on mixed-friction terrain. Uniform-friction simulation fails to capture the real robot's traction loss, while QuadVerse combines semantic friction priors with trajectory-based posterior calibration to better reproduce slippage and maintain closer trajectory alignment.
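The idea of starting from a semantic friction prior and refining it against an observed trajectory can be sketched on a toy problem. The example below is only an illustration under strong assumptions: the prior values, the function names, and the search window are hypothetical, a closed-form sliding-distance formula stands in for a simulator rollout, and a plain grid search stands in for the posterior search described above. It also calibrates a single patch, whereas the source describes spatially varying friction.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

# Hypothetical semantic friction priors (illustrative values only).
FRICTION_PRIOR = {"pavement": 0.9, "grass": 0.6, "tile": 0.4}

def stopping_distance(mu, v0):
    """Toy stand-in for a rollout: sliding distance under Coulomb friction."""
    return v0 ** 2 / (2.0 * mu * G)

def calibrate_friction(label, v0, observed_distance, rel_window=0.5, n_candidates=201):
    """Search around the semantic prior for the coefficient that best explains the observation."""
    prior = FRICTION_PRIOR[label]
    candidates = np.linspace(prior * (1 - rel_window), prior * (1 + rel_window), n_candidates)
    errors = np.abs(stopping_distance(candidates, v0) - observed_distance)
    return float(candidates[np.argmin(errors)])

# Synthetic observation: the patch labelled "grass" actually behaves like mu = 0.5.
observed = stopping_distance(0.5, v0=2.0)
mu_hat = calibrate_friction("grass", v0=2.0, observed_distance=observed)
```

Restricting the search to a window around the prior keeps the refined value physically plausible for the labelled surface while still letting the recorded motion correct it.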

Dynamics: Residual Compensation & Trajectory Tracking

Dynamics comparison

Open-loop replay of recorded joint-space commands is performed on the contact-calibrated terrain. Without compensation, the nominal simulator suffers from actuator mismatch, leading to accumulated joint-space tracking errors and unstable replay. With QuadVerse residual actuator compensation, the simulated robot maintains a more stable gait and more closely follows the real-world reference.
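A one-joint toy can make the replay comparison concrete. The sketch below replays the same command sequence through a nominal PD actuator model, a synthetic "real" actuator with different gains, and the nominal model plus a fitted residual torque. Everything here is an assumption for illustration: the gains, inertia, and function names are hypothetical, and the residual is fitted by linear least squares, whereas the source trains its compensator with RL. Because the toy mismatch is linear in the chosen features, the fit recovers it almost exactly, which real actuators would not allow.

```python
import numpy as np

DT = 0.002       # integration step in seconds (illustrative)
INERTIA = 0.05   # joint inertia in kg m^2 (illustrative)

def replay(q_cmd, kp, kd, residual=None):
    """Open-loop replay of joint position commands on a 1-DoF PD-driven joint.

    Returns joint positions, per-step features [position error, velocity],
    and the applied torques.
    """
    q, qd = 0.0, 0.0
    qs, feats, taus = [], [], []
    for target in q_cmd:
        feat = np.array([target - q, qd])
        tau = kp * feat[0] - kd * feat[1]
        if residual is not None:
            tau += residual @ feat        # additive torque correction
        feats.append(feat)
        taus.append(tau)
        qd += DT * tau / INERTIA          # semi-implicit Euler step
        q += DT * qd
        qs.append(q)
    return np.array(qs), np.array(feats), np.array(taus)

def fit_residual(feats, taus_real, kp_nom, kd_nom):
    """Least-squares fit of the torque that the nominal actuator model misses."""
    tau_nom = kp_nom * feats[:, 0] - kd_nom * feats[:, 1]
    weights, *_ = np.linalg.lstsq(feats, taus_real - tau_nom, rcond=None)
    return weights

# Recorded commands and a synthetic "real" actuator with different gains.
t = np.arange(0.0, 2.0, DT)
q_cmd = 0.5 * np.sin(2.0 * np.pi * t)
q_real, feats_real, taus_real = replay(q_cmd, kp=18.0, kd=0.9)

# Nominal simulator versus simulator with the fitted residual.
q_nom, _, _ = replay(q_cmd, kp=25.0, kd=0.5)
w = fit_residual(feats_real, taus_real, kp_nom=25.0, kd_nom=0.5)
q_comp, _, _ = replay(q_cmd, kp=25.0, kd=0.5, residual=w)

err_nom = float(np.mean(np.abs(q_nom - q_real)))    # tracking error without compensation
err_comp = float(np.mean(np.abs(q_comp - q_real)))  # tracking error with compensation
```

Comparing `err_nom` with `err_comp` mirrors the qualitative comparison above: the uncompensated replay drifts from the reference in joint space, while the compensated replay follows it.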

Trajectory tracking comparison

Real-world right-turn task: the baseline policy (trained in standard Isaac Gym) exhibits significant drift from the desired trajectory due to imprecise actuator dynamics modeling. The policy fine-tuned in QuadVerse under compensated dynamics accurately tracks the circular path, demonstrating that the learned dynamics compensation transfers to the physical system.
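One simple way to quantify drift from a circular reference path is the radial deviation of each logged position from the circle. The source does not state which metric it uses, so the helper below, including its name and arguments, is an assumption offered only as an example.

```python
import numpy as np

def circle_tracking_error(xy, center, radius):
    """Absolute radial deviation of each (x, y) sample from a reference circle."""
    xy = np.asarray(xy, dtype=float)
    dist = np.linalg.norm(xy - np.asarray(center, dtype=float), axis=1)
    return np.abs(dist - radius)

# Two samples on a unit circle and one that has drifted outward by one metre.
errors = circle_tracking_error([[1.0, 0.0], [0.0, -1.0], [2.0, 0.0]], center=(0.0, 0.0), radius=1.0)
```

Averaging these per-sample errors over a run gives a single number that can be compared between the baseline and the fine-tuned policy.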

Zero-Shot Sim-to-Real Navigation

Vision-based goal-seeking in unstructured grassland: the robot must locate a colored cone from RGB observations within 25 seconds. The policy trained in QuadVerse achieves an 84% success rate on the real robot, close to the 92% it achieves in simulation. Left: training and evaluation in QuadVerse simulation (ego view from the GS renderer). Right: zero-shot deployment on the physical Unitree Go2 (external view with the robot's head-camera view inset).

Simulation (QuadVerse)

Real deployment

BibTeX

@misc{chen2026quadverse,
  title         = {QuadVerse: An Integrated Framework Aligning Visual-Physical Reality for Quadruped Simulation},
  author        = {Chen, Yuxiang and Wang, Yuanhao and Zhang, Ziheng and Zhang, Meng and Liu, Yu and Jia, Yufei and Wang, Tiancai and Zhou, Erjin and Xie, Jin},
  year          = {2026},
  eprint        = {2606.07118},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2606.07118},
}