WorldSample: Closed-loop Real-robot RL with World Modelling

A physically grounded world-model augmentation framework that turns scarce physical rollouts into useful synthetic experience—without replacing real robot interaction.

Yuquan Xue1, Le Xu2, Zeyi Liu3,1, Zhenyu Wu4, Zhengyi Gu1, Xinyang Song1, Bofang Jia1, and Ziwei Wang1†

1PINELab, Nanyang Technological University   2Tsinghua University   3Central South University   4Beijing University of Posts and Telecommunications

Real-robot tasks from the WorldSample evaluation suite.
82% average final success rate, vs. 56% with HiL-SERL
59% fewer real training steps, 23K vs. 56K on average
29.89 dual-view model PSNR, 0.925 SSIM after adaptation

Motivation

Physical rollout is expensive.

Real-world reinforcement learning can improve beyond the coverage of demonstrations, but robot interaction is slow, costly, and only reveals one realized action-outcome path at a time.

WorldSample keeps physical rollouts as the anchor of learning. A task-adapted action-conditioned world model expands each rollout into locally counterfactual, reward-labelled trajectories, while Policy-Paced Learning admits synthetic data only when it is useful and safe for the learner.

WorldSample expands a real robot rollout into multiple synthetic trajectories and improves training performance.

Method

Real-synthetic Closed-Loop with Policy-Paced Learning.

WorldSample combines task-grounded generation with explicit controls over which synthetic experience is used and when it enters RL training.

WorldSample pipeline showing the real-synthetic data loop and Policy-Paced Learning.
Physical rollout data continually grounds world-model adaptation and policy improvement.

01 · REAL-SYNTHETIC DATA LOOP

Generate around what the robot actually experienced.

Real rollout segments seed locally perturbed, counterfactual action sequences. The post-trained world model predicts their futures and a reward model labels the resulting synthetic trajectories.

  • Task-adapted video world model
  • Counterfactual trajectory generation
  • Asynchronous generation and fine-tuning
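
The loop above can be sketched as follows. This is a minimal illustration, not the released implementation: Gaussian perturbation, the noise scale, the action bounds, and the `world_model` and `reward_model` callables are all assumptions made for the example, since the page does not specify how counterfactual actions are drawn or how the models are invoked.

```python
import random


def perturb_actions(real_actions, num_variants=4, noise_std=0.05,
                    low=-1.0, high=1.0, seed=0):
    """Return noisy copies of a real action segment (a list of action vectors).

    Assumption: counterfactuals are drawn by adding small Gaussian noise to
    each executed action and clipping to the action bounds.
    """
    rng = random.Random(seed)
    variants = []
    for _ in range(num_variants):
        variant = [
            [min(high, max(low, a + rng.gauss(0.0, noise_std))) for a in step]
            for step in real_actions
        ]
        variants.append(variant)
    return variants


def generate_synthetic(first_frame, real_actions, world_model, reward_model,
                       **perturb_kwargs):
    """Roll each perturbed action sequence through the world model and label it.

    `world_model(frame, actions)` and `reward_model(frames)` are hypothetical
    callables standing in for the task-adapted video model and the reward model.
    """
    trajectories = []
    for actions in perturb_actions(real_actions, **perturb_kwargs):
        frames = world_model(first_frame, actions)   # predicted future frames
        rewards = reward_model(frames)               # one reward per frame
        trajectories.append(
            {"actions": actions, "frames": frames, "rewards": rewards}
        )
    return trajectories
```

Keeping the perturbation small is what makes the generated futures "local": each synthetic trajectory starts from a frame the robot actually observed and stays close to the action sequence it actually executed.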

02 · POLICY-PACED LEARNING

Trust synthetic data in proportion to policy readiness.

Policy-Paced Learning (PPL) balances generated successes and failures to stabilize critic values, then schedules the synthetic ratio from policy uncertainty on real robot states.

  • Q-aware sample selection
  • Uncertainty-guided data scheduling
  • Stabilized policy training with high sample efficiency
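
One way to picture the two controls is the sketch below. It is illustrative only and rests on several assumptions: outcome balancing is done by plain subsampling rather than the Q-aware selection named above, lower policy uncertainty is taken to mean higher readiness, and the linear schedule with its thresholds (`u_ready`, `u_unready`, `max_ratio`) is invented for the example.

```python
import random


def balance_outcomes(synthetic, rng):
    """Subsample so generated successes and failures appear in equal numbers."""
    successes = [t for t in synthetic if t["success"]]
    failures = [t for t in synthetic if not t["success"]]
    n = min(len(successes), len(failures))
    return rng.sample(successes, n) + rng.sample(failures, n)


def synthetic_ratio(uncertainty, max_ratio=0.5, u_ready=0.1, u_unready=0.5):
    """Map policy uncertainty on real states to a synthetic share of the batch.

    Assumption: the share grows linearly from 0 (uncertainty >= u_unready)
    to max_ratio (uncertainty <= u_ready).
    """
    readiness = (u_unready - uncertainty) / (u_unready - u_ready)
    readiness = min(1.0, max(0.0, readiness))
    return max_ratio * readiness


def mix_batch(real, synthetic, batch_size, uncertainty, seed=0):
    """Build a training batch whose synthetic share follows the schedule."""
    rng = random.Random(seed)
    balanced = balance_outcomes(synthetic, rng)
    n_syn = min(len(balanced), round(batch_size * synthetic_ratio(uncertainty)))
    n_real = batch_size - n_syn
    # Real transitions are drawn with replacement, synthetic ones without.
    return rng.choices(real, k=n_real) + rng.sample(balanced, n_syn)
```

With these example thresholds, a policy that is still highly uncertain on real states trains on real transitions only, and synthetic trajectories are phased in as the uncertainty drops.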

Real-robot results

Higher success with less physical interaction.

Across five manipulation tasks, WorldSample improves on the human-in-the-loop baseline while converging with substantially fewer real training steps.

Success and intervention curves comparing WorldSample and HiL-SERL across five manipulation tasks.
Success and intervention rate during online training. Blue: WorldSample; orange: HiL-SERL.
Method        Pushing  Insertion  Sorting  Pick & Place  Assembly  Average
VLAW          86%      47%        78%      76%           32%       64%
WMPO          90%      82%        72%      78%           23%       69%
HiL-SERL      84%      63%        66%      55%           10%       56%
WorldSample   95%      95%        95%      84%           42%       82%

Success rate from Table 1 of the arXiv paper. WorldSample uses 8K / 10K / 20K / 36K / 40K real training steps across the five tasks.
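
The headline figures can be cross-checked against the per-task numbers with a few lines of arithmetic. The snippet below uses only values stated on this page; the variable names are ours, and the per-task step counts are listed in the order given above without assigning them to specific tasks.

```python
# WorldSample real training steps per task, in thousands (from the note above).
worldsample_steps_k = [8, 10, 20, 36, 40]
hil_serl_avg_k = 56  # reported HiL-SERL average, in thousands

worldsample_avg_k = sum(worldsample_steps_k) / len(worldsample_steps_k)
reduction = 1 - worldsample_avg_k / hil_serl_avg_k

# Per-task success rates (%), in the order:
# Pushing, Insertion, Sorting, Pick & Place, Assembly.
success = {
    "VLAW": [86, 47, 78, 76, 32],
    "WMPO": [90, 82, 72, 78, 23],
    "HiL-SERL": [84, 63, 66, 55, 10],
    "WorldSample": [95, 95, 95, 84, 42],
}
averages = {name: round(sum(v) / len(v)) for name, v in success.items()}

print(f"average real steps: {worldsample_avg_k:.1f}K")   # 22.8K, reported as 23K
print(f"reduction vs. HiL-SERL: {reduction:.0%}")         # 59%
print(averages)
```

The per-task values reproduce the reported averages (64%, 69%, 56%, 82%) and the 59% reduction in real training steps.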

Rollout data

Compare reality with the generated future.

Each pair is conditioned on the same task setting. Select a task to inspect the physical rollout beside its world-model-generated counterpart.

Pushing · Side-camera comparison
Press bread into a toaster by executing contact-rich object displacement.
Physical: real rollout
Synthetic: world-model rollout

Experimental setting

Five task categories. One real-robot platform.

We evaluate on a Galaxea A1X robot arm with a binary gripper, using side-view and wrist-mounted Intel RealSense D435i cameras. The tasks cover contact-rich interaction, precision alignment, visual discrimination, grasping, and long-horizon assembly.

Resources

Read WorldSample and follow the code release.

arXiv Paper · arXiv:2607.02431v1 · July 2026
RL Framework · Coming Soon
World Model · Coming Soon

Citation

BibTeX

@misc{xue2026worldsample,
  title   = {WorldSample: Closed-loop Real-robot RL with World Modelling},
  author  = {Xue, Yuquan and Xu, Le and Liu, Zeyi and Wu, Zhenyu and
             Gu, Zhengyi and Song, Xinyang and Jia, Bofang and Wang, Ziwei},
  year    = {2026},
  eprint  = {2607.02431},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url     = {https://arxiv.org/abs/2607.02431v1}
}