JoyAI-Sim

JoyAI-Sim Framework

JoyAI-Sim builds a simulation-driven data interconversion toolchain centered on JoySim, establishing bidirectional links among robot data, simulation data, and human data: Robot ⇌ Simulation ⇌ Human. The Robot → Simulation → Human path starts from real robot tasks and constructs calibrated digital twins for human-aligned model evaluation and trajectory naturalness checking; the Human → Simulation → Robot path starts from first-person human demonstrations and, after physical feasibility checks in simulation, converts them into robot-oriented trajectories, annotations, and visual observations.

Together, the two paths connect scarce robot data that is close to deployment, scalable and diagnosable simulation data, and abundant human data that cannot be directly executed by robots. In implementation, JoyAI-Sim uses the JD Cloud JoyBuilder platform to provide simulation execution, rendering, photorealism enhancement, and data management services for scalable data generation and model evaluation.

Efficient Model Evaluation Based on the Simulation Toolchain

The Robot → Simulation → Human path of the toolchain allows models to be evaluated more efficiently than on physical robots alone. Real robot tasks define deployment-oriented goals, while digital twins enable scalable simulation-based evaluation and trajectory synthesis. Human embodied feedback is then introduced to assess the naturalness of simulated actions, forming a closed-loop evaluation pipeline that connects physical robot evaluation, simulation evaluation, and human perception.

Robot → Simulation

Using real robot tasks as anchors, task semantics, object assets, scene layouts, robot embodiments, camera configurations, control interfaces, and success criteria are mapped into digital twins to construct simulation evaluation environments that are reproducible, parallelizable, and diagnosable. JoyAI-Sim builds a Sim-Ready asset library for household scenarios by integrating 3D reconstruction techniques. The library covers 300+ fine-grained categories and 53,661 asset instances, and supports adjustments across multiple simulators along dimensions such as robot states, object layouts, instance variations, backgrounds, language instructions, and lighting.
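As a rough illustration of the mapping described above, the fields carried from a real robot task into a digital twin could be grouped in one record. This is a minimal sketch, not the JoyAI-Sim schema: the class name `DigitalTwinSpec`, every field name, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass

Position = tuple[float, float, float]  # x, y, z in metres (assumed convention)


@dataclass
class DigitalTwinSpec:
    """Hypothetical record of what a real robot task contributes to its digital twin."""

    instruction: str                   # task semantics as a language instruction
    object_assets: list[str]           # asset IDs, e.g. from a Sim-Ready library
    scene_layout: dict[str, Position]  # asset ID -> placement in the scene
    robot_embodiment: str
    camera_names: list[str]
    control_interface: str
    success_criterion: str

    def validate(self) -> list[str]:
        """Return a list of human-readable problems; an empty list means consistent."""
        problems = []
        if not self.instruction.strip():
            problems.append("missing language instruction")
        for asset_id in self.object_assets:
            if asset_id not in self.scene_layout:
                problems.append(f"asset '{asset_id}' has no placement in the scene layout")
        if not self.camera_names:
            problems.append("no camera configured")
        return problems


# Illustrative values only; none of these identifiers come from the asset library.
spec = DigitalTwinSpec(
    instruction="put the mug on the shelf",
    object_assets=["mug_01", "shelf_03"],
    scene_layout={"mug_01": (0.4, 0.1, 0.75), "shelf_03": (0.9, 0.0, 0.0)},
    robot_embodiment="single_arm_gripper",
    camera_names=["head", "wrist"],
    control_interface="joint_position",
    success_criterion="mug_01 rests on shelf_03",
)
print(spec.validate())
```

Keeping the success criterion and camera setup in the same record as the layout is one way to make an evaluation environment reproducible: two runs built from the same record share the same task definition.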

Hierarchical distribution of sim-ready assets by scene-level category and fine-grained object class

The simulation environments are further grounded through scene-specific asset construction and alignment. JoyAI-Sim reconstructs study-room and living-room household settings as digital twins, so that real-robot evaluation targets, object arrangements, and deployment constraints can be faithfully replayed in simulation.

Study-room asset construction for robot-to-simulation scene alignment

Living-room asset construction for robot-to-simulation scene alignment

Robot-to-Sim Alignment Example

This paired rollout uses a real robot task as the anchor and shows the corresponding digital twin alignment. The side-by-side view makes the evaluation setting reproducible and easier to diagnose across object layout, camera viewpoint, and robot embodiment.

Simulation → Human

In simulation, robot trajectories are generated or augmented using finite-state machines combined with inverse kinematics (FSM+IK), IKFlow, and reinforcement learning. Human embodied feedback is then used to assess trajectory naturalness: human operators replay these trajectories to identify unreasonable patterns in approach strategies, phase transitions, and motion smoothness. This makes it possible to filter out trajectories that are physically feasible but inconsistent with human motion intuition, improving the quality of both simulation-based evaluation and synthetic data.

Simulation to human flowchart for trajectory naturalness refinement

This flow illustrates how simulated robot trajectories are projected into human-hand space for first-person replay and inspection. Human embodied feedback then becomes a practical filter for rejecting awkward yet executable trajectories before they are reused for evaluation or training.

Human Embodied Feedback

Synchronized VR recordings expose the human-in-the-loop stage directly: operators inspect approach strategies, contact timing, and motion smoothness, then filter behaviors that are physically feasible but awkward under embodied human judgment.

Approach Check

Phase Transition

Contact Plausibility

Motion Smoothness
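The four checks above could feed a simple acceptance rule for trajectories. The sketch below is an assumption about how operator judgments might be aggregated, not the documented JoyAI-Sim procedure: the 1 to 5 score scale, the threshold, and the names `OperatorReview`, `is_natural`, and `filter_trajectories` are all hypothetical.

```python
from dataclasses import dataclass

# Check names mirror the four clips above; the identifiers are illustrative.
CHECKS = ("approach", "phase_transition", "contact_plausibility", "motion_smoothness")


@dataclass
class OperatorReview:
    trajectory_id: str
    scores: dict[str, int]  # check name -> score on an assumed 1-5 scale


def is_natural(reviews: list[OperatorReview], min_score: float = 3.0) -> bool:
    """Accept only if the mean score of every check reaches min_score."""
    if not reviews:
        return False
    for check in CHECKS:
        mean = sum(r.scores[check] for r in reviews) / len(reviews)
        if mean < min_score:
            return False
    return True


def filter_trajectories(reviews: list[OperatorReview], min_score: float = 3.0) -> list[str]:
    """Group reviews by trajectory and return the IDs that pass every check."""
    by_trajectory: dict[str, list[OperatorReview]] = {}
    for review in reviews:
        by_trajectory.setdefault(review.trajectory_id, []).append(review)
    return [tid for tid, rs in by_trajectory.items() if is_natural(rs, min_score)]
```

Requiring every check to pass, rather than averaging across checks, means that a trajectory with a smooth motion profile but an implausible contact is still rejected.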

Data Enrichment Service Based on the Simulation Toolchain

Through the Human → Simulation → Robot simulation toolchain, JoyAI-Sim provides a data enrichment service. Using first-person human demonstration videos as the data source, the toolchain leverages hand motion recovery, scene reconstruction, and digital twin construction to transform human behaviors that are originally independent of any robot embodiment into tasks that can be executed and verified in simulation. It then generates trajectories, states, and action data for robot learning through robot trajectory retargeting, physical feasibility filtering, simulation randomization, and robot-view rendering. In this way, it builds a value-added pipeline that connects human demonstration data, simulation data, and robot data.

In the Human → Simulation stage, hand trajectories, grasp/release events, object interaction relationships, and scene geometry are extracted from human egocentric videos. A simulation-executable digital twin environment is then constructed, transforming human videos from purely visual records into simulation instances with spatial structure, interaction relationships, and task semantics.
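To make the grasp/release extraction concrete, here is one simple way such events could be derived once hand motion has been recovered. It is a sketch under stated assumptions, not the method used by JoyAI-Sim: it assumes a per-frame hand aperture (for example the thumb-to-index fingertip distance in metres) is already available, and the function name and both thresholds are hypothetical.

```python
def detect_grasp_events(
    aperture_m: list[float],
    close_below: float = 0.03,
    open_above: float = 0.06,
) -> list[tuple[int, str]]:
    """Label grasp/release frames from a hand-aperture series using hysteresis.

    Two thresholds are used so that small fluctuations around a single
    threshold do not produce spurious grasp/release pairs.
    """
    events: list[tuple[int, str]] = []
    grasping = False
    for frame, aperture in enumerate(aperture_m):
        if not grasping and aperture < close_below:
            grasping = True
            events.append((frame, "grasp"))
        elif grasping and aperture > open_above:
            grasping = False
            events.append((frame, "release"))
    return events


# Illustrative aperture series: the hand closes at frame 2 and reopens at frame 6.
apertures = [0.10, 0.08, 0.02, 0.025, 0.04, 0.05, 0.07, 0.10]
print(detect_grasp_events(apertures))  # [(2, 'grasp'), (6, 'release')]
```

In practice, an aperture signal alone cannot tell a grasp from an empty hand closing, so events like these would likely be cross-checked against the object interaction relationships mentioned above.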

In the Simulation → Robot stage, robot embodiment adaptation and physical feasibility validation are performed in the simulation environment. Actions involving joint-limit violations, collisions, unreachable poses, or unreasonable contacts are filtered out. Meanwhile, by varying factors such as object positions, container layouts, object combinations, colors and materials, lighting, and backgrounds, diverse yet physically feasible robot trajectories and observation videos can be derived from the same human demonstration.
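The filtering step described above can be sketched as a chain of rejection checks. Only the joint-limit check is implemented here; collisions, reachability, and contact checks would normally come from the simulator, so they are represented by pluggable callables. All names (`violates_joint_limits`, `filter_feasible`, `extra_checks`) are hypothetical, and the limits in the usage example are made-up values.

```python
from typing import Callable, Sequence

Trajectory = Sequence[Sequence[float]]  # waypoints x joint angles, in radians


def violates_joint_limits(
    traj: Trajectory, lower: Sequence[float], upper: Sequence[float]
) -> bool:
    """True if any joint at any waypoint lies outside [lower, upper]."""
    return any(
        not (lo <= q <= hi)
        for waypoint in traj
        for q, lo, hi in zip(waypoint, lower, upper)
    )


def filter_feasible(
    trajectories: Sequence[Trajectory],
    lower: Sequence[float],
    upper: Sequence[float],
    extra_checks: Sequence[Callable[[Trajectory], bool]] = (),
) -> list[Trajectory]:
    """Keep trajectories that pass the joint-limit check and every extra check.

    Each callable in extra_checks returns True when a trajectory should be
    rejected (for example a simulator-backed collision or reachability test).
    """
    kept = []
    for traj in trajectories:
        if violates_joint_limits(traj, lower, upper):
            continue
        if any(check(traj) for check in extra_checks):
            continue
        kept.append(traj)
    return kept


# Illustrative two-joint robot with symmetric limits.
lower, upper = [-1.5, -2.0], [1.5, 2.0]
ok = [[0.0, 0.0], [0.5, 1.0]]
out_of_range = [[0.0, 0.0], [1.8, 1.0]]  # first joint exceeds its upper limit
print(len(filter_feasible([ok, out_of_range], lower, upper)))  # 1
```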

Ultimately, large-scale, low-cost human behavior data is transformed into high-value training resources that are verifiable, extensible, and usable for robot learning.

Simulation Generalization Examples

Before robot-view observations are exported, the simulator can vary visual conditions while preserving the task structure. Lighting and material randomization provide controlled diversity for downstream robot learning without changing the underlying trajectory.
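The idea of varying appearance while holding the trajectory fixed can be sketched as follows. The parameter names, value ranges, and material list are illustrative assumptions rather than JoyAI-Sim settings; the point is only that each episode reuses the same trajectory and differs in its sampled visual parameters.

```python
import random
from dataclasses import dataclass

MATERIALS = ("oak", "marble", "steel", "laminate")  # hypothetical material names


@dataclass(frozen=True)
class VisualVariant:
    light_intensity: float   # relative intensity, assumed range 0.3 to 1.5
    light_color_temp_k: int  # colour temperature in kelvin, 2700 to 6500
    table_material: str


def sample_variants(n: int, seed: int = 0) -> list[VisualVariant]:
    """Draw n visual variants from a seeded generator so runs are reproducible."""
    rng = random.Random(seed)
    return [
        VisualVariant(
            light_intensity=rng.uniform(0.3, 1.5),
            light_color_temp_k=rng.randrange(2700, 6501, 100),
            table_material=rng.choice(MATERIALS),
        )
        for _ in range(n)
    ]


def make_episodes(trajectory: list[list[float]], n: int, seed: int = 0) -> list[dict]:
    """Pair one unchanged trajectory with n different visual variants."""
    return [
        {"trajectory": trajectory, "visual": variant}
        for variant in sample_variants(n, seed)
    ]
```

Seeding the generator means a given rendered episode can be regenerated later, which is useful when a downstream failure needs to be traced back to a specific lighting or material condition.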

Lighting Generalization

Texture Generalization

Simulation-to-Robot Output

The resulting output is not just a visualization artifact: it is a robot-centered, multi-view observation stream prepared for downstream robot learning. This sample shows the training-ready visual output after simulation retargeting and robot-view rendering.

Trial and Access

You are welcome to experience the JoyAI-Sim embodied simulation service through the JD Cloud JoyBuilder platform:

Create a simulation service
Manage a simulation service
View a simulation service
Connect to a simulation service
Mount cloud-based simulation assets
Embodied simulation data augmentation practice case
Data trading platform

Citation

Please cite the arXiv technical report using the following BibTeX entry.

@misc{liu2026dataladdersimulationenabledinterconversiontoolchain,
  title={DataLadder: A Simulation-Enabled Interconversion Toolchain for the Embodied Data Pyramid},
  author={Peidong Liu and Yongce Liu and Songyan Guo and Fuyuan Ma and Zhihao Yuan and Ao Li and Zengjue Chen and Wenhao Li and Tianle Zhang and Mingyang Li and Jiale Zhang and Junzhe Xiong and Zhiyuan Xiang and Dafeng Chi and Yuzheng Zhuang and Yihang Li and Qingrong He and Jiaming Liang and Chen Cai and Peng Hao and Mingxi Luo and Song Wang and Junwu Xiong and Ruodai Li and Liyi Luo and Wei Tan and Dongjiang Li and Jiawei Li and Hui Shen and Yicheng Gong and Liang Lin},
  year={2026},
  eprint={2606.16776},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2606.16776}
}