SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation

ICML 2026

¹Fudan University, China; ²Shanghai Artificial Intelligence Laboratory, China; ³Shanghai Jiao Tong University, China; ⁴The Chinese University of Hong Kong, China; ⁵The University of Hong Kong, China

TL;DR: SoMA is a Gaussian splat neural simulator that models deformable object dynamics from real-world robot manipulation, enabling action-conditioned, stable long-horizon simulation with high-fidelity, multi-view–consistent rendering.

Various manipulation results

Abstract

Teaser

Simulating deformable objects under rich interactions remains a fundamental challenge for real-to-sim robot manipulation, with dynamics jointly driven by environmental effects and robot actions. Existing simulators rely on predefined physics or data-driven dynamics without robot-conditioned control, limiting accuracy, stability, and generalization. This paper presents SoMA, a 3D Gaussian Splat simulator for soft-body manipulation. SoMA couples deformable dynamics, environmental forces, and robot joint actions in a unified latent neural space for end-to-end real-to-sim simulation. Modeling interactions over learned Gaussian splats enables controllable, stable long-horizon manipulation and generalization beyond observed trajectories without predefined physical models. SoMA improves resimulation accuracy and generalization on real-world robot manipulation by 20%, enabling stable simulation of complex tasks such as long-horizon cloth folding.

Overview of SoMA

SoMA framework

SoMA takes RGB observations and robot joint-space actions collected from real-world manipulation as input (Left). It reconstructs deformable objects as hierarchical Gaussian splats, and propagates them through a neural simulator with supervision from rendering and dynamics (Middle). Object motion is driven by force-based interactions, where environmental and robot-induced forces act on splats to produce deformation (Right). A two-stage multi-resolution training strategy first captures global motion with large temporal gaps and then refines fine-grained dynamics under occlusion and contact using small gaps.
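The two ideas in this overview, training on frame pairs at two temporal resolutions and moving splats under summed environmental and robot-induced forces, can be illustrated with a small toy sketch. It is not SoMA's implementation: SoMA learns its dynamics in a latent neural space, whereas the explicit semi-implicit Euler step below is a hand-written stand-in. All names (`make_frame_pairs`, `two_stage_schedule`, `step_splat`), the gap sizes, and the per-splat mass are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def make_frame_pairs(num_frames: int, gap: int) -> List[Tuple[int, int]]:
    """Return (source, target) frame indices separated by `gap` frames."""
    if gap < 1:
        raise ValueError("gap must be at least 1")
    return [(t, t + gap) for t in range(num_frames - gap)]


def two_stage_schedule(
    num_frames: int, coarse_gap: int, fine_gap: int
) -> List[Tuple[str, List[Tuple[int, int]]]]:
    """Stage 1 pairs frames with a large gap (global motion);
    stage 2 pairs frames with a small gap (fine-grained dynamics).
    The gap values are assumptions, not numbers from the paper."""
    if coarse_gap <= fine_gap:
        raise ValueError("coarse_gap must be larger than fine_gap")
    return [
        ("coarse", make_frame_pairs(num_frames, coarse_gap)),
        ("fine", make_frame_pairs(num_frames, fine_gap)),
    ]


def step_splat(
    pos: Vec3, vel: Vec3, env_force: Vec3, robot_force: Vec3,
    mass: float, dt: float,
) -> Tuple[Vec3, Vec3]:
    """Toy semi-implicit Euler update for one splat center.
    Environmental and robot-induced forces are summed, velocity is
    updated first, then position is advanced with the new velocity."""
    total = tuple(e + r for e, r in zip(env_force, robot_force))
    new_vel = tuple(v + dt * f / mass for v, f in zip(vel, total))
    new_pos = tuple(p + dt * v for p, v in zip(pos, new_vel))
    return new_pos, new_vel
```

In this sketch a training loop would iterate over the `"coarse"` pairs first and the `"fine"` pairs afterwards; in the real system a learned model would replace `step_splat`.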

Results

Qualitative resimulation and generalization under robot manipulation. Left: resimulation on training trajectories. Right: generalization to unseen robot actions and contact configurations. Across diverse soft-body objects, including near-linear (rope), near-planar (cloth), and volumetric (doll) objects, SoMA produces stable, long-horizon simulations that closely match observed dynamics. PhysTwin shows deviations under complex or unseen interactions due to real-to-sim mismatch, while GausSim often remains static or unstable in challenging scenarios.

Multi-view results. Our method maintains consistent 3D geometric structure and visually accurate, physically plausible dynamics across both the main and side views, demonstrating strong viewpoint-consistent simulation.

Quantitative evaluation on resimulation and generalization under robot manipulation. We report performance comparisons with PhysTwin and GausSim across image-based and depth-based metrics. Our method achieves the best results across all metrics, demonstrating robust and accurate real-to-sim simulation.
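The page does not name the specific image-based and depth-based metrics it reports. As a rough illustration of what one metric of each kind can look like, the sketch below computes PSNR between a rendered and an observed image and a mean absolute error between predicted and observed depth. Both choices are assumptions made for this example, not a statement of the metrics actually used, and the function names are hypothetical. Images and depth maps are passed as flat sequences of floats to keep the example dependency-free.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB (example image-based metric).
    Inputs are flattened pixel intensities in [0, max_value]."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_abs_depth_error(pred: Sequence[float],
                         target: Sequence[float]) -> float:
    """Mean absolute difference between depth maps
    (example depth-based metric), in the depth maps' own units."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
```

Higher PSNR and lower depth error indicate a closer match between simulated and observed frames; a per-trajectory score would average these over all frames and views.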