SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation


¹Fudan University, China; ²Shanghai Artificial Intelligence Laboratory, China; ³Shanghai Jiao Tong University, China; ⁴The Chinese University of Hong Kong, China; ⁵The University of Hong Kong, China

TL;DR: SoMA is a Gaussian splat neural simulator that models deformable object dynamics from real-world robot manipulation, enabling action-conditioned, stable long-horizon simulation with high-fidelity, multi-view–consistent rendering.

Various manipulation results

Abstract


Simulating deformable objects under rich interactions remains a fundamental challenge for real-to-sim robot manipulation, with dynamics jointly driven by environmental effects and robot actions. Existing simulators rely on predefined physics or data-driven dynamics without robot-conditioned control, limiting accuracy, stability, and generalization. This paper presents SoMA, a 3D Gaussian Splat simulator for soft-body manipulation. SoMA couples deformable dynamics, environmental forces, and robot joint actions in a unified latent neural space for end-to-end real-to-sim simulation. Modeling interactions over learned Gaussian splats enables controllable, stable long-horizon manipulation and generalization beyond observed trajectories without predefined physical models. SoMA improves resimulation accuracy and generalization on real-world robot manipulation by 20%, enabling stable simulation of complex tasks such as long-horizon cloth folding.

Overview of SoMA

SoMA framework

SoMA takes RGB observations and robot joint-space actions collected from real-world manipulation as input (Left). It reconstructs deformable objects as hierarchical Gaussian splats, and propagates them through a neural simulator with supervision from rendering and dynamics (Middle). Object motion is driven by force-based interactions, where environmental and robot-induced forces act on splats to produce deformation (Right). A two-stage multi-resolution training strategy first captures global motion with large temporal gaps and then refines fine-grained dynamics under occlusion and contact using small gaps.
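The two ideas in this overview, training on frame pairs at a coarse and then a fine temporal gap, and moving splats in response to forces, can be illustrated with a toy sketch. It is not SoMA's implementation: SoMA learns its dynamics in a latent neural space, whereas the sketch uses an explicit semi-implicit Euler step as a stand-in. All names (`frame_pairs`, `two_stage_schedule`, `step_splats`) and the gap values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def frame_pairs(num_frames: int, gap: int) -> List[Tuple[int, int]]:
    """Return (source, target) frame indices separated by `gap` steps."""
    if gap < 1:
        raise ValueError("gap must be >= 1")
    return [(t, t + gap) for t in range(num_frames - gap)]


def two_stage_schedule(
    num_frames: int, coarse_gap: int, fine_gap: int
) -> Dict[str, List[Tuple[int, int]]]:
    """Stage 1 pairs frames far apart (global motion);
    stage 2 pairs nearby frames (fine-grained dynamics)."""
    return {
        "stage1_coarse": frame_pairs(num_frames, coarse_gap),
        "stage2_fine": frame_pairs(num_frames, fine_gap),
    }


def step_splats(
    positions: List[Vec3],
    velocities: List[Vec3],
    forces: List[Vec3],
    dt: float,
    mass: float = 1.0,
) -> Tuple[List[Vec3], List[Vec3]]:
    """Assumed stand-in for the learned simulator: semi-implicit Euler.
    Velocity is updated from the force first, then position from the
    updated velocity."""
    new_v = [
        tuple(v[i] + dt * f[i] / mass for i in range(3))
        for v, f in zip(velocities, forces)
    ]
    new_p = [
        tuple(p[i] + dt * v[i] for i in range(3))
        for p, v in zip(positions, new_v)
    ]
    return new_p, new_v
```

For example, `two_stage_schedule(100, coarse_gap=10, fine_gap=1)` would yield 90 coarse pairs for the first stage and 99 fine pairs for the second. In this sketch each splat's force would be the sum of an environmental term and a robot-induced term.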

Results


Qualitative resimulation and generalization under robot manipulation. Left: resimulation on training trajectories. Right: generalization to unseen robot actions and contact configurations. Across diverse soft-body objects, including near-linear (rope), near-planar (cloth), and volumetric (doll) objects, SoMA produces stable, long-horizon simulations that closely match observed dynamics. PhysTwin shows deviations under complex or unseen interactions due to real-to-sim mismatch, while GausSim often remains static or unstable in challenging scenarios.


Multi-view results. Our method maintains consistent 3D geometric structure and visually accurate, physically plausible dynamics across both the main and side views, demonstrating strong viewpoint-consistent simulation.


Quantitative evaluation on resimulation and generalization under robot manipulation. We report performance comparisons with PhysTwin and GausSim across image-based and depth-based metrics. Our method achieves the best results across all metrics, demonstrating robust and accurate real-to-sim simulation.
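The caption does not name the specific metrics. As a rough illustration of what an image-based and a depth-based score can look like, the sketch below computes PSNR over flattened pixel intensities and a mean absolute depth error over valid pixels. Both metric choices, the function names, and the convention that a non-positive ground-truth depth marks an invalid pixel are assumptions, not details taken from the page.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between two flattened images
    with intensities in [0, max_val]."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def depth_abs_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute depth error over pixels with valid ground truth.
    Assumption: target depth <= 0 marks an invalid (missing) pixel."""
    valid = [(p, t) for p, t in zip(pred, target) if t > 0.0]
    if not valid:
        raise ValueError("no valid depth pixels")
    return sum(abs(p - t) for p, t in valid) / len(valid)
```

Higher PSNR and lower depth error both indicate a rendered rollout that is closer to the observed frames.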