OXE-Aug is a high-quality open-source dataset that augments OXE with 9× more robot embodiments across 16 datasets, covering 60% of the widely used Octo pretraining mixture. In total, OXE-Aug provides over 4.4 million trajectories, more than triple the size of the original OXE.

Moreover, we conduct systematic simulation and real-world experiments and show that cross-embodiment learning scales positively with robot augmentation. Our results suggest that increasing the number and diversity of augmented embodiments yields greater robustness and stronger policy generalization to unseen robots.


Technical Summary Video

The OXE-Aug Dataset

OXE-Aug is designed to scale the benefits of robot augmentation by applying cross-painting to a broad range of tasks, scenes, and robot embodiments. The original OXE dataset is highly imbalanced: the top four robots account for over 85% of its trajectories. We select 16 datasets from OXE that are commonly used to train robot foundation models. The original demonstrations in these datasets were collected on Franka, UR5, xArm, WidowX, Google Robot, and Jaco platforms. We augment each dataset with up to 9 different robots: the 6 aforementioned robots, plus Sawyer, Kinova Gen3, and KUKA.

The full dataset in LeRobot format (~1 TB) can be downloaded from Hugging Face: https://huggingface.co/oxe-aug. We also open-source the code used to generate the dataset at https://github.com/GuanhuaJi/oxe-aug, which additionally supports exporting the data in RLDS format.
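As a minimal sketch, fetching one of the augmented datasets from the Hub might look like the following. The repository name `oxe-aug/bridge` is a hypothetical example; check https://huggingface.co/oxe-aug for the actual repository names.

```python
def download_oxe_aug(repo_id: str, local_dir: str = "oxe_aug_data") -> str:
    """Download one OXE-Aug dataset repository from the Hugging Face Hub.

    `repo_id` is a placeholder such as "oxe-aug/bridge"; see
    https://huggingface.co/oxe-aug for the real dataset names.
    Returns the local path of the downloaded snapshot.
    """
    # Imported lazily so the sketch reads fine without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type="dataset",
                             local_dir=local_dir)
```

Usage would be `download_oxe_aug("oxe-aug/bridge")` (again, a hypothetical name); `snapshot_download` also accepts an `allow_patterns` argument if you only want a subset of the files.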


Below are visualizations of some example episodes from OXE-Aug:

Austin Buds

Austin Sailor

Berkeley Autolab UR5

Bridge

UCSD Kitchen

Kaist Nonprehensile

Taco Play

Toto

Jaco Play

UT Austin Mutex

UTokyo XArm Pick and Place

Viola

IAMLab CMU Pickup Insert

Language Table

NYU Franka Play

Fractal

Scaling Robot Augmentation — A Systematic Study

We conduct a systematic simulation study to examine how robot augmentation scales in terms of transfer, generalization, and robustness. While prior work has shown that augmenting demonstrations from a source robot to a known target enables zero-shot transfer, our goal is to investigate whether robot augmentation provides broader benefits when scaled across multiple target robots.


Generalization to Unseen Robots

Standard Combined

We evaluate performance on robots that were not used for augmentation. Specifically, we compare three settings: training only on the source robot ("Source-Only"), augmenting demonstrations to a single additional robot that is not the held-out target ("1+1"), and augmenting to all robots except the held-out target ("N-1").
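Assuming a simple list of embodiments, the three training mixtures above can be sketched as follows (the robot names and the choice of which single robot "1+1" augments to are illustrative, not the exact configurations used in the study):

```python
# Illustrative embodiment list; the study uses the OXE-Aug robot set.
ROBOTS = ["Franka", "UR5", "xArm", "WidowX", "Sawyer", "Kinova Gen3", "KUKA"]


def training_mixture(source: str, held_out: str, setting: str) -> list:
    """Return the embodiments whose demonstrations a policy is trained on.

    setting: "source_only" (no augmentation), "1+1" (source plus one
    augmented robot that is not the held-out target), or "n-1" (source
    plus every robot except the held-out target).
    """
    others = [r for r in ROBOTS if r not in (source, held_out)]
    if setting == "source_only":
        return [source]
    if setting == "1+1":
        return [source, others[0]]   # any single robot that is not the target
    if setting == "n-1":
        return [source] + others     # everything except the held-out robot
    raise ValueError("unknown setting: %s" % setting)
```

The key invariant in every setting is that the held-out evaluation robot never appears in the training mixture.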

All policies are evaluated on an unseen robot that is not part of the augmentation set. As expected, the "Source-Only" policy performs poorly. "1+1" policies generalize moderately well to some robots, but "N-1" achieves substantially higher success across all four target robots. This suggests that robot augmentation promotes embodiment-agnostic visual representations and spatial reasoning, and that generalization to unseen embodiments improves as the diversity of robot augmentations increases.


Robustness to Visual Perturbations

We also ask: can robot augmentation improve robustness on the original robot? We evaluate on the original source robot under visual perturbations (lighting shifts and occlusions). As above, we compare three policies: training only on the source robot ("Source-Only"), augmenting to one additional robot ("Source + Target (1+1)"), and augmenting to N robots ("N").

Altered Results

The figure shows performance under test-time perturbations to the source robot's environment: (1) lighting shifts that introduce shadows, and (2) occlusions from randomly placed black rectangles. While "Source-Only" policies degrade significantly, both "1+1" and "N" are more robust, with "N" consistently achieving the highest success. This suggests that robot augmentation enhances robustness even on the original embodiment by encouraging the policy to focus on task-relevant structure rather than incidental features such as arm texture or lighting cues.
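A rough approximation of these test-time perturbations can be sketched with NumPy: a global brightness shift stands in for the lighting change, and randomly placed black rectangles stand in for the occluders. The parameter values below are illustrative assumptions, not the ones used in the evaluation.

```python
import numpy as np


def perturb(image, rng, brightness_shift=0.3, n_rects=3, max_frac=0.25):
    """Apply a global lighting shift plus random black-rectangle occlusions.

    `image` is an HxWxC float array in [0, 1]. All parameter values are
    illustrative, not the ones used in the paper's evaluation.
    """
    # Lighting shift: add one global offset, then clamp back into [0, 1].
    out = np.clip(image + rng.uniform(-brightness_shift, brightness_shift), 0.0, 1.0)
    h, w = out.shape[:2]
    for _ in range(n_rects):
        # Sample a rectangle up to `max_frac` of each image dimension.
        rh = rng.integers(1, max(2, int(h * max_frac)))
        rw = rng.integers(1, max(2, int(w * max_frac)))
        y = rng.integers(0, h - rh + 1)
        x = rng.integers(0, w - rw + 1)
        out[y:y + rh, x:x + rw] = 0.0  # opaque black occluder
    return out
```

For example, `perturb(frame, np.random.default_rng(0))` returns a perturbed copy of `frame` with the same shape; a policy robust in the sense described above should keep succeeding on such frames.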

Robustness Demonstrations

Real Experiments

In real experiments, we evaluate whether large-scale augmentation also benefits pre-trained foundation models. We consider two state-of-the-art generalist policies, OpenVLA-OFT and $\pi_0$, and fine-tune them on OXE-Aug.

For evaluation, we use tasks from the Bridge dataset, originally collected on a WidowX robot, and test on two embodiments:
  1. A Franka arm with its default parallel-jaw gripper ("Franka"), which corresponds to one of the augmented robots in OXE-Aug, and
  2. A Franka arm with a custom-modified Robotiq gripper ("Robotiq++"), which features colored pads to simulate an unseen embodiment.

We evaluate 4 manipulation tasks. For each base model, we compare "Source-Only," "Target-Only," and "N."

Real Results

Each policy is evaluated over 10 trials per task. Both OpenVLA-OFT and $\pi_0$ perform relatively poorly when fine-tuned only on the original Bridge data, especially on Franka, due to the visual domain shift from the black WidowX gripper to the white Franka one. Fine-tuning on OXE-Aug significantly improves cross-embodiment performance: "Target-Only" improves success across all tasks, and "N," which incorporates more diverse robot augmentations, yields the highest success overall. On average, "N" improves performance by 24% for OpenVLA-OFT and 45% for $\pi_0$. On the novel Robotiq++ embodiment, the fine-tuned policies reach 75% (OpenVLA-OFT) and 82% ($\pi_0$) average success, demonstrating strong generalization.

Task Videos

Add your robot and your dataset!


Try out OXE-Aug!

If you have a dataset that you’d like to augment to different embodiments, check out our GitHub repo! We’ve put a lot of effort into making the augmentation process as pain-free as possible.

License & Responsible Use

Datasets: CC BY 4.0 (attribution required; state your modifications in derivatives).

Code: Apache-2.0 / MIT (match your repo choice).

Responsible Use: No personal data; research/robotics use; do not deploy in unlawful or harmful contexts.