EgoInteract: Synthetic Egocentric Videos Generation for Interaction Understanding and Anticipation

arXiv preprint (2026) · Under Review

Abstract

Collecting large-scale egocentric video datasets with dense spatial and temporal annotations is costly and slow, and is often limited by environmental biases, privacy constraints, and narrow coverage of interaction patterns. While synthetic data has shown strong potential in several vision domains, its use for egocentric perception remains relatively underexplored, especially for tasks requiring temporally coherent human-object interactions. In this work, we introduce EgoInteract, a controllable simulator for egocentric video generation designed to model fine-grained egocentric interactions and their temporal dynamics.

Building on this framework, we generate a synthetic egocentric video dataset with dense spatial and temporal annotations for temporal action segmentation, next-active object detection, interaction anticipation, and hand-object interaction detection. We evaluate models trained with simulated data on multiple real-world egocentric benchmarks spanning diverse environments, object categories, and interaction patterns. Results show consistent improvements over strong baselines across tasks and datasets, demonstrating the effectiveness and transferability of our simulation-based approach.

Citation

Leonardi, R., Ragusa, F., Materia, D., Passanisi, A., Fort, J., Engel, J., and Farinella, G. M. (2026). EgoInteract: Synthetic Egocentric Videos Generation for Interaction Understanding and Anticipation. arXiv preprint arXiv:2605.18214.

@article{leonardi2026egointeract,
  title={EgoInteract: Synthetic Egocentric Videos Generation for Interaction Understanding and Anticipation},
  author={Leonardi, Rosario and Ragusa, Francesco and Materia, Daniele and Passanisi, Alessandro and Fort, James and Engel, Jakob and Farinella, Giovanni Maria},
  journal={arXiv preprint arXiv:2605.18214},
  year={2026}
}