Geometry-Aware Policy Imitation (GPI)


Yiming Li1,2 Nael Darwiche2 Amirreza Razmjoo1,2 Sichao Liu2
Yilun Du3 Auke Ijspeert2 Sylvain Calinon1,2
1 Idiap Research Institute  |  2 EPFL  |  3 Harvard University

Simple, Fast and Flexible

Figure 1 from the paper summarizing the GPI pipeline and Push-T results.

Evaluations on the Push-T benchmark

Talk

Watch the GPI overview and a real-robot deployment from the paper.

Method overview and fruit delivery task: progression and attraction flows cooperate to stay close to demonstrations while navigating clutter.

Real-world box flip on the ALOHA platform, showcasing multimodal demonstrations and robustness to visual disturbances.

Motivation

Diffusion and flow-matching policies deliver strong imitation performance but are computationally heavy and hard to interpret. GPI reinterprets demonstrations Γ as geometric curves, builds distance fields d(x | Γ), and induces two primitives: (i) a progression flow along the demonstration tangent and (ii) an attraction flow given by the negative distance gradient. Their superposition yields a controllable, non-parametric vector field that is fast, interpretable, and robust.
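As a concrete illustration, the sketch below approximates d(x | Γ) for a single demonstration stored as a discretized curve. This is a minimal sketch, not the paper's implementation; the names distance_to_demo and demo are illustrative.

```python
import numpy as np

def distance_to_demo(x, demo):
    """Approximate d(x | Gamma) as the distance to the nearest demo waypoint."""
    dists = np.linalg.norm(demo - x, axis=1)  # Euclidean distance to each waypoint
    idx = int(np.argmin(dists))               # index of the closest waypoint
    return dists[idx], idx

# Toy 2D demonstration: a quarter circle discretized into 100 waypoints.
t = np.linspace(0.0, np.pi / 2, 100)
demo = np.stack([np.cos(t), np.sin(t)], axis=1)

d, idx = distance_to_demo(np.array([0.5, 0.2]), demo)
print(f"d(x | Gamma) ≈ {d:.3f}, closest waypoint index {idx}")
```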

Figure 2 illustrating three ways to obtain latent embeddings.
Figure 2 in the paper outlines the three feature pipelines we experiment with: a lightweight task-specific encoder, an autoencoder that learns task-agnostic latents, and pretrained models (e.g., CLIP or SAM). GPI only needs the distance in this latent space, so any of these choices can be swapped in without retraining the controller.
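A minimal sketch of this decoupling, assuming the encoder is any callable that maps observations to feature vectors; latent_distance and the random-projection encode below are placeholders, not the paper's feature pipelines.

```python
import numpy as np

def latent_distance(obs, demo_obs, encode):
    """Distance from the current observation to each demo observation,
    measured in the encoder's latent space."""
    z = encode(obs)                                   # latent of the current observation
    z_demo = np.stack([encode(o) for o in demo_obs])  # latents of the demo observations
    return np.linalg.norm(z_demo - z, axis=1)

# Placeholder encoder: a fixed random projection of a flattened 64x64 RGB image.
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64 * 64 * 3))
encode = lambda img: W @ img.reshape(-1)

obs = rng.random((64, 64, 3))
demo_obs = [rng.random((64, 64, 3)) for _ in range(5)]
print(latent_distance(obs, demo_obs, encode).round(2))
```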

Progression flow

The progression component advances the system along the expert trajectory using its local tangent u_demo(x). It ensures forward motion and task completion without backtracking.

π_prog(x) = λ₁(x) · u_demo(x)
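A minimal sketch of the progression primitive under the same discretized-demo assumption: the tangent u_demo(x) is taken as the normalized finite difference at the waypoint closest to x. The names progression_flow and lam1 are illustrative.

```python
import numpy as np

def progression_flow(x, demo, lam1=1.0):
    """π_prog(x) = λ₁ · u_demo(x), with u_demo the local tangent of the demo."""
    dists = np.linalg.norm(demo - x, axis=1)
    idx = int(np.argmin(dists))
    nxt = min(idx + 1, len(demo) - 1)          # clamp at the final waypoint
    tangent = demo[nxt] - demo[idx]            # local forward direction of the demo
    norm = np.linalg.norm(tangent)
    u = tangent / norm if norm > 1e-8 else np.zeros_like(tangent)
    return lam1 * u

t = np.linspace(0.0, np.pi / 2, 100)
demo = np.stack([np.cos(t), np.sin(t)], axis=1)
print(progression_flow(np.array([0.9, 0.1]), demo))
```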
Figure 3 showing demonstrations, distance landscape, and flow fields.
Figure 3 from the paper: (a) two demonstrations forming a Y-shape, (b) the composed distance landscape, and (c–d) the induced flow fields. The top row shows pure progression following tangents, while the bottom row adds attraction to pull the policy back toward either branch. The combined field (right) illustrates how GPI produces smooth, bifurcating behavior without training.

Attraction flow

The attraction component corrects deviations by following the negative gradient of the distance field in the actuated subspace. This stabilizes the dynamics, pulling the state back toward the demonstration manifold.

π_attr(x) = − λ₂(x) · ∇_{x′} d(x | Γ)
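A matching sketch of the attraction primitive: with the nearest-waypoint distance above, the distance gradient is the unit vector pointing from the closest waypoint toward x, so the negative gradient pulls the state back onto the demonstration. The names attraction_flow and lam2 are illustrative, and the projection onto the actuated subspace x′ is omitted for brevity.

```python
import numpy as np

def attraction_flow(x, demo, lam2=1.0):
    """π_attr(x) = −λ₂ · ∇ d(x | Γ) for the nearest-waypoint distance."""
    dists = np.linalg.norm(demo - x, axis=1)
    idx = int(np.argmin(dists))
    offset = x - demo[idx]                     # points from the demo toward x
    norm = np.linalg.norm(offset)
    grad = offset / norm if norm > 1e-8 else np.zeros_like(offset)
    return -lam2 * grad                        # descend the distance field

t = np.linspace(0.0, np.pi / 2, 100)
demo = np.stack([np.cos(t), np.sin(t)], axis=1)
print(attraction_flow(np.array([0.5, 0.2]), demo))
```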

Multimodality

GPI preserves distinct demonstrations as separate models and composes the K nearest via soft weights w_i(x) ∝ exp(−β · d(x | Γ^{(i)})), enabling natural multimodal behavior without mode collapse.

π(x) = Σ_i w_i(x) · [ λ₁ u_demo^{(i)}(x) − λ₂ ∇_{x′} d(x | Γ^{(i)}) ]
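Putting the pieces together, the sketch below blends the per-demonstration flows of the K closest curves with the softmax weights above. gpi_policy and the toy Y-shaped demonstrations are illustrative assumptions, not the paper's code.

```python
import numpy as np

def gpi_policy(x, demos, lam1=1.0, lam2=1.0, beta=5.0, K=2):
    """Blend progression + attraction flows of the K closest demonstrations."""
    flows, dists = [], []
    for demo in demos:
        d = np.linalg.norm(demo - x, axis=1)
        idx = int(np.argmin(d))
        nxt = min(idx + 1, len(demo) - 1)
        tangent = demo[nxt] - demo[idx]                  # progression direction
        tangent = tangent / (np.linalg.norm(tangent) + 1e-8)
        offset = x - demo[idx]                           # ∇d points away from the demo
        grad = offset / (np.linalg.norm(offset) + 1e-8)
        flows.append(lam1 * tangent - lam2 * grad)       # per-demo flow
        dists.append(d[idx])
    dists = np.array(dists)
    nearest = np.argsort(dists)[:K]                      # K closest demonstrations
    w = np.exp(-beta * dists[nearest])
    w = w / w.sum()                                      # w_i ∝ exp(−β · d(x | Γ^(i)))
    return sum(wi * flows[i] for wi, i in zip(w, nearest))

# Two toy demonstrations forming a Y-shape, as in Figure 3.
s = np.linspace(0.0, 1.0, 50)[:, None]
stem = np.hstack([np.zeros_like(s), s])
left = np.vstack([stem, np.hstack([-s, 1.0 + s])])
right = np.vstack([stem, np.hstack([s, 1.0 + s])])
print(gpi_policy(np.array([0.1, 1.2]), [left, right]))
```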

Decoupling representation and policy

We explicitly separate the learned distance metric (from vision/state encoders) from policy synthesis. Encoders can be swapped or fine-tuned independently; the reactive controller remains a simple first-order system.
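A sketch of what "simple first-order system" means in practice, under the assumption that actions are applied by integrating the composed flow. rollout is an illustrative helper, and the toy policy below merely contracts to the origin.

```python
import numpy as np

def rollout(x0, policy, dt=0.05, steps=200):
    """First-order integration x_{t+1} = x_t + dt · π(x_t)."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * policy(x)    # follow the composed flow field
        traj.append(x.copy())
    return np.stack(traj)

# Swapping the encoder only changes how π is computed; the integrator is unchanged.
traj = rollout([1.0, 0.5], lambda x: -x)   # toy flow contracting to the origin
print(traj[-1].round(3))
```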

Limitations

GPI relies on meaningful distance fields; poor representation learning can degrade attraction. Extremely long-horizon tasks may require waypointing or light receding-horizon planning. Safety constraints are not enforced by default.

Results

Push-T snapshot (state vs. vision; per-step latency and memory).

| Method | State Avg / Max | State Train / Infer | State Memory | Vision Avg / Max | Vision Train / Infer | Vision Memory |
|---|---|---|---|---|---|---|
| DDPM | 82.3 / 86.3 | 1.0 h / 641 ms | 252 MB | 80.9 / 85.5 | 2.5 h / 647 ms | 353 MB |
| DDIM | 81.5 / 85.1 | 1.0 h / 65 ms | 252 MB | 79.1 / 83.1 | 2.5 h / 67 ms | 353 MB |
| GPI (Ours) | 85.8 / 89.0 | 0 h / 0.6 ms | 0.7 MB | 83.3 / 86.9 | 0.3 h / 3.3 ms | 44 MB |

Evaluation follows standard Push-T protocols; latencies are per-step.

Figure 4 (below) studies receding-horizon rollouts. Even when we plan over 64-step horizons, both state and vision settings retain strong performance, showing the geometric controller can operate reactively or in a longer-horizon mode without degradation.

Figure 4 plotting reward versus action horizon for state and vision inputs.
Figure 4: Reward versus planning horizon. Solid curves show best runs; dashed curves show averages across seeds.

Figure 5 (next) highlights how performance scales with demonstration coverage. Whether actions are expressed in a relative or absolute frame, increasing the demonstration subset size consistently boosts average reward, and performance remains stable across different neighbor counts K.

Figure 5 showing average reward versus subset size for relative and absolute actions.
Figure 5: Demonstration density analysis. Each curve corresponds to a different number of blended neighbors.

Figure 7 explores the interplay between the progression and attraction primitives. A broad range of weights (λ₁, λ₂) produces high reward, underscoring that the two fields combine smoothly without delicate tuning.

Figure 7 heatmap of average reward as progression and attraction weights vary.
Figure 7: Heatmap of average reward as progression (λ₁) and attraction (λ₂) coefficients vary.

Figures 8 and 9 showcase the real-robot evaluations described in Section 3.2 of the paper. The ALOHA clip (top) captures multiple successful box-flip trajectories, while the Franka arm experiment (bottom) demonstrates human–robot interaction where the robot reacts to a user presenting fruit.

Figure 8 montage of the ALOHA box-flip trials.
Figure 8: Multiple rollouts of the box-flip task on ALOHA, illustrating multimodal strategies learned from demonstrations.
Figure 9 showing the Franka arm handing fruit during a human-robot interaction trial.
Figure 9: Human–robot interaction on the Franka platform. The geometry-aware flows adapt to new fruit placements presented by a human partner.

BibTeX

@inproceedings{anonymous2026gpi,
  title     = {Geometry-Aware Policy Imitation},
  author    = {Anonymous},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026},
  note      = {Under review}
}