Geometry-Aware Policy Imitation (GPI)
Simple, Fast and Flexible
Evaluations on the Push-T Benchmark
Watch the GPI overview and a real-robot deployment from the paper.
Diffusion and flow-matching policies excel at imitation but are computationally heavy and opaque. GPI reinterprets demonstrations Γ as geometric curves, builds distance fields d(x | Γ), and induces two primitives: (i) a progression flow along the demonstration tangent and (ii) an attraction flow given by the negative gradient of the distance field. Their superposition yields a controllable, non-parametric vector field that is fast, interpretable, and robust.
The progression component advances the system along the expert trajectory using its local tangent u_demo(x). It ensures forward motion and task completion without backtracking.
π_prog(x) = λ₁(x) · u_demo(x)
The attraction component corrects deviations by following the negative gradient of the distance field in the actuated subspace. This stabilizes the dynamics, pulling the state back toward the demonstration manifold.
π_attr(x) = − λ₂(x) · ∇x′ d(x | Γ)
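To make the two primitives concrete, here is a minimal NumPy sketch for a single demonstration, assuming Γ is stored as an ordered array of waypoints and d(x | Γ) is the Euclidean distance to the nearest waypoint (the paper uses learned distance fields; the names `nearest_index`, `gpi_single`, `lam1`, `lam2` are illustrative, not from the paper).

```python
import numpy as np

def nearest_index(x, demo):
    """Index of the closest waypoint on a demonstration Gamma (T x D array)."""
    return int(np.argmin(np.linalg.norm(demo - x, axis=1)))

def distance(x, demo):
    """d(x | Gamma): Euclidean distance from x to the closest waypoint."""
    return float(np.linalg.norm(demo[nearest_index(x, demo)] - x))

def progression(x, demo):
    """u_demo(x): unit tangent of the demonstration at the closest waypoint."""
    i = nearest_index(x, demo)
    j = min(i + 1, len(demo) - 1)
    tangent = demo[j] - demo[i]
    norm = np.linalg.norm(tangent)
    return tangent / norm if norm > 1e-8 else np.zeros_like(x)

def attraction(x, demo):
    """-grad d(x | Gamma): unit vector pointing back toward the closest waypoint."""
    delta = demo[nearest_index(x, demo)] - x
    norm = np.linalg.norm(delta)
    return delta / norm if norm > 1e-8 else np.zeros_like(x)

def gpi_single(x, demo, lam1=1.0, lam2=1.0):
    """Superposition of progression and attraction for one demonstration."""
    return lam1 * progression(x, demo) + lam2 * attraction(x, demo)
```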
GPI preserves distinct demonstrations as separate models and composes the K nearest via soft weights w_i(x) ∝ exp(−β · d(x | Γ(i))), enabling natural multimodal behavior without mode collapse. The composed policy is
π(x) = Σ_i w_i(x) · [ λ₁ u_demo^{(i)}(x) − λ₂ ∇x′ d(x | Γ(i)) ]
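Under the same assumptions as the sketch above, the composition over the K nearest demonstrations can be written as a few lines; `gpi_policy`, `K`, and `beta` are illustrative names.

```python
def gpi_policy(x, demos, K=4, beta=5.0, lam1=1.0, lam2=1.0):
    """Blend the K nearest demonstrations with weights w_i ∝ exp(-beta * d(x | Gamma_i))."""
    dists = np.array([distance(x, demo) for demo in demos])
    nearest = np.argsort(dists)[:K]            # indices of the K closest demos
    w = np.exp(-beta * dists[nearest])
    w = w / w.sum()                            # normalized soft weights
    fields = np.stack([gpi_single(x, demos[i], lam1, lam2) for i in nearest])
    return (w[:, None] * fields).sum(axis=0)   # weighted superposition of vector fields
```

A rollout then simply integrates the field, e.g. x ← x + dt · gpi_policy(x, demos), which is where the sub-millisecond per-step latencies in the table below come from.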
We explicitly separate the learned distance metric (from vision/state encoders) from policy synthesis. Encoders can be swapped or fine-tuned independently; the reactive controller remains a simple first-order system.
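As a sketch of this separation (assuming any callable encoder mapping observations to feature vectors; `encode` is a placeholder, not an API from the paper), the distance field can be evaluated in a learned latent space while the controller above stays unchanged:

```python
def latent_distance(obs, demo_obs, encode):
    """d(x | Gamma) evaluated in a learned feature space.

    `encode` can be any frozen or fine-tuned state/vision encoder; swapping it
    changes the metric but leaves the first-order reactive controller untouched.
    """
    z = encode(obs)
    z_demo = np.stack([encode(o) for o in demo_obs])
    return float(np.min(np.linalg.norm(z_demo - z, axis=1)))
```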
GPI relies on meaningful distance fields; poor representation learning can degrade attraction. Extremely long-horizon tasks may require waypointing or light receding-horizon planning. Safety constraints are not enforced by default.
Push-T snapshot (state vs. vision; per-step latency and memory).
| Method | State: Avg / Max | Train / Infer | Memory | Vision: Avg / Max | Train / Infer | Memory |
|---|---|---|---|---|---|---|
| DDPM | 82.3 / 86.3 | 1.0 h / 641 ms | 252 MB | 80.9 / 85.5 | 2.5 h / 647 ms | 353 MB |
| DDIM | 81.5 / 85.1 | 1.0 h / 65 ms | 252 MB | 79.1 / 83.1 | 2.5 h / 67 ms | 353 MB |
| GPI (Ours) | 85.8 / 89.0 | 0 h / 0.6 ms | 0.7 MB | 83.3 / 86.9 | 0.3 h / 3.3 ms | 44 MB |
Evaluation follows standard Push-T protocols; latencies are per-step.
Figure 4 (below) studies receding-horizon rollouts. Even when we plan over 64-step horizons, both state and vision settings retain strong performance, showing the geometric controller can operate reactively or in a longer-horizon mode without degradation.
Figure 5 (below) highlights how performance scales with demonstration coverage. Whether actions are expressed in a relative or absolute frame, increasing the demonstration subset size consistently boosts average reward, and performance remains stable across different neighbor counts K.
Figure 7 explores the interplay between the progression and attraction primitives. A broad range of weights (λ₁, λ₂) produces high reward, underscoring that the two fields combine smoothly without delicate tuning.
Figures 8 and 9 showcase the real-robot evaluations described in Section 3.2 of the paper. The ALOHA clip (top) captures multiple successful box-flip trajectories, while the Franka arm experiment (bottom) demonstrates human–robot interaction where the robot reacts to a user presenting fruit.
@inproceedings{anonymous2026gpi,
  title     = {Geometry-Aware Policy Imitation},
  author    = {Anonymous},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026},
  note      = {Under review}
}