Affordance-based Robot Manipulation
with Flow Matching

We present a framework for assistive robot manipulation that addresses two fundamental challenges: efficient adaptation of large-scale models for scene affordance understanding and effective learning of robot actions by grounding the visual affordance. To tackle the first challenge, we adopt a parameter-efficient prompt tuning method, prepending learnable text prompts to a frozen vision model to predict affordances, while considering spatial and semantic relationships in multi-task scenarios. For the second challenge, we propose a flow matching method, representing a robot visuomotor policy as a conditional process of flowing random waypoints to desired robot actions. We introduce a real-world dataset with 10 tasks to evaluate our approach. Experiments show our prompt tuning method achieves competitive or superior performance to other finetuning protocols across data scales, while satisfying parameter efficiency. Flow matching yields more stable training and faster inference, while maintaining comparable generalization performance to diffusion policy. Our framework seamlessly unifies parameter-efficient affordance learning and robot action generation with flow matching.
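As a rough illustration of "flowing random waypoints to desired robot actions", the sketch below shows the two pieces a flow matching policy typically needs: a training pair (an interpolated sample and its target velocity) and an Euler-integration sampler. It is a minimal sketch under stated assumptions, not the released implementation: the straight-line interpolation path, the function names (`fm_training_pair`, `sample_actions`, `oracle_velocity`), and the tensor shapes are all hypothetical, and a closed-form "oracle" velocity field stands in for the learned, observation-conditioned network.

```python
import numpy as np

def fm_training_pair(x0, x1, t):
    """Build one flow matching training pair (assumed straight-line path).

    x0: random waypoints, shape (B, H, D)
    x1: demonstrated action waypoints, shape (B, H, D)
    t:  flow times in [0, 1), shape (B,)
    """
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))  # broadcast t over (H, D)
    x_t = (1.0 - t) * x0 + t * x1              # point on the path at time t
    v_target = x1 - x0                         # constant velocity of that path
    return x_t, v_target

def fm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))

def sample_actions(velocity_fn, obs, x0, num_steps=10):
    """Integrate dx/dt = v(x, t, obs) from t=0 to t=1 with forward Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt, obs)
    return x

def oracle_velocity(target):
    """Stand-in for a trained network: the exact field toward one target."""
    def v(x, t, obs):
        return (target - x) / (1.0 - t)
    return v

rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, size=(4, 8, 2))   # hypothetical demo waypoints
x0 = rng.standard_normal(x1.shape)            # random initial waypoints
t = rng.uniform(0.0, 0.9, size=4)

x_t, v_target = fm_training_pair(x0, x1, t)
actions = sample_actions(oracle_velocity(x1), obs=None, x0=x0, num_steps=10)
```

In a real policy, `velocity_fn` would be a neural network conditioned on visual observations and trained by minimizing `fm_loss` on pairs from `fm_training_pair`. Because sampling needs only a small, fixed number of integration steps, inference can be fast.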


Highlights

Flow matching represents a robot visuomotor policy as a conditional process of flowing random waypoints to desired robot actions.
Prompt tuning adapts a vision-language model to predict manipulation affordances in multi-task scenarios.
Flow matching exhibits more stable training and evaluation, and noticeably faster inference, while maintaining comparable generalization performance to diffusion policy.
An example of the robot feeding a human.
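The prompt tuning idea in the highlights can be sketched with a toy example: a small set of learnable prompt tokens is prepended to the input tokens of a frozen layer, so only the prompts would be updated during training. This is an illustrative sketch only; the dimensions, the single self-attention layer standing in for the frozen backbone, and all names (`prepend_prompts`, `frozen_attention`) are assumptions rather than the architecture used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, P, B = 16, 6, 4, 2  # embed dim, image tokens, prompt tokens, batch (hypothetical)

frozen_W = rng.standard_normal((D, D))  # stands in for frozen backbone weights
prompts = np.zeros((P, D))              # the only trainable parameters

def prepend_prompts(prompts, tokens):
    """Concatenate shared prompts (P, D) in front of tokens (B, N, D)."""
    batch = tokens.shape[0]
    tiled = np.broadcast_to(prompts, (batch,) + prompts.shape)
    return np.concatenate([tiled, tokens], axis=1)  # (B, P + N, D)

def frozen_attention(x):
    """Single-head self-attention with frozen weights (toy backbone layer)."""
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights @ x) @ frozen_W

tokens = rng.standard_normal((B, N, D))
out = frozen_attention(prepend_prompts(prompts, tokens))

# Fraction of parameters that would receive gradient updates in this toy setup.
trainable_fraction = prompts.size / (prompts.size + frozen_W.size)
```

Because attention mixes all tokens, changing the prompts changes the outputs at the image-token positions even though `frozen_W` never changes. In this toy setup the prompts are 20% of all parameters; with a real large backbone the fraction would be far smaller.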

Real-world Experiments (affordance-based VLA with flow matching)

Affordance-based VLA with flow matching has been tested on tasks across Activities of Daily Living and consistently outperforms alternative behavior cloning methods. (Videos are played at 4x speed.)

Comb the hair
Sweep the trash
Hang the towel
Pass the water

Closed-loop long horizon manipulation with flow matching

Box Opening
Franka Kitchen
Robomimic
PushT

Paper

arXiv:2409.01083 [cs.RO]
Affordance-based Robot Manipulation with Flow Matching
Fan Zhang, Michael Gienger

Code: https://github.com/HRI-EU/flow_matching

