Affordance-based Robot Manipulation with Flow Matching

We present a framework for assistive robot manipulation that addresses two fundamental challenges. The first is efficiently adapting large-scale models to downstream scene affordance understanding tasks, especially in daily living scenarios where gathering multi-task data involving humans requires strenuous effort. The second is effectively learning robot trajectories by grounding the visual affordance model. We tackle the first challenge with a parameter-efficient prompt tuning method that prepends learnable text prompts to the frozen vision model to predict manipulation affordances in multi-task scenarios. We then learn robot action trajectories guided by affordances with a supervised flow matching method. Flow matching represents a robot visuomotor policy as a conditional process of flowing random waypoints to desired robot trajectories. Our extensive evaluation shows that the proposed prompt tuning method for learning manipulation affordances with a language prompter achieves competitive performance and even outperforms other finetuning protocols across data scales, while remaining parameter-efficient. Learning multi-task robot action trajectories with flow matching also yields consistently better generalization and faster inference across various manipulation tasks than alternative behavior cloning methods. Our framework seamlessly unifies affordance model learning and trajectory generation with flow matching for robot manipulation.
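The prompt tuning idea described above (learnable prompt tokens prepended to a frozen model, with only the prompts trained) can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not the authors' implementation: `W_FROZEN`, `encode`, `prompt_step`, the task names, and all sizes are hypothetical; a single mean-pooled tanh layer stands in for the pretrained vision-language backbone, and a random vector stands in for the affordance target.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_PATCH, N_PROMPT = 16, 8, 4  # hypothetical toy sizes

# Hypothetical frozen stand-in for a pretrained backbone: never updated below.
W_FROZEN = rng.normal(scale=0.3, size=(D, D))


def encode(prompt, patches):
    """Prepend prompt tokens to patch tokens, mean-pool, apply the frozen layer."""
    tokens = np.concatenate([prompt, patches], axis=0)  # (k + n, D)
    pooled = tokens.mean(axis=0)                         # (D,)
    return np.tanh(pooled @ W_FROZEN), tokens.shape[0]


def prompt_step(prompt, patches, target, lr=1.0):
    """One gradient step on 0.5 * ||out - target||^2; only `prompt` changes."""
    out, n_tokens = encode(prompt, patches)
    g_pre = (out - target) * (1.0 - out**2)   # backprop through tanh
    g_row = (g_pre @ W_FROZEN.T) / n_tokens   # each token weighs 1/n_tokens in the mean
    new_prompt = prompt - lr * np.broadcast_to(g_row, prompt.shape)
    return new_prompt, 0.5 * float(np.sum((out - target) ** 2))


# Hypothetical multi-task setup: one learnable prompt per task, shared frozen weights.
tasks = ["comb_hair", "pass_water"]
patches = rng.normal(size=(N_PATCH, D))
targets = {t: np.tanh(rng.normal(size=D)) for t in tasks}
prompts = {t: rng.normal(scale=0.02, size=(N_PROMPT, D)) for t in tasks}
losses = {t: [] for t in tasks}
w_before = W_FROZEN.copy()

for _ in range(200):
    for t in tasks:
        prompts[t], loss = prompt_step(prompts[t], patches, targets[t])
        losses[t].append(loss)

trainable = sum(p.size for p in prompts.values())
print("trainable:", trainable, "frozen:", W_FROZEN.size)
```

Because the stand-in backbone mean-pools tokens, every prompt row receives the same gradient; in a transformer, attention lets prompt tokens interact with image tokens individually. What the sketch does preserve is that each task owns a small prompt while the backbone weights stay fixed, so per-task storage scales with the prompt size rather than the model size (the toy's parameter ratio is not representative of a real backbone).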


Highlights

Flow matching represents a robot visuomotor policy as a conditional process of flowing random waypoints to desired robot actions.
Prompt tuning for a vision-language model to predict manipulation affordances in multi-task scenarios.
Flow matching exhibits marginally better generalization, markedly faster inference, and greater stability in training and evaluation than a diffusion model with DDPM.
An example of the robot feeding a human with flow matching.
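The flow matching idea in the highlights (a policy that flows random waypoints to a desired trajectory) can be sketched as a training loss plus an ODE sampler. This sketch assumes a straight-line (linear interpolation) conditional path and a plain Euler integrator; the paper's exact path, schedule, network, and conditioning on observations and affordances are not reproduced. `cfm_loss`, `sample`, `oracle_velocity`, the demo trajectory, and `H`/`A` are hypothetical. To stay runnable without a neural network, `oracle_velocity` is the exact conditional field toward one known demonstration; a real policy would replace it with a network v_θ(x_t, t, obs) trained by minimizing `cfm_loss` over many demonstrations.

```python
import numpy as np

H, A = 16, 2  # hypothetical: waypoints per trajectory, action dimension


def cfm_loss(velocity_fn, x1, cond, rng):
    """Conditional flow matching loss for one demo trajectory x1 of shape (H, A).

    Straight-line path x_t = (1 - t) * x0 + t * x1, whose velocity is x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)  # random waypoints (noise sample)
    t = rng.uniform(0.0, 1.0)           # t in [0, 1)
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    return float(np.mean((velocity_fn(xt, t, cond) - target) ** 2))


def sample(velocity_fn, cond, shape, n_steps, rng):
    """Flow random waypoints to a trajectory: Euler-integrate dx/dt = v on [0, 1).

    The last evaluated time is (n_steps - 1) / n_steps, so t = 1 is never queried.
    """
    x = rng.standard_normal(shape)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt, cond)
    return x


# Hypothetical demo, keyed by a task label used as the conditioning signal.
s = np.linspace(0.0, 1.0, H)
demos = {"pass_water": np.stack([s, np.sin(np.pi * s)], axis=1)}  # (H, A)


def oracle_velocity(x, t, cond):
    """Exact conditional velocity toward the demo; stands in for a trained network."""
    return (demos[cond] - x) / (1.0 - t)


rng = np.random.default_rng(0)
loss = cfm_loss(oracle_velocity, demos["pass_water"], "pass_water", rng)
traj = sample(oracle_velocity, "pass_water", (H, A), n_steps=4, rng=rng)
print(f"loss={loss:.2e}", np.allclose(traj, demos["pass_water"]))
```

With the exact field, the straight path makes the loss vanish and even a single Euler step lands on the demo. A learned field is only approximate, but needing a few integration steps rather than the long chain of denoising steps in DDPM is one plausible intuition for the faster inference noted in the highlights.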

Real-world Experiments

Flow matching has been tested on 10 tasks across Activities of Daily Living and leads to consistently better performance than alternative behavior cloning methods. (Videos play at 4x speed.)

Comb the hair
Sweep the trash
Hang the towel
Pass the water
Put on the hat
Wipe the forearm
Brush the teeth

Franka Kitchen Simulation Experiments


Paper

arXiv:2409.01083 [cs.RO]
Affordance-based Robot Manipulation with Flow Matching
Fan Zhang, Michael Gienger

Code is available at https://github.com/HRI-EU/flow_matching.
We are in the process of integrating flow matching into the Hugging Face 🤗 LeRobot PushT task.


Team

