To Help or Not to Help: LLM-based Attentive Support for Human-Robot Group Interactions

Daniel Tanneberg, Felix Ocker, Stephan Hasler, Joerg Deigmoeller, Anna Belardinelli, Chao Wang, Heiko Wersing, Bernhard Sendhoff, Michael Gienger

Overview of the framework enabling attentive supportive behavior in the robot.

Abstract

How can a robot provide unobtrusive physical support within a group of humans? We present Attentive Support, a novel interaction concept for robots to support a group of humans. It combines scene perception, dialogue acquisition, situation understanding, and behavior generation with the common-sense reasoning capabilities of Large Language Models (LLMs). In addition to following user instructions, Attentive Support is capable of deciding when and how to support the humans, and when to remain silent so as not to disturb the group. With a diverse set of scenarios, we show and evaluate the robot's attentive behavior, which supports and helps the humans when required, while not disturbing the group if no help is needed.


Use Cases of Large Language Model-driven Multi-Modal Human-Robot Interaction

System

The system's architecture comprises three key modules: "Scene Narrator", "Planner", and "Expresser". The Scene Narrator mirrors the states of objects and humans as detected by the sensors. The Planner module receives these multi-modal inputs as event messages, including the positions of individuals within the scene. Inter-module communication is realized via ROS.

The system structure
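The text above states that the modules communicate via ROS. The following is a minimal, hypothetical sketch of how a Scene Narrator node might publish a verbalized scene event that the Planner consumes; the topic name `/scene_events`, the plain-string payload, and the node name are illustrative assumptions, not the project's actual interfaces.

```python
# Minimal sketch (assumptions: the topic "/scene_events", the string payload,
# and the node name are illustrative; the real message types are not shown here).
import rospy
from std_msgs.msg import String


def narrate_event(pub: rospy.Publisher, text: str) -> None:
    """Scene Narrator side: publish a verbalized scene event."""
    pub.publish(String(data=text))


def on_event(msg: String) -> None:
    """Planner side: receive an event and, e.g., trigger an LLM query."""
    rospy.loginfo("Planner received event: %s", msg.data)


if __name__ == "__main__":
    rospy.init_node("scene_narrator_demo")
    pub = rospy.Publisher("/scene_events", String, queue_size=10)
    rospy.Subscriber("/scene_events", String, on_event)
    rospy.sleep(1.0)  # give the pub/sub connection time to establish
    narrate_event(pub, "Felix speaks to Daniel: 'Please hand me the red glass'.")
    rospy.spin()
```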

Interaction Flow

The interaction typically begins with a person's speech. For instance, the "Scene Narrator" detects "Felix speaks to Daniel: 'Please hand me the red glass'." This event is translated into natural language and relayed to the "Planner" module, which initiates a GPT query. Simultaneously, the "Planner" informs the "Expresser" to produce an immediate rule-based response: the robot looks at Felix while its ears and lid roll back, simulating a listening gesture. Approximately 2 seconds later, GPT responds by invoking the get_persons() and get_objects() functions to identify the people and objects present. The resulting data, including "Felix," "Daniel," and object details, are sent back to GPT for further analysis.

While waiting for GPT's next response, the robot shows a 'thinking' gesture, looking from side to side with blinking lid movements. Shortly after, the LLM calls check_hindering_reasons() to assess whether Daniel can see and reach the red glass and whether he is busy. Concurrently, facial_expression() is activated so that the robot looks towards Daniel. The outcome indicates that Daniel can hand over the glass, so the robot, following its pre-defined guidance, opts not to intervene and silently displays its reasoning on the GUI.

Subsequently, Felix asks Daniel to pour cola into the glass. The robot, attentive to their conversation, deduces via check_hindering_reasons() that Daniel is occupied with a phone call and learns from is_person_busy_or_idle() that Felix is holding the glass. The robot then opts to pour cola from the bottle into Felix's glass. Should Felix not be holding the glass, or should it be beyond the robot's reach, the robot instead places the bottle near Felix. Directed by the LLM, the robot's head tracks the bottle during pickup and shifts to the glass while pouring. Upon completion, the robot nods towards Felix and announces, "I've poured Coca-Cola into your glass as Daniel is currently busy."

The interaction flow. The blue squares are the actions generated by the LLM; the grey ones are rule-based functions.
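The flow above hinges on the LLM calling scene-query functions such as get_persons(), get_objects(), and check_hindering_reasons(). The sketch below illustrates, under stated assumptions, what such a tool-calling loop could look like with the OpenAI chat-completions API; the tool schemas, the stubbed function bodies, the model name, and the system prompt are illustrative and are not taken from the paper.

```python
# Hedged sketch of a tool-calling loop like the one described above. The function
# names come from the text; their bodies, the schemas, and the model are assumptions.
import json
from openai import OpenAI

client = OpenAI()

def get_persons():
    return ["Felix", "Daniel"]

def get_objects():
    return ["red glass", "cola bottle"]

def check_hindering_reasons(person, obj):
    # Placeholder logic: the real system would query the scene representation.
    return {"can_see": True, "can_reach": True, "busy": person == "Daniel"}

TOOLS = [
    {"type": "function", "function": {"name": "get_persons",
        "description": "List persons in the scene.",
        "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {"name": "get_objects",
        "description": "List objects in the scene.",
        "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {"name": "check_hindering_reasons",
        "description": "Check whether a person is hindered from using an object.",
        "parameters": {"type": "object", "properties": {
            "person": {"type": "string"}, "obj": {"type": "string"}},
            "required": ["person", "obj"]}}},
]
IMPL = {"get_persons": get_persons, "get_objects": get_objects,
        "check_hindering_reasons": check_hindering_reasons}

messages = [
    {"role": "system", "content": "Support the group only when help is needed; otherwise stay silent."},
    {"role": "user", "content": "Felix speaks to Daniel: 'Please hand me the red glass'."},
]
while True:
    response = client.chat.completions.create(model="gpt-4", messages=messages, tools=TOOLS)
    msg = response.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)  # final decision, e.g. stay silent or announce an action
        break
    for call in msg.tool_calls:
        # Execute the requested scene function and feed the result back to the LLM.
        result = IMPL[call.function.name](**json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
```

In such a loop, the decision "to help or not to help" emerges from the LLM's final text response after it has gathered the scene facts via the tool calls, which matches the behavior walked through in the interaction flow.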

Prompts