BT-Prompt: Hierarchical Behavior Tree Generation for High-level Task Planning using Large Language Models

Ming-Fong Hsieh1, Wei-Li Liu1, Sun-Fu Chou1, Chang-Jin Wu1, Yu-Ting Ko1, Kuan-Ting Yu2, Hsueh-Cheng Wang*,1


1National Yang Ming Chiao Tung University (NYCU), Hsinchu, Taiwan.

2XYZ Robotics


Recent advancements in Large Language Models (LLMs) have made it possible to incorporate LLMs into the development of high-level task planners alongside human users. Prior work on LLMs has emphasized the generation of sequential plans, closing the door to the execution of more complex applications. This research investigates how prompt engineering affects the generation of Behavior Trees (BTs) by LLMs, with the goals of improving LLMs' understanding of few-shot example tasks and raising the success rate of generated tasks. First, in Experiment 1 we introduce BT-Prompt, a specially designed example-task format that allows LLMs to maximize their few-shot learning performance. The results of Experiment 1 illustrate the effectiveness of LLM-generated BTs compared to other formats in few-shot scenarios. Second, in Experiment 2 we demonstrate our method's ability to design hierarchical structures, highlighting the pivotal role of mid-level subtrees in the successful execution of complex tasks across various LLM backbones and temperature settings. As intended by the research setup, for sub-optimal BTs, LLMs were used to autonomously generate mid-level subtrees without human guidance. These autonomously generated mid-level subtrees then become the central element of Experiment 3, where we apply our methods to a series of challenging tasks defined by a robotics competition. In this context, the pre-existing hand-coded mid-level subtrees proved insufficient, whereas integrating the LLM-generated mid-level subtrees into the existing functions improved task execution on these intricate challenges. Finally, this research collects illustrative examples of positive and negative prompting strategies for LLM-generated BTs, with demonstrations of successful executions in virtual and real environments.



Appendix A: Two main failure types

Type 1 - The inclusion of unnecessary subtrees, or the omission of necessary ones, can result in task failure.
When the LLM generates the "inspection" task, it incorporates the subtree in the red-boxed area into the task plan. However, this subtree is designed for "tracking object" and therefore requires a target object to satisfy its completion condition. Since the "inspection" task does not involve detecting objects, the behavior tree may halt at this stage due to the absence of a target object, unable to progress further.
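This failure mode can be made concrete with a minimal hand-rolled sketch (not the paper's actual BT implementation; the leaf names and statuses below are hypothetical). A sequence node ticks its children in order and halts on the first non-success status, so an unnecessary "track object" leaf that never receives a target keeps the whole "inspection" plan from completing:

```python
# Illustrative sketch of Type 1, assuming a minimal three-status
# behavior-tree model. Not the paper's implementation.
SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"

def sequence(children, state):
    # Ticks children in order; halts on the first non-SUCCESS status.
    for child in children:
        status = child(state)
        if status != SUCCESS:
            return status
    return SUCCESS

def navigate_waypoints(state):
    state["log"].append("navigate")
    return SUCCESS

def track_object(state):
    # Requires a target object; the "inspection" task never provides
    # one, so this leaf keeps returning RUNNING.
    if state.get("target") is None:
        return RUNNING
    state["log"].append("track")
    return SUCCESS

def report_done(state):
    state["log"].append("report")
    return SUCCESS

# LLM-generated "inspection" plan with the unnecessary subtree included.
state = {"target": None, "log": []}
status = sequence([navigate_waypoints, track_object, report_done], state)
assert status == RUNNING              # tree halts at the tracking leaf
assert "report" not in state["log"]   # inspection never completes
```

The same plan without the tracking leaf would return SUCCESS, which is exactly why pruning unnecessary subtrees matters.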


Type 2 - Misinterpretation of the logic expressed through example tasks can result in task failure.
When the LLM generates the "object navigation" task, it misinterprets the logic-expression symbols within the red-boxed area (turning a fallback into a sequence). Because of this incorrect interpretation of the logic-expression symbols, the behavior tree is unable to transition to the tracking state even when it detects the target object.
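The fallback-versus-sequence confusion can be illustrated with a minimal sketch (again not the paper's implementation; the leaf names are hypothetical). A fallback succeeds as soon as one child succeeds, while a sequence fails as soon as one child fails, so swapping the two inverts when the tracking action runs:

```python
# Illustrative sketch of Type 2, assuming minimal two-status
# behavior-tree semantics. Not the paper's implementation.
SUCCESS, FAILURE = "SUCCESS", "FAILURE"

def sequence(children, state):
    # Runs children in order; fails as soon as one fails.
    for child in children:
        if child(state) == FAILURE:
            return FAILURE
    return SUCCESS

def fallback(children, state):
    # Tries children in order; succeeds as soon as one succeeds.
    for child in children:
        if child(state) == SUCCESS:
            return SUCCESS
    return FAILURE

# Hypothetical leaves for an "object navigation" task.
def target_not_detected(state):
    return SUCCESS if not state["detected"] else FAILURE

def explore(state):
    return SUCCESS  # keep exploring

def track_target(state):
    state["tracking"] = True
    return SUCCESS

# Correct root: a fallback. Once the target IS detected, the guard
# fails and control falls through to track_target.
def correct_root(state):
    return fallback([
        lambda s: sequence([target_not_detected, explore], s),
        track_target,
    ], state)

# Incorrect root: the fallback was generated as a sequence, so a
# detected target makes the guard fail and aborts the whole branch;
# the tree never transitions to tracking.
def incorrect_root(state):
    return sequence([
        lambda s: sequence([target_not_detected, explore], s),
        track_target,
    ], state)

state = {"detected": True, "tracking": False}
correct_root(state)
assert state["tracking"] is True

state = {"detected": True, "tracking": False}
incorrect_root(state)
assert state["tracking"] is False
```

One swapped node symbol is enough to change the tree's control flow entirely, which is why the choice of logic-expression symbols in the few-shot examples matters.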


Appendix B: New Tasks in Experiment 3 (RobotX Challenge 2022)

Detect and Dock (Dock.):
In this task, there's a floating platform with three bays, each having a different color (red, green, or blue). The AMS detects the color and docks in the bay with the matching color.

Scan the code (Scan.):
In this task, there's a three-sided light tower on a floating platform that displays a sequence of RGB lights. The AMS observes the colors and their order and uses this sequence for other tasks in later rounds.

Find and Fling (Shoot.):
In this task, there's a floating platform with three panels, each having a colored square and two holes. The AMS detects a designated color and shoots racquetballs through the panel's holes. Each team gets four racquetballs for this.

Entrance and Exit Gates (Gate.):
In this task, the AMS needs to go through gates marked by colored buoys and underwater beacons. The AMS must detect the underwater beacon signals between these gates and enter through them before moving on to other tasks. The task's complexity increases in each round and includes elements from other tasks. The beacons have different frequencies and activate one at a time during the task. There are three gates:
    Gate 1 has a red and a white buoy.
    Gate 2 has two white buoys.
    Gate 3 has a white and a green buoy.

Wildlife Encounter (Wildlife.):
This task involves three floating platforms that look like Australian marine animals: a platypus, a turtle, and a crocodile. The AMS detects and reacts to them using a Hyperspectral Imaging (HSI) camera. Teams can use a UAV for help. After detection, the AMS circles the platypus clockwise, the turtle counterclockwise, and the crocodile twice in any direction.

UAV Search and Report (Search.):
In this land-based UAV task mimicking a search and rescue operation, the UAV starts from one point, searches a marked field with orange markers, finds two objects there, and lands at another designated point. Teams can use any search pattern within the field boundaries. They report the objects they locate and their exact positions.

Follow the Path (Follow.):
This task requires the Autonomous Maritime System (AMS) to follow a designated path of white buoys, pass through six pairs of red and green buoys, and exit through another set of white buoys. The AMS must steer clear of randomly placed obstacles, symbolized by round black buoys. Teams have the option to employ a UAV for assistance in completing this task.

UAV Replenishment (Deliver.):
This task involves using a UAV. The UAV takes off from a USV, finds a floating helipad, picks up a small colored tin, delivers the tin to a circular target area on another floating helipad, and then returns to the USV.