UACE: Reinforcement Learning for Marine Path Planning
- Day: June 17, Tuesday
Location / Time: B. ERATO at 09:50 - 10:10
- Last-minute changes: -
- Session: 25. Unmanned vehicles for underwater acoustic surveillance and monitoring
Organiser(s): Alain Maguer
Chairperson(s): Alain Maguer
- Lecture: Reinforcement Learning for Marine Path Planning
Paper ID: 2294
Author(s): Edward Clark, Alfie Anthony Treloar, Alan Hunter, Marcus Donnelly
Presenter: Edward Clark
Abstract:
Reinforcement learning (RL) has been successfully applied to low-level control tasks in ocean robotics and is increasingly used for conventional path-planning tasks. However, there is a significant gap in research on using RL for high-level path planning in uncertain marine acoustic environments. This work presents two case studies demonstrating the application of RL to high-level path planning in such scenarios.

The first case study explores a path-planning task in a simulated ocean environment using MARLIN, an ocean acoustic digital twin, where the goal is to guide a marine vehicle to maximize the detection probability of a stationary source. The second case study involves a synthetic aperture sonar task, where the vehicle must plan a path to maximize the detection probability of an unknown target located on the seabed. Our results demonstrate that RL can effectively plan paths in both environments, showcasing its versatility for complex marine applications.

We highlight the critical role of environmental assumptions and reward function design in influencing performance. The study defines 'good' performance in these tasks and examines how the choice of algorithm and hyperparameters impacts the agent's effectiveness. Furthermore, we address the challenges introduced by the stochastic nature of the environment, which can make training RL agents more difficult.

To overcome these challenges, we employ curriculum learning, training agents progressively from simpler environments to more complex ones. This approach not only improves the agents' learning efficiency but also allows for detailed interrogation of their decision-making processes at each level of complexity. By understanding how behaviors emerge during curriculum learning, end users can trace the origins of specific decisions and identify the curriculum elements that influenced them.
This insight into the emergence of agent behavior is critical for deploying RL in real-world marine applications, ensuring transparency and reliability in operational scenarios.
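The curriculum idea described in the abstract can be illustrated with a toy sketch: a tabular Q-learning agent on a one-dimensional "path" whose length grows in stages, with the Q-table carried over between stages so behavior learned on short grids seeds learning on longer ones. This is an illustrative assumption on our part, not the authors' method: the grid sizes, reward values, and hyperparameters below are all hypothetical, and the paper's actual environments (MARLIN, synthetic aperture sonar) are far richer.

```python
import random

random.seed(0)

def train(grid_size, episodes, q=None, alpha=0.5, gamma=0.95, eps=0.2):
    """Tabular Q-learning on a toy path task: start at cell 0,
    reach the target cell (grid_size - 1). Reuses q across stages."""
    if q is None:
        q = {}
    actions = (-1, 1)          # step left or right
    target = grid_size - 1
    for _ in range(episodes):
        s = 0
        for _ in range(4 * grid_size):            # per-episode step budget
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: q.get((s, act), 0.0))
            s2 = min(max(s + a, 0), target)       # clip to grid
            r = 1.0 if s2 == target else -0.01    # small step penalty
            best_next = max(q.get((s2, act), 0.0) for act in actions)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s2
            if s == target:
                break
    return q

# Curriculum: train on progressively longer grids, carrying the Q-table over.
q = None
for size in (3, 5, 9):
    q = train(size, episodes=300, q=q)

# Greedy rollout on the hardest grid to check the learned path.
s, steps = 0, 0
while s != 8 and steps < 20:
    a = max((-1, 1), key=lambda act: q.get((s, act), 0.0))
    s = min(max(s + a, 0), 8)
    steps += 1
```

Because states from the easy grids are a subset of the hard grid's states, each stage starts from a partially correct value function, which is the efficiency and interpretability benefit the abstract attributes to curriculum learning: one can inspect the Q-table after each stage to see which behaviors each curriculum element introduced.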
This paper is a candidate for the "Prof. Leif Bjørnø Best Student Paper Award (for students under 35)"
- Corresponding author: Mr Edward Clark
Affiliation: University of Bath
Country: United Kingdom