2023 Programme: Optimising Sensor Path Planning with Reinforcement Learning and Passive Sonar Modelling



  • Session: 15. Towards Automatic Target Recognition: Detection, Classification and Modelling
    Organiser(s): Johannes Groen, Yan Pailhas, Roy Edgar Hansen, Jessica Topple and Narada Warakagoda
  • Lecture: Optimising Sensor Path Planning with Reinforcement Learning and Passive Sonar Modelling
    Paper ID: 1915
    Author(s): Clark Edward, Hunter Alan, Isupova Olga, Donnelly Marcus
    Presenter: Clark Edward
    Abstract: This paper presents a solution for optimising sensor path planning in marine sensor management using reinforcement learning (RL). RL is a type of machine learning in which an intelligent system, known as an agent, learns to make effective decisions by interacting with its environment. Acoustic propagation modelling is integrated into the RL framework using the standard Python RL library Gym and algorithms from the Stable-Baselines3 library. The observation space encompasses signal-to-noise ratio (SNR) information, platform position, bathymetry, and sound speed data. The action space is discretised into 16 horizontal directions and 3 vertical levels, resulting in a 49-dimensional action space. The reward function combines a penalty for movement with rewards for navigating to high-SNR regions. SNR is calculated using PyRAM, a Python implementation of the RAM parabolic equation solution to the Helmholtz equation. The RL agent uses proximal policy optimisation (PPO) to learn the management policy. The learnt policy is compared against a gradient ascent policy and an 'oracle' policy that uses perfect knowledge of the source location for direct navigation. The learning process converges to a stable median reward after between 2.5 and 3 million learning steps. The results demonstrate that the learnt policy closely matches the 'oracle' policy in both reward distribution and behaviour, and that it outperforms the gradient ascent policy in a realistic environment.
    (An illustrative code sketch of the Gym environment and PPO training setup described in this abstract appears after the programme entry below.)
  • Corresponding author: Mr Edward Clark
    Affiliation: University Of Bath
    Country: United Kingdom
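As a rough illustration of the pipeline the abstract describes, the sketch below sets up a toy Gym-style environment and trains a PPO agent from Stable-Baselines3 on it. The environment class SonarPathEnv, its grid geometry, reward weights, and the analytic SNR field standing in for PyRAM propagation runs are all illustrative assumptions rather than the authors' implementation, and the example uses the Gymnasium fork of the Gym API expected by current Stable-Baselines3 releases.

```python
# Minimal, illustrative Gym-style environment for sensor path planning.
# Names, grid sizes, reward weights and the analytic SNR field are
# placeholders; the paper instead couples the environment to PyRAM
# parabolic-equation propagation runs.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class SonarPathEnv(gym.Env):
    """Toy stand-in for a passive-sonar path-planning environment."""

    def __init__(self, n_headings=16, n_depths=3):
        super().__init__()
        # 16 horizontal headings x 3 vertical levels, plus a 'hold position'
        # action (one possible reading of the 49-action space in the abstract).
        self.n_headings = n_headings
        self.n_depths = n_depths
        self.action_space = spaces.Discrete(n_headings * n_depths + 1)
        # Observation: SNR estimate, platform position (x, y, depth level),
        # local bathymetry and a sound-speed summary -- all placeholders.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(6,), dtype=np.float32)

    def _snr(self, pos):
        # Stand-in for a PyRAM run: a smooth field peaking at a source
        # location that is unknown to the agent.
        source = np.array([40.0, 60.0, 1.0])
        return float(30.0 - 0.2 * np.linalg.norm(pos - source))

    def _obs(self):
        bathy, sound_speed = 200.0, 1500.0   # placeholder environment data
        return np.array([self._snr(self.pos), *self.pos, bathy, sound_speed],
                        dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = np.array([0.0, 0.0, 1.0])
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        if action < self.n_headings * self.n_depths:      # move
            heading = (action % self.n_headings) * 2 * np.pi / self.n_headings
            depth_level = action // self.n_headings
            self.pos[:2] += np.array([np.cos(heading), np.sin(heading)])
            self.pos[2] = float(depth_level)
        # Reward: bonus for high-SNR regions minus a fixed movement penalty.
        reward = 0.1 * self._snr(self.pos) - 1.0
        self.steps += 1
        terminated = False
        truncated = self.steps >= 200
        return self._obs(), reward, terminated, truncated, {}


if __name__ == "__main__":
    # PPO agent from Stable-Baselines3, as in the abstract; the step budget
    # here is far below the 2.5-3 million steps reported by the authors.
    model = PPO("MlpPolicy", SonarPathEnv(), verbose=1)
    model.learn(total_timesteps=10_000)
```

The two pieces a real implementation would replace are the analytic SNR field (with PyRAM transmission-loss runs over the bathymetry and sound-speed data) and the simple reward weights trading movement cost against SNR gain; the discrete heading/depth action set and the Gym-plus-Stable-Baselines3 training loop follow the structure described in the abstract.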