UACE2017 Proceedings: Fine-tuning vs Full Training of Deep Neural Networks for Seafloor Mine Recognition in Sonar Images



  • Session:
    Towards Automatic Target Recognition. Detection, Classification and Modelling
  • Paper:
    Fine-tuning vs Full Training of Deep Neural Networks for Seafloor Mine Recognition in Sonar Images
  • Author(s):
    Narada Warakagoda, Øivind Midtgaard
  • Abstract:
    Deep Convolutional Neural Networks (DCNNs) are a promising approach to automatic target recognition (ATR) that effectively combines feature extraction and classification. A DCNN can automatically learn the features relevant to the recognition task at hand from the training data, and thus potentially eliminates the tedious, manual feature design process characteristic of traditional classification systems. However, a huge amount of labeled data is necessary to successfully train a DCNN of typical size. Establishing labeled sonar images with mine targets is cost intensive, making a sufficiently large database infeasible.
    Fine-tuning is a commonly applied technique when the available amount of training data is insufficient. The basic idea is to feed the available new training data into an already trained DCNN to generate feature vectors, and then train a simpler classifier on these feature vectors. This approach has given good results, even when the semantics of the fine-tuning data and the original training data are significantly different. However, it is uncertain whether this approach will work well for sonar images, because most of the well-known pre-trained DCNNs, such as AlexNet, VGG and ResNet, are trained on optical images. The fine-tuning data and original training data then differ not only in semantics, but also in the physics of image generation.
    In this work, we evaluated fine-tuning of two popular DCNNs, AlexNet and VGG16, on Synthetic Aperture Sonar (SAS) images. The experiments were conducted in the context of a seafloor mine recognition task where four classes were considered: Cylinder, Manta, other mines and non-mine objects. The pre-trained DCNN was fed with the sonar images of the training set, and the corresponding feature vectors were extracted at different locations of the fully connected part of the network. Simple three-layer MLPs (multilayer perceptrons) were trained on these feature vectors. The performance of the resulting networks was evaluated using an independent test set.
    In another line of experiments, the whole network was retrained on the sonar image data with the pre-trained parameters as initial values. In a last series of experiments, a small DCNN was trained from scratch on the same sonar images of the training set. Unlike AlexNet and VGG16, this network has a far lower number of parameters, and hence it is feasible to train it on the available training data. In the full paper we compare the three approaches in detail with respect to recognition accuracy and training time/resource usage.
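The feature-extraction style of fine-tuning described in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation; it is a minimal example assuming PyTorch/torchvision, an ImageNet-pretrained AlexNet as a frozen backbone, a feature tap after the first fully connected layer (fc6, 4096-dimensional), and the four-class task (Cylinder, Manta, other mines, non-mine objects). Layer sizes, learning rate and preprocessing are illustrative assumptions only.

import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # Cylinder, Manta, other mines, non-mine objects

# AlexNet with ImageNet (optical-image) weights; SAS chips would need to be
# resized to 224x224 and replicated to 3 channels to match this input format.
backbone = models.alexnet(weights="IMAGENET1K_V1")

# Freeze the backbone so it acts purely as a fixed feature extractor.
for p in backbone.parameters():
    p.requires_grad = False

# Keep the fully connected part only up to fc6 (Dropout, Linear 9216->4096, ReLU),
# i.e. one possible location in the fully connected part to tap feature vectors from.
backbone.classifier = nn.Sequential(*list(backbone.classifier.children())[:3])
FEATURE_DIM = 4096

# Simple three-layer MLP trained on the extracted 4096-dim feature vectors.
mlp = nn.Sequential(
    nn.Linear(FEATURE_DIM, 512),
    nn.ReLU(),
    nn.Linear(512, 128),
    nn.ReLU(),
    nn.Linear(128, NUM_CLASSES),
)

model = nn.Sequential(backbone, mlp)
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)  # only the MLP is updated
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One optimisation step on a batch of (N, 3, 224, 224) image chips."""
    backbone.eval()  # keep the frozen part's dropout inactive
    mlp.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# For the second line of experiments in the abstract (full retraining with the
# pre-trained parameters as initial values), one would instead leave all
# parameters trainable and pass model.parameters() to the optimizer.

The third approach in the abstract, training a small DCNN from scratch, would replace the backbone entirely with a few convolutional layers sized to the available amount of sonar training data.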

Contact details

  • Contact person:
    Dr Narada Warakagoda
  • e-mail:
  • Affiliation:
    Norwegian Defence Research Establishment
  • Country:
    Norway