Probabilistic intersection-over-union for training and evaluation of oriented object detectors

Federal University of Rio Grande do Sul
Transactions on Image Processing (IEEE TIP) 2024

Comparison with other methods

[Figure: side-by-side comparison. Panels: Image, Smooth L1, GWD, KLD, ProbIoU (Ours)]

Abstract

Oriented object detection is a challenging and relatively new problem. Most existing approaches are based on deep learning and explore Oriented Bounding Boxes (OBBs) to represent the objects. They are typically based on adaptations of traditional detectors that work with Horizontal Bounding Boxes (HBBs), which have been exploring IoU-like loss functions to regress the HBBs. However, extending this idea to OBBs is challenging due to complex formulations or the need for customized backpropagation implementations. Furthermore, using OBBs presents limitations for irregular or roughly circular objects, since the definition of the ideal OBB is an ambiguous and ill-posed problem. In this work, we jointly tackle the problem of training, representing, and evaluating oriented detectors. We explore Gaussian distributions -- called Gaussian Bounding Boxes (GBBs) -- as fuzzy representations for oriented objects and propose using a similarity metric between two GBBs based on the Hellinger distance. We show that this metric leads to a differentiable closed-form expression that can be directly used as a localization loss term to train OBB object detectors. We also show that GBBs present a natural representation as elliptical regions (called EBBs), which inherently mitigate the representation ambiguity for circular objects. Finally, we empirically show that the proposed similarity metric computed between two GBBs strongly correlates with the IoU between the corresponding EBBs, motivating the name Probabilistic Intersection-over-Union (ProbIoU). Our experiments show that results using ProbIoU as a regression loss are competitive with state-of-the-art alternatives without requiring additional hyperparameters or customized implementations, and that ProbIoU is a promising alternative to evaluate oriented object detectors.

Approach

  • With a fuzzy object representation based on GBBs, the Bhattacharyya distance \( B_D \) between two objects has a closed-form expression in terms of the GBB parameters.
  • Considering that \( p \sim \mathcal{N}(\boldsymbol{\mu}_1, \Sigma_1) \) and \( q \sim \mathcal{N}(\boldsymbol{\mu}_2, \Sigma_2) \) are Gaussian distributions with
$$ \boldsymbol{\mu}_1 = \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}, \quad \Sigma_1 = \begin{bmatrix} a_1 & c_1 \\ c_1 & b_1 \end{bmatrix}, \quad \boldsymbol{\mu}_2 = \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}, \quad \Sigma_2 = \begin{bmatrix} a_2 & c_2 \\ c_2 & b_2 \end{bmatrix} $$

we obtain:

$$ B_D = \frac{1}{8}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T \Sigma^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) + \frac{1}{2} \ln\left( \frac{\det \Sigma} {\sqrt{\det \Sigma_1 \det \Sigma_2}} \right), \quad \Sigma = \frac{1}{2}(\Sigma_1 + \Sigma_2) $$

from which the Bhattacharyya coefficient \( B_C \), the Hellinger distance \( H_D \), and ProbIoU follow:

$$ B_C = e^{-B_D}, \quad H_D(p,q) = \sqrt{1 - B_C(p,q)}, \quad \text{ProbIoU}(p,q) = 1 - H_D(p,q) $$
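  • \( \text{ProbIoU}(p,q) \) ranges from 0 to 1. The table below compares its properties with those of other localization losses:

| Loss | Implementation | Scale invariance | Metric properties | Hyperparameters |
|---|---|---|---|---|
| r-IoU | Hard | ✓ | ✓ | -- |
| Smooth ℓ₁ | Easy | × | × | -- |
| GWD | Easy | × | × | τ, f(·) |
| KLD | Easy | ✓ | × | τ, f(·) |
| ProbIoU | Easy | ✓ | ✓ | -- |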
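
Substituting \( \Sigma = \frac{1}{2}(\Sigma_1 + \Sigma_2) \) into the expression for \( B_D \) and expanding the 2×2 inverse and determinants gives one way to write the closed form directly in the GBB parameters. This is a worked sketch; the split \( B_D = B_1 + B_2 \) is notation introduced here for readability:

$$ B_1 = \frac{1}{4} \, \frac{(a_1 + a_2)(y_1 - y_2)^2 + (b_1 + b_2)(x_1 - x_2)^2 - 2(c_1 + c_2)(x_1 - x_2)(y_1 - y_2)}{(a_1 + a_2)(b_1 + b_2) - (c_1 + c_2)^2} $$

$$ B_2 = \frac{1}{2} \ln\left( \frac{(a_1 + a_2)(b_1 + b_2) - (c_1 + c_2)^2}{4 \sqrt{(a_1 b_1 - c_1^2)(a_2 b_2 - c_2^2)}} \right) $$

Here \( B_1 \) is the quadratic term \( \frac{1}{8}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T \Sigma^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \) and \( B_2 \) is the logarithmic term, using \( \det \Sigma = \frac{1}{4}\left[(a_1 + a_2)(b_1 + b_2) - (c_1 + c_2)^2\right] \).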

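
The expressions for \( B_D \), \( B_C \), \( H_D \), and ProbIoU in the Approach can be sketched in NumPy as follows, assuming each GBB is given as a mean vector and a 2×2 covariance matrix. The function names `probiou` and `obb_to_gbb` are hypothetical. The OBB-to-GBB conversion (covariance of a uniform distribution over the rotated rectangle, i.e. variances w²/12 and h²/12) is an assumption made for illustration and is not stated on this page.

```python
import numpy as np


def obb_to_gbb(cx, cy, w, h, angle):
    """Hypothetical helper: map an oriented box to a Gaussian (mean, covariance).

    Assumption: the covariance is that of a uniform distribution over the
    rectangle (variances w^2/12 and h^2/12), rotated by `angle` radians.
    """
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle), np.cos(angle)]])
    cov = rot @ np.diag([w ** 2 / 12.0, h ** 2 / 12.0]) @ rot.T
    return np.array([cx, cy], dtype=float), cov


def probiou(mu1, sigma1, mu2, sigma2):
    """ProbIoU between two 2-D Gaussians via the Bhattacharyya distance."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)

    sigma = 0.5 * (sigma1 + sigma2)
    diff = mu1 - mu2

    # First term: (1/8) * diff^T Sigma^{-1} diff (solve avoids an explicit inverse).
    mahalanobis = diff @ np.linalg.solve(sigma, diff)
    # Second term: (1/2) * ln(det Sigma / sqrt(det Sigma1 * det Sigma2)).
    log_term = 0.5 * np.log(
        np.linalg.det(sigma)
        / np.sqrt(np.linalg.det(sigma1) * np.linalg.det(sigma2))
    )
    b_d = 0.125 * mahalanobis + log_term

    b_c = np.exp(-b_d)  # Bhattacharyya coefficient
    # Clamp guards against tiny negative values from floating-point rounding.
    h_d = np.sqrt(max(1.0 - b_c, 0.0))  # Hellinger distance
    return 1.0 - h_d


if __name__ == "__main__":
    p = obb_to_gbb(0.0, 0.0, 4.0, 2.0, 0.0)
    q = obb_to_gbb(1.0, 0.5, 4.0, 2.0, np.pi / 6)
    print("ProbIoU(p, p) =", probiou(*p, *p))
    print("ProbIoU(p, q) =", probiou(*p, *q))
```

Under these assumptions, identical GBBs give a ProbIoU of 1 (up to rounding) and the value decays toward 0 as the Gaussians separate. A square box maps to the same GBB at any rotation angle, which illustrates how the Gaussian representation avoids the orientation ambiguity of roughly circular objects. A training loss would use a differentiable framework rather than NumPy; this sketch only evaluates the metric.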

Results

We use the mAP (mean Average Precision) metric to compare our method with other state-of-the-art methods.


For DOTAv1, results come from its evaluation server (AP50 only).


BibTeX

@ARTICLE{10382963,
  author={Murrugarra-Llerena, Jeffri and Kirsten, Lucas N. and Zeni, Luis Felipe and Jung, Claudio R.},
  journal={IEEE Transactions on Image Processing}, 
  title={Probabilistic Intersection-Over-Union for Training and Evaluation of Oriented Object Detectors}, 
  year={2024},
  volume={33},
  number={},
  pages={671-681},
  keywords={Detectors;Location awareness;Object detection;Measurement;Training;Gaussian distribution;Annotations;Computer vision;object detection;performance evaluation},
  doi={10.1109/TIP.2023.3348697}}