Object detection

Bounding box

Bounding box detection models predict where objects are located within an image. Each prediction has two components: a bounding box and a classification. The bounding box component is an array of shape [B, 4], where B is the number of boxes and the four values localize each box. The classification is most often single-label classification over K + 1 channels: K object classes plus one background class.

Input: [H, W, C]
Bounding Boxes: [B, (x, y, h, w)]
Classification: [B, K + 1]

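import numpy as np
# ObjectAnnotation and Rectangle are assumed to be imported from the annotation library in use.

classes = ["dog", "cat"]
# [B, (x, y, h, w)]
bboxes = np.array([
  [0.,  0., 5., 7.],
  [2., 0.3, 4., 4.],
])
# [B, K + 1]; channel 0 is the background class
probs = np.array([
  [0, .4, .6],
  [0, .8, .2]
])

annotations = []
for bbox, prob in zip(bboxes, probs):
  pred = prob.argmax()  # index of the most probable channel for this box
  if pred > 0:  # ignore if background most probable
    annotations.append(
      ObjectAnnotation(
        value=Rectangle.from_xyhw(*bbox),
        name=classes[pred - 1]  # shift by one to skip the background channel
      )
    )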
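
The per-box loop can also be written in vectorized form. The following is a minimal sketch, assuming channel 0 is the background class as in the example above. The helper name `decode_predictions` is hypothetical, and it returns plain `(name, box)` tuples instead of `ObjectAnnotation` objects so that it runs with NumPy alone.

```python
import numpy as np

def decode_predictions(bboxes, probs, classes):
    """Return (class_name, box) pairs for every non-background prediction.

    Assumption: channel 0 of `probs` is the background class.
    """
    bboxes = np.asarray(bboxes, dtype=float)  # [B, 4] as (x, y, h, w)
    probs = np.asarray(probs, dtype=float)    # [B, K + 1]
    preds = probs.argmax(axis=1)              # most probable channel per box
    keep = preds > 0                          # drop boxes predicted as background
    return [
        (classes[p - 1], tuple(box.tolist()))  # shift by one to skip background
        for p, box in zip(preds[keep], bboxes[keep])
    ]

classes = ["dog", "cat"]
bboxes = [[0., 0., 5., 7.], [2., 0.3, 4., 4.]]
probs = [[0, .4, .6], [0, .8, .2]]

results = decode_predictions(bboxes, probs, classes)
print(results)
```

Taking the argmax along axis 1 yields one class index per box, which avoids accidentally reducing over the whole [B, K + 1] array.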
