Bounding box detection models predict where objects are located within an image. Each prediction has two components: a bounding box and a classification. The bounding boxes form an array of shape [B, 4], where B is the number of boxes and the four values (x, y, h, w) localize each box. The classification is most often single-label, with K + 1 channels per box: one for each of the K classes, plus one background channel at index 0.
Input: [H, W, C]
Bounding Boxes: [B, (x, y, h, w)]
Classification: [B, K + 1]
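
The shapes above can be checked before any decoding is attempted. The following is a minimal sketch, not part of any library: the helper name `check_prediction_shapes` and its `num_classes` parameter are hypothetical, and it assumes the classification array carries exactly one extra channel for background.

```python
import numpy as np


def check_prediction_shapes(bboxes, probs, num_classes):
    """Hypothetical helper: validate the [B, 4] and [B, K + 1] shape contract.

    Returns the number of boxes B, or raises ValueError on a mismatch.
    """
    bboxes = np.asarray(bboxes, dtype=float)
    probs = np.asarray(probs, dtype=float)

    # Bounding boxes: one row per box, four localization values (x, y, h, w).
    if bboxes.ndim != 2 or bboxes.shape[1] != 4:
        raise ValueError(f"expected bboxes of shape [B, 4], got {bboxes.shape}")

    # Classification: one row per box, K class channels plus one background channel.
    expected = (bboxes.shape[0], num_classes + 1)
    if probs.shape != expected:
        raise ValueError(f"expected probs of shape {expected}, got {probs.shape}")

    return bboxes.shape[0]
```

Calling `check_prediction_shapes(bboxes, probs, num_classes=len(classes))` right after inference would surface a missing background channel early, rather than as an off-by-one class name later.
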
    import numpy as np
    # ObjectAnnotation and Rectangle are assumed to be imported from the
    # annotation library this guide belongs to.

    classes = ["dog", "cat"]

    # [B, (x, y, h, w)]
    bboxes = np.array([
        [0., 0., 5., 7.],
        [2., 0.3, 4., 4.],
    ])

    # [B, K + 1]; channel 0 is background
    probs = np.array([
        [0, .4, .6],
        [0, .8, .2],
    ])

    annotations = []
    for bbox, prob in zip(bboxes, probs):
        pred = prob.argmax()  # most probable channel for this box
        if pred > 0:  # ignore if background most probable
            annotations.append(
                ObjectAnnotation(
                    value=Rectangle.from_xyhw(*bbox),
                    name=classes[pred - 1],
                )
            )
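
The same decoding step can also be written without a Python loop over the probabilities. The sketch below is one possible NumPy-only version, under the same assumption that channel 0 is background; the function name `decode_predictions` and the plain-dict output format are hypothetical stand-ins used so the example runs without the annotation classes.

```python
import numpy as np


def decode_predictions(bboxes, probs, classes):
    """Hypothetical helper: keep boxes whose most probable channel is not background.

    bboxes: [B, 4] array of (x, y, h, w)
    probs:  [B, K + 1] array, channel 0 assumed to be background
    classes: list of K class names
    """
    bboxes = np.asarray(bboxes, dtype=float)
    probs = np.asarray(probs, dtype=float)

    preds = probs.argmax(axis=1)   # most probable channel per box
    scores = probs.max(axis=1)     # probability of that channel
    keep = preds > 0               # drop boxes predicted as background

    return [
        {"name": classes[int(p) - 1], "bbox": tuple(box.tolist()), "score": float(s)}
        for box, p, s in zip(bboxes[keep], preds[keep], scores[keep])
    ]


classes = ["dog", "cat"]
bboxes = np.array([[0., 0., 5., 7.], [2., 0.3, 4., 4.]])
probs = np.array([[0, .4, .6], [0, .8, .2]])

results = decode_predictions(bboxes, probs, classes)
print([r["name"] for r in results])  # ['cat', 'dog']
```

Each dictionary could then be mapped onto an annotation object in the same way as the loop above; keeping the score alongside the name makes it easy to apply a confidence threshold later.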