Image segmentation

Single-class segmentation

A model makes a prediction for each class (K) at each point (H, W) in the output. Typically an extra background class is added, giving a final class dimension of length K + 1.

The final prediction is constructed by taking the argmax over the last dimension to get the most probable class at each point. Single-class segmentation models are most often trained with a softmax, so the class probabilities at each point sum to 1.0.

Input: [H, W, C]
Label: [H, W] or [H, W, K + 1]
Output: [H, W, K + 1]
Prediction: [H, W]
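
The shape contract above can be sketched with numpy (names and the random logits are illustrative, not part of any SDK):

```python
import numpy as np

H, W, K = 3, 3, 2  # toy spatial size and two foreground classes
rng = np.random.default_rng(0)
logits = rng.normal(size=(H, W, K + 1))  # [H, W, K + 1], including background

# Softmax over the class dimension: probabilities at each point sum to 1.0
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)
assert np.allclose(probs.sum(axis=-1), 1.0)

prediction = np.argmax(probs, axis=-1)  # [H, W], one class index per point
```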

Code sample

# Get the most probable class from predictions [H, W, K+1]
arr = np.argmax(prediction, axis=-1)
# Output [H, W]
[[0, 0, 0],
 [1, 1, 1],
 [2, 2, 2]]

# Construct a single MaskData
md = MaskData.from_2D_arr(arr)
# Create an annotation for each non-background class
annotations = [
    ObjectAnnotation(value=Mask(mask=md, color=1), name="dog"),
    ObjectAnnotation(value=Mask(mask=md, color=2), name="cat"),
]
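
If labels arrive one-hot encoded as [H, W, K + 1] rather than [H, W], the same argmax collapses them to class indices. A minimal sketch (the toy label values are illustrative):

```python
import numpy as np

# Hypothetical one-hot label for a 2x3 image, classes {0: background, 1: dog, 2: cat}
one_hot = np.array([
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[0, 1, 0], [0, 1, 0], [1, 0, 0]],
])  # [H, W, K + 1]

label = np.argmax(one_hot, axis=-1)  # [H, W]
# label:
# [[0, 1, 2],
#  [1, 1, 0]]
```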

Multi-class segmentation

A model makes a prediction for each class (K) at each point (H, W) in the output. The final prediction is constructed by thresholding the prediction for each class at each point: if the probability exceeds the threshold, the class is detected at that point. Multi-class segmentation models are most often trained with sigmoid cross-entropy, so each value is an independent probability between 0 and 1, and a single point can be detected as more than one class.

Input: [H, W, C]
Label: [H, W, K]
Output: [H, W, K]
Prediction: [H, W, K]
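
Because the sigmoid is applied per class, the probabilities at a point are independent and need not sum to 1. A brief sketch of that, with made-up logit values:

```python
import numpy as np

# [1, 2, K] with K = 2; values are illustrative logits
logits = np.array([[[2.0, 1.5], [-3.0, 0.2]]])
probs = 1.0 / (1.0 + np.exp(-logits))  # element-wise sigmoid

detected = probs > 0.5
# detected[0, 0] is [True, True]: both classes present at the same point
```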

Code sample

# Threshold the predictions [H, W, K]
threshold = 0.5
arr = (prediction > threshold).astype(np.uint8) # threshold and convert to uint8
# arr: [H, W, K]
arr[:, :, 0]
# One channel
[[0, 0, 0],
 [1, 1, 1],
 [0, 0, 0]]

annotations = []
for idx, name in enumerate(["dog", "cat"]):
    md = MaskData.from_2D_arr(arr[:, :, idx]) # new mask data for each class channel
    annotations.append(
        ObjectAnnotation(value=Mask(mask=md, color=1), name=name) # each channel is binary, so the class value is 1
    )
    )
