Remove Small Components from a Mask#
Segmentation outputs often have noise: small disconnected regions that aren’t part of the main object. This guide shows how to filter them.
By minimum size#
Remove all connected components smaller than a threshold:
cleaned = mask.remove_small_components(min_size=100)
This keeps only components with at least 100 pixels.
Keep only the largest#
If you know there’s exactly one object, keep only the largest component:
largest = mask.largest_connected_component()
Connectivity#
Both methods accept a connectivity parameter:
connectivity=4: only horizontal/vertical neighbors count (default)connectivity=8: diagonal neighbors also count
# Stricter connectivity (no diagonals)
cleaned = mask.remove_small_components(min_size=100, connectivity=4)
Get all components separately#
To inspect or filter components with custom logic:
components = mask.connected_components()
# Filter by your own criteria
kept = [c for c in components if c.area() > 50 and c.bbox()[2] > 10]
# Recombine
result = RLEMask.union(kept) if kept else RLEMask.zeros(mask.shape)
Get components with stats#
For efficient filtering based on component properties, get stats in a single pass:
components, stats = mask.connected_components_with_stats()
areas, bboxes, centroids = stats
# areas: shape (n,) - pixel count per component
# bboxes: shape (n, 4) - [x, y, width, height] per component
# centroids: shape (n, 2) - [x, y] center of mass per component
Filter with a custom function#
Pass a filter function that receives stats arrays and returns a boolean mask. Only matching components are extracted (more efficient than extracting all then filtering):
# Keep components that are large enough and roughly square
def my_filter(areas, bboxes, centroids):
widths, heights = bboxes[:, 2], bboxes[:, 3]
aspect_ratios = widths / np.maximum(heights, 1)
return (areas > 100) & (aspect_ratios > 0.5) & (aspect_ratios < 2.0)
components, stats = mask.connected_components_with_stats(filter_fn=my_filter)
Filter by location#
Keep only components whose centroid is in a region of interest:
def in_roi(areas, bboxes, centroids):
x, y = centroids[:, 0], centroids[:, 1]
return (x > 100) & (x < 500) & (y > 50) & (y < 400)
components, _ = mask.connected_components_with_stats(filter_fn=in_roi)