Repost | Overview of Panoptic Segmentation — towards Real-World Computer Vision (Part.1)
Note: This article has been authorized for reposting by the original author: Overview of Panoptic Segmentation — towards Real-World Computer Vision (Part.1)
Abstract
Panoptic segmentation unifies the typical dense prediction tasks of semantic segmentation and instance segmentation into a single holistic task, which broadens its range of applications and makes it an essential step toward real-world vision systems. This article reviews progress on the task, including its applications, a taxonomy of state-of-the-art methods, and related datasets. Segmentation results are compared, and the strengths and limitations of each technique are examined.
Introduction
What is Panoptic Segmentation
Computer vision is an essential sub-area of artificial intelligence (AI) that requires machines to perceive their environment and understand visual content as humans do. Its dense prediction tasks include semantic segmentation, instance segmentation, and panoptic segmentation. These tasks are more challenging than others, such as image classification, because the machine must make pixel-level predictions. They also transform an image into segments that humans can easily understand or process. Panoptic segmentation is a particularly valuable task because it requires segmenting every object instance and all stuff categories in a unified framework. We describe it in detail in the following sections.
Panoptic segmentation was first introduced in [1] and is an essential step toward real-world vision systems. We can roughly divide all semantic categories into objects (e.g., people, cars, bicycles) and stuff (e.g., road, bench, sky). Objects are countable concepts: they correspond not only to semantic categories but also to distinct instances. For example, as shown in Fig. 1, different people are labeled as separate objects (shown in various colors). Stuff, by contrast, refers to uncountable concepts, so each stuff category has only one segment.
Panoptic Segmentation means segmenting all objects and stuff simultaneously in a unified framework. The task definition is simple: Each pixel of an image must be assigned a semantic label and an instance id. Pixels with the same label and id belong to the same object; for stuff labels, the instance id is ignored.
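The task definition above can be illustrated with a minimal sketch. It assumes a toy image stored as two nested lists (per-pixel semantic labels and instance ids); the category names, the STUFF set, and the function name panoptic_segments are hypothetical and chosen only for illustration, not taken from any dataset or library.

```python
from collections import defaultdict

# Hypothetical stuff categories for illustration; real datasets define their own.
STUFF = {"road", "sky"}

def panoptic_segments(label_map, id_map):
    """Group pixels into segments keyed by (semantic label, instance id).

    For stuff labels the instance id is ignored, so every pixel of that
    category falls into a single segment.
    """
    segments = defaultdict(list)
    for r, (label_row, id_row) in enumerate(zip(label_map, id_map)):
        for c, (label, inst) in enumerate(zip(label_row, id_row)):
            key = (label, None) if label in STUFF else (label, inst)
            segments[key].append((r, c))
    return dict(segments)

# A toy 2x4 image: per-pixel semantic labels and instance ids.
labels = [["sky", "sky", "sky", "sky"],
          ["person", "person", "road", "person"]]
ids = [[0, 7, 0, 3],   # ids on stuff pixels are arbitrary and ignored
       [1, 1, 0, 2]]

segments = panoptic_segments(labels, ids)
for key, pixels in segments.items():
    print(key, len(pixels), "pixel(s)")
```

In this toy image the two people end up in separate segments because their instance ids differ, while all sky pixels form one segment regardless of the ids stored on them.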
Comparison with other typical segmentation tasks
Semantic segmentation is one of the standard dense prediction tasks in computer vision; it aims to assign a semantic label to each pixel of an image. As shown in Fig. 2, semantic segmentation essentially identifies each class in an image and separates it from the other classes by overlaying a segmentation mask on it. The main difference between panoptic and semantic segmentation is that countable objects are labeled as separate instances in panoptic segmentation. In Fig. 2(b), all cars are colored blue and all people red, whereas in Fig. 2(d) the individual vehicles and people have different colors.
Instance segmentation focuses on object instances: it creates a separate segmentation mask for each thing and classifies pixels by individual instance rather than by class. The main difference between panoptic and instance segmentation is that panoptic segmentation requires segmenting all instances and all stuff categories simultaneously. For example, comparing Fig. 2(c) with Fig. 2(d), instance segmentation labels objects such as people or cars as different instances but fails to capture stuff categories such as road and sky.
Here is a concise overview of the primary distinctions between semantic and instance segmentation.
Semantic segmentation focuses on amorphous regions, whereas instance segmentation segments well-defined, countable things, which makes classification and detection simpler.
Semantic segmentation generates a unified segmentation mask for all instances of the same class, whereas instance segmentation generates unique segmentation masks for every object in the image.
In summary, panoptic segmentation combines semantic segmentation and instance segmentation: it must predict the semantic category of all stuff and objects and separate different objects into different instances.
Panoptic segmentation is therefore a more challenging task, but it provides a holistic perception ability that helps machines obtain adequate information in complex environments.
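The relationship between the three tasks can be sketched in a few lines. The example below assumes a toy panoptic annotation in which each pixel is a (semantic label, instance id) pair; the STUFF set and the helper names semantic_view and instance_view are hypothetical. It is only an illustration of how the semantic and instance outputs each discard part of the panoptic information.

```python
# Hypothetical stuff categories for illustration.
STUFF = {"road", "sky"}

# Toy panoptic annotation: each pixel is a (semantic label, instance id) pair.
panoptic = [
    [("sky", 0), ("sky", 0), ("sky", 0)],
    [("car", 1), ("road", 0), ("car", 2)],
]

def semantic_view(pan):
    """Semantic segmentation view: keep the class, drop the instance id."""
    return [[label for label, _ in row] for row in pan]

def instance_view(pan):
    """Instance segmentation view: one binary mask per thing instance.

    Stuff pixels are skipped, so they appear in no mask.
    """
    masks = {}
    for r, row in enumerate(pan):
        for c, (label, inst) in enumerate(row):
            if label in STUFF:
                continue
            key = (label, inst)
            if key not in masks:
                masks[key] = [[0] * len(row) for _ in pan]
            masks[key][r][c] = 1
    return masks

semantic = semantic_view(panoptic)
instances = instance_view(panoptic)
print(semantic)
print(sorted(instances))
```

The semantic view merges both cars into one class, and the instance view separates the cars but leaves sky and road unlabeled; only the panoptic annotation carries both pieces of information.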
Applications of Panoptic Segmentation
Panoptic segmentation offers powerful perception ability, so it plays a crucial role in many real-world scenarios. Fig. 5 summarizes the most prominent applications of panoptic segmentation.
Panoptic Segmentation for Medical Image Analysis
Medical image analysis is a crucial application of panoptic segmentation. Medical images often show complex environments in which the model must recognize both objects of interest and the background. As panoptic segmentation has progressed, interest in applying it to medical images has grown. Researchers employ it to combine background and foreground knowledge and thereby improve cell segmentation, cancer cell detection, lesion detection, and similar tasks.
For example, [3] proposes using panoptic segmentation to overcome the high variability of object appearances, numerous overlapping objects, and ambiguous object boundaries in medical images. The authors introduce the Panoptic Feature Fusion Net (PFFNet), which applies broadly to various biomedical and biological datasets, including histopathology images, fluorescence microscopy images, and plant phenotype images. [4] proposes a panoptic segmentation model that adds an auxiliary semantic segmentation branch to the instance branch so that the model incorporates complementary information at both the global and local levels. As shown in Fig. 4, the authors apply panoptic segmentation to cancer diagnosis and prognosis, which boosts performance by a large margin. [137] addresses the problem of segmenting overlapping nuclei by developing a bending-loss-regularized network for nuclei segmentation. The bending loss assigns high penalties to large curvatures and low penalties to small curvatures. Minimizing it helps prevent contours that enclose several nuclei.
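The idea of penalizing large curvatures more heavily than small ones can be illustrated with a rough sketch. This is not the loss defined in [137]: it assumes a contour given as a closed polygon, approximates curvature by the turning angle at each vertex, and uses a made-up threshold and made-up weights. The function names turning_angles and bending_penalty are hypothetical.

```python
import math

def turning_angles(contour):
    """Absolute turning angle (radians) at each vertex of a closed polygon."""
    n = len(contour)
    angles = []
    for i in range(n):
        x0, y0 = contour[i - 1]
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        heading_in = math.atan2(y1 - y0, x1 - x0)
        heading_out = math.atan2(y2 - y1, x2 - x1)
        # Wrap the heading change into [-pi, pi) before taking its magnitude.
        delta = (heading_out - heading_in + math.pi) % (2 * math.pi) - math.pi
        angles.append(abs(delta))
    return angles

def bending_penalty(contour, threshold=math.pi / 4,
                    low_weight=1.0, high_weight=10.0):
    """Sum of weighted turning angles; sharp turns get the higher weight.

    The threshold and weights are illustrative assumptions only.
    """
    return sum((high_weight if a > threshold else low_weight) * a
               for a in turning_angles(contour))

# A smooth, nearly circular contour (regular 16-gon) versus a contour with sharp corners.
smooth = [(math.cos(2 * math.pi * k / 16), math.sin(2 * math.pi * k / 16))
          for k in range(16)]
sharp = [(0, 0), (1, 0), (1, 1), (0, 1)]

print(bending_penalty(smooth), bending_penalty(sharp))
```

Both contours turn through the same total angle, but the contour with sharp corners receives a much larger penalty, which is the behavior the text describes: sharp bends, such as those where two touching nuclei meet, are discouraged.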
Panoptic Segmentation for Autonomous Driving
Autonomous driving is another important application of panoptic segmentation. An effective autonomous driving system requires fine-grained scene understanding and enhanced scene perception. [5] proposes using panoptic segmentation to enhance sensor-fusion-based environment perception, which can benefit from the rich information that panoptic segmentation provides. Panoptic segmentation also helps parse images accurately for both semantic information (which pixels represent cars, pedestrians, and drivable space) and instance information (which pixels belong to the same car rather than to other cars). The vehicle's control system can therefore use panoptic segmentation to capture more holistic information and make more precise decisions.
For example, in [6] NVIDIA presents an efficient method for pixel-level semantic and instance segmentation of camera images using a single multi-task deep neural network (DNN). This approach trains a DNN based on panoptic segmentation, which seeks to understand the scene holistically rather than piecemeal. A single end-to-end DNN thus extracts all relevant data while achieving per-frame inference times of around 5 ms on an NVIDIA DRIVE AGX platform integrated in a vehicle.
In addition, self-driving cars collect LiDAR data with hardware sensors, which has stimulated research on panoptic segmentation of LiDAR data. For example, in [7] Stefano Gasperini et al. present Panoster, a panoptic segmentation method for LiDAR point clouds. As shown in Fig. 5, their method directly delivers instance IDs with a learning-based clustering solution that is embedded in the model and optimized for pure and non-fragmented clusters. [8] likewise uses panoptic segmentation to determine both the semantic class of each point in a LiDAR sweep and the instance of that class to which it belongs.
Panoptic Segmentation for Remote Sensing
In remote sensing, panoptic segmentation can improve applications such as road condition monitoring and urban planning because it provides both background and foreground information for precise detection. Several methods use the task to obtain more comprehensive information.
For instance, [9] proposes a new framework that uses panoptic segmentation to address the scale problem in UAV images: the vast scene and the small targets lead to missing foreground targets in the segmentation results and to low mask quality. Specifically, deformable convolution is added to the feature extraction network to improve its ability to extract features. A MaskIoU module is also developed and incorporated into the instance segmentation branch to improve the overall quality of the foreground target masks. In addition, UAV-collected data is organized into the UAV-OUC panoptic segmentation dataset for testing and validating panoptic segmentation models on aerial imagery [144].
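The mask quality that such a module targets is commonly measured by the intersection-over-union (IoU) between a predicted binary mask and its ground-truth mask. The sketch below only computes that quantity on toy masks; it assumes masks stored as nested lists of 0 and 1, and it does not reproduce the MaskIoU module of [9]. The function name mask_iou is hypothetical.

```python
def mask_iou(pred, target):
    """Intersection-over-union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks have no union; treat their IoU as 0 by convention here.
    return intersection / union if union else 0.0

# Toy 2x3 masks: the prediction is shifted one column from the ground truth.
predicted = [[1, 1, 0],
             [1, 1, 0]]
ground_truth = [[0, 1, 1],
                [0, 1, 1]]

iou = mask_iou(predicted, ground_truth)
print(f"mask IoU = {iou:.3f}")
```

Here the masks share 2 pixels out of 6 covered in total, so a low IoU signals a poor-quality mask even if the object was detected with high classification confidence.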
Panoptic Segmentation for Other Applications
Panoptic segmentation is also used in agriculture. For example, [10] uses it to analyze the behavior of pigs. The authors follow the relatively new definition of panoptic segmentation and aim for pixel-accurate segmentation of individual pigs rather than bounding boxes or key points. Pixel-accurate segmentation makes it possible to extract further information, such as the weight or size of the animals. The method is tested on a specially created dataset of 1000 hand-labeled images and achieves detection rates of around 95% (F1 score) despite disturbances such as occlusions and dirty lenses.
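For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall over detections. The sketch below uses made-up counts of true positives, false positives, and false negatives chosen to give a score of 0.95; they are not the results reported in [10], and the function name f1_score is hypothetical.

```python
def f1_score(tp, fp, fn):
    """F1 score from counts of true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up counts for illustration: 190 animals detected correctly,
# 12 spurious detections, 8 animals missed.
tp, fp, fn = 190, 12, 8
score = f1_score(tp, fp, fn)
print(f"precision = {tp / (tp + fp):.3f}, recall = {tp / (tp + fn):.3f}, F1 = {score:.3f}")
```

Because F1 combines both error types, a detector cannot reach a high score by missing many animals or by producing many spurious detections.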
[11] proposes a new panoptic segmentation system that gives visually impaired people a holistic understanding of their surroundings. As shown in Fig. 8, panoptic segmentation assists their navigation by efficiently providing awareness of both things and stuff in their proximity.
Reference List:
[1] Panoptic Segmentation. Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár
[2] Panoptic Segmentation: A Review
[3] Panoptic Feature Fusion Net: A Novel Instance Segmentation Paradigm for Biomedical and Biological Images
[4] Nuclei Segmentation via a Deep Panoptic Model with Semantic Feature Fusion
[5] Multi-task Network for Panoptic Segmentation in Automated Driving
[6] Pixel-Perfect Perception: How AI Helps Autonomous Vehicles See Outside the Box
[7] Panoster: End-to-end Panoptic Segmentation of LiDAR Point Clouds
[8] LiDAR Panoptic Segmentation for Autonomous Driving
[9] Panoptic Segmentation of UAV Images with Deformable Convolution Network and Mask Scoring
[10] Panoptic Instance Segmentation on Pigs
[11] Panoptic Lintention Network: Towards Efficient Navigational Perception for the Visually Impaired