An investigation of regression as an avenue to find precision-runtime trade-off for object segmentation
The ability to finely segment different instances of various objects in an environment is a critical tool in the perception toolbox of any autonomous agent. Traditionally, instance segmentation is treated as a multi-label pixel-wise classification problem. This formulation has resulted in networks that produce high-quality instance masks but are extremely slow for real-world usage, especially on platforms with limited computational capabilities. This thesis investigates an alternative, regression-based formulation of instance segmentation to achieve a good trade-off between mask precision and run-time. In particular, the instance masks are parameterized and a CNN is trained to regress to these parameters, analogous to the bounding box regression performed by an object detection network. In this investigation, the instance segmentation masks in the Cityscapes dataset are approximated using irregular octagons, and an existing object detector network (i.e., SqueezeDet) is modified to regress to the parameters of these octagonal approximations. The resulting network is referred to as SqueezeDetOcta. At the image boundaries, object instances are only partially visible. Due to the convolutional nature of most object detection networks, special handling of the boundary-adhering object instances is warranted. However, current object detection techniques do not account for this and handle all object instances alike. To this end, this work proposes selectively learning only the partial, untainted parameters of the bounding box approximation of the boundary-adhering object instances. Anchor-based object detection networks like SqueezeDet and YOLOv2 have a discrepancy between the ground-truth encoding/decoding scheme and the coordinate space used for clustering to generate the prior anchor shapes.
To resolve this disagreement, this work proposes clustering in a space defined by two coordinate axes representing the natural log transformations of the width and height of the ground-truth bounding boxes. When both SqueezeDet and SqueezeDetOcta were trained from scratch, SqueezeDetOcta lagged behind the SqueezeDet network by ≈ 6.19 mAP. Further analysis revealed that the sparsity of the annotated data was the reason for this lackluster performance of the SqueezeDetOcta network. To mitigate this issue, transfer learning was used to fine-tune the SqueezeDetOcta network starting from the trained weights of the SqueezeDet network. When all the layers of SqueezeDetOcta were fine-tuned, it outperformed the SqueezeDet network paired with logarithmically extracted anchors by ≈ 0.77 mAP. In addition, the forward pass latencies of both SqueezeDet and SqueezeDetOcta are ≈ 19 ms. Boundary adhesion considerations during training resulted in an improvement of ≈ 2.62 mAP over the baseline SqueezeDet network. Pairing SqueezeDet with logarithmically extracted anchors improved on the baseline SqueezeDet network by ≈ 1.85 mAP. In summary, this work demonstrates that, given sufficient finely annotated instance data, an existing object detection network can be modified to predict much finer approximations (i.e., irregular octagons) of the instance annotations while having the same forward pass latency as the bounding box predicting network. The results support the merits of logarithmically extracted anchors to boost the performance of any anchor-based object detection network. The results also show that the special handling of image-boundary-adhering object instances produces more performant object detectors.
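
The idea of clustering ground-truth box shapes in log space can be illustrated with a short sketch. This is not the thesis's implementation: the function name `log_space_anchors`, the use of plain k-means with Euclidean distance, the random initialization, and the toy box sizes are all assumptions made for illustration. The only element taken from the abstract is that clustering happens on the axes (ln width, ln height).

```python
import numpy as np


def log_space_anchors(wh, k, iters=50, seed=0):
    """Hypothetical helper: k-means on (ln w, ln h), returning anchors in pixels.

    wh    : array-like of shape (n, 2) with positive box widths and heights
    k     : number of anchor shapes to extract
    iters : maximum number of k-means iterations (assumed value)
    seed  : seed for the random choice of initial centers (assumed value)
    """
    rng = np.random.default_rng(seed)
    pts = np.log(np.asarray(wh, dtype=float))  # move to log space
    # Start from k distinct data points chosen at random.
    centers = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(iters):
        # Euclidean distance of every point to every center, shape (n, k).
        dists = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = np.array([
            pts[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Map the log-space centers back to widths and heights.
    return np.exp(centers)


# Toy data (invented for this example): two tight groups of box sizes.
boxes = [(10, 20), (12, 22), (100, 50), (110, 55)]
anchors = log_space_anchors(boxes, k=2)
anchors = anchors[np.argsort(anchors[:, 0])]  # sort by width for readability
print(anchors)
```

Because the mean is taken in log space, each resulting anchor is the geometric mean of the widths and heights in its cluster, which is one way such clustering would differ from averaging raw pixel sizes.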
Document Type: | Master's Thesis |
---|---|
Language: | English |
Author: | Arun Rajendra Prabhu |
Number of pages: | xvi, 124 |
ISBN: | 978-3-96043-086-5 |
ISSN: | 1869-5272 |
URN: | urn:nbn:de:hbz:1044-opus-51115 |
DOI: | https://doi.org/10.18418/978-3-96043-086-5 |
Advisor: | Paul G. Plöger, André Hinkenjann, Stefan Eickeler |
Publishing Institution: | Hochschule Bonn-Rhein-Sieg |
Granting Institution: | Hochschule Bonn-Rhein-Sieg, Fachbereich Informatik |
Contributing Corporation: | Bonn-Aachen International Center for Information Technology (b-it); Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS |
Date of first publication: | 2020/11/10 |
Series (Volume): | Technical Report / Hochschule Bonn-Rhein-Sieg University of Applied Sciences. Department of Computer Science (05-2020) |
Keyword: | autonomous driving; computer vision; convolutional neural networks; deep learning; instance segmentation; object detection |
Departments, institutes and facilities: | Fachbereich Informatik |
Dewey Decimal Classification (DDC): | 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing & computer science |
Series: | Technical Report / University of Applied Sciences Bonn-Rhein-Sieg. Department of Computer Science |
Entry in this database: | 2020/11/10 |
Licence (Multiple languages): | In Copyright - Educational Use Permitted (Urheberrechtsschutz - Nutzung zu Bildungszwecken erlaubt) |