Efficient Object Detection with Super High-Resolution Image

Object detection in Ultra High-Resolution (UHR) images has long been a challenging problem in computer vision due to the varying scales of the targeted objects. When it comes to barcode detection, resizing UHR input images to smaller sizes often leads to the loss of pertinent information, while processing them directly is inefficient and computationally expensive. In this paper, we propose using semantic segmentation to achieve fast and accurate detection of barcodes of various scales in UHR images. Our pipeline involves a modified Region Proposal Network (RPN) on images of size greater than 10k×10k and a newly proposed Y-Net segmentation network, followed by a post-processing workflow for fitting a bounding box around each segmented barcode mask. The end-to-end system has a latency of 16 milliseconds, which is 2.5× faster than YOLOv4 and 5.9× faster than Mask R-CNN. In terms of accuracy, our method outperforms YOLOv4 and Mask R-CNN by 5.5% and 47.1% mAP, respectively, on a synthetic dataset. We have made the generated synthetic barcode dataset and its code available at http://www.github.com/viplabB/SBD/.


  • Jerome Quenum, University of California - Berkeley

  • Avideh Zakhor, University of California - Berkeley

  • Kehan Wang, University of California - Berkeley


Barcodes are digital signs made of adjacent, alternating black and white rectangles. Despite the great progress made in deep learning, detecting them in high-resolution images has proven to be a difficult task. Over the years, barcodes have increasingly become part of human interaction in many fields. In administration, they are used to encode, save, and retrieve users' information; in businesses such as grocery stores, they are used to track sales and inventories; and in hospitals, they are used to track and retrieve patients' data. More interestingly, in warehouses, detecting them facilitates the automated handling of packages. In this project, we aim to achieve fast and accurate detection of such barcodes in high-resolution (HR) images in order to facilitate the automation pipeline.

To do so, we used two main pools of data: about 100k high-resolution synthetic barcode images that we generated, and a database of 3.8 million high-resolution real barcode images with their corresponding ground truths. Our proposed approach first runs a Region Proposal Network (RPN) on a low-resolution version of the input image; the proposed regions are then mapped back to the original image using an appropriate perspective transformation. We then feed the proposed regions into Y-Net, a segmentation network that we developed in the process. Finally, bounding-box coordinates are extracted from the resulting segmentation masks. Our pipeline has two main advantages: (i) the modified RPN significantly reduces the inference time for the proposed regions; (ii) Y-Net, with its few parameters, allows fast and accurate detection and localization of barcodes through segmentation, regardless of how small or how large they are in the proposed regions.
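The two geometric steps of the pipeline, mapping low-resolution proposals back to the original image and turning a segmentation mask into bounding boxes, can be sketched as follows. This is a minimal illustration and not the project's implementation: the function names are hypothetical, the transformation is assumed to be a pure scale expressed as a 3×3 homography, and the post-processing is assumed to produce axis-aligned boxes from 4-connected mask components.

```python
from collections import deque


def scale_homography(sx, sy):
    """3x3 homography for a pure scale (assumed low-res to full-res mapping)."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def map_box_to_full_res(box, H):
    """Map a box (x0, y0, x1, y1) through homography H and return the
    axis-aligned box that encloses its four transformed corners."""
    x0, y0, x1, y1 = box
    xs, ys = [], []
    for x, y in ((x0, y0), (x1, y0), (x1, y1), (x0, y1)):
        w = H[2][0] * x + H[2][1] * y + H[2][2]  # projective divisor
        xs.append((H[0][0] * x + H[0][1] * y + H[0][2]) / w)
        ys.append((H[1][0] * x + H[1][1] * y + H[1][2]) / w)
    return (min(xs), min(ys), max(xs), max(ys))


def mask_to_boxes(mask):
    """Return one inclusive pixel box (x0, y0, x1, y1) per 4-connected
    foreground component of a binary mask given as a list of rows."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over this component, tracking extents.
            seen[r][c] = True
            queue = deque([(r, c)])
            x0 = x1 = c
            y0 = y1 = r
            while queue:
                cr, cc = queue.popleft()
                x0, x1 = min(x0, cc), max(x1, cc)
                y0, y1 = min(y0, cr), max(y1, cr)
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((x0, y0, x1, y1))
    return boxes


if __name__ == "__main__":
    # A proposal found on a 4x-downsampled image, mapped back to full resolution.
    print(map_box_to_full_res((10, 20, 30, 40), scale_homography(4.0, 4.0)))
    # A toy mask with two separate barcode-like regions.
    toy_mask = [
        [0, 1, 1, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 1],
    ]
    print(mask_to_boxes(toy_mask))
```

Expressing the remapping as a homography keeps the sketch valid for a general perspective transformation, not only for uniform downsampling. For real masks of the sizes discussed here, a vectorized connected-component routine from an image-processing library would be a more practical choice than the pure-Python flood fill.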


Closing Report - September 2021