Spatial and Semantic Information Enhancement for Indoor 3D Object Detection
Object detection is a key technology for indoor service robots. Indoor environments contain many types of objects, and severe mutual occlusion between them makes detection difficult. To address these challenges, we propose a deep-learning-based method for indoor three-dimensional (3D) object detection. Most existing deep-learning-based 3D object detection techniques lack sufficient spatial and semantic information, so the proposed method enhances both. First, we propose a new edge convolution, EdgeConv+ (Edge Convolution+), and build on it a Shallow Spatial Information Enhancement (SSIE) module that is added to Votenet. Second, we design a new attention mechanism, Convolutional Gated Non-Local+ (CGNL+), and use it to add a Deep Semantic Information Enhancement (DSIE) module to Votenet. Experiments on the ScanNet dataset show that the proposed method is 2.4% and 2.1% higher than Votenet at mAP@0.25 and mAP@0.5, respectively. It is also robust to sparse point clouds.
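The thresholds in mAP@0.25 and mAP@0.5 are the minimum 3D intersection over union (IoU) that a predicted box must reach with a ground-truth box to count as a correct detection. The following is a minimal sketch of that matching criterion only, assuming axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax). The names `iou_3d` and `is_true_positive` are hypothetical and are not taken from the article or from Votenet.

```python
from typing import Tuple

# Assumed box layout (not from the article): axis-aligned corners
# (xmin, ymin, zmin, xmax, ymax, zmax).
Box = Tuple[float, float, float, float, float, float]


def box_volume(box: Box) -> float:
    """Volume of an axis-aligned box; degenerate boxes have volume 0."""
    dx = max(0.0, box[3] - box[0])
    dy = max(0.0, box[4] - box[1])
    dz = max(0.0, box[5] - box[2])
    return dx * dy * dz


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for axis in range(3):
        low = max(a[axis], b[axis])
        high = min(a[axis + 3], b[axis + 3])
        # Overlap along this axis, clamped to 0 when the boxes are disjoint.
        intersection *= max(0.0, high - low)
    union = box_volume(a) + box_volume(b) - intersection
    return intersection / union if union > 0.0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float) -> bool:
    """A prediction matches a ground-truth box if IoU >= threshold."""
    return iou_3d(pred, gt) >= threshold


# A prediction shifted by half the box width along x: IoU = 4 / 12 = 1/3.
gt_box: Box = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
pred_box: Box = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
print(is_true_positive(pred_box, gt_box, 0.25))  # True
print(is_true_positive(pred_box, gt_box, 0.5))   # False
```

The same prediction counts as a hit at the 0.25 threshold and a miss at 0.5, which is why mAP@0.5 is the stricter of the two reported metrics.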
The International Arab Journal of Information Technology, Vol. 20, No. 5, September 2023