Head Pose Estimation in Human Face Recognition Using Self-Supervised Learning

Article type: Research paper

Authors

Faculty of Industrial and Systems Engineering, Tarbiat Modares University, Tehran, Iran

Abstract

Head pose estimation is a key element in analyzing human pose, but one of its main obstacles is the cost of labeling images. Labeling the head pose of people in diverse images is costly, time-consuming, and requires somewhat specialized knowledge. As a result, labeled images for head pose estimation are scarce compared with other computer vision problems. One way to compensate for this shortage of labels is to use self-supervised methods. By pre-training deep neural networks on unlabeled data (face images), self-supervised methods can extract features that are well suited to head pose estimation. Beyond pre-training, self-supervised tasks can also serve as an auxiliary loss function alongside the main head pose estimation task. This paper aims to show the difference that self-supervised learning methods make for head pose estimation. It also shows that a multi-task learning architecture combining supervised and self-supervised loss functions reduces the mean head pose estimation error by up to 29 percent relative to the supervised baseline and the HopeNet architecture.
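The combination of a supervised loss with a self-supervised auxiliary loss described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the pretext task, the loss functions, or the weighting, so the four-way rotation-prediction pretext task, the mean-absolute-error pose loss, and the `aux_weight` parameter are all hypothetical choices, and every function name is invented for this example.

```python
import math


def mae_pose_loss(pred, target):
    """Mean absolute error over (yaw, pitch, roll), in degrees (assumed loss)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample, numerically stabilized."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(pose_pred, pose_true, pretext_logits, pretext_label,
                   aux_weight=0.5):
    """Supervised pose loss plus a weighted self-supervised auxiliary loss.

    pretext_logits / pretext_label stand for a hypothetical pretext task,
    e.g. predicting which of four rotations was applied to the input image.
    aux_weight is an assumed hyperparameter, not a value from the paper.
    """
    supervised = mae_pose_loss(pose_pred, pose_true)
    auxiliary = cross_entropy(pretext_logits, pretext_label)
    return supervised + aux_weight * auxiliary


# Example: a 3-degree yaw error and an uninformative pretext prediction.
loss = multitask_loss([10.0, 0.0, 0.0], [7.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 0.0], 2)
```

Setting `aux_weight` to zero recovers the purely supervised objective, which is one simple way to compare a supervised baseline against a multi-task variant in such a setup.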

Keywords

Volume 20, Issue 2
Autumn and Winter 1401
Azar 1401