Improved Semantic Inpainting Architecture Augmented with a Facial Landmark Detector
This paper presents an augmented method for image completion, particularly for images of human faces, that leverages deep-learning-based inpainting techniques. Face completion tends to be a daunting task because of the relatively low uniformity of a face, which is attributable to structures such as the eyes and nose; understanding the top-level context is therefore paramount for proper semantic completion. The presented method improves upon existing inpainting techniques that reduce context difference by locating the closest encoding of the damaged image in the latent space of a pre-trained deep generator. These existing methods, however, fail to consider key facial structures (eyes, nose, jawline, etc.) and their locations relative to one another. This paper mitigates this shortcoming by introducing a facial landmark detector and a corresponding landmark loss, which is added to the context loss between the damaged and generated images and the adversarial (prior) loss of the generative model. The model was trained on the CelebA dataset; tools such as PyAMG, Pillow, and the OpenCV library were used for image manipulation and facial landmark detection. Three weighting parameters balance the effects of the three loss functions, namely the context loss, landmark loss, and prior loss. Experimental results demonstrate that the added landmark loss contributes to a better understanding of top-level context, and hence the model generates more visually appealing inpainted images than the existing model. The model obtained average Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR) scores of 0.851 and 33.448 for different orientations of the face and 0.896 and 31.473, respectively, for various types of masks.
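To make the loss formulation concrete, the following is a minimal NumPy sketch of how the three weighted terms could be combined during the latent-space search. The helper names and weight values here are illustrative assumptions, not the paper's exact implementation; the paper tunes its three weighting parameters experimentally.

```python
import numpy as np

def context_loss(mask, generated, damaged):
    """L1 difference on the known (unmasked) pixels, in the style of
    latent-space semantic inpainting (Yeh et al.)."""
    return np.sum(np.abs(mask * (generated - damaged)))

def prior_loss(d_g_z):
    """Adversarial term log(1 - D(G(z))) that keeps the generated
    sample on the generator's learned manifold."""
    return np.log(1.0 - d_g_z)

def landmark_loss(lm_generated, lm_original):
    """Sum of Euclidean distances between corresponding facial
    landmark coordinates of the generated and original images."""
    return np.sum(np.linalg.norm(lm_generated - lm_original, axis=1))

def total_loss(mask, generated, damaged, d_g_z, lm_gen, lm_orig,
               w_context=1.0, w_prior=0.003, w_landmark=0.1):
    # The three weights are hypothetical balancing parameters that
    # stand in for the paper's tuned values.
    return (w_context * context_loss(mask, generated, damaged)
            + w_prior * prior_loss(d_g_z)
            + w_landmark * landmark_loss(lm_gen, lm_orig))
```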
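The landmark detector itself can be realized with dlib's pre-trained 68-point shape predictor (the ensemble-of-regression-trees alignment of Kazemi and Sullivan) together with OpenCV for image handling. The sketch below is one plausible realization under that assumption; the model file path refers to dlib's publicly distributed landmark model.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Assumed path to dlib's publicly distributed 68-point landmark model
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_landmarks(image_path):
    """Return the 68 facial landmark coordinates of the first detected
    face as a (68, 2) array, or None if no face is found."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)  # upsample once to catch smaller faces
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float64)
```

The landmark arrays extracted from the generated and original images can then feed directly into the landmark_loss term sketched above.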