
Improved Semantic Inpainting Architecture
This paper presents an augmented method for image completion, particularly for images of human faces, by leveraging deep-learning-based inpainting techniques. Face completion tends to be a daunting task because of the relatively low uniformity of a face, attributable to structures such as the eyes and nose. Here, understanding the top-level context is paramount for proper semantic completion. The presented method improves upon existing inpainting techniques that reduce context difference by locating the closest encoding of the damaged image in the latent space of a pre-trained deep generator. However, these existing methods fail to consider key facial structures (eyes, nose, jawline, etc.) and their locations relative to each other. This paper mitigates this by introducing a face-landmark detector and a corresponding landmark loss, which is added to the context loss between the damaged and generated images and to the adversarial loss of the generative model. The model was trained on the CelebA dataset; tools such as pyamg, Pillow, and the OpenCV library were used for image manipulation and facial-landmark detection. Three weighted parameters balance the effect of the three loss functions, namely the context loss, landmark loss, and prior loss. Experimental results demonstrate that the added landmark loss contributes to a better understanding of top-level context, and hence the model generates more visually appealing inpainted images than the existing model. The model obtained average Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR) scores of 0.851 and 33.448 for different orientations of the face and 0.896 and 31.473, respectively, for various types of masks.
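The weighted combination of the three loss terms and the latent-space search described above can be sketched as follows. This is a minimal toy sketch under stated assumptions, not the paper's implementation: the generator `G` is a random linear map standing in for a pre-trained deep generator, the landmark "detector" is a brightest-pixel placeholder rule, the weights `w_context`, `w_prior`, and `w_landmark` are illustrative, and the search over latents is random sampling rather than gradient-based optimization.

```python
import numpy as np

# All components below are hypothetical stand-ins for the paper's parts:
#   G(z)           -- a pre-trained generator; here a fixed linear map.
#   landmark rule  -- a face-landmark detector; here the coordinates of
#                     the k brightest pixels, as a placeholder.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 16))  # toy "generator" weights

def G(z):
    """Toy generator mapping a 16-d latent vector to an 8x8 'image'."""
    return (W @ z).reshape(8, 8)

def context_loss(z, damaged, mask):
    """L1 difference on the undamaged (mask == 1) pixels only."""
    return np.abs(mask * (G(z) - damaged)).sum()

def prior_loss(z):
    """Stand-in for the adversarial (realism) term: penalizes latents
    far from the generator's training distribution."""
    return float(z @ z)

def landmark_loss(z, target_pts, k=3):
    """Squared distance between detected and target landmark coordinates;
    detection here is a toy brightest-pixel rule."""
    img = G(z)
    idx = np.argsort(img.ravel())[-k:]
    pts = np.stack(np.unravel_index(idx, img.shape), axis=1)
    return float(((pts - target_pts) ** 2).sum())

def total_loss(z, damaged, mask, target_pts,
               w_context=1.0, w_prior=0.003, w_landmark=0.01):
    """Weighted sum of the three terms balanced by the three parameters."""
    return (w_context * context_loss(z, damaged, mask)
            + w_prior * prior_loss(z)
            + w_landmark * landmark_loss(z, target_pts))

# Recover the closest encoding of a damaged image by a crude random
# search over latents (a real implementation would descend the gradient).
true_z = rng.standard_normal(16)
mask = (rng.random((8, 8)) > 0.25).astype(float)   # 1 = known pixel
damaged = mask * G(true_z)
target_pts = np.stack(np.unravel_index(
    np.argsort(G(true_z).ravel())[-3:], (8, 8)), axis=1)

best_z = min((rng.standard_normal(16) for _ in range(200)),
             key=lambda z: total_loss(z, damaged, mask, target_pts))
completed = mask * damaged + (1 - mask) * G(best_z)  # blend the fill-in
```

In the actual method, the generator would be a pre-trained GAN, the landmark term would come from a facial-landmark detector such as the one in dlib/OpenCV mentioned above, and the latent search would use backpropagation through the frozen generator; the final composite would typically also be blended (e.g., Poisson blending via pyamg) rather than pasted pixel-wise.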
The International Arab Journal of Information Technology, Vol. 19, No. 3, May 2022
Mirza Sami completed his B.Sc. degree in Computer Science and Engineering (CSE) at Brac University (BracU), Bangladesh, in 2018. He is currently a PhD student at the University of Alabama at Birmingham, USA. His research interests are computer vision and artificial intelligence.

Israt Naiyer completed her B.Sc. degree in Computer Science and Engineering (CSE) at Brac University (BracU), Bangladesh, in 2018. She is currently a Software Engineer, SQA, at Therap Service LLC. Her research interests are artificial intelligence, computer vision, and program analysis.

Ehsanul Khan completed his B.Sc. degree in CSE at BracU, Bangladesh, in 2019. His research interests include fire detection and computer vision.

Jia Uddin received his Ph.D. in Computer Engineering from the University of Ulsan, Korea, in January 2015. He is an Assistant Professor in the AI and Big Data Department, Endicott College, Woosong University, South Korea, and an Associate Professor (on leave) in the Computer Science and Engineering Department at BracU, Bangladesh. His research interests include fault diagnosis, computer vision, and multimedia signal processing. He is the corresponding author of this paper.