The International Arab Journal of Information Technology (IAJIT)


3D VAE Video Prediction Model with Kullback Leibler Loss Enhancement

Video Prediction (VP) models have adopted many techniques to build structures that extract spatiotemporal features and predict the future frame. Earlier VP techniques extracted the spatial and temporal features in separate models and then fused both sets of features to generate the future frame. However, these architectures suffered from high design complexity and long prediction times, so many later efforts aimed to reduce design complexity while still producing good results. This study presents a VP model based on a Three-Dimensional Variational Auto Encoder (3D VAE). First, the proposed model builds all of its layers from 3D convolutional layers, which improves the extraction of spatiotemporal information and decreases the design complexity. Second, the Kullback Leibler Loss (KL Loss) is enhanced by a 3D sampling stage that allows the 3D latent loss to be calculated. This helps the 3D Encoder produce a more suitable spatiotemporal latent variable, and the 3D sampling also acts as a good regularizer for the model. Applied to the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and Caltech pedestrian datasets, the proposed model achieves SNR = 34.8673 and SSIM = 0.9616 with 5.2 M parameters.
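
As a rough illustration of the 3D sampling stage and the latent loss described above, the following minimal sketch applies the standard VAE reparameterization and the standard closed-form KL term elementwise over a latent volume. The abstract does not give the exact formulation, so this is an assumption rather than the authors' implementation; the function names `sample_latent` and `kl_loss` and the latent shape (T, H, W) = (2, 2, 2) are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1).

    mu and log_var are flat lists holding one value per position of the
    (assumed) 3D latent volume.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_loss(mu, log_var):
    """Standard VAE term KL(N(mu, sigma^2) || N(0, 1)), summed over positions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Hypothetical latent volume: T time steps, H x W spatial positions, stored flat.
T, H, W = 2, 2, 2
mu = [0.0] * (T * H * W)
log_var = [0.0] * (T * H * W)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
z = sample_latent(mu, log_var, rng)
print(len(z), kl_loss(mu, log_var))
```

Because the sum runs over every temporal and spatial position of the latent volume, the penalty regularizes the latent variable along the time axis as well as the spatial axes, which is one plausible reading of the "3D latent loss" in the abstract.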

