
Exploring the Intersection of Information Theory and Machine Learning
This study addresses the need for a unified framework demonstrating Information Theory's (IT) pervasive impact across diverse Machine Learning (ML) tasks. We investigate how IT principles, including entropy, Mutual Information (MI), cross-entropy, KL-divergence, and Information Gain (IG), rigorously guide ML model design, optimization, and interpretability. Our approach combines theoretical elucidation with empirical validation on standard benchmarks. IT enhances feature selection: MI-ranked features on the breast cancer dataset improved classifier accuracy to 95.1% (top 20) and 93% (top 5), outperforming F-score selection. It also improves model training: cross-entropy loss in Neural Networks (NNs) for Iris classification led to faster convergence and higher accuracy (0.98 training, 0.95 validation) than MSE loss. For generative models, KL-divergence effectively structures Variational Auto-Encoder (VAE) latent spaces learned from Modified National Institute of Standards and Technology (MNIST) data, promoting the compact, continuous representations ideal for generation. Finally, the Information Bottleneck (IB) principle, applied to the Canadian Institute For Advanced Research (CIFAR-100) dataset, yielded competitive test accuracy (51% vs. 50% for a baseline Convolutional Neural Network (CNN)) and reduced training time (925.02 s vs. 1015.75 s), highlighting its efficacy in learning compressed, predictive representations. These findings collectively underscore IT's continued crucial role as a unifying paradigm for addressing fundamental challenges in the evolving ML ecosystem, providing solutions for feature selection, model robustness, and generalization.
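The MI-based feature selection step summarized above can be sketched as follows. This is an illustrative reconstruction, not the authors' exact pipeline: it ranks breast cancer features by mutual information with the class label and evaluates a classifier on the top-k subset (the choice of logistic regression, the 70/30 split, and the random seeds are assumptions).

```python
# Sketch: MI-ranked feature selection on the breast cancer dataset.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Estimate MI between each feature and the class label on the training split only.
mi = mutual_info_classif(X_train, y_train, random_state=0)

def top_k_accuracy(k):
    # Keep the k features with the highest estimated mutual information.
    idx = np.argsort(mi)[::-1][:k]
    scaler = StandardScaler().fit(X_train[:, idx])
    clf = LogisticRegression(max_iter=1000).fit(
        scaler.transform(X_train[:, idx]), y_train)
    return clf.score(scaler.transform(X_test[:, idx]), y_test)

for k in (5, 20):
    print(f"top-{k} MI features: test accuracy = {top_k_accuracy(k):.3f}")
```

Exact accuracies depend on the classifier and split, so the figures reported in the abstract should be taken from the paper's own experiments rather than this sketch.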