Horizontal Sequence Pooling Technique in Convolutional Neural Networks to Optimize Feature Extraction for DNA Sequence Classification
In Deoxyribonucleic Acid (DNA) sequence classification, the exact position of features within a sequence matters because it encodes the unique genetic information of each organism. In Convolutional Neural Networks (CNNs), pooling techniques are vital for efficient feature extraction. However, traditional pooling techniques have limitations when applied to sequence-based data: they lack positional sensitivity and therefore lose information. To address these constraints, this study introduces Horizontal Sequence Pooling (HSP), a novel pooling technique that enhances feature extraction by applying positional pooling of sequences across the horizontal axis of the feature maps. The CNN model framework was optimized through data preprocessing and hyper-parameter tuning. The results show that HSP significantly outperforms traditional pooling techniques across multiple metrics. It reduced feature parameters by up to 96% and validation loss by 19%. Furthermore, HSP attained the highest accuracy of 96%, a Matthews Correlation Coefficient (MCC) of 96%, and an Area Under the Precision-Recall Curve (AUC-PR) score of 99%, indicating its superior ability to balance precision and recall. These results underscore HSP’s efficiency in feature extraction and its capability to handle complex, imbalanced datasets, making it an effective method for DNA sequence classification in CNN architectures.
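As a rough illustration of what pooling "across the horizontal axis of the feature maps" could look like, the sketch below applies max pooling with a 1 x width window, so that only the column (sequence) axis is reduced and every row is kept separate. This is an assumption made for illustration, not the HSP definition from the study: the function name `horizontal_max_pool`, the `width` parameter, the use of max as the reduction, and the dropping of trailing columns are all hypothetical choices.

```python
import numpy as np


def horizontal_max_pool(feature_map: np.ndarray, width: int) -> np.ndarray:
    """Max-pool a (rows, cols) feature map with a 1 x width window.

    Only the horizontal (column) axis is reduced; rows are never mixed.
    Illustrative assumption only, not the paper's HSP definition.
    """
    rows, cols = feature_map.shape
    # Drop trailing columns that do not fill a complete window.
    usable = cols - cols % width
    # Group each row into consecutive, non-overlapping windows.
    windows = feature_map[:, :usable].reshape(rows, usable // width, width)
    # Take the maximum inside each window.
    return windows.max(axis=2)


# Toy feature map: 2 rows, 6 positions along the sequence axis.
fmap = np.array([[1, 3, 2, 0, 5, 4],
                 [7, 6, 8, 9, 1, 2]])
pooled = horizontal_max_pool(fmap, width=2)
print(pooled)  # shape (2, 3): the row count is unchanged
```

In contrast to a square 2 x 2 window, which would also merge values from neighbouring rows, this window shortens only the sequence axis, which is one way a pooling step might keep the ordering along the sequence.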
The International Arab Journal of Information Technology, Vol. 21, No. 3, September 2024