
Sensitive Data Detection of Social Network Based on Improved Random Forest Algorithm
The characteristics of sensitive data in social networks are complex, which makes such data difficult to detect. This paper therefore studies a sensitive data detection method for social networks based on an improved Random Forest (RF) algorithm. The method simulates login to the social network and captures social network information with a web crawler and a collector. A Topology-Based Hierarchical Trait (TBHT) topology feature logic algorithm, optimized with the Naive Bayesian (NB) algorithm, extracts the features of sensitive data from the captured information. The RF algorithm is improved through adaptive node splitting, and a sensitive data detection model is built on the improved RF algorithm using the extracted features. Social network information is input into the model to obtain the detection results. Experimental results show that data acquisition with the web crawler and collector runs stably and gathers a large volume of data, and that the data features are extracted efficiently. The classification accuracy of the improved RF algorithm exceeds 97.5%. The method is therefore a powerful and practical approach to detecting sensitive data in social networks.
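The abstract does not specify the adaptive node-splitting rule, so the following is only a minimal sketch of the standard node-splitting step that such an improvement would modify: a search for the threshold on one numeric feature that minimizes the weighted Gini impurity. The function names `gini` and `best_split`, the feature values, and the class labels are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature.

    If no split lowers the impurity, the threshold is None and the score
    is the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical feature values cannot be separated
        threshold = (lo + hi) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical feature scores for four posts, two of each class.
threshold, score = best_split(
    [0.1, 0.2, 0.8, 0.9],
    ["normal", "normal", "sensitive", "sensitive"],
)
print(threshold, score)
```

In a random forest, each tree repeats this search at every node over a random subset of features. An adaptive variant would change how the criterion or the candidate features are chosen at each node, which is the part the abstract leaves unspecified.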