The International Arab Journal of Information Technology (IAJIT)


Correlation Dependencies between Variables in Feature Selection on Boolean Symbolic Objects

Djamal Ziani
Feature selection is an important process in data analysis and data mining. The increasing size, complexity, and multi-valued nature of data necessitate the use of Symbolic Data Analysis (SDA), which uses symbolic objects instead of classical data tables. Symbolic objects are created by applying abstraction or generalization techniques to individuals, and they represent concepts or clusters. Using dependencies between variables is crucial in SDA to improve the description of these objects and to eliminate incoherencies and over-generalization. This study shows how correlation dependencies between variables can be processed on Boolean Symbolic Objects (BSOs) in feature selection. A new feature selection criterion that considers the dependencies between variables and a method for dealing with computational complexity are also presented.
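
As an informal illustration of why dependencies matter, the sketch below generalizes a few individuals into a BSO and counts how many of the descriptions it covers survive a dependency rule. The data, the variable names (`habitat`, `respiration`), and the rule in `is_coherent` are hypothetical and chosen only for illustration; this is not the selection criterion proposed in the paper.

```python
from itertools import product

# Hypothetical individuals described by two categorical variables.
individuals = [
    {"habitat": "water", "respiration": "gills"},
    {"habitat": "land", "respiration": "lungs"},
]

def generalize(rows):
    """Build a BSO as one set of observed values per variable."""
    bso = {}
    for row in rows:
        for var, value in row.items():
            bso.setdefault(var, set()).add(value)
    return bso

def covered_descriptions(bso):
    """Enumerate every description the BSO covers (Cartesian product of its value sets)."""
    variables = sorted(bso)
    value_lists = [sorted(bso[v]) for v in variables]
    return [dict(zip(variables, combo)) for combo in product(*value_lists)]

def is_coherent(desc):
    """Assumed dependency rule: water goes with gills, land goes with lungs."""
    expected = {"water": "gills", "land": "lungs"}
    return expected[desc["habitat"]] == desc["respiration"]

bso = generalize(individuals)
covered = covered_descriptions(bso)
coherent = [d for d in covered if is_coherent(d)]

# The BSO covers more descriptions than the rule allows (over-generalization).
print(len(covered), len(coherent))
```

In this toy case the generalized object covers combinations such as water with lungs that no individual exhibits; the dependency rule is what removes them from the description.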

Djamal Ziani has been an associate professor of Management Information Systems at Al Yamamah University since 2019. He was an associate professor at King Saud University in the Computer Sciences and Information Systems College from 2009 to 2018. Dr. Djamal is a researcher in ERP and in the data management group. He received a Master’s degree in Computer Sciences from the University of Valenciennes, France in 1992, and a Ph.D. in Computer Science from the University of Paris Dauphine, France in 1996. From 1998 to 2009, he worked as a consultant and project manager at several companies in Canada, including SAP, Bombardier Aerospace, and the Montreal Stock Exchange.