The International Arab Journal of Information Technology (IAJIT)



Mapping XML to Inverted Indexed Circular Linked Lists

Extensible Markup Language (XML) has become the de facto standard for data exchange on the World Wide Web and is widely used in many fields, so efficient methods to manage, store, and query XML data are urgently needed. Traditional methods store XML data in relational databases to take advantage of their mature technologies. However, this approach requires mapping XML schemas to relational schemas, rewriting XML queries as SQL queries, and finally transforming the returned SQL-style results back into XML-style results. One possible alternative is to store XML data natively and query it directly with XML query languages. In this paper, we study how to map XML data so that it can be stored and queried efficiently. We propose the following framework to achieve this goal. First, we map a given XML data tree to a set of inverted indexed circular linked lists in which the relationships between parent and child nodes (as well as between ancestor and descendant nodes) are preserved. Then, an XML schema tree generated from the given XML data tree is used to guide and speed up queries over that data tree. Finally, we give an efficient algorithm that queries the XML data tree using its set of inverted indexed circular linked lists and its schema tree. Algorithm analysis and experiments show that our method is more efficient than a naïve method.
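The abstract does not specify the list layout or the query algorithm. The following is therefore only a minimal sketch built on our own assumptions, and it is not the authors' algorithm. Each distinct tag gets one circular singly linked list of its elements in document order, which serves as the "inverted index" from tag to list. Each node keeps a parent pointer for parent-child tests and hypothetical preorder region labels (`start`, `end`) for ancestor-descendant tests. A set of distinct root-to-node tag paths stands in for the schema tree generated from the data tree. All names (`Node`, `CircularList`, `build_index`, `query`, `descendant_join`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Set, Tuple


@dataclass(eq=False)
class Node:
    """One XML element; start/end are hypothetical preorder region labels."""
    tag: str
    start: int                   # preorder number of this element
    end: int = 0                 # largest preorder number in its subtree
    parent: Optional["Node"] = field(default=None, repr=False)
    next: Optional["Node"] = field(default=None, repr=False)  # next node with the same tag


class CircularList:
    """Circular singly linked list of same-tag nodes, kept in document order."""

    def __init__(self) -> None:
        self.tail: Optional[Node] = None  # tail.next is the head

    def append(self, node: Node) -> None:
        if self.tail is None:
            node.next = node              # a single node points to itself
        else:
            node.next = self.tail.next    # the new tail points back to the head
            self.tail.next = node
        self.tail = node

    def __iter__(self) -> Iterator[Node]:
        if self.tail is None:
            return
        cur = self.tail.next
        while True:
            yield cur
            if cur is self.tail:
                break
            cur = cur.next


Path = Tuple[str, ...]


def build_index(root: ET.Element) -> Tuple[Dict[str, CircularList], Set[Path]]:
    """Map the data tree to tag -> circular list, plus a path summary used as the schema."""
    index: Dict[str, CircularList] = {}
    schema: Set[Path] = set()
    counter = 0

    def visit(elem: ET.Element, parent: Optional[Node], path: Path) -> None:
        nonlocal counter
        counter += 1
        node = Node(elem.tag, counter, parent=parent)
        index.setdefault(elem.tag, CircularList()).append(node)
        my_path = path + (elem.tag,)
        schema.add(my_path)
        for child in elem:
            visit(child, node, my_path)
        node.end = counter                # every descendant has been numbered by now

    visit(root, None, ())
    return index, schema


def query(index: Dict[str, CircularList], schema: Set[Path], path: str) -> List[Node]:
    """Evaluate an absolute child-axis path such as /lib/book/title."""
    steps = tuple(s for s in path.split("/") if s)
    if not steps or steps not in schema:
        return []                         # the schema rules the path out; no list is scanned
    matches = []
    for node in index[steps[-1]]:         # only the list for the last step is scanned
        cur: Optional[Node] = node
        for tag in reversed(steps):       # walk parent pointers upward
            if cur is None or cur.tag != tag:
                break
            cur = cur.parent
        else:
            if cur is None:               # the match ended exactly at the root
                matches.append(node)
    return matches


def is_ancestor(a: Node, d: Node) -> bool:
    """Region-label test: a is a proper ancestor of d."""
    return a.start < d.start <= a.end


def descendant_join(index: Dict[str, CircularList], anc_tag: str, desc_tag: str) -> List[Tuple[int, int]]:
    """Naive nested-loop evaluation of anc_tag//desc_tag over two circular lists."""
    if anc_tag not in index or desc_tag not in index:
        return []
    return [(a.start, d.start)
            for a in index[anc_tag] for d in index[desc_tag] if is_ancestor(a, d)]


doc = ET.fromstring(
    "<lib><book><title>A</title></book>"
    "<book><title>B</title><note><title>C</title></note></book></lib>")
index, schema = build_index(doc)
print([n.start for n in query(index, schema, "/lib/book/title")])  # [3, 5]
print(query(index, schema, "/lib/title"))                          # []
print(descendant_join(index, "book", "title"))                     # [(2, 3), (4, 5), (4, 7)]
```

In this sketch, the schema check lets `query` reject a path such as `/lib/title` without touching any list. We read this as the kind of guidance the abstract attributes to the schema tree. `descendant_join` is deliberately the naive nested loop over two lists and is not an optimized structural join.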


Teng Lv received his PhD degree from Fudan University, China. His research interests include databases and data management. He is the author or coauthor of more than 70 journal papers and peer-reviewed conference papers. He has served as a reviewer or PC member for several journals and conferences, both domestic and international.

Ping Yan received her PhD degree from Fudan University, China. Her research interests include partial differential equations and their applications in neural networks and epidemic diseases, as well as data management.

Weimin He received his PhD degree from the University of Texas at Arlington, USA. His research interests include XML data management, information retrieval, and Peer-to-Peer computing and data management.