Abstract: eXtensible Markup Language (XML) has become one of the de facto standards for data exchange and representation
in many applications. An XML document is often too large and complex for a human being to understand and use, and in such
cases a summarized version of the original document is useful. Three criteria are proposed for evaluating the summarized
XML document: document size, information content, and information importance. We present a framework that summarizes an
XML document based on both the document itself and its schema; the schema is used because it implies much important
semantic and structural information. In our framework, redundant data are first removed using abnormal functional
dependencies and the schema structure. Then the tags and values of the XML document are summarized based on the document
itself and its schema. The framework is semi-automatic: it helps users summarize an XML document, but some parameters
must be specified by the users. Experiments show that the summarized XML documents produced by the framework achieve a
good balance among document size, information content, and information importance compared with the original documents.
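The abstract does not give the framework's algorithms, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' method. It uses a hypothetical sample document in which the functional dependency publisher -> city holds, removes the repeated city values that this dependency makes redundant, and then applies a lossy tag/value summarization controlled by two user-chosen parameters (the hypothetical `max_children` and `max_text`). Document size is taken to be the serialized length, and information content is approximated by the number of distinct leaf path-value pairs. Both measures, and all function names, are our own assumptions, and information importance is not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a book's city is determined by its publisher
# (functional dependency publisher -> city), so repeated cities are redundant.
SAMPLE = """<library>
<book><title>XML Basics</title><publisher>ACME</publisher><city>Boston</city></book>
<book><title>Schema Design</title><publisher>ACME</publisher><city>Boston</city></book>
<book><title>XPath in Practice</title><publisher>Beta</publisher><city>Denver</city></book>
</library>"""


def doc_size(root):
    """Document size, assumed here to be the length of the serialized XML."""
    return len(ET.tostring(root, encoding="unicode"))


def info_content(root):
    """Assumed proxy for information content: distinct (leaf path, value) pairs."""
    pairs = set()

    def walk(elem, path):
        path = f"{path}/{elem.tag}"
        children = list(elem)
        if not children:
            pairs.add((path, (elem.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(root, "")
    return len(pairs)


def remove_fd_redundancy(root, record_tag, lhs_tag, rhs_tag):
    """Remove rhs_tag values already implied by an earlier record with the same lhs_tag value.

    A copy is removed only if it equals the first value seen for that lhs,
    so values that contradict the assumed dependency are kept.
    """
    first_rhs = {}  # lhs value -> rhs value from its first occurrence
    removed = 0
    # Materialize the list so removing children does not disturb iteration.
    for record in list(root.iter(record_tag)):
        lhs = record.findtext(lhs_tag)
        rhs = record.find(rhs_tag)
        if lhs is None or rhs is None:
            continue
        if lhs not in first_rhs:
            first_rhs[lhs] = rhs.text
        elif first_rhs[lhs] == rhs.text:
            record.remove(rhs)
            removed += 1
    return removed


def summarize(elem, max_children, max_text):
    """Lossy step: keep at most max_children children and max_text characters of text."""
    for child in list(elem)[max_children:]:
        elem.remove(child)
    if elem.text and len(elem.text) > max_text:
        elem.text = elem.text[:max_text] + "..."
    for child in elem:
        summarize(child, max_children, max_text)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print("original:  ", doc_size(root), info_content(root))
    remove_fd_redundancy(root, "book", "publisher", "city")
    print("FD-reduced:", doc_size(root), info_content(root))
    summarize(root, max_children=2, max_text=8)
    print("summarized:", doc_size(root), info_content(root))
```

On this sample, the dependency-based step shrinks the document while leaving the information-content proxy unchanged (7 distinct pairs before and after), whereas the lossy summarization step shrinks it further at the cost of information (3 pairs remain). This mirrors the size/content trade-off that the three criteria are meant to capture; in the framework itself, the balance depends on the parameters the user specifies.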
Teng Lv received his PhD degree from Fudan University, China. His research interests include databases and XML data management. He is the author or coauthor of more than 50 journal papers and peer-reviewed conference papers. He has served as a reviewer or PC member for several journals and conferences in China and abroad.

Ping Yan received her PhD degree from Fudan University, China. Her research interests include partial differential equations and their applications to neural networks and epidemic diseases, databases, and XML data management.
Cite this
Teaching and Research Section of Computer, Army Officer Academy, China; School of Science, Anhui Agricultural University, China, "A Framework of Summarizing XML Documents with Schemas," The International Arab Journal of Information Technology (IAJIT), vol. 10, no. 1, pp. 80-89, January 2013.
@ARTICLE{3327,
author={{Teaching and Research Section of Computer, Army Officer Academy, China} and {School of Science, Anhui Agricultural University, China}},
journal={The International Arab Journal of Information Technology (IAJIT)},
title={A Framework of Summarizing {XML} Documents with Schemas},
volume={10},
number={1},
pages={80--89},
year={2013},
month={Jan},
ISSN={2413-9351},
keywords={XML, document summarization, schema, key, functional dependency}
}
TY  - JOUR
TI  - A Framework of Summarizing XML Documents with Schemas
AU  - Teaching and Research Section of Computer, Army Officer Academy, China
AU  - School of Science, Anhui Agricultural University, China
JO  - The International Arab Journal of Information Technology (IAJIT)
VO  - 10
VL  - 10
IS  - 1
SP  - 80
EP  - 89
SN  - 2413-9351
Y1  - 2013/01//
PY  - 2013
ER  - 
URL: https://iajit.org/paper/3327