Structural similarity evaluation between XML documents and DTDs

Bibliographic Details
Main Author: Tekli, J. (author)
Other Authors: Chbeir, R. (author), Yetongnon, K. (author)
Format: conferenceObject
Published: 2007
Online Access: http://hdl.handle.net/10725/5859
https://doi.org/10.1007/978-3-540-76993-4_17
http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php
https://link.springer.com/chapter/10.1007/978-3-540-76993-4_17
Description
Summary: The automatic processing and management of XML-based data are increasingly popular research topics, owing to the growing use of XML, especially on the Web. Nonetheless, several operations based on the structure of XML data have not yet received much attention. Among these is the process of matching XML documents with XML grammars, which is useful in applications such as document classification, retrieval, and selective dissemination of information. In this paper, we propose an algorithm for measuring the structural similarity between an XML document and a Document Type Definition (DTD), considered the simplest way of specifying structural constraints on XML documents. We consider the various DTD operators that designate constraints on the existence, repeatability, and alternativeness of XML elements and attributes. Our approach is based on the concept of tree edit distance, an effective and efficient means of comparing tree structures, with XML documents and DTDs modeled as ordered labeled trees. It is of polynomial complexity, in contrast to existing exponential algorithms. Classification experiments, conducted on large sets of real and synthetic XML documents, underline the effectiveness of our approach, as well as its applicability to large XML repositories and databases. © Springer-Verlag Berlin Heidelberg 2007.
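
To make the idea of comparing ordered labeled trees with an edit distance concrete, the following is a minimal Python sketch. It is not the algorithm proposed in the paper: it uses a simpler, Selkow-style top-down distance with unit costs, it treats the grammar as a plain tree and ignores the DTD operators (?, *, +, |) that the paper handles, and the names `Node`, `size`, `tree_distance`, and `similarity`, along with the 1 / (1 + distance) normalization, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """An ordered labeled tree node (hypothetical model of an XML element)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def tree_distance(a: Node, b: Node) -> int:
    """Selkow-style top-down edit distance with unit costs.

    Allowed operations: relabel a node, insert a whole subtree,
    delete a whole subtree. Inserting or deleting a subtree costs
    its number of nodes. This is an illustrative stand-in, not the
    paper's method.
    """
    root_cost = 0 if a.label == b.label else 1  # relabel the root if needed
    m, n = len(a.children), len(b.children)

    # d[i][j] = cost of turning the first i children of a
    # into the first j children of b (string-edit-style table).
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(a.children[i - 1])  # delete subtrees
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(b.children[j - 1])  # insert subtrees
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a.children[i - 1]),
                d[i][j - 1] + size(b.children[j - 1]),
                d[i - 1][j - 1]
                + tree_distance(a.children[i - 1], b.children[j - 1]),
            )
    return root_cost + d[m][n]


def similarity(a: Node, b: Node) -> float:
    """Map a distance to (0, 1]; this normalization is an assumption."""
    return 1.0 / (1.0 + tree_distance(a, b))


# Hypothetical document tree and a grammar flattened to a tree.
doc = Node("paper", [Node("title"), Node("author"), Node("author")])
grammar = Node("paper", [Node("title"), Node("author"), Node("year")])
```

With these two trees, the only difference is the last child (author versus year), so a single relabel suffices. A document could then be assigned to whichever grammar yields the highest similarity, which is the kind of classification use the summary mentions.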