A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics

XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, h...

Bibliographic Details
Main Author: Tekli, Joe (author)
Other Authors: Chbeir, Richard (author)
Format: article
Published: 2011
Online Access: http://hdl.handle.net/10725/5084
http://dx.doi.org/10.1016/j.websem.2011.10.002
http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php
http://www.sciencedirect.com/science/article/pii/S1570826811000825
_version_ 1864513465017696256
author Tekli, Joe
author2 Chbeir, Richard
author2_role author
author_facet Tekli, Joe
Chbeir, Richard
author_role author
dc.creator.none.fl_str_mv Tekli, Joe
Chbeir, Richard
dc.date.none.fl_str_mv 2011-11-15
2012
2017-01-27T09:24:13Z
2017-01-27T09:24:13Z
dc.identifier.none.fl_str_mv 1570-8268
http://hdl.handle.net/10725/5084
http://dx.doi.org/10.1016/j.websem.2011.10.002
Tekli, J., & Chbeir, R. (2012). A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics. Web Semantics: Science, Services and Agents on the World Wide Web, 11, 14-40.
http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php
http://www.sciencedirect.com/science/article/pii/S1570826811000825
dc.language.none.fl_str_mv en
dc.relation.none.fl_str_mv Web Semantics: Science, Services and Agents on the World Wide Web
dc.rights.*.fl_str_mv info:eu-repo/semantics/openAccess
dc.title.none.fl_str_mv A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics
dc.type.none.fl_str_mv Article
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
description XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
eu_rights_str_mv openAccess
format article
id LAURepo_1e335ce05cfd366a5fa33f120bb9172d
identifier_str_mv 1570-8268
Tekli, J., & Chbeir, R. (2012). A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics. Web Semantics: Science, Services and Agents on the World Wide Web, 11, 14-40.
language_invalid_str_mv en
network_acronym_str LAURepo
network_name_str Lebanese American University repository
oai_identifier_str oai:laur.lau.edu.lb:10725/5084
publishDate 2011
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
spellingShingle A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics
Tekli, Joe
status_str publishedVersion
title A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics
url http://hdl.handle.net/10725/5084
http://dx.doi.org/10.1016/j.websem.2011.10.002
http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php
http://www.sciencedirect.com/science/article/pii/S1570826811000825