Evaluation (2024-03-15)
Luisa Werner, Pierre Genevès, Nabil Layaïda, Jérôme Euzenat, Damien Graux, Reproduce, replicate, reevaluate: the long but safe way to extend machine learning methods, in: Proc. 38th AAAI Conference on Artificial Intelligence (AAAI), Vancouver (CA), 2024
Reproducibility is a desirable property of scientific research. On the one hand, it increases confidence in results. On the other hand, reproducible results can be extended on a solid basis. In rapidly developing fields such as machine learning, the latter is particularly important to ensure the reliability of research. In this paper, we present a systematic approach to reproducing (using the available implementation), replicating (using an alternative implementation) and reevaluating (using different datasets) state-of-the-art experiments. This approach enables the early detection and correction of deficiencies and thus the development of more robust and transparent machine learning methods. We detail the independent reproduction, replication, and reevaluation of the initially published experiments with a method that we want to extend. For each step, we identify issues and draw lessons learned. We further discuss solutions that have proven effective in overcoming the encountered problems. This work can serve as a guide for further reproducibility studies and generally improve reproducibility in machine learning.
Jérôme David, Jérôme Euzenat, Pierre Genevès, Nabil Layaïda, Evaluation of query transformations without data, in: Proc. WWW workshop on Reasoning on Data (RoD), Lyon (FR), pp1599-1602, 2018
Query transformations are ubiquitous in semantic web query processing. In any situation in which transformations are not proved correct by construction, the quality of these transformations has to be evaluated. Usual evaluation measures are either overly syntactic and not very informative ---the result being: correct or incorrect--- or dependent on the evaluation data sources. Moreover, the two approaches do not necessarily yield the same result. We suggest that grounding the evaluation on query containment allows for a data-independent evaluation that is more informative than the usual syntactic evaluation. In addition, such evaluation modalities may take into account ontologies, alignments or different query languages whenever they are relevant to query evaluation.
Manel Achichi, Michelle Cheatham, Zlatan Dragisic, Jérôme Euzenat, Daniel Faria, Alfio Ferrara, Giorgos Flouris, Irini Fundulaki, Ian Harrow, Valentina Ivanova, Ernesto Jiménez-Ruiz, Kristian Kolthoff, Elena Kuss, Patrick Lambrix, Henrik Leopold, Huanyu Li, Christian Meilicke, Majid Mohammadi, Stefano Montanelli, Catia Pesquita, Tzanina Saveta, Pavel Shvaiko, Andrea Splendiani, Heiner Stuckenschmidt, Élodie Thiéblin, Konstantin Todorov, Cássia Trojahn dos Santos, Ondřej Zamazal, Results of the Ontology Alignment Evaluation Initiative 2017, in: Pavel Shvaiko, Jérôme Euzenat, Ernesto Jiménez-Ruiz, Michelle Cheatham, Oktie Hassanzadeh (eds), Proc. 12th ISWC workshop on ontology matching (OM), Wien (AT), pp61-113, 2017
Ontology matching consists of finding correspondences between semantically related entities of different ontologies. The Ontology Alignment Evaluation Initiative (OAEI) aims at comparing ontology matching systems on precisely defined test cases. These test cases can be based on ontologies of different levels of complexity (from simple thesauri to expressive OWL ontologies) and use different evaluation modalities (e.g., blind evaluation, open evaluation, or consensus). The OAEI 2017 campaign offered 9 tracks with 23 test cases, and was attended by 21 participants. This paper is an overall presentation of that campaign.
Manel Achichi, Michelle Cheatham, Zlatan Dragisic, Jérôme Euzenat, Daniel Faria, Alfio Ferrara, Giorgos Flouris, Irini Fundulaki, Ian Harrow, Valentina Ivanova, Ernesto Jiménez-Ruiz, Elena Kuss, Patrick Lambrix, Henrik Leopold, Huanyu Li, Christian Meilicke, Stefano Montanelli, Catia Pesquita, Tzanina Saveta, Pavel Shvaiko, Andrea Splendiani, Heiner Stuckenschmidt, Konstantin Todorov, Cássia Trojahn dos Santos, Ondřej Zamazal, Results of the Ontology Alignment Evaluation Initiative 2016, in: Pavel Shvaiko, Jérôme Euzenat, Ernesto Jiménez-Ruiz, Michelle Cheatham, Oktie Hassanzadeh, Ryutaro Ichise (eds), Proc. 11th ISWC workshop on ontology matching (OM), Kobe (JP), pp73-129, 2016
Ontology matching consists of finding correspondences between semantically related entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. These test cases can use ontologies of different nature (from simple thesauri to expressive OWL ontologies) and use different modalities, e.g., blind evaluation, open evaluation, or consensus. OAEI 2016 offered 9 tracks with 22 test cases, and was attended by 21 participants. This paper is an overall presentation of the OAEI 2016 campaign.
Michelle Cheatham, Zlatan Dragisic, Jérôme Euzenat, Daniel Faria, Alfio Ferrara, Giorgos Flouris, Irini Fundulaki, Roger Granada, Valentina Ivanova, Ernesto Jiménez-Ruiz, Patrick Lambrix, Stefano Montanelli, Catia Pesquita, Tzanina Saveta, Pavel Shvaiko, Alessandro Solimando, Cássia Trojahn dos Santos, Ondřej Zamazal, Results of the Ontology Alignment Evaluation Initiative 2015, in: Pavel Shvaiko, Jérôme Euzenat, Ernesto Jiménez-Ruiz, Michelle Cheatham, Oktie Hassanzadeh (eds), Proc. 10th ISWC workshop on ontology matching (OM), Bethlehem (PA US), pp60-115, 2016
Ontology matching consists of finding correspondences between semantically related entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. These test cases can use ontologies of different nature (from simple thesauri to expressive OWL ontologies) and use different modalities, e.g., blind evaluation, open evaluation and consensus. OAEI 2015 offered 8 tracks with 15 test cases followed by 22 participants. Since 2011, the campaign has been using a new evaluation modality which provides more automation to the evaluation. This paper is an overall presentation of the OAEI 2015 campaign.
Zlatan Dragisic, Kai Eckert, Jérôme Euzenat, Daniel Faria, Alfio Ferrara, Roger Granada, Valentina Ivanova, Ernesto Jiménez-Ruiz, Andreas Oskar Kempf, Patrick Lambrix, Stefano Montanelli, Heiko Paulheim, Dominique Ritze, Pavel Shvaiko, Alessandro Solimando, Cássia Trojahn dos Santos, Ondřej Zamazal, Bernardo Cuenca Grau, Results of the Ontology Alignment Evaluation Initiative 2014, in: Pavel Shvaiko, Jérôme Euzenat, Ming Mao, Ernesto Jiménez-Ruiz, Juanzi Li, Axel-Cyrille Ngonga Ngomo (eds), Proc. 9th ISWC workshop on ontology matching (OM), Riva del Garda (IT), pp61-104, 2014
Ontology matching consists of finding correspondences between semantically related entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. These test cases can use ontologies of different nature (from simple thesauri to expressive OWL ontologies) and use different modalities, e.g., blind evaluation, open evaluation and consensus. OAEI 2014 offered 7 tracks with 9 test cases followed by 14 participants. Since 2010, the campaign has been using a new evaluation modality which provides more automation to the evaluation. This paper is an overall presentation of the OAEI 2014 campaign.
Bernardo Cuenca Grau, Zlatan Dragisic, Kai Eckert, Jérôme Euzenat, Alfio Ferrara, Roger Granada, Valentina Ivanova, Ernesto Jiménez-Ruiz, Andreas Oskar Kempf, Patrick Lambrix, Andriy Nikolov, Heiko Paulheim, Dominique Ritze, François Scharffe, Pavel Shvaiko, Cássia Trojahn dos Santos, Ondřej Zamazal, Results of the Ontology Alignment Evaluation Initiative 2013, in: Pavel Shvaiko, Jérôme Euzenat, Kavitha Srinivas, Ming Mao, Ernesto Jiménez-Ruiz (eds), Proc. 8th ISWC workshop on ontology matching (OM), Sydney (NSW AU), pp61-100, 2013
Ontology matching consists of finding correspondences between semantically related entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. These test cases can use ontologies of different nature (from simple thesauri to expressive OWL ontologies) and use different modalities, e.g., blind evaluation, open evaluation and consensus. OAEI 2013 offered 6 tracks with 8 test cases followed by 23 participants. Since 2010, the campaign has been using a new evaluation modality which provides more automation to the evaluation. This paper is an overall presentation of the OAEI 2013 campaign.
Jérôme Euzenat, Maria Roşoiu, Cássia Trojahn dos Santos, Ontology matching benchmarks: generation, stability, and discriminability, Journal of web semantics 21:30-48, 2013
The OAEI Benchmark test set has been used for many years as a main reference to evaluate and compare ontology matching systems. However, this test set has barely varied since 2004 and has become a relatively easy task for matchers. In this paper, we present the design of a flexible test generator based on an extensible set of alterators which may be used programmatically for generating different test sets from different seed ontologies and different alteration modalities. It has been used for reproducing Benchmark both with the original seed ontology and with other ontologies. This highlights the remarkable stability of results over different generations and the preservation of difficulty across seed ontologies, as well as a systematic bias towards the initial Benchmark test set and the inability of such tests to identify an overall winning matcher. These were exactly the properties for which Benchmark had been designed. Furthermore, the generator has been used for providing new test sets aiming at increasing the difficulty and discriminability of Benchmark. Although difficulty may be easily increased with the generator, attempts to increase discriminability proved unfruitful. However, efforts towards this goal raise questions about the very nature of discriminability.
Ontology matching, Matching evaluation, Test generation, Semantic web
José Luis Aguirre, Bernardo Cuenca Grau, Kai Eckert, Jérôme Euzenat, Alfio Ferrara, Willem Robert van Hage, Laura Hollink, Ernesto Jiménez-Ruiz, Christian Meilicke, Andriy Nikolov, Dominique Ritze, François Scharffe, Pavel Shvaiko, Ondřej Sváb-Zamazal, Cássia Trojahn dos Santos, Benjamin Zapilko, Results of the Ontology Alignment Evaluation Initiative 2012, in: Pavel Shvaiko, Jérôme Euzenat, Anastasios Kementsietsidis, Ming Mao, Natalya Noy, Heiner Stuckenschmidt (eds), Proc. 7th ISWC workshop on ontology matching (OM), Boston (MA US), pp73-115, 2012
Ontology matching consists of finding correspondences between semantically related entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. These test cases can use ontologies of different nature (from simple thesauri to expressive OWL ontologies) and use different modalities, e.g., blind evaluation, open evaluation, consensus. OAEI 2012 offered 7 tracks with 9 test cases followed by 21 participants. Since 2010, the campaign has been using a new evaluation modality which provides more automation to the evaluation. This paper is an overall presentation of the OAEI 2012 campaign.
José Luis Aguirre, Christian Meilicke, Jérôme Euzenat, Iterative implementation of services for the automatic evaluation of matching tools (v2), Deliverable 12.5v2, SEALS, 34p., 2012
This deliverable reports on the current status of the service implementation for the automatic evaluation of matching tools, and on the final status of those services. These services have been used in the third SEALS evaluation of matching systems, held in Spring 2012 in coordination with the OAEI 2011.5 campaign. We worked mainly on modifying the WP12 BPEL workflow to incorporate the new features of the RES 1.2 version; testing the modified workflows on a local installation and on the SEALS Platform; writing transformations of result data to comply with the new SEALS ontology specifications; and, finally, extending the SEALS client for ontology matching evaluation to better support the automation of WP12 evaluation campaigns and to advance the integration with the SEALS repositories. We report the results obtained while accomplishing these tasks.
ontology matching, ontology alignment, evaluation, benchmarks, efficiency measure
Jérôme Euzenat, A modest proposal for data interlinking evaluation, in: Pavel Shvaiko, Jérôme Euzenat, Anastasios Kementsietsidis, Ming Mao, Natalya Noy, Heiner Stuckenschmidt (eds), Proc. 7th ISWC workshop on ontology matching (OM), Boston (MA US), pp234-235, 2012
Data interlinking is a very important topic nowadays. It is sufficiently similar to ontology matching that comparable evaluations can be undertaken. However, it differs enough that specific evaluations may be designed. We discuss such variations and designs.
Data interlinking, Evaluation, Benchmark, Blocking, Instance matching
Christian Meilicke, José Luis Aguirre, Jérôme Euzenat, Ondřej Sváb-Zamazal, Ernesto Jiménez-Ruiz, Ian Horrocks, Cássia Trojahn dos Santos, Results of the second evaluation of matching tools, Deliverable 12.6, SEALS, 30p., 2012
This deliverable reports on the results of the second SEALS evaluation campaign (for WP12 it is the third evaluation campaign), which has been carried out in coordination with the OAEI 2011.5 campaign. As opposed to OAEI 2010 and 2011, the full set of OAEI tracks has been executed with the help of SEALS technology. 19 systems have participated and five data sets have been used. Two of these data sets are new and have not been used in previous OAEI campaigns. In this deliverable we report on the data sets used in the campaign and its execution, and we present and discuss the evaluation results.
ontology matching, ontology alignment, evaluation, benchmarks
Jérôme Euzenat, Christian Meilicke, Pavel Shvaiko, Heiner Stuckenschmidt, Cássia Trojahn dos Santos, Ontology Alignment Evaluation Initiative: six years of experience, Journal on data semantics XV(6720):158-192, 2011
In the area of semantic technologies, benchmarking and systematic evaluation are not yet as established as in other areas of computer science, e.g., information retrieval. In spite of successful attempts, more effort and experience are required in order to achieve such a level of maturity. In this paper, we report results and lessons learned from the Ontology Alignment Evaluation Initiative (OAEI), a benchmarking initiative for ontology matching. The goal of this work is twofold: on the one hand, we document the state of the art in evaluating ontology matching methods and provide potential participants of the initiative with a better understanding of the design and the underlying principles of the OAEI campaigns. On the other hand, we report experiences gained in this particular area of semantic technologies to potential developers of benchmarking for other kinds of systems. For this purpose, we describe the evaluation design used in the OAEI campaigns in terms of datasets, evaluation criteria and workflows, provide a global view on the results of the campaigns carried out from 2005 to 2010 and discuss upcoming trends, both specific to ontology matching and generally relevant for the evaluation of semantic technologies. Finally, we argue that there is a need for a further automation of benchmarking to shorten the feedback cycle for tool developers.
Evaluation, Experimentation, Benchmarking, Ontology matching, Ontology alignment, Schema matching, Semantic technologies
Jérôme Euzenat, Alfio Ferrara, Willem Robert van Hage, Laura Hollink, Christian Meilicke, Andriy Nikolov, François Scharffe, Pavel Shvaiko, Heiner Stuckenschmidt, Ondřej Sváb-Zamazal, Cássia Trojahn dos Santos, Results of the Ontology Alignment Evaluation Initiative 2011, in: Pavel Shvaiko, Isabel Cruz, Jérôme Euzenat, Tom Heath, Ming Mao, Christoph Quix (eds), Proc. 6th ISWC workshop on ontology matching (OM), Bonn (DE), pp85-110, 2011
Ontology matching consists of finding correspondences between entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. Test cases can use ontologies of different nature (from simple directories to expressive OWL ontologies) and use different modalities, e.g., blind evaluation, open evaluation, consensus. OAEI 2011 builds on previous campaigns by having 4 tracks with 6 test cases followed by 18 participants. Since 2010, the campaign has included a new evaluation modality in association with the SEALS project. A subset of OAEI test cases is included in this modality, which provides more automation of the evaluation and more direct feedback to the participants. This paper is an overall presentation of the OAEI 2011 campaign.
Maria Roşoiu, Cássia Trojahn dos Santos, Jérôme Euzenat, Ontology matching benchmarks: generation and evaluation, in: Pavel Shvaiko, Isabel Cruz, Jérôme Euzenat, Tom Heath, Ming Mao, Christoph Quix (eds), Proc. 6th ISWC workshop on ontology matching (OM), Bonn (DE), pp73-84, 2011
The OAEI Benchmark data set has been used as a main reference to evaluate and compare matching systems. It requires matching an ontology with systematically modified versions of itself. However, it has two main drawbacks: it has not varied since 2004 and it has become a relatively easy task for matchers. In this paper, we present the design of a modular test generator that overcomes these drawbacks. Using this generator, we have reproduced Benchmark both with the original seed ontology and with other ontologies. Evaluating different matchers on these generated tests, we have observed that (a) the difficulties encountered by a matcher at a test are preserved across seed ontologies, (b) contrary to our expectations, we found no systematic positive bias towards the original data set, which has been available for developers to test their systems, and (c) the generated data sets have consistent results across matchers and across seed ontologies. However, the discriminant power of the generated tests is still too low, and more tests would be necessary to draw definitive conclusions.
Ontology matching, Matching evaluation, Test generation, Semantic web
Cássia Trojahn dos Santos, Christian Meilicke, Jérôme Euzenat, Iterative implementation of services for the automatic evaluation of matching tools, Deliverable 12.5, SEALS, 21p., 2011
The implementation of the automatic services for evaluating matching tools follows an iterative model. The aim is to provide a way of continuously analysing and improving these services. In this deliverable, we report on the first iteration of this process, i.e., the current implementation status of the services. In this first iteration, we have extended our previous implementation in order to migrate our own services to the SEALS components that have been completed since the end of the first evaluation campaign.
ontology matching, ontology alignment, evaluation, benchmarks, efficiency measure
Jérôme Euzenat, Alfio Ferrara, Christian Meilicke, Andriy Nikolov, Juan Pane, François Scharffe, Pavel Shvaiko, Heiner Stuckenschmidt, Ondřej Sváb-Zamazal, Vojtech Svátek, Cássia Trojahn dos Santos, Results of the Ontology Alignment Evaluation Initiative 2010, in: Pavel Shvaiko, Jérôme Euzenat, Fausto Giunchiglia, Heiner Stuckenschmidt, Ming Mao, Isabel Cruz (eds), Proc. 5th ISWC workshop on ontology matching (OM), Shanghai (CN), pp85-117, 2010
Ontology matching consists of finding correspondences between entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. Test cases can use ontologies of different nature (from simple directories to expressive OWL ontologies) and use different modalities, e.g., blind evaluation, open evaluation, consensus. OAEI 2010 builds on previous campaigns by having 4 tracks with 6 test cases followed by 15 participants. This year, the OAEI campaign introduces a new evaluation modality in association with the SEALS project. A subset of OAEI test cases is included in this new modality, which provides more automation of the evaluation and more direct feedback to the participants. This paper is an overall presentation of the OAEI 2010 campaign.
Jérôme Euzenat, Christian Meilicke, Heiner Stuckenschmidt, Cássia Trojahn dos Santos, A web-based evaluation service for ontology matching, in: Proc. 9th demonstration track on international semantic web conference (ISWC), Shanghai (CN), pp93-96, 2010
Evaluation of semantic web technologies at large scale, including ontology matching, is an important topic of semantic web research. This paper presents a web-based evaluation service for automatically executing the evaluation of ontology matching systems. This service is based on the use of a web service interface wrapping the functionality of a matching tool to be evaluated and allows developers to launch evaluations of their tool at any time on their own. Furthermore, the service can be used to visualise and manipulate the evaluation results. The approach allows the execution of the tool on the machine of the tool developer without the need for a runtime environment.
Christian Meilicke, Cássia Trojahn dos Santos, Jérôme Euzenat, Services for the automatic evaluation of matching tools, Deliverable 12.2, SEALS, 35p., 2010
In this deliverable we describe a SEALS evaluation service for ontology matching that is based on the use of a web service interface to be implemented by the tool vendor. Following this approach we can offer an evaluation service before many components of the SEALS platform have been finished. We describe both the system architecture of the evaluation service from a general point of view as well as the specific components and their relation to the modules of the SEALS platform.
ontology matching, ontology alignment, evaluation, benchmarks
Cássia Trojahn dos Santos, Christian Meilicke, Jérôme Euzenat, Heiner Stuckenschmidt, Automating OAEI Campaigns (First Report), in: Asunción Gómez Pérez, Fabio Ciravegna, Frank van Harmelen, Jeff Heflin (eds), Proc. 1st ISWC international workshop on evaluation of semantic technologies (iWEST), Shanghai (CN), 2010
This paper reports on the first effort to integrate the OAEI and SEALS evaluation campaigns. The SEALS project aims at providing standardized resources (software components, data sets, etc.) for automatically executing evaluations of typical semantic web tools, including ontology matching tools. A first version of the software infrastructure is based on the use of a web service interface wrapping the functionality of a matching tool to be evaluated. In this setting, the evaluation results can be visualized and manipulated immediately in a direct feedback cycle. We describe how parts of the OAEI 2010 evaluation campaign have been integrated into this software infrastructure. In particular, we discuss technical and organizational aspects related to the use of the new technology for both participants and organizers of the OAEI.
ontology matching, evaluation workflows, evaluation criteria, automating evaluation
Cássia Trojahn dos Santos, Christian Meilicke, Jérôme Euzenat, Ondřej Sváb-Zamazal, Results of the first evaluation of matching tools, Deliverable 12.3, SEALS, 36p., November 2010
This deliverable reports the results of the first SEALS evaluation campaign, which has been carried out in coordination with the OAEI 2010 campaign. A subset of the OAEI tracks has been included in a new modality, the SEALS modality. From the participants' point of view, the main innovation is the use of a web-based interface for launching evaluations. 13 systems, out of the 15 across all tracks, have participated in at least one of the three SEALS tracks. We report the preliminary results of these systems for each SEALS track and discuss the main lessons learned from the use of the new technology for both participants and organizers of the OAEI.
ontology matching, ontology alignment, evaluation, benchmarks
Cássia Trojahn dos Santos, Jérôme Euzenat, Christian Meilicke, Heiner Stuckenschmidt, Evaluation design and collection of test data for matching tools, Deliverable 12.1, SEALS, 68p., November 2009
This deliverable presents a systematic procedure for evaluating ontology matching systems and algorithms in the context of the SEALS project. It describes the criteria and metrics on which the evaluations will be carried out and the characteristics of the test data to be used, as well as the evaluation target, which includes the systems generating the alignments for evaluation.
ontology matching, ontology alignment, evaluation, benchmarks, efficiency measure