Evaluation (2024-02-10)
Luisa Werner, Pierre Genevès, Nabil Layaïda, Jérôme Euzenat, Damien Graux, Reproduce, replicate, reevaluate: the long but safe way to extend machine learning methods, in: Proc. 38th AAAI Conference on Artificial Intelligence (AAAI), Vancouver (CA), 2024
Reproducibility is a desirable property of scientific research. On the one hand, it increases confidence in results. On the other hand, reproducible results can be extended on a solid basis. In rapidly developing fields such as machine learning, the latter is particularly important to ensure the reliability of research. In this paper, we present a systematic approach to reproducing (using the available implementation), replicating (using an alternative implementation) and reevaluating (using different datasets) state-of-the-art experiments. This approach enables the early detection and correction of deficiencies and thus the development of more robust and transparent machine learning methods. We detail the independent reproduction, replication, and reevaluation of the initially published experiments with a method that we want to extend. For each step, we identify issues and draw lessons learned. We further discuss solutions that have proven effective in overcoming the encountered problems. This work can serve as a guide for further reproducibility studies and generally improve reproducibility in machine learning.
Jérôme David, Measures for knowledge – with applications to ontology matching and data interlinking, Habilitation à diriger des recherches, Université Grenoble Alpes, Grenoble (FR), May 2023
The Semantic Web is an extension of the web that enables people to express knowledge in a form that machines can reason with. At the web scale, this knowledge may be described using different ontologies, and alignments have been defined to express these differences. Furthermore, the same individual may be represented by different instances in different datasets. Dealing with knowledge heterogeneity in the Semantic Web requires comparing these knowledge structures. Our objective is to understand heterogeneity and benefit from this understanding, not to reduce diversity. In this context, we have studied and contributed to techniques and measures for comparing knowledge structures on the Semantic Web along three dimensions: ontologies, alignments, and instances. At the ontology level, we propose measures for the ontology space and the alignment space. The first family of measures relies solely on the content of ontologies, while the second takes advantage of alignments between ontologies. At the alignment level, we investigate how to assess the quality of alignments. First, we study how to extend classical controlled evaluation measures by considering the semantics of the aligned ontologies while relaxing the all-or-nothing nature of logical entailment. We also propose estimating the quality of alignments when no reference alignment is available. At the instance level, we tackle the challenge of identifying resources from different knowledge graphs that represent the same entity. We follow an approach based on keys and alignments. Specifically, we propose the notion of a link key, algorithms for extracting link keys, and measures to assess their quality. Finally, we recast this work in the perspective of the dynamics and evolution of knowledge.
Jérôme David, Jérôme Euzenat, Pierre Genevès, Nabil Layaïda, Evaluation of query transformations without data, in: Proc. WWW workshop on Reasoning on Data (RoD), Lyon (FR), pp1599-1602, 2018
Query transformations are ubiquitous in semantic web query processing. In any situation in which transformations are not proved correct by construction, the quality of these transformations has to be evaluated. Usual evaluation measures are either overly syntactic and not very informative (the result being simply: correct or incorrect) or dependent on the evaluation sources. Moreover, the two approaches do not necessarily yield the same result. We suggest that grounding the evaluation on query containment allows for a data-independent evaluation that is more informative than the usual syntactic evaluation. In addition, such evaluation modalities may take into account ontologies, alignments, or different query languages whenever these are relevant to query evaluation.
Manel Achichi, Michelle Cheatham, Zlatan Dragisic, Jérôme Euzenat, Daniel Faria, Alfio Ferrara, Giorgos Flouris, Irini Fundulaki, Ian Harrow, Valentina Ivanova, Ernesto Jiménez-Ruiz, Kristian Kolthoff, Elena Kuss, Patrick Lambrix, Henrik Leopold, Huanyu Li, Christian Meilicke, Majid Mohammadi, Stefano Montanelli, Catia Pesquita, Tzanina Saveta, Pavel Shvaiko, Andrea Splendiani, Heiner Stuckenschmidt, Élodie Thiéblin, Konstantin Todorov, Cássia Trojahn dos Santos, Ondřej Zamazal, Results of the Ontology Alignment Evaluation Initiative 2017, in: Pavel Shvaiko, Jérôme Euzenat, Ernesto Jiménez-Ruiz, Michelle Cheatham, Oktie Hassanzadeh (eds), Proc. 12th ISWC workshop on ontology matching (OM), Wien (AT), pp61-113, 2017
Ontology matching consists of finding correspondences between semantically related entities of different ontologies. The Ontology Alignment Evaluation Initiative (OAEI) aims at comparing ontology matching systems on precisely defined test cases. These test cases can be based on ontologies of different levels of complexity (from simple thesauri to expressive OWL ontologies) and use different evaluation modalities (e.g., blind evaluation, open evaluation, or consensus). The OAEI 2017 campaign offered 9 tracks with 23 test cases, and was attended by 21 participants. This paper is an overall presentation of that campaign.