Linked data in areas (2024-02-10)
Nacira Abbas, Alexandre Bazin, Jérôme David, Amedeo Napoli, Discovery of link keys in resource description framework datasets based on pattern structures, International Journal of Approximate Reasoning 161:108978, 2023
In this paper, we present a detailed and complete study of data interlinking and the discovery of identity links between two RDF (Resource Description Framework) datasets over the web of data. Data interlinking is the task of discovering identity links between individuals across datasets. Link keys are constructions based on pairs of properties and classes that can be considered as rules for inferring identity links between subjects in two RDF datasets. Here we investigate how FCA (Formal Concept Analysis) and its extensions are well suited to supporting the discovery of link keys. Indeed, plain FCA makes it possible to discover the so-called link key candidates, while a specific pattern structure makes it possible to associate a pair of classes with every candidate. Different link key candidates can generate sets of identity links between individuals that can be considered as equal when they are regarded as partitions of the identity relation, which introduces a kind of redundancy. In this paper, this redundancy is studied in depth using partition pattern structures. In particular, experiments show that the redundancy of link key candidates, while not significant when based on identity of partitions, is much more significant when based on similarity.
Data Interlinking, Link Key Discovery, Link Key Candidate, Formal Concept Analysis, Pattern Structures, Redundancy of Link Sets
Jérôme David, Measures for knowledge – with applications to ontology matching and data interlinking, Habilitation à diriger des recherches, Université Grenoble Alpes, Grenoble (FR), May 2023
The Semantic Web is an extension of the web that enables people to express knowledge in a way that machines can reason with it. At the web scale, this knowledge may be described using different ontologies, and alignments have been defined to express these differences. Furthermore, the same individual may be represented by different instances in different datasets. Dealing with knowledge heterogeneity in the Semantic Web requires comparing these knowledge structures. Our objective is to understand heterogeneity and benefit from this understanding, not to reduce diversity. In this context, we have studied and contributed to techniques and measures for comparing knowledge structures on the Semantic Web along three dimensions: ontologies, alignments, and instances. At the ontology level, we propose measures for the ontology space and alignment space. The first family of measures relies solely on the content of ontologies, while the second one takes advantage of alignments between ontologies. At the alignment level, we investigate how to assess the quality of alignments. First, we study how to extend classical controlled evaluation measures by considering the semantics of aligned ontologies while relaxing the all-or-nothing nature of logical entailment. We also propose estimating the quality of alignments when no reference alignment is available. At the instance level, we tackle the challenge of identifying resources from different knowledge graphs that represent the same entity. We follow an approach based on keys and alignments. Specifically, we propose the notion of a link key, algorithms for extracting them, and measures to assess their quality. Finally, we recast this work in the perspective of the dynamics and evolution of knowledge.
Chloé Khadija Jradeh, Jérôme David, Olivier Teste, Cássia Trojahn dos Santos, L'Apport Mutuel de la Combinaison des Tâches d'Interconnexion de Données et d'Alignement d'Ontologies pour l'Alignement Expressifs, in: Actes 34e journées francophones sur Ingénierie des connaissances (IC), Strasbourg (FR), pp59-68, 2023
Several methods have been proposed to address the tasks of data interlinking and ontology alignment, which are generally handled separately. In this paper, we present DICAP, an algorithm that enables their mutual collaboration. The experiments carried out show that adding owl:sameAs relations resulting from data interlinking makes it possible to discover additional ontology correspondences. Moreover, the presence of ontology correspondences enables the extraction of additional, discriminating linking rules.
Data interlinking, Ontology alignment
Nacira Abbas, Alexandre Bazin, Jérôme David, Amedeo Napoli, A study of the discovery and redundancy of link keys between two RDF datasets based on partition pattern structures, in: Pablo Cordero, Pavol Jozef Šafárik (eds), Proc. 16th International conference on Concept Lattices and their Applications (CLA), Tallinn (EE), pp175-189, 2022
A link key between two RDF datasets D1 and D2 is a set of pairs of properties that makes it possible to identify pairs of individuals x1 and x2 through an identity link such as x1 owl:sameAs x2. In this paper, relying on and extending previous work, we introduce an original formalization of link key discovery based on the framework of Partition Pattern Structures (pps). Our objective is to study and evaluate the redundancy of link keys based on the fact that owl:sameAs is an equivalence relation. In the pps concept lattice, every concept has an extent representing a link key candidate and an intent representing a partition of instances into sets of equivalent instances. Experiments show three main results. First, redundancy of link keys is not so significant in real-world datasets. Nevertheless, the link key discovery approach based on pps returns a reduced number of non-redundant link key candidates when compared to a standard approach. Moreover, the pps-based approach is efficient and returns link keys of high quality.
Formal Concept Analysis, Pattern Structures, Linked data, Link key, Data interlinking, Resource Description Framework
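The redundancy idea in the abstract above can be illustrated with a minimal sketch. Because owl:sameAs is an equivalence relation, a set of links induces a partition of the individuals, and two different link sets may induce the same partition. The link sets below are hypothetical toy data, and the union-find helper is only an illustration of the principle, not the pps-based method of the paper.

```python
def partition(links):
    """Return the partition of individuals induced by a set of identity links.

    Links are closed under symmetry and transitivity with a small union-find.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return {frozenset(c) for c in classes.values()}


# Hypothetical link sets generated by three different link key candidates.
links_k1 = {("a1", "b1"), ("a1", "b2"), ("a2", "b2")}
links_k2 = {("a1", "b1"), ("a2", "b1"), ("a2", "b2")}
links_k3 = {("a1", "b1")}

# k1 and k2 differ as link sets but induce the same partition.
same_partition = partition(links_k1) == partition(links_k2)
```

In this toy case the first two candidates would be considered redundant with respect to each other, while the third induces a different partition.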
Nacira Abbas, Alexandre Bazin, Jérôme David, Amedeo Napoli, Contributions to link key discovery in RDF datasets, in: Pascal Préa (ed), Proc. 27th conference on rencontres de la Société Française de Classification (SFC), Lyon (FR), 2022
A link key between two RDF datasets D1 and D2 is a set of pairs of properties that makes it possible to identify pairs of individuals, say x1 in D1 and x2 in D2, which can be materialized as an x1 owl:sameAs x2 identity link. There exist several ways to mine such link keys, but none takes into account the fact that owl:sameAs is an equivalence relation, which leads to the discovery of non-redundant link keys. Accordingly, in this paper, we present link key discovery based on Pattern Structures (PS). PS output a pattern concept lattice where every concept has an extent representing a set of pairs of individuals and an intent representing the related link key candidate. Then, we discuss the equivalence relation induced by a link key and we introduce the notion of non-redundant link key candidate.
Linked data, Resource Description Framework, Link key, Formal Concept Analysis, Pattern Structures
Khadija Jradeh, Optimised tableau algorithms for reasoning in the description logic ALC extended with link keys, Thèse d'informatique, Université de Grenoble, Grenoble (FR), 2022
Knowledge Graphs (KGs) are increasingly used by different organisations to represent real-world entities in the form of a graph. They may use an ontological layer for describing the classes and properties of the represented entities. RDF knowledge graphs are knowledge graphs that conform to the RDF model. RDF knowledge graph interlinking is the task of identifying different IRIs belonging to different RDF knowledge graphs and referring to the same real-world entity. This facilitates data integration and interoperability by combining different entity descriptions present in different knowledge graphs. There exist different methods for addressing the task of interlinking RDF knowledge graphs. Link keys are among these methods. They are used for interlinking RDF knowledge graphs described using different ontologies. Link keys specify the properties to be compared to decide whether two entities belonging to different classes and present in different knowledge graphs are the same. Link keys can be expressed as logical axioms, and, thus, it is possible to combine them with ontologies and ontology alignments to perform logical reasoning. In this thesis, we aim to study the problem of reasoning with link keys. To formally investigate this problem, we model RDF knowledge graphs, ontologies, and ontology alignments using the description logic ALC. We choose the description logic ALC as a base language for reasoning. ALC covers many modeling capabilities used for knowledge representation and allows for an easier extension to more expressive description logics. We extend ALC with link keys and individual equalities; the resulting description logic is called ALC+LK. We show that link key entailment can be reduced to link key consistency checking without the need to introduce the negation of link keys. Then we design an algorithm for deciding the consistency of an ALC+LK ontology. We have proved that the algorithm is sound, complete, and always terminates.
This algorithm runs in 2EXPTIME. However, there exist EXPTIME algorithms for reasoning in ALC, and the completion rules added for handling link keys and equalities require no more computational power than that of ALC. In the light of the above, we design a sound, complete, worst-case optimal algorithm for reasoning in ALC+LK. This algorithm is inspired by the compressed tableau algorithm, which allows obtaining the EXPTIME optimal complexity result. However, this algorithm has a non-directed behaviour which obstructs its implementation. Last but most importantly, we propose a sound, complete, and worst-case optimal tableau algorithm for reasoning in the description logic ALC with individuals and link keys. This algorithm, in contrast to the non-directed one, is directed by the application of completion rules. This avoids the generation of useless structures and facilitates its implementation. We implement this algorithm and provide a number of proof-of-concept experiments that demonstrate the importance of reasoning with link keys for the data interlinking task.
Reasoning, Semantic web, Description logic, Data interlinking, Knowledge graphs
Nacira Abbas, Alexandre Bazin, Jérôme David, Amedeo Napoli, Non-redundant link keys in RDF data: preliminary steps, in: Proc. 9th IJCAI workshop on What can FCA do for Artificial Intelligence? (FCA4AI), Montréal (CA), pp125-130, 2021
A link key between two RDF datasets D1 and D2 is a set of pairs of properties that makes it possible to identify pairs of individuals, say x1 in D1 and x2 in D2, which can be materialized as an x1 owl:sameAs x2 identity link. There exist several ways to mine such link keys, but none takes into account the fact that owl:sameAs is an equivalence relation, which leads to the discovery of non-redundant link keys. Accordingly, in this paper, we present link key discovery based on Pattern Structures (PS). PS output a pattern concept lattice where every concept has an extent representing a set of pairs of individuals and an intent representing the related link key candidate. Then, we discuss the equivalence relation induced by a link key and we introduce the notion of non-redundant link key candidate.
Linked data, RDF, Link key, Formal concept analysis, Pattern structure
Manuel Atencia, Jérôme David, Jérôme Euzenat, On the relation between keys and link keys for data interlinking, Semantic web journal 12(4):547-567, 2021
Both keys and their generalisation, link keys, may be used to perform data interlinking, i.e. finding identical resources in different RDF datasets. However, the precise relationship between keys and link keys has not been fully determined yet. A common formal framework encompassing both keys and link keys is necessary to ensure the correctness of data interlinking tools based on them, and to determine their scope and possible overlapping. In this paper, we provide a semantics for keys and link keys within description logics. We determine under which conditions they are legitimate to generate links. We provide conditions under which link keys are logically equivalent to keys. In particular, we show that data interlinking with keys and ontology alignments can be reduced to data interlinking with link keys, but not the other way around.
Ontology alignment, Key, Link key, Data interlinking
Manuel Atencia, Jérôme David, Jérôme Euzenat, Amedeo Napoli, Jérémy Vizzini, Relational concept analysis for circular link key extraction, Deliverable 1.2, ELKER, 57p., December 2021
A link key extraction procedure for the case of circular dependencies is presented. It uses relational concept analysis and extends the procedure of Deliverable 1.1. This leads us to investigate more closely the semantics of relational concept analysis, which is given in terms of fixed points. Extracting all fixed points may offer more link key candidates to consider.
Formal Concept Analysis, Relational Concept Analysis, linked data, link key, data interlinking, Resource Description Framework
Nacira Abbas, Jérôme David, Amedeo Napoli, Discovery of link keys in RDF data based on pattern structures: preliminary steps, in: Francisco José Valverde-Albacete, Martin Trnecka (eds), Proc. 15th International conference on Concept Lattices and their Applications (CLA), Tallinn (EE), pp235-246, 2020
In this paper, we are interested in the discovery of link keys between two different RDF datasets based on FCA and pattern structures. A link key identifies individuals which represent the same real-world entity. Two main strategies are used to automatically discover link keys, either ignoring or taking into account the classes to which the individuals belong. Indeed, a link key may be relevant for some pair of classes and not relevant for another. Then, discovering link keys for one pair of classes at a time may be computationally expensive if every pair has to be considered. To overcome such limitations, we introduce a specific and original pattern structure where link keys can be discovered in one pass while specifying the pair of classes associated with each link key, focusing on the discovery process and allowing more flexibility.
Formal Concept Analysis, Pattern Structures, Linked data, Link key, Data interlinking, Resource Description Framework
Manuel Atencia, Jérôme David, Jérôme Euzenat, Liliana Ibanescu, Nathalie Pernelle, Fatiha Saïs, Élodie Thiéblin, Cássia Trojahn dos Santos, Discovering expressive rules for complex ontology matching and data interlinking, in: Pavel Shvaiko, Jérôme Euzenat, Oktie Hassanzadeh, Ernesto Jiménez-Ruiz, Cássia Trojahn dos Santos (eds), Proc. 14th ISWC workshop on ontology matching (OM), Auckland (NZ), pp199-200, 2020
Ontology matching and data interlinking, as distinct tasks, aim at facilitating the interoperability between different knowledge bases. Although the field has fully developed in the last years, most works still focus on generating simple correspondences between entities. These correspondences are however insufficient to fully cover the different types of heterogeneity between knowledge bases, and complex correspondences are therefore required. Compared to simple matching, few approaches for complex matching have been proposed, focusing on correspondence patterns or exploiting common instances between the ontologies. Similarly, unsupervised data interlinking approaches (which do not require labelled data samples) have recently been developed. One approach consists in discovering linking rules such as simple keys or conditional keys on unlabelled data. The results have shown that the more expressive the rules, the higher the recall. Even more expressive rules (referential expressions, graph keys, etc.) are required; however, naive approaches to the discovery of these rules cannot be envisaged on large data sets. Existing approaches presuppose either that the data conform to the same ontology or that all possible pairs of properties be examined. Complementarily, link keys are sets of pairs of properties that identify the instances of two classes of two RDF datasets. Such link keys may be directly extracted without the need for an alignment. We introduce here an approach that aims at evaluating the impact of complex correspondences in the task of data interlinking established from the application of keys.
Data interlinking, Ontology matching, Complex correspondence
Manuel Atencia, Jérôme David, Jérôme Euzenat, Amedeo Napoli, Jérémy Vizzini, Link key candidate extraction with relational concept analysis, Discrete applied mathematics 273:2-20, 2020
Linked data aims at publishing data expressed in RDF (Resource Description Framework) at the scale of the worldwide web. These datasets interoperate by publishing links which identify individuals across heterogeneous datasets. Such links may be found by using a generalisation of keys in databases, called link keys, which apply across datasets. They specify the pairs of properties to compare for linking individuals belonging to different classes of the datasets. Here, we show how to recast the proposed link key extraction techniques for RDF datasets in the framework of formal concept analysis. We define a formal context, where objects are pairs of resources and attributes are pairs of properties, and show that formal concepts correspond to link key candidates. We extend this characterisation to the full RDF model including non functional properties and interdependent link keys. We show how to use relational concept analysis for dealing with cyclic dependencies across classes and hence link keys. Finally, we discuss an implementation of this framework.
Formal Concept Analysis, Relational Concept Analysis, Linked data, Link key, Data interlinking, Resource Description Framework
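The formal context described in the abstract above, where objects are pairs of resources and attributes are pairs of properties, can be sketched as follows. This is a minimal illustration under stated assumptions: the two datasets are hypothetical toy data, a pair of resources is given the attribute (p, q) when the two resources share at least one value for p and q, and the non-empty closed intents are obtained by closing the object intents under intersection. It ignores classes, universally quantified conditions, and relational dependencies treated in the paper.

```python
from itertools import product

# Hypothetical toy datasets: resource -> property -> set of values.
D1 = {
    "a1": {"name": {"Ada"}, "city": {"London"}},
    "a2": {"name": {"Alan"}, "city": {"London"}},
}
D2 = {
    "b1": {"label": {"Ada"}, "town": {"London"}},
    "b2": {"label": {"Alan"}, "town": {"Manchester"}},
}


def build_context(d1, d2):
    """Map each pair of resources to the property pairs on which they share a value."""
    props1 = sorted({p for desc in d1.values() for p in desc})
    props2 = sorted({q for desc in d2.values() for q in desc})
    context = {}
    for a, b in product(d1, d2):
        attrs = frozenset(
            (p, q)
            for p, q in product(props1, props2)
            if d1[a].get(p, set()) & d2[b].get(q, set())
        )
        if attrs:  # pairs sharing nothing cannot be linked by any candidate
            context[(a, b)] = attrs
    return context


def candidates(context):
    """Return {intent: extent} for every non-empty closed intent of the context."""
    intents = set(context.values())
    changed = True
    while changed:  # close the set of intents under pairwise intersection
        changed = False
        for x, y in product(list(intents), repeat=2):
            z = x & y
            if z and z not in intents:
                intents.add(z)
                changed = True
    return {
        intent: {pair for pair, attrs in context.items() if intent <= attrs}
        for intent in intents
    }


lattice = candidates(build_context(D1, D2))
```

Each key of `lattice` plays the role of a link key candidate (a set of property pairs), and its value is the set of links that candidate would generate on the toy data.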
Jérôme Euzenat, A map without a legend: the semantic web and knowledge evolution, Semantic web journal 11(1):63-68, 2020
The current state of the semantic web is focused on data. This is a worthwhile progress in web content processing and interoperability. However, this does only marginally contribute to knowledge improvement and evolution. Understanding the world, and interpreting data, requires knowledge. Not knowledge cast in stone for ever, but knowledge that can seamlessly evolve; not knowledge from one single authority, but diverse knowledge sources which stimulate confrontation and robustness; not consistent knowledge at web scale, but local theories that can be combined. We discuss two different ways in which semantic web technologies can greatly contribute to the advancement of knowledge: semantic eScience and cultural knowledge evolution.
Semantic web, Linked data, Big data, Open data, Knowledge representation, Knowledge, Ontology, Machine learning, Reproducible research, eScience, Cultural evolution
Jérôme Euzenat, Marie-Christine Rousset, Semantic web, in: Pierre Marquis, Odile Papini, Henri Prade (eds), A guided tour of artificial intelligence research, Springer, Berlin (DE), 575p., 2020, pp181-207
The semantic web aims at making web content interpretable. It is no less than offering knowledge representation at web scale. The main ingredients used in this context are the representation of assertional knowledge through graphs, the definition of the vocabularies used in graphs through ontologies, and the connection of these representations through the web. Artificial intelligence techniques and, more specifically, knowledge representation techniques, are put to use and to the test by the semantic web. Indeed, they have to face typical problems of the web: scale, heterogeneity, incompleteness, and dynamics. This chapter provides a short presentation of the state of the semantic web and refers to other chapters concerning those techniques at work in the semantic web.
RDF, OWL, RDF Model, Querying RDF, SPARQL, SPARQL Extensions
Nacira Abbas, Jérôme David, Amedeo Napoli, Linkex: A tool for link key discovery based on pattern structures, in: Proc. ICFCA workshop on Applications and tools of formal concept analysis, Frankfurt (DE), pp33-38, 2019
Links constitute the core of the Linked Data philosophy. With the high growth of data published on the web, many frameworks have been proposed to deal with the link discovery problem, and particularly with identity links. Finding such links between different RDF data sets is a critical task. In this position paper, we focus on link keys, which consist of sets of pairs of properties identifying the same entities across heterogeneous datasets. We also propose to formalize the problem of link key discovery using Pattern Structures (PS), the generalization of Formal Concept Analysis dealing with non-binary datasets. After providing the proper definitions of link keys and setting the problem in terms of PS, we show that the intents of the pattern concepts correspond to link keys and their extents to sets of identity links generated by their intents. Finally, we discuss an implementation of this framework and we show the applicability and the scalability of the proposed method.
RDF, Linked data, Pattern structure, Link key
Manuel Atencia, Jérôme David, Jérôme Euzenat, Amedeo Napoli, Jérémy Vizzini, A guided walk into link key candidate extraction with relational concept analysis, in: Claudia d'Amato, Lalana Kagal (eds), Proc. on journal track of the International semantic web conference, Auckland (NZ), 2019
Data interlinking is an important task for linked data interoperability. One of the possible techniques for finding links is the use of link keys which generalise relational keys to pairs of RDF models. We show how link key candidates may be directly extracted from RDF data sets by encoding the extraction problem in relational concept analysis. This method deals with non functional properties and circular dependent link key expressions. As such, it generalises those presented for non dependent link keys and link keys over the relational model. The proposed method is able to return link key candidates involving several classes at once.
Formal Concept Analysis, Relational Concept Analysis, Linked data, Link key, Data interlinking, Resource Description Framework
Manuel Atencia, Jérôme David, Jérôme Euzenat, Several link keys are better than one, or extracting disjunctions of link key candidates, in: Proc. 10th ACM international conference on knowledge capture (K-Cap), Marina del Rey (CA US), pp61-68, 2019
Link keys express conditions under which instances of two classes of different RDF data sets may be considered as equal. As such, they can be used for data interlinking. There exist algorithms to extract link key candidates from RDF data sets and different measures have been defined to evaluate the quality of link key candidates individually. For certain data sets, however, it may be necessary to use more than one link key on a pair of classes to retrieve a more complete set of links. To this end, in this paper, we define disjunction of link keys, propose strategies to extract disjunctions of link key candidates from RDF data, and apply existing quality measures to evaluate them. We also report on experiments with these strategies.
Linked data, RDF, Data interlinking, Link key, Antichain
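A minimal sketch of why several link keys may retrieve more links than one: a disjunction of link keys generates the union of the links generated by each of them. The datasets, property names, and the "share at least one value on every property pair" condition are assumptions made for illustration; the extraction strategies and quality measures of the paper are not reproduced here.

```python
# Hypothetical toy datasets: resource -> property -> set of values.
BOOKS_1 = {
    "a1": {"isbn": {"111"}, "title": {"Dune"}},
    "a2": {"title": {"Emma"}, "author": {"Austen"}},
}
BOOKS_2 = {
    "b1": {"id": {"111"}},
    "b2": {"name": {"Emma"}, "creator": {"Austen"}},
}


def links(link_key, d1, d2):
    """Pairs (a, b) sharing at least one value on every property pair of the link key."""
    return {
        (a, b)
        for a, desc_a in d1.items()
        for b, desc_b in d2.items()
        if all(desc_a.get(p, set()) & desc_b.get(q, set()) for p, q in link_key)
    }


def disjunction_links(link_keys, d1, d2):
    """Links generated by a disjunction: the union of the links of each link key."""
    result = set()
    for link_key in link_keys:
        result |= links(link_key, d1, d2)
    return result


k1 = {("isbn", "id")}
k2 = {("title", "name"), ("author", "creator")}
all_links = disjunction_links([k1, k2], BOOKS_1, BOOKS_2)
```

On this toy data each link key alone finds one link, while the disjunction finds both.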
Manuel Atencia, Jérôme Euzenat, Khadija Jradeh, Chan Le Duc, Tableau methods for reasoning with link keys, Deliverable 2.1, ELKER, 32p., 2019
Data interlinking is a critical task for widening and enhancing linked open data. One way to tackle data interlinking is to use link keys, which generalise keys to the case of two RDF datasets described using different ontologies. Link keys specify pairs of properties to compare for finding same-as links between instances of two classes of two different datasets. Hence, they can be used for finding links. Link keys can also be considered as logical axioms just like keys, ontologies and ontology alignments. We introduce the logic ALC+LK extending the description logic ALC with link keys. It may be used to reason and infer entailed link keys that may be more useful for a particular data interlinking task. We show that link key entailment can be reduced to consistency checking without introducing the negation of link keys. For deciding the consistency of an ALC+LK ontology, we introduce a new tableau-based algorithm. Contrary to the classical ones, the completion rules concerning link keys apply to pairs of individuals not directly related. We show that this algorithm is sound, complete and always terminates.
link keys, reasoning, tableau method
Manuel Atencia, Jérôme David, Jérôme Euzenat, Amedeo Napoli, Jérémy Vizzini, Candidate link key extraction with formal concept analysis, Deliverable 1.1, ELKER, 29p., October 2019
A link key extraction procedure using formal concept analysis is described. It is shown to extract all link key candidates.
Formal Concept Analysis, linked data, link key, data interlinking, Resource Description Framework
Jérôme David, Jérôme Euzenat, Jérémy Vizzini, Linkky: Extraction de clés de liage par une adaptation de l'analyse relationnelle de concepts, in: Actes 29e journées francophones sur Ingénierie des connaissances (IC), Nancy (FR), pp271-274, 2018
RDF, Link key, Data interlinking, Relational concept analysis, Formal concept analysis, Network of ontologies
Pieter Pauwels, María Poveda Villalón, Alvaro Sicilia, Jérôme Euzenat, Semantic technologies and interoperability in the built environment, Semantic web journal 9(6):731-734, 2018
The built environment consists of plenty of physical assets with which we interact on a daily basis. In order to improve not only our built environment, but also our interaction with that environment, we would benefit a lot from semantic representations of this environment. This not only includes buildings, but also large infrastructure (bridges, tunnels, waterways, underground systems), and geospatial data. With this special issue, an insight is given into the current state of the art in terms of semantic technologies and interoperability in this built environment. This editorial not only summarizes the content of the Special Issue on Semantic Technologies and interoperability in the Built Environment, it also provides a brief overview of the current state of the art in general in terms of standardisation and community efforts.
Alvaro Sicilia, Pieter Pauwels, Leandro Madrazo, María Poveda Villalón, Jérôme Euzenat (eds), Special Issue on Semantic Technologies and Interoperability in the Built Environment, Semantic web journal (special issue) 9(6):729-855, 2018
Marie-Christine Rousset, Manuel Atencia, Jérôme David, Fabrice Jouanot, Olivier Palombi, Federico Ulliana, Datalog revisited for reasoning in linked data, in: Giovambattista Ianni, Domenico Lembo, Leopoldo Bertossi, Wolfgang Faber, Birte Glimm, Georg Gottlob, Steffen Staab (eds), Proc. 13th International summer school on reasoning web (RW), Lecture notes in computer science 10370, 2017, pp121-166
Linked Data provides access to huge, continuously growing amounts of open data and ontologies in RDF format that describe entities, links and properties on those entities. Equipping Linked Data with inference paves the way to make the Semantic Web a reality. In this survey, we describe a unifying framework for RDF ontologies and databases that we call deductive RDF triplestores. It consists in equipping RDF triplestores with Datalog inference rules. This rule language makes it possible to capture in a uniform manner OWL constraints that are useful in practice, such as property transitivity or symmetry, but also domain-specific rules with practical relevance for users in many domains of interest. The expressivity and the genericity of this framework are illustrated for modeling Linked Data applications and for developing inference algorithms. In particular, we show how it makes it possible to model the problem of data linkage in Linked Data as a reasoning problem on possibly decentralized data. We also explain how it makes it possible to efficiently extract expressive modules from Semantic Web ontologies and databases with formal guarantees, whilst effectively controlling their succinctness. Experiments conducted on real-world datasets have demonstrated the feasibility of this approach and its usefulness in practice for data integration and information extraction.
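As a minimal sketch of the kind of inference mentioned above, the following applies symmetry and transitivity rules to a set of RDF-like triples by naive forward chaining until a fixpoint is reached. The function name, the triples, and the `ex:` prefix are hypothetical; this is plain Python written for illustration, not the deductive triplestore framework of the survey.

```python
def saturate(triples, symmetric=(), transitive=()):
    """Naive forward chaining for two Datalog-style rules.

    symmetric:  (s, p, o)              => (o, p, s)
    transitive: (s, p, o), (o, p, o2)  => (s, p, o2)
    """
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:  # fixpoint: nothing new can be derived
            return facts
        facts |= new


# Hypothetical input triples.
data = {
    ("a", "owl:sameAs", "b"),
    ("b", "owl:sameAs", "c"),
    ("a", "ex:knows", "d"),
}
closed = saturate(data, symmetric={"owl:sameAs"}, transitive={"owl:sameAs"})
```

Properties not declared symmetric or transitive, such as the hypothetical `ex:knows`, are left untouched by the saturation.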
Jérémy Vizzini, Data interlinking with relational concept analysis, Master's thesis, Université Grenoble Alpes, Grenoble (FR), 2017
Vast amounts of RDF data are made available on the web by various institutions providing overlapping information. To be fully exploited, different representations of the same object across various data sets have to be identified. This is what is called data interlinking. One novel way to generate such links is to use link keys. Link keys generalise database keys by applying them across two data sets. The structure of RDF makes this problem much more complex than for relational databases for several reasons. An instance can have multiple values for a given attribute. Moreover, values of properties are not necessarily datatypes but instances of the graph. A first method has been designed to extract and select link keys from two classes of objects which deals with multiple values but not object values. Moreover, the extraction step has been rephrased in formal concept analysis (FCA), making it possible to generate link keys across relational tables. Our aim is to extend this work so that it can deal with multiple values. Then, we show how to use it to deal with object values when the data set is cycle free. This encoding does not necessarily generate the optimal link keys. Hence, we use relational concept analysis (RCA), an extension of FCA taking relations between concepts into account. We show that a new expression of this problem is able to extract the optimal link keys even in the presence of circularities. Moreover, the elaborated process does not require information about the alignments of the ontologies to find out for which pairs of classes to extract link keys. We implemented these methods and evaluated them by reproducing the experiments made in previous studies. This shows that the method extracts the expected results but also exhibits (equally expected) scalability issues.