Jérémy Vizzini, Data interlinking with relational concept analysis, Master's thesis, Université Grenoble Alpes, Grenoble (FR), 2017
Vast amounts of RDF data are made available on the web by various institutions providing overlapping information. To be fully exploited, different representations of the same object across various data sets have to be identified. This is what is called data interlinking. One novel way to generate such links is to use link keys. Link keys generalise database keys by applying them across two data sets. The structure of RDF makes this problem much more complex than for relational databases for several reasons. An instance can have multiple values for a given attribute. Moreover, values of properties are not necessarily datatypes but instances of the graph. A first method has been designed to extract and select link keys from two classes of objects which deals with multiple values but not object values. Moreover, the extraction step has been rephrased in formal concept analysis (FCA) allowing to generate link keys across relational tables. Our aim is to extend this work so that it can deal with multiple values. Then, we show how to use it to deal with object values when the data set is cycle free. This encoding does not necessarily generate the optimal link keys. Hence, we use relational concept analysis (RCA), an extension of FCA taking relations between concepts into account. We show that a new expression of this problem is able to extract the optimal link keys even in the presence of circularities. Moreover, the elaborated process does not require information about the alignments of the ontologies to find out for which pairs of classes to extract link keys. We implemented these methods and evaluated them by reproducing the experiments made in previous studies. This shows that the method extracts the expected results as well as (also expected) scalability issues.