Knowledge management/Gestion de connaissance in areas (2024-02-10)
Julien Breton, Mokhtar Boumedyen Billami, Max Chevalier, Cássia Trojahn dos Santos, Extraction terminologique juridique à faible supervision: une méthode hybride combinant LLM, règles syntaxiques et CamemBERT, in: Actes 36e Journées francophones sur Ingénierie des Connaissances (IC), Dijon (FR), pp35–43, 2025
The legal sector is characterized by a large number of documents and by their complexity. Companies are required to comply with these legal provisions. Because these documents change constantly, there is growing interest in automating the processing of legal texts to facilitate regulatory compliance. A key step in this process is the extraction of legal terms. State-of-the-art methods, such as rule-based systems, Bi-LSTM networks and BERT, require a large amount of annotated data to reach satisfactory performance, and producing these annotations is particularly time-consuming for domain experts. With the rise of large language models (LLMs), research is increasingly turning to exploiting their capabilities, notably through weakly supervised approaches. In this article, we present a hybrid system that distills the knowledge of GPT-4 into a CamemBERT model while applying syntactic filtering. This approach not only reduces the need for expert intervention compared with the classic CamemBERT system, but also outperforms the system relying solely on GPT-4, improving the F1 score by 7 to 24 percentage points.
Legal term extraction, Weak supervision, CamemBERT, Large language models (LLMs), Knowledge distillation
Julien Breton, Mokhtar Boumedyen Billami, Max Chevalier, Ken Satoh, Cássia Trojahn dos Santos, May Myo Zin, Leveraging LLMs for legal terms extraction with limited annotated data, Artificial intelligence and law, 2025
The legal industry is characterized by the presence of dense and complex documents, which necessitate automatic processing methods to manage and analyse large volumes of data. Traditional methods for extracting legal information depend heavily on substantial quantities of annotated data during the training phase. However, the question arises of how to extract information effectively in contexts that do not favour the utilization of annotated data. This study investigates the application of Large Language Models (LLMs) as a transformative solution for the extraction of legal terms, presenting a novel approach to overcome the constraints associated with the need for extensive annotated datasets. Our research examined methods such as prompt engineering and fine-tuning to enhance the performance of these models. We evaluated the performance of four LLMs (GPT-4, Miqu-1-70b, Mixtral-8x7b, and Mistral-7b) against a rule-based system and a BERT system, in a setting with limited annotated data. We implemented and assessed our methodologies using Luxembourg’s traffic regulations as a case study. Our findings underscore the capacity of LLMs to deal successfully with legal terms extraction, emphasizing the benefits of one-shot and zero-shot learning capabilities in reducing reliance on annotated data, with an F1 score reaching 0.690. Moreover, our study sheds light on the optimal practices for employing LLMs in the processing of legal information, offering insights into the challenges and limitations, including issues related to the extraction of term boundaries.
Fine-tuning, One-shot learning, Large language models (LLMs), Limited annotated data, Legal terms extraction
Antoine Dupuy, Nathalie Aussenac-Gilles, Christophe Baehr, Cássia Trojahn dos Santos, Interpreting user needs with LLMs-based conversational agents and knowledge graphs: an earth observation use case, in: Proc. 24th ISWC Poster and demo track, Nara (JP), pp265–270, 2025
Open Science has broadened access to scientific datasets. However, identifying those relevant to specific user needs remains challenging due to their volume, diversity and poor metadata. This paper proposes to integrate semantically enriched metadata with LLM agents to interpret users' natural language queries, to extract user intent and to generate justifications for retrieved results. Experiments with different LLMs highlight the potential of such an approach for scientific dataset retrieval.
Antoine Dupuy, Nathalie Aussenac-Gilles, Christophe Baehr, Cássia Trojahn dos Santos, Combining LLMs-based conversational agents and ontologies for open data research, in: Proc. 19th Research conference on Metadata and Semantic Research (MSR), Thessaloniki (GR), 2025
Open Science has significantly increased the availability of heterogeneous scientific datasets. However, these datasets are often described with poor metadata, and users’ needs may be expressed using a different vocabulary, which makes it difficult to identify data relevant to a specific user’s needs. This paper proposes an approach that combines semantically enriched metadata with LLM-based agents that interpret natural language queries to bridge the gap between users’ needs and dataset descriptions, and to support the retrieval of relevant datasets. It enables the extraction and refinement of user needs, as well as the generation of justifications for the retrieved results. To assess the performance of the proposed system, an evaluation was conducted across multiple Earth Observation (EO) data request scenarios. Four LLM agents were evaluated (LLaMA 3.3 70B, Mistral Saba 24B, Deepseek-R1, and Qwen 32B) using metrics such as answer relevancy, contextual precision, recall, and faithfulness. The results, obtained with the Deepeval library and the LLaMA 3 8B model, show relatively high scores for answer relevancy and contextual precision, especially with the LLaMA and Deepseek-R1 models.
Semantic Metadata, LLM, LLM-based agent, Dataset retrieval, Query interpretation, Knowledge graph
Soline Felice, Frank Arnould, Cássia Trojahn dos Santos, Towards a semantic representation of memory entities, in: Proc. 9th JOWO workshop on cognition and ontologies (CAOS), Catania (IT), 2025
Different disciplines have been studying human memory and related issues for thousands of years. However, the definitions of the concepts relating to memory vary depending on the discipline or theory. To reconcile these variations and ambiguities, one solution is to formally define the concepts studied through ontologies. This paper presents Mem’Onto, a Memory Ontology which gathers concepts related to memory, based on Tulving’s SPI model. This theory corresponds to a model of memory organisation and brings together various central elements of memory according to Tulving, whether in Memory Systems (e.g. Episodic Memory, Semantic Memory, Procedural Memory), in Mnesic Processes (e.g. Encoding, Storage, Retrieval) or in the level of consciousness of these subsystems during Retrieval (Implicit and Explicit). Mem’Onto is adapted from an existing ontology, CoTOn, a Cognitive Theory Ontology designed from a working memory use case.
memory, domain ontology, FAIRness, CoTOn, UFO
Maxime Lefrançois, Fatiha Saïs, Cássia Trojahn dos Santos, Introduction, Revue ouverte d'intelligence artificielle 6(1–2):1–4, 2025
Knowledge engineering is a field of Artificial Intelligence that contributes to the development of models, methods and tools for acquiring, representing and integrating knowledge. Its aim is to produce "intelligent" methods and tools able to assist humans in their activities and decision-making. The Ingénierie des Connaissances conference is a venue for exchange and reflection, and for presenting and comparing theories, practices, methods and tools in knowledge engineering. This community now takes into account the rise of machine learning algorithms and their impact on individual and collective practices, while keeping humans at the centre of decision systems that exploit data and knowledge. This special issue brings together extended versions of a selection of the best papers from the 2021, 2022 and 2023 editions.
Knowledge engineering