Structured Knowledge Extraction from Text using Large Language Models

Algoabra, Mohamad (2024) Structured Knowledge Extraction from Text using Large Language Models. Master's thesis, Universität Rostock.

Masterarbeit_mohamad_algoabra.pdf (10MB)

Abstract

This thesis presents an approach to structured knowledge extraction using Large Language Models (LLMs), addressing the challenge of transforming unstructured text into ontology-guided knowledge representations. We introduce a dual-task framework that first generates domain-specific ontologies and then extracts knowledge in the form of custom hypergraphs, ensuring both structural consistency and semantic accuracy. Using Parameter-Efficient Fine-Tuning (PEFT) techniques, in particular Low-Rank Adaptation (LoRA), we show that LLMs can be effectively adapted to complex knowledge extraction tasks while modifying less than 1% of the model's parameters. Our methodology integrates three components: a synthetic data generation pipeline for creating training instances, a validation framework that ensures ontological consistency, and a custom hypergraph representation capable of capturing entities, binary relations, complex multi-entity relations, and their attributes. We conducted two sets of experiments, full-block adaptation and selective attention-layer adaptation, each with LoRA ranks of 4, 16, and 32, to investigate how the type of targeted layers and the number of adapted parameters affect performance. The results show that full-block adaptation performs best on both the structural-consistency and the knowledge-similarity metrics, with the rank-16 configuration offering the best balance between efficiency and effectiveness. Attention-only adaptation promises greater computational efficiency, since it requires only one-third as many adapted parameters as full-block adaptation, but it exhibits more volatile training and lower scores on these metrics.
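To make the parameter figures in the abstract more concrete, the sketch below shows the LoRA update on a single linear layer and counts the trainable parameters that full-block and attention-only adaptation add to one decoder block. It is a pure-Python illustration under stated assumptions, not the thesis's implementation. The names `LoRALinear`, `lora_params`, and `block_lora_params` are hypothetical. The layer layout assumes a Llama-style block with separate q/k/v/o attention projections and gate/up/down MLP projections. The dimensions in the usage lines (hidden size 4096, key/value width 1024, MLP width 14336) are also assumed, because the abstract does not name the base model.

```python
import random


def matvec(m, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Hypothetical LoRA-wrapped linear layer: y = W x + (alpha / r) * B (A x).

    W is frozen; only the low-rank factors A (r x d_in) and B (d_out x r) are trained.
    B starts at zero, so before training the layer reproduces W x exactly.
    """

    def __init__(self, weight, r, alpha, seed=0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weights, d_out x d_in
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def __call__(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


def block_lora_params(hidden, kv_dim, ffn, r, attention_only=False):
    """LoRA parameters for one decoder block (assumed Llama-style layout)."""
    attn = (
        2 * lora_params(hidden, hidden, r)    # q_proj and o_proj
        + 2 * lora_params(hidden, kv_dim, r)  # k_proj and v_proj
    )
    if attention_only:
        return attn
    # gate_proj and up_proj (hidden -> ffn), down_proj (ffn -> hidden)
    mlp = 2 * lora_params(hidden, ffn, r) + lora_params(ffn, hidden, r)
    return attn + mlp


if __name__ == "__main__":
    for r in (4, 16, 32):
        full = block_lora_params(4096, 1024, 14336, r)
        attn = block_lora_params(4096, 1024, 14336, r, attention_only=True)
        print(f"r={r}: full block {full}, attention only {attn}, share {attn / full:.3f}")
```

With these assumed dimensions, the attention-only share is 0.325 at every rank, because both counts scale linearly with r. The exact fraction for the thesis's model depends on its actual layer shapes. Zero-initialising B is the usual LoRA choice, so that fine-tuning starts from the unmodified pretrained model.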
This research contributes a framework for adapting LLMs to structured knowledge extraction tasks, insights into the balance between model efficiency and extraction accuracy, and a foundation for future work on automated knowledge management systems.
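The custom hypergraph representation and the ontology-consistency check described in the abstract can be pictured with a small data model. The sketch below is an illustration under assumptions, not the thesis's actual schema. The names `Entity`, `Relation`, `Ontology`, `Hypergraph`, and `validate` are hypothetical. The model treats every relation as a hyperedge whose named roles point at entities, so a binary relation has two roles and a multi-entity relation has more. Validation here only checks that entity types, relation types, role sets, and role fillers agree with the ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    type: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    """A hyperedge: named roles map to entity ids (2 roles = binary, more = n-ary)."""
    type: str
    roles: dict[str, str]
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Ontology:
    entity_types: set[str]
    # relation type -> {role name: required entity type}
    relation_roles: dict[str, dict[str, str]]


@dataclass
class Hypergraph:
    entities: dict[str, Entity] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.id] = entity

    def add_relation(self, relation: Relation) -> None:
        self.relations.append(relation)


def validate(graph: Hypergraph, ontology: Ontology) -> list[str]:
    """Return a list of ontology violations; an empty list means the graph is consistent."""
    errors = []
    for ent in graph.entities.values():
        if ent.type not in ontology.entity_types:
            errors.append(f"{ent.id}: unknown entity type {ent.type!r}")
    for rel in graph.relations:
        schema = ontology.relation_roles.get(rel.type)
        if schema is None:
            errors.append(f"unknown relation type {rel.type!r}")
            continue
        if set(rel.roles) != set(schema):
            errors.append(f"{rel.type}: roles {sorted(rel.roles)} != {sorted(schema)}")
            continue
        for role, ent_id in rel.roles.items():
            ent = graph.entities.get(ent_id)
            if ent is None:
                errors.append(f"{rel.type}.{role}: missing entity {ent_id!r}")
            elif ent.type != schema[role]:
                errors.append(f"{rel.type}.{role}: expected {schema[role]}, got {ent.type}")
    return errors


if __name__ == "__main__":
    onto = Ontology(
        entity_types={"Person", "Organization", "Thesis"},
        relation_roles={
            "worksFor": {"employee": "Person", "employer": "Organization"},
            "supervises": {"supervisor": "Person", "student": "Person", "thesis": "Thesis"},
        },
    )
    g = Hypergraph()
    g.add_entity(Entity("p1", "Person", {"name": "Alice"}))
    g.add_entity(Entity("p2", "Person", {"name": "Bob"}))
    g.add_entity(Entity("o1", "Organization", {"name": "Example University"}))
    g.add_entity(Entity("t1", "Thesis", {"title": "Example thesis"}))
    g.add_relation(Relation("worksFor", {"employee": "p1", "employer": "o1"}))
    # A three-entity relation carrying its own attribute.
    g.add_relation(Relation("supervises", {"supervisor": "p1", "student": "p2", "thesis": "t1"}, {"year": "2024"}))
    print(validate(g, onto))  # prints []
```

Modelling binary and multi-entity relations uniformly as role-labelled hyperedges keeps one code path for both. Attributes can then sit on entities and relations alike.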

Item Type: Thesis (Masters)
Subjects: Author type > Student theses > Master's thesis
Research topics > Hypergraph databases
Research topics > Information Retrieval
Author type > Student theses
Depositing User: Dbis Admin
Date Deposited: 26 Nov 2024 15:12
Last Modified: 26 Nov 2024 15:12
URI: https://eprints.dbis.informatik.uni-rostock.de/id/eprint/1121
