Mitigating Hallucination in Large Models: A Modular Framework for Detection and Counterfactual Correction
Date
2025-12
Authors
Publisher
UENR
Abstract
Large Language Models (LLMs) demonstrate impressive fluency yet remain unreliable in
safety-critical environments due to persistent hallucination: confidently generating factually
incorrect or semantically unsupported answers. This research proposes a modular mitigation
framework integrating Hallucination Potential Minimization (HPM) with Self-Generated
Counterfactual Training (SGCT) to improve factual consistency in generative outputs. A
lightweight DistilBERT-based HPM classifier was trained as a binary factuality judge using
benchmark datasets including FEVER and TruthfulQA, prioritising recall to ensure conservative
hallucination detection. Building on this foundation, SGCT fine-tuned a GPT-2 generative model
rather than more recent architectures due to its computational accessibility, reproducibility, and
suitability for controlled experimentation under resource constraints. SGCT incorporates
likelihood loss for factual responses, unlikelihood loss to penalize hallucinations, and a contrastive
objective to separate the representations of factual and hallucinated answers in embedding space.
Experimental results demonstrated measurable improvements following SGCT, with accuracy
increasing from 0.556 to 0.614, recall from 0.705 to 0.890, precision from 0.532 to 0.548, and
F1-score from 0.607 to 0.692. Threshold calibration further revealed flexible trade-offs between
factuality and output strictness, enabling uncertain responses to be routed into a safe “abstain”
category. The findings indicate that classifier-guided generation provides a practical strategy for
enhancing reliability in LLM-based systems while maintaining computational efficiency. The
proposed SGCT-HPM pipeline represents a reproducible and adaptable approach for hallucination
mitigation, with potential applications in domains requiring verifiable AI-generated content.
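The three-term SGCT objective described in the abstract (likelihood loss on factual responses, unlikelihood loss on hallucinated ones, and a contrastive term separating their embeddings) can be sketched as below. This is a minimal illustrative sketch: the function name `sgct_loss`, the NumPy formulation, and the weighting coefficients `alpha`, `beta`, `gamma` are assumptions for exposition, not the thesis's actual GPT-2 training code.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sgct_loss(factual_logits, factual_ids, halluc_logits, halluc_ids,
              factual_emb, halluc_emb, alpha=1.0, beta=0.5, gamma=0.5):
    # Likelihood loss: negative log-probability of the factual target tokens.
    p_f = softmax(factual_logits)
    ll = -np.log(p_f[np.arange(len(factual_ids)), factual_ids] + 1e-9).mean()

    # Unlikelihood loss: penalize probability mass placed on hallucinated tokens,
    # i.e. -log(1 - p(hallucinated token)).
    p_h = softmax(halluc_logits)
    pt = p_h[np.arange(len(halluc_ids)), halluc_ids]
    ul = -np.log(np.clip(1.0 - pt, 1e-9, None)).mean()

    # Contrastive term: penalize cosine similarity between factual and
    # hallucinated answer embeddings so they separate in embedding space.
    cos = (factual_emb * halluc_emb).sum(-1) / (
        np.linalg.norm(factual_emb, axis=-1)
        * np.linalg.norm(halluc_emb, axis=-1) + 1e-9)
    contrast = np.maximum(cos, 0.0).mean()

    # Weighted combination of the three objectives (weights are illustrative).
    return alpha * ll + beta * ul + gamma * contrast
```

Each term pulls in the direction the abstract describes: the likelihood term rewards factual continuations, the unlikelihood term pushes probability away from hallucinated tokens, and the contrastive term drives the two answer types apart in embedding space.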
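The threshold-calibrated routing of uncertain responses into a safe "abstain" category can be illustrated with a short sketch. The function name and the specific threshold values are hypothetical; in practice the thresholds would be calibrated against the HPM classifier's recall/precision trade-off reported above.

```python
def route_response(p_factual, accept_at=0.7, abstain_at=0.4):
    """Route a generated answer based on the HPM factuality score.

    p_factual: classifier's estimated probability that the answer is factual.
    Thresholds here are illustrative placeholders, not calibrated values.
    """
    if p_factual >= accept_at:
        return "accept"      # confidently factual: emit the answer
    if p_factual >= abstain_at:
        return "abstain"     # uncertain: route to the safe abstain category
    return "reject"          # likely hallucinated: suppress or regenerate
```

Raising `accept_at` makes the pipeline stricter (more abstentions, fewer hallucinations passed through), which is the factuality-versus-strictness trade-off the calibration experiments explore.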
Description
Keywords
Hallucinations, Datasets, Minimization