Mitigating Hallucination in Large Models: A Modular Framework for Detection and Counterfactual Correction
Date
2025-12
Authors
Publisher
UENR
Abstract
Large Language Models (LLMs) demonstrate impressive fluency yet remain unreliable in
safety-critical environments due to persistent hallucination: confidently generating factually
incorrect or semantically unsupported answers. This research proposes a modular mitigation
framework integrating Hallucination Potential Minimization (HPM) with Self-Generated
Counterfactual Training (SGCT) to improve factual consistency in generative outputs. A
lightweight DistilBERT-based HPM classifier was trained as a binary factuality judge using
benchmark datasets including FEVER and TruthfulQA, prioritising recall to ensure conservative
hallucination detection. Building on this foundation, SGCT fine-tuned a GPT-2 generative model
rather than more recent architectures due to its computational accessibility, reproducibility, and
suitability for controlled experimentation under resource constraints. SGCT incorporates
likelihood loss for factual responses, unlikelihood loss to penalize hallucinations, and a contrastive
objective to separate the representations of factual and hallucinated answers in embedding space.
Experimental results demonstrated measurable improvements following SGCT, with accuracy
increasing from 0.556 to 0.614, recall from 0.705 to 0.890, precision from 0.532 to 0.548, and
F1-score from 0.607 to 0.692. Threshold calibration further revealed flexible trade-offs between
factuality and output strictness, enabling uncertain responses to be routed into a safe “abstain”
category. The findings indicate that classifier-guided generation provides a practical strategy for
enhancing reliability in LLM-based systems while maintaining computational efficiency. The
proposed SGCT-HPM pipeline represents a reproducible and adaptable approach for hallucination
mitigation, with potential applications in domains requiring verifiable AI-generated content.
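The three-term SGCT objective described in the abstract (likelihood loss on factual responses, unlikelihood loss on hallucinated ones, and a contrastive term separating their embeddings) can be sketched as below. This is a minimal illustrative sketch: the function name `sgct_loss`, the NumPy formulation, and the weighting coefficients `alpha`, `beta`, `gamma` are assumptions for exposition, not the thesis's actual GPT-2 training code.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sgct_loss(factual_logits, factual_ids, halluc_logits, halluc_ids,
              factual_emb, halluc_emb, alpha=1.0, beta=0.5, gamma=0.5):
    # Likelihood loss: negative log-probability of the factual target tokens.
    p_f = softmax(factual_logits)
    ll = -np.log(p_f[np.arange(len(factual_ids)), factual_ids] + 1e-9).mean()

    # Unlikelihood loss: penalize probability mass placed on hallucinated tokens,
    # i.e. -log(1 - p(hallucinated token)).
    p_h = softmax(halluc_logits)
    pt = p_h[np.arange(len(halluc_ids)), halluc_ids]
    ul = -np.log(np.clip(1.0 - pt, 1e-9, None)).mean()

    # Contrastive term: penalize cosine similarity between factual and
    # hallucinated answer embeddings so they separate in embedding space.
    cos = (factual_emb * halluc_emb).sum(-1) / (
        np.linalg.norm(factual_emb, axis=-1)
        * np.linalg.norm(halluc_emb, axis=-1) + 1e-9)
    contrast = np.maximum(cos, 0.0).mean()

    # Weighted combination of the three objectives (weights are illustrative).
    return alpha * ll + beta * ul + gamma * contrast
```

Each term pulls in the direction the abstract describes: the likelihood term rewards factual continuations, the unlikelihood term pushes probability away from hallucinated tokens, and the contrastive term drives the two answer types apart in embedding space.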
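The threshold-calibrated routing of uncertain responses into a safe "abstain" category can be illustrated with a short sketch. The function name and the specific threshold values are hypothetical; in practice the thresholds would be calibrated against the HPM classifier's recall/precision trade-off reported above.

```python
def route_response(p_factual, accept_at=0.7, abstain_at=0.4):
    """Route a generated answer based on the HPM factuality score.

    p_factual: classifier's estimated probability that the answer is factual.
    Thresholds here are illustrative placeholders, not calibrated values.
    """
    if p_factual >= accept_at:
        return "accept"      # confidently factual: emit the answer
    if p_factual >= abstain_at:
        return "abstain"     # uncertain: route to the safe abstain category
    return "reject"          # likely hallucinated: suppress or regenerate
```

Raising `accept_at` makes the pipeline stricter (more abstentions, fewer hallucinations passed through), which is the factuality-versus-strictness trade-off the calibration experiments explore.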
Description
Keywords
Hallucinations, Datasets, Minimization