How Language Models Understand Honorific Mismatches in Korean
Received: Oct 30, 2024; Revised: Dec 02, 2024; Accepted: Dec 03, 2024
Published Online: Dec 31, 2024
ABSTRACT
This study investigates whether language models can process honorific mismatches in Korean, which arise when syntactic agreement in honorification is violated. Two types of mismatch are examined: YN, in which an honorific referent is paired with a non-honorific verb, and NY, in which a non-honorific referent is paired with an honorific verb. Previous studies have shown that native speakers judge YN mismatches as relatively acceptable but NY mismatches as unacceptable. To examine how language models handle these patterns, surprisal (a complexity metric derived from sentence likelihood) is computed with four Korean models: KR-BERT, KoELECTRA-base, KLUE-RoBERTa-base, and KLUE-RoBERTa-large. A dataset of 3,200 sentences is used to estimate surprisal for NN matches, NY mismatches, YN mismatches, and YY matches. The results show that the models largely mirror human judgments: YN mismatches are treated as acceptable, whereas NY mismatches are not. However, the models deviate from human-like processing for YY matches, which contain no violation, likely because YY constructions are rare in the training data. This suggests that, although the models partly succeed in processing honorifics, they rely on statistical patterns and lack the deeper pragmatic understanding required for full syntactic and contextual competence.
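To make the surprisal measure concrete, the sketch below computes surprisal as -log2 P(w | context) and sums it over a sentence by masking one token at a time, a common way to score sentences with masked language models such as the four listed above. This is an illustration under stated assumptions, not the study's procedure: the abstract does not say whether surprisal is summed over the whole sentence or read off the critical verb, and `masked_prob` is a hypothetical stand-in for a real model call. The helper `mean_by_condition` (also hypothetical) shows how per-sentence scores could be averaged within the NN, NY, YN, and YY conditions, where a higher mean surprisal would indicate a less expected, and presumably less acceptable, construction.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, Sequence, Tuple


def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 P(w | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def masked_sentence_surprisal(
    tokens: Sequence[str],
    masked_prob: Callable[[Sequence[str], int], float],
) -> float:
    """Sum per-token surprisal, masking one position at a time.

    `masked_prob(tokens, i)` is a HYPOTHETICAL stand-in for a masked LM
    that returns P(tokens[i] | all other tokens); plug in a real model here.
    """
    return sum(surprisal(masked_prob(tokens, i)) for i in range(len(tokens)))


def mean_by_condition(scored: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average sentence surprisal within each condition label (e.g. 'NN', 'NY')."""
    groups: Dict[str, list] = defaultdict(list)
    for condition, value in scored:
        groups[condition].append(value)
    return {condition: mean(values) for condition, values in groups.items()}
```

For example, with a dummy scorer that assigns probability 0.5 to every token, `masked_sentence_surprisal(["a", "b", "c", "d"], lambda t, i: 0.5)` returns 4.0 bits (1 bit per token). Base 2 is an arbitrary choice here; natural logarithms would rescale every score by the same constant and leave condition rankings unchanged.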