The Limits of the Context Vector: Why Does Translation Quality Drop on Long Sentences?

Jan 23, 2025

Read time: 1 minute

The Context Vector Bottleneck

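The Seq2Seq model we looked at earlier works well on short sentences, but its performance drops sharply as sentences get longer. The reason is that the encoder compresses the entire input sentence into a single fixed-length vector.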

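Whether a sentence has 20 words or 100, it is compressed into a vector of the same size (for example, 128 dimensions). This is like condensing an entire book into a one-sentence summary: information loss is unavoidable.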
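
To make the bottleneck concrete, here is a minimal sketch of a toy vanilla-RNN encoder in plain Python. It is not the model from this series: the weights are random and untrained, and the names `make_toy_encoder` and `encode` are made up for illustration. The only point is that the returned context vector has the same size no matter how many tokens go in.

```python
import math
import random

HIDDEN_SIZE = 128  # same size as the example in the text


def make_toy_encoder(vocab, hidden_size=HIDDEN_SIZE, seed=0):
    """Create random, untrained weights for a toy vanilla-RNN encoder."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(hidden_size)
    embeddings = {
        word: [rng.uniform(-scale, scale) for _ in range(hidden_size)]
        for word in sorted(vocab)  # sorted for reproducible weights
    }
    recurrent = [
        [rng.uniform(-scale, scale) for _ in range(hidden_size)]
        for _ in range(hidden_size)
    ]
    return embeddings, recurrent


def encode(tokens, embeddings, recurrent):
    """Run the RNN over the tokens and return the last hidden state."""
    hidden = [0.0] * len(recurrent)
    for token in tokens:
        x = embeddings[token]
        hidden = [
            math.tanh(x[i] + sum(w * h for w, h in zip(row, hidden)))
            for i, row in enumerate(recurrent)
        ]
    return hidden  # the context vector: its size never depends on len(tokens)


short_sentence = "i am a student .".split()
long_sentence = ("the quick brown fox jumps over the lazy dog "
                 "in the dense forest far away .").split()

embeddings, recurrent = make_toy_encoder(set(short_sentence) | set(long_sentence))
for sentence in (short_sentence, long_sentence):
    context = encode(sentence, embeddings, recurrent)
    print(len(sentence), "tokens ->", len(context), "dims")
# 5 tokens -> 128 dims
# 16 tokens -> 128 dims
```

Every token overwrites the same 128 numbers, so the longer the input, the more earlier information has to compete for that fixed space.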

Measuring Performance with the BLEU Score

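BLEU (Bilingual Evaluation Understudy) is a standard metric for evaluating machine translation quality. It scores a generated sentence by its n-gram precision against one or more reference sentences, with a brevity penalty for outputs that are shorter than the reference.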

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [['the', 'cat', 'is', 'on', 'the', 'mat']]
hypothesis = ['the', 'cat', 'on', 'the', 'mat']

score = sentence_bleu(reference, hypothesis,
                     smoothing_function=SmoothingFunction().method1)
print(f"BLEU score: {score:.4f}")
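
To see what goes into that score, the two ingredients of BLEU can be computed by hand for the same sentence pair. The following is a simplified sketch for a single reference, with hypothetical helper names (`ngrams`, `modified_precision`, `brevity_penalty`). It is not nltk's implementation, and it stops short of the final combined score, which depends on the smoothing method.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision against a single reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs
    # in the reference.
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return clipped / total


def brevity_penalty(reference, hypothesis):
    """Penalize hypotheses that are shorter than the reference."""
    if len(hypothesis) == 0:
        return 0.0
    if len(hypothesis) > len(reference):
        return 1.0
    return math.exp(1 - len(reference) / len(hypothesis))


ref = ['the', 'cat', 'is', 'on', 'the', 'mat']
hyp = ['the', 'cat', 'on', 'the', 'mat']

for n in range(1, 5):
    print(f"{n}-gram precision: {modified_precision(ref, hyp, n):.2f}")
# 1-gram precision: 1.00
# 2-gram precision: 0.75
# 3-gram precision: 0.33
# 4-gram precision: 0.00
print(f"brevity penalty: {brevity_penalty(ref, hyp):.4f}")
# brevity penalty: 0.8187
```

BLEU combines the four precisions with a geometric mean, so the 4-gram precision of 0 would drive the unsmoothed score to zero. That is why a smoothing function is useful for short sentences like this one.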

Experiment: Sentence Length vs. Performance

After training on a mix of short and long sentences, I measured the translation quality on a long sentence:

Short sentence (5 tokens):
Input: i am a student .
Output: je suis un etudiant .
BLEU: 1.0000 ✅

Long sentence (16 tokens):
Input: the quick brown fox jumps over the lazy dog in the dense forest far away .
Target: le renard brun rapide saute par-dessus le chien paresseux...
Predicted: j adore ce chat .
BLEU: 0.0073 ❌

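The contrast is stark: the short sentence is translated perfectly, while the BLEU score for the long sentence collapses to nearly zero. A 128-dimensional vector is simply too small to hold all the information in a long sentence.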

Visualization: Evidence of Information Loss

Plotting the results for sentences of various lengths as a scatter plot shows a clear negative correlation between sentence length and BLEU score:

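import matplotlib.pyplot as plt

# `translate` and `calculate_bleu` are helpers assumed to be defined elsewhere
# (for example, alongside the Seq2Seq model and the BLEU code above).
def quantify_information_loss(encoder, decoder, data_pairs):
    """Scatter-plot BLEU score against input sentence length."""
    lengths = []
    bleu_scores = []

    for input_sentence, target_sentence in data_pairs:
        predicted = translate(input_sentence, encoder, decoder)

        score = calculate_bleu(target_sentence, predicted)
        lengths.append(len(input_sentence.split()))  # whitespace token count
        bleu_scores.append(score)

    plt.scatter(lengths, bleu_scores)
    plt.xlabel("Sentence Length")
    plt.ylabel("BLEU Score")
    plt.title("Context Vector Limitation")
    plt.show()

    # Return the raw points so they can be analyzed beyond the plot.
    return lengths, bleu_scores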
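
A scatter plot shows the trend visually. To put a number on it, one option is the Pearson correlation coefficient between the sentence lengths and the BLEU scores collected in the loop. The helper below is a hypothetical addition, not part of the original experiment, and the numbers in the usage line are synthetic values that only exercise the function. They are not experimental results.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Synthetic, perfectly decreasing data: the coefficient is -1.
print(pearson_correlation([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```

A value close to -1 on the real `lengths` and `bleu_scores` lists would back up the negative trend seen in the plot, while a value near 0 would suggest the trend is weaker than it looks.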

The Solution: Attention Mechanism

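The Attention mechanism was proposed to solve this problem. Instead of compressing the whole sentence into a single vector, it lets the decoder focus on (attend to) the encoder outputs it needs at every time step.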
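
As a rough illustration of that idea, here is a minimal sketch of one attention step in plain Python. It assumes the simplest scoring function, a dot product between the decoder state and each encoder output (the "dot" variant of Luong-style attention). Bahdanau attention uses a small learned network for the score instead. The function names and the toy vectors are made up for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(decoder_state, encoder_outputs):
    """One attention step: score, normalize, then take a weighted sum."""
    # 1. Score each encoder output against the current decoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_outputs]
    # 2. Normalize the scores into attention weights.
    weights = softmax(scores)
    # 3. Build a context vector for this time step as a weighted sum.
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
               for i in range(len(decoder_state))]
    return context, weights


# Toy example: three encoder outputs, one per input token.
encoder_outputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
decoder_state = [0.0, 2.0, 0.0]  # most similar to the second encoder output

context, weights = dot_product_attention(decoder_state, encoder_outputs)
print([round(w, 3) for w in weights])  # the second weight is the largest
```

The context vector is recomputed at every decoder step from all encoder outputs, so a long input no longer has to fit into one fixed vector produced at the end of encoding.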

You can learn how Attention works and how to implement it in the Udemy course "Attention, Transformer, BERT, GPT 완벽 마스터", which walks through implementing Bahdanau Attention and Luong Attention by hand.