HowRU-KoELECTRA-Emotion-Classifier

Model Description

An emotion classification model for Korean text, based on KoELECTRA and aimed in particular at diary entries and counseling-style writing.
It recognizes eight emotions: Joy (기쁨), Excitement (설렘), Neutral (평범함), Surprise (놀라움), Disgust (불쾌함), Fear (두려움), Sadness (슬픔), and Anger (분노).

Emotion Classes

The model classifies the dominant emotion of an input Korean sentence into one of the eight classes below.

Emotion (Korean)   Emotion (EN)
기쁨               Joy
설렘               Excitement
평범함             Neutral
놀라움             Surprise
불쾌함             Disgust
두려움             Fear
슬픔               Sadness
분노               Anger
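
The label strings above are exactly what the model emits. The id-to-label ordering ships in the model config, so it is safer to read it than to assume it; a minimal check:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("LimYeri/HowRU-KoELECTRA-Emotion-Classifier")
print(config.id2label)  # the actual {id: label} mapping shipped with this model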

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

# 1) Load Model & Tokenizer
MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Classifier"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

# Move the model to GPU automatically when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Emotion label mapping (id2label)
id2label = model.config.id2label


# 2) Inference Function
def predict_emotion(text: str):
    """
    Returns:
        - top1_pred: 예측된 감정 라벨
        - probs_sorted: 감정별 확률(내림차순)
        - top2_pred: 상위 두 개의 감정
    """

    # Tokenize
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=512
    ).to(device)

    # Run inference
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=-1)[0]

    # Per-emotion probabilities, sorted in descending order
    probs_sorted = sorted(
        [(id2label[i], float(probs[i])) for i in range(len(probs))],
        key=lambda x: x[1],
        reverse=True
    )

    top1_pred = probs_sorted[0]
    top2_pred = probs_sorted[:2]

    return {
        "text": text,
        "top1_emotion": top1_pred,
        "top2_emotions": top2_pred,
        "all_probabilities": probs_sorted,
    }


# 3) Example
result = predict_emotion("오늘 정말 기분이 좋고 행복한 하루였어!")
print(result)
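
If you need to score many diary entries at once, the same steps batch naturally. A minimal sketch (not from the original card) that reuses the tokenizer, model, device, and id2label loaded above; the batch size is an arbitrary choice:

# 4) Batched inference (sketch)
def predict_emotions_batch(texts, batch_size=32):
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            truncation=True,
            padding=True,
            max_length=512
        ).to(device)
        with torch.no_grad():
            probs = F.softmax(model(**inputs).logits, dim=-1)
        for text, p in zip(batch, probs):
            top_id = int(p.argmax())
            results.append((text, id2label[top_id], float(p[top_id])))
    return results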

Pipeline Usage

The same model can also be used through the transformers text-classification pipeline:

from transformers import pipeline

MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Classifier"

classifier = pipeline(
    "text-classification",
    model=MODEL_NAME,
    tokenizer=MODEL_NAME,
    top_k=None   # return probabilities for all emotions
)

# Predict
text = "오늘 정말 기분이 좋고 행복한 하루였어!"
result = classifier(text)

# With top_k=None the pipeline returns one list of scores per input; take the first
result = result[0]

print("입력 문장:", text)
print("\nTop-1 감정:", result[0]['label'], f"({result[0]['score']:.4f})")
print("\n전체 감정 분포:")
for r in result:
    print(f"  {r['label']}: {r['score']:.4f}")

Training Details

Training Data

  1. LimYeri/kor-diary-emotion_v2
  2. qowlsdud/CounselGPT
  • Total (split 8:2): 50,000 rows
  • Train: 40,000 rows
  • Validation: 10,000 rows
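
A minimal sketch of rebuilding this split with the datasets library. How the two sources are merged and which columns they share is an assumption; the card only states the totals:

from datasets import load_dataset, concatenate_datasets

# Hypothetical reconstruction -- the card names only the sources and the 8:2 split
diary = load_dataset("LimYeri/kor-diary-emotion_v2", split="train")
counsel = load_dataset("qowlsdud/CounselGPT", split="train")

combined = concatenate_datasets([diary, counsel])  # assumes matching column schemas
split = combined.train_test_split(test_size=0.2, seed=42)  # seed is an arbitrary choice
train_ds, eval_ds = split["train"], split["test"]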

Training Procedure

Training Hyperparameters

  • num_train_epochs: 3
  • learning_rate: 3e-5
  • weight_decay: 0.02
  • warmup_ratio: 0.15
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 64
  • max_grad_norm: 1.0
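
These map one-to-one onto Hugging Face TrainingArguments. A sketch assuming the standard Trainer was used; output_dir and anything not listed above are assumptions:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="howru-koelectra-emotion",  # hypothetical path
    num_train_epochs=3,
    learning_rate=3e-5,
    weight_decay=0.02,
    warmup_ratio=0.15,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    max_grad_norm=1.0,
)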

Performance

Metric          Score
Eval Accuracy   0.95
Eval F1 Macro   0.95
Eval Loss       0.16
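
Accuracy and macro F1 match the standard scikit-learn metrics. One plausible compute_metrics for the Trainer, offered as an assumption since the card does not show the evaluation code:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
    }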

Model Architecture

1) ELECTRA Encoder (Base-size)

  • Hidden size: 768
  • Layers: 12 Transformer blocks
  • Attention heads: 12
  • MLP intermediate size: 3072
  • Activation: GELU
  • Dropout: 0.1

2) Classification Head

An additional classification head predicts the eight emotion classes:

  • Dense Layer: 768 → 768
  • Activation: GELU
  • Dropout: 0.1
  • Output Projection: 768 → 8
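
In PyTorch terms the head looks roughly like the sketch below, mirroring the shape of the ElectraForSequenceClassification head in transformers; the library's own implementation is authoritative:

import torch.nn as nn

class EmotionClassificationHead(nn.Module):
    """Dense 768 -> 768, GELU, dropout 0.1, projection 768 -> 8 (shapes per this card)."""

    def __init__(self, hidden_size=768, num_labels=8, dropout=0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.GELU()
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states):
        x = hidden_states[:, 0, :]   # hidden state at the [CLS] position
        x = self.dropout(x)
        x = self.dense(x)
        x = self.activation(x)
        x = self.dropout(x)
        return self.out_proj(x)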

Citation

@misc{HowRUEmotion2025,
  title={HowRU KoELECTRA Emotion Classifier},
  author={Lim, Yeri},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/LimYeri/HowRU-KoELECTRA-Emotion-Classifier}}
}