[Tag:] Role-Playing Steering

** Improving LLM Reasoning through Interpretable Role-Playing Steering (Findings of EMNLP 2025)

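The paper “Improving LLM Reasoning through Interpretable Role-Playing Steering” (Findings of EMNLP 2025) proposes a new approach that makes role-playing-based reasoning enhancement controllable, in an interpretable way, at the level of the LLM's internal representations. The core idea is to analyze the model's internal activation patterns with a Sparse Autoencoder (SAE), extract the latent features that activate during role-playing, and inject them into the residual stream (steering), thereby inducing “role-consistent (reasoning-consistent)” thinking in the model…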
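The pipeline described in this excerpt (encode activations with an SAE, pick the features that fire during role-playing, add their directions back into the residual stream) can be illustrated with a toy sketch in plain Python. This is only an illustration under stated assumptions, not the paper's implementation: the selection rule (largest mean activation difference between role-play and baseline prompts), the scaling factor `alpha`, and all function names and toy weights are hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def sae_encode(h, W_enc, b_enc):
    """SAE feature activations: f_j = ReLU(W_enc[j] . h + b_enc[j])."""
    return [relu(sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(W_enc, b_enc)]

def mean_features(acts, W_enc, b_enc):
    """Average SAE feature activations over a set of residual-stream vectors."""
    feats = [sae_encode(h, W_enc, b_enc) for h in acts]
    return [sum(col) / len(feats) for col in zip(*feats)]

def select_role_features(role_acts, base_acts, W_enc, b_enc, k):
    """Assumed rule: keep the k features whose mean activation rises most under role-play."""
    role_mean = mean_features(role_acts, W_enc, b_enc)
    base_mean = mean_features(base_acts, W_enc, b_enc)
    diff = [r - b for r, b in zip(role_mean, base_mean)]
    ranked = sorted(range(len(diff)), key=lambda j: diff[j], reverse=True)
    return ranked[:k], diff

def steering_vector(feature_ids, weights, W_dec):
    """Weighted sum of the decoder directions of the selected features."""
    v = [0.0] * len(W_dec[0])
    for j in feature_ids:
        for i, w in enumerate(W_dec[j]):
            v[i] += weights[j] * w
    return v

def steer(h, v, alpha):
    """Inject the steering vector into one residual-stream vector."""
    return [x + alpha * vi for x, vi in zip(h, v)]

# Toy setup (hypothetical): 3-dim residual stream, 3 SAE features, identity weights.
W_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

role_acts = [[1.0, 0.0, 2.0], [1.0, 0.0, 4.0]]  # activations under a role prompt
base_acts = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # activations without the role

top_ids, diff = select_role_features(role_acts, base_acts, W_enc, b_enc, k=1)
vec = steering_vector(top_ids, diff, W_dec)
steered = steer([1.0, 1.0, 1.0], vec, alpha=0.5)
print(top_ids, vec, steered)
```

In a real model the activation vectors would come from a chosen transformer layer and the SAE weights from a trained autoencoder; which layer and which features to use is not stated in this excerpt.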

February 13, 2026
