- Name
- The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models
- Description
Ensuring awareness of fairness and privacy in large language models (LLMs) is critical. Interestingly, we discover a counter-intuitive trade-off: enhancing an LLM's privacy awareness through supervised fine-tuning (SFT) with thousands of samples significantly decreases its fairness awareness. To address this issue, inspired by information theory, we introduce a training-free method to Suppress the Privacy and faIrness coupled Neurons (SPIN), which theoretically and empirically decreases the mutual information between fairness and privacy awareness. ...
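The idea of suppressing "coupled" neurons without any training can be illustrated with a minimal sketch. It assumes that per-neuron importance scores for fairness awareness and for privacy awareness are already available, that a neuron counts as coupled if it ranks in the top-k for both, and that suppression means zeroing that neuron's weight row. The scoring method, the top-k rule, and all function names (`top_k_neurons`, `coupled_neurons`, `suppress`) are hypothetical and are not the authors' implementation.

```python
"""Hypothetical sketch: suppress neurons that are important for BOTH
fairness and privacy awareness. Importance scores are assumed inputs;
how they are computed is not specified here."""

from typing import List, Set


def top_k_neurons(scores: List[float], k: int) -> Set[int]:
    """Return the indices of the k highest-scoring neurons."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def coupled_neurons(fair_scores: List[float], priv_scores: List[float], k: int) -> Set[int]:
    """Assumed criterion: a neuron is 'coupled' if it is in the top-k
    for fairness AND in the top-k for privacy."""
    if len(fair_scores) != len(priv_scores):
        raise ValueError("score lists must cover the same neurons")
    return top_k_neurons(fair_scores, k) & top_k_neurons(priv_scores, k)


def suppress(weights: List[List[float]], neurons: Set[int]) -> List[List[float]]:
    """Zero the weight row of each selected neuron (one row per neuron).
    No gradient updates are involved, so the step is training-free."""
    return [
        [0.0] * len(row) if i in neurons else list(row)
        for i, row in enumerate(weights)
    ]


if __name__ == "__main__":
    fair = [0.9, 0.1, 0.8, 0.3]   # hypothetical fairness importance scores
    priv = [0.1, 0.2, 0.9, 0.6]   # hypothetical privacy importance scores
    coupled = coupled_neurons(fair, priv, k=2)
    print(coupled)                # {2}
    w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(suppress(w, coupled))   # row 2 becomes [0.0, 0.0]
```

Only neuron 2 is in both top-2 sets here, so only its row is zeroed; neurons important for just one of the two properties are left untouched, which is the intuition behind targeting the overlap rather than either set alone.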