arXiv: Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning