arXiv: Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive