arXiv: A General Theoretical Paradigm to Understand Learning from Human Preferences