arXiv Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition