arXiv CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow