arXiv Multimodal Motion Prediction with Stacked Transformers