arXiv: Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering