arXiv: Benchmarking Large Language Models in Retrieval-Augmented Generation