https://doi.org/10.1140/epjds/s13688-025-00601-6
Research
Addressing investor concerns: a Chinese financial question-answering benchmark with LLM-based evaluation
1 Guangxi University of Finance and Economics, 530003, Nanning, China
2 Guangxi Key Laboratory of Seaward Economic Intelligent System Analysis and Decision-making, Guangxi University of Finance and Economics, 530003, Nanning, China
3 Department of Computer Science, Johns Hopkins University, 3400 N. Charles Street, 21218, Baltimore, MD, USA
4 School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, 16A Malone Road, BT9 5BN, Belfast, Northern Ireland, UK
5 Guangxi Key Laboratory of Big Data in Finance and Economics, Guangxi University of Finance and Economics, 530003, Nanning, China
Received: 28 June 2025
Accepted: 22 November 2025
Published online: 18 December 2025
In recent years, large language models (LLMs) have shown impressive performance across a wide range of natural language processing tasks and are increasingly adopted in high-stakes fields such as financial analysis. However, their effectiveness in Chinese financial contexts is limited by the scarcity of high-quality, domain-specific datasets. To bridge this gap, we present the Chinese Financial Question Answering (CFQA) dataset, a new resource designed to advance research in financial analysis. CFQA is constructed from the publicly available annual reports of multiple Chinese listed companies, each paired with questions and human-annotated answers. Evaluation results show that existing QA methods perform poorly on this dataset. CFQA poses several distinctive challenges: (1) the source documents are PDFs with complex tabular structures, which makes information extraction difficult; (2) the length and intricacy of financial reports complicate answer retrieval; and (3) the questions focus narrowly on domain-specific financial content.
Key words: Financial natural language processing / Large language models / Retrieval-augmented generation / Financial benchmark datasets
© The Author(s) 2025
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

