https://doi.org/10.1140/epjds/s13688-025-00535-z
Research
Entropy-based text feature engineering approach for forecasting financial liquidity changes
1 Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, 55/2 Sedova St., St. Petersburg, Russia
2 Department of Data Analysis and Modeling, VTB Bank, Moscow, Russia
Received: 12 June 2024
Accepted: 25 February 2025
Published online: 4 March 2025
Changes in individual and institutional financial behavior that lead to shifts in liquidity flows often depend on events reflected in the news. However, establishing a relationship between financial behavior and news remains challenging and understudied. We propose a news-based feature generation approach that accounts for news events in liquidity flow time-series prediction tasks, thereby improving forecasting quality. The features are constructed as different types of entropies and calculated at different levels of text abstraction, based on word counts, TF-IDF values, probabilistic topics, and contextual embeddings. We show that this feature engineering procedure is effective for predicting changes in two types of liquidity flows: stock market trading volume and the volume of ATM cash withdrawals. For the first type, we use our original collection of 651,208 business news articles from a Russian news agency, dating from 2013 to 2021, to predict abnormal jumps in the trading volume of 32 leading Russian companies. With our approach, the quality of predicting the differences between daily trading volumes and their median values improves for 97% of these companies. For the ATM withdrawals task, we test the impact of economic news from three leading Russian media sources (N = 55,712) on withdrawals from 100 ATMs located in Moscow. For 95% of these ATMs, we improve the quality of predicting the year-to-year change in weekly withdrawal volume. Additionally, we find that some news sources have higher predictive power than others. The approach is potentially generalizable to other domains of financial behavior across the globe.
Key words: Feature engineering / Financial time series / Natural language processing / Economic news / Entropy / Stock trade volumes / ATM cash withdrawals
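To make the idea of an entropy feature at the word-count level concrete, the following is a minimal sketch, not the authors' implementation. It assumes Shannon entropy (the abstract only says "different types of entropies") and naive lowercase whitespace tokenization; the function names `shannon_entropy` and `word_count_entropy` are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(weights):
    """Shannon entropy (in bits) of non-negative weights, normalized to sum to 1."""
    weights = [w for w in weights if w > 0]
    total = sum(weights)
    if total == 0:
        return 0.0
    probs = [w / total for w in weights]
    # log2(1 / p) is the surprisal of an outcome with probability p.
    return sum(p * math.log2(1.0 / p) for p in probs)


def word_count_entropy(text):
    """Entropy of the word-frequency distribution of one text.

    Assumption: lowercase whitespace tokenization stands in for whatever
    preprocessing a real pipeline would use.
    """
    counts = Counter(text.lower().split())
    return shannon_entropy(counts.values())


# Four distinct words, each used once: the distribution is uniform.
uniform = word_count_entropy("bank rate market ruble")
# One word repeated: the distribution is concentrated on a single outcome.
repeated = word_count_entropy("bank bank bank bank")
```

Under these assumptions, a text that spreads its vocabulary evenly yields a higher value than one dominated by a single word. The same `shannon_entropy` helper could in principle be applied to other non-negative weight vectors, such as TF-IDF values or topic probabilities, though how the paper aggregates such values into features is not specified in this section.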
© The Author(s) 2025
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.