Leveraging augmentation techniques for tasks with unbalancedness within the financial domain: a two-level ensemble approach

Golshid Ranjbaran; Diego Reforgiato Recupero; Gianfranco Lombardo; Sergio Consoli

doi:10.1140/epjds/s13688-023-00402-9

Open Access

EPJ Data Sci. (2023) 12: 24
https://doi.org/10.1140/epjds/s13688-023-00402-9

Regular Article

Golshid Ranjbaran¹, Diego Reforgiato Recupero², Gianfranco Lombardo³ and Sergio Consoli⁴^d

¹ Department of Electrical and Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
² Department of Mathematics and Computer Science, University of Cagliari, via Ospedale 72, 09121, Cagliari, Italy
³ Department of Engineering and Architecture, University of Parma, Parco Area delle Scienze, 43125, Parma, Italy
⁴ Joint Research Centre (DG JRC), European Commission, Via E. Fermi 2749, 21027, Ispra (VA), Italy

^d sergio.consoli@ec.europa.eu

Received: 14 November 2022
Accepted: 26 June 2023
Published online: 10 July 2023

Abstract

Modern financial markets produce massive datasets that need to be analysed with new modelling techniques, such as those from (deep) Machine Learning and Artificial Intelligence. The common goal of these techniques is to forecast the behaviour of the market, which can be translated into various classification tasks, for instance predicting the likelihood of a company's bankruptcy or detecting fraud. However, real-world financial data are often unbalanced, meaning that the classes are not equally represented in such datasets. This is the main issue, since a Machine Learning model trained on such data is driven mainly by the majority class, which leads to inaccurate predictions. In this paper, we explore different data augmentation techniques to deal with very unbalanced financial data. We consider a number of publicly available datasets, apply state-of-the-art augmentation strategies to them, and evaluate the results for several Machine Learning models trained on the sampled data. The performance of the various approaches is evaluated according to their accuracy and their micro and macro F1 scores, and finally by analysing the precision and recall over the minority class. We show that a consistent improvement is achieved when data augmentation is employed. The obtained classification results look promising and indicate the effectiveness of augmentation strategies on financial tasks. On the basis of these results, we present an approach focused on classification tasks within the financial domain that takes a dataset as input, identifies what kind of augmentation technique to use, and then applies an ensemble of all the augmentation techniques of the identified type to the input dataset along with an ensemble of different methods to tackle the underlying classification.
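
To illustrate why the abstract reports macro F1 and minority-class precision and recall alongside accuracy, the following minimal sketch computes these metrics for a trivial classifier that always predicts the majority class. The toy labels (95 majority samples, 5 minority samples) and all function names are assumptions made for illustration only; they are not taken from the paper's datasets or code.

```python
# Illustrative sketch (not the authors' code): metrics on unbalanced labels.

def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating `label` as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, l)[2] for l in labels) / len(labels)


def micro_f1(y_true, y_pred):
    """F1 computed from TP/FP/FN counts pooled over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs for l in labels if t == l and p == l)
    fp = sum(1 for t, p in pairs for l in labels if t != l and p == l)
    fn = sum(1 for t, p in pairs for l in labels if t == l and p != l)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: class 0 is the majority, class 1 the minority.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a classifier that always predicts the majority class

acc = accuracy(y_true, y_pred)
micro = micro_f1(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
min_precision, min_recall, min_f1 = precision_recall_f1(y_true, y_pred, 1)

print(f"accuracy={acc:.3f} micro_F1={micro:.3f} macro_F1={macro:.3f}")
print(f"minority precision={min_precision:.3f} recall={min_recall:.3f}")
```

In this toy setting the accuracy and micro F1 are high even though no minority sample is ever detected, whereas macro F1 and minority-class recall expose the failure. This is the kind of behaviour that motivates evaluating augmentation strategies on the minority class specifically.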

Key words: Augmentation techniques / Ensemble method / Financial sector / Machine learning / Unbalanced data

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
