https://doi.org/10.1140/epjds/s13688-021-00297-4
Regular Article
Extracting complements and substitutes from sales data: a network perspective
1 Mathematical Institute, University of Oxford, Woodstock Road, OX2 6GG, Oxford, UK
2 Tesco PLC, Tesco House, Shire Park, Kestrel Way, AL7 1GA, Welwyn Garden City, UK
Received: 1 March 2021
Accepted: 22 July 2021
Published online: 25 August 2021
Complementarity and substitutability between products are essential concepts in retail and marketing. Qualitatively, two products are said to be substitutable if a customer can replace one product with the other, while they are complementary if they tend to be bought together. In this article, we take a network perspective to help automatically identify complements and substitutes from sales transaction data. Starting from a bipartite product-purchase network representation, with both transaction nodes and product nodes, we develop appropriate null models to infer significant relations, either complements or substitutes, between products, and design measures based on random walks to quantify their importance. The resulting unipartite networks between products are then analysed with community detection methods, in order to find groups of similar products for the different types of relationships. The results are validated by combining observations from a real-world basket dataset with the existing product hierarchy, as well as a large-scale flavour compound and recipe dataset.
Key words: Product relationships / Network modelling / Role extraction / Sales data / Market basket analysis
Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1140/epjds/s13688-021-00297-4.
© The Author(s) 2021
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.