https://doi.org/10.1140/epjds/s13688-024-00466-1
Research
Segmentation using large language models: A new typology of American neighborhoods
1 Geographic Data Science Lab, University of Liverpool, Roxby Building, 74 Bedford St South, L69 7ZT, Liverpool, UK
2 Microsoft, 1650 Canyon Blvd., CO 80302, Boulder, USA
a alex.singleton@liverpool.ac.uk
Received: 27 October 2023
Accepted: 18 March 2024
Published online: 22 April 2024
In the United States, recent changes to the National Statistical System have amplified the geographic-demographic resolution trade-off: when working with demographic and economic data from the American Community Survey (ACS), zooming in geographically means losing demographic resolution, because the margins of error become very large. In this paper, we present a solution to this problem in the form of an AI-based, open and reproducible geodemographic classification system for the United States, built on small area estimates from the ACS. We apply a partitioning clustering algorithm to a range of socio-economic, demographic, and built environment variables. Our approach uses an open-source software pipeline that ensures adaptability to future data updates. A key innovation is the integration of GPT4, a state-of-the-art large language model, to generate intuitive cluster descriptions and names. This represents a novel application of natural language processing in geodemographic research and showcases the potential for human-AI collaboration within the geospatial domain.
Key words: Geodemographics / Large Language Model (LLM) / American Community Survey / Segmentation / Neighborhoods / Artificial Intelligence (AI) / Demographics / Retrieval Augmented Generation (RAG)
© The Author(s) 2024
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.