https://doi.org/10.1140/epjds/s13688-025-00554-w
Research
Dream content discovery from social media using natural language processing
1 Heritage Institute of Technology, Chowbaga Road, Anandapur, 700107, Kolkata, West Bengal, India
2 Nokia Bell Labs, Broers Building, CB3 0FA, Cambridge, United Kingdom
3 Uehiro Oxford Institute, University of Oxford, St Ebbe’s Street, OX1 1PT, Oxford, United Kingdom
4 IT University of Copenhagen, Rued Langgaads Vej, 7, 2300, Copenhagen, Denmark
5 Pioneer Centre for AI, Øster Voldgade 3, 1350, Copenhagen, Denmark
6 Northwestern University, 633 Clark St, 60208, Evanston, Illinois, United States
7 Harvard University, 352 Harvard Street, 02138, Cambridge, Massachusetts, United States
8 Politecnico di Torino, Corso Duca degli Abruzzi, 24, 10129, Turin, Piedmont, Italy
a sanja.scepanovic@nokia-bell-labs.com
Received: 8 June 2024
Accepted: 29 April 2025
Published online: 23 May 2025
Dreaming is a fundamental but not fully understood part of human experience. Traditional dream content analysis practices, while popular and aided by over 130 unique scales and rating systems, have limitations. Often based on retrospective surveys or lab studies, and sometimes on in-home dream reports collected over several days, they are hard to apply at scale and do not readily show the importance of, and connections between, different dream themes. To overcome these issues, we conducted a data-driven, mixed-method analysis that identifies topics in free-form dream reports through natural language processing. We applied this analysis to 44,213 dream reports from Reddit’s r/Dreams subreddit, where we uncovered 217 topics grouped into 22 larger themes: the most extensive collection of dream topics to date. We validated our topics by comparing them to the widely used Hall and van de Castle scale. Going beyond traditional scales, our method can find unique patterns in different dream types (such as nightmares or recurring dreams), understand topic importance and connections (for example, finding a greater predominance of indoor settings in Reddit dreams than previous work generally stipulated), and observe changes in collective dream experiences over time and around major events (such as the COVID-19 pandemic and the recent Russo-Ukrainian war). We envision that applications of our method will provide valuable insights into the complex nature of dreaming and its interplay with our waking experiences.
Key words: Dreams / Dream content analysis / Neural topic modeling / Reddit
Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1140/epjds/s13688-025-00554-w.
© The Author(s) 2025
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.