Measuring the effect of node aggregation on community detection
Center for Operations Research and Econometrics, Université catholique de Louvain, Louvain-la-Neuve, Belgium
2 Institute of Information and Communication Technologies, Electronics and Applied Mathematics, Université catholique de Louvain, Louvain-la-Neuve, Belgium
3 Poppy, Jette, Belgium
* e-mail: firstname.lastname@example.org
Accepted: 25 February 2020
Published online: 11 March 2020
Many times the nodes of a complex network, whether deliberately or not, are aggregated for technical, ethical, legal limitations or privacy reasons. A common example is the geographic position: one may uncover communities in a network of places, or of individuals identified with their typical geographical position, and then aggregate these places into larger entities, such as municipalities, thus obtaining another network. The communities found in the networks obtained at various levels of aggregation may exhibit various degrees of similarity, from full alignment to perfect independence. This is akin to the problem of ecological and atomic fallacies in statistics, or to the Modified Areal Unit Problem in geography.
We identify the class of community detection algorithms most suitable to cope with node aggregation, and develop an index for aggregability, capturing to which extent the aggregation preserves the community structure. We illustrate its relevance on real-world examples (mobile phone and Twitter reply-to networks). Our main message is that any node-partitioning analysis performed on aggregated networks should be interpreted with caution, as the outcome may be strongly influenced by the level of the aggregation.
Key words: Community detection / Data aggregation / Twitter data / Phone call data
© The Author(s), 2020