Classifying Human Activities in Urban Spaces with a Multimodal AI: Towards a Massive Assessment of Urban Affordances

Trunfio, Giuseppe A.
2025-01-01

Abstract

We present a tool that leverages a Multimodal Large Language Model (MLLM) to automatically classify human activities in images of urban scenes. Given an image of a space populated with people, the tool classifies the people in it according to five features: age group, sex, bodily posture, activity level, and social configuration. The tool implements a sequential pipeline: Faster R-CNN for person detection, followed by postprocessing and two consecutive applications of GPT-4o models, for refined image description and for information extraction. We also present an experimental test used for a preliminary validation of the tool, which compares the ground truth for 24 images of urban scenes with the tool's estimates and finds a good degree of alignment. The tool is part of a wider research programme on the massive assessment of urban affordances, within the framework of the capability approach.
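The validation step described in the abstract, comparing ground truth with the tool's estimates, could be sketched roughly as follows. This is a minimal illustration and not the authors' evaluation code: the `PersonLabels` class, the `per_feature_agreement` function, the label values, and the assumption that persons are already matched one-to-one between ground truth and estimates are all hypothetical. Only the five feature names come from the abstract.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PersonLabels:
    """One person's labels for the five features named in the abstract.

    The string values used below are hypothetical placeholders.
    """
    age_group: str
    sex: str
    posture: str
    activity_level: str
    social_configuration: str


def per_feature_agreement(truth, estimate):
    """Return, per feature, the share of persons whose estimate equals the ground truth.

    Assumes both lists describe the same persons in the same order.
    """
    if len(truth) != len(estimate) or not truth:
        raise ValueError("expected two non-empty lists of equal length")
    return {
        f.name: sum(
            getattr(t, f.name) == getattr(e, f.name)
            for t, e in zip(truth, estimate)
        ) / len(truth)
        for f in fields(PersonLabels)
    }


# Hypothetical labels for two persons (illustrative only, not data from the paper).
truth = [
    PersonLabels("adult", "female", "standing", "low", "alone"),
    PersonLabels("child", "male", "sitting", "low", "group"),
]
estimate = [
    PersonLabels("adult", "female", "standing", "moderate", "alone"),
    PersonLabels("adult", "male", "sitting", "low", "group"),
]
agreement = per_feature_agreement(truth, estimate)
```

Simple per-feature agreement is only one possible alignment measure; the abstract does not state which metric the paper reports.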
2025
9783031976056
9783031976063
Classifying Human Activities in Urban Spaces with a Multimodal AI: Towards a Massive Assessment of Urban Affordances / Blečić, Ivan; Floris, Alessandro; Giliberto, Giulia; Trunfio, Giuseppe A. - 15890 LNCS (2025), pp. 341-357. (Paper presented at the conference Workshops of the International Conference on Computational Science and Its Applications, ICCSA 2025, held in tur in 2025) [10.1007/978-3-031-97606-3_23].


Use this identifier to cite or link to this document: https://hdl.handle.net/11388/367950