Classifying Human Activities in Urban Spaces with a Multimodal AI: Towards a Massive Assessment of Urban Affordances


We present a tool that leverages a Multimodal Large Language Model (MLLM) to automatically classify human activities in images of urban scenes. Given an image of a space populated with people, the tool classifies the people in it according to five features: age group, sex, bodily posture, activity level and social configuration. The tool implements a sequential pipeline: Faster R-CNN for person detection, followed by postprocessing and two consecutive applications of GPT-4o models for refined image description and information extraction. We also present an experimental test used for preliminary validation of the tool, comparing the ground truth for 24 images of urban scenes with the estimates provided by the tool; the two show a good degree of alignment. The tool is part of a wider research programme on the massive assessment of urban affordances, within the framework of the capability approach.
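The sequential pipeline and the ground-truth comparison described in the abstract could be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the stage functions (`detect`, `describe`, `extract`) are passed in as plain callables standing in for Faster R-CNN and the two GPT-4o steps, and the postprocessing thresholds, the label values, and every name in the code are hypothetical. The agreement measure shown (share of matching labels per feature) is likewise an assumption; the abstract does not say which measure was used.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class PersonLabel:
    """The five features named in the abstract (values are free-form strings here)."""
    age_group: str
    sex: str
    posture: str
    activity_level: str
    social_configuration: str


def filter_detections(boxes: List[Box], scores: List[float],
                      min_score: float = 0.8, min_area: float = 400.0) -> List[Box]:
    """Hypothetical postprocessing: drop low-confidence and very small boxes."""
    kept = []
    for (x1, y1, x2, y2), score in zip(boxes, scores):
        if score >= min_score and (x2 - x1) * (y2 - y1) >= min_area:
            kept.append((x1, y1, x2, y2))
    return kept


def classify_scene(image: Any,
                   detect: Callable[[Any], Tuple[List[Box], List[float]]],
                   describe: Callable[[Any, List[Box]], str],
                   extract: Callable[[str], List[PersonLabel]]) -> List[PersonLabel]:
    """Run the stages in order: detection, postprocessing, description, extraction."""
    boxes, scores = detect(image)            # stand-in for the person detector
    people = filter_detections(boxes, scores)
    description = describe(image, people)    # stand-in for the first MLLM call
    return extract(description)              # stand-in for the second MLLM call


def feature_agreement(truth: List[PersonLabel],
                      predicted: List[PersonLabel]) -> Dict[str, float]:
    """Share of people whose label matches, per feature (lists must be aligned)."""
    if not truth or len(truth) != len(predicted):
        raise ValueError("truth and predicted must be non-empty and the same length")
    result = {}
    for f in fields(PersonLabel):
        hits = sum(getattr(t, f.name) == getattr(p, f.name)
                   for t, p in zip(truth, predicted))
        result[f.name] = hits / len(truth)
    return result
```

Passing the stages in as callables keeps the outline independent of any particular detector or model client, so each stage can be replaced or stubbed out when comparing estimates against annotated images.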

Classifying Human Activities in Urban Spaces with a Multimodal AI: Towards a Massive Assessment of Urban Affordances / Blečić, I., Floris, A., Giliberto, G., Trunfio, G.A.. - 15890 LNCS:(2025), pp. 341-357. (Workshops of the International Conference on Computational Science and Its Applications, ICCSA 2025 tur 2025) [10.1007/978-3-031-97606-3_23].

Blečić, Ivan;Floris, Alessandro;Giliberto, Giulia;Trunfio, Giuseppe A.

2025-01-01

Year: 2025
Language(s): English
Volume title: Lecture Notes in Computer Science
Series: Lecture Notes in Computer Science
Conference title: Workshops of the International Conference on Computational Science and Its Applications, ICCSA 2025
Volume: 15890 LNCS
First page: 341
Last page: 357
Number of pages: 17
ISBN: 9783031976056; 9783031976063
DOI: https://dx.doi.org/10.1007/978-3-031-97606-3_23
Publisher: Springer Science and Business Media Deutschland GmbH
Conference date: 2025
Conference location: tur
Keywords: capability approach; Multimodal Large Language Model; urban affordances; urban evaluation modelling; urban space analysis
Scopus ID: 2-s2.0-105010824077
International co-authors: No
Type: 4 Contribution in Conference Proceedings (Proceeding)::4.1 Contribution in conference proceedings
All authors: Blečić, Ivan; Floris, Alessandro; Giliberto, Giulia; Trunfio, Giuseppe A.
Faculty site type: 273
Number of authors: 4
Full text: none
Type: info:eu-repo/semantics/conferenceObject
Appears in types: 4.1 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/11388/367950
