From early biological models to CNNs: do they look where humans look?

IRIS

Early hierarchical computational visual models, as well as recent deep neural networks, have been inspired by the functioning of the primate visual cortex. Although much effort has been made to dissect neural networks and visualize the features learned by individual units, the scope of these visualizations has been limited to categorizing the features by their semantic level. Given the human ability to select regions of a scene with high semantic content, two questions naturally arise: can neural networks match this ability, and does similarity to human attention correlate with network performance? To address these questions, we propose a pipeline that selects the sets of feature points that maximally activate individual network units and compares them with human fixations. We extract features from a variety of neural networks, from early hierarchical models such as HMAX up to recent deep convolutional neural networks such as DenseNet, and compare them with human fixations. Experiments on the ETD database show that human fixations correlate with CNN features from deep layers significantly better than with random sets of points. They do not, however, correlate with features extracted from the first CNN layers, nor with HMAX features, which appear to have a lower semantic level than the features that respond to the automatically learned CNN filters. It also turns out that a CNN's similarity to humans correlates with its classification performance.
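The abstract does not say how feature points are selected or how they are scored against fixations. The Python sketch below is a minimal illustration of the general idea under stated assumptions, not the authors' method. It takes the top-k locations of one unit's activation map as that unit's feature points, scores them against fixations with a mean nearest-neighbour distance, and compares that score with randomly drawn point sets of the same size. The helper names (`top_k_points`, `mean_nearest_distance`, `compare_with_random`), the distance metric, and the assumption that activation maps are already expressed in image coordinates are all hypothetical choices made for this example.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch: assumes the activation map is already aligned with
# image coordinates and that fixations are (row, col) points on that grid.
Point = Tuple[int, int]


def top_k_points(activation: List[List[float]], k: int) -> List[Point]:
    """Return the k locations where one unit's activation map is highest."""
    cells = (
        (value, (r, c))
        for r, row in enumerate(activation)
        for c, value in enumerate(row)
    )
    return [point for _, point in heapq.nlargest(k, cells)]


def mean_nearest_distance(fixations: Sequence[Point], points: Sequence[Point]) -> float:
    """Average distance from each fixation to its closest feature point.

    Assumed similarity metric for illustration: lower means more human-like.
    """
    return sum(min(math.dist(f, p) for p in points) for f in fixations) / len(fixations)


def compare_with_random(
    activation: List[List[float]],
    fixations: Sequence[Point],
    k: int,
    trials: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Score the unit's top-k points, then estimate how often k uniformly
    random points score at least as well (an empirical p-value)."""
    rng = random.Random(seed)
    observed = mean_nearest_distance(fixations, top_k_points(activation, k))
    all_cells = [(r, c) for r in range(len(activation)) for c in range(len(activation[0]))]
    hits = sum(
        mean_nearest_distance(fixations, rng.sample(all_cells, k)) <= observed
        for _ in range(trials)
    )
    # Add-one smoothing keeps the estimate strictly positive.
    return observed, (hits + 1) / (trials + 1)


# Toy example: an 8x8 activation map peaking at (2, 3), fixations on the peak.
activation = [[-((r - 2) ** 2 + (c - 3) ** 2) for c in range(8)] for r in range(8)]
fixations = [(2, 3), (3, 3)]
score, p_value = compare_with_random(activation, fixations, k=3)
print(score, p_value)
```

A small p-value means that few random point sets land as close to the fixations as the unit's strongest responses. This matches the kind of "better than random sets of points" comparison the abstract reports, although the paper's actual statistic may differ.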

From early biological models to CNNs: do they look where humans look? / Cadoni, Mi; Lagorio, A; Grosso, E; Huei, Tj; Seng, Cc. - (2021), pp. 6313-6320. [10.1109/ICPR48806.2021.9412717]


Cadoni, MI;Lagorio, A;Grosso, E;Huei, TJ;Seng, CC

2021-01-01



Year: 2021
Language(s): English
Volume title: International Conference on Pattern Recognition
Series: INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION
First page: 6313
Last page: 6320
Number of pages: 8
ISBN: 978-1-7281-8808-9
DOI: https://dx.doi.org/10.1109/ICPR48806.2021.9412717
Publisher: IEEE COMPUTER SOC
Publisher address: 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA
Scopus ID: 2-s2.0-85110552902
ISI WoS ID: WOS:000678409206059
International co-authors: Yes
Type: 4 Contribution in Conference Proceedings::4.1 Contribution in conference proceedings
All authors: Cadoni, Mi; Lagorio, A; Grosso, E; Huei, Tj; Seng, Cc
Faculty site type: 273
Number of authors: 5
Full text: none
Type: info:eu-repo/semantics/conferenceObject
Appears in types: 4.1 Contribution in conference proceedings

Files in this record:

No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11388/298757
