Verification and repair of control policies for safe reinforcement learning


Reinforcement learning is a well-known AI paradigm in which control policies of autonomous agents are synthesized incrementally with little or no knowledge about the properties of the environment. We are concerned with the safety of agents whose policies are learned by reinforcement, i.e., we wish to bound the risk that, once learning is over, an agent damages either the environment or itself. We propose a general-purpose automated methodology to verify policies, i.e., establish risk bounds, and to repair them, i.e., modify them to comply with stated risk bounds. Our approach is based on probabilistic model checking algorithms and tools, which provide theoretical and practical means to verify risk bounds and repair policies. Considering a taxonomy of potential repair approaches tested on an artificially generated parametric domain, we show that our methodology is also more effective than comparable ones.
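To make the verification step concrete, the following is a minimal sketch, not the authors' tool chain (the abstract refers to probabilistic model checking algorithms and tools without naming them here). It assumes that a fixed policy induces a finite Markov chain over the agent's states and estimates, by fixed-point iteration, the probability of eventually reaching an unsafe state; that probability is then compared with a stated risk bound. The state names, transition probabilities, and function names are all hypothetical.

```python
# Hypothetical Markov chain induced by a fixed policy:
# each state maps to a list of (next_state, probability) pairs.
chain = {
    "s0": [("s1", 0.9), ("crash", 0.1)],
    "s1": [("goal", 0.8), ("s0", 0.2)],
    "crash": [("crash", 1.0)],  # unsafe, absorbing
    "goal": [("goal", 1.0)],    # safe, absorbing
}


def unsafe_reach_probability(chain, unsafe, safe_sinks, tol=1e-12, max_iter=100_000):
    """Approximate, for every state, the probability of eventually
    reaching a state in `unsafe` (simple fixed-point iteration)."""
    risk = {s: (1.0 if s in unsafe else 0.0) for s in chain}
    for _ in range(max_iter):
        delta = 0.0
        for s in chain:
            if s in unsafe or s in safe_sinks:
                continue  # absorbing states keep their fixed value
            new = sum(p * risk[t] for t, p in chain[s])
            delta = max(delta, abs(new - risk[s]))
            risk[s] = new
        if delta < tol:
            break
    return risk


def satisfies_risk_bound(chain, unsafe, safe_sinks, start, bound):
    """Check whether the risk from `start` stays within `bound`."""
    return unsafe_reach_probability(chain, unsafe, safe_sinks)[start] <= bound


risk = unsafe_reach_probability(chain, {"crash"}, {"goal"})
print(round(risk["s0"], 4))
print(satisfies_risk_bound(chain, {"crash"}, {"goal"}, "s0", 0.15))
```

When the check fails, a repair step would change the policy, and therefore the induced chain, until the bound holds; which changes to make is the subject of the repair approaches the abstract mentions and is not sketched here.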

Verification and repair of control policies for safe reinforcement learning / Pathak, S., Pulina, L., Tacchella, A.. - In: APPLIED INTELLIGENCE. - ISSN 0924-669X. - 48:4(2018), pp. 886-908. [10.1007/s10489-017-0999-8]


Pathak, Shashank; Pulina, Luca; Tacchella, Armando

2018-01-01


Appears in categories: 1.1 Journal article


Use this identifier to cite or link to this document: https://hdl.handle.net/11388/181732
