Unlocking Historical Insights: Developing a Dataset from Historical Archives


The proliferation of data on the Web has resulted in an increased need for effective techniques to extract relevant and valuable knowledge from this data. The intersection of the fields of Information Extraction and Semantic Web has created new opportunities to improve ontology-based information extraction tools. However, the development and evaluation of such systems have been hampered by the scarcity of annotated documents, particularly in historical domains. This article discusses the current state of our work in creating a large RDF dataset that aims to support the development of ontology-based extraction tools. The dataset was created through manual annotation by domain experts as part of the arkivo project and contains approximately 300,000 triples, which are freely available. This dataset can be used as a benchmark to evaluate systems that automatically extract entities and annotate documents.

Unlocking Historical Insights: Developing a Dataset from Historical Archives / Pandolfo, L.; Pulina, L. - 3428 (2023). (38th Italian Conference on Computational Logic, CILC 2023, Italy, 2023).


Pandolfo L.; Pulina L.

2023-01-01



Year: 2023

Publication type: 4.1 Conference proceedings contribution


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11388/317312
