The proliferation of data on the Web has resulted in an increased need for effective techniques to extract relevant and valuable knowledge from this data. The intersection of the fields of Information Extraction and Semantic Web has created new opportunities to improve ontology-based information extraction tools. However, the development and evaluation of such systems have been hampered by the scarcity of annotated documents, particularly in historical domains. This article discusses the current state of our work in creating a large RDF dataset that aims to support the development of ontology-based extraction tools. The dataset was created through manual annotation by domain experts as part of the arkivo project and contains approximately 300,000 triples, which are freely available. This dataset can be used as a benchmark to evaluate systems that automatically extract entities and annotate documents.

Unlocking Historical Insights: Developing a Dataset from Historical Archives / Pandolfo, L.; Pulina, L. - 3428:(2023). (Paper presented at the 38th Italian Conference on Computational Logic, CILC 2023, held in Italy in 2023).

Unlocking Historical Insights: Developing a Dataset from Historical Archives

Pandolfo L.; Pulina L.
2023-01-01


Use this identifier to cite or link to this document: https://hdl.handle.net/11388/317312