Constance and variability. Using PoS-grams to find phraseologies in the language of newspapers / Pinna, Antonio; Brett, David (2018), pp. 107-130. DOI: 10.1075/scl.82.05pin

Constance and variability. Using PoS-grams to find phraseologies in the language of newspapers

Pinna, Antonio; Brett, David
2018

Abstract

This paper describes the use of a corpus-driven methodology, the retrieval of part-of-speech-grams (PoS-grams), which is extremely effective for the discovery of phraseologies that might otherwise remain hidden. A PoS-gram is a string of part-of-speech categories (Stubbs 2007: 91); its tokens are strings of words annotated with these PoS tags. The list of PoS-grams retrieved from a sample corpus can be compared with that retrieved from a reference corpus, and the statistically significant items are then analysed further to identify recurrent patterns and potential phraseologies. The utility of PoS-grams is illustrated through the analysis of the Sassari Newspaper Article Corpus (SNAC), a one-million-token corpus composed of texts from ten sections of The Guardian.
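The retrieval-and-comparison procedure summarised above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the significance test, so the sketch assumes Dunning's log-likelihood (G2) with a cut-off of 3.84 (p < 0.05, one degree of freedom). The input format (a list of (word, tag) pairs), the helper names `pos_grams`, `tokens_of`, `log_likelihood` and `key_pos_grams`, and the use of total PoS-gram counts as corpus sizes are all hypothetical choices.

```python
import math
from collections import Counter

# Hypothetical input format: a tagged corpus as a list of (word, tag) pairs.
# Any tagset works; tags are treated as opaque strings.

def pos_grams(tagged, n):
    """Count PoS-gram types: each type is a tuple of n consecutive tags."""
    tags = [tag for _, tag in tagged]
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def tokens_of(tagged, gram):
    """Count the word strings (tokens) that instantiate one PoS-gram type."""
    n = len(gram)
    words = [word for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    return Counter(
        " ".join(words[i:i + n])
        for i in range(len(tags) - n + 1)
        if tuple(tags[i:i + n]) == gram
    )

def log_likelihood(a, b, c, d):
    """G2 for frequency a in a sample of size c vs b in a reference of size d."""
    # Assumed keyness statistic; the abstract does not name the test used.
    e1 = c * (a + b) / (c + d)  # expected frequency in the sample
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def key_pos_grams(sample, reference, n, threshold=3.84):
    """Return PoS-grams overused in the sample, sorted by descending G2.

    Corpus sizes are taken as the total number of n-grams in each corpus
    (an assumption); 3.84 is the critical value for p < 0.05 at 1 d.f.
    """
    s, r = pos_grams(sample, n), pos_grams(reference, n)
    c, d = sum(s.values()), sum(r.values())
    if c == 0 or d == 0:
        raise ValueError("both corpora must contain at least n tokens")
    results = []
    for gram, a in s.items():
        b = r.get(gram, 0)
        ll = log_likelihood(a, b, c, d)
        # Keep significant items that are relatively more frequent in the sample.
        if ll >= threshold and a / c > b / d:
            results.append((gram, a, b, round(ll, 2)))
    return sorted(results, key=lambda item: -item[3])
```

Given a tagged sample (such as the SNAC) and a tagged reference corpus, a call like `key_pos_grams(sample, reference, 3)` would list the trigram types overused in the sample. `tokens_of` then recovers the word strings behind each significant type, which is where the manual search for recurrent patterns and phraseologies would begin.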
ISBN: 9789027200136

Use this identifier to cite or link to this document: http://hdl.handle.net/11388/209627