Words (don't come easy): The automatic retrieval and analysis of popular song lyrics / Brett, David; Pinna, Antonio. - (2019).

Words (don't come easy): The automatic retrieval and analysis of popular song lyrics

Brett, David; Pinna, Antonio
2019-01-01

Abstract

The current work describes the compilation of the Sassari Lyrics (SLY) Corpus, a large (10 million tokens) corpus of popular song lyrics in English divided into sub-genres. The texts were gathered by web crawling the index pages of an online song repository. The work then analyzes the keywords of each sub-genre, together with the keywords shared between sub-genres, highlighting similarities and differences. The first part of the paper discusses the procedures adopted to retrieve the song lyrics, along with metadata such as date, author, album and sub-genre. The repository proved somewhat unreliable in its attribution of artists to musical sub-genres, so alternative semi-automatic processes had to be developed. Several other reliability issues are also discussed: songs in foreign languages, covers, and variation in song titles and artist names all had to be filtered out or normalized. The second part presents preliminary results of the keyword analysis. While each sub-genre (ALTERNATIVE ROCK, COUNTRY, HIP HOP, HEAVY METAL, POP, RNB and ROCK) had a considerable number of keywords, those of some sub-genres, such as HIP HOP and HEAVY METAL, were highly characteristic lexical items, whereas those of others, such as POP and RNB, were mainly grammatical items with very high frequencies. The latter two sub-genres share so many keywords that it could be argued that, at least on a textual basis, they are essentially indistinguishable.
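
The keyword comparison summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say which keyness statistic was used, so the log-likelihood measure below is an assumption, chosen because it is common in corpus linguistics; the function names and the toy token lists are hypothetical and are not taken from the SLY Corpus.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one word.

    a, b: frequency of the word in the target and reference corpus.
    c, d: total token counts of the target and reference corpus.
    """
    # Expected frequencies if the word were spread evenly over both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:  # 0 * log(0) is treated as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens, top_n=10):
    """Return the top_n words over-represented in the target corpus."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c = sum(target.values())
    d = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference[word]  # Counter returns 0 for unseen words
        # Keep only words relatively more frequent in the target corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data only: a lexical item unique to one sub-genre scores as a keyword,
# while a grammatical item equally frequent in both does not.
metal = ["blood"] * 5 + ["the"] * 5
pop = ["love"] * 5 + ["the"] * 5
print(keywords(metal, pop))
```

In this sketch each sub-genre would be compared against a reference made up of the remaining sub-genres; two sub-genres whose keyword lists overlap heavily, as reported for POP and RNB, would be hard to tell apart on this basis.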
Year: 2019
ISBN: 9789004390652

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11388/219183