The current work describes the compilation of a large (10-million-token) corpus of English-language popular song lyrics divided into sub-genres: the Sassari Lyrics (SLY) Corpus. The texts were gathered by web crawling the index pages of an online song repository. The work then analyzes the keywords of each sub-genre, as well as the keywords shared across sub-genres, highlighting similarities and differences between them. The first part of this paper discusses the procedures adopted to retrieve the song lyrics, along with metadata such as date, author, album and sub-genre. Because the repository proved somewhat unreliable in attributing artists to musical sub-genres, alternative semi-automatic procedures had to be developed. Several other reliability issues are also discussed: songs in foreign languages, covers, and variation in song titles and artist names all had to be filtered out or normalized. The second part presents preliminary results of the keyword analysis. Each sub-genre (ALTERNATIVE ROCK, COUNTRY, HIP HOP, HEAVY METAL, POP, RNB and ROCK) yielded a considerable number of keywords. However, the keywords of some sub-genres, such as HIP HOP and HEAVY METAL, were highly characteristic lexical items, whereas those of others, such as POP and RNB, were mainly grammatical items with very high frequencies. The latter two sub-genres share so many keywords that it could be argued that, at least on a textual basis, they are essentially indistinguishable.
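The abstract does not state which keyness statistic or reference corpus was used. The following is therefore only a minimal sketch under stated assumptions. It uses Dunning's log-likelihood (G2), a common choice in corpus linguistics, and compares each sub-genre against the pooled remaining sub-genres. The cut-off of 15.13 (p < 0.0001, 1 d.f.) is also an assumption. It then measures keyword overlap between two sub-genres with a Jaccard index. The function names `log_likelihood`, `keywords`, `genre_keywords` and `jaccard` are hypothetical illustrations, not the author's code.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's G2 for a word with frequency a in a target corpus of c tokens
    and frequency b in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target, reference, threshold=15.13, min_freq=1):
    """Return {word: G2} for words over-represented in target (assumed cut-off:
    15.13, i.e. p < 0.0001), sorted by descending G2."""
    c = sum(target.values())
    d = sum(reference.values())
    result = {}
    for word, a in target.items():
        if a < min_freq:
            continue
        b = reference.get(word, 0)
        if a / c <= b / d:  # keep positive keywords only
            continue
        g2 = log_likelihood(a, b, c, d)
        if g2 >= threshold:
            result[word] = g2
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))


def genre_keywords(genre_counts, threshold=15.13):
    """Assumed design: compare each sub-genre with all other sub-genres pooled."""
    result = {}
    for genre, counts in genre_counts.items():
        reference = Counter()
        for other, other_counts in genre_counts.items():
            if other != genre:
                reference.update(other_counts)
        result[genre] = set(keywords(counts, reference, threshold))
    return result


def jaccard(s1, s2):
    """Share of keywords two sub-genres have in common (0.0 if both are empty)."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0
```

With toy frequency counts, two sub-genres that over-use the same word relative to the rest of the corpus end up with identical keyword sets (Jaccard 1.0). This illustrates the kind of overlap the abstract reports for POP and RNB, although the numbers here are invented and are not results from the SLY Corpus.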
|Title:||Words (don't come easy): The automatic retrieval and analysis of popular song lyrics|
|Author:||BRETT, David Finbar (Corresponding)|
|Publication date:||2019|
|Publication type:||2.1 Contribution to a volume (Chapter or Essay)|