
hitsongprediction's Introduction

Hit Song Prediction System

A Python project for hit song prediction: estimating the popularity of a song from its audio and lyrics.

Dataset

I trained and validated the proposed system starting from the SpotGenTrack dataset, which contains:

  • Spotify data for 101940 tracks, including both songs and podcasts (spotify_track.csv, spotify_album.csv, spotify_artists.csv)
  • Low-level features computed on each song
  • Lyrics features computed on each song
  • A preview URL for each track

A cleaning process removed improperly annotated songs, duplicates and tracks that contain podcasts. The lyrics were also updated through the MusixMatch API, since the title and lyrics of a song sometimes did not match.

I ended up with two datasets, both available in the Datasets directory: one with English songs only and one with multilingual songs. Each dataset stores the spotify_ID, the musixmatch_ID, the release year, the lyrics and the popularity score assigned by Spotify in 2019. Starting from the Spotify ID, the URL to download the audio file can be found in spotify_track.csv.


Model

The model consists of three main components:

  • Audio embedding extractor
  • Text embedding extractor
  • Final multi-layer perceptron that predicts song popularity

[Figure: my_model]

Repository Structure

  • datasets: contains the .csv / .parquet files for the English and multilingual songs. Additional song information (title, artist, etc.) is stored in the /nas/home/ecastelli/Data Sources folder, split into three files: spotify_album.csv, spotify_artists.csv and spotify_tracks.csv. The files can be joined using the spotify_id and track_id columns as keys.
  • models: contains the models used in this project.
    • podcast_discriminator contains a .ipynb notebook that implements a model for distinguishing audio that contains podcasts from audio that contains music
    • genre_classificator is a model pre-trained on GTZAN Genre that is used as the audio feature extractor
    • hsp_model is the final multi-layer perceptron used to predict song popularity
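
The join mentioned for the datasets folder can be sketched as follows. This is a minimal illustration, not the project's own code: it assumes that the dataset rows carry the spotify_id column and that spotify_tracks.csv carries track_id. The sample rows and the name and preview_url columns are made up for the example. It uses only the standard library; with pandas the same inner join would normally be a single merge call.

```python
import csv
import io

# Tiny in-memory stand-ins for the real files; every value here is made up.
DATASET_CSV = """spotify_id,release_year,popularity
a1,2015,63
b2,2018,40
"""

# Hypothetical columns: only track_id is named in the README.
TRACKS_CSV = """track_id,name,preview_url
a1,Song A,https://example.com/a1.mp3
c3,Song C,https://example.com/c3.mp3
"""


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_id(dataset_rows, track_rows):
    """Inner join on dataset.spotify_id == tracks.track_id."""
    tracks_by_id = {row["track_id"]: row for row in track_rows}
    joined = []
    for row in dataset_rows:
        track = tracks_by_id.get(row["spotify_id"])
        if track is not None:
            # Add the track metadata to the matching dataset row.
            joined.append({**row, **track})
    return joined


joined = join_on_id(read_rows(DATASET_CSV), read_rows(TRACKS_CSV))
for row in joined:
    print(row["spotify_id"], row["name"], row["popularity"])
```

Rows without a match on either side are dropped, which is the usual choice when the extra metadata is required downstream.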

Train

Training is launched from the train.py file, which takes three parameters:

  • Problem to solve: classification (c) or regression (r)
  • Language to consider: english (en) or multilingual (mul)
  • Number of popularity classes to consider (classification only)
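
To make the three parameters concrete, here is a hedged sketch of how such a command line could be parsed and how a popularity score could be turned into a class label. The flag names (--problem, --language, --num-classes) and the equal-width binning are assumptions made for illustration; the actual options are the ones printed by python train.py --help. The only outside fact relied on is that Spotify popularity scores range from 0 to 100.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical flag names (not taken from train.py)."""
    parser = argparse.ArgumentParser(description="Hit song prediction training (sketch)")
    parser.add_argument("--problem", choices=["c", "r"], required=True,
                        help="c = classification, r = regression")
    parser.add_argument("--language", choices=["en", "mul"], required=True,
                        help="en = English songs, mul = multilingual songs")
    parser.add_argument("--num-classes", type=int, default=None,
                        help="number of popularity classes (classification only)")
    return parser


def parse_args(argv=None):
    """Parse arguments and require a class count for classification."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.problem == "c" and (args.num_classes is None or args.num_classes < 2):
        parser.error("--num-classes must be at least 2 for classification")
    return args


def popularity_to_class(score, num_classes):
    """Map a 0-100 popularity score to one of num_classes equal-width bins.

    Equal-width binning is an assumption; the project may bin differently.
    """
    if not 0 <= score <= 100:
        raise ValueError("popularity score must be between 0 and 100")
    # A score of exactly 100 would fall past the last bin, so clamp it.
    return min(int(score * num_classes / 100), num_classes - 1)


# Explicit argument list so the sketch runs without a real command line.
args = parse_args(["--problem", "c", "--language", "en", "--num-classes", "3"])
labels = [popularity_to_class(s, args.num_classes) for s in (0, 33, 34, 100)]
print(args.problem, args.language, args.num_classes, labels)
```

For regression the class count is simply ignored and the raw popularity score serves as the target.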

To start the training process:

  • Install the dependencies listed in requirements.txt in your virtual environment
  • Make sure you can connect to the ISPL servers, which host the audio files (/nas/home/ecastelli/thesis/Audio) and the checkpoint of the pre-trained model ("/nas/home/ecastelli/thesis/models/Model/checkpoint/pretraining.ckpt")
  • Change the NeptuneLogger parameters if you want to see the logs on Neptune, or replace it with a logger of your choice
  • Change the device type used by the trainer in GTZANPreTrained.py and HSPModel.py to match your GPU
  • From the main folder, run python train.py --help and follow the instructions
