In the age of streaming services, navigating vast music catalogs can be daunting. This project classifies songs as either 'Hip-Hop' or 'Rock' using track data from The Echo Nest. The objective is to apply data preprocessing, exploratory data analysis, and machine learning algorithms to classify genre accurately.
The project begins by loading track metadata and audio metrics from CSV and JSON files provided by The Echo Nest. Each file is read into a pandas DataFrame, and the DataFrames are then merged into a single table for analysis.
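As a minimal sketch of the loading and merging step, the snippet below reads a CSV and a JSON source into DataFrames and joins them. The column names (`track_id`, `genre_top`, `tempo`, `energy`) and the in-memory sample data are assumptions for illustration; the project's actual files and schema may differ.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for the real files (contents are hypothetical).
tracks_csv = io.StringIO(
    "track_id,genre_top\n"
    "1,Rock\n"
    "2,Hip-Hop\n"
    "3,Rock\n"
)
metrics_json = io.StringIO(
    '[{"track_id": 1, "tempo": 120.5, "energy": 0.81},'
    ' {"track_id": 2, "tempo": 95.0, "energy": 0.64}]'
)

# In practice, pass file paths to read_csv / read_json instead of buffers.
tracks = pd.read_csv(tracks_csv)
metrics = pd.read_json(metrics_json)

# Inner join: keep only tracks that have both a genre label and audio metrics.
merged = metrics.merge(tracks[["track_id", "genre_top"]], on="track_id", how="inner")
print(merged)
```

An inner join is used here so that tracks missing either a label or metrics are dropped; a different `how` value may suit other data.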
- Pairwise Relationships: Analyzing correlations between continuous variables to avoid feature redundancy.
- Data Splitting: Segmenting data into features and labels to prepare for model training.
- Feature Normalization: Standardizing the feature data, then applying Principal Component Analysis (PCA) for dimensionality reduction. Standardization comes first because PCA is sensitive to feature scale.
- Decision Tree Classifier: Training a decision tree algorithm for genre classification, focusing on interpretability.
- Logistic Regression: Implementing logistic regression as an alternative model and comparing performance metrics.
- Handling Imbalance: Addressing the class imbalance issue between 'Hip-Hop' and 'Rock' genres.
- Cross-Validation: Utilizing K-fold cross-validation to rigorously evaluate model performance across different data subsets.
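The steps listed above can be sketched end to end with scikit-learn. This is an illustration under stated assumptions, not the project's actual code: the data is synthetic (generated with `make_classification`), the imbalance is handled by downsampling the majority class (one common option among several), and the number of PCA components (6) and folds (10) are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for features (X) and labels (y); class 0 is the minority.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5,
    weights=[0.2, 0.8], random_state=10,
)

# Pairwise relationships: feature-by-feature correlation matrix (8 x 8).
corr = np.corrcoef(X, rowvar=False)

# Handling imbalance (assumed approach): downsample the majority class.
rng = np.random.default_rng(10)
minority_idx = np.flatnonzero(y == 0)
majority_idx = rng.choice(
    np.flatnonzero(y == 1), size=minority_idx.size, replace=False
)
balanced_idx = np.concatenate([minority_idx, majority_idx])
X_bal, y_bal = X[balanced_idx], y[balanced_idx]


def build_pipeline(model):
    """Standardize, reduce dimensionality with PCA, then fit the model."""
    return Pipeline([
        ("scaler", StandardScaler()),
        ("pca", PCA(n_components=6)),
        ("model", model),
    ])


models = {
    "tree": DecisionTreeClassifier(random_state=10),
    "logreg": LogisticRegression(random_state=10),
}

# K-fold cross-validation: each fold refits the scaler and PCA on its
# training portion only, so no information leaks from the held-out fold.
kf = KFold(n_splits=10, shuffle=True, random_state=10)
scores = {
    name: cross_val_score(build_pipeline(model), X_bal, y_bal, cv=kf)
    for name, model in models.items()
}

for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy = {fold_scores.mean():.3f}")
```

Wrapping the scaler and PCA in a `Pipeline` keeps preprocessing inside each cross-validation fold, which makes the comparison between the decision tree and logistic regression fairer than fitting the transforms once on the full dataset.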
This project demonstrates a systematic approach to music genre classification, from data preprocessing through model evaluation. The methods used here can serve as a starting point for further work in audio data analysis and machine learning.