Spotify Track Popularity Analysis
Analysis of ~170K Spotify tracks to identify which audio features predict popularity. It uses clustering to discover track archetypes, classification to predict popularity tier, and regression to estimate raw popularity scores.
What’s inside
- Clustering — KMeans and DBSCAN on audio features (danceability, energy, valence, tempo, etc.) to find natural track groupings
- Classification — RandomForest, GradientBoosting, LogisticRegression to predict high/mid/low popularity tier
- Regression — Ridge, RandomForestRegressor, GradientBoostingRegressor to estimate popularity score (0–100)
- EDA — Correlation analysis, PCA, class distributions, feature importance
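As a rough illustration of how the three modeling pieces above fit together, here is a minimal scikit-learn sketch. It is not the notebook's actual code: the data is synthetic, the tier cut points (33 and 66), the cluster count (4), and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption: not the project's data).
rng = np.random.default_rng(0)
feature_names = ["danceability", "energy", "valence", "tempo"]
n_tracks = 600
X = rng.random((n_tracks, len(feature_names)))
X[:, 3] = 60 + 140 * X[:, 3]  # put tempo on a BPM-like scale

# Hypothetical popularity score in 0-100, loosely tied to two features.
popularity = np.clip(
    100 * (0.5 * X[:, 0] + 0.3 * X[:, 1]) + 10 + rng.normal(0, 10, n_tracks),
    0,
    100,
)

# Assumed tier cut points; the notebook may define tiers differently.
tier = np.digitize(popularity, bins=[33, 66])  # 0 = low, 1 = mid, 2 = high

# Clustering: scale first so tempo does not dominate the distance metric.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)  # k=4 is arbitrary
cluster_labels = kmeans.fit_predict(X_scaled)

# Classification and regression share one train/test split.
X_train, X_test, tier_train, tier_test, pop_train, pop_test = train_test_split(
    X, tier, popularity, test_size=0.25, random_state=0
)

# Classification: predict the popularity tier.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, tier_train)
tier_accuracy = clf.score(X_test, tier_test)  # mean accuracy on held-out tracks

# Regression: estimate the raw score; scaling lives inside the pipeline
# so it is fitted on the training split only.
reg = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
reg.fit(X_train, pop_train)
pop_pred = reg.predict(X_test)
r2 = reg.score(X_test, pop_test)  # coefficient of determination

print(f"clusters found: {len(np.unique(cluster_labels))}")
print(f"tier accuracy: {tier_accuracy:.2f}, regression R^2: {r2:.2f}")
```

Tree-based models such as RandomForest do not need feature scaling, which is why only the clustering step and the Ridge pipeline standardize the inputs in this sketch.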
Stack
Python · polars · scikit-learn · altair · marimo · uv
Run
uv sync
uv run marimo edit notebook.py --watch