Spotify Track Popularity Analysis


Analysis of ~170K Spotify tracks to understand which audio features predict popularity. The notebook uses clustering to discover track archetypes, classification to predict popularity tier (high/mid/low), and regression to estimate the raw popularity score (0–100).
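To make the difference between a popularity tier and a raw score concrete, here is a minimal sketch of how a 0–100 score could be bucketed into three tiers. The cut points (34 and 67) and the names `TIER_EDGES`, `TIER_NAMES`, and `popularity_tier` are assumptions for illustration; the thresholds the notebook actually uses are not stated here.

```python
import numpy as np

# Hypothetical cut points: low = 0-33, mid = 34-66, high = 67-100.
# These are illustrative only, not the project's confirmed thresholds.
TIER_EDGES = [34, 67]
TIER_NAMES = np.array(["low", "mid", "high"])


def popularity_tier(scores):
    """Map 0-100 popularity scores to tier labels."""
    scores = np.asarray(scores)
    # np.digitize returns, for each score, the index of the bin it falls into:
    # 0 for score < 34, 1 for 34 <= score < 67, 2 for score >= 67.
    return TIER_NAMES[np.digitize(scores, TIER_EDGES)]


tiers = popularity_tier([0, 33, 34, 66, 67, 100])
```

The classifier then predicts the tier label, while the regressor targets the untouched score.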

What’s inside

  • Clustering — KMeans and DBSCAN on audio features (danceability, energy, valence, tempo, etc.) to find natural track groupings
  • Classification — RandomForest, GradientBoosting, LogisticRegression to predict high/mid/low popularity tier
  • Regression — Ridge, RandomForestRegressor, GradientBoostingRegressor to estimate popularity score (0–100)
  • EDA — Correlation analysis, PCA, class distributions, feature importance
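The three modelling steps listed above can be sketched roughly as follows. This is not the notebook's code: it runs on synthetic data with a made-up popularity signal, uses only four of the audio features, and picks one model per task (KMeans, RandomForestClassifier, Ridge). The cluster count, tier cut points, and all hyperparameters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the audio features (NOT real Spotify data).
feature_names = ["danceability", "energy", "valence", "tempo"]
n = 600
X = np.column_stack([
    rng.uniform(0, 1, n),     # danceability
    rng.uniform(0, 1, n),     # energy
    rng.uniform(0, 1, n),     # valence
    rng.uniform(60, 200, n),  # tempo in BPM
])

# Made-up target so the models have something to learn (an assumption).
popularity = np.clip(
    100 * (0.6 * X[:, 0] + 0.4 * X[:, 1]) + rng.normal(0, 5, n), 0, 100
)
tier = np.digitize(popularity, [34, 67])  # 0=low, 1=mid, 2=high (assumed cuts)

# Clustering: scale first so tempo's larger range does not dominate distances.
clusterer = make_pipeline(
    StandardScaler(), KMeans(n_clusters=4, n_init=10, random_state=0)
)
cluster_labels = clusterer.fit_predict(X)

# One shared train/test split for both supervised tasks.
X_train, X_test, tier_train, tier_test, pop_train, pop_test = train_test_split(
    X, tier, popularity, test_size=0.25, random_state=0, stratify=tier
)

# Classification: predict the popularity tier.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, tier_train)
accuracy = clf.score(X_test, tier_test)
importances = dict(zip(feature_names, clf.feature_importances_))

# Regression: estimate the raw 0-100 score.
reg = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
reg.fit(X_train, pop_train)
r2 = reg.score(X_test, pop_test)
```

Scaling sits inside the pipelines so that it is fitted on training data only; tree-based models such as the random forest do not need it, which is why the classifier is fitted on the raw features.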

Stack

Python · polars · scikit-learn · altair · marimo · uv

Run

uv sync
uv run marimo edit notebook.py --watch