Personalized Music Recommendation System

"A recommendation system using NMF"
05 January 2024

In this post, I will be documenting my journey in creating a recommendation system from scratch using my music that I have collected from Spotify. This post will be divided into two parts:

  1. Data Collection
  2. Model Training
  3. Evaluation
  4. Conclusion

Before we go further, I think it's important to know what is NMF, how NMF works, and why I chose this algorithm to create a recommendation system.

What is Non-negative Matrix Factorization (NMF)?

Non-negative Matrix Factorization (NMF) is a matrix factorization method that factorizes a non-negative matrix into two non-negative matrices. NMF equation can be expressed as follows:

Vn×mWn×p×Hp×m\underset{n \times m}{V} \approx \underset{n \times p}{W} \times \underset{p \times m}{H}

where:

  1. VV is a matrix containing the atrributes of the songs we collected from Spotify API.
  2. WW is a matrix containing the latent features of the songs.
  3. HH is a matrix containing the coefficients of the latent features.
  4. nn is the number of songs.
  5. mm is the number of features.
  6. pp is the number of latent features.

The goal of NMF is to find the latent features and the coefficients of the latent features that can reconstruct the original matrix. Note that the latent topics in pp are not directly observable in the data but are inferred from the relationships between songs and their attributes.

How do we determine if a song is similar to other songs? First, we are going to feed matrix Vn×mV_{n \times m} into the NMF algorithm from scikit-learn with pp latent features that produces Wn×pW_{n \times p}. Assuming that we have a song with pp features, let's call it S=[s1,...,sp]S = [s_1, ..., s_p]. Once we have matrix WW, we can multiplicate it with STS^T. Thus, W×STW \times S^T will give us a vector RR containing values representing how similar the song is to other songs, ranging from 0 to 1. 00 represents that the song is not similar to a particular song. Conversely, 11 represents that the song is very similar to a particular song.

Wn×p×STp×1=Rn×1[w11w1pwn1wnp]n×p×[s1sp]p×1=[r1rn]n×1\begin{aligned} \underset{n \times p}{W} \times \underset{p \times 1}{S^T} &= \underset{n \times 1}{R} \\ \begin{bmatrix}w_{11} & \cdots & w_{1p} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{np}\end{bmatrix}_{n \times p} \times \begin{bmatrix}s_1 \\ \vdots \\ s_p\end{bmatrix}_{p \times 1} &= \begin{bmatrix}r_1 \\ \vdots \\ r_n\end{bmatrix}_{n \times 1} \end{aligned}

Now we know what NMF is and how it works in mathematical perspective. Here are the reasons why I chose NMF as the algorithm behind my music recommendation system:

  1. NMF can be scaled to large datasets, which is essential for music services that have vast amounts of tracks and user data.
  2. NMF models can be updated incrementally with new data (new users or new songs), which is crucial for dynamic systems like music recommendation services where the database is constantly growing
  3. Just like other matrix factorization techniques, NMF reduces the dimensionality of the feature space. This is beneficial because it helps in handling the curse of dimensionality and noise in the data. In the reduced space, similar songs are closer to each other, making it easier to find and recommend similar items.

Data Collection

My Spotify homepage
My Spotify homepage

Since I am a Spotify subscriber, I use their public APIs to collect approximately 2000 songs both from my own playlists and other playlists that I have never listened to. If you're interested on how I collected the data, you check the script here on GitHub.

There are endpoints that I used to collect the metadata of songs in my own playlist and also songs in the playlists that I never listened to:

  1. https://accounts.spotify.com/authorize to get the access token
  2. https://accounts.spotify.com/api/token to get the refresh token
  3. https://api.spotify.com/v1/me/playlists to get all of my playlists
  4. https://api.spotify.com/v1/browse/categories/{category_id}/playlists to get all of the playlists in a category
  5. https://api.spotify.com/v1/playlists/{playlist['id']}/tracks to get all of the tracks in a playlist
  6. https://api.spotify.com/v1/tracks?ids={track_ids} to get the tracks' metadata in a bulk
  7. https://api.spotify.com/v1/audio-features?ids={track_ids} to get the tracks' audio features in a bulk

While gathering the data, make sure that you send a request and get the response in a bulk. This way, you can reduce the number of requests that you send to the API. Spotify API has a rate limit of 10 requests per second, so you have to be careful when sending requests to the API.

Here is one of the songs's metadata that the API returns:

{
  "id": "2i2gDpKKWjvnRTOZRhaPh2",
  "title": "Moonlight",
  "artist(s)": "Kali Uchis",
  "popularity": 88,
  "danceability": 0.639,
  "energy": 0.723,
  "key": 7,
  "loudness": -6.462,
  "mode": 0,
  "speechiness": 0.0532,
  "acousticness": 0.511,
  "instrumentalness": 0.0,
  "liveness": 0.167,
  "valence": 0.878,
  "tempo": 136.872,
  "type": "audio_features",
  "uri": "spotify:track:2i2gDpKKWjvnRTOZRhaPh2",
  "track_href": "https://api.spotify.com/v1/tracks/2i2gDpKKWjvnRTOZRhaPh2",
  "analysis_url": "https://api.spotify.com/v1/audio-analysis/2i2gDpKKWjvnRTOZRhaPh2",
  "duration_ms": 187558,
  "time_signature": 4
}

To understand the meaning of each feature, you can read the documentation here. While writing this post halfway, I stumbled upon a library called Spotipy that can be used to interact with Spotify API. I could have used this library to collect the data, but I decided to stick with my own script.

Data Exploration

My playlist heatmap
My playlist heatmap

Looking at the heatmap above, we can see danceability, energy, and loudness are highly correlated. Let's see how the distribution of each feature looks like.

Distribution of each feature
Distribution of each feature

From the distribution above, we can see that most of the songs in my playlists are popular, danceable, and loud. I assume that's the reason why the heatmap above shows that those three features are highly correlated.

Scatter plots of danceability against other features
Scatter plots of danceability against other features

Things we can notice from the scatter plots above:

  1. Most danceable songs are popular.
  2. Danceability is positively correlated with energy and loudness.
  3. Danceable songs are low in speechiness.
  4. Danceability is negative correlated with acousticness and instrumentalness.
  5. Danceable songs have quite high tempo.

Scatter plots of popularity against other features
Scatter plots of popularity against other features

The observations from the scatter plots above:

  1. Popular songs are danceable
  2. Popular songs are energetic and loud.
  3. Popular songs are low in acousticness, instrumentalness, and speechiness.
  4. Popular songs are high in valence, meaning that they are happy and cheerful.
  5. Popular songs have quite high tempo.

Now let's see how the data values look like real close. Here is the first 5 rows of the dataset that I converted to a Pandas dataframe:

popularitydanceabilityenergykeyloudnessmodespeechinessacousticnessinstrumentalnesslivenessvalencetempoduration_mstime_signature
0880.6390.7237-6.46200.05320.5110.00000.1670.878136.8721875584
1830.4720.5188-7.37910.05100.3830.12700.2890.154147.8052116674
2680.8480.36411-10.05810.06370.6970.00530.1400.569137.5412141604
3610.3610.02011-25.06410.05550.9250.00220.1140.35176.6211233204
4820.4400.3178-9.25810.05310.8910.00000.1410.268169.9142334563

Let's see how much each feature contributes to the cumulative variance of the dataset. First we create two lists, one to store the reconstructed errors and the other to store the number of dimensions, starting from 2 to the number of columns in the dataset.

reconstruction_errors = []
dimensions = range(2, df.columns.size + 1)

After that, we create a pipeline that will scale the data, perform NMF, and normalize the data. Then we fit the pipeline to the dataset and append the reconstruction error to the list so that we can plot it.

# create an elbow plot to determine the optimal number of dimensions
for dimension in dimensions:
  pipeline = make_pipeline(
    MinMaxScaler(),
    NMF(
      n_components=dimension,
      max_iter=10000,
    ),
    Normalizer()
  )
  pipeline.fit(df)
  reconstruction_errors.append(pipeline.named_steps['nmf'].reconstruction_err_)

Reconstruction error graph using the Elbow method
Reconstruction error graph using the Elbow method

The general rule of thumb is to choose the number of components where the elbow starts to flatten out. However, the reconstruction error above has a step decline and only flattens out at 14 dimensions. Now, two questions emerged:

  1. Should I use all of the features to create a better recommendation system? That's where the reconstruction error flattens out.
  2. Should I just use half the number of the features to create a so-so recommendation system?

Looking at the graph, I was tempted to use all of the features to create the "perfect" recommendation system. However, there is a catch. If I use all of the features, meaning that the recommendation system will be so good that it will recommend me songs that I have already listened to.

Therefore, if I want to create a recommendation system that will recommend me songs that I have never listened to, I have to use half the number of the features to allow some mistakes to happen. Those mistakes will expose me to new songs or new genres that I have never listened to.

Model Training

This time, let's use 7 dimensions to create a recommendation system, and get the names of those 7 features.

pipeline = make_pipeline(
  MinMaxScaler(),
  NMF(
    n_components=dimension,
    max_iter=10000,
  ),
  Normalizer()
)
transformed_songs = pipeline.fit(df)
 
components = pipeline.named_steps['nmf'].components_
 
categories = pd.DataFrame(components, columns=df.columns.values)

There are two variables that we need to pay attention to, which are transformed_songs and categories. First, let's look how categories looks like:

popularitydanceabilityenergykeyloudnessmodespeechinessacousticnessinstrumentalnesslivenessvalencetempoduration_mstime_signature
00.5349030.2984420.2702660.0000000.4086490.0000000.0000000.0000000.0000000.0189990.0000000.2295390.2178000.248214
10.0000000.1067250.0686260.0000000.1898211.6379130.0000000.0000000.0000000.0000000.0000000.1046840.0978820.090924
20.0014530.0000000.0939440.0233660.0000000.0000000.0000000.0435221.2492950.0000000.0000000.2463920.0000000.000000
30.0000000.1043550.0068651.3355830.0231230.0000000.0000000.0000000.0000000.0000000.0000000.0098090.0000000.000000
40.0000000.5095180.3885430.0000000.3580000.0000001.4722850.0000000.0000000.1873310.0000000.6568880.6229940.116734
50.3379890.7888800.9243210.0000000.8217950.0000000.0000000.0000000.0000000.4758441.2647220.0618460.0000000.535879
60.6549700.1165400.0000000.0000000.2747940.0000000.0000001.4016930.0000000.1360650.2322260.3423290.3163280.246019

What I am gonna do is, for each row, I am gonna get the name of the feature that has the highest value. For example, for the first row, the feature that has the highest value is popularity. Thus, the first category is popularity. The same applies to the other rows.

The discovered categories are:

  1. Popularity
  2. Mode
  3. Instrumentalness
  4. Key
  5. Speechiness
  6. Valence
  7. Acousticness

Evaluation

If we print out the first element of transformed_songs, we will get something like this:

array([0.68781855, 0.        , 0.02267717, 0.42057155, 0.03739232,
       0.46179948, 0.36722474])

Let's present those values in a dataframe:

column_names = ["popularity", "mode", "instrumentalness", "key", "speechiness", "valence", "acousticness"]
 
transformed_songs_df = pd.DataFrame(
  transformed_songs,
  index=no_dup_songs['title'],
  columns=column_names
)

transformed_songs_df would look nicer like this:

titlepopularitymodeinstrumentalnesskeyspeechinessvalenceacousticness
Moonlight0.6878190.0000000.0226770.4205720.0373920.4617990.367225
Die For You0.7956590.4198120.0845640.3670310.0349010.0630330.208879
Traingazing0.4219270.4828560.0072710.5987610.0725040.2414630.408288
See You Later0.0000000.5025420.0000000.6338090.0000000.0615990.584759
Glimpse of Us0.5445960.4825100.0300810.4287210.0711490.0015090.529932

We can make it even nicer by categorizing each song based on the most dominant feature. First we gotta get the index of the dominant category.

dominant_categories = [np.argmax(processed_songs.iloc[i]) for i in range(len(processed_songs))]

Then we create a new dataframe using the dominant_categories and get the name of the category using the index.

categorized_songs = pd.DataFrame(
  [categories.iloc[i].sort_values(ascending=False).index.values[0] for i in dominant_categories],
  index=no_dup_songs.title,
  columns=["categories"]
)

Here is the first 20 rows of the categorized songs:

titlecategories
Moonlightpopularity
Die For Youpopularity
Traingazingkey
See You Laterkey
Glimpse of Uspopularity
La La Lost You - Acoustic Versionacousticness
bluekey
Surfpopularity
Good Newspopularity
Jocelyn Florespopularity
Come Back to Earthacousticness
Habits (Stay High)popularity
Best Part (feat. H.E.R.)popularity
Here With Mepopularity
Until I Found You - Piano Versionacousticness
i love youacousticness
Idea 10acousticness
Idea 1instrumentalness
Solasinstrumentalness
Idea 22instrumentalness

Just by looking at the first 20 rows, I can say that the model does a pretty good job at categorizing the songs. Counting the number of songs in each category, I noticed most songs I like and listen to are popular songs.

categories
popularity          113
mode                 33
instrumentalness     23
key                  17
acousticness         14
valence               7
speechiness           1

Take a music called "Idea 1" for example. It is an piano instrumental music that I often listen to. Let's see what the model would recommend me based on that song.

wanted_song = processed_songs.loc['Idea 1']
recommended_songs = processed_songs.dot(wanted_song)

Looking at the recommended songs below, I think the model did an impressive job since most of the songs are instrumental songs that in the same genre as "Idea 1".

titlescore
Cornfield Chase0.988377
Cornfield Chase - Piano-Cello Version0.985630
Cornfield Chase (Slowed + Reverb)0.985341
Idea 22 (Slowed + Reverb)0.982635
Interstellar- Main Theme0.979361
Solas0.966634
Day One (Interstellar Theme)0.961549
dream river.0.960631
Idea 22 (Sped Up)0.956474

To test the model, let's use the music that I have never listened to before.

titleartist(s)category
As It WasHarry Stylespop
WHERE SHE GOESBad Bunnypop
See The Light (feat. Fridayy)Swedish House Mafiapop
METAMORPHOSISINTERWORLDdance/electronic
FlareHensonndance/electronic
Over YouDillistonedance/electronic
SontheeLARdance/electronic
Hiraeth (feat. Kim Van Loo)Fejkádance/electronic
IndulgenceNora En Puredance/electronic
All Night LongMarshdance/electronic
Just OverYottodance/electronic
VoodooTinlickerdance/electronic
HopefulODESZAdance/electronic
SaltaSultan + Sheparddance/electronic
Believer - Marsh's Guatape RemixAbove & Beyond, Marshdance/electronic
BreathingBen Böhmerdance/electronic
Tell Me Why - MEDUZA RemixSupermode, MEDUZAdance/electronic
Lost In The RhythmDavid Guetta, MORTENdance
ExhalePatrick Bradleyjazz
As It WasThe Piano Guysclassical
Adagio in G Minor (Arr. for Harp and Orchestra)Xavier De Maistreclassical

Just by listening to these twenty songs, I notice that mose of them have a lot of instrumental elements, pretty much similar to "Idea 1".

Conclusion

In this project, I have successfully created a decent recommendation system that can recommend me songs using Non-negative Matrix Factorization (NMF). Other than this technique, there are other algorithms that can be used to create a recommendation system, such as Singular Value Decomposition (SVD), Alternating Least Squares (ALS), and Collaborative Filtering (CF). However, I chose NMF because it was the algorithm that I have never heard about until I was reading my old notes from my Linear Algebra class.

Am I satisfied with the result? Yes, I am, and it really help me discovered new songs that I have no idea about, and a lot of them ended up in my own playlists.

References

  1. D. Lee, Daniel. Seung, H. Sebastian. (1999). Learning the parts of objects by non-negative matrix factorization. Nature. 401. 788-791. https://www.nature.com/articles/44565
  2. Wikipedia. Non-negative matrix factorization. https://en.wikipedia.org/wiki/Non-negative_matrix_factorization