I have spent the last six weeks writing a music recommendation engine, theperceptron.com. It was fun.
From the user’s perspective:
- Enter a band you like.
- Get recommendations for other bands you might also like.
- Test out the artists recommended by visiting their Myspaces and websites, reading their Wikipedia summaries and listening to sample tracks.
- Say whether you like or dislike your recommended bands.
- Add promising bands to your playlist so you can listen later.
- Suggest an artist or two that the site didn’t recommend.
- Get on with your life.
From the code’s perspective:
Recommendations are made based on connections between artists. These connections are found in data taken from the internet:
- Recommendations made by actual humans: tinymixtapes.com, epitonic.com and users of the perceptron.
- Artist admiration: artists’ top friends on Myspace.
- Artists on the same mixtape: muxtape.com
- Artists on the same record label: wikipedia.org
- Artists posted to the same mp3 blog: hypem.com
- Artists who have played gigs together: myspace.com
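One way to picture these connections is as a graph whose nodes are artists and whose edges remember which sources reported them. Here is a minimal Python sketch under a few assumptions: the class name `ConnectionGraph`, the short source labels and the band names are all hypothetical, and edges are treated as undirected even though some sources (Myspace top friends, for instance) are really one-way.

```python
from collections import defaultdict


class ConnectionGraph:
    """Hypothetical artist graph: each edge records which sources linked the two artists.

    Assumption: connections are stored in both directions (undirected).
    """

    def __init__(self):
        # artist -> neighbouring artist -> set of source labels reporting that link
        self._edges = defaultdict(lambda: defaultdict(set))

    def add_connection(self, a, b, source):
        """Record that `source` connects artists `a` and `b`."""
        if a == b:
            return  # ignore self-links
        self._edges[a][b].add(source)
        self._edges[b][a].add(source)

    def candidates(self, artist):
        """Return {connected artist: set of sources} for everything linked to `artist`."""
        # .get avoids creating an empty entry for unknown artists
        return {n: set(srcs) for n, srcs in self._edges.get(artist, {}).items()}


# Usage with made-up band names and source labels.
g = ConnectionGraph()
g.add_connection("Band A", "Band B", "hypem")
g.add_connection("Band A", "Band B", "muxtape")
g.add_connection("Band A", "Band C", "wikipedia_label")
print(sorted(g.candidates("Band A")))  # ['Band B', 'Band C']
```

Keeping the set of sources on each edge means a single artist pair can be credited separately to every source that suggested it, which the point totals described next rely on.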
Each rating action that a user can perform on a recommended artist (liking or disliking it, visiting its websites, listening to its songs or adding them to the playlist) is associated with a certain number of points. These points are used in two ways. First, each source keeps a running total of the points given to the recommendations it made. Second, each artist connection keeps a running total of the points it has accrued.
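A minimal Python sketch of that bookkeeping follows. The post does not give the actual point values, so the numbers in `ACTION_POINTS`, the action names and the `record_action` helper are hypothetical; the point is only that one action updates two running totals at once.

```python
from collections import defaultdict

# Hypothetical point values: the post says each action is worth
# "a certain number of points" but does not say how many.
ACTION_POINTS = {
    "like": 5,
    "dislike": -5,
    "visit_website": 1,
    "listen": 2,
    "add_to_playlist": 3,
}

source_points = defaultdict(int)      # source label -> running total of points
connection_points = defaultdict(int)  # (artist, recommended artist) -> running total


def record_action(action, source, artist, recommended):
    """Credit one rating action to the source and to the artist connection."""
    points = ACTION_POINTS[action]
    source_points[source] += points
    connection_points[(artist, recommended)] += points
    return points


# Usage with made-up artists and sources.
record_action("listen", "epitonic", "Band A", "Band B")
record_action("like", "epitonic", "Band A", "Band B")
record_action("dislike", "hypem", "Band A", "Band C")
print(dict(source_points))  # {'epitonic': 7, 'hypem': -5}
```

Negative points for a dislike are what let a poor source's total fall, rather than merely grow more slowly than a good one's.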
Recommendations are given a score based on these point totals. Ignoring the weightings of the source and connection scores, a recommendation’s score is calculated thus:
score = (source_points + connection_points) / num_source_connections
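The unweighted formula translates directly into code. In this sketch `num_source_connections` is taken to be a positive count of the connections the source contributed, and refusing a zero count is an added assumption; the candidate artists and their totals are invented purely to show ranking by score.

```python
def recommendation_score(source_points, connection_points, num_source_connections):
    """Unweighted score: (source_points + connection_points) / num_source_connections.

    Assumption: num_source_connections is a positive count; zero is rejected.
    """
    if num_source_connections <= 0:
        raise ValueError("num_source_connections must be positive")
    return (source_points + connection_points) / num_source_connections


# Hypothetical candidates: artist -> (source_points, connection_points, num_source_connections)
candidates = {
    "Band B": (120, 7, 40),
    "Band C": (-30, -5, 10),
}

# Highest score first.
ranked = sorted(
    candidates,
    key=lambda artist: recommendation_score(*candidates[artist]),
    reverse=True,
)
print(ranked)  # ['Band B', 'Band C']
```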
The perceptron’s algorithm is pretty obvious; what makes the site good is the choice of data sources. However, the algorithm does make it easy to experiment with adding data sources: if I add a bad one, the scores given to its recommendations drop very rapidly. It took only about 200 user rating actions to get the site’s data source weights reasonably well tuned. Here is the current table (higher numbers are better):
| Data source | Weight |
|---|---|
| Epitonic similar artists | 0.439 |
| Tiny Mix Tapes similar artists | 0.316 |
| Myspace top friends | 0.128 |
| Epitonic other artists | 0.016 |
| Gigs | Score hasn't settled yet |
| the perceptron user recommendations | Score hasn't settled yet |