Thursday, September 13, 2007

Station Building 101: Listening Test Mark III

In the previous post, I described the second version of the listening test, which I had been running roughly monthly for the past six months. In the original listening test I listened to only ten songs from each station; then we learned that Pandora was generating songs in sets of 3 or 4. So I moved to listening to ten sets of songs (30 to 40 songs total).

As I performed the September listening tests, I slowly became convinced that the player was occasionally generating two-song sets. Finally, while testing Pagan Pride (which has an eclectic mix of genres), I came across a case where two folk songs were jammed between a hard rock set and an electronica set.

I e-mailed Tom Conrad, and he confirmed that the selection algorithm is now occasionally generating 2-song sets. I've been seeing less value in trying to identify the sets anyway, so I've decided to switch to listening to 40 songs from each station and taking ten times the average song score (1 for up, 0 for down, and 0.5 for neither) as the score for the station.
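The scoring rule above is simple enough to sketch in a few lines of Python. This is only an illustration: the function name `station_score` and the sample ratings are made up for the example, not taken from the actual tests.

```python
# Points per song, as described above: 1 for thumbs-up, 0 for thumbs-down,
# 0.5 for a song that gets neither.
POINTS = {"up": 1.0, "down": 0.0, "neither": 0.5}


def station_score(ratings):
    """Return ten times the average song score for one station.

    `ratings` is a list of "up", "down", or "neither" strings,
    one per song heard (hypothetical input format).
    """
    if not ratings:
        raise ValueError("need at least one rated song")
    return 10 * sum(POINTS[r] for r in ratings) / len(ratings)


# Hypothetical 40-song session: 30 up, 4 down, 6 neither.
sample = ["up"] * 30 + ["down"] * 4 + ["neither"] * 6
print(station_score(sample))  # 8.25
```

Because the score is an average, it stays on the same 0-to-10 scale however many songs are heard, so a session that ends short of 40 songs can still be compared with a full one.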

I'm less than halfway through this month's tests, but it appears that the selection algorithm has once again been improved. It would not surprise me if the average station score is up an additional 0.5 on top of July's increase of a full point. My Grrrlpower station earned an astonishing 9.5, with no thumbs-down and four non-thumbed-up tracks in 34 songs.

I began to wonder what we might be losing in these changes, since my satisfaction has been increasing so consistently. It seems to me that the stations might not be exploring as much new material, so I've started roughly tracking the number of new tracks being played. I can't do it exactly since, clearly, I cannot remember every track that has ever played on each station. Nevertheless, a track that already has a thumb-up has clearly been played before, and I'm pretty certain that if I'm motivated to give a track a thumb-up or a thumb-down, then it's most likely new, since I would have had the same motivation the first time I heard it. And so I've begun to track the percentage of new tracks being played and the percentage of new tracks that get a thumbs-up. So far, over five stations, the % New has run from 16% to 39%, and the % Good|New (% of Good given that a track is New) is all over the map, from 23% on Pagan Pride to 83% on O, Wow the Moon.
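For anyone who wants to keep the same tally, here is a rough Python sketch of the two percentages. It assumes each track is logged as a pair of a "new" flag (the listener's best guess) and a rating; the function name `novelty_stats`, that log format, and the sample numbers are all invented for illustration.

```python
def novelty_stats(tracks):
    """Return (% New, % Good|New) for one station's session.

    `tracks` is a list of (is_new, rating) pairs, where `is_new` is the
    listener's guess that the track has not played on the station before
    and `rating` is "up", "down", or "neither" (hypothetical log format).
    """
    if not tracks:
        raise ValueError("need at least one track")
    new_ratings = [rating for is_new, rating in tracks if is_new]
    pct_new = 100 * len(new_ratings) / len(tracks)
    if not new_ratings:
        # % Good|New is undefined when no new tracks were played.
        return pct_new, None
    good_new = sum(1 for rating in new_ratings if rating == "up")
    return pct_new, 100 * good_new / len(new_ratings)


# Hypothetical 40-song session: 10 new tracks, 6 of which earn a thumbs-up.
session = (
    [(True, "up")] * 6
    + [(True, "down")] * 4
    + [(False, "up")] * 30
)
print(novelty_stats(session))  # (25.0, 60.0)
```

Note that % Good|New is computed only over the new tracks, which is why it can swing so widely on a station where just a handful of new tracks turn up.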

