Wednesday, June 27, 2007

Pandora, the shirt I wanted was taken.

So, I attended a Pandora meetup last week, Pandora being the internet radio service that’s grown out of the music genome project that studies songs, assigning scores among 400 different “genes” to sort of stamp out what the qualities of a song are. It’s a mechanism for finding recommendations based on what you like, rather than genre expectations or cultural approval. They built it into a radio system where you plug in an artist or song that you like, and it then plays songs that are qualitatively similar.

A bit of it (the whole meetup was basically a “town hall” meeting led by Tim Westergren, one of the founders) went toward explaining the selection process, which is the sort of gear shifting that interests me. Since the genes are rated, there’s a score to a song, so the next song may be the next closest in score. If all but one gene is the same, the different one...ehh...rhythmic vocalizations*, will define the distance. A song with a score of 5 may be followed by one that’s equal except for a 3 in one gene, rather than one that’s equal except for a 9. And there’s weight to the genes as well, so a song 5 apart on vibrato might still be played before another one that is only 2 apart, but on the tempo scale, changes in tempo being much more significant (the most significant, actually). A neat business that means you might hear a lot of Celine Dion (the most thumbed down artist), whatever you might think you like.
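For the curious, here is a minimal sketch of that weighted "next closest song" idea. Everything specific in it is an assumption for illustration: the gene names, the weights (the only thing said at the meetup was that tempo counts the most), the scores, and the choice of a weighted sum of absolute differences as the distance. It is not Pandora's actual method.

```python
# Hypothetical gene weights; the only known fact is that tempo matters most.
WEIGHTS = {"tempo": 3.0, "vibrato": 1.0, "era": 1.0}


def weighted_distance(a, b, weights=WEIGHTS):
    """Sum of per-gene score differences, each scaled by that gene's weight."""
    return sum(w * abs(a[gene] - b[gene]) for gene, w in weights.items())


def next_song(seed, catalog, weights=WEIGHTS):
    """Return the name of the catalog song with the smallest weighted distance."""
    return min(catalog, key=lambda name: weighted_distance(seed, catalog[name], weights))


# Made-up scores: one candidate is 5 apart on vibrato, the other 2 apart on tempo.
seed = {"tempo": 5, "vibrato": 5, "era": 5}
catalog = {
    "five_off_on_vibrato": {"tempo": 5, "vibrato": 10, "era": 5},
    "two_off_on_tempo": {"tempo": 7, "vibrato": 5, "era": 5},
}

print(next_song(seed, catalog))  # five_off_on_vibrato (distance 5.0 vs 6.0)
```

With these made-up weights, the song that is 5 apart on vibrato still wins over the one that is only 2 apart on tempo, which is the behavior described above.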

Interestingly, “era” is a gene, which means that song selections gravitate toward a 20-year spectrum from the jump off. That’s a little off to me, since it seems like a non-musicology oriented classification to have. Maybe it’s used to account for the variance in production values over the years.

I’ve enjoyed the system, and think it’s worth a shot for those who want to sample something close to what they like. I discovered my love of No Doubt and Gwen Stefani (‘cause really, that’s just New Wave, isn’t it?). Sadly, no Duvall-approved “dumb lyrics” +/- system has been implemented yet.** The site itself may be robust enough to have a community, for those into that sort of thing, though I wonder how many presences people can maintain. Anyway, Pandora is pretty big. It managed 400,000 faxes in three days, the fastest inundation of Congress ever, after a call to arms regarding recent RIAA royalty shenanigans and internet radio. A fact both impressive and sad (as Westergren noted), since it means the thing people get most riled about is a threat to their free music.

Learned other things: it’s on a value-added advertising business model, and random recs are an interesting idea that executes poorly, as is computer listening (Pandora has 50(?) full time trained listeners quantifying songs). Then I dodged a thunderstorm.

Didn’t get the nice looking shirt (for the best, I don’t need more shirts), but got a nice hat, though. You know me and hats.

* I’m making shit up.
** No, Pandora, I do not like Ashlee Simpson. Stop it.


HoBs said...

Oh, human coders, how boring. Thought they might have done some kind of wavelet/signal processing analysis.

Otherwise, the software is easy to write. I wrote one for Morgan Stanley when I was there. You use something called principal component analysis (I want to be that guy on Numb3rs if anyone else actually watches that crappy show), which is one of the few more direct (and neat) applications of linear algebra.

You essentially plot all the songs in this 20 dimensional space (a tempo axis, a year axis, etc.) and then take the eigenvectors, which basically find the combinations of dimensions that are most useful in classifying the songs, and then you can find songs that are closest in this reduced space.
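A rough sketch of what that looks like, assuming NumPy is available and using random made-up scores for 50 songs on 20 axes. The function names and the choice of keeping 3 components are just for illustration, not anything Pandora (or Morgan Stanley) is known to use.

```python
import numpy as np


def pca_project(scores, k):
    """Project rows of `scores` (songs x genes) onto the top-k principal components."""
    centered = scores - scores.mean(axis=0)
    cov = np.cov(centered, rowvar=False)        # genes x genes covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    return centered @ eigvecs[:, order]


def nearest(reduced, i):
    """Index of the song closest to song i in the reduced space."""
    dists = np.linalg.norm(reduced - reduced[i], axis=1)
    dists[i] = np.inf                           # don't return the song itself
    return int(np.argmin(dists))


rng = np.random.default_rng(0)
scores = rng.integers(0, 11, size=(50, 20)).astype(float)  # made-up data

reduced = pca_project(scores, k=3)
print(reduced.shape)        # (50, 3)
print(nearest(reduced, 0))  # index of the song nearest to song 0
```

On random data like this the components don't mean much; with real scores the leading eigenvectors would pick out whichever combinations of axes vary the most across the catalog.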

hcduvall said...

Oh, I assume the coding's not difficult, it's a really simple system. So yeah, it is a bunch of shuffling. But the classifying and weighting choices are the expertise you're getting with the service, and the trained human critics are what makes their system better than say, fan based recs like tagging or purchasing trends. (Where your cultural influences are in play, and less precise) Training being the important part. Not everyone is cleared for each genre, and certain ones, like jazz, have their own "genes". Classical hasn't been incorporated because it's a much more involved, and studied, set of qualities at play.
The computer signal analysis actually came up directly during the Q&A. Basically, the technology's just not there yet. In fact, the current level of signal analysis is so far from the sophistication of a trained human ear that it's not just not useful now, it's not foreseeable for quite some time.

Mike said...

Hobs, you know, I'm probably one of the few other people on Earth (percentage-wise vs the whole population that is) that wants to be the Numb3rs guy.

One thing that I don't think Duvall touched upon though was how you can "thumbs up" or "thumbs down" any particular song (though sadly not artist), and even reject a song outright. I'm probably making a couple of logical jumps due to it being 2AM, but what you thumb up and down would actually close up the n-dimensional space fast (weighting it more and more toward just a few planes until some don't even matter) and create a curve between any two given vectors on the same plane. And any possible song that would play on that station would fall on or close to one or more of those curves. For instance, I've got a station that's tuned so that it pretty much spits only 3rd wave Ska out at me, but the curve allows for some old school stuff to slip in once in a while. It's pretty neat to think about.

Ultimately, I guess we're both thinking along the same lines, but one a bit more discretely, and the other a bit less.

As for machine analysis of the music itself, I think it'll only be able to go so far (unless there's a gigantic leap in AI). A machine could easily do the quantifiable aspects of a song, like tempo, key, etc., but not the qualitative genes (I believe there might be one based on production values, i.e. being over-produced or not).

hcduvall said...

I don't think the production value gene is over or under--like looking at other creative works you assume "authorial intent". But what the production tries to draw out has probably changed over the years, so whatever flibberdigibbit functionalizing mechanistic system they're employing is both analytical and contextual.

HoBs said...


glad to know there's someone else who watched numb3rs. (stupid 3 in there making it hard to type)

not sure if pandora uses the same thumbs up and thumbs down.

but most of those systems, pioneered by pattie maes' firefly program at mit's media lab (back when I was an undergrad there) and her company firefly, and later adopted by Amazon, Netflix and TV, use the same principal component analysis, but on a much huger space.

basically the space of all songs.

as in, if there are 10,000 songs out there, it is 10,000 dimensional. A person is a set of yeses and nos, one on each of the 10,000 dimensions.

Such a system can work without professional categorizers. It just finds other songs based on the preferences of other people "near" you in this 10,000 dimensional space.
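A toy sketch of that "people near you" idea, with made-up listeners and songs. To keep it short it skips the principal component step entirely and just does a plain nearest-neighbor lookup; the similarity measure (agreements minus disagreements on songs both people rated) is an assumption for illustration, not how Firefly or anyone else is known to have done it.

```python
def similarity(a, b):
    """Agreements minus disagreements over the songs both people rated."""
    shared = a.keys() & b.keys()
    return sum(1 if a[song] == b[song] else -1 for song in shared)


def recommend(me, others):
    """Songs the most similar listener said yes to that `me` hasn't rated."""
    closest = max(others, key=lambda name: similarity(me, others[name]))
    return sorted(song for song, liked in others[closest].items()
                  if liked and song not in me)


# Each person is a set of yes/no answers on some of the songs (all hypothetical).
me = {"song_a": True, "song_b": False, "song_c": True}
others = {
    "listener_1": {"song_a": True, "song_b": False, "song_d": True, "song_e": False},
    "listener_2": {"song_a": False, "song_b": True, "song_d": False, "song_f": True},
}

print(recommend(me, others))  # ['song_d']
```

No trained listeners anywhere in there: listener_1 agrees with me on both shared songs, so their other "yes" becomes my recommendation.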