Saturday, December 13, 2008

Netflix and data mining

In a few of my classes, the topic of data mining came up.

Basically, it comes down to taking existing data and drawing conclusions from it. You can see this if you have an amazon.com account: Amazon suggests other items that you might be interested in, based on what you are viewing. Those suggestions come from a process called data mining.

It doesn't take a genius to say that if someone buys season 1 of a TV show (say, House, M.D.), they may be interested in season 2 of that same show. However, if you like House, M.D., you might also be interested in something that may seem completely different - let's say, a Sherlock Holmes mystery. Amazon can tell that by taking everyone's past purchases, analyzing what people who bought a certain DVD liked, and comparing that to your own past purchases.
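To make that idea concrete, here is a minimal sketch in Python of "people who bought this also bought that". It is not Amazon's actual algorithm, which I don't know; it just counts how often two items show up in the same customer's history. The customer names, the titles, and the functions `build_co_counts` and `recommend` are all made up for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories (made-up data, for illustration only).
purchases = {
    "alice": {"House Season 1", "House Season 2", "Sherlock Holmes"},
    "bob": {"House Season 1", "Sherlock Holmes"},
    "carol": {"House Season 1", "House Season 2"},
    "dave": {"Sherlock Holmes", "Cooking Basics"},
}


def build_co_counts(histories):
    """For every item, count how often each other item was bought by the same customer."""
    co_counts = {}
    for items in histories.values():
        for a, b in permutations(items, 2):
            co_counts.setdefault(a, Counter())[b] += 1
    return co_counts


def recommend(owned, co_counts, top_n=3):
    """Rank items the customer does not own by how often they co-occur with owned items."""
    scores = Counter()
    for item in owned:
        for other, count in co_counts.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties are broken alphabetically so the order is stable.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]


co_counts = build_co_counts(purchases)
suggestions = recommend({"House Season 1"}, co_counts)
print(suggestions)
```

With this toy data, someone who only owns House Season 1 gets both House Season 2 and Sherlock Holmes suggested, because other House buyers bought those too. A real system would work with far more data and would also have to deal with things like very popular items that everybody buys.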

There are many other places where this happens. For example, credit card companies use data mining to guess whether a transaction is fraudulent, based on your purchasing habits. My credit card was rejected when I tried to go skydiving: based on what I've charged in the past, the charge was so out of character for me that they blocked the transaction, and I had to call in and verify that it was indeed me.
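A very rough sketch of the "out of character" idea, in Python, might look like the following. I am assuming the simplest possible rule here: flag a charge if its amount is far from the average of your past charges. I don't know what credit card companies actually do, and they surely look at much more than the amount (merchant type, location, time of day, and so on). The function name `is_out_of_character`, the threshold of 3, and the dollar amounts are all hypothetical.

```python
from statistics import mean, stdev


def is_out_of_character(past_amounts, new_amount, threshold=3.0):
    """Flag a charge whose amount is more than `threshold` standard deviations from the past average."""
    if len(past_amounts) < 2:
        # Not enough history to judge, so don't flag anything.
        return False
    average = mean(past_amounts)
    spread = stdev(past_amounts)
    if spread == 0:
        # Every past charge was identical; anything different stands out.
        return new_amount != average
    return abs(new_amount - average) / spread > threshold


# Hypothetical charge history in dollars (made-up numbers).
history = [12.50, 40.00, 23.75, 18.20, 35.10, 27.40]

skydiving_flagged = is_out_of_character(history, 250.00)
lunch_flagged = is_out_of_character(history, 30.00)
print(skydiving_flagged, lunch_flagged)
```

With this made-up history, a $250 charge gets flagged and a $30 charge does not.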

Another place that this happens is on sites like Netflix and Blockbuster Online. Of course, they want you to enjoy the movies they suggest, but the underlying reason is that they want you to come back and stay a subscriber. Therefore, it is in their interest to have a good data mining algorithm in place to show you movies you are likely to enjoy (and therefore keep you subscribing).

As a matter of fact, if you can provide Netflix with a better algorithm than the one they have, you can be eligible for prize money - up to a million dollars, if your algorithm is substantially better than theirs.

Web Link:
Thanks to Brian E. for the heads up!
