Thursday, 30 June 2011

If you build it, they will come

Or will they?

(Apparently, the actual quote is "If you build it, he will come" from Field of Dreams.)

I figured that if I am to make money out of this blog by writing about the HHP, it would be a good advert if I were actually doing well in the competition. I have recently been putting effort in on this front, and team Sali Mali eventually got to the top of the leaderboard.

Being a data geek, I looked at the subsequent stats of this blog and saw a huge spike for a particular half hour around the time I got to the top. 'Bingo,' I thought - that's the way to generate traffic.

I then looked at my LinkedIn stats, as the only real way to get to this blog is via my Kaggle profile, which takes you to my LinkedIn page and then to this blog. Surprisingly, there was no such spike there - so what was going on?

Blog Views:

LinkedIn views:

I don't know exactly what time zones everything is in, but my blog stats pointed me to this post on the Kaggle blog, which I am assuming is the cause of the spike.

Anyway, the data mining point of all this is that sometimes people are quick to jump to conclusions that are completely wrong. The real answers are always in the data - which is why I think the HHP will be won by a data scientist and prior expert medical knowledge will play no part at all.


  1. Congratulations on getting to the top of the leaderboard! I like this blog, and I use one of your R scripts from here, for which I'm thankful.

    I'm subscribed to the RSS feed for this blog, so most of the time I read your posts with Google Reader. I first found this blog through a tweet that was posted on Also, some visitors might be coming directly through the links in the forum posts, which would explain why they don't show up in your LinkedIn dashboard.

    Best Wishes!

  2. I tend to agree with your thought that the HHP will be won by a data scientist.

    The ability to exploit clinical knowledge has been wiped out as a significant factor by virtue of the data being so thoroughly sterilized, and then there is the double whammy of everybody else getting so much time for machines to wrestle micro-snippets of information value from the data. We don't have to have a clue about the clinical significance of anything, just that something added to "the model" reduces the RMSE by 0.000000001. An ensemble of data miners will come pretty close to finding all the truffles, given enough time.

    Which brings us to Key Factor #2 ... time. For a doctor, there's coming in #1 (aka winning), or coming in anywhere else in the pack (aka all of the time you put into the contest is wasted). But for professionals from the data science side, we gain skills, camaraderie, prestige and perhaps even new job opportunities for showing well. So the data scientists, individually and collectively, will have a lot more staying power in this than the clinicians.