Thursday 17 November 2011

The Pack is Catching Up


If you have been keeping an eye on the leaderboard, you will have noticed that there has apparently been little activity since the milestone 1 deadline. On some occasions the top 40 positions have not changed for over a week. This is quite an eerie silence, and I suspect teams may be holding back submissions so that they can merge (the total submissions of the merging teams has to be less than the number of days the comp has been running).
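To make the merging rule in brackets above concrete, here is a minimal sketch of the check as I have described it. The function name and the figures are made up for illustration, and the strict "less than" simply follows my wording here rather than the official rules, so treat it as an assumption.

```python
def can_merge(team_submissions, days_running):
    """Check the merge condition as described in this post (an assumption,
    not the official rule text): the combined submission count of the
    merging teams must be less than the number of days the comp has run."""
    return sum(team_submissions) < days_running


# Hypothetical figures: two teams, 40 days into the competition.
print(can_merge([20, 15], 40))  # 35 combined submissions -> True
print(can_merge([30, 15], 40))  # 45 combined submissions -> False
```

On this reading, every submission a team makes now eats into its room to merge later, which would explain a quiet leaderboard.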


There have also been some very interesting movements if you look closer - more to come on this in a later post.

If you look further down, though, the pack is catching up. Over the past six weeks more teams have been heading towards the 0.461 mark, which is the point the early leaders reached straight away and was then the score to beat. Now it is only good enough for a top 50 place.


This score is what a good single model will get you. To improve dramatically from there, though, it is probably necessary to ensemble various models. What is pretty clear is that the benchmark of 0.40 for the 3 million is impossible (hopefully this might be adjusted?).

I tried to put some nice colours in the chart below, which is generated in R, but could not find any up-to-date listing of colour codes in R. This is one of the disadvantages of the open source movement: documentation is very low on the contributors' list of priorities (and what documentation there is leaves a lot to be desired if R is to be used by 'regular' types of people).


I did find the following link, though, which is where I got the colours for the plot:

http://colorbrewer2.org/
  