It is now day 12 of the HHP. I know this as I have been trying to make a submission every day, and the leaderboard says Sali Mali has made 12 submissions.

Dave is keeping his position near the top but Sali Mali is going downhill, as others overtake me.

I've now started to look at the problem in a little more detail, and decided I needed to a way of replicating the error function that is being used to judge this competiton. This will help in figuring out what sort of leaderboard score I expect my submissions to achieve.

I've also not been totally wasting my submissions during the time when I have had no useful model to submit. In previous time series competitons, it has been possible to learn a lot from the leaderboard (real examples will be given part 2 of this post), so I figured pinging a few constant valued submissions might help me extract some insight about the future (although I've not really thought about what that insight might be at the moment).

The following R code demonstrates

Dave is keeping his position near the top but Sali Mali is going downhill, as others overtake me.

I've now started to look at the problem in a little more detail, and decided I needed to a way of replicating the error function that is being used to judge this competiton. This will help in figuring out what sort of leaderboard score I expect my submissions to achieve.

I've also not been totally wasting my submissions during the time when I have had no useful model to submit. In previous time series competitons, it has been possible to learn a lot from the leaderboard (real examples will be given part 2 of this post), so I figured pinging a few constant valued submissions might help me extract some insight about the future (although I've not really thought about what that insight might be at the moment).

The following R code demonstrates

- how the function to calculate the error was put together
- how we can easily generate the errors for a sequence of constant valued predictions on Y2 and Y3
- how the Y4 leaderboard results were read from an Excel file and compared to the Y2 and Y3 results.

########################################### #function to calculate the model error ########################################### calc_error <- function(act,pred) { aact <- as.matrix(act) ppred <- as.matrix(pred) if(nrow(aact) == nrow(ppred)){ return (sqrt(colSums((log(ppred+1) - log(aact+1)) ^ 2) / nrow(aact))) } else { return (-99) } } #### EOF to calcualte model error #### ########################################### # function to calculate the model errors # given a sequence of constant values for # the predictions and the known outcome ########################################### err_seq <- function(sequence,act) { actual <- as.matrix(act) #vector of errors errors <- vector(length=length(sequence)) #vector for the predictions (will be constants) predictions <- vector(length=nrow(actual)) ind <- 0 for(i in sequence){ #report progress ind = ind + 1 cat("\n",i) flush.console() #set the prediction to the constant predictions[] <- i #calculate the error errors[ind] <- calc_error(act=actual,pred=predictions) } return (errors) } ###EOF to calculate sequence of errors #### #connect to data library(RODBC) #for data connection conn <- odbcConnect("sql_server_HHP") #load the actual days in hospital actualY2 <- sqlQuery(conn,"select DaysInHospital from DaysInHospital_Y2") actualY3 <- sqlQuery(conn,"select DaysInHospital from DaysInHospital_Y3") #set up a sequence of constants to be used as the predictions const_preds <- seq(from=0, to=1, by=0.01) #calculate the error sequences for Y2 & Y3 Y2Err <- err_seq(sequence=const_preds,act=actualY2) Y3Err <- err_seq(sequence=const_preds,act=actualY3) #read in the errors for Y4 from an excel file #where they were recorded library(xlsReadWrite) #this is also required to be run #see http://www.swissr.org/software/xlsreadwrite #xls.getshlib() Y4 <- read.xls("E:\\comps\\hhp\\constants.xls") ############################# #plot prediction v error ############################# ymin <- min(Y2Err,Y3Err) ymax <- max(Y2Err,Y3Err) plot(const_preds,Y2Err ,type='l' ,main='Learning from the Leaderboard' ,xlab='Predicted Constant' ,ylab='Error' ,ylim=c(ymin, ymax) ,col='blue' ) lines(const_preds,Y3Err,col='green') #add the line for Y4, discovered via leaderboard lines(Y4[,1],Y4[,2],col='red',lwd=2) legend('bottomright', c('Y2 - known','Y3 - known','Y4 - via leaderboard'),lty=1, col=c("blue","green","red"))

## No comments:

## Post a Comment