Comments on Another Data Mining Blog: Code for a Respectable HHP Model
Sali Mali. 14 comments, newest first. Feed updated 2024-03-08.

train (2012-08-29 03:41):
Wow, thanks for your reply :)

Sali Mali (2012-08-29 02:25):
Sure, but will cost you $3.1 million ;-)

train (2012-08-29 02:11):
Thank you so much for posting.
I'm a student. I'm interested in Kaggle.

Could you provide more of the SQL code, including the code to make Subset5 and Subset6 (Data Set 1) and Data Set 2?

Sali Mali (2011-10-25 12:37):
@Signipinnis

Thanks for the comments. R is relatively new to me (<2 years), but the more I need to do stuff, the more I find that someone has already done it before and has written a nice function for me.

I did the data prep in SQL because that is what I am more familiar with. I'm sure the same thing can be done more easily in R, so if anyone wants to replicate the SQL in R then I'd appreciate the kickstart lesson.

Signipinnis (2011-10-25 11:49):
You are Heroes of the RRRRRRRRevolution. Your generous sharing is much appreciated.

I've never used SQL to create a lot of varied additional data elements/features, so that part also was very educational to me. Your code has an elegantly sparse, honed directness to it. Clearly "not your first rodeo" as they say.

Best of luck the rest of the way. (Not that there seems to be much "luck" about what you're doing.)

Rstohr (2011-10-23 13:38):
Thanks for these details and congratulations on the first milestone prize!

I'm participating in this contest as a grad student in engineering and writing a thesis about it, but I still seem to lack the theoretical base to fully understand the given problem. I have now finished reading "Machine Learning" (Tom Mitchell, McGraw Hill, 1997) as an introductory work. Are there other must-read books, papers or resources that you could recommend?

Thanks in advance!

B. Riemann (2011-10-08 15:55):
I was able to reproduce the results using the 'gbm'-only model. Are you going to post the parameters of the other models that you used in your ensemble methods?

pidtis (2011-10-05 17:08):
Thank you so much for posting these details. I'm a newbie and this blog has served as a great learning resource.

Christian (2011-10-05 11:11):
Thanks for continuing to blog while competing. I haven't been active recently, but this has given me a great place to jump back in!

Sali Mali (2011-10-02 11:25):
Thanks Sarkis & Brian for sharing this. Interestingly, you both got slightly different results, which is down to the stochastic nature of the algorithm. GBMs will overfit, and if you read the documentation, you will discover you can find the optimal number of trees for a specific learning rate by cross validation. Also, if you read our paper, you will discover that there are some variables in the SQL that we chose not to use in the modelling!
If you optimise this model and add the extra variables as described, a score of 0.461 should be achievable with a single model.

Brian Chase (2011-10-02 11:09):
Thank you for sharing this and congratulations on winning the first milestone prize. I scored 0.463372 on the public leaderboard using this approach.

Sali Mali (2011-10-01 22:37):
Sorry Zach, I don't think the rules allow redistributing the data. There are many free tools, though, that let you generate it yourself. In my earlier post on loading the data I think there is a link to a site that lists all the software.

Anonymous (2011-10-01 14:25):
Could you provide the result of the SQL as a data file?

Sargis (2011-10-01 11:02):
Thank you so very much. I was able to get a public leaderboard score of 0.463549 with this code. Congratulations on the well-deserved Heritage Health Prize Round 1 Milestone prize!
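
Editor's note on Sali Mali's reply of 2011-10-02: the advice to find the optimal number of trees for a fixed learning rate by cross validation can be sketched roughly as follows. This is only an illustration, not the author's model and not the R gbm package itself (there, as far as I know, the `cv.folds` argument together with `gbm.perf(..., method = "cv")` does this job). Everything in the sketch is an assumption made for the example: the synthetic one-feature data, the squared-error loss, the use of single-split stumps, and the hypothetical names `fit_stump`, `predict_stump` and `cv_error_by_tree_count`.

```python
import math
import random


def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature: (threshold, left, right)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    total = sum(residuals)
    mean = total / n
    best_gain, best = -1.0, (None, mean, mean)  # fallback: constant prediction
    left_sum = 0.0
    for rank in range(1, n):
        lo, hi = order[rank - 1], order[rank]
        left_sum += residuals[lo]
        if xs[lo] == xs[hi]:
            continue  # cannot split between equal feature values
        left_mean = left_sum / rank
        right_mean = (total - left_sum) / (n - rank)
        # A larger gain means a smaller squared error for this split.
        gain = rank * left_mean ** 2 + (n - rank) * right_mean ** 2
        if gain > best_gain:
            best_gain = gain
            best = ((xs[lo] + xs[hi]) / 2, left_mean, right_mean)
    return best


def predict_stump(stump, x):
    threshold, left, right = stump
    if threshold is None or x <= threshold:
        return left
    return right


def cv_error_by_tree_count(xs, ys, n_trees, learning_rate, n_folds, seed=0):
    """Mean held-out squared error after 1..n_trees boosting rounds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    errors = [0.0] * n_trees
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        base = sum(ty) / len(ty)  # initial model: the training mean
        train_pred = [base] * len(train)
        test_pred = [base] * len(held_out)
        for t in range(n_trees):
            # For squared-error loss the negative gradient is the residual.
            residuals = [y - p for y, p in zip(ty, train_pred)]
            stump = fit_stump(tx, residuals)
            train_pred = [p + learning_rate * predict_stump(stump, x)
                          for p, x in zip(train_pred, tx)]
            test_pred = [p + learning_rate * predict_stump(stump, xs[i])
                         for p, i in zip(test_pred, held_out)]
            errors[t] += sum((ys[i] - p) ** 2
                             for i, p in zip(held_out, test_pred))
    # Every point is held out exactly once, so divide by the data size.
    return [e / len(xs) for e in errors]


# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = random.Random(42)
xs = [rng.uniform(0.0, 6.0) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]

errors = cv_error_by_tree_count(xs, ys, n_trees=300, learning_rate=0.1, n_folds=5)
best_n_trees = min(range(len(errors)), key=errors.__getitem__) + 1
print(f"best number of trees: {best_n_trees}, CV MSE: {errors[best_n_trees - 1]:.4f}")
```

The chosen tree count only holds for the learning rate it was tuned with; a smaller learning rate generally needs more trees, so the search would have to be repeated if the rate changes.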