Sunday, 15 September 2019

Buy Low Sell High - leaderboard and rule clarification




You can now register your team and submit your trade files to see how you do on the leaderboard:


Just to clarify a few things:

  1. You should only use the values of the current and previous predictions to make a decision
  2. If you use future prediction values, you will probably get a good result, but this logic is unimplementable
  3. Any strategies found to be using future prediction values will be disqualified, so there is no point in trying 
  4. Do not use anything else in your algorithm, such as the pair name, price or absolute time
  5. If you want to use previous prediction values, then use the relative time differences to determine them
  6. The new files we provide will consistently have predictions generated at 5-minute intervals, but the absolute time values could be anything
  7. After the deadline, you will be asked to nominate the 3 long and 3 short strategies you want to be evaluated
  8. All teams beating the Benchmark solution on the private leaderboard with their nominated strategies will then qualify for stage 2
  9. We will then invite those teams to run their code over several new files. They must use only the nominated strategies, and these must be locked in, with no further parameter tuning allowed
  10. The new files will have different pair names, and the start time for the field minutesSinceStart will not necessarily be the same as in the file already provided
  11. The winner will be the team that gives the best return on the new data, provided it still beats the Benchmark and we are confident no future prediction values have been used.
  12. There will be a winner for Short and a winner for Long
  13. If we suspect future prediction values are being used, we will explain how we came to this conclusion, and the team will have a right of reply to prove otherwise
  14. There will be a benchmark prize for the highest-ranked team on the private leaderboard as at 12 pm on Thu Oct 17th that wishes to reveal its method. To receive the prize, the team must write a blog post describing its method so that others can reproduce it. The method must not use future information. Revealing your method is not compulsory, so we will proceed down the ranking and award the prize to the first team that wishes to do so.
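To make points 1, 5 and 6 concrete, here is a minimal Python sketch of a decision rule that only ever looks backwards. It assumes one pair's records are already sorted by minutesSinceStart and held in two parallel lists (times and predictions). The functions `lookback` and `long_signals`, the lookback length and the threshold rule are all hypothetical illustrations, not the Benchmark or any required method. The earlier prediction is located by a relative time difference (a multiple of 5 minutes), never by absolute time, price or pair name.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence

STEP_MINUTES = 5  # predictions are generated at 5-minute intervals (point 6)


def lookback(times: Sequence[int], preds: Sequence[float],
             i: int, steps: int) -> Optional[float]:
    """Prediction from `steps` intervals before record i, or None.

    `times` holds minutesSinceStart values sorted ascending. The target is
    found by relative time difference, and the search is limited to
    records 0..i, so a later (future) record can never be returned."""
    target = times[i] - steps * STEP_MINUTES
    j = bisect_left(times, target, 0, i + 1)
    if j <= i and times[j] == target:
        return preds[j]
    return None  # not enough history yet, or a gap in the data


def long_signals(times: Sequence[int], preds: Sequence[float],
                 steps: int = 3, threshold: float = 0.0) -> List[bool]:
    """Hypothetical rule: signal a long entry at record i when the current
    prediction exceeds the one `steps` intervals earlier by > threshold.
    Uses only current and previous predictions: no price, pair name or
    absolute time."""
    signals = []
    for i in range(len(times)):
        past = lookback(times, preds, i, steps)
        signals.append(past is not None and preds[i] - past > threshold)
    return signals


times = [100, 105, 110, 115, 120]   # minutesSinceStart (made-up values)
preds = [0.1, 0.2, 0.1, 0.5, 0.0]   # made-up predictions
print(long_signals(times, preds, steps=2))  # [False, False, False, True, False]
```

Because only time differences are used, shifting every minutesSinceStart value by a constant (as point 10 warns may happen) leaves the signals unchanged, and a gap in the data simply produces no signal rather than silently pairing the wrong records.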

Just to clarify what we mean by 'future information'. The data set does contain records that are 'in the future' to the times we have asked you to make decisions for. It is OK to use this data to come up with a set of coefficients for a model.
What it is NOT OK to do is use the raw prediction values at time 'x' as inputs to a model making a decision for a time prior to 'x'. This is an unimplementable solution. 
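One practical way to double-check a strategy for this kind of leak is a truncation test: the output for time 'x' must not change if every record after 'x' is deleted. The sketch below is a hypothetical self-check in Python, not the organisers' own test. It compares a trailing moving average (which only uses current and previous predictions) with a centred one, the kind of smoothing index mix-up that quietly pulls in future values. It only checks raw prediction values used as inputs; fitting coefficients on the whole file, which is allowed, is outside its scope.

```python
from typing import Callable, List, Sequence

# A "strategy" here is any function mapping a pair's prediction series to one
# output per record (a smoothed value, a signal, ...). Hypothetical interface.
Strategy = Callable[[Sequence[float]], List[float]]


def trailing_mean(preds: Sequence[float], window: int = 3) -> List[float]:
    """Causal smoothing: output i averages preds[i-window+1 .. i] only."""
    out = []
    for i in range(len(preds)):
        chunk = preds[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centered_mean(preds: Sequence[float], window: int = 3) -> List[float]:
    """Leaky smoothing: output i also averages preds[i+1 ..], i.e. raw
    predictions from later times. Looks harmless, but uses the future."""
    half = window // 2
    out = []
    for i in range(len(preds)):
        chunk = preds[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def uses_future(strategy: Strategy, preds: Sequence[float]) -> bool:
    """Truncation test: output i must be identical when every record after i
    is deleted. Returns True if any output changes. Exact comparison is fine
    because a causal strategy does the same arithmetic on both runs.
    False only means no leak was detected on this particular data."""
    full = strategy(preds)
    for i in range(len(preds)):
        if strategy(preds[: i + 1])[i] != full[i]:
            return True
    return False


p = [1.0, 2.0, 3.0, 4.0]
print(uses_future(trailing_mean, p))  # False
print(uses_future(centered_mean, p))  # True
```

The test re-runs the strategy once per record, so it is slow on a full file; running it on a short slice of the data can still catch this kind of bug before you nominate a strategy.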


If there is anything else that needs clarifying, ask below and we will add to the list


Good Luck




16 comments:

  1. Hi Phil, as I understand it, we have to find the entry/exit times, directions and pair name. The pair name our model finds will be one of the given 14 pairs, e.g. '0x_bitcoin', 'bitcoin_usdollar', 'bitcoincash_usdollar', etc.

    If my understanding above is correct, what does point 10 mean when it says the new files will have different pairs? By different pairs, do you mean brand-new pair names that are not among the given 14 pairs?

  2. Please read https://anotherdataminingblog.blogspot.com/2019/08/buy-low-sell-high.html

  3. You don't have to find the pair name - you have to provide entry and exit times for whatever pair name is given, based only on the predictions values for that pair name.

  4. At what point will teams be disqualified for using future values in predictions? The leaderboard now looks like it is already happening :)

    Replies
    1. Well, the thing is, every submission on the leaderboard is using future information, because the unseen pairs' returns occur in the same time period as the training data. There is no way to get around it.

    2. Good question. You can submit what you like to the leaderboard, but ultimately the 3 strategies you select to be evaluated should not contain future information. If they do - and we will know when we provide you with the new files - then the team will only be disqualified at that point.

      It is easy to include future information accidentally, so we urge you to make sure your code is rock solid. If any solution looks too good (compared with the benchmark), please double-check it.

    3. But it's not a coding issue; it's a data issue :(

    4. You can accidentally include the predictions from the future to make a trading decision and inadvertently get a very good score - for example if you were smoothing the predictions and got your indexes mixed up.

  5. Question: Point 9 says "We will then start with the leading team and invite them to run their code..." but then point 11 says "if the returns beat the Benchmark and we are confident no future information has been used then that team will be a winner".

    Does this suggest that if the private rankings look like this:

    Team A: 1200%
    Team B: 1000%
    Team C: 900%
    Team D: 800%

    Then, in the order A, B, C, D, they will run their code in round 2. If the returns on the new data look like:

    A: 60%
    B: 70%
    C: 55%
    D: 130%

    A would still win, just because they were picked first? Does this not reward the team that overfits their solution on the private data during round 1 so that they get first pick in round 2? I would think that team D should be the overall winner for having the highest return in a more real scenario?

    Replies
    1. Good question, and well spotted: that is a flaw in the rules. In this situation D would win. We will try to validate as many teams as possible that beat the benchmark, and the final ranking would then be based on the unseen data.

  6. Further, given that there is a benchmark for both longs and shorts, is 'beating the benchmark' determined by adding the returns of both of your strategies? What if you only have a short strategy? The same question applies to the overall leaderboard ranking: what if one team has a 600% long strategy and another team has a 400% long + 400% short strategy, with neither beating the benchmark?

  7. Longs and Shorts will be treated separately, so there will be a winner for Longs and a winner for Shorts.

  8. The 'clarification' above has now been edited. Hope it is clear now?

