This post has information about the 2019 Melbourne Datathon. Please read the previous post first.
The data is a single file and can be downloaded from the following links:
https://drive.google.com/open?id=11bWbg9kSmGXNBUOdGXv7MtWszTfVTUfR (553mb zipped)
https://drive.google.com/open?id=1WuFFk7NkBgYJxgDx8jeohwiLwwOMzLZP (1.5gb unzipped)
The column names should be self-explanatory. There are two price columns:
barClosePrice - this is the price at the current time stamp 'minutesSinceStart'. It is the price whose change the predictions (Lpred1b-Lpred14b) are predicting.
tradePrice - this is the price 5 minutes later. It is the price that will be used in the returns calculations, because we assume it takes 5 minutes from deciding to trade to actually executing the trade.
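To make the difference between the two columns concrete, here is a minimal Python (pandas) sketch on made-up numbers. It assumes one row per minute for a single pair, which the real file may not guarantee (it has a `gap` column), and it assumes a return is measured from one tradePrice to another. The `toy` frame and the `trade_return` helper are hypothetical and only for illustration.

```python
import pandas as pd

# Toy data for one pair, one row per minute (an assumption; the real file
# has a `gap` column, so its rows may not be evenly spaced).
toy = pd.DataFrame({
    "keys_pair": ["toy_pair"] * 8,
    "minutesSinceStart": list(range(8)),
    "barClosePrice": [100.0, 101.0, 102.0, 101.0, 100.0, 104.0, 105.0, 103.0],
})

# tradePrice is the price 5 minutes after the decision time. On evenly spaced
# minute bars that is the close price 5 rows ahead within the same pair.
toy["tradePrice"] = toy.groupby("keys_pair")["barClosePrice"].shift(-5)


def trade_return(frame, buy_minute, sell_minute):
    """Return from deciding to buy at buy_minute and to sell at sell_minute,
    with both trades filled at tradePrice (hypothetical helper)."""
    by_minute = frame.set_index("minutesSinceStart")["tradePrice"]
    return by_minute.loc[sell_minute] / by_minute.loc[buy_minute] - 1.0


print(toy)
print(trade_return(toy, buy_minute=0, sell_minute=2))
```

In this toy series barClosePrice rises between minutes 0 and 2, but the trades are filled 5 minutes later at prices that fall, so the return comes out negative. That is the effect the 5 minute execution delay is meant to capture.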
Below is some R code to get you going, and we'll be posting more in the coming days.
#--------------------------------------------------------------------------------------
# some R code to get you started with the 2019 Melbourne Datathon analytic challenge
# download the data from the link below and unzip to 'dataFolder'
# https://drive.google.com/file/d/11bWbg9kSmGXNBUOdGXv7MtWszTfVTUfR (553 mb)
#--------------------------------------------------------------------------------------
library(data.table)
#define where the data is
dataFolder <- "D:/buylowsellhigh/downloaded/"
theDataFile <- paste0(dataFolder,"melbdatathon2019_buylowsellhigh.csv")
#read in the data
dt <- fread(theDataFile)
nrow(dt)
#4,907,361
colnames(dt)
# [1] "keys_pair" "minutesSinceStart" "gap" "barClosePrice" "tradePrice" "Lpred1b" "Lpred2b" "Lpred3b" "Lpred4b"
#[10] "Lpred5b" "Lpred6b" "Lpred7b" "Lpred8b" "Lpred9b" "Lpred10b" "Lpred11b" "Lpred12b" "Lpred13b"
#[19] "Lpred14b"
unique(dt$keys_pair)
#[1] "0x_bitcoin" "bitcoin_usdollar" "bitcoincash_usdollar" "cardano_bitcoin" "dash_usdollar" "litecoin_bitcoin" "litecoin_tetherusd" "monero_bitcoin"
#[9] "qtum_bitcoin" "ripple_bitcoin" "ripple_usdollar" "stratis_bitcoin" "tron_tetherusd" "zcash_bitcoin" "pair_1" "pair_2"
#[17] "pair_3" "pair_4" "pair_5" "pair_6" "pair_7" "pair_8" "pair_9" "pair_10"
#[25] "pair_11" "pair_12" "pair_13" "pair_14" "pair_15" "pair_16" "pair_17" "pair_18"
#[33] "pair_19" "pair_20" "pair_21" "pair_22"
#price information is missing for the unnamed pairs
x1 <- subset(dt,keys_pair== "pair_1")
nrow(x1)
#132,721
summary(x1)
#distribution of the predictions
hist(dt$Lpred1b,breaks=100)
#time series of prices and predictions
x <- subset(dt,keys_pair== "bitcoin_usdollar")[1:1000]
plot(x$minutesSinceStart,x$barClosePrice,type='l',col="blue")
plot(x$minutesSinceStart,x$Lpred1b,type='l',col="blue")
Says above that data's coming on Wed 27th August - the next one of which is in 2025 :-)
Very observant! Date now edited.
Below is the Python code to get started (similar to the R code given in the post above).
# Anaconda Python 3.6 and above
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter notebook magic; remove the next line when running as a plain script
%matplotlib inline
# Assuming the file is stored at '../data/melbdatathon2019_buylowsellhigh.csv' relative to the notebook (adjust as needed)
datafile = "../data/melbdatathon2019_buylowsellhigh.csv"
#load data file
df = pd.read_csv(datafile)
#check the shape of the data frame
df.shape
# list column names
df.columns
# list unique key_pairs
print(f"unique key_pairs = {df['keys_pair'].unique().shape[0]}")
df['keys_pair'].unique()
#price information is missing for the unnamed pairs
filter_x1 = df.keys_pair == 'pair_1'
x1 = df[filter_x1]
x1.shape
# (132721, 19)
# summarise dataframe
x1.describe().transpose()
#distribution of the predictions
df.Lpred1b.hist(bins=100, figsize=(10, 6))
#time series of prices and predictions
filter_x2 = df.keys_pair == 'bitcoin_usdollar'
x = df[filter_x2].head(1000)
sns.lineplot(x='minutesSinceStart', y='barClosePrice', data=x)
plt.figure()  # start a new figure so the two series are not drawn on the same axes
sns.lineplot(x='minutesSinceStart', y='Lpred1b', data=x)
Better to run in a Python notebook. You can download it from my git repo:
https://github.com/dheepdatascigit/dsmdatathon2019help/blob/master/buylowsellhigh_startup01.ipynb
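One more check that may be handy once the file loads: the code comments say price information is missing for the unnamed pairs, and a per-pair count of missing values is a quick way to see that. The sketch below runs on a made-up stand-in frame and assumes missing prices are read in as NaN, which I have not verified against the real file. The names `toy`, `missing_share` and `pairs_without_prices` are just for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real file: one pair with prices, one pair without.
toy = pd.DataFrame({
    "keys_pair": ["named_pair"] * 3 + ["pair_1"] * 3,
    "barClosePrice": [1.0, 1.1, 1.2, np.nan, np.nan, np.nan],
    "tradePrice": [1.1, 1.2, 1.3, np.nan, np.nan, np.nan],
})

price_cols = ["barClosePrice", "tradePrice"]

# Fraction of missing values per pair, for each price column
missing_share = toy[price_cols].isna().groupby(toy["keys_pair"]).mean()
print(missing_share)

# Pairs where every price value is missing
all_missing = (missing_share == 1.0).all(axis=1)
pairs_without_prices = missing_share.index[all_missing].tolist()
print(pairs_without_prices)
```

On the real data, replacing `toy` with `df` from the startup code should show which pairs have no usable prices, provided the NaN assumption holds.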