Wow. I’m excited!
My first attempt to truly predict bitcoin price changes. Let’s see how it goes.
Following an article, I produced my very first chart:
The second example from the article resulted in:
It’s time to try the ARIMA model from statsmodels. The third code example produced another plot:
Finally, at least for the shampoo dataset, let’s make some predictions:
It’s time to bring in some real bitcoin data! First I need to export the last 100 transactions into a CSV file and try to plot some forecasts. Should be easy. An SQL query comes in handy:
select time, price from gdax_matches where product_id = 'BTC-USD' order by time desc limit 100 into outfile 'btc-usd-matches-100.csv' fields terminated by ',' enclosed by '' lines terminated by '\n';
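`INTO OUTFILE` writes only the data rows, with no column names, and `order by time desc` puts the newest trade first. A minimal sketch, assuming the file holds exactly the two columns `time` and `price` from the query above, that names the columns in pandas and restores chronological order instead of editing the file by hand; `load_matches` is a hypothetical helper name.

```python
import pandas as pd


def load_matches(path):
    """Load a headerless time,price export and return prices oldest-first."""
    # INTO OUTFILE writes no header row, so supply the (assumed) column names here
    frame = pd.read_csv(path, header=None, names=['time', 'price'], parse_dates=['time'])
    # the query sorted newest-first; forecasting needs the series oldest-first
    frame = frame.sort_values('time', kind='stable')
    return frame.set_index('time')['price']
```

For example, `X = load_matches('btc-usd-matches-100.csv').values` gives the same array the script below builds from a file with a hand-added header.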
I manually added a header row and executed the script, which resulted in this beautiful screenshot:
red – predicted,
blue – original values
Source code:
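from pandas import read_csv
from matplotlib import pyplot
# statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13; this is its replacement
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

# load the export: the first column (time) becomes the index, price becomes a Series
series = read_csv('btc-usd-matches-100.csv', header=0, parse_dates=[0], index_col=0).squeeze('columns')
# the SQL query returned the newest trade first; sort so the series runs oldest to newest
series = series.sort_index()
X = series.values

# first 66% of trades for training, the rest for testing
size = int(len(X) * 0.66)
train, test = X[0:size], X[size:len(X)]
history = [x for x in train]
predictions = list()

# walk-forward validation: refit, forecast one trade ahead, then add the real price
for t in range(len(test)):
    model = ARIMA(history, order=(5, 1, 0))
    model_fit = model.fit()
    output = model_fit.forecast()  # array holding a single one-step forecast
    yhat = output[0]
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)
    print('predicted=%f, expected=%f' % (yhat, obs))

error = mean_squared_error(test, predictions)
print('Test MSE: %.3f' % error)

# plot: blue - original values, red - predicted
pyplot.plot(test)
pyplot.plot(predictions, color='red')
pyplot.show()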
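With one-trade-ahead forecasts, a small test MSE on its own says little, because simply repeating the last known price is usually a strong forecast for tick data. A minimal sketch of that persistence baseline, using the same 66% split and walk-forward loop as the script above (`persistence_mse` is a hypothetical name, and this comparison is not part of the original article):

```python
def persistence_mse(values, train_fraction=0.66):
    """MSE of forecasting each test price as the previous observed price."""
    size = int(len(values) * train_fraction)  # same split as the ARIMA script
    last_known = values[size - 1]
    squared_errors = []
    for obs in values[size:]:
        squared_errors.append((obs - last_known) ** 2)
        last_known = obs  # walk forward: the real price becomes known
    return sum(squared_errors) / len(squared_errors)
```

Printing `persistence_mse(X)` next to the script's `Test MSE` line shows whether ARIMA(5,1,0) beats "no change".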
So that was a test on the last 100 transactions. Let’s try the last 1000 transactions. Here are the results:
And zoomed in:
But the shampoo example from the article first calculated autocorrelations. Let’s try that for the last 1000 bitcoin transactions:
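The autocorrelation chart can be drawn with pandas' `pandas.plotting.autocorrelation_plot`. A minimal numpy sketch of the same sample autocorrelation r(h) that function plots, so the values per lag can be printed instead of read off the chart; `autocorrelations` is a hypothetical helper name.

```python
import numpy as np


def autocorrelations(values, max_lag):
    """Sample autocorrelation r(h) for h = 0..max_lag, defined as in autocorrelation_plot."""
    data = np.asarray(values, dtype=float)
    n = len(data)
    mean = data.mean()
    c0 = np.sum((data - mean) ** 2) / n  # lag-0 autocovariance
    # r(h): autocovariance at lag h divided by the lag-0 autocovariance
    return [float(np.sum((data[: n - h] - mean) * (data[h:] - mean)) / n / c0)
            for h in range(max_lag + 1)]
```

On raw prices these values tend to decay slowly over many lags, which is typical of a non-stationary series.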
OK, it looks like the prediction script needs a change: an autoregressive lag order (p) of 100 instead of 5 in the ARIMA model. The new test run ended in a memory error. With 50 instead of 100 it is still calculating. I will post the results tomorrow.
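In the Box-Jenkins approach the AR order p is usually chosen from the partial autocorrelation (PACF) of the differenced series (d=1 here), not from the autocorrelation of raw prices, and a smaller p keeps every refit cheaper. A minimal numpy-only sketch, assuming the regression definition of the PACF (the lag-k value is the last coefficient when x_t is regressed on its k previous values) and the approximate 95% band of ±1.96/√n; `pacf_ols` and `significant_ar_lags` are hypothetical names. statsmodels also offers `statsmodels.tsa.stattools.pacf` and `plot_pacf` for the same job.

```python
import numpy as np


def pacf_ols(x, max_lag):
    """Partial autocorrelation for lags 0..max_lag via successive OLS regressions."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    result = [1.0]
    for k in range(1, max_lag + 1):
        y = x[k:]
        # design columns: intercept, x_{t-1}, ..., x_{t-k}
        lagged = [x[k - j: n - j] for j in range(1, k + 1)]
        design = np.column_stack([np.ones(n - k)] + lagged)
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        result.append(float(coef[-1]))  # coefficient on x_{t-k}
    return result


def significant_ar_lags(prices, max_lag=20):
    """Lags whose PACF on first differences falls outside +/-1.96/sqrt(n)."""
    diffs = np.diff(np.asarray(prices, dtype=float))  # d=1, as in order=(p, 1, 0)
    coeffs = pacf_ols(diffs, max_lag)
    bound = 1.96 / np.sqrt(len(diffs))
    return [lag for lag in range(1, max_lag + 1) if abs(coeffs[lag]) > bound]
```

For example, `significant_ar_lags(X, max_lag=50)` lists candidate lags; the largest one is a candidate for p.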
Oh, and by the way, there are over 2 million records in the database:
MariaDB [solocryptoprenuer]> select count(1) from gdax_matches;
+----------+
| count(1) |
+----------+
|  2084602 |
+----------+
1 row in set (6.05 sec)
Great! Sounds like a lot of training data.
Let me share some real historical trading data for the BTC-USD pair:
- btc-usd-matches-100.csv
- btc-usd-matches-1000.csv
- btc-usd-matches-10000.csv
- btc-usd-matches-100000.csv
- btc-usd-matches-897528.csv
Have fun.
Thanks for reading,
Łukasz.