Wow. I’m excited!

My first attempt to truly predict price changes on bitcoin. Let’s see how it goes.

Following an article, I produced my very first chart:

The second example from the article resulted in:

It’s time to try the ARIMA package. The third code example produced another plot:

Finally, at least for the shampoo dataset, let’s do some predictions:

It’s time to fetch some real bitcoin data! First I need to export the last 100 transactions into a CSV file and then try to plot some forecasts. Should be easy. An SQL query comes in handy:

    select time, price
    from gdax_matches
    where product_id = 'BTC-USD'
    order by time desc
    limit 100
    into outfile 'btc-usd-matches-100.csv'
    fields terminated by ',' enclosed by ''
    lines terminated by '\n';
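One thing worth noting about this export: because the query sorts by `time desc`, the rows in the file run newest-first, and the file has no header row. A minimal sketch of loading such a file and putting it back into chronological order, using only the standard library. The function name `load_matches`, the timestamp format, and the three sample rows are assumptions made up for illustration; the real `time` column may be formatted differently.

```python
import csv
import io
from datetime import datetime


def load_matches(handle, time_format="%Y-%m-%d %H:%M:%S"):
    """Read headerless `time,price` rows and return them oldest-first.

    `time_format` is an assumption about how the database wrote the
    timestamp; adjust it to match the actual export.
    """
    rows = []
    for record in csv.reader(handle):
        if not record:
            continue  # skip blank lines
        timestamp = datetime.strptime(record[0], time_format)
        rows.append((timestamp, float(record[1])))
    # The export is newest-first, so sort by time to get oldest-first.
    rows.sort(key=lambda row: row[0])
    return rows


# Illustrative rows only, in the same newest-first order as the export.
sample = io.StringIO(
    "2018-01-05 10:00:03,15010.50\n"
    "2018-01-05 10:00:02,15009.00\n"
    "2018-01-05 10:00:01,15008.25\n"
)
matches = load_matches(sample)
prices = [price for _, price in matches]
print(prices)
```

With a real file, `load_matches(open('btc-usd-matches-100.csv', newline=''))` would replace the in-memory sample.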

I manually added a header row, and executing the script resulted in this beautiful screenshot:

red – predicted,

blue – original values

Source code:

    from pandas import read_csv
    from matplotlib import pyplot
    # Written against the older statsmodels ARIMA API (statsmodels.tsa.arima_model);
    # newer releases provide statsmodels.tsa.arima.model.ARIMA instead.
    from statsmodels.tsa.arima_model import ARIMA
    from sklearn.metrics import mean_squared_error

    # Load the exported trades; the first column (time) becomes the index.
    series = read_csv('btc-usd-matches-100.csv', header=0, parse_dates=[0],
                      index_col=0, squeeze=True)
    X = series.values

    # Use the first 66% of the rows for training and the rest for testing.
    size = int(len(X) * 0.66)
    train, test = X[0:size], X[size:len(X)]
    history = [x for x in train]
    predictions = list()

    # Walk forward: refit on everything seen so far, forecast one step,
    # then add the real observation to the history.
    for t in range(len(test)):
        model = ARIMA(history, order=(5,1,0))
        model_fit = model.fit(disp=0)
        output = model_fit.forecast()
        yhat = output[0]
        predictions.append(yhat)
        obs = test[t]
        history.append(obs)
        print('predicted=%f, expected=%f' % (yhat, obs))

    error = mean_squared_error(test, predictions)
    print('Test MSE: %.3f' % error)

    # plot
    pyplot.plot(test)
    pyplot.plot(predictions, color='red')
    pyplot.show()
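A test MSE is hard to judge on its own, so it may help to compare it against a trivial baseline run through the same walk-forward loop. The sketch below is not part of the script above: it swaps ARIMA for a "persistence" forecast (the next price equals the last seen price), and the function name `walk_forward_persistence` and the toy price list are made up for illustration.

```python
def walk_forward_persistence(values, train_fraction=0.66):
    """Walk-forward validation with a naive 'last value' forecast.

    Uses the same 66/34 split as the ARIMA script, but predicts each
    test point as the most recent observation in the history.
    """
    size = int(len(values) * train_fraction)
    if size < 1 or size >= len(values):
        raise ValueError("need at least one training and one test value")
    history = list(values[:size])
    test = list(values[size:])
    predictions = []
    for observed in test:
        predictions.append(history[-1])  # forecast = last seen value
        history.append(observed)         # then reveal the real value
    mse = sum((p - o) ** 2 for p, o in zip(predictions, test)) / len(test)
    return predictions, mse


# Toy prices, purely illustrative.
toy_prices = [10.0, 11.0, 12.0, 11.0, 13.0, 12.0]
baseline_predictions, baseline_mse = walk_forward_persistence(toy_prices)
print(baseline_predictions, baseline_mse)
```

If the ARIMA test MSE is not clearly lower than this baseline on the same data, the red line is probably just trailing the blue one by a step.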

So that was a test on the last 100 transactions. Let’s try the last 1000 transactions. Here are the results:

with zoom-in:

But the shampoo example from the article first calculated autocorrelations. Let’s try that for the last 1000 bitcoin transactions:
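For reference, the autocorrelation at a single lag can also be computed by hand. A minimal sketch in plain Python, assuming the usual sample autocorrelation formula (covariance at the given lag divided by the variance of the series); the function name `autocorrelation` and the toy series are illustrative and do not come from the article.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given non-negative lag."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = sum(values) / n
    # Sum of squared deviations (proportional to the lag-0 covariance).
    denominator = sum((v - mean) ** 2 for v in values)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Sum of products of deviations that are `lag` steps apart.
    numerator = sum(
        (values[t] - mean) * (values[t + lag] - mean)
        for t in range(n - lag)
    )
    return numerator / denominator


# Toy series, purely illustrative.
toy_series = [1.0, 2.0, 3.0, 4.0, 5.0]
acf = [autocorrelation(toy_series, k) for k in range(3)]
print(acf)
```

Lags where this value stays well away from zero are the ones an autoregressive model might make use of, which is what the plot is meant to show at a glance.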

OK, it looks like the prediction script needs one change: a lag order of 100 instead of 5 in the ARIMA model. The new test ran into a memory error. With 50 instead of 100, it is still calculating. I will post the results tomorrow.

Oh, and by the way: there are over 2 million records in the database:

    MariaDB [solocryptoprenuer]> select count(1) from gdax_matches;
    +----------+
    | count(1) |
    +----------+
    |  2084602 |
    +----------+
    1 row in set (6.05 sec)

Great! Sounds like a lot of training data.

Let me share some real historical trading data for the BTC-USD pair:

- btc-usd-matches-100.csv
- btc-usd-matches-1000.csv
- btc-usd-matches-10000.csv
- btc-usd-matches-100000.csv
- btc-usd-matches-897528.csv

Have fun.

Thanks for reading,

Łukasz.