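This article is about predicting the Bitcoin price with time series forecasting. Time series forecasting differs considerably from other machine learning models because: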

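1. It is time dependent. The basic assumption of a linear regression model, that the observations are independent, therefore does not hold here.
2. Besides an increasing or decreasing trend, most time series show some form of seasonality, i.e. variations specific to a particular time frame.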
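The first point can be made concrete with a small sketch on synthetic data (not the Bitcoin dataset). It compares the lag-1 autocorrelation of independent noise with that of a random walk built from the same noise. The seed and series length are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only (not the Bitcoin dataset)
rng = np.random.default_rng(0)
noise = pd.Series(rng.normal(size=1000))  # independent observations
walk = noise.cumsum()                      # random walk: each value builds on the previous one

# Lag-1 autocorrelation: correlation between x_t and x_{t-1}
noise_ac = noise.autocorr(lag=1)
walk_ac = walk.autocorr(lag=1)
print(f"white noise lag-1 autocorrelation: {noise_ac:.3f}")
print(f"random walk lag-1 autocorrelation: {walk_ac:.3f}")
```

The noise gives a value near 0. The random walk gives a value near 1, so its observations are clearly not independent.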

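Simple machine learning models therefore cannot be applied directly, which is why time series forecasting is a separate field of study. In this article, the time series models AR, MA and ARIMA are used to predict the price of Bitcoin.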

The dataset contains Bitcoin opening and closing prices from April 2013 to August 2017.

Importing the necessary libraries

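import pandas as kunfu   # pandas (aliased as kunfu throughout this article)
import numpy as dragon   # numpy (aliased as dragon throughout this article)
import pylab as p
import matplotlib.pyplot as plot
from collections import Counter
import re
# importing packages for the prediction of time-series data
import statsmodels.api as sm
import statsmodels.tsa.api as smt
import statsmodels.formula.api as smf
# Legacy ARIMA class used in the AR, MA and ARIMA sections below
# (removed in statsmodels 0.13, so an older statsmodels release is needed)
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error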

Plotting the time series

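Load the data into the training DataFrame `train` and set the date as the index. Then plot the series with the date on the x-axis and the closing price on the y-axis.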

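# 'bitcoin_price.csv' is a hypothetical file name: point this at your copy of the dataset
train = kunfu.read_csv('bitcoin_price.csv', parse_dates=['Date'])
data = train['Close']
Date1 = train['Date']
train1 = train[['Date', 'Close']]
# Setting the Date as Index
train2 = train1.set_index('Date')
# Sort chronologically (the raw file may list the newest dates first)
train2.sort_index(inplace=True)
print(type(train2))
print(train2.head())
plot.plot(train2)
plot.xlabel('Date', fontsize=12)
plot.ylabel('Price in USD', fontsize=12)
plot.title("Closing price distribution of bitcoin", fontsize=15)
plot.show()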

Testing for stationarity

Augmented Dickey-Fuller test:

The Augmented Dickey-Fuller (ADF) test is a statistical test of the kind known as a unit root test.

The intuition behind a unit root test is that it determines how strongly a time series is defined by a trend.

The ADF test is one of the most widely used unit root tests.

1. Null Hypothesis (H0): the time series can be represented by a unit root, i.e. it is non-stationary.
2. Alternative Hypothesis (H1): the time series is stationary.

Interpreting the ADF result:

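1. p-value > 0.05: fail to reject the null hypothesis (H0). The data has a unit root and is non-stationary.
2. p-value <= 0.05: reject the null hypothesis (H0). The data is stationary.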

from statsmodels.tsa.stattools import adfuller

def test_stationarity(x):
    # Determine rolling statistics
    rolmean = x.rolling(window=22, center=False).mean()
    rolstd = x.rolling(window=12, center=False).std()
    # Plot rolling statistics
    plot.plot(x, color='blue', label='Original')
    plot.plot(rolmean, color='red', label='Rolling Mean')
    plot.plot(rolstd, color='black', label='Rolling Std')
    plot.legend(loc='best')
    plot.title('Rolling Mean & Standard Deviation')
    plot.show(block=False)
    # Perform the Augmented Dickey-Fuller test
    result = adfuller(x)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    pvalue = result[1]
    # Decide with the p-value rule described above (5% significance level)
    if pvalue <= 0.05:
        print("The series is stationary (reject H0)")
    else:
        print("The series is non-stationary (fail to reject H0)")
    print('Critical values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))

ts = train2['Close']
test_stationarity(ts)

Log-transforming the series

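The log transform is used to correct highly skewed data. It compresses large values and damps the growth in variance as the price rises, which helps the forecasting process.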

ts_log = dragon.log(ts)
plot.plot(ts_log, color="green")
plot.show()
test_stationarity(ts_log)
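Here is a small illustration of why the log helps, using made-up data. A series that grows by a constant 1% per step becomes a straight line in logs. Log-normally distributed values lose most of their right skew. The growth rate and distribution parameters are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical price path growing 1% per step: exponential in levels
t = np.arange(200)
price = pd.Series(100 * 1.01 ** t)
log_price = np.log(price)

# In logs the growth is linear: every step adds log(1.01)
step = log_price.diff().dropna()
print(step.round(6).unique())

# The log also reduces right skew, e.g. for log-normally distributed values
rng = np.random.default_rng(1)
skewed = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=5000))
print(f"skew before log: {skewed.skew():.2f}, after log: {np.log(skewed).skew():.2f}")
```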

Removing trend and seasonality with decomposition

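Decomposition is a technique in which the trend and seasonal components of the series are separated out and removed. The model is then applied to the residual series.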

# Naive decomposition of our time series as explained above.
# The log transform turns multiplicative structure into additive structure,
# so an additive model matches Observed = Trend + Seasonality + Residuals.
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(ts_log, model='additive', period=7)  # weekly cycle in daily data
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
plot.subplot(411)
plot.title('Observed = Trend + Seasonality + Residuals')
plot.plot(ts_log, label='Observed')
plot.legend(loc='best')
plot.subplot(412)
plot.plot(trend, label='Trend')
plot.legend(loc='best')
plot.subplot(413)
plot.plot(seasonal, label='Seasonality')
plot.legend(loc='best')
plot.subplot(414)
plot.plot(residual, label='Residuals')
plot.legend(loc='best')
plot.tight_layout()
plot.show()

Removing trend and seasonality with differencing

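To make the time series stationary, subtract the previous value from the current value (first differencing). This stabilizes the mean and so increases the stationarity of the series.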

ts_log_diff = ts_log - ts_log.shift()
plot.plot(ts_log_diff)
plot.show()

ts_log_diff.dropna(inplace=True)
test_stationarity(ts_log_diff)
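Here is a minimal sketch of first differencing on a synthetic random walk. `x - x.shift()` and pandas' `Series.diff()` compute the same thing. Differencing the walk gives back the stationary shocks it was built from. The seed and length are arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
shocks = pd.Series(rng.normal(size=300))
walk = shocks.cumsum()  # non-stationary: the level wanders

# First difference: current value minus previous value
diff_manual = walk - walk.shift()
diff_builtin = walk.diff()  # pandas shortcut for the same operation

# Differencing a random walk recovers the stationary shocks (the first value is NaN)
print(np.allclose(diff_manual.iloc[1:], shocks.iloc[1:]))
```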

Now that the differenced log series is stationary, we can apply time series forecasting models.

Autoregressive (AR) model

An autoregressive model is a time series forecasting model in which the current value depends on past values.

# AR(1): the differenced log price follows its own lag
model = ARIMA(ts_log, order=(1, 1, 0))
results_AR = model.fit(disp=-1)
plot.plot(ts_log_diff)
plot.plot(results_AR.fittedvalues, color='red')
plot.title('RSS: %.7f' % ((results_AR.fittedvalues - ts_log_diff) ** 2).sum())
plot.show()
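The AR(1) idea can be sketched without statsmodels. The code simulates x_t = c + phi * x_{t-1} + e_t and recovers c and phi by ordinary least squares of x_t on x_{t-1}. The values phi = 0.7 and c = 0.5 are arbitrary. This illustrates the model itself, not how statsmodels fits it internally.

```python
import numpy as np

# Simulate AR(1): x_t = c + phi * x_{t-1} + e_t  (phi and c chosen for illustration)
rng = np.random.default_rng(3)
phi_true, c_true, n = 0.7, 0.5, 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = c_true + phi_true * x[t - 1] + rng.normal()

# Estimate c and phi by regressing x_t on [1, x_{t-1}]
X = np.column_stack([np.ones(n - 1), x[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, x[1:], rcond=None)
print(f"c_hat = {c_hat:.3f}, phi_hat = {phi_hat:.3f}")

# One-step-ahead forecast from the last observation
next_value = c_hat + phi_hat * x[-1]
print(f"next value forecast: {next_value:.3f}")
```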

Moving average (MA) model

In a moving average model, the series depends on past error terms.

# MA(1): the differenced log price follows past error terms
model = ARIMA(ts_log, order=(0, 1, 1))
results_MA = model.fit(disp=-1)
plot.plot(ts_log_diff)
plot.plot(results_MA.fittedvalues, color='red')
plot.title('RSS: %.7f' % ((results_MA.fittedvalues - ts_log_diff) ** 2).sum())
plot.show()
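Here is a quick sketch of what "depends on past error terms" implies. For an MA(1) process x_t = mu + e_t + theta * e_{t-1}, the lag-1 autocorrelation is theta / (1 + theta^2), and autocorrelations beyond lag 1 are zero. The simulation below checks this numerically with theta = 0.6, an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Simulate MA(1): x_t = mu + e_t + theta * e_{t-1}
rng = np.random.default_rng(5)
theta, mu, n = 0.6, 0.0, 10000
e = rng.normal(size=n + 1)
x = pd.Series(mu + e[1:] + theta * e[:-1])

rho1_theory = theta / (1 + theta ** 2)
rho1 = x.autocorr(lag=1)
rho2 = x.autocorr(lag=2)  # MA(1) memory lasts one period, so this should be ~0
print(f"lag-1: {rho1:.3f} (theory {rho1_theory:.3f}), lag-2: {rho2:.3f}")
```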

Autoregressive integrated moving average (ARIMA) model

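ARIMA combines the AR and MA models. It makes the time series stationary itself by differencing, through its d parameter (the "integrated" part). The differencing therefore does not need to be done explicitly for an ARIMA model.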

from statsmodels.tsa.arima_model import ARIMA
# ARIMA(8,1,0): eight autoregressive lags on the once-differenced log price
model = ARIMA(ts_log, order=(8, 1, 0))
results_ARIMA = model.fit(disp=-1)
plot.plot(ts_log_diff)
plot.plot(results_ARIMA.fittedvalues, color='red')
plot.title('RSS: %.7f' % ((results_ARIMA.fittedvalues - ts_log_diff) ** 2).sum())
plot.show()
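The "integrated" part is what removes the need for explicit differencing. A hand-rolled one-step forecast for an ARIMA(1,1,0) shows how: the AR model runs on the differences, and the forecast difference is added back to the last level. `arima110_one_step` is a hypothetical helper, and c and phi are given rather than estimated.

```python
import numpy as np

def arima110_one_step(levels, c, phi):
    """One-step forecast for an ARIMA(1,1,0) with known (hypothetical) c and phi.

    The model is an AR(1) on the differences:
        d_t = c + phi * d_{t-1} + e_t,  where d_t = x_t - x_{t-1}
    so the level forecast adds the predicted difference back to the last level.
    """
    levels = np.asarray(levels, dtype=float)
    last_diff = levels[-1] - levels[-2]
    predicted_diff = c + phi * last_diff
    return levels[-1] + predicted_diff

print(arima110_one_step([10.0, 11.0, 13.0], c=0.0, phi=0.5))
```

Here the last difference is 2, the predicted difference is 0.5 * 2 = 1, and the level forecast is 13 + 1 = 14.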

# Walk-forward validation: hold out the last 100 days as the test set
size = int(len(ts_log) - 100)
train_arima, test_arima = ts_log[0:size], ts_log[size:len(ts_log)]
history = [x for x in train_arima]
predictions = list()
originals = list()
error_list = list()
print('Printing Predicted vs Expected Values...')
print('\n')
for t in range(len(test_arima)):
    # Refit ARIMA(2,1,0) on all data seen so far and forecast one day ahead
    model = ARIMA(history, order=(2, 1, 0))
    model_fit = model.fit(disp=-1)
    output = model_fit.forecast()
    pred_value = output[0][0]  # forecast() returns (forecast, stderr, conf_int)
    original_value = test_arima.iloc[t]
    history.append(original_value)
    # Undo the log transform to compare prices in USD
    pred_value = dragon.exp(pred_value)
    original_value = dragon.exp(original_value)
    # Calculate the absolute percentage error
    error = (abs(pred_value - original_value) / original_value) * 100
    error_list.append(error)
    print('predicted = %f, expected = %f, error = %f ' % (pred_value, original_value, error), '%')
    predictions.append(float(pred_value))
    originals.append(float(original_value))
print('\n Means Error in Predicting Test Case Articles : %f ' % (sum(error_list) / float(len(error_list))), '%')
plot.figure(figsize=(8, 6))
test_day = [t for t in range(len(test_arima))]
plot.plot(test_day, predictions, color='green', label='Predicted')
plot.plot(test_day, originals, color='orange', label='Original')
plot.title('Expected vs Predicted Closing Price')
plot.xlabel('Day')
plot.ylabel('Closing Price')
plot.legend()
plot.show()

predicted = 2513.745189, expected = 2564.060000, error = 1.962310 %
predicted = 2566.007269, expected = 2601.640000, error = 1.369626 %
predicted = 2604.348629, expected = 2601.990000, error = 0.090647 %
predicted = 2605.558976, expected = 2608.560000, error = 0.115045 %
predicted = 2613.835793, expected = 2518.660000, error = 3.778827 %
predicted = 2523.203681, expected = 2571.340000, error = 1.872032 %
predicted = 2580.654927, expected = 2518.440000, error = 2.470376 %
predicted = 2521.053567, expected = 2372.560000, error = 6.258791 %
predicted = 2379.066829, expected = 2337.790000, error = 1.765635 %
predicted = 2348.468544, expected = 2398.840000, error = 2.099826 %
predicted = 2405.299995, expected = 2357.900000, error = 2.010263 %
predicted = 2359.650935, expected = 2233.340000, error = 5.655697 %
predicted = 2239.002236, expected = 1998.860000, error = 12.013960 %
predicted = 2006.206534, expected = 1929.820000, error = 3.958221 %
predicted = 1942.244784, expected = 2228.410000, error = 12.841677 %
predicted = 2238.150016, expected = 2318.880000, error = 3.481421 %
predicted = 2307.325788, expected = 2273.430000, error = 1.490954 %
predicted = 2272.890197, expected = 2817.600000, error = 19.332404 %
predicted = 2829.051277, expected = 2667.760000, error = 6.045944 %
predicted = 2646.110662, expected = 2810.120000, error = 5.836382 %
predicted = 2822.356853, expected = 2730.400000, error = 3.367889 %
predicted = 2730.087031, expected = 2754.860000, error = 0.899246 %
predicted = 2763.766195, expected = 2576.480000, error = 7.269072 %
predicted = 2580.946838, expected = 2529.450000, error = 2.035891 %
predicted = 2541.493507, expected = 2671.780000, error = 4.876393 %
predicted = 2679.029936, expected = 2809.010000, error = 4.627255 %
predicted = 2808.092238, expected = 2726.450000, error = 2.994452 %
predicted = 2726.150588, expected = 2757.180000, error = 1.125404 %
predicted = 2766.298163, expected = 2875.340000, error = 3.792311 %

Means Error in Predicting Test Case Articles : 3.593133 %

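Thus, for these one-day-ahead forecasts, the mean absolute percentage error between the original and predicted closing prices is 3.59%.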
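The reported figure is a mean absolute percentage error (MAPE). A small sketch of the same calculation, applied to the first three predicted/expected pairs printed above, follows. `mape` is a hypothetical helper name.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(predicted - actual) / actual) * 100

# First three rows of the forecast output
predicted = [2513.745189, 2566.007269, 2604.348629]
expected = [2564.060000, 2601.640000, 2601.990000]
print(f"{mape(expected, predicted):.4f} %")
```

This averages the three per-row errors (1.962310, 1.369626 and 0.090647 percent). The article's 3.59% is the same average taken over the whole test set.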

Machine Learning: A Bitcoin Price Prediction Model Using Time Series Forecasting
