Forecasting the stock price of Tesla with cryptocurrencies (Fall 2021)
[Notice] Journey to the academic researcher This is the story of how I became the insightful researcher.
The subject of this report is whether the fluctuation of cryptocurrency can explain the change in the stock price of Tesla. Since Tesla announced that they accept cryptocurrency to buy their products Tesla and holds about 2 billion dollars, some financial analysts have claimed that the effect of the volatility of cryptocurrency is huge for the stock price of Tesla. Furthermore, As the stock price of Tesla has skyrocketed since the pandemic, Elon Musk, CEO of Tesla, has influenced not only the stock price but also the price of cryptocurrency especially the DOGE coin. We will use two methods for forecasting 1) the traditional approach: forecasting the stock price through VAR(Vector auto-regression with whole variables). 2) LASSO approach: forecasting it through VAR with LASSO eliminating the variables that have low coefficients.
Research purpose
Whether it is possible that the fluctuation of cryptocurrency explains the change of the stock price of Tesla and which approach can forecast the stock price better.
Methods of analysis
- Setting up variables The following 10 cryptocurrencies are exogenous variables based on the market value.
- Bitcoin, Ethereum, Binance, Cardano, Tether, Ripple, Solana, Kraken, USD Coin, and Dogecoin
In addition, we also control for the following variables:
i) Treasury bond yield: as an opportunity cost of the stock price, it has a negative effect on the stock price
ii) Fed Funds Rate: Higher rates indicate that the economy is heating up (more consumption) and vice-versa.
iii) Inflation (inflation expectation and 10-year break-even inflation): relating the purchasing power of customers and stimulating the increase of interest rates
iv) Trade Weighted Dollar Index: TWD can influence the export and import of products along with the fluctuation of stock price
v) Oil price: A higher oil price may cause people to switch to buy an electric car in which case, the price of Tesla’s company would rise. vi) Gold and Silver prices: As a safety asset, the prices of both silver and gold can have a negative relationship with the stock price
the independent variable is the stock price of Tesla
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.ar_model import AutoReg
import numpy as np
from sklearn.linear_model import ElasticNetCV, ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LassoCV
import warnings
warnings.simplefilter('ignore')
from sklearn import (linear_model, metrics, neural_network, pipeline, preprocessing, model_selection)
import seaborn as sns
sns.set()
import datetime as dt
from statsmodels.tsa.stattools import grangercausalitytests
dset = pd.read_csv('TSLA_ds_1022.csv')
dset = dset.drop(0)
dset = dset.set_index("Date")
dset.head()
TSLA | 10CMR | FFR | infexp | TWD | 10YBE | oil | XAU | XAG | BTC | ... | ETH_diff | BNB_diff | ADA_diff | Tether_diff | XRP_diff | SOL_diff | pDOTn_diff | USDC_diff | DOGE_diff | const | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | |||||||||||||||||||||
2016-01-05 | 44.69 | 2.25 | 0.36 | 1.81 | 114.2649 | 1.56 | 35.97 | 1077.66 | 13.98 | 431.8 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 2.0000 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
2016-01-06 | 43.81 | 2.18 | 0.36 | 1.80 | 114.6177 | 1.53 | 33.97 | 1094.45 | 14.01 | 428.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0000 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
2016-01-07 | 43.13 | 2.16 | 0.36 | 1.75 | 114.6517 | 1.50 | 33.29 | 1109.25 | 14.31 | 459.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0000 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
2016-01-08 | 42.20 | 2.13 | 0.36 | 1.73 | 115.0097 | 1.48 | 33.20 | 1104.24 | 13.95 | 454.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0000 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
2016-01-11 | 41.57 | 2.17 | 0.36 | 1.71 | 115.0141 | 1.45 | 31.42 | 1094.26 | 13.86 | 449.3 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 24.6883 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
5 rows × 39 columns
Analyzing the Granger Causality
In the result of Granger Causality analysis, all cryptocurrency variables have a probability greater than 5%. Therefore, the variables of cryptocurrency do not have a significant effect on the traditional approach.
maxlag=10
test = 'ssr_chi2test'
def grangers_causation_matrix(data, variables, test='ssr_chi2test', verbose=False):
"""Check Granger Causality of all possible combinations of the Time series.
The rows are the response variable, columns are predictors. The values in the table
are the P-Values. P-Values lesser than the significance level (0.05), implies
the Null Hypothesis that the coefficients of the corresponding past values is
zero, that is, the X does not cause Y can be rejected.
data : pandas dataframe containing the time series variables
variables : list containing names of the time series variables.
"""
df = pd.DataFrame(np.zeros((len(variables), len(variables))), columns=variables, index=variables)
for c in df.columns:
for r in df.index:
test_result = grangercausalitytests(data[[r, c]], maxlag=maxlag, verbose=False)
p_values = [round(test_result[i+1][0][test][1],4) for i in range(maxlag)]
if verbose: print(f'Y = {r}, X = {c}, P Values = {p_values}')
min_p_value = np.min(p_values)
df.loc[r, c] = min_p_value
df.columns = [var + '_x' for var in variables]
df.index = [var + '_y' for var in variables]
return df
Granger = dset.loc[:, "TSLA_diff":"DOGE_diff"]
grangers_causation_matrix(Granger, variables = Granger.columns)
TSLA_diff_x | 10CMR_diff_x | FFR_diff_x | infexp_diff_x | TWD_diff_x | 10YBE_diff_x | oil_diff_x | XAU_diff_x | XAG_diff_x | BTC_diff_x | ETH_diff_x | BNB_diff_x | ADA_diff_x | Tether_diff_x | XRP_diff_x | SOL_diff_x | pDOTn_diff_x | USDC_diff_x | DOGE_diff_x | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TSLA_diff_y | 1.0000 | 0.6571 | 0.1963 | 0.4127 | 0.2082 | 0.1380 | 0.8005 | 0.0007 | 0.0096 | 0.6192 | 0.4485 | 0.9632 | 0.5514 | 0.9211 | 0.3492 | 0.8981 | 0.7859 | 0.8515 | 0.8469 |
10CMR_diff_y | 0.0091 | 1.0000 | 0.0000 | 0.0000 | 0.0010 | 0.0000 | 0.0063 | 0.0000 | 0.0001 | 0.0000 | 0.0002 | 0.2052 | 0.0001 | 0.6290 | 0.0214 | 0.1545 | 0.0431 | 0.7412 | 0.4359 |
FFR_diff_y | 0.0879 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0005 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.4487 | 0.0000 | 0.9813 | 0.0001 | 0.5253 | 0.8308 | 0.8973 | 0.8941 |
infexp_diff_y | 0.4964 | 0.0000 | 0.0000 | 1.0000 | 0.0002 | 0.0000 | 0.0000 | 0.0067 | 0.0000 | 0.0000 | 0.0000 | 0.3346 | 0.0000 | 0.0065 | 0.1274 | 0.3326 | 0.5260 | 0.4813 | 0.6607 |
TWD_diff_y | 0.4741 | 0.0009 | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.2020 | 0.0000 | 0.0000 | 0.0007 | 0.0000 | 0.6265 | 0.0104 | 0.3737 | 0.3939 | 0.6338 | 0.3210 | 0.2664 | 0.8202 |
10YBE_diff_y | 0.7924 | 0.0000 | 0.0000 | 0.0000 | 0.0154 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.7873 | 0.0000 | 0.8048 | 0.0012 | 0.4042 | 0.3702 | 0.5009 | 0.8122 |
oil_diff_y | 0.6599 | 0.0000 | 0.0107 | 0.0000 | 0.0259 | 0.0000 | 1.0000 | 0.0001 | 0.2883 | 0.4544 | 0.3266 | 0.6329 | 0.7146 | 0.9673 | 0.5308 | 0.9539 | 0.7794 | 0.8809 | 0.9600 |
XAU_diff_y | 0.0261 | 0.0000 | 0.0000 | 0.0390 | 0.5379 | 0.0000 | 0.0409 | 1.0000 | 0.0050 | 0.0009 | 0.0118 | 0.1037 | 0.3502 | 0.5957 | 0.5645 | 0.4666 | 0.7449 | 0.4006 | 0.8543 |
XAG_diff_y | 0.3175 | 0.0000 | 0.0082 | 0.0337 | 0.1683 | 0.0000 | 0.5411 | 0.0126 | 1.0000 | 0.0051 | 0.0107 | 0.5948 | 0.0260 | 0.4045 | 0.4021 | 0.0001 | 0.3679 | 0.6546 | 0.4319 |
BTC_diff_y | 0.7210 | 0.0136 | 0.0491 | 0.3187 | 0.0040 | 0.0734 | 0.4174 | 0.0213 | 0.1035 | 1.0000 | 0.0116 | 0.2759 | 0.0452 | 0.6021 | 0.0607 | 0.1768 | 0.5764 | 0.4664 | 0.7431 |
ETH_diff_y | 0.0000 | 0.0363 | 0.0885 | 0.1287 | 0.0091 | 0.0976 | 0.3597 | 0.4238 | 0.6921 | 0.0062 | 1.0000 | 0.3661 | 0.1468 | 0.3435 | 0.0898 | 0.7666 | 0.4659 | 0.7766 | 0.6625 |
BNB_diff_y | 0.8345 | 0.7036 | 0.3885 | 0.4279 | 0.7411 | 0.6453 | 0.9371 | 0.6083 | 0.8366 | 0.3796 | 0.5876 | 1.0000 | 0.1206 | 0.6467 | 0.0745 | 0.0412 | 0.4296 | 0.3453 | 0.6194 |
ADA_diff_y | 0.7034 | 0.0912 | 0.3040 | 0.0432 | 0.5556 | 0.0284 | 0.3320 | 0.0443 | 0.4985 | 0.0004 | 0.0477 | 0.0245 | 1.0000 | 0.5514 | 0.0004 | 0.0650 | 0.1236 | 0.4002 | 0.2660 |
Tether_diff_y | 0.9104 | 0.8093 | 0.9815 | 0.7463 | 0.0559 | 0.6725 | 0.9786 | 0.2917 | 0.5475 | 0.0700 | 0.1434 | 0.0008 | 0.0003 | 1.0000 | 0.0000 | 0.9261 | 0.9519 | 0.7658 | 0.9077 |
XRP_diff_y | 0.6536 | 0.5104 | 0.2915 | 0.2571 | 0.3972 | 0.2159 | 0.8442 | 0.2406 | 0.6087 | 0.0201 | 0.0903 | 0.0270 | 0.0012 | 0.1630 | 1.0000 | 0.5952 | 0.2933 | 0.6438 | 0.5779 |
SOL_diff_y | 0.7142 | 0.0926 | 0.4007 | 0.3217 | 0.7922 | 0.4763 | 0.9589 | 0.1414 | 0.0322 | 0.3276 | 0.2857 | 0.3227 | 0.0004 | 0.8986 | 0.0000 | 1.0000 | 0.0000 | 0.9475 | 0.9911 |
pDOTn_diff_y | 0.8101 | 0.1768 | 0.4263 | 0.8178 | 0.1299 | 0.5545 | 0.9439 | 0.0013 | 0.0000 | 0.2988 | 0.1380 | 0.7665 | 0.3767 | 0.9649 | 0.0016 | 0.0024 | 1.0000 | 0.9831 | 0.9593 |
USDC_diff_y | 0.8319 | 0.4363 | 0.9091 | 0.5988 | 0.2546 | 0.5857 | 0.9543 | 0.4587 | 0.5843 | 0.0244 | 0.2662 | 0.1003 | 0.0029 | 0.8840 | 0.6250 | 0.9224 | 0.9979 | 1.0000 | 0.9930 |
DOGE_diff_y | 0.7456 | 0.6685 | 0.9377 | 0.7775 | 0.8012 | 0.6029 | 0.9161 | 0.5163 | 0.5112 | 0.3599 | 0.2325 | 0.4398 | 0.3318 | 0.9068 | 0.5919 | 0.9506 | 0.9652 | 0.9761 | 1.0000 |
Confirm training period and test period
We created three time periods of the training set. The first set starts with the beginning of 2016 when cryptocurrencies became famous. The second set starts with the beginning of 2017 when cryptocurrency skyrocketed the first time. The third set starts with the beginning of 2020 when both Tesla and cryptocurrencies skyrocketed. Because the third set only represents that LASSO does not eliminate the coefficient of cryptocurrencies and we realized that the current relationship between the stock price of Tesla and the fluctuations of cryptocurrencies is stronger than before, we decide to choose the Third set as a training set. Furthermore, we confirmed that the test period from Jul-01-2021 to Oct-18-2021.
TSLA = dset["TSLA_diff"]
Exogen = dset.loc[:, "10CMR_diff":"DOGE_diff"]
TSLA_OLS = sm.OLS(TSLA, Exogen)
result_OLS = TSLA_OLS.fit()
names = dset.columns.values[1:19]
plt.figure(figsize=(15,5))
plt.bar(names,result_OLS.params[0:])
plt.title('Relationship 2016-01-01 ~ 2021-10-19')
plt.xlabel('Intercept and Coefficients')
plt.ylabel('OLS Estimates');
print(result_OLS.params[0:])
10CMR_diff 0.014131
FFR_diff 0.092686
infexp_diff 0.087457
TWD_diff -1.309505
10YBE_diff 0.015982
oil_diff -0.011928
XAU_diff -1.087257
XAG_diff 0.463873
BTC_diff 0.020326
ETH_diff 0.017720
BNB_diff -0.004339
ADA_diff 0.013301
Tether_diff -0.002773
XRP_diff -0.020046
SOL_diff 0.026030
pDOTn_diff 0.036460
USDC_diff 0.026032
DOGE_diff 0.000118
dtype: float64
TSLA_EN = LassoCV(cv=10, random_state=1)
TSLA_EN.fit(Exogen, TSLA)
plt.figure(figsize=(15,5))
plt.bar(names, TSLA_EN.coef_)
plt.title('Relationship 2016-01-01 ~ 2021-10-19')
plt.xlabel('Intercept and Coefficients')
plt.ylabel('Lasso Estimates');
print(TSLA_EN.coef_)
[ 0. 0.05427374 0. -0. 0. -0.
-0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. ]
TSLA = dset.loc["2017-01-01": ,"TSLA_diff"]
Exogen = dset.loc["2017-01-01": , "10CMR_diff":"DOGE_diff"]
TSLA_OLS = sm.OLS(TSLA, Exogen)
result_OLS = TSLA_OLS.fit()
names = dset.columns.values[1:19]
plt.figure(figsize=(15,5))
plt.bar(names,result_OLS.params[0:])
plt.title('Relationship 2017-01-01 ~ 2021-10-19')
plt.xlabel('Intercept and Coefficients')
plt.ylabel('OLS Estimates');
print(result_OLS.params[0:])
10CMR_diff 0.080909
FFR_diff 0.018976
infexp_diff -0.210880
TWD_diff -1.256933
10YBE_diff 0.298315
oil_diff -0.006214
XAU_diff -0.544770
XAG_diff 0.468780
BTC_diff 0.009931
ETH_diff 0.021716
BNB_diff -0.005463
ADA_diff -0.009273
Tether_diff 0.006178
XRP_diff 0.017648
SOL_diff 0.021012
pDOTn_diff 0.026336
USDC_diff 0.027537
DOGE_diff 0.000150
dtype: float64
TSLA_EN = LassoCV(cv=10, random_state=1)
TSLA_EN.fit(Exogen, TSLA)
plt.figure(figsize=(15,5))
plt.bar(names, TSLA_EN.coef_)
plt.title('Relationship 2017-01-01 ~ 2021-10-19')
plt.xlabel('Intercept and Coefficients')
plt.ylabel('Lasso Estimates');
print(TSLA_EN.coef_)
[ 0.04315125 0.03434962 0. -0. 0.06338159 -0.00461966
0. 0.05125903 0. 0.02463747 -0. 0.
0. 0.0127104 0.02291635 0.00664378 0. 0.00013601]
TSLA = dset.loc["2020-01-02": ,"TSLA_diff"]
Exogen = dset.loc["2020-01-02": , "10CMR_diff":"DOGE_diff"]
TSLA_OLS = sm.OLS(TSLA, Exogen)
result_OLS = TSLA_OLS.fit()
names = dset.columns.values[1:19]
plt.figure(figsize=(15,5))
plt.bar(names,result_OLS.params[0:])
plt.title('Relationship 2020-01-02 ~ 2021-10-19')
plt.xlabel('Intercept and Coefficients')
plt.ylabel('OLS Estimates');
print(result_OLS.params[0:])
10CMR_diff 0.068849
FFR_diff 0.017995
infexp_diff -0.398306
TWD_diff -2.191456
10YBE_diff 0.380097
oil_diff -0.006267
XAU_diff -0.312150
XAG_diff 0.398949
BTC_diff 0.071159
ETH_diff 0.036119
BNB_diff -0.023011
ADA_diff -0.021280
Tether_diff 3.290954
XRP_diff 0.046905
SOL_diff 0.012452
pDOTn_diff 0.000428
USDC_diff 2.495569
DOGE_diff 0.000170
dtype: float64
TSLA_EN = LassoCV(cv=10, random_state=1)
TSLA_EN.fit(Exogen, TSLA)
plt.figure(figsize=(15,5))
plt.bar(names, TSLA_EN.coef_)
plt.title('Relationship 2020-01-02 ~ 2021-10-19')
plt.xlabel('Intercept and Coefficients')
plt.ylabel('Lasso Estimates');
print(TSLA_EN.coef_)
[ 0.00260532 0.04136176 0. -0. 0.00862294 -0.00686096
0. 0. 0.00785282 0.07127208 -0. 0.
0. 0.03075644 0.01217944 0. -0. 0.00015222]
"""alphas = [0.0001, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 1]
for a in alphas:
model = ElasticNet(alpha=a).fit(Exogen,TSLA)
score = model.score(Exogen,TSLA)
pred_y = model.predict(Exogen)
mse = mean_squared_error(TSLA, pred_y)
print("Alpha:{0:.4f}, R2:{1:.2f}, MSE:{2:.2f}, RMSE:{3:.2f}"
.format(a, score, mse, np.sqrt(mse))) """
'alphas = [0.0001, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 1]\nfor a in alphas:\n model = ElasticNet(alpha=a).fit(Exogen,TSLA) \n score = model.score(Exogen,TSLA)\n pred_y = model.predict(Exogen)\n mse = mean_squared_error(TSLA, pred_y)\n print("Alpha:{0:.4f}, R2:{1:.2f}, MSE:{2:.2f}, RMSE:{3:.2f}"\n .format(a, score, mse, np.sqrt(mse))) '
from statsmodels.tsa.vector_ar.var_model import VAR
dset1 = pd.read_csv('TSLA_ds_1022.csv')
dset1 = dset1.drop(0)
dset1 = dset1.set_index("Date")
names1 = ["TSLA_diff", "10CMR_diff", "FFR_diff", "infexp_diff", "TWD_diff", "10YBE_diff",\
"oil_diff", "XAU_diff", "XAG_diff", "BTC_diff", "ETH_diff", "BNB_diff",\
"ADA_diff", "Tether_diff", "XRP_diff", "SOL_diff", "pDOTn_diff", "USDC_diff", "DOGE_diff"]
names2 = ["10CMR_diff", "FFR_diff", "infexp_diff", "TWD_diff", "10YBE_diff",\
"oil_diff", "XAU_diff", "XAG_diff", "BTC_diff", "ETH_diff", "BNB_diff",\
"ADA_diff", "Tether_diff", "XRP_diff", "SOL_diff", "pDOTn_diff", "USDC_diff", "DOGE_diff"]
dset2 = dset1.loc[:,names1]
#data_t = dset2.loc["2016-01-04":"2021-06-21", :]
#col_mask=dset2.isnull().any(axis=0)
#row_mask=dset2.isnull().any(axis=1)
#print(dset2.loc[row_mask,col_mask])
#dset2.fillna(dset2.mean(), inplace=True)
#dset2 = dset2.fillna(dset2.mean())
#dset2._is_view
start_date = "2021-06-21"
end_date = "2021-10-19"
dset2.index.names = ["Date"]
dset2.index = pd.to_datetime(dset.index)
dset2.to_period("D")
data_train = dset2.loc["2016-01-05":"2021-06-21", :]
var_train = VAR(data_train)
results = var_train.fit(25)
lag_order = results.k_ar
forecasted = pd.DataFrame(results.forecast(data_train.values[-lag_order:], 120)) # Forecast 120 months
# Rename forecasted columns
forecasted_names = list(forecasted.columns.values)
data_train_names = list(data_train.columns.values)
var_dict = dict(zip(forecasted_names, data_train_names))
for f,t in var_dict.items():
forecasted = forecasted.rename(columns={f:t + "_fcast"})
forecasted.index= pd.DatetimeIndex(pd.date_range(start_date, periods=forecasted.shape[0]))
forecasted.index.names = ["Date"]
# Parse together forecasted data with original dataset
final_data = pd.merge(forecasted, dset2, left_index=True, right_index=True)
final_data = final_data.sort_index(axis=0, ascending=True)
final_data = pd.concat([data_train, final_data], sort=True, axis=0)
final_data = final_data.sort_index(axis=0, ascending=True)
TSLA_fs = final_data.loc["2021-06-22":"2021-10-18","TSLA_diff_fcast"]
TSLA_r = final_data.loc["2021-06-22":"2021-10-18","TSLA_diff"]
print(TSLA_fs)
var_mse1 = metrics.mean_squared_error(TSLA_fs, TSLA_r)
Date
2021-06-22 0.101006
2021-06-23 5.433423
2021-06-24 -7.929520
2021-06-25 0.569414
2021-06-28 2.115058
...
2021-10-12 -0.075105
2021-10-13 0.118774
2021-10-14 0.335843
2021-10-15 -0.108588
2021-10-18 0.121682
Name: TSLA_diff_fcast, Length: 83, dtype: float64
print(f"The mean squared error between the forecasted and actual values is {var_mse1}")
The mean squared error between the forecasted and actual values is 17.311018428934627
Results of forecasting
The MSE(Mean squared error) of forecasting the stock price through VAR(Vector auto -regression) with the traditional model is 17.3110. The MSE of forecasting the stock price through VAR with the LASSO model is 10.6772.
fig, ax = plt.subplots(figsize=(14,6))
colors = sns.color_palette("deep", 8)
TSLA_rplot = final_data.loc["2021-01-02":"2021-10-18","TSLA_diff"]
TSLA_fs.plot(ax=ax, legend=True, linewidth=2.5, linestyle="dashed")
TSLA_rplot.plot(ax=ax, legend=True, alpha=0.6, linestyle="solid")
ax.set_title("VAR in-sample forecast, traditional approach", fontsize=16, fontweight="bold", fontname="Verdana", loc="left")
ax.set_ylabel("First differences", fontname="Verdana")
ax.legend([f"VAR Forecast, MSE={var_mse1}", "TSLA Real Fluctuations"])
<matplotlib.legend.Legend at 0x272c5e19550>
def train_test_plot(model, X_train, X_test):
"""
This will plot the actual values of CPI against the one fitted by the model
We train the model until 2009 and then use it from 2009 onwards on the test features dataset
"""
fig, ax = plt.subplots(figsize=(12,4))
colors = sns.color_palette("deep", 8)
yvalues = pd.DataFrame(y_test)
forecasted = list(model.predict(X_test)) # Use the model fit on features data from 2009 onwards
df_fcast = pd.DataFrame({"date": list(yvalues.index), "TSLA_fcast": forecasted})
df_fcast = df_fcast.set_index("date")
df = pd.merge(yvalues, df_fcast, left_index=True, right_index=True)
df["TSLA_fcast"].plot(ax=ax, legend=True, linewidth=2.5, linestyle="dashed", color="forestgreen") # TSLA fitted
df["TSLA_diff"].plot(ax=ax, legend=True, linewidth=1.5, linestyle="solid", color="salmon") # Actual TSLA values
ax.set_title("TSLA vs. Model's TSLA")
ax.set_ylabel("First differences")
ax.legend(["Fitted TSLA","Actual TSLA"])
x_train = dset2.loc["2020-01-05":"2021-06-21", names2]
x_test = dset2.loc["2021-06-21":,names2]
y_train = dset2.loc["2020-01-05":"2021-06-21", "TSLA_diff"]
y_test = dset2.loc["2021-06-21":, "TSLA_diff"]
lasso = linear_model.LassoCV(cv=model_selection.TimeSeriesSplit(n_splits=5),
alphas=None, tol = 10000, normalize=True)
fred_lasso = lasso.fit(x_train, y_train)
optimal_alpha = fred_lasso.alpha_
lasso2 = linear_model.Lasso(alpha=optimal_alpha, normalize=True)
lasso2.fit(x_train, y_train)
lasso2.coef_
#train_test_plot(dset_EN, x_train, x_test)
array([ 7.20902440e-02, 2.00701734e-02, -4.02847424e-01, -2.21122815e+00,
3.77903319e-01, -6.91098494e-03, -3.34285652e-01, 4.24181205e-01,
8.54914242e-02, 3.52091625e-02, -2.50758876e-02, -3.56472136e-02,
3.52150559e+00, 6.06896275e-02, 8.61617708e-03, 7.33200359e-03,
3.14226828e+00, 2.28882758e-04])
fig, ax = plt.subplots(figsize=(12,4))
colors = sns.color_palette("deep", 8)
yvalues = pd.DataFrame(y_test)
forecasted = list(lasso2.predict(x_test)) # Use the model fit on features data from 2009 onwards
df_fcast = pd.DataFrame({"date": list(yvalues.index), "TSLA_fcast": forecasted})
df_fcast = df_fcast.set_index("date")
df = pd.merge(yvalues, df_fcast, left_index=True, right_index=True)
df["TSLA_fcast"].plot(ax=ax, legend=True, linewidth=2.5, linestyle="dashed", color="forestgreen") # TSLA fitted
df["TSLA_diff"].plot(ax=ax, legend=True, linewidth=1.5, linestyle="solid", color="salmon") # Actual TSLA values
ax.set_title("TSLA vs. Model's TSLA")
ax.set_ylabel("First differences")
ax.legend(["Fitted TSLA","Actual TSLA"])
<matplotlib.legend.Legend at 0x272c414f5e0>
metrics.mean_squared_error(y_test, dset_EN.predict(x_test))
3.8741380753790273
lasso_coefs = pd.DataFrame({"features":list(x_train), "coef": lasso2.coef_})
lasso_coefs = lasso_coefs[lasso_coefs.coef != 0.0]
lasso_coefs.sort_values("coef", ascending=False)
features | coef | |
---|---|---|
12 | Tether_diff | 3.521506 |
16 | USDC_diff | 3.142268 |
7 | XAG_diff | 0.424181 |
4 | 10YBE_diff | 0.377903 |
8 | BTC_diff | 0.085491 |
0 | 10CMR_diff | 0.072090 |
13 | XRP_diff | 0.060690 |
9 | ETH_diff | 0.035209 |
1 | FFR_diff | 0.020070 |
14 | SOL_diff | 0.008616 |
15 | pDOTn_diff | 0.007332 |
17 | DOGE_diff | 0.000229 |
5 | oil_diff | -0.006911 |
10 | BNB_diff | -0.025076 |
11 | ADA_diff | -0.035647 |
6 | XAU_diff | -0.334286 |
2 | infexp_diff | -0.402847 |
3 | TWD_diff | -2.211228 |
names3 = ["TSLA_diff","FFR_diff", "ETH_diff", "XRP_diff",\
"10YBE_diff", "BTC_diff", "SOL_diff"]
dset3 = dset1.loc[:,names3]
#mse2, df2 = var_create(columns=names, data=dset)
#print(f"The mean squared error between the forecasted and actual values is {mse2}")
start_date = "2021-06-21"
end_date = "2021-10-19"
dset3.index.names = ["Date"]
dset3.index = pd.to_datetime(dset.index)
dset3.to_period("D")
data_train = dset3.loc["2020-01-02":"2021-06-21", :]
var_train = VAR(data_train)
results = var_train.fit(25)
lag_order = results.k_ar
forecasted = pd.DataFrame(results.forecast(data_train.values[-lag_order:], 120)) # Forecast 120 months
# Rename forecasted columns
forecasted_names = list(forecasted.columns.values)
data_train_names = list(data_train.columns.values)
var_dict = dict(zip(forecasted_names, data_train_names))
for f,t in var_dict.items():
forecasted = forecasted.rename(columns={f:t + "_fcast"})
forecasted.index= pd.DatetimeIndex(pd.date_range(start_date, periods=forecasted.shape[0]))
forecasted.index.names = ["Date"]
# Parse together forecasted data with original dataset
final_data = pd.merge(forecasted, dset3, left_index=True, right_index=True)
final_data = final_data.sort_index(axis=0, ascending=True)
final_data = pd.concat([data_train, final_data], sort=True, axis=0)
final_data = final_data.sort_index(axis=0, ascending=True)
TSLA_fs = final_data.loc["2021-06-22":"2021-10-18","TSLA_diff_fcast"]
TSLA_r = final_data.loc["2021-06-22":"2021-10-18","TSLA_diff"]
print(TSLA_fs)
var_mse2 = metrics.mean_squared_error(TSLA_fs, TSLA_r)
Date
2021-06-22 9.055724
2021-06-23 12.795725
2021-06-24 0.255226
2021-06-25 -4.068157
2021-06-28 -1.563721
...
2021-10-12 -0.436792
2021-10-13 0.067200
2021-10-14 -0.005382
2021-10-15 0.128331
2021-10-18 -0.751468
Name: TSLA_diff_fcast, Length: 83, dtype: float64
fig, ax = plt.subplots(figsize=(14,6))
colors = sns.color_palette("deep", 8)
TSLA_rplot = final_data.loc["2021-01-02":"2021-10-18","TSLA_diff"]
TSLA_fs.plot(ax=ax, legend=True, linewidth=2.5, linestyle="dashed")
TSLA_rplot.plot(ax=ax, legend=True, alpha=0.6, linestyle="solid")
ax.set_title("VAR in-sample forecast, LASSO approach", fontsize=16, fontweight="bold", fontname="Verdana", loc="left")
ax.set_ylabel("First differences", fontname="Verdana")
ax.legend([f"VAR Forecast, MSE={var_mse2}", "TSLA Real Fluctuations"])
<matplotlib.legend.Legend at 0x272c71231f0>
Result and inference
With the traditional approach, we can not find any significant coefficient relationship between the stock price of Tesla and the price of cryptocurrencies. However, with the LASSO, we figure out that even the coefficients of cryptocurrencies do not have statistical significance, the fluctuations of cryptocurrencies can support to explain more the change of stock price of Tesla because the MSE with the LASSO is lower than that with the traditional approach. Furthermore, we can confirm that even though the change of price of cryptocurrencies did not affect the fluctuations of the stock price of Tesla, nowadays the effect has become larger than before.