How Covid Shifted Macroeconomic Factors Affecting Unemployment: what makes an economy resilient (Fall 2021)
ABSTRACT
This research examines how Covid shifted the significant macroeconomic factors affecting unemployment by comparing the statistical results of two cross-sectional multivariable regression models, one for 2019 and one for 2020. To maximize the validity of each variable, significance tests were conducted on several unit specifications of the macroeconomic variables in each cross-sectional model. In addition, the research investigated how many variables generate the highest adjusted R-squared using a stepwise regression method, which removes the least significant variables to improve the efficiency of the models. The models were checked against the Gauss-Markov assumptions, indicating that their results are not biased. The analysis provides three economic insights. Overall, the higher R-squared of the 2020 model implies that demand-deficient and structural unemployment were the main types of unemployment after the outbreak of Covid. Demographically, countries with large rural populations coped with unemployment better than others, since Covid had a relatively smaller impact on rural areas. Industrially, countries highly dependent on the service industry were vulnerable to Covid's effect on unemployment, because the service industry is more sensitive to economic cycles than other industries.
1. Introduction
The unemployment rate is a key indicator of a country's economic health. Countries with high unemployment rates cannot achieve potential real GDP growth, the ideal growth rate under the natural rate of unemployment. Unemployment is also a substantial issue for individuals: because many people rely on income from their employment, their daily lives are endangered when high unemployment persists. As a result, together with the inflation rate, the unemployment rate is a main component of the 'Misery Index,' which measures the degree of economic distress felt by individuals (Clay Halton, 2022). The unemployment rate is defined as the share of the economically active population who are looking for a job but have not been able to find one. Because most job seekers come from low- or middle-income households, high unemployment deepens economic polarization (Martin and Alicia, 2000). The wealth disparity generated by unemployment has triggered various social issues, such as discriminatory abuse, increased homelessness, and elevated anxiety about social safety. Covid-19 accelerated this disparity further: wealthy people took advantage of inflated asset markets, while the middle and lower classes lost jobs and suffered from the disease without proper treatment. The problem will worsen unless broad social cooperation is achieved. A profound understanding of the causes of unemployment is therefore required to resolve unemployment problems effectively and efficiently. However, the unemployment rate is affected by various factors such as cultural background, economic cycles, and technological change, so policies aimed at decreasing unemployment should reflect how those factors shift. Moreover, the world has experienced the Covid pandemic, one of the most influential events in human history. Covid explicitly influenced unemployment through massive quarantines and lay-offs; during the Covid downturn, millions lost their lives and their jobs.
Individuals who could not work remotely lost their jobs or were exposed to a high risk of contracting Covid. Many businesses and stores also closed because they could not sell products or services in person, and even the companies that survived had to lay off employees to cut expenses. The Covid recession has been compared to the Great Depression of the 1930s (David Wheelock, 2020). Recent unemployment therefore cannot be thoroughly explained without understanding the unprecedented impacts of Covid on the economy. In this research, to analyze the shift in the significant variables affecting the unemployment rate, I compare the statistical results of two multivariable regression models for 2019 and 2020. Both regression models are adjusted to satisfy the Gauss-Markov assumptions. The results of the regression models show that the factors affecting the unemployment rate shifted after the outbreak of Covid.
import pandas as pd
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
dset = pd.read_csv('sel_dataset.csv')
dset  # display the dataset: one row per country, with 2019 ('19') and 2020 ('20') columns side by side
| | Country Name | Country Code | unemp19 | unemp20 | GDP19 | GDP20 | pGDP19 | pGDP20 | lGDP19 | lGDP20 | ... | lpmanv19 | lpmanv20 | servv19 | servv20 | lserv19 | lserv20 | pserv19 | pserv20 | lpserv19 | lpserv20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Africa Eastern and Southern | AFE | 6.47 | 6.81 | 9.803716e+11 | 9.008286e+11 | 7.30 | 7.19 | 27.61 | 27.53 | ... | 5.00 | 4.88 | 4.814599e+11 | 4.366725e+11 | 26.90 | 26.80 | 729.43 | 644.78 | 6.59 | 6.47 |
1 | Africa Western and Central | AFW | 5.93 | 6.30 | 7.920789e+11 | 7.865850e+11 | 7.48 | 7.45 | 27.40 | 27.39 | ... | 5.35 | 5.35 | 3.760931e+11 | 3.532896e+11 | 26.65 | 26.59 | 841.54 | 770.02 | 6.74 | 6.65 |
2 | Albania | ALB | 11.47 | 11.70 | 1.528661e+10 | 1.479962e+10 | 8.59 | 8.56 | 23.45 | 23.42 | ... | 5.82 | 5.78 | 7.425612e+09 | 7.167200e+09 | 22.73 | 22.69 | 2601.65 | 2525.67 | 7.86 | 7.83 |
3 | Armenia | ARM | 18.81 | 20.21 | 1.367280e+10 | 1.264546e+10 | 8.44 | 8.36 | 23.34 | 23.26 | ... | 6.30 | 6.27 | 7.415285e+09 | 6.739570e+09 | 22.73 | 22.63 | 2507.09 | 2274.40 | 7.83 | 7.73 |
4 | Australia | AUS | 5.16 | 6.61 | 1.396567e+12 | 1.330901e+12 | 10.92 | 10.86 | 27.97 | 27.92 | ... | 8.04 | 7.99 | 9.221563e+11 | 8.789295e+11 | 27.55 | 27.50 | 36354.40 | 34216.84 | 10.50 | 10.44 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
122 | Uruguay | URY | 9.35 | 12.67 | 6.123115e+10 | 5.362883e+10 | 9.78 | 9.64 | 24.84 | 24.71 | ... | 7.52 | 7.37 | 3.934694e+10 | 3.379209e+10 | 24.40 | 24.24 | 11366.26 | 9727.91 | 9.34 | 9.18 |
123 | Vietnam | VNM | 2.04 | 2.27 | 2.619212e+11 | 2.711584e+11 | 7.91 | 7.93 | 26.29 | 26.33 | ... | 6.10 | 6.14 | 1.090600e+11 | 1.128705e+11 | 25.42 | 25.45 | 1130.60 | 1159.57 | 7.03 | 7.06 |
124 | Samoa | WSM | 8.22 | 8.87 | 8.522502e+08 | 8.070272e+08 | 8.37 | 8.31 | 20.56 | 20.51 | ... | 5.57 | 5.40 | 6.362866e+08 | 6.039875e+08 | 20.27 | 20.22 | 3228.36 | 3044.14 | 8.08 | 8.02 |
125 | South Africa | ZAF | 28.47 | 28.74 | 3.514316e+11 | 3.019236e+11 | 8.70 | 8.54 | 26.59 | 26.43 | ... | 6.56 | 6.37 | 2.150928e+11 | 1.855286e+11 | 26.09 | 25.95 | 3673.14 | 3128.19 | 8.21 | 8.05 |
126 | Zambia | ZMB | 11.91 | 12.17 | 2.330869e+10 | 1.932005e+10 | 7.17 | 6.96 | 23.87 | 23.68 | ... | 4.48 | 4.33 | 1.272739e+10 | 9.335536e+09 | 23.27 | 22.96 | 712.58 | 507.81 | 6.57 | 6.23 |
127 rows × 50 columns
2. Data Description
Dependent Variable
Unemployment is defined as “individuals who are employable and actively seeking a job but are unable to find a job” (CFI Team, 2022). Unemployment can be divided into four types according to its cause (CFI Team, 2022). To identify the influential factors, I first review the four types of unemployment and consider which of them occurred in 2020.

Demand-Deficient Unemployment: caused by decreased demand for labor during a recession. I expected this type to account for the largest share of unemployment during Covid. As people went into quarantine, many businesses lost consumer demand, and companies reduced their workforces to cut labor expenses.

Frictional Unemployment: the natural unemployment that occurs while employees switch jobs. Because it is the gap between one job and the next, it is inevitable and always present. I anticipated no significant difference in this type before and after Covid.

Structural Unemployment: generated by mismatches between the demand for and supply of labor, such as required skillsets or the geographical location of jobs. Because people could not travel abroad freely during Covid, geographical mismatch could cause unemployment. The skillset mismatch could also have an important impact, because demand for cutting-edge computing skills increased significantly during Covid.

Voluntary Unemployment: caused by an individual's own decision rather than a structural or economic issue. I did not predict that voluntary unemployment increased meaningfully during Covid.
Independent Variables
I categorized 11 independent variables into five groups: GDP, inflation, demographics, education, and industry. Because the unemployment rate is influenced by many factors, I collected as diverse a set of variables as possible. From World Bank data, I identified 11 variables with the potential to impact unemployment. Because several countries heavily affected by Covid could not provide macroeconomic data, only 127 out of 166 countries have the required data for both 2019 and 2020. Additionally, private information such as an individual's skillset, wealth, and cultural background can influence unemployment, but such data could not be obtained and is not considered in these models; intuitively, the part that macroeconomic data cannot explain is the part that private data would. Furthermore, the unit of a macroeconomic variable can affect its significance. By comparing R-squared values in cross-sectional models with different units (logarithms versus raw values), I used the unit with the higher R-squared and lower p-values to maximize each variable's explanatory power.
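As a concrete note on naming, the 'l' prefix in the dataset marks a natural-log transform of the corresponding raw column. A minimal sketch of that convention, with values copied from the first rows of the table above (illustrative only):

```python
import numpy as np
import pandas as pd

# Per-capita service value added (pserv) and its log form (lpserv);
# sample values are taken from the first three rows of the dataset table.
df = pd.DataFrame({"pserv19": [729.43, 841.54, 2601.65]})
df["lpserv19"] = np.log(df["pserv19"])
print(df.round(2))  # lpserv19 ≈ 6.59, 6.74, 7.86, matching the table
```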
def scatter_subs(data, col_1, col_2, color):
    """
    Plot col_2 against col_1 for 2019 and 2020 side by side,
    and print the mean and standard deviation of col_2 in each year.
    """
    fig, ax = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
    ax[0].scatter(x=data[col_1 + "19"], y=data[col_2 + "19"], alpha=0.4, color=color)
    ax[1].scatter(x=data[col_1 + "20"], y=data[col_2 + "20"], alpha=0.4, color=color)
    print(col_2)
    print(data[col_2 + '19'].mean())
    print(data[col_2 + '19'].std())
    print(data[col_2 + '20'].mean())
    print(data[col_2 + '20'].std())
    ax[0].set_title("2019", fontsize=14, fontname="Verdana")
    ax[1].set_title("2020", fontsize=14, fontname="Verdana")
    for i in range(2):
        ax[i].set_xlabel("unemployment rate")
        ax[i].set_ylabel(col_2)
scatter_subs(data=dset, col_1="unemp", col_2="lGDP", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="gGDP", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="pGDP", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="cpi", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="ltotpop", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="prur", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="pEpop", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="educ", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="lagriv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="lmanv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="lserv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="pagriv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="pmanv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="pserv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="lpagriv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="lpmanv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="lpserv", color="orange")
Output of scatter_subs for each variable (scatter panels omitted): mean and standard deviation in 2019 and 2020.

| Variable | Mean 2019 | Std 2019 | Mean 2020 | Std 2020 |
|---|---|---|---|---|
| lGDP | 26.14 | 2.45 | 26.08 | 2.45 |
| gGDP | 2.79 | 2.32 | -3.98 | 4.25 |
| pGDP | 8.87 | 1.32 | 8.80 | 1.32 |
| cpi | 2.81 | 2.86 | 3.56 | 8.01 |
| ltotpop | 17.27 | 2.46 | 17.28 | 2.46 |
| prur | 38.15 | 20.01 | 37.79 | 19.93 |
| pEpop | 64.12 | 5.43 | 64.08 | 5.26 |
| educ | 10.04 | 2.17 | 10.07 | 2.19 |
| lagriv | 23.19 | 2.54 | 23.21 | 2.56 |
| lmanv | 24.07 | 2.61 | 24.01 | 2.62 |
| lserv | 25.55 | 2.48 | 25.49 | 2.48 |
| pagriv | 447.51 | 324.75 | 446.84 | 293.25 |
| pmanv | 2164.88 | 3349.58 | 2074.41 | 3511.69 |
| pserv | 9900.21 | 14364.12 | 9500.12 | 14183.43 |
| lpagriv | 5.92 | 0.63 | 5.93 | 0.64 |
| lpmanv | 6.80 | 1.41 | 6.73 | 1.41 |
| lpserv | 8.28 | 1.44 | 8.21 | 1.44 |
3. Validity of Empirical Models
3.1. Logarithm vs Real Number
Based on prior research, I selected 11 independent variables to regress against unemployment. First, I determined which unit of each industry's value added to use in the regression models. Because logarithmic transformation lets coefficients be interpreted as relative (percentage) effects and pulls outliers toward the center, very large numbers are transformed into logarithmic form. However, it is not obvious how large a variable must be before it should be transformed. For example, total GDP, which runs to hundreds of billions of dollars, should clearly be converted to logarithmic form, whereas agriculture value added per capita, which ranges from a minimum of 21 to a maximum of 3,017, is a borderline case. I therefore conducted further analysis to decide whether such variables should be shifted to logarithmic form. With the 2019 cross-sectional data, holding the other eight variables fixed, I compared the R-squared and p-values of three unit options for the industry variables: per-capita values (pagriv, pmanv, pserv), log total values (lagriv, lmanv, lserv), and log per-capita values (lpagriv, lpmanv, lpserv):
dset['const'] = 1
reg=sm.OLS(endog=dset['unemp19'], exog=dset[['const','lGDP19','gGDP19','pGDP19','cpi19','ltotpop19','prur19','pEpop19','educ19','pagriv19','pmanv19','pserv19']], missing='drop')
results=reg.fit()
print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: unemp19 R-squared: 0.191
Model: OLS Adj. R-squared: 0.113
Method: Least Squares F-statistic: 2.464
Date: Wed, 08 Dec 2021 Prob (F-statistic): 0.00835
Time: 19:02:58 Log-Likelihood: -359.34
No. Observations: 127 AIC: 742.7
Df Residuals: 115 BIC: 776.8
Df Model: 11
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 19.3945 8.084 2.399 0.018 3.382 35.407
lGDP19 -36.7477 87.569 -0.420 0.676 -210.205 136.709
gGDP19 -0.3941 0.191 -2.063 0.041 -0.773 -0.016
pGDP19 36.9468 87.550 0.422 0.674 -136.473 210.367
cpi19 0.2057 0.148 1.394 0.166 -0.087 0.498
ltotpop19 36.4120 87.563 0.416 0.678 -137.034 209.858
prur19 -0.0570 0.034 -1.666 0.098 -0.125 0.011
pEpop19 -0.0540 0.097 -0.558 0.578 -0.245 0.138
educ19 -0.0929 0.198 -0.470 0.639 -0.484 0.299
pagriv19 -0.0008 0.001 -0.559 0.578 -0.004 0.002
pmanv19 -6.756e-05 0.000 -0.355 0.724 -0.000 0.000
pserv19 -9.818e-05 5.4e-05 -1.817 0.072 -0.000 8.84e-06
==============================================================================
Omnibus: 57.633 Durbin-Watson: 1.929
Prob(Omnibus): 0.000 Jarque-Bera (JB): 180.131
Skew: 1.714 Prob(JB): 7.67e-40
Kurtosis: 7.721 Cond. No. 7.03e+06
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.03e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
dset['const'] = 1
reg=sm.OLS(endog=dset['unemp19'], exog=dset[['const','lGDP19','gGDP19','pGDP19','cpi19','ltotpop19','prur19','pEpop19','educ19','lagriv19','lmanv19','lserv19']], missing='drop')
results=reg.fit()
print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: unemp19 R-squared: 0.202
Model: OLS Adj. R-squared: 0.125
Method: Least Squares F-statistic: 2.642
Date: Wed, 08 Dec 2021 Prob (F-statistic): 0.00476
Time: 19:03:01 Log-Likelihood: -358.47
No. Observations: 127 AIC: 740.9
Df Residuals: 115 BIC: 775.1
Df Model: 11
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 39.6649 9.972 3.978 0.000 19.912 59.417
lGDP19 -66.3461 87.635 -0.757 0.451 -239.934 107.242
gGDP19 -0.3455 0.193 -1.792 0.076 -0.728 0.037
pGDP19 55.4592 87.244 0.636 0.526 -117.354 228.272
cpi19 0.1442 0.146 0.985 0.327 -0.146 0.434
ltotpop19 56.9223 87.191 0.653 0.515 -115.786 229.630
prur19 -0.0723 0.034 -2.150 0.034 -0.139 -0.006
pEpop19 0.0315 0.089 0.353 0.725 -0.145 0.208
educ19 -0.1799 0.207 -0.868 0.387 -0.591 0.231
lagriv19 0.1501 0.752 0.200 0.842 -1.340 1.640
lmanv19 1.0680 1.018 1.049 0.296 -0.948 3.084
lserv19 7.8418 3.203 2.448 0.016 1.497 14.187
==============================================================================
Omnibus: 54.592 Durbin-Watson: 1.914
Prob(Omnibus): 0.000 Jarque-Bera (JB): 168.215
Skew: 1.615 Prob(JB): 2.97e-37
Kurtosis: 7.621 Cond. No. 3.69e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.69e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
dset['const'] = 1
reg=sm.OLS(endog=dset['unemp19'], exog=dset[['const','lGDP19','gGDP19','pGDP19','cpi19','ltotpop19','prur19','pEpop19','educ19','lpagriv19','lpmanv19','lpserv19']], missing='drop')
results=reg.fit()
print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: unemp19 R-squared: 0.203
Model: OLS Adj. R-squared: 0.126
Method: Least Squares F-statistic: 2.656
Date: Wed, 08 Dec 2021 Prob (F-statistic): 0.00456
Time: 19:03:01 Log-Likelihood: -358.41
No. Observations: 127 AIC: 740.8
Df Residuals: 115 BIC: 774.9
Df Model: 11
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 39.8393 9.974 3.994 0.000 20.083 59.595
lGDP19 -63.6677 87.434 -0.728 0.468 -236.858 109.522
gGDP19 -0.3455 0.193 -1.793 0.076 -0.727 0.036
pGDP19 52.6861 87.095 0.605 0.546 -119.832 225.204
cpi19 0.1436 0.146 0.981 0.329 -0.146 0.433
ltotpop19 63.3025 87.424 0.724 0.470 -109.867 236.472
prur19 -0.0722 0.034 -2.147 0.034 -0.139 -0.006
pEpop19 0.0300 0.089 0.336 0.738 -0.147 0.207
educ19 -0.1814 0.207 -0.875 0.383 -0.592 0.229
lpagriv19 0.1638 0.753 0.217 0.828 -1.329 1.656
lpmanv19 1.0923 1.019 1.072 0.286 -0.926 3.110
lpserv19 7.9087 3.200 2.472 0.015 1.570 14.247
==============================================================================
Omnibus: 54.186 Durbin-Watson: 1.913
Prob(Omnibus): 0.000 Jarque-Bera (JB): 165.336
Skew: 1.606 Prob(JB): 1.25e-36
Kurtosis: 7.574 Cond. No. 3.32e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.32e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
dset['const'] = 1
reg19=sm.OLS(endog=dset['unemp19'], exog=dset[['const','lGDP19','gGDP19','cpi19','ltotpop19','prur19','lpmanv19','lpserv19']], missing='drop')
results19=reg19.fit()
print(results19.summary())
OLS Regression Results
==============================================================================
Dep. Variable: unemp19 R-squared: 0.194
Model: OLS Adj. R-squared: 0.147
Method: Least Squares F-statistic: 4.093
Date: Wed, 08 Dec 2021 Prob (F-statistic): 0.000465
Time: 19:07:20 Log-Likelihood: -359.08
No. Observations: 127 AIC: 734.2
Df Residuals: 119 BIC: 756.9
Df Model: 7
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 38.5487 8.481 4.545 0.000 21.755 55.342
lGDP19 -9.8321 3.629 -2.710 0.008 -17.017 -2.647
gGDP19 -0.3611 0.186 -1.945 0.054 -0.729 0.006
cpi19 0.1442 0.143 1.008 0.315 -0.139 0.427
ltotpop19 9.4730 3.604 2.628 0.010 2.336 16.610
prur19 -0.0667 0.032 -2.089 0.039 -0.130 -0.003
lpmanv19 1.0411 1.000 1.041 0.300 -0.939 3.021
lpserv19 6.9725 2.999 2.325 0.022 1.034 12.911
==============================================================================
Omnibus: 55.977 Durbin-Watson: 1.875
Prob(Omnibus): 0.000 Jarque-Bera (JB): 174.845
Skew: 1.656 Prob(JB): 1.08e-38
Kurtosis: 7.698 Cond. No. 1.36e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.36e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
print(results19.f_test("(cpi19 = lpmanv19 = 0)"))
<F test: F=array([[1.11948456]]), p=0.329861190185458, df_denom=119, df_num=2>
3.2. Stepwise Regression Method
Since some countries that suffered from Covid-19 more seriously than others in 2020 could not collect sufficient macroeconomic data, only 122 out of 166 countries had all the data. Because the number of independent variables was about 10% of the sample size, an overfitted model would cost degrees of freedom. As some variables were not statistically significant and decreased the adjusted R-squared, I reduced the number of variables based on both the adjusted R-squared and the t-statistics.
dset['const'] = 1
reg=sm.OLS(endog=dset['unemp20'], exog=dset[['const','lGDP20','gGDP20','pGDP20','cpi20','ltotpop20','prur20','pEpop20','educ20','lpagriv20','lpmanv20','lpserv20']], missing='drop')
results=reg.fit()
print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: unemp20 R-squared: 0.281
Model: OLS Adj. R-squared: 0.212
Method: Least Squares F-statistic: 4.087
Date: Wed, 08 Dec 2021 Prob (F-statistic): 4.57e-05
Time: 19:03:01 Log-Likelihood: -355.55
No. Observations: 127 AIC: 735.1
Df Residuals: 115 BIC: 769.2
Df Model: 11
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 48.3157 10.264 4.707 0.000 27.984 68.647
lGDP20 -69.9025 74.190 -0.942 0.348 -216.858 77.053
gGDP20 -0.1930 0.123 -1.571 0.119 -0.436 0.050
pGDP20 56.0472 74.133 0.756 0.451 -90.796 202.891
cpi20 -0.1127 0.054 -2.100 0.038 -0.219 -0.006
ltotpop20 69.5714 74.200 0.938 0.350 -77.405 216.548
prur20 -0.1204 0.031 -3.851 0.000 -0.182 -0.058
pEpop20 0.0245 0.092 0.267 0.790 -0.157 0.206
educ20 -0.1731 0.202 -0.855 0.394 -0.574 0.228
lpagriv20 0.4660 0.724 0.644 0.521 -0.968 1.900
lpmanv20 1.1662 1.015 1.149 0.253 -0.845 3.177
lpserv20 9.8633 3.904 2.526 0.013 2.130 17.597
==============================================================================
Omnibus: 37.655 Durbin-Watson: 1.869
Prob(Omnibus): 0.000 Jarque-Bera (JB): 86.307
Skew: 1.191 Prob(JB): 1.81e-19
Kurtosis: 6.261 Cond. No. 2.88e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.88e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
dset['const'] = 1
reg20=sm.OLS(endog=dset['unemp20'], exog=dset[['const','lGDP20','gGDP20','cpi20','ltotpop20','prur20','lpmanv20','lpserv20']], missing='drop')
results20=reg20.fit()
print(results20.summary())
OLS Regression Results
==============================================================================
Dep. Variable: unemp20 R-squared: 0.270
Model: OLS Adj. R-squared: 0.227
Method: Least Squares F-statistic: 6.300
Date: Wed, 08 Dec 2021 Prob (F-statistic): 2.68e-06
Time: 19:07:43 Log-Likelihood: -356.48
No. Observations: 127 AIC: 729.0
Df Residuals: 119 BIC: 751.7
Df Model: 7
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 48.4946 8.886 5.457 0.000 30.899 66.091
lGDP20 -12.7050 4.322 -2.940 0.004 -21.263 -4.147
gGDP20 -0.1995 0.115 -1.728 0.087 -0.428 0.029
cpi20 -0.1046 0.052 -1.995 0.048 -0.208 -0.001
ltotpop20 12.3739 4.296 2.880 0.005 3.867 20.880
prur20 -0.1151 0.030 -3.834 0.000 -0.174 -0.056
lpmanv20 1.1668 0.994 1.174 0.243 -0.801 3.135
lpserv20 8.8898 3.755 2.367 0.020 1.454 16.325
==============================================================================
Omnibus: 39.266 Durbin-Watson: 1.837
Prob(Omnibus): 0.000 Jarque-Bera (JB): 93.674
Skew: 1.225 Prob(JB): 4.56e-21
Kurtosis: 6.420 Cond. No. 1.53e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.53e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
3.3. Gauss-Markov Assumptions
I checked whether the models satisfy the five Gauss-Markov assumptions of OLS.
Assumption MLR. 1 : Linear in Parameters
The efficient models posit a linear-in-parameters relationship between the unemployment rate (the dependent variable) and the independent variables (log total GDP, GDP growth rate, CPI, log total population, percentage of rural population, and log per-capita manufacturing and service value added). Therefore, the models satisfy the 1st assumption of MLR.
Assumption MLR. 2 : Random Sampling
I collected data from the World Bank across countries and used all available data. Data were not selected by specific rules or bias. Therefore, the samples in the models are unbiased and follow the 2nd assumption.
Assumption MLR. 3 : No perfect Collinearity
I calculated correlations between the seven independent variables and found no perfect collinearity. Total GDP and total population were highly correlated (0.8557), as were manufacturing value added and service value added (0.9443), but neither case is perfect collinearity, since neither correlation equals 1 or -1.
Assumption MLR. 4 : Zero Conditional Mean
Omitting an important variable that is highly correlated with the included regressors would violate the 4th assumption. To check whether the efficient models omit such a variable, I calculated the residuals of each efficient model and obtained the correlations between the residuals and each independent variable, along with the corresponding p-values. No independent variable was statistically significantly correlated with the residuals. Even the largest absolute correlation, -0.0894 between cpi and the 2020 residuals, is too small to indicate any real relationship. In addition, most p-values were above 0.9, and the smallest was 0.3178; given the usual significance boundary of 5% or 10%, a p-value of 31.78% shows that the variable is not statistically significant. Furthermore, I inspected the scatter plots between the residuals and the independent variables and found no explicit pattern. Based on these correlation and p-value results, I conclude that the efficient models satisfy the 4th assumption, Zero Conditional Mean.
cortest = dset.loc[:,['lGDP19','gGDP19','cpi19','ltotpop19','prur19','lpmanv19','lpserv19']]
crrMat = cortest.corr()
print(crrMat)
| | lGDP19 | gGDP19 | cpi19 | ltotpop19 | prur19 | lpmanv19 | lpserv19 |
|---|---|---|---|---|---|---|---|
| lGDP19 | 1.0000 | -0.0709 | -0.0798 | 0.8550 | -0.1665 | 0.3423 | 0.2609 |
| gGDP19 | -0.0709 | 1.0000 | -0.0932 | 0.0957 | 0.4275 | -0.2998 | -0.3252 |
| cpi19 | -0.0798 | -0.0932 | 1.0000 | 0.0909 | 0.1729 | -0.2899 | -0.3059 |
| ltotpop19 | 0.8550 | 0.0957 | 0.0909 | 1.0000 | 0.2546 | -0.1736 | -0.2753 |
| prur19 | -0.1665 | 0.4275 | 0.1729 | 0.2546 | 1.0000 | -0.7499 | -0.7811 |
| lpmanv19 | 0.3423 | -0.2998 | -0.2899 | -0.1736 | -0.7499 | 1.0000 | 0.9493 |
| lpserv19 | 0.2609 | -0.3252 | -0.3059 | -0.2753 | -0.7811 | 0.9493 | 1.0000 |
cortest = dset.loc[:,['lGDP20','gGDP20','cpi20','ltotpop20','prur20','lpmanv20','lpserv20']]
crrMat = cortest.corr()
print(crrMat)
| | lGDP20 | gGDP20 | cpi20 | ltotpop20 | prur20 | lpmanv20 | lpserv20 |
|---|---|---|---|---|---|---|---|
| lGDP20 | 1.0000 | 0.1088 | -0.1123 | 0.8557 | -0.1586 | 0.3444 | 0.2562 |
| gGDP20 | 0.1088 | 1.0000 | -0.2338 | 0.2059 | 0.2623 | -0.0933 | -0.2337 |
| cpi20 | -0.1123 | -0.2338 | 1.0000 | 0.0108 | 0.0366 | -0.3036 | -0.2023 |
| ltotpop20 | 0.8557 | 0.2059 | 0.0108 | 1.0000 | 0.2531 | -0.1687 | -0.2788 |
| prur20 | -0.1586 | 0.2623 | 0.0366 | 0.2531 | 1.0000 | -0.7269 | -0.7702 |
| lpmanv20 | 0.3444 | -0.0933 | -0.3036 | -0.1687 | -0.7269 | 1.0000 | 0.9443 |
| lpserv20 | 0.2562 | -0.2337 | -0.2023 | -0.2788 | -0.7702 | 0.9443 | 1.0000 |
influence19 = results19.get_influence()
std_resid19 = influence19.resid_studentized_internal
influence20 = results20.get_influence()
std_resid20 = influence20.resid_studentized_internal
#print(std_resid, len(std_resid))
plt.scatter(dset['lGDP19'], std_resid19)
plt.xlabel('lGDP19')
plt.ylabel('Standardized Residuals')
plt.show()
def scatter_resid(col_1):
    """
    Plot standardized residuals against col_1 for 2019 and 2020 side by
    side, and print the Pearson correlation (r, p-value) for each year.
    """
    fig, ax = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
    ax[0].scatter(x=dset[col_1 + "19"], y=std_resid19, alpha=0.4)
    ax[1].scatter(x=dset[col_1 + "20"], y=std_resid20, alpha=0.4)
    ax[0].set_title("2019", fontsize=14, fontname="Verdana")
    ax[1].set_title("2020", fontsize=14, fontname="Verdana")
    ax[0].axhline(y=0, color='black', linestyle='--', linewidth=1)
    ax[1].axhline(y=0, color='black', linestyle='--', linewidth=1)
    print(col_1)
    print(stats.pearsonr(std_resid19, dset[col_1 + "19"]))
    print(stats.pearsonr(std_resid20, dset[col_1 + "20"]))
    for i in range(2):
        ax[i].set_xlabel(col_1)
        ax[i].set_ylabel("Residual")
scatter_resid("lGDP")
scatter_resid("gGDP")
scatter_resid("cpi")
scatter_resid("ltotpop")
scatter_resid("prur")
scatter_resid("lpmanv")
scatter_resid("lpserv")
Output of scatter_resid (residual panels omitted): Pearson correlation between the standardized residuals and each independent variable.

| Variable | r 2019 | p 2019 | r 2020 | p 2020 |
|---|---|---|---|---|
| lGDP | -0.0013 | 0.9883 | 0.0052 | 0.9537 |
| gGDP | 0.0098 | 0.9131 | 0.0375 | 0.6759 |
| cpi | 0.0011 | 0.9905 | -0.0894 | 0.3178 |
| ltotpop | -0.0010 | 0.9910 | 0.0037 | 0.9674 |
| prur | 0.0009 | 0.9919 | 0.0114 | 0.8985 |
| lpmanv | 0.0022 | 0.9805 | 0.0139 | 0.8765 |
| lpserv | -0.0013 | 0.9884 | -0.0004 | 0.9966 |
# Correlation of the 2019 regressors with the 2019 standardized residuals
cortest = dset.loc[:, ['lGDP19','gGDP19','cpi19','ltotpop19','prur19','lpmanv19','lpserv19']]
cortest['std_resid19'] = std_resid19
crrMat = cortest.corr()
print(crrMat)
               lGDP19    gGDP19     cpi19  ltotpop19    prur19  lpmanv19  lpserv19  std_resid19
lGDP19       1.000000 -0.070904 -0.079772   0.855033 -0.166483  0.342336  0.260932    -0.001313
gGDP19      -0.070904  1.000000 -0.093235   0.095691  0.427495 -0.299833 -0.325195     0.009785
cpi19       -0.079772 -0.093235  1.000000   0.090869  0.172876 -0.289869 -0.305918     0.001064
ltotpop19    0.855033  0.095691  0.090869   1.000000  0.254569 -0.173570 -0.275285    -0.001015
prur19      -0.166483  0.427495  0.172876   0.254569  1.000000 -0.749861 -0.781109     0.000913
lpmanv19     0.342336 -0.299833 -0.289869  -0.173570 -0.749861  1.000000  0.949266     0.002186
lpserv19     0.260932 -0.325195 -0.305918  -0.275285 -0.781109  0.949266  1.000000    -0.001306
std_resid19 -0.001313  0.009785  0.001064  -0.001015  0.000913  0.002186 -0.001306     1.000000
# Correlation of the 2020 regressors with the 2020 standardized residuals
cortest = dset.loc[:, ['lGDP20','gGDP20','cpi20','ltotpop20','prur20','lpmanv20','lpserv20']]
cortest['std_resid20'] = std_resid20
crrMat = cortest.corr()
print(crrMat)
               lGDP20    gGDP20     cpi20  ltotpop20    prur20  lpmanv20  lpserv20  std_resid20
lGDP20       1.000000  0.108832 -0.112325   0.855731 -0.158579  0.344366  0.256241     0.005206
gGDP20       0.108832  1.000000 -0.233817   0.205857  0.262347 -0.093266 -0.233699     0.037460
cpi20       -0.112325 -0.233817  1.000000   0.010819  0.036621 -0.303645 -0.202287    -0.089354
ltotpop20    0.855731  0.205857  0.010819   1.000000  0.253146 -0.168678 -0.278815     0.003668
prur20      -0.158579  0.262347  0.036621   0.253146  1.000000 -0.726935 -0.770208     0.011434
lpmanv20     0.344366 -0.093266 -0.303645  -0.168678 -0.726935  1.000000  0.944298     0.013933
lpserv20     0.256241 -0.233699 -0.202287  -0.278815 -0.770208  0.944298  1.000000    -0.000380
std_resid20  0.005206  0.037460 -0.089354   0.003668  0.011434  0.013933 -0.000380     1.000000
Assumption MLR.5: Homoscedasticity
To test whether the models exhibit heteroscedasticity, I utilized the Breusch-Pagan (B-P) test, which regresses the squared residuals on the independent variables. If the F-statistic of this auxiliary regression exceeds the critical value, i.e., if its p-value is below the significance level, the regression model shows heteroscedasticity. The regression results are below.
# Breusch-Pagan auxiliary regression for 2019: squared residuals on the regressors
resid19 = results19.resid
resid19sq = resid19**2
resid20 = results20.resid
resid20sq = resid20**2
dset['const'] = 1
reg19 = sm.OLS(endog=resid19sq, exog=dset[['const','lGDP19','gGDP19','cpi19','ltotpop19','prur19','lpmanv19','lpserv19']], missing='drop')
results19 = reg19.fit()
print(results19.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.116
Model: OLS Adj. R-squared: 0.064
Method: Least Squares F-statistic: 2.236
Date: Thu, 09 Dec 2021 Prob (F-statistic): 0.0359
Time: 04:45:21 Log-Likelihood: -650.88
No. Observations: 127 AIC: 1318.
Df Residuals: 119 BIC: 1341.
Df Model: 7
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 303.7883 84.390 3.600 0.000 136.688 470.888
lGDP19 -70.6257 36.105 -1.956 0.053 -142.118 0.867
gGDP19 -2.6625 1.847 -1.441 0.152 -6.320 0.995
cpi19 -0.9938 1.423 -0.698 0.486 -3.811 1.824
ltotpop19 66.1401 35.864 1.844 0.068 -4.874 137.154
prur19 -0.3606 0.318 -1.134 0.259 -0.990 0.269
lpmanv19 16.2500 9.949 1.633 0.105 -3.450 35.950
lpserv19 39.8846 29.843 1.336 0.184 -19.207 98.976
==============================================================================
Omnibus: 187.839 Durbin-Watson: 2.122
Prob(Omnibus): 0.000 Jarque-Bera (JB): 10881.012
Skew: 5.829 Prob(JB): 0.00
Kurtosis: 46.822 Cond. No. 1.36e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.36e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
# Breusch-Pagan auxiliary regression for 2020
dset['const'] = 1
reg20 = sm.OLS(endog=resid20sq, exog=dset[['const','lGDP20','gGDP20','cpi20','ltotpop20','prur20','lpmanv20','lpserv20']], missing='drop')
results20 = reg20.fit()
print(results20.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.143
Model: OLS Adj. R-squared: 0.092
Method: Least Squares F-statistic: 2.832
Date: Thu, 09 Dec 2021 Prob (F-statistic): 0.00914
Time: 05:07:24 Log-Likelihood: -630.30
No. Observations: 127 AIC: 1277.
Df Residuals: 119 BIC: 1299.
Df Model: 7
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 290.6908 76.749 3.788 0.000 138.721 442.661
lGDP20 -79.3629 37.328 -2.126 0.036 -153.276 -5.449
gGDP20 -0.8246 0.997 -0.827 0.410 -2.799 1.149
cpi20 -0.4560 0.453 -1.007 0.316 -1.352 0.440
ltotpop20 75.2324 37.103 2.028 0.045 1.765 148.700
prur20 -0.3584 0.259 -1.382 0.169 -0.872 0.155
lpmanv20 18.7222 8.583 2.181 0.031 1.727 35.717
lpserv20 46.3854 32.432 1.430 0.155 -17.833 110.604
==============================================================================
Omnibus: 192.665 Durbin-Watson: 2.070
Prob(Omnibus): 0.000 Jarque-Bera (JB): 13113.393
Skew: 6.013 Prob(JB): 0.00
Kurtosis: 51.306 Cond. No. 1.53e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.53e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Both auxiliary regressions are significant at the 5% level (Prob (F-statistic) = 0.0359 in 2019 and 0.00914 in 2020), so homoscedasticity is rejected and the models are re-estimated with heteroscedasticity-robust (HC1) standard errors.
dset['const'] = 1
reg19 = sm.OLS(endog=dset['unemp19'], exog=dset[['const','lGDP19','gGDP19','cpi19','ltotpop19','prur19','lpmanv19','lpserv19']], missing='drop')
results19 = reg19.fit(cov_type='HC1')
print(results19.summary())
OLS Regression Results
==============================================================================
Dep. Variable: unemp19 R-squared: 0.194
Model: OLS Adj. R-squared: 0.147
Method: Least Squares F-statistic: 4.144
Date: Thu, 09 Dec 2021 Prob (F-statistic): 0.000411
Time: 05:13:31 Log-Likelihood: -359.08
No. Observations: 127 AIC: 734.2
Df Residuals: 119 BIC: 756.9
Df Model: 7
Covariance Type: HC1
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 38.5487 10.194 3.781 0.000 18.569 58.529
lGDP19 -9.8321 3.385 -2.904 0.004 -16.468 -3.197
gGDP19 -0.3611 0.268 -1.345 0.179 -0.887 0.165
cpi19 0.1442 0.109 1.325 0.185 -0.069 0.357
ltotpop19 9.4730 3.331 2.844 0.004 2.945 16.001
prur19 -0.0667 0.035 -1.881 0.060 -0.136 0.003
lpmanv19 1.0411 1.159 0.898 0.369 -1.231 3.313
lpserv19 6.9725 2.948 2.365 0.018 1.194 12.751
==============================================================================
Omnibus: 55.977 Durbin-Watson: 1.875
Prob(Omnibus): 0.000 Jarque-Bera (JB): 174.845
Skew: 1.656 Prob(JB): 1.08e-38
Kurtosis: 7.698 Cond. No. 1.36e+03
==============================================================================
Notes:
[1] Standard Errors are heteroscedasticity robust (HC1)
[2] The condition number is large, 1.36e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
# 2020 model with heteroscedasticity-robust (HC1) standard errors
dset['const'] = 1
reg20 = sm.OLS(endog=dset['unemp20'], exog=dset[['const','lGDP20','gGDP20','cpi20','ltotpop20','prur20','lpmanv20','lpserv20']], missing='drop')
results20 = reg20.fit(cov_type='HC1')
print(results20.summary())
OLS Regression Results
==============================================================================
Dep. Variable: unemp20 R-squared: 0.270
Model: OLS Adj. R-squared: 0.227
Method: Least Squares F-statistic: 5.564
Date: Thu, 09 Dec 2021 Prob (F-statistic): 1.46e-05
Time: 05:17:33 Log-Likelihood: -356.48
No. Observations: 127 AIC: 729.0
Df Residuals: 119 BIC: 751.7
Df Model: 7
Covariance Type: HC1
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 48.4946 10.307 4.705 0.000 28.293 68.696
lGDP20 -12.7050 4.051 -3.137 0.002 -20.644 -4.766
gGDP20 -0.1995 0.121 -1.652 0.099 -0.436 0.037
cpi20 -0.1046 0.044 -2.377 0.017 -0.191 -0.018
ltotpop20 12.3739 4.005 3.090 0.002 4.525 20.223
prur20 -0.1151 0.030 -3.788 0.000 -0.175 -0.056
lpmanv20 1.1668 1.046 1.116 0.265 -0.883 3.217
lpserv20 8.8898 3.428 2.593 0.010 2.171 15.609
==============================================================================
Omnibus: 39.266 Durbin-Watson: 1.837
Prob(Omnibus): 0.000 Jarque-Bera (JB): 93.674
Skew: 1.225 Prob(JB): 4.56e-21
Kurtosis: 6.420 Cond. No. 1.53e+03
==============================================================================
Notes:
[1] Standard Errors are heteroscedasticity robust (HC1)
[2] The condition number is large, 1.53e+03. This might indicate that there are
strong multicollinearity or other numerical problems.