

ABSTRACT

This research examines how Covid-19 changed the significant macroeconomic factors affecting unemployment by comparing the statistical results of two multivariable regression models, one for 2019 and one for 2020. To maximize the validity of each variable, significance tests were conducted on different units of the macroeconomic variables in each cross-sectional model. The research also investigated how many variables generate the highest adjusted R-squared by using stepwise regression, which removes the least significant variables to improve the efficiency of the models. The models are adjusted to satisfy the Gauss-Markov assumptions, so their results are not biased. The analysis yields three economic insights. Overall, the higher R-squared of the 2020 model implies that demand-deficient and structural unemployment became the main types of unemployment after the outbreak of Covid. Demographically, countries with large rural populations coped with unemployment better than others, since Covid had a relatively smaller impact on rural areas. Industrially, countries highly dependent on the service industry were more vulnerable to Covid's effect on unemployment, because the service industry is more susceptible to economic cycles than other industries.

1. Introduction

The unemployment rate is a key indicator of a country's economic health. Countries with high unemployment rates cannot achieve potential real GDP growth, the ideal growth rate attained at the natural rate of unemployment. Unemployment is also a substantial issue for individuals. Because many people rely on income from their employment, their everyday lives are endangered if a high unemployment rate persists. As a result, together with the inflation rate, the unemployment rate is a main component of the 'Misery Index,' which measures the degree of economic distress felt by individuals (Halton, 2022). The unemployment rate is defined as the share of economically active people who are looking for a job but have not been able to find one. Because most job seekers come from low- or middle-income households, a high unemployment rate deepens economic polarization (Martin and Alicia, 2000). The wealth disparity generated by unemployment has triggered various social issues, such as discriminatory abuse, increased homelessness, and elevated anxiety about social safety. Moreover, Covid-19 accelerated the disparity further: wealthy people took advantage of bubble markets, while the middle and lower classes lost jobs and suffered from the disease without proper treatment. The problem will only become more severe unless broad social cooperation is achieved. A profound understanding of the causes of unemployment is therefore required to resolve unemployment problems effectively and efficiently. However, the unemployment rate is affected by many factors, such as cultural background, economic cycles, and technological change, so policies for decreasing unemployment should reflect how these variables have shifted. Moreover, the world has experienced the Covid pandemic, one of the most disruptive events in modern history. Covid explicitly influenced unemployment through massive quarantines and lay-offs; during the Covid downturn, millions of people lost their lives and their jobs.

Individuals who could not work remotely lost their jobs or were exposed to a high risk of contracting Covid. Not only individuals but also many businesses and stores closed because they could not sell products or services in person. Even the companies that survived had to lay off employees to cut expenses. The Covid recession has been compared to the Great Depression of the 1930s (Wheelock, 2020). Recent unemployment cannot be thoroughly explained without understanding the unprecedented impact of Covid on the economy. In this research, to analyze the shift in the significant variables that affect the unemployment rate, I compare the statistical results of two multivariable regression models, one for 2019 and one for 2020, both adjusted to satisfy the Gauss-Markov assumptions. The results show that the factors affecting the unemployment rate shifted after the outbreak of Covid.

import pandas as pd
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
dset = pd.read_csv('sel_dataset.csv')
dset  # preview the dataset (one row per country)
Country Name Country Code unemp19 unemp20 GDP19 GDP20 pGDP19 pGDP20 lGDP19 lGDP20 ... lpmanv19 lpmanv20 servv19 servv20 lserv19 lserv20 pserv19 pserv20 lpserv19 lpserv20
0 Africa Eastern and Southern AFE 6.47 6.81 9.803716e+11 9.008286e+11 7.30 7.19 27.61 27.53 ... 5.00 4.88 4.814599e+11 4.366725e+11 26.90 26.80 729.43 644.78 6.59 6.47
1 Africa Western and Central AFW 5.93 6.30 7.920789e+11 7.865850e+11 7.48 7.45 27.40 27.39 ... 5.35 5.35 3.760931e+11 3.532896e+11 26.65 26.59 841.54 770.02 6.74 6.65
2 Albania ALB 11.47 11.70 1.528661e+10 1.479962e+10 8.59 8.56 23.45 23.42 ... 5.82 5.78 7.425612e+09 7.167200e+09 22.73 22.69 2601.65 2525.67 7.86 7.83
3 Armenia ARM 18.81 20.21 1.367280e+10 1.264546e+10 8.44 8.36 23.34 23.26 ... 6.30 6.27 7.415285e+09 6.739570e+09 22.73 22.63 2507.09 2274.40 7.83 7.73
4 Australia AUS 5.16 6.61 1.396567e+12 1.330901e+12 10.92 10.86 27.97 27.92 ... 8.04 7.99 9.221563e+11 8.789295e+11 27.55 27.50 36354.40 34216.84 10.50 10.44
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
122 Uruguay URY 9.35 12.67 6.123115e+10 5.362883e+10 9.78 9.64 24.84 24.71 ... 7.52 7.37 3.934694e+10 3.379209e+10 24.40 24.24 11366.26 9727.91 9.34 9.18
123 Vietnam VNM 2.04 2.27 2.619212e+11 2.711584e+11 7.91 7.93 26.29 26.33 ... 6.10 6.14 1.090600e+11 1.128705e+11 25.42 25.45 1130.60 1159.57 7.03 7.06
124 Samoa WSM 8.22 8.87 8.522502e+08 8.070272e+08 8.37 8.31 20.56 20.51 ... 5.57 5.40 6.362866e+08 6.039875e+08 20.27 20.22 3228.36 3044.14 8.08 8.02
125 South Africa ZAF 28.47 28.74 3.514316e+11 3.019236e+11 8.70 8.54 26.59 26.43 ... 6.56 6.37 2.150928e+11 1.855286e+11 26.09 25.95 3673.14 3128.19 8.21 8.05
126 Zambia ZMB 11.91 12.17 2.330869e+10 1.932005e+10 7.17 6.96 23.87 23.68 ... 4.48 4.33 1.272739e+10 9.335536e+09 23.27 22.96 712.58 507.81 6.57 6.23

127 rows × 50 columns

2. Data Description

Dependent Variable

Unemployment is defined as “individuals who are employable and actively seeking a job but are unable to find a job” (CFI Team, 2022). Unemployment can be divided into four types based on its cause (CFI Team, 2022). To identify the factors that influence unemployment, I first reviewed the four types and considered which of them occurred in 2020.

Demand-Deficient Unemployment: Demand-deficient unemployment is caused by decreased demand for labor during a recession. I expected this type to account for the largest share of unemployment during Covid. As people went into quarantine, many businesses experienced a loss in consumer demand; to cope with it, companies reduced the number of employees to cut labor expenses.

Frictional Unemployment: Frictional unemployment is the natural unemployment that occurs while employees are switching jobs. Because it is the gap between one job and another, it is inevitable and always exists. I anticipated no significant difference in this type before and after Covid.

Structural Unemployment: Structural unemployment arises from a mismatch between the demand and supply of labor, such as in required skillsets or the geographical location of jobs. Because people could not travel freely during Covid, the geographical mismatch could raise unemployment. The skillset mismatch could also have an important impact, because demand for cutting-edge computing skills increased significantly during Covid.

Voluntary Unemployment: Voluntary unemployment results from an individual's own decision rather than a structural or economic issue. I did not expect voluntary unemployment to increase substantially during Covid.

Independent Variables

I categorized 11 independent variables into five groups: GDP, inflation, demographics, education, and industry. Because the unemployment rate is influenced by many factors, I tried to collect as diverse a set of variables as possible. From World Bank data, I identified 11 variables with the potential to impact unemployment. Because several countries that were hit hard by Covid could not provide macroeconomic data, only 127 out of 166 countries have the required data for both 2019 and 2020. Additionally, private information such as an individual's skillset, wealth, and cultural background can influence unemployment, but such data could not be obtained and is not considered in these models; intuitively, the variation that macroeconomic data cannot explain is the part that private data would. Furthermore, the unit in which a macroeconomic variable is measured can affect its significance. By comparing R-squared values in cross-sectional models with different units (logarithms versus raw values), I chose the unit with the higher R-squared and lower p-values to maximize the explanatory power of each variable.

def scatter_subs(data, col_1, col_2, color):
    """Draw side-by-side 2019/2020 scatterplots of col_1 vs col_2,
    and print the mean and standard deviation of col_2 in each year."""
    fig, ax = plt.subplots(1, 2, figsize=(8, 4), sharey=True)

    ax[0].scatter(x=data[col_1 + "19"], y=data[col_2 + "19"], alpha=0.4, color=color)
    ax[1].scatter(x=data[col_1 + "20"], y=data[col_2 + "20"], alpha=0.4, color=color)

    # summary statistics for the variable on the y-axis
    print(col_2)
    print(data[col_2 + '19'].mean())
    print(data[col_2 + '19'].std())
    print(data[col_2 + '20'].mean())
    print(data[col_2 + '20'].std())

    ax[0].set_title("2019", fontsize=14, fontname="Verdana")
    ax[1].set_title("2020", fontsize=14, fontname="Verdana")

    for i in range(2):
        ax[i].set_xlabel("unemployment rate")
        ax[i].set_ylabel(col_2)
scatter_subs(data=dset, col_1="unemp", col_2="lGDP", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="gGDP", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="pGDP", color="orange")

scatter_subs(data=dset, col_1="unemp", col_2="cpi", color="orange")

scatter_subs(data=dset, col_1="unemp", col_2="ltotpop", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="prur", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="pEpop", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="educ", color="orange")

scatter_subs(data=dset, col_1="unemp", col_2="lagriv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="lmanv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="lserv", color="orange")

scatter_subs(data=dset, col_1="unemp", col_2="pagriv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="pmanv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="pserv", color="orange")

scatter_subs(data=dset, col_1="unemp", col_2="lpagriv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="lpmanv", color="orange")
scatter_subs(data=dset, col_1="unemp", col_2="lpserv", color="orange")
Summary statistics printed by the calls above (mean and standard deviation of each variable in 2019 and 2020):

variable     2019 mean   2019 std   2020 mean   2020 std
lGDP            26.136      2.446      26.079      2.447
gGDP             2.794      2.322      -3.977      4.252
pGDP             8.866      1.320       8.798      1.316
cpi              2.809      2.860       3.557      8.007
ltotpop         17.270      2.456      17.281      2.457
prur            38.152     20.007      37.789     19.927
pEpop           64.117      5.428      64.076      5.262
educ            10.035      2.169      10.067      2.186
lagriv          23.194      2.545      23.209      2.563
lmanv           24.070      2.610      24.006      2.617
lserv           25.545      2.481      25.494      2.480
pagriv         447.512    324.750     446.844    293.253
pmanv         2164.882   3349.578    2074.410   3511.687
pserv         9900.207  14364.119    9500.119  14183.428
lpagriv          5.924      0.627       5.929      0.637
lpmanv           6.800      1.406       6.726      1.407
lpserv           8.275      1.438       8.213      1.443

(Figures omitted: side-by-side 2019/2020 scatterplots of the unemployment rate against each independent variable.)

3. Validity of Empirical Models

3.1. Logarithm vs Real Number

Based on the research, I selected 11 independent variables to regress on unemployment. First, I decided which unit of each industry's value added to use in the regression models. Because logarithms let coefficients be interpreted in proportional terms and pull outliers toward the center, very large values are typically log-transformed. However, it is not obvious how large a variable should be before it is log-transformed. For example, total GDP, in the hundreds of billions of dollars, clearly should be converted to logarithms, whereas agriculture value added per capita, ranging from a minimum of 21 to a maximum of 3,017, is a borderline case. I conducted further analysis to determine whether such variables should be shifted to logarithmic form. Using the 2019 cross-sectional data, with the other 8 variables held fixed, I compared the R-squared and p-values of three options for the industry variables: (1) per-capita values in levels, (2) logs of totals, and (3) logs of per-capita values. I regressed all three options:

dset['const'] = 1
reg=sm.OLS(endog=dset['unemp19'], exog=dset[['const','lGDP19','gGDP19','pGDP19','cpi19','ltotpop19','prur19','pEpop19','educ19','pagriv19','pmanv19','pserv19']], missing='drop')
results=reg.fit()
print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                unemp19   R-squared:                       0.191
Model:                            OLS   Adj. R-squared:                  0.113
Method:                 Least Squares   F-statistic:                     2.464
Date:                Wed, 08 Dec 2021   Prob (F-statistic):            0.00835
Time:                        19:02:58   Log-Likelihood:                -359.34
No. Observations:                 127   AIC:                             742.7
Df Residuals:                     115   BIC:                             776.8
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         19.3945      8.084      2.399      0.018       3.382      35.407
lGDP19       -36.7477     87.569     -0.420      0.676    -210.205     136.709
gGDP19        -0.3941      0.191     -2.063      0.041      -0.773      -0.016
pGDP19        36.9468     87.550      0.422      0.674    -136.473     210.367
cpi19          0.2057      0.148      1.394      0.166      -0.087       0.498
ltotpop19     36.4120     87.563      0.416      0.678    -137.034     209.858
prur19        -0.0570      0.034     -1.666      0.098      -0.125       0.011
pEpop19       -0.0540      0.097     -0.558      0.578      -0.245       0.138
educ19        -0.0929      0.198     -0.470      0.639      -0.484       0.299
pagriv19      -0.0008      0.001     -0.559      0.578      -0.004       0.002
pmanv19    -6.756e-05      0.000     -0.355      0.724      -0.000       0.000
pserv19    -9.818e-05    5.4e-05     -1.817      0.072      -0.000    8.84e-06
==============================================================================
Omnibus:                       57.633   Durbin-Watson:                   1.929
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              180.131
Skew:                           1.714   Prob(JB):                     7.67e-40
Kurtosis:                       7.721   Cond. No.                     7.03e+06
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.03e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
dset['const'] = 1
reg=sm.OLS(endog=dset['unemp19'], exog=dset[['const','lGDP19','gGDP19','pGDP19','cpi19','ltotpop19','prur19','pEpop19','educ19','lagriv19','lmanv19','lserv19']], missing='drop')
results=reg.fit()
print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                unemp19   R-squared:                       0.202
Model:                            OLS   Adj. R-squared:                  0.125
Method:                 Least Squares   F-statistic:                     2.642
Date:                Wed, 08 Dec 2021   Prob (F-statistic):            0.00476
Time:                        19:03:01   Log-Likelihood:                -358.47
No. Observations:                 127   AIC:                             740.9
Df Residuals:                     115   BIC:                             775.1
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         39.6649      9.972      3.978      0.000      19.912      59.417
lGDP19       -66.3461     87.635     -0.757      0.451    -239.934     107.242
gGDP19        -0.3455      0.193     -1.792      0.076      -0.728       0.037
pGDP19        55.4592     87.244      0.636      0.526    -117.354     228.272
cpi19          0.1442      0.146      0.985      0.327      -0.146       0.434
ltotpop19     56.9223     87.191      0.653      0.515    -115.786     229.630
prur19        -0.0723      0.034     -2.150      0.034      -0.139      -0.006
pEpop19        0.0315      0.089      0.353      0.725      -0.145       0.208
educ19        -0.1799      0.207     -0.868      0.387      -0.591       0.231
lagriv19       0.1501      0.752      0.200      0.842      -1.340       1.640
lmanv19        1.0680      1.018      1.049      0.296      -0.948       3.084
lserv19        7.8418      3.203      2.448      0.016       1.497      14.187
==============================================================================
Omnibus:                       54.592   Durbin-Watson:                   1.914
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              168.215
Skew:                           1.615   Prob(JB):                     2.97e-37
Kurtosis:                       7.621   Cond. No.                     3.69e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.69e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
dset['const'] = 1
reg=sm.OLS(endog=dset['unemp19'], exog=dset[['const','lGDP19','gGDP19','pGDP19','cpi19','ltotpop19','prur19','pEpop19','educ19','lpagriv19','lpmanv19','lpserv19']], missing='drop')
results=reg.fit()
print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                unemp19   R-squared:                       0.203
Model:                            OLS   Adj. R-squared:                  0.126
Method:                 Least Squares   F-statistic:                     2.656
Date:                Wed, 08 Dec 2021   Prob (F-statistic):            0.00456
Time:                        19:03:01   Log-Likelihood:                -358.41
No. Observations:                 127   AIC:                             740.8
Df Residuals:                     115   BIC:                             774.9
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         39.8393      9.974      3.994      0.000      20.083      59.595
lGDP19       -63.6677     87.434     -0.728      0.468    -236.858     109.522
gGDP19        -0.3455      0.193     -1.793      0.076      -0.727       0.036
pGDP19        52.6861     87.095      0.605      0.546    -119.832     225.204
cpi19          0.1436      0.146      0.981      0.329      -0.146       0.433
ltotpop19     63.3025     87.424      0.724      0.470    -109.867     236.472
prur19        -0.0722      0.034     -2.147      0.034      -0.139      -0.006
pEpop19        0.0300      0.089      0.336      0.738      -0.147       0.207
educ19        -0.1814      0.207     -0.875      0.383      -0.592       0.229
lpagriv19      0.1638      0.753      0.217      0.828      -1.329       1.656
lpmanv19       1.0923      1.019      1.072      0.286      -0.926       3.110
lpserv19       7.9087      3.200      2.472      0.015       1.570      14.247
==============================================================================
Omnibus:                       54.186   Durbin-Watson:                   1.913
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              165.336
Skew:                           1.606   Prob(JB):                     1.25e-36
Kurtosis:                       7.574   Cond. No.                     3.32e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.32e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
dset['const'] = 1
reg19=sm.OLS(endog=dset['unemp19'], exog=dset[['const','lGDP19','gGDP19','cpi19','ltotpop19','prur19','lpmanv19','lpserv19']], missing='drop')
results19=reg19.fit()
print(results19.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                unemp19   R-squared:                       0.194
Model:                            OLS   Adj. R-squared:                  0.147
Method:                 Least Squares   F-statistic:                     4.093
Date:                Wed, 08 Dec 2021   Prob (F-statistic):           0.000465
Time:                        19:07:20   Log-Likelihood:                -359.08
No. Observations:                 127   AIC:                             734.2
Df Residuals:                     119   BIC:                             756.9
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         38.5487      8.481      4.545      0.000      21.755      55.342
lGDP19        -9.8321      3.629     -2.710      0.008     -17.017      -2.647
gGDP19        -0.3611      0.186     -1.945      0.054      -0.729       0.006
cpi19          0.1442      0.143      1.008      0.315      -0.139       0.427
ltotpop19      9.4730      3.604      2.628      0.010       2.336      16.610
prur19        -0.0667      0.032     -2.089      0.039      -0.130      -0.003
lpmanv19       1.0411      1.000      1.041      0.300      -0.939       3.021
lpserv19       6.9725      2.999      2.325      0.022       1.034      12.911
==============================================================================
Omnibus:                       55.977   Durbin-Watson:                   1.875
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              174.845
Skew:                           1.656   Prob(JB):                     1.08e-38
Kurtosis:                       7.698   Cond. No.                     1.36e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.36e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
print(results19.f_test("(cpi19 = lpmanv19 = 0)"))
<F test: F=array([[1.11948456]]), p=0.329861190185458, df_denom=119, df_num=2>

3.2. Stepwise Regression Method

Because some countries that suffered more seriously from Covid-19 in 2020 could not collect sufficient macroeconomic data, only 127 out of 166 countries had all the required data. Since the number of independent variables was about 10% of the sample size, an overfitted specification would waste degrees of freedom. As some variables were not statistically significant and lowered the adjusted R-squared, I reduced the number of variables based on both the adjusted R-squared and the t-values.

dset['const'] = 1
reg=sm.OLS(endog=dset['unemp20'], exog=dset[['const','lGDP20','gGDP20','pGDP20','cpi20','ltotpop20','prur20','pEpop20','educ20','lpagriv20','lpmanv20','lpserv20']], missing='drop')
results=reg.fit()
print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                unemp20   R-squared:                       0.281
Model:                            OLS   Adj. R-squared:                  0.212
Method:                 Least Squares   F-statistic:                     4.087
Date:                Wed, 08 Dec 2021   Prob (F-statistic):           4.57e-05
Time:                        19:03:01   Log-Likelihood:                -355.55
No. Observations:                 127   AIC:                             735.1
Df Residuals:                     115   BIC:                             769.2
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         48.3157     10.264      4.707      0.000      27.984      68.647
lGDP20       -69.9025     74.190     -0.942      0.348    -216.858      77.053
gGDP20        -0.1930      0.123     -1.571      0.119      -0.436       0.050
pGDP20        56.0472     74.133      0.756      0.451     -90.796     202.891
cpi20         -0.1127      0.054     -2.100      0.038      -0.219      -0.006
ltotpop20     69.5714     74.200      0.938      0.350     -77.405     216.548
prur20        -0.1204      0.031     -3.851      0.000      -0.182      -0.058
pEpop20        0.0245      0.092      0.267      0.790      -0.157       0.206
educ20        -0.1731      0.202     -0.855      0.394      -0.574       0.228
lpagriv20      0.4660      0.724      0.644      0.521      -0.968       1.900
lpmanv20       1.1662      1.015      1.149      0.253      -0.845       3.177
lpserv20       9.8633      3.904      2.526      0.013       2.130      17.597
==============================================================================
Omnibus:                       37.655   Durbin-Watson:                   1.869
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               86.307
Skew:                           1.191   Prob(JB):                     1.81e-19
Kurtosis:                       6.261   Cond. No.                     2.88e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.88e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
dset['const'] = 1
reg20=sm.OLS(endog=dset['unemp20'], exog=dset[['const','lGDP20','gGDP20','cpi20','ltotpop20','prur20','lpmanv20','lpserv20']], missing='drop')
results20=reg20.fit()
print(results20.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                unemp20   R-squared:                       0.270
Model:                            OLS   Adj. R-squared:                  0.227
Method:                 Least Squares   F-statistic:                     6.300
Date:                Wed, 08 Dec 2021   Prob (F-statistic):           2.68e-06
Time:                        19:07:43   Log-Likelihood:                -356.48
No. Observations:                 127   AIC:                             729.0
Df Residuals:                     119   BIC:                             751.7
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         48.4946      8.886      5.457      0.000      30.899      66.091
lGDP20       -12.7050      4.322     -2.940      0.004     -21.263      -4.147
gGDP20        -0.1995      0.115     -1.728      0.087      -0.428       0.029
cpi20         -0.1046      0.052     -1.995      0.048      -0.208      -0.001
ltotpop20     12.3739      4.296      2.880      0.005       3.867      20.880
prur20        -0.1151      0.030     -3.834      0.000      -0.174      -0.056
lpmanv20       1.1668      0.994      1.174      0.243      -0.801       3.135
lpserv20       8.8898      3.755      2.367      0.020       1.454      16.325
==============================================================================
Omnibus:                       39.266   Durbin-Watson:                   1.837
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               93.674
Skew:                           1.225   Prob(JB):                     4.56e-21
Kurtosis:                       6.420   Cond. No.                     1.53e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.53e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

3.3. Gauss-Markov Assumptions

I checked whether the five Gauss-Markov assumptions of OLS hold for the models.

Assumption MLR. 1 : Linear in Parameters

The efficient models specify a linear relationship between the unemployment rate (dependent variable) and the independent variables (total GDP, GDP growth rate, CPI, total population, % of the rural population, manufacturing value added, and service value added). Therefore, the models satisfy the 1st assumption of MLR.
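Assuming the l-prefixed variables are natural logs (as their names suggest), the 2019 efficient model can be written as follows; the 2020 model is analogous:

$$
\text{unemp19} = \beta_0 + \beta_1\,\text{lGDP19} + \beta_2\,\text{gGDP19} + \beta_3\,\text{cpi19} + \beta_4\,\text{ltotpop19} + \beta_5\,\text{prur19} + \beta_6\,\text{lpmanv19} + \beta_7\,\text{lpserv19} + u
$$

Because the right-hand side is linear in the $\beta$ parameters, MLR.1 holds regardless of the log transformations applied to the regressors themselves.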

Assumption MLR. 2 : Random Sampling

I collected cross-country data from the World Bank and used all available observations. The data were not filtered by any selection rule, so the samples are not subject to selection bias and the models satisfy the 2nd assumption.

Assumption MLR. 3 : No perfect Collinearity

I calculated the pairwise correlations between the seven independent variables and found no perfect collinearity. Total GDP and total population are highly correlated (0.8557), as are manufacturing value added and service value added (0.9443), but neither correlation equals 1 or -1, so no pair is perfectly collinear.

Assumption MLR. 4 : Zero Conditional Mean

Omitting an important variable that is correlated with the included independent variables would violate the 4th assumption. To check whether the efficient models omit such a variable, I calculated the residuals of each efficient model and examined the correlations between the residuals and each independent variable, together with the corresponding p-values. No independent variable was statistically significantly correlated with the residuals. The largest correlation in absolute value is 0.0894, between cpi and the 2020 residuals, which is too small to indicate any meaningful relationship. Most p-values are above 0.9, and the smallest is 0.3178; given the usual 5% or 10% significance level, a p-value of 31.78% shows that even this correlation is not statistically significant. I also inspected scatter plots of the residuals against each independent variable and found no explicit pattern. Based on these results, I concluded that the efficient models satisfy the 4th assumption, Zero Conditional Mean.

cortest = dset.loc[:,['lGDP19','gGDP19','cpi19','ltotpop19','prur19','lpmanv19','lpserv19']]
crrMat = cortest.corr()
print(crrMat)
             lGDP19    gGDP19     cpi19  ltotpop19    prur19  lpmanv19  \
lGDP19     1.000000 -0.070904 -0.079772   0.855033 -0.166483  0.342336   
gGDP19    -0.070904  1.000000 -0.093235   0.095691  0.427495 -0.299833   
cpi19     -0.079772 -0.093235  1.000000   0.090869  0.172876 -0.289869   
ltotpop19  0.855033  0.095691  0.090869   1.000000  0.254569 -0.173570   
prur19    -0.166483  0.427495  0.172876   0.254569  1.000000 -0.749861   
lpmanv19   0.342336 -0.299833 -0.289869  -0.173570 -0.749861  1.000000   
lpserv19   0.260932 -0.325195 -0.305918  -0.275285 -0.781109  0.949266   

           lpserv19  
lGDP19     0.260932  
gGDP19    -0.325195  
cpi19     -0.305918  
ltotpop19 -0.275285  
prur19    -0.781109  
lpmanv19   0.949266  
lpserv19   1.000000  
cortest = dset.loc[:,['lGDP20','gGDP20','cpi20','ltotpop20','prur20','lpmanv20','lpserv20']]
crrMat = cortest.corr()
print(crrMat)
             lGDP20    gGDP20     cpi20  ltotpop20    prur20  lpmanv20  \
lGDP20     1.000000  0.108832 -0.112325   0.855731 -0.158579  0.344366   
gGDP20     0.108832  1.000000 -0.233817   0.205857  0.262347 -0.093266   
cpi20     -0.112325 -0.233817  1.000000   0.010819  0.036621 -0.303645   
ltotpop20  0.855731  0.205857  0.010819   1.000000  0.253146 -0.168678   
prur20    -0.158579  0.262347  0.036621   0.253146  1.000000 -0.726935   
lpmanv20   0.344366 -0.093266 -0.303645  -0.168678 -0.726935  1.000000   
lpserv20   0.256241 -0.233699 -0.202287  -0.278815 -0.770208  0.944298   

           lpserv20  
lGDP20     0.256241  
gGDP20    -0.233699  
cpi20     -0.202287  
ltotpop20 -0.278815  
prur20    -0.770208  
lpmanv20   0.944298  
lpserv20   1.000000  
influence19 = results19.get_influence()
std_resid19 = influence19.resid_studentized_internal
influence20 = results20.get_influence()
std_resid20 = influence20.resid_studentized_internal
#print(std_resid, len(std_resid))
plt.scatter(dset['lGDP19'], std_resid19)
plt.xlabel('lGDP19')
plt.ylabel('Standardized Residuals')

plt.show()

[Figure: standardized residuals vs. lGDP19 (2019)]

def scatter_resid(col_1):
    
    """
    Break down scatterplots into different years
    """
    
    fig, ax = plt.subplots(1, 2, figsize=(8,4), sharey=True)
    
    ax[0].scatter(x=dset[col_1+"19"], y=std_resid19, alpha=0.4)
    ax[1].scatter(x=dset[col_1+"20"], y=std_resid20, alpha=0.4)

    ax[0].set_title("2019", fontsize=14, fontname="Verdana")
    ax[1].set_title("2020", fontsize=14, fontname="Verdana")
    
    ax[0].axhline(y = 0, color = 'black', linestyle = '--', linewidth = 1)
    ax[1].axhline(y = 0, color = 'black', linestyle = '--', linewidth = 1)
    
    print(col_1)
    print(stats.pearsonr(std_resid19, dset[col_1+"19"]))
    print(stats.pearsonr(std_resid20, dset[col_1+"20"]))
    
    for i in list(range(2)):
        ax[i].set_xlabel(col_1)
        ax[i].set_ylabel("Residual")
scatter_resid("lGDP")
scatter_resid("gGDP")
scatter_resid("cpi")
scatter_resid("ltotpop")
scatter_resid("prur")
scatter_resid("lpmanv")
scatter_resid("lpserv")
lGDP
(-0.0013134211532044714, 0.9883072911292736)
(0.005206318951894936, 0.9536749403781897)
gGDP
(0.009785490638838816, 0.9130524755959806)
(0.037459742552051624, 0.6758573989626463)
cpi
(0.0010644766697441763, 0.9905234000463365)
(-0.08935426301154548, 0.3177873726782726)
ltotpop
(-0.0010154694931416813, 0.9909596719838065)
(0.003667705091435988, 0.9673561671831085)
prur
(0.0009132214612353689, 0.991869913382784)
(0.011434235118338005, 0.8984755115745292)
lpmanv
(0.0021863829115624264, 0.9805369832997183)
(0.013932887200478413, 0.8764502266134842)
lpserv
(-0.0013055649129459818, 0.9883772262686317)
(-0.000380295045957475, 0.99661432030712)

[Figures: standardized residuals plotted against each independent variable, 2019 (left panel) and 2020 (right panel)]

cortest = dset.loc[:,['lGDP19','gGDP19','cpi19','ltotpop19','prur19','lpmanv19','lpserv19']]
cortest['std_resid19'] = std_resid19
crrMat = cortest.corr()
print(crrMat)
               lGDP19    gGDP19     cpi19  ltotpop19    prur19  lpmanv19  \
lGDP19       1.000000 -0.070904 -0.079772   0.855033 -0.166483  0.342336   
gGDP19      -0.070904  1.000000 -0.093235   0.095691  0.427495 -0.299833   
cpi19       -0.079772 -0.093235  1.000000   0.090869  0.172876 -0.289869   
ltotpop19    0.855033  0.095691  0.090869   1.000000  0.254569 -0.173570   
prur19      -0.166483  0.427495  0.172876   0.254569  1.000000 -0.749861   
lpmanv19     0.342336 -0.299833 -0.289869  -0.173570 -0.749861  1.000000   
lpserv19     0.260932 -0.325195 -0.305918  -0.275285 -0.781109  0.949266   
std_resid19 -0.001313  0.009785  0.001064  -0.001015  0.000913  0.002186   

             lpserv19  std_resid19  
lGDP19       0.260932    -0.001313  
gGDP19      -0.325195     0.009785  
cpi19       -0.305918     0.001064  
ltotpop19   -0.275285    -0.001015  
prur19      -0.781109     0.000913  
lpmanv19     0.949266     0.002186  
lpserv19     1.000000    -0.001306  
std_resid19 -0.001306     1.000000  
cortest = dset.loc[:,['lGDP20','gGDP20','cpi20','ltotpop20','prur20','lpmanv20','lpserv20']]
cortest['std_resid20'] = std_resid20
crrMat = cortest.corr()
print(crrMat)
               lGDP20    gGDP20     cpi20  ltotpop20    prur20  lpmanv20  \
lGDP20       1.000000  0.108832 -0.112325   0.855731 -0.158579  0.344366   
gGDP20       0.108832  1.000000 -0.233817   0.205857  0.262347 -0.093266   
cpi20       -0.112325 -0.233817  1.000000   0.010819  0.036621 -0.303645   
ltotpop20    0.855731  0.205857  0.010819   1.000000  0.253146 -0.168678   
prur20      -0.158579  0.262347  0.036621   0.253146  1.000000 -0.726935   
lpmanv20     0.344366 -0.093266 -0.303645  -0.168678 -0.726935  1.000000   
lpserv20     0.256241 -0.233699 -0.202287  -0.278815 -0.770208  0.944298   
std_resid20  0.005206  0.037460 -0.089354   0.003668  0.011434  0.013933   

             lpserv20  std_resid20  
lGDP20       0.256241     0.005206  
gGDP20      -0.233699     0.037460  
cpi20       -0.202287    -0.089354  
ltotpop20   -0.278815     0.003668  
prur20      -0.770208     0.011434  
lpmanv20     0.944298     0.013933  
lpserv20     1.000000    -0.000380  
std_resid20 -0.000380     1.000000  

Assumption MLR. 5 : Homoscedasticity

To test whether the models exhibit heteroscedasticity, I used the Breusch-Pagan (B-P) test. The B-P test regresses the squared residuals on the independent variables; if the F-statistic exceeds its critical value (equivalently, if the p-value of the F-statistic is below the significance level), the model exhibits heteroscedasticity. The regression results are below.

resid19 = results19.resid
resid19sq = resid19**2
resid20 = results20.resid
resid20sq = resid20**2
dset['const'] = 1
reg19=sm.OLS(endog=resid19sq, exog=dset[['const','lGDP19','gGDP19','cpi19','ltotpop19','prur19','lpmanv19','lpserv19']], missing='drop')
results19=reg19.fit()
print(results19.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.116
Model:                            OLS   Adj. R-squared:                  0.064
Method:                 Least Squares   F-statistic:                     2.236
Date:                Thu, 09 Dec 2021   Prob (F-statistic):             0.0359
Time:                        04:45:21   Log-Likelihood:                -650.88
No. Observations:                 127   AIC:                             1318.
Df Residuals:                     119   BIC:                             1341.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        303.7883     84.390      3.600      0.000     136.688     470.888
lGDP19       -70.6257     36.105     -1.956      0.053    -142.118       0.867
gGDP19        -2.6625      1.847     -1.441      0.152      -6.320       0.995
cpi19         -0.9938      1.423     -0.698      0.486      -3.811       1.824
ltotpop19     66.1401     35.864      1.844      0.068      -4.874     137.154
prur19        -0.3606      0.318     -1.134      0.259      -0.990       0.269
lpmanv19      16.2500      9.949      1.633      0.105      -3.450      35.950
lpserv19      39.8846     29.843      1.336      0.184     -19.207      98.976
==============================================================================
Omnibus:                      187.839   Durbin-Watson:                   2.122
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            10881.012
Skew:                           5.829   Prob(JB):                         0.00
Kurtosis:                      46.822   Cond. No.                     1.36e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.36e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
dset['const'] = 1
reg20=sm.OLS(endog=resid20sq, exog=dset[['const','lGDP20','gGDP20','cpi20','ltotpop20','prur20','lpmanv20','lpserv20']], missing='drop')
results20=reg20.fit()
print(results20.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.143
Model:                            OLS   Adj. R-squared:                  0.092
Method:                 Least Squares   F-statistic:                     2.832
Date:                Thu, 09 Dec 2021   Prob (F-statistic):            0.00914
Time:                        05:07:24   Log-Likelihood:                -630.30
No. Observations:                 127   AIC:                             1277.
Df Residuals:                     119   BIC:                             1299.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        290.6908     76.749      3.788      0.000     138.721     442.661
lGDP20       -79.3629     37.328     -2.126      0.036    -153.276      -5.449
gGDP20        -0.8246      0.997     -0.827      0.410      -2.799       1.149
cpi20         -0.4560      0.453     -1.007      0.316      -1.352       0.440
ltotpop20     75.2324     37.103      2.028      0.045       1.765     148.700
prur20        -0.3584      0.259     -1.382      0.169      -0.872       0.155
lpmanv20      18.7222      8.583      2.181      0.031       1.727      35.717
lpserv20      46.3854     32.432      1.430      0.155     -17.833     110.604
==============================================================================
Omnibus:                      192.665   Durbin-Watson:                   2.070
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            13113.393
Skew:                           6.013   Prob(JB):                         0.00
Kurtosis:                      51.306   Cond. No.                     1.53e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.53e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Both B-P regressions are statistically significant at the 5% level (Prob (F-statistic) = 0.0359 for 2019 and 0.00914 for 2020), indicating that the models are heteroscedastic. To keep the inference valid despite this, I re-estimated the efficient models with heteroscedasticity-robust (HC1) standard errors.

dset['const'] = 1
reg19=sm.OLS(endog=dset['unemp19'], exog=dset[['const','lGDP19','gGDP19','cpi19','ltotpop19','prur19','lpmanv19','lpserv19']], missing='drop')
results19=reg19.fit(cov_type = 'HC1')
print(results19.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                unemp19   R-squared:                       0.194
Model:                            OLS   Adj. R-squared:                  0.147
Method:                 Least Squares   F-statistic:                     4.144
Date:                Thu, 09 Dec 2021   Prob (F-statistic):           0.000411
Time:                        05:13:31   Log-Likelihood:                -359.08
No. Observations:                 127   AIC:                             734.2
Df Residuals:                     119   BIC:                             756.9
Df Model:                           7                                         
Covariance Type:                  HC1                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         38.5487     10.194      3.781      0.000      18.569      58.529
lGDP19        -9.8321      3.385     -2.904      0.004     -16.468      -3.197
gGDP19        -0.3611      0.268     -1.345      0.179      -0.887       0.165
cpi19          0.1442      0.109      1.325      0.185      -0.069       0.357
ltotpop19      9.4730      3.331      2.844      0.004       2.945      16.001
prur19        -0.0667      0.035     -1.881      0.060      -0.136       0.003
lpmanv19       1.0411      1.159      0.898      0.369      -1.231       3.313
lpserv19       6.9725      2.948      2.365      0.018       1.194      12.751
==============================================================================
Omnibus:                       55.977   Durbin-Watson:                   1.875
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              174.845
Skew:                           1.656   Prob(JB):                     1.08e-38
Kurtosis:                       7.698   Cond. No.                     1.36e+03
==============================================================================

Notes:
[1] Standard Errors are heteroscedasticity robust (HC1)
[2] The condition number is large, 1.36e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
dset['const'] = 1
reg20=sm.OLS(endog=dset['unemp20'], exog=dset[['const','lGDP20','gGDP20','cpi20','ltotpop20','prur20','lpmanv20','lpserv20']], missing='drop')
results20=reg20.fit(cov_type = 'HC1')
print(results20.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                unemp20   R-squared:                       0.270
Model:                            OLS   Adj. R-squared:                  0.227
Method:                 Least Squares   F-statistic:                     5.564
Date:                Thu, 09 Dec 2021   Prob (F-statistic):           1.46e-05
Time:                        05:17:33   Log-Likelihood:                -356.48
No. Observations:                 127   AIC:                             729.0
Df Residuals:                     119   BIC:                             751.7
Df Model:                           7                                         
Covariance Type:                  HC1                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         48.4946     10.307      4.705      0.000      28.293      68.696
lGDP20       -12.7050      4.051     -3.137      0.002     -20.644      -4.766
gGDP20        -0.1995      0.121     -1.652      0.099      -0.436       0.037
cpi20         -0.1046      0.044     -2.377      0.017      -0.191      -0.018
ltotpop20     12.3739      4.005      3.090      0.002       4.525      20.223
prur20        -0.1151      0.030     -3.788      0.000      -0.175      -0.056
lpmanv20       1.1668      1.046      1.116      0.265      -0.883       3.217
lpserv20       8.8898      3.428      2.593      0.010       2.171      15.609
==============================================================================
Omnibus:                       39.266   Durbin-Watson:                   1.837
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               93.674
Skew:                           1.225   Prob(JB):                     4.56e-21
Kurtosis:                       6.420   Cond. No.                     1.53e+03
==============================================================================

Notes:
[1] Standard Errors are heteroscedasticity robust (HC1)
[2] The condition number is large, 1.53e+03. This might indicate that there are
strong multicollinearity or other numerical problems.