Summary: This article highlights the value of both machine learning algorithms and traditional econometric models. Using a classic classroom example, it suggests that a student of economics should use both tools in economic modeling.
While teaching econometrics, the fundamental challenge is choosing the perfect example or dataset, one that explains econometrics and, along the way, the importance of being present in class. Most of the time, I end up using the classic example of grade points (GPA or SGPA) and how they are affected by attendance, IQ, internal marks, and so on. The example, and the method of proving to my students that these factors are crucial for getting a good grade, have remained the same for the past few years. The mighty ordinary least squares (OLS) regression always does the trick, showing that a student will get lower grades if they perform poorly in the internal exams or have lower attendance. However, I have always questioned whether OLS is the best model. In most cases, the students are in their first year, so I cannot teach them non-linear equations, time-varying state-space models, or any other fancy model that might fit the data perfectly.
Figure 1: SGPA and average internal marks of the students of the Department of Economics
An OLS model seems perfect for the data presented in Figure 1. However, the data show higher dispersion in specific ranges, such as 60 to 70 or 87 to 94, a classic case of heteroskedasticity. One could remove these data points and label them outliers, but then the students would question my intentions. So removing data points or applying a complex model is not an option.
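For readers who want to reproduce the exercise, here is a minimal Python sketch of the OLS fit together with a Breusch-Pagan test, one standard check for the heteroskedasticity visible in Figure 1. The file name sgpa_data.csv and the column names internal_marks and sgpa are hypothetical placeholders for the class data.

```python
# Minimal OLS fit plus a heteroskedasticity check.
# "sgpa_data.csv", "internal_marks", and "sgpa" are hypothetical
# placeholders for the department's actual grade data.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

df = pd.read_csv("sgpa_data.csv")
X = sm.add_constant(df["internal_marks"])  # regressor plus intercept
y = df["sgpa"]

ols = sm.OLS(y, X).fit()
print(ols.summary())

# Breusch-Pagan test: a small p-value is evidence that the residual
# variance changes with internal marks, i.e. heteroskedasticity.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols.resid, X)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.3f}")
```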
If a student with an average internal mark of 65 approaches me and wants to know the predicted SGPA, I will use OLS to show that, based on the regression result in Table 1, the student can expect an SGPA of about 4.9, with a mean squared error (MSE) of 0.93 and R² = 0.81. However, as I mentioned, students in this cluster show higher variation, which means my prediction may be misleading.
Table 1: Simple OLS result of SGPA on average internal marks
Figure 2: OLS prediction of SGPA for a student with an average internal mark of 65
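A similar sketch of the point prediction and fit statistics, this time with scikit-learn, under the same hypothetical file and column names; the quoted MSE of 0.93 and R² of 0.81 would only reproduce on the actual class data.

```python
# OLS point prediction for a student with an average internal mark of 65.
# Data file and column names are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

df = pd.read_csv("sgpa_data.csv")
X, y = df[["internal_marks"]], df["sgpa"]

ols = LinearRegression().fit(X, y)
new_student = pd.DataFrame({"internal_marks": [65]})
print("Predicted SGPA:", ols.predict(new_student)[0])

y_hat = ols.predict(X)
print("MSE:", mean_squared_error(y, y_hat))  # 0.93 on the class data
print("R^2:", r2_score(y, y_hat))            # 0.81 on the class data
```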
In the era of data analytics and machine learning, I should use machine learning techniques to predict my students' SGPAs. One of the most basic methods is the k-nearest neighbors (KNN) algorithm, which predicts a student's SGPA as the average SGPA of the k students with the most similar internal marks.
Figure 3: K-nearest neighbors (KNN) prediction of SGPA for a student with an average internal mark of 65
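A minimal KNN sketch with scikit-learn under the same assumptions as before; k = 5 is an illustrative choice, not necessarily the value behind Figure 3.

```python
# KNN prediction: average the SGPAs of the k students whose internal
# marks are closest to 65. k = 5 is an illustrative choice.
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

df = pd.read_csv("sgpa_data.csv")  # hypothetical data file
X, y = df[["internal_marks"]], df["sgpa"]

knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
new_student = pd.DataFrame({"internal_marks": [65]})
print("KNN predicted SGPA:", knn.predict(new_student)[0])
```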
The data's clustering behavior may still lead to wrong predictions. So I used the decision tree algorithm, which is more appropriate when neighboring clusters display different patterns or the data have a more complex structure. Using a basic decision tree, I predicted that the student with an average internal mark of 65 might get an SGPA of 6.28, well above the OLS prediction, with a mean squared error of 2.03 and R² = 0.41 (Figure 4).
Figure 4: Decision tree prediction of SGPA for a student with an average internal mark of 65
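And a corresponding decision tree sketch; max_depth = 3 is again an illustrative hyperparameter rather than the setting used for Figure 4, and the quoted 6.28, 2.03, and 0.41 would only reproduce on the actual class data.

```python
# Decision tree prediction: the tree splits internal marks into ranges
# and predicts the mean SGPA within each range. max_depth=3 is an
# illustrative choice, not necessarily the setting behind Figure 4.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

df = pd.read_csv("sgpa_data.csv")  # hypothetical data file
X, y = df[["internal_marks"]], df["sgpa"]

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
new_student = pd.DataFrame({"internal_marks": [65]})
print("Tree predicted SGPA:", tree.predict(new_student)[0])

y_hat = tree.predict(X)
print("MSE:", mean_squared_error(y, y_hat))
print("R^2:", r2_score(y, y_hat))
```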
All three models have strengths and weaknesses; no one can claim that one model is better in all situations. As the literature notes, there is always a trade-off between bias and variance: reducing one typically inflates the other. So the investigator should be careful when using these models for forecasting or prediction. Although machine learning algorithms are popular, OLS is a powerful and simple technique with a solid theoretical foundation. The overall relationship between internal marks and SGPA, or attendance and SGPA, is positive and significant, as predicted by OLS. And remember: under the assumptions of the classical linear regression model, OLS is still BLUE (the Best Linear Unbiased Estimator).
Please note: don't take this post too seriously. Econometrics is just for fun. (All the Python code is available from open sources.)
By
Dr. Akash Kumar Baikar
Assistant Professor, Department of Economics, SBSS, MRIIRS