Morale Function and Model Error
Learn about the true (morale) function and the total (model) error.
Let’s use interp1d from scipy.interpolate to create an interpolation function that fills in morale values between the known week and morale points. We pass kind='cubic', which selects cubic spline interpolation and gives a smooth curve.
True function
In the code example below, interp1d uses the known values in week_points and morale_points to build the morale_func function. Evaluating morale_func on days gives the true morale values, which we store in morale_true.
from scipy.interpolate import interp1d

# Build the interpolating function from the known (week, morale) points.
# Try kind='linear' (the default) and see the difference!
morale_func = interp1d(x=week_points, y=morale_points, kind='cubic')

# Evaluate it for every day (84 days, check above)
morale_true = morale_func(days)
print("Number of data points in 'morale_true': {} for {} days".format(len(morale_true), len(days)))
# print(morale_true)
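To see what the kind parameter changes, here is a minimal, self-contained sketch. The week_points, morale_points, and days arrays below are made-up stand-ins (the lesson defines its own earlier), so only the pattern carries over, not the numbers.

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical stand-in data: one morale reading every 7 days (13 readings)
week_points = np.arange(0, 85, 7)
morale_points = np.array([5.0, 6.2, 6.8, 6.1, 5.0, 4.2, 3.9,
                          4.5, 5.6, 6.4, 6.0, 5.1, 4.7])
days = np.arange(84)  # 84 days, all inside the range of week_points

# Same data, two interpolation schemes
cubic = interp1d(week_points, morale_points, kind='cubic')
linear = interp1d(week_points, morale_points, kind='linear')

morale_cubic = cubic(days)
morale_linear = linear(days)

# Both curves pass through the known points; they differ only in between
max_gap = np.max(np.abs(morale_cubic - morale_linear))
print("values per curve:", len(morale_cubic))
print("largest cubic-vs-linear difference:", max_gap)
```

Note that days must stay within the range covered by week_points: with its default settings, interp1d raises a ValueError when asked to extrapolate.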
Let’s plot the data points that we passed to interp1d on the left and the interpolated values of our true function on the right. Both plots should show the same trend, but the interpolated one should be smoother because it has a morale value for every day.
# Try this: the trend is the same as in the next plot, but with missing points, right?
# plt.plot(week_points, morale_points);

# At this stage, the code below should be self-explanatory!
fig1, (ax1, ax2) = plt.subplots(ncols=2, figsize=(16, 6), sharey=True)

# Available data
ax1.plot(week_points, morale_points, lw=5.0, c='r', alpha=0.7, label='true function')
ax1.scatter(week_points, morale_points, s=100)

# Interpolated data
ax2.plot(days, morale_true, lw=5.0, c='r', alpha=0.7, label='true function')
ax2.scatter(days, morale_true, s=100)

# Setting title, labels, etc.
ax1.set_title('\nMorale over time (available data)\n')
ax2.set_title('\nMorale over time (interpolated data)\n')
ax1.set_xlabel('days\n')
ax2.set_xlabel('days\n')
ax1.set_ylabel('morale\n')
ax1.legend(loc=2)  # 'upper left'
ax2.legend(loc=2)  # 'upper left'
plt.tight_layout();
Our true function for morale can have the following interpretations.
With no measurement error:
All students have the same morale at every time point (day), and the function gives that morale with no measurement error. In this situation, our measurement tool or survey is perfect, and we measure the same morale for every student at each time point.
What if there were some unavoidable issues in the instrument that randomly added noise to each observation?
What if some external factor (such as the weather) affected the measurement tool on some days and added an unavoidable, irreducible error?
With no individual variance:
We can interpret our true function as the baseline morale at each time point, with every student varying around it to some degree (±). A student’s morale at any given time point is baseline ± deviation. However, there is no variance within an individual student: for a particular student, the morale line is just the baseline shifted by an offset (±). This might mean the variance is biased toward the baseline.
Just a heads-up: when generating the data for an individual student, we’ll add some random noise to the true function to create some individual variance.
Average or mean across infinite students:
Our measurements of morale vary at each time point for an individual student. Still, if we had an infinite number of students and averaged their morale measurements at each time point, we would recover the true function of morale. With a finite number of students, we might need to factor in high variance or the possibility of not being able to quantify the relationship.
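The averaging interpretation can be checked with a small simulation. This is only a sketch under stated assumptions: the sine-shaped true_morale curve, the Gaussian noise, and the simulate_students helper are all hypothetical and stand in for the lesson's morale_true and its own data generation.

```python
import numpy as np

rng = np.random.default_rng(0)

days = np.arange(84)
# Hypothetical smooth "true" morale curve (stand-in for morale_true)
true_morale = 5 + 2 * np.sin(2 * np.pi * days / 84)

def simulate_students(n_students, noise_sd=1.0):
    """One row per student: the true function plus independent Gaussian noise."""
    noise = rng.normal(0.0, noise_sd, size=(n_students, days.size))
    return true_morale + noise

few = simulate_students(5)
many = simulate_students(5000)

# Average across students at each day, then compare with the true function
err_few = np.abs(few.mean(axis=0) - true_morale).max()
err_many = np.abs(many.mean(axis=0) - true_morale).max()
print("worst-day error, 5 students:   ", err_few)
print("worst-day error, 5000 students:", err_many)
```

The average over many students sits much closer to the true function than the average over a handful, which is the finite-sample version of the "infinite students" idea.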
In the situations above, we are trying to interpret morale as a function of time with no error. However, each situation points to a different source of error.
Irreducible error: Arises from an imperfect ability to measure morale, for reasons we cannot avoid.
Bias error: Arises from an imperfect model of the relationship between time and morale.
Variance error: Arises from not having enough good data to quantify the relationship(s) correctly.
These sources of error combine to give the final error of our trained model.
Note: Our models always have some error. What matters is how much there is and in what proportion each type contributes. We can trade bias against variance to find the sweet spot, but we can’t do anything about the irreducible error.
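To make the bias and variance trade-off concrete, the sketch below refits polynomial models of different degrees on many noisy copies of the same data and estimates both terms empirically. Everything here is an assumption for illustration: the sine-shaped true function, the noise level, the bias_variance helper, and polynomial regression as the model are not taken from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.linspace(0, 1, 30)
true_y = np.sin(2 * np.pi * x)  # hypothetical true function
noise_sd = 0.3                  # source of the irreducible error

def bias_variance(degree, n_trials=500):
    """Estimate squared bias and variance of a polynomial fit of this degree."""
    preds = np.empty((n_trials, x.size))
    for t in range(n_trials):
        # A fresh noisy training set drawn around the same true function
        y_noisy = true_y + rng.normal(0.0, noise_sd, x.size)
        coefs = np.polyfit(x, y_noisy, degree)
        preds[t] = np.polyval(coefs, x)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_y) ** 2)  # average fit vs. truth
    variance = np.mean(preds.var(axis=0))         # spread of fits across trials
    return bias_sq, variance

results = {degree: bias_variance(degree) for degree in (1, 3, 9)}
for degree, (bias_sq, variance) in results.items():
    print(f"degree {degree}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```

A straight line is too rigid for this curve, so its squared bias dominates; the degree-9 fit follows the noise more closely, so its variance is larger. The noise itself (noise_sd) is the part no choice of degree can remove.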
Total error
There are three sources of error in a model: bias error, variance error, and irreducible error.
We try to pin down where each of these contributions to our model’s error comes from, and we look at the average error that we expect to observe, measured as the mean squared error (MSE) across all samples and all data points given to a particular model for training. Want to know a little more about the relationship above? Here is the typical formula, ...
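For reference, the standard bias-variance decomposition of the expected squared error, which is presumably the relationship meant here, can be written as follows. The notation is an assumption made for this sketch rather than the lesson's own: f is the true function, \hat{f} is the trained model, and the observations are y = f(x) + \varepsilon with zero-mean noise of variance \sigma^2.

```latex
% Assumed data model: y = f(x) + \varepsilon, E[\varepsilon] = 0, Var(\varepsilon) = \sigma^2
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The expectations are taken over training sets and noise. Note that the bias enters squared, while the variance and the irreducible error enter as they are.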