The deviation of the prediction from the actual value was quite narrow. In machine learning, this deviation between the actual value and the prediction is called the Error.
Error = Irreducible Error + Reducible Error.
Irreducible Error
Suppose some information is missing from your historical data; for example, the marketing cost of the event was never captured at all. The deviation this causes in your prediction is called the Irreducible Error. In other words, irreducible error arises from noise inherent in the measurement system (or from information that was never recorded) and cannot be controlled or reduced even by building a good ML model.
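As a minimal illustration (assuming NumPy is available, with an invented data-generating process), the sketch below simulates a target that includes measurement noise: even a model that knows the true underlying function exactly is left with an error roughly equal to the noise variance, which is the irreducible part.

```python
import numpy as np

rng = np.random.default_rng(0)

# True underlying relationship plus measurement noise (the irreducible part).
X = rng.uniform(0, 10, size=5000)
noise = rng.normal(0, 1.0, size=5000)          # noise standard deviation = 1.0
y = 3.0 * X + 5.0 + noise

# Even a "perfect" model that knows the true function cannot remove the noise.
perfect_prediction = 3.0 * X + 5.0
mse = np.mean((y - perfect_prediction) ** 2)
print(f"MSE of the perfect model: {mse:.3f}")  # close to 1.0, the noise variance
```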
Reducible Error
The reducible error is the component we can improve. It is the quantity that shrinks as the model learns from the training dataset, and we try to drive it as close to zero as possible.
Reducible Error = Bias + Variance
The total reducible error is made up of bias and variance (strictly, for squared-error loss it decomposes as Bias² + Variance). The trade-off arises because, as we change model complexity, a decrease in one of these components typically comes with an increase in the other.
Bias
Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the relationship, which leads to high error on both the training and test data.
Variance
Variance is the variability of the model's prediction for a given data point; it tells us how much the predictions spread out when the model is trained on different samples of the data. A model with high variance pays too much attention to the training data and does not generalize to data it has not seen before. As a result, such models perform very well on the training data but have high error rates on the test data.
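One way to make these two definitions concrete is to repeatedly draw training sets from the same process, refit the same model each time, and then compare the average prediction with the true value (bias) and measure the spread of the predictions (variance). The sketch below is a toy simulation with NumPy at a single test point; the sine ground truth, noise level, and the deliberately simple constant model are made-up choices used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def true_f(x):
    return np.sin(x)                      # made-up ground-truth function

x0 = 2.0                                  # the test point we analyse
predictions = []

# Refit a deliberately simple model (a constant equal to the mean of y)
# on many independent training sets drawn from the same process.
for _ in range(500):
    X = rng.uniform(0, 4, size=30)
    y = true_f(X) + rng.normal(0, 0.3, size=30)
    predictions.append(y.mean())          # the constant model predicts the mean everywhere

predictions = np.array(predictions)
bias = predictions.mean() - true_f(x0)    # average prediction vs. correct value
variance = predictions.var()              # spread of predictions across training sets

print(f"bias: {bias:.3f}, variance: {variance:.4f}")
```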
Bias vs Variance
| High Bias, Low Variance | Low Bias, High Variance |
| --- | --- |
| Consistent, but inaccurate on average | Accurate on average, but inconsistent |
| The model is too simple and does not capture the relationship between the predictors and the response variable | The model can become overly complex and captures the variations/noise in the data |
| Cannot generalize to unseen data | Cannot generalize to unseen data |
| Under-fits the training data; performs poorly on both the training and test (unseen) data | Over-fits the training data; performs very well on the training data, but poorly on the test (unseen) data |
| E.g., Linear Regression, Logistic Regression, Linear Discriminant Analysis | E.g., Decision Trees, k-Nearest Neighbors |
With high bias and low variance, the prediction error is high on both the training and test data (Underfitting).
With high variance and low bias, the prediction error on the training data is very low, while the error on the test data is quite high (Overfitting).
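A small experiment, assuming scikit-learn and NumPy are installed, can show both failure modes on the same noisy data: a degree-1 polynomial underfits (high train and test error), while a degree-15 polynomial overfits (very low train error, much higher test error). The data-generating function and the two degrees are arbitrary choices made only to illustrate the point.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 15):                    # underfit vs. overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```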
How to achieve the Bias-Variance trade-off?
Reducing Bias Error
Increasing the number of features (predictors) used to estimate the target will reduce bias. More features allow the model to capture the relationship between the predictors and the response variable more fully.
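As a hedged sketch of this idea (the data-generating process below is invented for illustration), a linear model fitted with only one of two informative predictors leaves part of the signal unexplained, while adding the second predictor lowers the error that comes from bias.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                       # a second, informative predictor
y = 2 * x1 + 3 * x2 + rng.normal(0, 0.5, size=n)

# Model A: only one predictor -> misses part of the signal (higher bias).
mse_one = mean_squared_error(
    y, LinearRegression().fit(x1.reshape(-1, 1), y).predict(x1.reshape(-1, 1)))

# Model B: both predictors -> captures the relationship (lower bias).
X_both = np.column_stack([x1, x2])
mse_both = mean_squared_error(
    y, LinearRegression().fit(X_both, y).predict(X_both))

print(f"MSE with one feature: {mse_one:.2f}, with both features: {mse_both:.2f}")
```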
Reducing Variance Error
Increasing the number of training samples will reduce variance. More samples improve the ratio of signal to noise and hence reduce the variance. Intuitively, this leans on the law of large numbers, which states that as the sample size increases, the sample becomes more representative of the population, thereby reducing variance.
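The following sketch (plain NumPy, with an invented linear data-generating process) estimates the variance of a fitted slope across many repetitions for several training-set sizes; the variance shrinks as the number of samples grows.

```python
import numpy as np

rng = np.random.default_rng(7)

def fitted_slope(n):
    """Fit a least-squares line y = a*x + b on n noisy samples and return a."""
    x = rng.uniform(0, 1, size=n)
    y = 2.0 * x + rng.normal(0, 1.0, size=n)
    return np.polyfit(x, y, deg=1)[0]

for n in (20, 200, 2000):
    slopes = [fitted_slope(n) for _ in range(300)]
    print(f"n = {n:4d}: variance of the fitted slope = {np.var(slopes):.4f}")
```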
While the above two methods serve as the first-line treatment for achieving the bias-variance trade-off, the following are a few other ways to reach an optimal balance of bias and variance.
Fit the model with the best model parameters.
Tune the hyperparameters.
Use cross-validation and ensemble techniques such as bagging and boosting.
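For instance, one might compare a single decision tree against a bagged ensemble of trees using cross-validation; averaging over bootstrap samples is what reduces the variance. The sketch below assumes scikit-learn is available and uses a synthetic regression dataset purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# A single deep tree: low bias but high variance.
tree = DecisionTreeRegressor(random_state=0)

# Bagging averages many trees trained on bootstrap samples, which lowers variance.
bagged = BaggingRegressor(DecisionTreeRegressor(random_state=0),
                          n_estimators=50, random_state=0)

for name, model in [("single tree", tree), ("bagged trees", bagged)]:
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_squared_error")
    print(f"{name}: cross-validated MSE = {-scores.mean():.1f}")
```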