Difference Between Overfitting and Underfitting

In each pair of points below, the first describes overfitting and the second describes underfitting.

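A statistical model is said to be overfitted when it learns the training data too closely, including its noise and random fluctuations, so that it performs well on the training data but poorly on new, unseen data.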

A statistical model or a machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data, so it performs poorly even on the training data.
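To make the two definitions concrete, here is a minimal sketch, assuming NumPy and a synthetic noisy sine curve (the data, the sample sizes and the helper names `make_data` and `train_test_error` are illustrative choices, not part of the original comparison). A degree-1 polynomial underfits, with high error on both sets, while a degree-14 polynomial through 15 points fits the training data almost perfectly but does worse on fresh data than a degree-3 fit.

```python
import numpy as np

def make_data(n, rng):
    """Hypothetical dataset: noisy samples of sin(2*pi*x) on [0, 1]."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def train_test_error(degree, x_tr, y_tr, x_te, y_te):
    """Least-squares polynomial fit of the given degree; returns (train MSE, test MSE)."""
    coefs = np.polyfit(x_tr, y_tr, degree)
    return mse(y_tr, np.polyval(coefs, x_tr)), mse(y_te, np.polyval(coefs, x_te))

rng = np.random.default_rng(0)
x_tr, y_tr = make_data(15, rng)   # small training set
x_te, y_te = make_data(200, rng)  # held-out data from the same distribution

# Degree 1 is too simple, degree 3 roughly matches the trend, degree 14 can pass through every point.
results = {d: train_test_error(d, x_tr, y_tr, x_te, y_te) for d in (1, 3, 14)}
for degree, (train, test) in results.items():
    print(f"degree {degree:2d}: train MSE {train:.4f}, test MSE {test:.4f}")
```

The gap between training and test error is the practical signal: a large gap points to overfitting, while high error on both sets points to underfitting.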

One way to avoid overfitting is to use a linear algorithm when the data is linear, or to limit model complexity with parameters such as the maximum depth when using decision trees.

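Underfitting can be avoided by using a more complex model and by adding informative features through feature engineering; reducing the features with feature selection is a remedy for overfitting rather than underfitting.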
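A linear model applied to curved data underfits, and giving it an engineered feature such as `x**2` is one way to add the missing capacity. The sketch below assumes NumPy least squares on a hypothetical quadratic dataset; the seed, noise level and the helper `training_mse` are illustrative choices. The model that only sees the raw feature `x` cannot follow the curve, while the same least-squares method with the extra column can.

```python
import numpy as np

def training_mse(features, y):
    """Fit ordinary least squares on the design matrix `features`; return the training MSE."""
    coefs, *_ = np.linalg.lstsq(features, y, rcond=None)
    residuals = y - features @ coefs
    return float(np.mean(residuals ** 2))

rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, 200)
y = x ** 2 + rng.normal(0.0, 0.5, x.size)  # hypothetical quadratic trend plus noise

ones = np.ones_like(x)
raw = np.column_stack([ones, x])                 # intercept + raw feature only
engineered = np.column_stack([ones, x, x ** 2])  # intercept + raw feature + engineered x^2

mse_raw = training_mse(raw, y)
mse_engineered = training_mse(engineered, y)
print(f"raw features:     train MSE {mse_raw:.3f}")         # underfits: a line cannot bend
print(f"with x^2 feature: train MSE {mse_engineered:.3f}")  # close to the noise variance 0.25
```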

A common cause of overfitting is the use of non-parametric and non-linear methods: these types of machine learning algorithms have more freedom to shape the model around the dataset, so they can build unrealistic models that follow the noise rather than the real trend.

Underfitting usually happens when the model is too simple for the data or the features are not informative enough, for example when we try to fit a linear model to non-linear data.
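The extra freedom of non-parametric methods is easy to see with k-nearest-neighbour regression, used here as an illustrative stand-in (the synthetic sine data and the helper `knn_predict` are assumptions, not from the original text). With k = 1, every training point is its own nearest neighbour, so the model reproduces the training targets, noise included, and its training error is exactly zero; averaging over more neighbours constrains the model.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    """1-D k-nearest-neighbour regression: average the targets of the k closest training points."""
    dists = np.abs(x_query[:, None] - x_train[None, :])  # shape (n_query, n_train)
    nearest = np.argsort(dists, axis=1)[:, :k]           # indices of the k closest points
    return y_train[nearest].mean(axis=1)

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(2)
x_tr = rng.uniform(0.0, 1.0, 50)
y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0.0, 0.3, x_tr.size)
x_te = rng.uniform(0.0, 1.0, 500)
y_te = np.sin(2 * np.pi * x_te) + rng.normal(0.0, 0.3, x_te.size)

scores = {}
for k in (1, 5):
    train = mse(y_tr, knn_predict(x_tr, y_tr, x_tr, k))
    test = mse(y_te, knn_predict(x_tr, y_tr, x_te, k))
    scores[k] = (train, test)
    print(f"k={k}: train MSE {train:.3f}, test MSE {test:.3f}")
```

A zero training error like the k = 1 case says nothing about how well the model generalizes; only the test error does.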

Overfitting: high variance and low bias.

Underfitting: high bias and low variance.
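The bias and variance labels can be estimated numerically. The sketch below is an illustrative experiment under stated assumptions: NumPy, a sine target with Gaussian noise, 20 fixed, equally spaced training inputs, and 200 redrawn noise samples (the helper `bias_variance` is hypothetical). It refits a straight line and a degree-9 polynomial on each noisy sample and measures how far the average prediction is from the truth (squared bias) and how much the predictions move between samples (variance).

```python
import numpy as np

def true_fn(x):
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_sets=200, n_points=20, noise=0.2, seed=0):
    """Estimate squared bias and variance of a polynomial fit over repeated noisy datasets.

    Assumption: the training inputs are fixed and only the label noise is redrawn.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)  # fixed training inputs
    grid = np.linspace(0.0, 1.0, 50)     # points where predictions are compared
    preds = np.empty((n_sets, grid.size))
    for i in range(n_sets):
        y = true_fn(x) + rng.normal(0.0, noise, n_points)
        preds[i] = np.polyval(np.polyfit(x, y, degree), grid)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_fn(grid)) ** 2))  # systematic error
    variance = float(np.mean(preds.var(axis=0)))                # spread across datasets
    return bias_sq, variance

for degree in (1, 9):
    b, v = bias_variance(degree)
    print(f"degree {degree}: bias^2 {b:.4f}, variance {v:.4f}")
```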

Overfitting happens when a model is too flexible for the amount of training data, or is trained for too long: it starts learning from the noise and inaccurate data entries in our dataset. The model then fails to categorize new data correctly, because it has fitted too many details and too much noise.

Underfitting reduces the accuracy of our machine learning model. It simply means that our model or algorithm does not fit the data well enough, so its error is high on both the training data and new data.

Techniques to reduce overfitting:
1. Increase the training data.
2. Reduce model complexity.
3. Apply Ridge (L2) or Lasso (L1) regularization.
4. Use dropout in neural networks.

Techniques to reduce underfitting:
1. Increase model complexity.
2. Increase the number of features by performing feature engineering.
3. Remove noise from the data.
4. Increase the number of epochs or the duration of training.
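Ridge regularization, one of the overfitting remedies listed above, adds a penalty λ‖w‖² to the least-squares loss, which gives the closed form w = (XᵀX + λI)⁻¹Xᵀy. The sketch below assumes NumPy, degree-9 polynomial features on 15 noisy points, and a hypothetical helper `ridge_fit`; for brevity it also penalizes the intercept, which is commonly avoided in practice. As λ grows, the weights shrink and the training error rises, trading some bias for lower variance.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)
X = np.vander(x, 10, increasing=True)  # columns 1, x, x^2, ..., x^9

norms, train_errors = [], []
for lam in (1e-6, 1e-3, 1e-1, 10.0):
    w = ridge_fit(X, y, lam)
    norms.append(float(np.linalg.norm(w)))                # size of the weight vector
    train_errors.append(float(np.mean((y - X @ w) ** 2)))  # fit to the training data
    print(f"lambda={lam:g}: ||w|| = {norms[-1]:.2f}, train MSE = {train_errors[-1]:.4f}")
```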

The Tech Platform