Algorithms for Advanced Hyper-Parameter Optimization/Tuning

Most Professional Machine Learning practitioners follow the ML Pipeline as a standard, to keep their work efficient and to keep the flow of work. A pipeline is created to allow data flow from its raw format to some useful information. All sub-fields in this pipeline’s modules are equally important for us to produce quality results, and one of them is Hyper-Parameter Tuning.

A Generalized Machine Learning Pipeline

Most of us know the best way to proceed with Hyper-Parameter Tuning is to use the GridSearchCV or RandomSearchCV from the sklearn module. But apart from these algorithms, there are many other Advanced methods for Hyper-Parameter Tuning. This is what the article is all about, Introduction to Advanced Hyper-Parameter Optimization, Transfer Learning and when & how to use these algorithms to make out the best of them.

Both of the algorithms, Grid-Search and Random-Search are instances of Uninformed Search. Now, let’s dive deep !!

Uninformed search

Here in these algorithms, each iteration of the Hyper-parameter tuning does not learn from the previous iterations. This is what allows us to parallelize our work. But, this isn’t very efficient and costs a lot of computational power.

Random search tries out a bunch of hyperparameters from a uniform distribution randomly over the preset list/hyperparameter search space (the number iterations is defined). It is good in testing a wide range of values and normally reaches to a very good combination very fastly, but the problem is that, it doesn’t guarantee to give the best parameter’s combination.

On the other hand, Grid search will give the best combination, but it can takes a lot of time and the computational cost is high.

Searching Pattern of Grid and Random Search

It may look like grid search is the better option, compared to the random one, but bare in mind that when the dimensionality is high, the number of combinations we have to search is enormous. For example, to grid-search ten boolean (yes/no) parameters you will have to test 1024 (2¹⁰) different combinations. This is the reason, why random search is sometimes combined with clever heuristics, is often used.

Why bring Randomness in Grid Search? [Mathematical Explanation]

Random search is more of a stochastic search/optimization perspective — the reason we introduce noise (or some form of stochasticity) into the process is to potentially bounce out of poor local minima. While this is more typically used to explain the intuition in gener