Accelerate Hyperparameter Tuning in Deep Learning with the Keras Hyperband Tuner

The performance of machine learning algorithms depends heavily on selecting a good set of hyperparameters. Keras Tuner is a package that helps you select the best set of hyperparameters for your application. The process of finding the optimal collection of hyperparameters for a machine learning or deep learning application is known as hyperparameter tuning, and Hyperband is a framework that speeds this process up. This article will focus on understanding the Hyperband framework. Here are the topics to be covered in this article.

Contents

  1. About HPO Approaches
  2. What is Hyperband?
  3. Bayesian vs Hyperband Optimization
  4. Hyperband Operation

Hyperparameters are not model parameters and cannot be learned directly from data. When we optimize a loss function with something like gradient descent, we learn the parameters of the model during training. Let’s talk about Hyperband and try to understand the need for its creation.

About HPO Approaches

The practice of tuning the hyperparameters of machine learning algorithms is known as hyperparameter optimization (HPO). Modern machine learning algorithms have many diverse and complicated hyperparameters that produce a massive search space. Deep learning underlies many state-of-the-art systems, and the search space for deep learning methods is considerably larger than for typical ML algorithms. Tuning over a large search space is a difficult task, so HPO problems should be addressed with data-driven strategies; manual approaches do not scale.



What is Hyperband?

Hyperband is a technique for configuration evaluation that frames hyperparameter optimization as a pure-exploration adaptive resource allocation problem: how to distribute a fixed budget among randomly sampled hyperparameter configurations. It allocates resources using a principled early-stopping strategy, which allows it to evaluate orders of magnitude more configurations than black-box processes such as Bayesian optimization methods. Unlike previous configuration-evaluation methodologies, Hyperband is a general-purpose tool that makes few assumptions.

In their theoretical study, the developers prove that Hyperband adapts to unknown convergence rates and to the behavior of validation losses as a function of the hyperparameters. Additionally, for a range of deep learning and kernel-based learning problems, Hyperband is 5 to 30 times faster than typical Bayesian optimization techniques. In the non-stochastic setting, Hyperband has properties similar to solutions of the pure-exploration, infinite-armed bandit problem.

The need for hyperband

Hyperparameters are inputs to a machine learning algorithm that govern how well the algorithm generalizes to unseen data. Due to the growing number of tuning parameters associated with these models, it is difficult to set them with standard optimization techniques.

In an effort to develop more efficient search methods, Bayesian optimization approaches that focus on optimizing hyperparameter configuration selection have recently dominated the topic of hyperparameter optimization. By choosing configurations adaptively, these approaches seek to discover good configurations faster than typical baselines such as random search. These approaches, however, face the inherently difficult problem of fitting and optimizing a high-dimensional, non-convex function with uncertain regularity and possibly noisy evaluations.

An orthogonal approach to hyperparameter optimization instead aims to speed up configuration evaluation. These methods are computationally adaptive, providing greater resources to promising hyperparameter combinations while rapidly discarding bad ones. Examples of resources include training set size, feature count, or the number of iterations for iterative algorithms.

These techniques seek to analyze orders of magnitude more hyperparameter configurations than approaches that uniformly train all configurations to completion, thereby quickly discovering the appropriate hyperparameters. Hyperband is designed to speed up random search by providing a simple and theoretically sound starting point.

Bayesian vs Hyperband Optimization

| Bayesian optimization | Hyperband |
| --- | --- |
| A probabilistic model | A bandit-based model |
| Learns an expensive objective function from past observations. | Aims to minimize simple regret, defined as the distance to the best configuration, as quickly as possible in each setting. |
| Typically limited to continuous hyperparameters, not categorical ones. | Works with both continuous and categorical hyperparameters. |

Hyperband Operation

Hyperband uses the SuccessiveHalving technique, originally introduced for hyperparameter optimization, as a subroutine and enhances it. The original successive halving method takes its name from the idea behind it: evenly distribute a budget across a collection of hyperparameter configurations, evaluate the performance of all configurations, eliminate the worst half, and repeat until only one configuration remains. The algorithm thus gives exponentially more resources to more promising combinations.
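The loop described above can be sketched in plain Python. This is a toy illustration, not the Keras Tuner implementation; the `evaluate` callback and the doubling budget are simplifying assumptions.

```python
def successive_halving(configs, evaluate, budget=1):
    """Repeatedly score all surviving configs with the current budget,
    keep the better half, and give survivors exponentially more budget."""
    survivors = list(configs)
    while len(survivors) > 1:
        # Lower score = lower validation loss = better configuration.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = scored[: max(1, len(scored) // 2)]  # drop worst half
        budget *= 2  # survivors receive exponentially more resources
    return survivors[0]

# Toy objective: pretend the config value itself is the validation loss.
best = successive_halving(range(16), evaluate=lambda c, b: c)
print(best)  # 0
```

With 16 configurations, the survivor count shrinks 16 → 8 → 4 → 2 → 1 while the per-configuration budget doubles each round.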

The Hyperband algorithm is made up of two parts.

  • The inner loop runs SuccessiveHalving for a fixed number of configurations and a fixed resource level.
  • The outer loop iterates over different numbers of configurations and resource allocations.

Each loop that performs SuccessiveHalving within Hyperband is called a “bracket”. Each bracket is designed to consume a portion of the total resource budget and corresponds to a distinct trade-off between the number of configurations n and the average budget per configuration B/n. A single Hyperband run therefore has a finite total budget. Hyperband requires two inputs:

  • R, the maximum amount of resources that can be allocated to a single configuration
  • η, a factor that controls the proportion of configurations discarded in each round of successive halving

These two inputs determine how many distinct brackets are examined, each starting from a different number of configurations. Hyperband begins with the most aggressive bracket, which sets the number of configurations to maximize exploration while requiring that at least one configuration be allocated R resources. Each subsequent bracket reduces the number of configurations by a factor of η, until the last bracket, which allocates the full budget R to every configuration. As a result, Hyperband performs a geometric search over the average budget per configuration, eliminating the need to commit in advance to a fixed number of configurations for a given budget.
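The bracket structure can be enumerated directly from R and η. The sketch below follows the published Hyperband formulas; the choice of R = 81 and η = 3 is just a convenient example.

```python
import math

def hyperband_schedule(R, eta=3):
    """List the brackets Hyperband runs for max per-config budget R
    and reduction factor eta. Each bracket is a list of
    (num_configs, budget_per_config) rounds of successive halving."""
    s_max = int(math.log(R, eta))
    B = (s_max + 1) * R  # total budget allotted to each bracket
    schedule = []
    for s in range(s_max, -1, -1):
        n = math.ceil((B / R) * (eta ** s) / (s + 1))  # initial configs
        r = R * eta ** (-s)                            # initial budget each
        rounds = [(math.floor(n * eta ** (-i)), int(r * eta ** i))
                  for i in range(s + 1)]
        schedule.append(rounds)
    return schedule

for bracket in hyperband_schedule(R=81, eta=3):
    print(bracket)
```

The first (most aggressive) bracket starts 81 configurations with 1 resource unit each and halves aggressively down to a single configuration at 81 units; the last bracket simply trains 5 configurations to the full budget, which is plain random search.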

Settings

  • hypermodel: A Keras Tuner HyperModel instance (or a model-building function) that defines and builds models over a search space.
  • objective: The metric to optimize for the model described by the hypermodel, such as ‘val_loss’ or ‘val_accuracy’. It is given as a string. If the parameter is a string, the optimization direction (minimize or maximize) will be inferred. If a list of objectives is given, the sum of all objectives to be minimized is minimized while the sum of all objectives to be maximized is maximized.
  • max_epochs: The maximum number of epochs used to train a single model. It is advisable to set this slightly higher than the estimated convergence epochs of your largest model and to use early stopping during training. The default is 100.
  • factor: Integer, the reduction factor for the number of epochs and the number of models in each bracket. The default value is 3.
  • hyperband_iterations: The number of times the Hyperband algorithm is iterated. Across all trials, one iteration runs over max_epochs * (math.log(max_epochs, factor) ** 2) cumulative epochs. Set it to the highest number that fits your resource budget. The default value is 1.
  • seed: An optional integer that serves as a random seed.
  • hyperparameters: Optional HyperParameters instance. Can be used to override (or pre-register) search-space hyperparameters.
  • tune_new_entries: Boolean indicating whether hyperparameter entries requested by the hypermodel but not specified in hyperparameters should be added to the search space. If not, the default values of these parameters will be used. True is the default.
  • allow_new_entries: Boolean indicating whether the hypermodel is allowed to request hyperparameter entries not listed in hyperparameters. True is the default.

Conclusion

Since the arms are independent and randomly sampled, Hyperband has the potential to be parallelized. The simplest parallelization approach is to distribute individual SuccessiveHalving brackets to separate machines. With this article, we have understood this bandit-based hyperparameter tuning algorithm and how it differs from Bayesian optimization.
