Recently, OpenAI collaborated with Uber AI to propose a new approach, Synthetic Petri Dish, for accelerating the most expensive step of Neural Architecture Search (NAS): evaluating candidate architectures. The researchers explored whether the computational efficiency of NAS can be improved by creating a new kind of surrogate, one that benefits from miniaturised training yet still generalises beyond the observed distribution of ground-truth evaluations.
For several years now, deep neural networks have been solving a range of problems with business value, such as speech recognition, image recognition and machine translation.
According to the researchers, NAS explores a large space of architectural motifs. It is a compute-intensive process that often involves a ground-truth evaluation of each motif: instantiating it within a large network, then training and evaluating that network on thousands or more data samples. By "motif", the researchers mean a design component, such as a recurrent cell or an activation function, that is repeated many times within a larger neural network blueprint.
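To make that cost concrete, here is a minimal sketch of the outer structure of such a search. It assumes each candidate motif can be identified by a single number (for example, the slope of a sigmoid activation) and that `evaluate` is a hypothetical stand-in for the full ground-truth run; neither the function name nor this loop comes from the paper.

```python
from typing import Callable, Sequence


def search_motifs(
    candidates: Sequence[float],
    evaluate: Callable[[float], float],
) -> tuple[float, int]:
    """Pick the candidate motif with the lowest evaluated validation loss.

    `evaluate` represents the expensive step: instantiate the motif inside a
    large network, train it on thousands of samples and return its validation
    loss. Returns the best candidate and how many evaluations were paid for.
    """
    if not candidates:
        raise ValueError("need at least one candidate motif")
    best, best_loss, calls = candidates[0], float("inf"), 0
    for motif in candidates:
        loss = evaluate(motif)  # one full train-and-evaluate run per candidate
        calls += 1
        if loss < best_loss:
            best, best_loss = motif, loss
    return best, calls
```

Because the loop only ever calls `evaluate`, a cheaper surrogate can be passed in its place without changing the search itself; the number of expensive calls is what a surrogate aims to reduce.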
Behind Synthetic Petri Dish
In this work, the researchers took inspiration from biology, where cells are grown and studied in a Petri dish, in isolation from the organism they come from. They translated this idea into machine learning as a Synthetic Petri Dish that aims to identify high-performing architectural motifs. The approach thus attempts to recreate this kind of scientific process algorithmically in order to find better neural network motifs.
According to the researchers, the aim of the Synthetic Petri Dish is to create a microcosm training environment in which the performance of a small-scale motif reliably predicts the performance of the fully expanded motif in the ground-truth evaluation.
How It Works
In the figure above, the left panel illustrates the inner-loop and outer-loop training of the Synthetic Petri Dish procedure. The motifs (in this example, activation functions) are extracted from the full network (e.g. a two-layer, 100-neuron-wide MLP) and instantiated in separate, much smaller motif-networks (e.g. a two-layer, single-neuron MLP).
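As a rough illustration of what a motif-network might look like, the following plain-Python sketch builds a two-layer MLP with a single neuron per layer, whose only architectural component is the extracted motif, assumed here to be a sigmoid with a tunable slope. The class name `MotifNetwork`, the parameter layout and the use of the motif in both layers are illustrative assumptions, not details taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def sigmoid(z: float, slope: float) -> float:
    """Sigmoid motif with a tunable slope, written to avoid overflow in math.exp."""
    t = slope * z
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


@dataclass
class MotifNetwork:
    """Hypothetical two-layer MLP with one neuron per layer.

    Both neurons apply the extracted motif (a sigmoid of the given slope), so the
    motif is the only component carried over from the full-size network.
    """

    slope: float
    w1: list[float]  # weights from the input vector to the first neuron
    b1: float = 0.0
    w2: float = 1.0  # weight from the first neuron to the second neuron
    b2: float = 0.0

    def forward(self, x: Sequence[float]) -> float:
        h = sigmoid(sum(w * xi for w, xi in zip(self.w1, x)) + self.b1, self.slope)
        return sigmoid(self.w2 * h + self.b2, self.slope)
```

For example, `MotifNetwork(slope=2.0, w1=[0.3, -0.1]).forward([1.0, 2.0])` evaluates the tiny network on one input; swapping the motif only means changing `slope` (or, more generally, the activation function).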
The motif-networks are then trained in the inner loop on synthetic training data and evaluated on synthetic validation data. In the outer loop, a mean squared error loss is computed between the normalised Petri dish validation losses and the corresponding normalised ground-truth losses. The synthetic training and validation data are then optimised by taking gradient steps with respect to this outer-loop loss.
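The inner and outer loops can be sketched in plain Python under several simplifying assumptions: the motif is a sigmoid slope, each motif-network is a single neuron `sigmoid(slope * (w*x + b))` trained by a few steps of gradient descent, the synthetic data are a handful of scalar (x, y) pairs, and "normalised" is read as a z-score across motifs. All function names are hypothetical. The article describes taking gradient steps on the synthetic data; to stay dependency-free, this sketch estimates that gradient with central finite differences instead of backpropagating through the inner loop, which is a deliberate substitution rather than the researchers' method.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def to_pairs(params: list[float]) -> list[tuple[float, float]]:
    """Synthetic data is stored as a flat list [x0, y0, x1, y1, ...] so it can be optimised."""
    return [(params[i], params[i + 1]) for i in range(0, len(params), 2)]


def petri_dish_loss(slope, train, valid, steps=20, lr=0.5):
    """Inner loop: train a one-neuron motif-network on synthetic training pairs,
    then return its mean squared error on synthetic validation pairs."""
    w, b = 0.5, 0.0  # fixed initialisation keeps the loss a deterministic function of the data
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in train:
            p = sigmoid(slope * (w * x + b))
            d = 2.0 * (p - y) * p * (1.0 - p) * slope  # d(squared error)/d(w*x + b)
            gw += d * x
            gb += d
        w -= lr * gw / len(train)
        b -= lr * gb / len(train)
    return sum((sigmoid(slope * (w * x + b)) - y) ** 2 for x, y in valid) / len(valid)


def normalise(values: list[float]) -> list[float]:
    """Z-score across motifs (an assumed choice of normalisation)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / std for v in values]


def outer_loss(params, slopes, ground_truth, n_train):
    """MSE between normalised Petri dish losses and normalised ground-truth losses."""
    data = to_pairs(params)
    train, valid = data[:n_train], data[n_train:]
    predicted = normalise([petri_dish_loss(s, train, valid) for s in slopes])
    target = normalise(ground_truth)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(slopes)


def outer_step(params, slopes, ground_truth, n_train, lr=0.1, eps=1e-4):
    """One outer-loop gradient step on the synthetic data. Central finite
    differences stand in here for backpropagating through the inner loop."""
    grad = []
    for i in range(len(params)):
        up, down = params[:], params[:]
        up[i] += eps
        down[i] -= eps
        diff = outer_loss(up, slopes, ground_truth, n_train) - outer_loss(
            down, slopes, ground_truth, n_train
        )
        grad.append(diff / (2.0 * eps))
    return [p - lr * g for p, g in zip(params, grad)]


if __name__ == "__main__":
    random.seed(0)
    slopes = [0.5, 1.0, 2.0]           # motifs with known ground-truth evaluations
    ground_truth = [0.30, 0.10, 0.20]  # placeholder losses for illustration only
    params = [random.uniform(-1.0, 1.0) for _ in range(2 * 8)]  # 4 train + 4 valid pairs
    for step in range(5):
        params = outer_step(params, slopes, ground_truth, n_train=4)
        print(step, round(outer_loss(params, slopes, ground_truth, n_train=4), 4))
```

In a practical setup the outer-loop gradient would come from automatic differentiation through the unrolled inner loop, which scales far better than finite differences as the amount of synthetic data grows. Once the synthetic data has been optimised, a new slope can be scored by calling `petri_dish_loss` on it.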
How Is It Different From Other Prediction Models
According to the researchers, unlike other neural network-based prediction models that parse the structure of the motif to estimate its performance, the Synthetic Petri Dish predicts the performance of the motif by training the actual motif in an artificial setting, thus deriving predictions from its true intrinsic properties.
As a control, the researchers compared the Synthetic Petri Dish with a neural network surrogate model trained to predict performance as a function of the sigmoid slope. This NN-based surrogate control is a two-layer, 10-neuron-wide feedforward network that takes the sigmoid slope value as input and outputs the predicted validation accuracy of the corresponding MNIST network.
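For comparison, the NN-based surrogate control can be sketched as a tiny regression network from slope to accuracy. The layout (one hidden layer of 10 tanh units feeding a linear output, i.e. two weight layers), the full-batch gradient descent training and the class name `SlopeSurrogate` are all assumptions; the article specifies only a two-layer, 10-neuron-wide feedforward network. The training pairs in the usage example are made-up placeholders, not MNIST results.

```python
import math
import random


class SlopeSurrogate:
    """Hypothetical NN surrogate mapping a sigmoid slope to predicted validation accuracy."""

    def __init__(self, hidden: int = 10, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
        self.b2 = 0.0

    def _hidden(self, slope: float) -> list[float]:
        return [math.tanh(w * slope + b) for w, b in zip(self.w1, self.b1)]

    def _output(self, h: list[float]) -> float:
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def predict(self, slope: float) -> float:
        return self._output(self._hidden(slope))

    def mse(self, data) -> float:
        return sum((self.predict(s) - a) ** 2 for s, a in data) / len(data)

    def fit(self, data, epochs: int = 500, lr: float = 0.01) -> "SlopeSurrogate":
        """Full-batch gradient descent on mean squared error."""
        n, k = len(data), len(self.w1)
        for _ in range(epochs):
            gw1, gb1, gw2, gb2 = [0.0] * k, [0.0] * k, [0.0] * k, 0.0
            for slope, acc in data:
                h = self._hidden(slope)
                err = 2.0 * (self._output(h) - acc) / n  # dLoss/dPrediction
                gb2 += err
                for j in range(k):
                    gw2[j] += err * h[j]
                    dz = err * self.w2[j] * (1.0 - h[j] ** 2)  # back through tanh
                    gw1[j] += dz * slope
                    gb1[j] += dz
            for j in range(k):
                self.w1[j] -= lr * gw1[j]
                self.b1[j] -= lr * gb1[j]
                self.w2[j] -= lr * gw2[j]
            self.b2 -= lr * gb2
        return self


if __name__ == "__main__":
    # Made-up (slope, accuracy) pairs for illustration only; not results from the paper.
    toy = [(s, 0.9 - 0.05 * (s - 1.0) ** 2) for s in (0.25, 0.5, 1.0, 2.0, 4.0)]
    model = SlopeSurrogate()
    print("before:", model.mse(toy))
    model.fit(toy)
    print("after:", model.mse(toy))
    print("predicted accuracy at slope 1.5:", model.predict(1.5))
```

Because this surrogate sees nothing but the scalar slope, everything it knows has to come from the ground-truth pairs it was fitted on, which is the limitation the researchers contrast with the Synthetic Petri Dish.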
Unlike this NN-based surrogate, which predicts the performance of new motifs from a single scalar value, the Synthetic Petri Dish trains and evaluates each new motif independently on synthetic data. In other words, it actually runs a small network with that particular sigmoid slope in a small experiment, and should therefore have better information about how well that slope performs.
Key Takeaways From This Research:
Synthetic Petri Dish can predict the performance of new motifs with significantly higher accuracy than the NN-based surrogate, especially when insufficient ground-truth data is available
According to the researchers, this work could inspire a new research direction: studying the performance of components extracted from models in a synthetic diagnostic setting optimised to provide informative evaluations
The researchers stated that by approaching architecture search in this way, as a kind of question-answering problem about how certain motifs or factors affect final results, they gained the intriguing advantage that the prediction model is no longer a black box.