Neural Architecture Search

[ deep-learning  Neural-Architecture-Search  nas  nasnet  pnasnet  amoebanet  morphnet  nas-fpn  detnas  ]

There has been a lot of recent interest in automatically learning good neural net architectures.

Current techniques usually fall into one of two categories: evolutionary algorithms or reinforcement learning.

  • When using evolutionary algorithms (EA), each neural network structure is encoded as a string, and random mutations and recombinations of the strings are performed during the search process; each string (model) is then trained and evaluated on a validation set, and the top-performing models generate “children”.
  • When using reinforcement learning (RL), the agent (typically an RNN controller) performs a sequence of actions that specifies the structure of the model; this model is then trained, and its validation performance is returned as the reward, which is used to update the controller.
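
To make the string encoding in the EA bullet concrete, here is a minimal sketch. The operation vocabulary `OPS`, the fixed genome length, and the single-point crossover are all assumptions chosen for illustration; real search spaces encode much richer structures.

```python
import random

# Hypothetical operation vocabulary; not taken from any specific paper.
OPS = ["conv3x3", "conv5x5", "maxpool3x3", "identity"]

def random_genome(length, rng):
    """Encode a network as a fixed-length list ("string") of operation names."""
    return [rng.choice(OPS) for _ in range(length)]

def mutate(genome, rng):
    """Return a copy with one random position replaced by a different operation."""
    child = list(genome)
    pos = rng.randrange(len(child))
    child[pos] = rng.choice([op for op in OPS if op != child[pos]])
    return child

def recombine(a, b, rng):
    """Single-point crossover: a prefix of `a` followed by the suffix of `b`."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

rng = random.Random(0)
parent_a = random_genome(6, rng)
parent_b = random_genome(6, rng)
child = mutate(recombine(parent_a, parent_b, rng), rng)
```

In a real search, `child` would be decoded into a model, trained, and scored on the validation set before it is allowed to produce children of its own.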

Although both EA and RL methods have been able to learn network structures that outperform manually designed architectures, they require significant computational resources.


NASNet proposes to search for an architectural building block on a small dataset and then transfer the block to a larger dataset. For example, the best convolutional layer (or “cell”) found on the CIFAR-10 dataset is applied to the ImageNet dataset by stacking together more copies of this cell; the result is called a “NASNet architecture”.

On ImageNet, the paper reports that NASNet is 1.2% better in top-1 accuracy than the best human-invented architectures while having 9 billion fewer FLOPs, a reduction of 28% in computational demand from the previous state-of-the-art model.

In NAS, a controller recurrent neural network (RNN) samples child networks with different architectures. The child networks are trained to convergence to obtain some accuracy on a held-out validation set. The resulting accuracies are used as rewards to update the controller via policy gradient (a reinforcement learning method), so that the controller generates better architectures over time.
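
The sample, evaluate, update loop can be sketched with a plain REINFORCE update. This is a toy under loud assumptions: independent softmax logits per decision stand in for the controller RNN, and `toy_reward` is a hypothetical replacement for "train the child network and measure validation accuracy".

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_architecture(logits_per_step, rng):
    """Sample one option per decision step from independent softmax policies."""
    return [rng.choices(range(len(l)), weights=softmax(l))[0] for l in logits_per_step]

def reinforce_update(logits_per_step, choices, reward, baseline, lr):
    """REINFORCE step: logits += lr * (reward - baseline) * grad log pi(choice)."""
    advantage = reward - baseline
    for logits, chosen in zip(logits_per_step, choices):
        probs = softmax(logits)
        for k in range(len(logits)):
            grad = (1.0 if k == chosen else 0.0) - probs[k]
            logits[k] += lr * advantage * grad

def toy_reward(choices):
    """Hypothetical stand-in for validation accuracy: fraction of steps picking option 0."""
    return sum(1 for c in choices if c == 0) / len(choices)

rng = random.Random(0)
logits = [[0.0, 0.0, 0.0] for _ in range(4)]  # 4 decisions, 3 options each
baseline = 0.0
for _ in range(500):
    arch = sample_architecture(logits, rng)
    r = toy_reward(arch)
    reinforce_update(logits, arch, r, baseline, lr=0.5)
    baseline = 0.9 * baseline + 0.1 * r       # moving-average baseline reduces variance
```

The moving-average baseline is a common variance-reduction choice, not something this post attributes to the NAS paper.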

The network being searched consists of stacks of cells. Each cell receives as input two initial hidden states $h_i$ and $h_{i-1}$, which are the outputs of the two cells in the previous two lower layers, or the input image. The controller RNN recursively predicts the rest of the structure of the convolutional cell, given these two initial hidden states. Each prediction consists of five steps:

  • Step 1. Select a hidden state from $h_i$, $h_{i-1}$ or from the set of hidden states created in previous blocks.
  • Step 2. Select a second hidden state from the same options as in Step 1.
  • Step 3. Select an operation (some predefined convolution and pooling) to apply to the hidden state selected in Step 1.
  • Step 4. Select an operation to apply to the hidden state selected in Step 2.
  • Step 5. Select a method to combine (element-wise add or concatenation) the outputs of Steps 3 and 4 to create a new hidden state.
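
The five steps can be sketched as a sampling routine. Uniform random choices stand in for the controller RNN's predictions, and the operation names in `OPERATIONS` are an illustrative subset, not the exact search space.

```python
import random

# Illustrative operation names (assumed, not the full NASNet set).
OPERATIONS = ["sep_conv_3x3", "sep_conv_5x5", "avg_pool_3x3", "max_pool_3x3", "identity"]
COMBINERS = ["add", "concat"]

def sample_cell(num_blocks, rng):
    """Sample a cell as a list of blocks, one five-step prediction per block."""
    hidden_states = ["h_i", "h_i-1"]               # the two initial hidden states
    blocks = []
    for b in range(num_blocks):
        block = {
            "input_1": rng.choice(hidden_states),  # Step 1
            "input_2": rng.choice(hidden_states),  # Step 2
            "op_1": rng.choice(OPERATIONS),        # Step 3
            "op_2": rng.choice(OPERATIONS),        # Step 4
            "combine": rng.choice(COMBINERS),      # Step 5
        }
        blocks.append(block)
        hidden_states.append(f"block_{b}")         # new hidden state is selectable by later blocks
    return blocks

cell = sample_cell(num_blocks=5, rng=random.Random(0))
```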


Progressive Neural Architecture Search (PNAS) proposes a more efficient search method based on a sequential model-based optimization (SMBO) strategy: it searches for structures in order of increasing complexity, while simultaneously learning a surrogate model to guide the search through structure space.

Similar to NASNet, PNASNet searches for a basic cell, and the searched cell is stacked to form the final network. Heuristic search is used to explore the space of cell structures, starting with all the simple (shallow) models and progressing to complex ones, pruning out unpromising structures along the way. Since this process is expensive, a model, or surrogate function, is also learned; it can predict the performance of a structure (one very similar to those already seen) without needing to train it.
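
The progressive, prune-as-you-go search can be sketched as a beam search over cells of increasing size. This is a simplified sketch: `toy_predict` is a hypothetical surrogate, the vocabulary `OPS` is assumed, and the step where the surviving cells are actually trained and used to refit the surrogate is omitted.

```python
import itertools

OPS = ["conv3x3", "conv5x5", "maxpool"]  # hypothetical op vocabulary

def candidate_blocks(b):
    """All blocks that can be added as block number b (1-indexed). Inputs are
    indexed 0..b: the two cell inputs plus the b-1 earlier block outputs."""
    inputs = range(2 + b - 1)
    return [(i1, i2, o1, o2)
            for i1, i2 in itertools.product(inputs, repeat=2)
            for o1, o2 in itertools.product(OPS, repeat=2)]

def progressive_search(max_blocks, beam_width, predict):
    """Grow cells one block at a time, keeping the best `beam_width` per level."""
    beam = [(blk,) for blk in candidate_blocks(1)]   # start from all 1-block cells
    beam = sorted(beam, key=predict, reverse=True)[:beam_width]
    for b in range(2, max_blocks + 1):
        expanded = [cell + (blk,) for cell in beam for blk in candidate_blocks(b)]
        # A full implementation would train the selected cells here and
        # refit the surrogate on their measured accuracies.
        beam = sorted(expanded, key=predict, reverse=True)[:beam_width]
    return beam

def toy_predict(cell):
    """Hypothetical surrogate: counts conv3x3 operations in the cell."""
    return sum(op == "conv3x3" for blk in cell for op in blk[2:])

best_cells = progressive_search(max_blocks=3, beam_width=4, predict=toy_predict)
```

The point of the sketch is the cost profile: only `beam_width` cells per level are expanded, so the number of candidates scored grows linearly with depth instead of exponentially.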

The surrogate function is an LSTM that reads a sequence of length $4b$ (representing, for each block, the first input, the second input, the operation on the first input and the operation on the second input). The input at each step is a one-hot vector of size $|I_b|$ or $|O|$, followed by an embedding lookup. A shared embedding of dimension $D$ is used for the tokens $I_1, I_2 \in I$, and another shared embedding for $O_1, O_2 \in O$. The final LSTM hidden state goes through a fully-connected layer and a sigmoid to regress the validation accuracy.
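
As a small illustration of the input side only, here is how a cell could be flattened into the length-$4b$ sequence of one-hot vectors. The vocabulary `OPS` and the assumption that block $b$ can choose among $2 + (b - 1)$ inputs (two cell inputs plus earlier blocks) are mine; the embedding and LSTM are left out.

```python
OPS = ["conv3x3", "conv5x5", "maxpool"]  # hypothetical op vocabulary, |O| = 3

def one_hot(index, size):
    return [1 if k == index else 0 for k in range(size)]

def encode_cell(cell):
    """Return 4b one-hot vectors, ordered (I1, I2, O1, O2) for each block."""
    sequence = []
    for b, (i1, i2, o1, o2) in enumerate(cell, start=1):
        num_inputs = 2 + (b - 1)  # assumed |I_b|: two cell inputs + earlier blocks
        sequence += [
            one_hot(i1, num_inputs),
            one_hot(i2, num_inputs),
            one_hot(OPS.index(o1), len(OPS)),
            one_hot(OPS.index(o2), len(OPS)),
        ]
    return sequence

cell = [(0, 1, "conv3x3", "maxpool"), (2, 0, "conv5x5", "conv3x3")]
seq = encode_cell(cell)  # b = 2 blocks, so 8 vectors
```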


AmoebaNet modifies the tournament selection evolutionary algorithm by introducing an age property to favor the younger genotypes.

The search space associates convolutional neural network architectures with small directed graphs in which vertices represent hidden states and labeled edges represent common network operations (such as convolutions or pooling layers). The mutation rules only alter architectures by randomly reconnecting the origin of edges to different vertices and by randomly relabeling the edges, covering the full search space.

The algorithm keeps a population of P trained models throughout the experiment.

  • The population is initialized with models with random architectures. All architectures that conform to the search space described are possible and equally likely.
  • After this, evolution improves the initial population in cycles. At each cycle:
    • S random models are sampled from the population, each drawn uniformly at random with replacement.
    • The model with the highest validation fitness within this sample is selected as the parent. This procedure is called tournament selection.
    • A new architecture, called the child, is constructed from the parent by applying a transformation called a mutation. A mutation causes a simple and random modification of the architecture, e.g., changing an input or changing an operation.
    • Once the child architecture is constructed, it is trained, evaluated, and added to the population.
    • The oldest model is then removed, so that the population size stays at P.
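
The cycle above maps naturally onto a queue. The following is a minimal sketch, assuming a toy architecture encoding (a list of `(input, op)` pairs) and a hypothetical `toy_fitness` in place of real training and validation.

```python
import collections
import random

OPS = ["conv3x3", "conv5x5", "maxpool"]  # hypothetical op vocabulary

def random_architecture(rng, num_nodes=5):
    """Node i reads from input 0 or from an earlier node (index < i + 1)."""
    return [(rng.randrange(i + 1), rng.choice(OPS)) for i in range(num_nodes)]

def mutate(arch, rng):
    """Either reconnect one node's input or relabel its operation."""
    child = list(arch)
    i = rng.randrange(len(child))
    source, op = child[i]
    if rng.random() < 0.5:
        source = rng.randrange(i + 1)   # "change input"
    else:
        op = rng.choice(OPS)            # "change operation"
    child[i] = (source, op)
    return child

def toy_fitness(arch):
    """Hypothetical stand-in for validation accuracy after training."""
    return sum(op == "conv3x3" for _, op in arch) / len(arch)

def aging_evolution(population_size, sample_size, cycles, rng):
    population = collections.deque()    # left end = oldest model
    history = []
    while len(population) < population_size:
        arch = random_architecture(rng)
        model = (arch, toy_fitness(arch))
        population.append(model)
        history.append(model)
    for _ in range(cycles):
        # Draw S models uniformly at random, with replacement.
        sample = [population[rng.randrange(len(population))] for _ in range(sample_size)]
        parent = max(sample, key=lambda m: m[1])
        child_arch = mutate(parent[0], rng)
        child = (child_arch, toy_fitness(child_arch))
        population.append(child)
        history.append(child)
        population.popleft()            # remove the oldest, not the worst
    return max(history, key=lambda m: m[1]), population

best, final_population = aging_evolution(10, 3, 200, random.Random(0))
```

Removing the oldest model rather than the worst one is what distinguishes this from plain tournament selection: a model can only persist by having descendants that keep scoring well.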


MorphNet proposes a method for searching for more efficient network architectures under a budget on FLOPs or model size. It iterates the following steps:

  • Search for the optimal parameters given the budget, which relies on sparsity in either the weights (model size) or the activations (FLOPs).
  • Find the width of each layer of the network given the learned parameters.
  • Increase the width of the network until the budget is reached.

Note that this method increases the width of all layers of the network by a shared constant factor.
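
The expansion step can be sketched as a search for the largest shared multiplier that still fits the budget. Everything here is an assumption for illustration: the cost model (`flops` counts multiply-adds of a chain of dense layers), the bisection, and the example widths, which are imagined to come out of the sparsification step.

```python
def flops(widths, input_width):
    """Toy cost model: multiply-adds of a chain of dense layers."""
    total, prev = 0, input_width
    for w in widths:
        total += prev * w
        prev = w
    return total

def expand_to_budget(widths, input_width, budget, tol=1e-3):
    """Find (by bisection) the largest shared multiplier omega whose scaled
    widths stay within the budget. Assumes the unscaled widths already fit."""
    def scaled(omega):
        return [max(1, int(w * omega)) for w in widths]

    lo, hi = 0.0, 1.0
    while flops(scaled(hi), input_width) <= budget:   # grow until over budget
        lo, hi = hi, hi * 2
    while hi - lo > tol:                              # then bisect
        mid = (lo + hi) / 2
        if flops(scaled(mid), input_width) <= budget:
            lo = mid
        else:
            hi = mid
    return lo, scaled(lo)

# Hypothetical layer widths left after sparsification.
omega, new_widths = expand_to_budget([12, 30, 8], input_width=16, budget=4000)
```

Because every layer is scaled by the same `omega`, the relative widths found by the sparsification step are preserved; only the overall size changes.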

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

NAS-FPN searches for a better network architecture for object detection. It focuses specifically on searching cross-scale connections. The base network is RetinaNet, which is a backbone network with a feature pyramid network (FPN). The proposed method uses an RNN controller to search for which pairs of features are combined.

The RNN controller selects any two candidate feature layers and a binary operation to combine them into a new feature layer, where all feature layers may have different resolutions. Each merging cell has 4 prediction steps made by distinct softmax classifiers:

  • select a feature layer
  • select the other feature layer
  • select output feature resolution
  • select a binary op (sum or global pooling) to combine those two features
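
The four prediction steps of a merging cell can be sketched as follows. Uniform random choices stand in for the controller's softmax classifiers; the layer names, the pyramid `LEVELS`, and the rule that each new layer joins the candidate pool are assumptions made for this sketch.

```python
import random

BINARY_OPS = ["sum", "global_pooling"]
LEVELS = [3, 4, 5, 6, 7]  # hypothetical pyramid levels used as output resolutions

def sample_merging_cells(num_cells, rng):
    # Candidate feature layers as (name, level) pairs; names are illustrative.
    candidates = [(f"C{lvl}", lvl) for lvl in LEVELS]
    cells = []
    for n in range(num_cells):
        first = rng.choice(candidates)                              # step 1
        second = rng.choice([c for c in candidates if c != first])  # step 2
        out_level = rng.choice(LEVELS)                              # step 3
        op = rng.choice(BINARY_OPS)                                 # step 4
        cells.append({"input_1": first[0], "input_2": second[0],
                      "out_level": out_level, "op": op})
        # Assumption: the merged layer becomes a candidate for later cells.
        candidates.append((f"M{n}", out_level))
    return cells

cells = sample_merging_cells(num_cells=7, rng=random.Random(0))
```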

DetNAS: Neural Architecture Search on Object Detection

DetNAS automatically searches neural architectures for the backbones of object detectors. In DetNAS, the search space is formulated as a supernet and the search method relies on an evolutionary algorithm (EA). The paper's experiments show the effectiveness of DetNAS on various detectors: the one-stage detector RetinaNet and the two-stage detector FPN.

Written on April 24, 2019