Metadata is data that describes other data; by analogy, meta-learning is learning about learning. Neural Networks are inspired by the human brain, but they require extensive datasets to reach reasonable accuracy, whereas humans can learn from just one or two examples.
Meta-learning is best suited for few-shot learning. Few-shot, or k-shot, learning is learning from only a few (k) data points. With meta-learning, a model trained on one dataset can remain useful on a similar dataset without being retrained.
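To make the k-shot setting concrete, here is a minimal sketch of how an N-way, k-shot episode might be sampled. The names are illustrative: `data_by_class` is a hypothetical dictionary mapping each class label to a list of samples.

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1, query_size=5):
    """Pick n_way classes, then k_shot support and query_size query samples per class."""
    classes = random.sample(list(data_by_class.keys()), n_way)
    support, query = [], []
    for label in classes:
        samples = random.sample(data_by_class[label], k_shot + query_size)
        support += [(x, label) for x in samples[:k_shot]]   # the k examples to learn from
        query += [(x, label) for x in samples[k_shot:]]     # held-out examples to evaluate on
    return support, query
```

The model must label the query examples correctly after seeing only the k support examples per class.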
Let’s understand how meta-learning works.
The figures above explain how meta-learning differs from the classical way of training Neural Networks. The meta-learner plays the role of an optimizer. This role can be played in different ways, and based on that, meta-learning is classified into the following types:
In this type of learning, we extract features for each class and calculate the distance between the features of one class and the elements of another. This tells us how similar the two classes are, which is helpful when we want to reuse our model for a similarly structured task.
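A minimal sketch of this metric-based idea, in the spirit of prototypical networks, is shown below. `embed` is an assumed feature extractor (any module mapping inputs to feature vectors); each class is summarized by the mean of its support features, and a query is labeled by its nearest class.

```python
import torch

def classify_by_distance(embed, support_x, support_y, query_x, n_classes):
    """Average the embedded support examples of each class into a prototype,
    then label each query by its nearest prototype (Euclidean distance)."""
    features = embed(support_x)                                    # [n_support, dim]
    prototypes = torch.stack(
        [features[support_y == c].mean(dim=0) for c in range(n_classes)]
    )                                                              # [n_classes, dim]
    dists = torch.cdist(embed(query_x), prototypes)                # [n_query, n_classes]
    return dists.argmin(dim=1)                                     # closest class per query
```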
In classical Neural Networks, weights are initialized randomly. In initialization learning, the weights are initialized to optimal or near-optimal values so that the algorithm can learn a new task in only a few iterations.
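The sketch below illustrates this in the spirit of MAML, using a hand-rolled linear model so the inner loop stays explicit. It assumes `theta_w` and `theta_b` are the meta-learned initialization (tensors with `requires_grad=True`, shaped `[dim, n_classes]` and `[n_classes]`).

```python
import torch

def inner_adapt(theta_w, theta_b, support_x, support_y, inner_lr=0.01, steps=3):
    """Start from the meta-learned initialization and take a few gradient steps
    on the new task's support set."""
    w, b = theta_w.clone(), theta_b.clone()
    for _ in range(steps):
        logits = support_x @ w + b
        loss = torch.nn.functional.cross_entropy(logits, support_y)
        grad_w, grad_b = torch.autograd.grad(loss, (w, b), create_graph=True)
        w, b = w - inner_lr * grad_w, b - inner_lr * grad_b   # task-specific weights
    return w, b   # the outer loop backpropagates through these into theta_w, theta_b
```

The outer loop then updates the initialization itself so that these few inner steps are enough for any new task.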
In this type of meta-learning, we replace the optimizer function with a model that is itself trained by another optimizer. The model being optimized passes its loss to the meta-learner, and the meta-learner in turn is optimized by a standard optimizer such as gradient descent.
The following section looks at an algorithm in which the optimizer itself is learned by gradient descent.
In the example above, the meta-learner takes the place of the optimizer for the NN model. The loss calculated by the NN model is fed into the meta-learner. This may seem confusing at first, because the meta-learner itself is still trained with gradient descent.
In place of the meta-learner, we need a model that can generate a sequence of outputs based on its previous outputs. We use these outputs to compute the new weights of the NN model, so we choose a Recurrent Neural Network (RNN) for this purpose. The RNN learns from the loss generated by the NN model and produces outputs that the NN model uses to update its weights, which in turn trains the NN model.
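Below is a minimal sketch of this idea on a toy regression task. The setup is illustrative rather than the exact published algorithm: an LSTM cell reads each parameter's gradient and proposes an update, and the unrolled losses are what would later train the meta-learner itself.

```python
import torch
import torch.nn as nn

class RNNOptimizer(nn.Module):
    """Maps each parameter's gradient to an update, keeping hidden state across steps."""
    def __init__(self, hidden_size=20):
        super().__init__()
        self.rnn = nn.LSTMCell(1, hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, grads, state):
        h, c = self.rnn(grads.reshape(-1, 1), state)   # one "batch" entry per parameter
        return self.out(h).reshape(grads.shape), (h, c)

# Toy task (assumed data): fit weights w so that x @ w approximates y.
x, y = torch.randn(32, 10), torch.randn(32)
w = torch.randn(10, requires_grad=True)

meta = RNNOptimizer()
state = None
for _ in range(5):
    loss = ((x @ w - y) ** 2).mean()                   # loss produced by the NN model
    grad, = torch.autograd.grad(loss, w, create_graph=True)
    update, state = meta(grad, state)                  # meta-learner proposes the update
    w = w + update                                     # new weights for the NN model
# The meta-learner is then trained by backpropagating the summed unrolled losses
# into its own parameters, using an ordinary optimizer such as gradient descent.
```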
Most optimization algorithms converge using gradients, which requires a large number of data points, so gradient-based algorithms like gradient descent struggle in few-shot learning. Meta-learning eases this by helping models converge faster, and it also lets us learn similarly structured tasks with minimal effort.