Few-shot Transfer Learning From Pre-Trained Networks
Abstract
The advent of deep learning on mobile devices and sensors has led to an unprecedented growth in the number and availability of systems trained on a wide range of machine learning tasks, creating new opportunities and challenges for transfer learning. Most existing transfer learning methods require some control over the systems being trained, either by enforcing constraints during source training or by optimizing a joint objective across tasks, which requires that all data be sent to a central location for training. In some cases, however, for practical or ethical reasons, we have no control over the individual source-task training and no access to the source training samples; instead, we only have access to features pre-trained on the data in a black-box setup. In this paper, we introduce the multi-source learning problem of training a classifier using an ensemble of pre-trained neural networks for a set of classes that have not been observed by any of the source networks, and for which we have very few training samples. We show that by using these distributed networks as feature extractors, we can train an effective classifier in a computationally efficient manner on very few training samples using the theory of non-linear maximal correlation functions. We show that a maximal correlation objective can be used to weight the feature functions when building a classifier on the target task, and we propose the Maximal Correlation Weighting (MCW) method for constructing this classifier. Finally, we demonstrate the effectiveness of this classifier on datasets derived from the CIFAR-100, Stanford Dogs, and Tiny ImageNet datasets, and evaluate source prioritization performance on CIFAR-100.
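To make the feature-weighting idea concrete, the sketch below shows one way a maximal-correlation-style weighting of pre-trained features could be used to classify novel classes from a few samples. This is a minimal illustration under stated assumptions, not the paper's reference MCW implementation: the function name `mcw_classify`, the per-class-mean construction of the label functions, and the toy data are all hypothetical.

```python
import numpy as np

def mcw_classify(train_feats, train_labels, test_feats, classes):
    """Hypothetical few-shot classifier built from pre-trained features.

    train_feats : (n_train, d) features from one frozen, pre-trained source network
    train_labels: (n_train,) target-task class labels for the few-shot samples
    test_feats  : (n_test, d) features for the query samples
    classes     : list of the novel class labels
    """
    # Center the feature functions f_i so they are approximately zero-mean.
    mu = train_feats.mean(axis=0)
    f_tr = train_feats - mu
    f_te = test_feats - mu

    # Label functions g_i(y): per-class mean of each centered feature.
    g = np.stack([f_tr[train_labels == c].mean(axis=0) for c in classes])  # (C, d)

    # Correlation-style weights sigma_i ~ E[f_i(X) g_i(Y)] on the support set.
    idx = np.array([list(classes).index(y) for y in train_labels])
    sigma = (f_tr * g[idx]).mean(axis=0)                                   # (d,)

    # Score each query by sum_i sigma_i * f_i(x) * g_i(y), pick the best class.
    scores = (f_te * sigma) @ g.T                                          # (n_test, C)
    return np.asarray(classes)[scores.argmax(axis=1)]


# Toy usage with random "features"; in practice these would come from the
# penultimate layer of a frozen, pre-trained network.
rng = np.random.default_rng(0)
train_x = rng.normal(size=(10, 64))   # 5 shots for each of 2 novel classes
train_y = np.array([0] * 5 + [1] * 5)
test_x = rng.normal(size=(4, 64))
print(mcw_classify(train_x, train_y, test_x, classes=[0, 1]))
```

In a multi-source setting, scores of this form would be computed per source network and combined, so that sources whose features correlate more strongly with the novel labels contribute more to the final decision.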