Stichable Neural Networks

TLDR; the Stichable Neural Networks paper includes some interesting concepts. It allows the creation of multiple neural networks with varying complexity and performance trade-offs from a family of pretrained models.

Key Principles

How to choose anchors from well-performed pretrained models in a model family
The design of stitching layers
The stitching direction and strategy
Simple but effective training strateg

A key question about combining sub-networks from different pretrained models is how to maintain accuracy. The paper concludes that the final performance of these combinations is nearly predictable due to an interpolation-like performance curve between anchors. This predictability allows for selective pre-training of stitches based on various deployment scenarios.

The Choice of Anchors

Anchors that are pretrained on different tasks can learn very different representations due to the large distribution gap of different domains. Therefore, the selected anchors should be consistent in terms of the pretrained domain.

The Stitching Layer and its Initialization

SN-Net is built upon pretrained models. Therefore, the anchors have already learned good representations, which allows to directly obtain an accurate transformation matrix by solving the least squares problem:

$$||AM_o - B|| = min||AM - b||_F$$

where $A \in R^{N \times D_1}$ and $B \in R^{N \times D_2}$...