std::bodun::blog
PhD student at University of Texas at Austin 🤘. Doing systems for ML.
Stitchable Neural Networks
TL;DR: The Stitchable Neural Networks (SN-Net) paper includes some interesting concepts. It creates multiple neural networks with varying complexity and performance trade-offs from a family of pretrained models.
Key Principles
- How to choose anchors from well-performing pretrained models in a model family
- The design of stitching layers
- The stitching direction and strategy
- Simple but effective training strategy
A key question about combining sub-networks from different pretrained models is how to maintain accuracy. The paper concludes that the final performance of these combinations is nearly predictable due to an interpolation-like performance curve between anchors. This predictability allows for selective pre-training of stitches based on various deployment scenarios.
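To make the "interpolation-like" idea concrete, here is a minimal sketch that estimates a stitched network's accuracy by linear interpolation between two anchors. Treating the curve as exactly linear in compute is an assumption made for illustration, not something the paper guarantees. The function name `estimate_stitch_accuracy` and the anchor numbers are hypothetical placeholders, not results from the paper.

```python
def estimate_stitch_accuracy(flops, small_anchor, large_anchor):
    """Linearly interpolate accuracy between two anchors.

    Each anchor is a (flops, accuracy) pair. This is only a rough
    first guess; the real curve is "interpolation-like", not
    guaranteed to be linear.
    """
    f0, acc0 = small_anchor
    f1, acc1 = large_anchor
    if not f0 <= flops <= f1:
        raise ValueError("flops must lie between the two anchors")
    t = (flops - f0) / (f1 - f0)  # position between the anchors, in [0, 1]
    return acc0 + t * (acc1 - acc0)


# Hypothetical placeholder anchors: (GFLOPs, top-1 accuracy in %).
small = (1.0, 70.0)
large = (4.0, 80.0)
estimate = estimate_stitch_accuracy(2.5, small, large)
```

An estimate like this could help decide which stitches are worth training for a given compute budget, before spending any training time on them.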
The Choice of Anchors
Anchors pretrained on different tasks can learn very different representations because of the large distribution gap between domains. Therefore, the selected anchors should share the same pretraining domain.
The Stitching Layer and its Initialization
SN-Net is built upon pretrained models. The anchors have therefore already learned good representations, so an accurate transformation matrix can be obtained directly by solving the least squares problem:
$$\|AM_o - B\|_F = \min_{M} \|AM - B\|_F$$
where $A \in \mathbb{R}^{N \times D_1}$ and $B \in \mathbb{R}^{N \times D_2}$...
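The least squares problem above can be sketched in NumPy. This is a minimal illustration, assuming the rows of $A$ and $B$ are $N$ feature vectors taken from the two anchors at the stitching point (that interpretation is an assumption here, since the definitions are cut off). The helper name `init_stitching_matrix` is hypothetical, and the random data only stands in for real activations.

```python
import numpy as np


def init_stitching_matrix(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return M_o minimizing ||A M - B||_F.

    A has shape (N, D1), B has shape (N, D2); the result has shape (D1, D2).
    """
    # lstsq solves the problem column by column, which is equivalent
    # to minimizing the Frobenius norm of the full residual.
    M_o, *_ = np.linalg.lstsq(A, B, rcond=None)
    return M_o


# Synthetic stand-ins for feature maps (not real model activations).
rng = np.random.default_rng(0)
N, D1, D2 = 256, 16, 24
A = rng.standard_normal((N, D1))
M_true = rng.standard_normal((D1, D2))
B = A @ M_true  # noise-free, so the exact map is recoverable

M_o = init_stitching_matrix(A, B)
```

A standard closed form for the same minimizer is $M_o = A^{\dagger} B$, where $A^{\dagger}$ is the Moore-Penrose pseudoinverse (`np.linalg.pinv(A) @ B`). `lstsq` is used here because it avoids forming the pseudoinverse explicitly.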