# Three things everyone should know about Vision Transformers

This repository contains PyTorch evaluation code, training code and pretrained models for the following projects:

* [DeiT](README_deit.md) (Data-Efficient Image Transformers), ICML 2021
* [CaiT](README_cait.md) (Going deeper with Image Transformers), ICCV 2021 (Oral)
* [ResMLP](README_resmlp.md) (ResMLP: Feedforward networks for image classification with data-efficient training)
* [PatchConvnet](README_patchconvnet.md) (Augmenting Convolutional networks with attention-based aggregation)
* 3Things (Three things everyone should know about Vision Transformers)
* [DeiT III](README_revenge.md) (DeiT III: Revenge of the ViT)

For details see [Three things everyone should know about Vision Transformers](https://arxiv.org/pdf/2203.09795) by Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek and Hervé Jégou.

If you use this code for a paper, please cite:

```
@article{Touvron2022ThreeTE,
  title={Three things everyone should know about Vision Transformers},
  author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Jakob Verbeek and Herve Jegou},
  journal={arXiv preprint arXiv:2203.09795},
  year={2022},
}
```

## Attention-only fine-tuning

We propose to fine-tune only the attention layers (flag `--attn-only`) when adapting models to higher resolutions or performing transfer learning. A minimal sketch of this idea is given at the end of this README.

## MLP patch projection

We propose to replace the linear patch projection with a hierarchical MLP (hMLP) patch projection (see class [hMLP_stem](models_v2.py)). A key advantage is that this pre-processing stem is compatible with, and improves, mask-based self-supervised training such as BeiT. A simplified sketch of such a stem appears at the end of this README.

## Parallel blocks

We propose to use blocks in parallel, rather than purely in sequence, in order to obtain more flexible architectures (see class [Layer_scale_init_Block_paralx2](models_v2.py)). A sketch of a parallel block is provided at the end of this README.

# License

This repository is released under the Apache 2.0 license as found in the [LICENSE](LICENSE) file.

# Contributing

We actively welcome your pull requests! Please see [CONTRIBUTING.md](.github/CONTRIBUTING.md) and [CODE_OF_CONDUCT.md](.github/CODE_OF_CONDUCT.md) for more info.
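
# Illustrative sketches

The sketches below illustrate the three ideas above using standard PyTorch and `timm` APIs. They are simplified, hedged illustrations rather than the implementations used in this repository; the model names, dimensions and selection rules are assumptions chosen for readability.

First, a minimal sketch of attention-only fine-tuning: load a pretrained ViT and freeze every parameter except the attention weights and the re-initialized classification head. In this repository the corresponding behaviour is enabled with the `--attn-only` flag; the `timm` model name and the name-based parameter selection below are illustrative.

```python
# Minimal sketch of attention-only fine-tuning (illustrative; the repository's
# training script enables the corresponding behaviour via the --attn-only flag).
import timm
import torch

# hypothetical choice of backbone and target number of classes
model = timm.create_model("deit_small_patch16_224", pretrained=True, num_classes=100)

for name, param in model.named_parameters():
    # train the attention weights and the new classifier head, freeze everything else
    param.requires_grad = ".attn." in name or name.startswith("head")

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4, weight_decay=0.05
)
```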
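
Next, a simplified sketch of a hierarchical MLP stem in the spirit of `hMLP_stem`: each 16x16 patch is embedded by a stack of small patch-wise projections whose kernel size equals their stride, so patches never exchange information and masking patches before or after the stem gives the same result. The channel widths and the use of `BatchNorm2d` below are assumptions; see `hMLP_stem` in [models_v2.py](models_v2.py) for the actual class.

```python
# Simplified hierarchical MLP stem (illustrative; see hMLP_stem in models_v2.py).
import torch
import torch.nn as nn

class SimpleHMLPStem(nn.Module):
    def __init__(self, in_chans=3, embed_dim=384):
        super().__init__()
        # every conv has kernel_size == stride, so the stem stays patch-local
        self.proj = nn.Sequential(
            nn.Conv2d(in_chans, embed_dim // 4, kernel_size=4, stride=4),
            nn.BatchNorm2d(embed_dim // 4), nn.GELU(),
            nn.Conv2d(embed_dim // 4, embed_dim // 4, kernel_size=2, stride=2),
            nn.BatchNorm2d(embed_dim // 4), nn.GELU(),
            nn.Conv2d(embed_dim // 4, embed_dim, kernel_size=2, stride=2),
            nn.BatchNorm2d(embed_dim),
        )

    def forward(self, x):
        x = self.proj(x)                      # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)   # (B, num_patches, embed_dim)

tokens = SimpleHMLPStem()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 196, 384])
```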
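
Finally, a sketch of a parallel block: two attention branches and two feed-forward branches read the same input and their outputs are summed, instead of being stacked as two sequential blocks. Layer scale, which gives `Layer_scale_init_Block_paralx2` its name, is omitted for brevity, and plain `nn.MultiheadAttention` stands in for the repository's attention module.

```python
# Sketch of a depth-2 parallel block (illustrative; layer scale omitted).
import torch
import torch.nn as nn

class ParallelBlockX2(nn.Module):
    def __init__(self, dim, num_heads, mlp_ratio=4.0):
        super().__init__()
        self.norm_a1, self.norm_a2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_m1, self.norm_m2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.mlp1 = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        self.mlp2 = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):  # x: (B, N, dim)
        y1, y2 = self.norm_a1(x), self.norm_a2(x)
        # the two attention residual branches act on the same input, in parallel
        x = x + self.attn1(y1, y1, y1)[0] + self.attn2(y2, y2, y2)[0]
        # likewise for the two feed-forward branches
        x = x + self.mlp1(self.norm_m1(x)) + self.mlp2(self.norm_m2(x))
        return x

out = ParallelBlockX2(dim=384, num_heads=6)(torch.randn(2, 197, 384))
print(out.shape)  # torch.Size([2, 197, 384])
```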