# 🚀 No Time to Train!

### Training-Free Reference-Based Instance Segmentation

[![GitHub](https://img.shields.io/badge/%E2%80%8B-No%20Time%20To%20Train-black?logo=github)](https://github.com/miquel-espinosa/no-time-to-train) [![Website](https://img.shields.io/badge/🌐-Project%20Page-grey)](https://miquel-espinosa.github.io/no-time-to-train/) [![arXiv](https://img.shields.io/badge/arXiv-2507.02798-b31b1b)](https://arxiv.org/abs/2507.02798)

**State-of-the-art (Papers with Code)**

[**_SOTA 1-shot_**](https://paperswithcode.com/sota/few-shot-object-detection-on-ms-coco-1-shot?p=no-time-to-train-training-free-reference) | [![PWC](https://img.shields.io/badge/State%20of%20the%20Art-Few--Shot%20Object%20Detection%20on%20MS--COCO%20(1--shot)-21CBCE?style=flat&logo=paperswithcode)](https://paperswithcode.com/sota/few-shot-object-detection-on-ms-coco-1-shot?p=no-time-to-train-training-free-reference)

[**_SOTA 10-shot_**](https://paperswithcode.com/sota/few-shot-object-detection-on-ms-coco-10-shot?p=no-time-to-train-training-free-reference) | [![PWC](https://img.shields.io/badge/State%20of%20the%20Art-Few--Shot%20Object%20Detection%20on%20MS--COCO%20(10--shot)-21CBCE?style=flat&logo=paperswithcode)](https://paperswithcode.com/sota/few-shot-object-detection-on-ms-coco-10-shot?p=no-time-to-train-training-free-reference)

[**_SOTA 30-shot_**](https://paperswithcode.com/sota/few-shot-object-detection-on-ms-coco-30-shot?p=no-time-to-train-training-free-reference) | [![PWC](https://img.shields.io/badge/State%20of%20the%20Art-Few--Shot%20Object%20Detection%20on%20MS--COCO%20(30--shot)-21CBCE?style=flat&logo=paperswithcode)](https://paperswithcode.com/sota/few-shot-object-detection-on-ms-coco-30-shot?p=no-time-to-train-training-free-reference)
---

> 🚨 **Update (22nd July 2025):** Instructions for custom datasets have been added!
>
> 🔔 **Update (16th July 2025):** Code has been updated with instructions!

---

## 📋 Table of Contents

- [🎯 Highlights](#-highlights)
- [📜 Abstract](#-abstract)
- [🧠 Architecture](#-architecture)
- [🛠️ Installation instructions](#️-installation-instructions)
  - [1. Clone the repository](#1-clone-the-repository)
  - [2. Create conda environment](#2-create-conda-environment)
  - [3. Install SAM2 and DinoV2](#3-install-sam2-and-dinov2)
  - [4. Download datasets](#4-download-datasets)
  - [5. Download SAM2 and DinoV2 checkpoints](#5-download-sam2-and-dinov2-checkpoints)
- [📊 Inference code: Reproduce 30-shot SOTA results in Few-shot COCO](#-inference-code)
  - [0. Create reference set](#0-create-reference-set)
  - [1. Fill memory with references](#1-fill-memory-with-references)
  - [2. Post-process memory bank](#2-post-process-memory-bank)
  - [3. Inference on target images](#3-inference-on-target-images)
  - [Results](#results)
- [🔍 Custom dataset](#-custom-dataset)
  - [0. Prepare a custom dataset ⛵🐦](#0-prepare-a-custom-dataset)
    - [0.1 If only bbox annotations are available](#01-if-only-bbox-annotations-are-available)
    - [0.2 Convert coco annotations to pickle file](#02-convert-coco-annotations-to-pickle-file)
  - [1. Fill memory with references](#1-fill-memory-with-references)
  - [2. Post-process memory bank](#2-post-process-memory-bank)
- [📚 Citation](#-citation)

## 🎯 Highlights

- 💡 **Training-Free**: No fine-tuning, no prompt engineering—just a reference image.
- 🖼️ **Reference-Based**: Segment new objects using just a few examples.
- 🔥 **SOTA Performance**: Outperforms previous training-free approaches on COCO, PASCAL VOC, and Cross-Domain FSOD.

**Links:**

- 🧾 [**arXiv Paper**](https://arxiv.org/abs/2507.02798)
- 🌐 [**Project Website**](https://miquel-espinosa.github.io/no-time-to-train/)
- 📈 [**Papers with Code**](https://paperswithcode.com/paper/no-time-to-train-training-free-reference)

## 📜 Abstract

> The performance of image segmentation models has historically been constrained by the high cost of collecting large-scale annotated data. The Segment Anything Model (SAM) alleviates this original problem through a promptable, semantics-agnostic segmentation paradigm, yet it still requires manual visual prompts or complex, domain-dependent prompt-generation rules to process a new image. Towards reducing this new burden, our work investigates the task of object segmentation when provided with, alternatively, only a small set of reference images. Our key insight is to leverage strong semantic priors, as learned by foundation models, to identify corresponding regions between a reference and a target image. We find that correspondences enable automatic generation of instance-level segmentation masks for downstream tasks and instantiate our ideas via a multi-stage, training-free method incorporating (1) memory bank construction, (2) representation aggregation and (3) semantic-aware feature matching. Our experiments show significant improvements on segmentation metrics, leading to state-of-the-art performance on COCO FSOD (36.8% nAP), PASCAL VOC Few-Shot (71.2% nAP50) and outperforming existing training-free approaches on the Cross-Domain FSOD benchmark (22.4% nAP).

![cdfsod-results-final-comic-sans-min](https://github.com/user-attachments/assets/ab302c02-c080-4042-99fc-0e181ba8abb9)

## 🧠 Architecture

![training-free-architecture-comic-sans-min](https://github.com/user-attachments/assets/d84dd83a-505e-45a0-8ce3-98e1838017f9)
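To make the three stages above concrete, here is a minimal conceptual sketch of the underlying idea, using DINOv2 patch features and cosine similarity. It is illustrative only and is **not** the repository's implementation (see `no_time_to_train/` for the actual pipeline, which also includes memory post-processing and SAM2 prompting); the helper names, the single averaged prototype per class and the `0.5` threshold are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

# Frozen DINOv2 backbone used as the semantic feature extractor (ViT-L/14, as in the paper).
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").eval()

@torch.no_grad()
def patch_features(img):
    """img: (1, 3, H, W) with H, W multiples of 14 -> L2-normalised (h*w, C) patch features."""
    feats = dinov2.forward_features(img)["x_norm_patchtokens"]  # (1, h*w, C)
    return F.normalize(feats[0], dim=-1)

@torch.no_grad()
def build_prototype(ref_imgs, ref_masks):
    """Stages 1-2: mask-pool reference features into a memory bank, then aggregate them."""
    pooled = []
    for img, mask in zip(ref_imgs, ref_masks):       # mask: (h, w) bool at patch resolution
        feats = patch_features(img)                  # (h*w, C)
        pooled.append(feats[mask.flatten()].mean(0)) # average features inside the reference mask
    return F.normalize(torch.stack(pooled).mean(0), dim=0)  # class prototype, shape (C,)

@torch.no_grad()
def match(target_img, prototype, grid_hw, thr=0.5):
    """Stage 3: semantic-aware matching; high-similarity patches would become
    point prompts for SAM2, which turns them into instance masks."""
    sim = patch_features(target_img) @ prototype     # cosine similarity per patch, (h*w,)
    return (sim.view(*grid_hw) > thr).nonzero()      # (row, col) of candidate patches
```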
## 🛠️ Installation instructions

### 1. Clone the repository

```bash
git clone https://github.com/miquel-espinosa/no-time-to-train.git
cd no-time-to-train
```

### 2. Create conda environment

We will create a conda environment with the required packages.

```bash
conda env create -f environment.yml
conda activate no-time-to-train
```

### 3. Install SAM2 and DinoV2

We will install SAM2 and DinoV2 from source.

```bash
pip install -e .
cd dinov2
pip install -e .
cd ..
```

### 4. Download datasets

Please download the COCO dataset and place it in `data/coco`.

### 5. Download SAM2 and DinoV2 checkpoints

We will download the exact SAM2 and DinoV2 checkpoints used in the paper. (Note, however, that SAM2.1 checkpoints are already available and might perform better.)

```bash
mkdir -p checkpoints/dinov2
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
cd dinov2
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth
cd ../..
```

## 📊 Inference code

⚠️ Disclaimer: This is research code — expect a bit of chaos!

### Reproducing 30-shot SOTA results in Few-shot COCO

Define useful variables and create a folder for the results:

```bash
CONFIG=./no_time_to_train/new_exps/coco_fewshot_10shot_Sam2L.yaml
CLASS_SPLIT="few_shot_classes"
RESULTS_DIR=work_dirs/few_shot_results
SHOTS=30
SEED=33
GPUS=4

mkdir -p $RESULTS_DIR
FILENAME=few_shot_${SHOTS}shot_seed${SEED}.pkl
```

#### 0. Create reference set

```bash
python no_time_to_train/dataset/few_shot_sampling.py \
    --n-shot $SHOTS \
    --out-path ${RESULTS_DIR}/${FILENAME} \
    --seed $SEED \
    --dataset $CLASS_SPLIT
```

#### 1. Fill memory with references

```bash
python run_lightening.py test --config $CONFIG \
    --model.test_mode fill_memory \
    --out_path ${RESULTS_DIR}/memory.ckpt \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
    --model.init_args.dataset_cfgs.fill_memory.memory_pkl ${RESULTS_DIR}/${FILENAME} \
    --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOTS \
    --model.init_args.dataset_cfgs.fill_memory.class_split $CLASS_SPLIT \
    --trainer.logger.save_dir ${RESULTS_DIR}/ \
    --trainer.devices $GPUS
```

#### 2. Post-process memory bank

```bash
python run_lightening.py test --config $CONFIG \
    --model.test_mode postprocess_memory \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
    --ckpt_path ${RESULTS_DIR}/memory.ckpt \
    --out_path ${RESULTS_DIR}/memory_postprocessed.ckpt \
    --trainer.devices 1
```

#### 3. Inference on target images

```bash
python run_lightening.py test --config $CONFIG \
    --ckpt_path ${RESULTS_DIR}/memory_postprocessed.ckpt \
    --model.init_args.test_mode test \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOTS \
    --model.init_args.model_cfg.dataset_name $CLASS_SPLIT \
    --model.init_args.dataset_cfgs.test.class_split $CLASS_SPLIT \
    --trainer.logger.save_dir ${RESULTS_DIR}/ \
    --trainer.devices $GPUS
```

If you'd like to see inference results online (as they are computed), add the argument:

```bash
--model.init_args.model_cfg.test.online_vis True
```

To adjust the visualisation score threshold (`vis_thr`), add the argument (for example, to visualise all instances with a score higher than `0.4`):

```bash
--model.init_args.model_cfg.test.vis_thr 0.4
```

Images will now be saved in `results_analysis/few_shot_classes/`.
The image on the left shows the ground truth; the image on the right shows the segmented instances found by our training-free method. Note that in this example we are using the `few_shot_classes` split, so we should only expect to see segmented instances of the classes in this split (not all classes in COCO).

#### Results

After running all images in the validation set, you should obtain:

```
BBOX RESULTS:
 Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.368
SEGM RESULTS:
 Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.342
```

---

## 🔍 Custom dataset

We provide instructions for running our pipeline on a custom dataset. Annotations must always be in COCO format.

> **TLDR;** To directly see how to run the full pipeline on *custom datasets*, see `scripts/matching_cdfsod_pipeline.sh` together with the example scripts for the CD-FSOD datasets (e.g. `scripts/dior_fish.sh`).

### 0. Prepare a custom dataset ⛵🐦

Let's imagine we want to detect **boats**⛵ and **birds**🐦 in a custom dataset. To use our method we will need:

- At least 1 *annotated* reference image for each class (i.e. 1 reference image for boat and 1 reference image for bird)
- Multiple target images in which to find instances of our desired classes.

We have prepared a toy script that creates a custom dataset from COCO images, for a **1-shot** setting:

```bash
mkdir -p data/my_custom_dataset
python scripts/make_custom_dataset.py
```

This will create a custom dataset with the following folder structure:

```
data/my_custom_dataset/
├── annotations/
│   ├── custom_references.json
│   ├── custom_targets.json
│   └── references_visualisations/
│       ├── bird_1.jpg
│       └── boat_1.jpg
└── images/
    ├── 429819.jpg
    ├── 101435.jpg
    └── (all target and reference images)
```

**Reference images visualisation (1-shot):**

| 1-shot Reference Image for BIRD 🐦 | 1-shot Reference Image for BOAT ⛵ |
|:---------------------------------:|:----------------------------------:|
| bird_1 | boat_1 |

### 0.1 If only bbox annotations are available

We also provide a script that generates instance-level segmentation masks using SAM. This is useful if you only have bounding box annotations available for the reference images.

```bash
# Download the sam_h checkpoint. Feel free to use more recent checkpoints (note: the code might need to be adapted)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -O checkpoints/sam_vit_h_4b8939.pth

# Run automatic instance segmentation from ground-truth bounding boxes
python no_time_to_train/dataset/sam_bbox_to_segm_batch.py \
    --input_json data/my_custom_dataset/annotations/custom_references.json \
    --image_dir data/my_custom_dataset/images \
    --sam_checkpoint checkpoints/sam_vit_h_4b8939.pth \
    --model_type vit_h \
    --device cuda \
    --batch_size 8 \
    --visualize
```
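Before moving on, it can help to verify that every reference annotation now carries a segmentation mask. A minimal check might look like the following; it assumes the converted annotations are stored as `custom_references_with_segm.json` (the file consumed in step 0.2 below), so adjust the path if your output name differs.

```python
# Hypothetical sanity check: count reference annotations still missing a segmentation
# after the SAM-based bbox-to-mask conversion step.
import json

path = "data/my_custom_dataset/annotations/custom_references_with_segm.json"
with open(path) as f:
    coco = json.load(f)

missing = [a["id"] for a in coco["annotations"] if not a.get("segmentation")]
print(f"{len(coco['annotations'])} annotations, {len(missing)} without a segmentation mask")
```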
**Reference images with instance-level segmentation masks (generated by SAM from gt bounding boxes, 1-shot):**

Visualisations of the generated segmentation masks are saved in `data/my_custom_dataset/annotations/custom_references_with_SAM_segm/references_visualisations/`.

| 1-shot Reference Image for BIRD 🐦 (automatically segmented with SAM) | 1-shot Reference Image for BOAT ⛵ (automatically segmented with SAM) |
|:---------------------------------:|:----------------------------------:|
| bird_1_with_SAM_segm | boat_1_with_SAM_segm |

### 0.2 Convert coco annotations to pickle file

```bash
python no_time_to_train/dataset/coco_to_pkl.py \
    data/my_custom_dataset/annotations/custom_references_with_segm.json \
    data/my_custom_dataset/annotations/custom_references_with_segm.pkl \
    1
```

### 1. Fill memory with references

First, define useful variables and create a folder for the results. For correct visualisation of labels, class names must be ordered by their category id as it appears in the json file. E.g. `bird` has category id `16` and `boat` has category id `9`; thus, `CAT_NAMES=boat,bird` (see the snippet below for a quick way to check this).
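As an optional helper (not part of the repository), the correct ordering can be read directly from the COCO-format annotation file produced above:

```python
# Print category names sorted by category id, in the comma-separated form used for CAT_NAMES.
import json

with open("data/my_custom_dataset/annotations/custom_references_with_segm.json") as f:
    categories = json.load(f)["categories"]

print(",".join(c["name"] for c in sorted(categories, key=lambda c: c["id"])))
# For the toy dataset above (boat id 9, bird id 16) this prints: boat,bird
```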
```bash
DATASET_NAME=my_custom_dataset
DATASET_PATH=data/my_custom_dataset
CAT_NAMES=boat,bird
CATEGORY_NUM=2
SHOT=1
YAML_PATH=no_time_to_train/pl_configs/matching_cdfsod_template.yaml
PATH_TO_SAVE_CKPTS=./tmp_ckpts/my_custom_dataset
mkdir -p $PATH_TO_SAVE_CKPTS
```

Run step 1:

```bash
python run_lightening.py test --config $YAML_PATH \
    --model.test_mode fill_memory \
    --out_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory.pth \
    --model.init_args.dataset_cfgs.fill_memory.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.fill_memory.json_file $DATASET_PATH/annotations/custom_references_with_segm.json \
    --model.init_args.dataset_cfgs.fill_memory.memory_pkl $DATASET_PATH/annotations/custom_references_with_segm.pkl \
    --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOT \
    --model.init_args.dataset_cfgs.fill_memory.cat_names $CAT_NAMES \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1
```

### 2. Post-process memory bank

```bash
python run_lightening.py test --config $YAML_PATH \
    --model.test_mode postprocess_memory \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory.pth \
    --out_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1
```

#### 2.1 Visualise post-processed memory bank

```bash
python run_lightening.py test --config $YAML_PATH \
    --model.test_mode vis_memory \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.dataset_cfgs.fill_memory.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.fill_memory.json_file $DATASET_PATH/annotations/custom_references_with_segm.json \
    --model.init_args.dataset_cfgs.fill_memory.memory_pkl $DATASET_PATH/annotations/custom_references_with_segm.pkl \
    --model.init_args.dataset_cfgs.fill_memory.memory_length $SHOT \
    --model.init_args.dataset_cfgs.fill_memory.cat_names $CAT_NAMES \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --trainer.devices 1
```

PCA and K-means visualisations of the memory bank images are stored in `results_analysis/memory_vis/my_custom_dataset`.

### 3. Inference on target images

If `ONLINE_VIS` is set to `True`, prediction results will be saved in `results_analysis/my_custom_dataset/` and displayed as they are computed. Note that running with online visualisation is much slower. Feel free to change the score threshold `VIS_THR` to see more or fewer segmented instances.

```bash
ONLINE_VIS=True
VIS_THR=0.4
python run_lightening.py test --config $YAML_PATH \
    --model.test_mode test \
    --ckpt_path $PATH_TO_SAVE_CKPTS/$DATASET_NAME\_$SHOT\_refs_memory_postprocessed.pth \
    --model.init_args.model_cfg.dataset_name $DATASET_NAME \
    --model.init_args.model_cfg.memory_bank_cfg.length $SHOT \
    --model.init_args.model_cfg.memory_bank_cfg.category_num $CATEGORY_NUM \
    --model.init_args.model_cfg.test.imgs_path $DATASET_PATH/images \
    --model.init_args.model_cfg.test.online_vis $ONLINE_VIS \
    --model.init_args.model_cfg.test.vis_thr $VIS_THR \
    --model.init_args.dataset_cfgs.test.root $DATASET_PATH/images \
    --model.init_args.dataset_cfgs.test.json_file $DATASET_PATH/annotations/custom_targets.json \
    --model.init_args.dataset_cfgs.test.cat_names $CAT_NAMES \
    --trainer.devices 1
```

### Results

Performance metrics (with exactly the same parameters as in the commands above) should be:

```
BBOX RESULTS:
 Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.478
SEGM RESULTS:
 Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.458
```

Visual results are saved in `results_analysis/my_custom_dataset/`. Note that our method also handles negative images, that is, images that do not contain any instances of the desired classes.

*Click images to enlarge ⬇️*

| Target image with boats ⛵ (left GT, right predictions) | Target image with birds 🐦 (left GT, right predictions) |
|:----------------------:|:----------------------:|
| ![000000459673](https://github.com/user-attachments/assets/678dc15a-dd3b-49d5-9287-6290da16aa6b) | ![000000407180](https://github.com/user-attachments/assets/fe306e48-af49-4d83-ac82-76fac6c456d1) |

| Target image with boats and birds ⛵🐦 (left GT, right predictions) | Target image without boats or birds 🚫 (left GT, right predictions) |
|:---------------------------------:|:----------------------------------:|
| ![000000517410](https://github.com/user-attachments/assets/9849b227-7f43-43d7-81ea-58010a623ad5) | ![000000460598](https://github.com/user-attachments/assets/7587700c-e09d-4cf6-8590-3df129c2568e) |

## 📚 Citation

If you use this work, please cite us:

```bibtex
@article{espinosa2025notimetotrain,
  title={No Time to Train! Training-Free Reference-Based Instance Segmentation},
  author={Miguel Espinosa and Chenhongyi Yang and Linus Ericsson and Steven McDonagh and Elliot J. Crowley},
  journal={arXiv preprint arXiv:2507.02798},
  year={2025},
  primaryclass={cs.CV}
}
```