---
name: gpu-quality-priority
description: "KINTSUGI processing principles: Never sacrifice quality for speed, always use GPU when available. Trigger: performance optimization, CPU/GPU choice, fast mode, quality vs speed."
author: KINTSUGI Team
date: 2025-12-14
---

# GPU-Only and Quality-First Processing Principles

## Experiment Overview
| Item | Details |
|------|---------|
| **Date** | 2025-12-14 |
| **Goal** | Establish processing principles for KINTSUGI batch processing |
| **Environment** | HiPerGator, multi-GPU (NVIDIA), CuPy, KINTSUGI pipeline |
| **Status** | Policy Established |

## Context

During performance optimization of Notebook 2 (Cycle Processing), a "fast mode" was proposed that would reduce BaSiC iteration parameters to speed up processing. The user explicitly rejected this approach, establishing core principles for KINTSUGI processing.

**Scientific imaging requires quality-first processing.** Unlike consumer applications where "good enough" may be acceptable, multiplex immunofluorescence analysis depends on accurate quantification. Quality degradation compounds through the pipeline: illumination correction errors affect stitching, which affects deconvolution, which affects segmentation, which affects all downstream analysis.

## Core Principles

### 1. NEVER Sacrifice Quality for Speed

Quality parameters must remain at their scientifically validated defaults. Change one only if the quality impact is **negligible**, and only after that has been verified by measurement, not assumed.

```python
# CORRECT: Quality parameters (do not reduce)
BASIC_IF_DARKFIELD = True
BASIC_MAX_ITERATIONS = 500
BASIC_OPTIMIZATION_TOLERANCE = 1e-6
BASIC_MAX_REWEIGHT_ITERATIONS = 25
BASIC_REWEIGHT_TOLERANCE = 1e-3
```
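One way to make the "do not reduce" rule enforceable rather than advisory is to freeze the validated values and reject any deviation before processing starts. The sketch below is illustrative only: `BasicQualityParams`, `VALIDATED`, and `assert_validated` are hypothetical names, not part of KINTSUGI; the values are copied from the constants above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BasicQualityParams:
    """Hypothetical container for the validated BaSiC settings listed above."""
    if_darkfield: bool = True
    max_iterations: int = 500
    optimization_tolerance: float = 1e-6
    max_reweight_iterations: int = 25
    reweight_tolerance: float = 1e-3


# Single source of truth; frozen, so attributes cannot be reassigned.
VALIDATED = BasicQualityParams()


def assert_validated(params: BasicQualityParams) -> None:
    """Raise ValueError if any setting differs from the validated defaults."""
    expected = asdict(VALIDATED)
    changed = {
        name: (value, expected[name])
        for name, value in asdict(params).items()
        if value != expected[name]
    }
    if changed:
        details = ", ".join(
            f"{name}={got!r} (validated: {want!r})"
            for name, (got, want) in changed.items()
        )
        raise ValueError(f"Quality parameters changed: {details}")
```

Calling `assert_validated(params)` at the top of a batch run would turn an accidental "fast mode" into an immediate, explicit failure.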

### 2. ALWAYS Use GPU When Available - No CPU Fallback

If a GPU is available, it must be used. CPU fallback options should be disabled or removed, so a missing GPU raises an error instead of silently running on the CPU.

```python
# CORRECT: GPU enforcement
if not USE_GPU:
    raise RuntimeError(
        "GPU not available but required for processing.\n"
        "Check GPU status with: from kintsugi.gpu import get_gpu_manager; "
        "print(get_gpu_manager().summary())"
    )
use_gpu = True  # Always True - GPU required
```
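The snippet above does not show how `USE_GPU` is determined. A minimal sketch, assuming CuPy is the detection mechanism (the environment lists CuPy, but the actual KINTSUGI logic may differ): `require_gpu` and `_cupy_device_count` are hypothetical helpers, and the device counter is injectable so the check can be exercised on a machine without a GPU.

```python
from typing import Callable, List


def _cupy_device_count() -> int:
    """Return the number of CUDA devices CuPy can see, or 0 if none."""
    try:
        import cupy  # imported lazily so the check itself never crashes
        return int(cupy.cuda.runtime.getDeviceCount())
    except Exception:
        # CuPy missing, no driver, or no visible device: all mean "no GPU".
        return 0


def require_gpu(count_devices: Callable[[], int] = _cupy_device_count) -> List[int]:
    """Return visible GPU device IDs, or raise immediately if there are none."""
    n = count_devices()
    if n < 1:
        raise RuntimeError("GPU not available but required for processing.")
    return list(range(n))
```

Under these assumptions, a module could set `GPU_DEVICE_IDS = require_gpu()` at import time, so the failure happens before any data is loaded.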

### 3. Remove CPU Options When GPU Exists

Don't provide `use_cpu` or `use_gpu=False` options. If the system has a GPU, use it.

```python
# WRONG: Providing CPU option
def process(use_gpu=True):  # Allows use_gpu=False
    ...

# CORRECT: GPU-only
def process(device_id=None):  # GPU assumed, only device selection
    if device_id is None:
        device_id = GPU_DEVICE_IDS[0]
    ...
```
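A rule like this tends to erode unless something checks it. As a sketch, a small test helper can inspect function signatures and flag any parameter that would reintroduce a CPU path. `FORBIDDEN_PARAMS` and `cpu_toggle_params` are hypothetical names; the deny-list is built from the parameter names mentioned in this document.

```python
import inspect

# Hypothetical deny-list of parameter names that would reintroduce a CPU path.
FORBIDDEN_PARAMS = {"use_gpu", "use_cpu", "use_gpu_basic"}


def cpu_toggle_params(func) -> list:
    """Return the sorted names of parameters on func that toggle CPU/GPU use."""
    names = inspect.signature(func).parameters
    return sorted(FORBIDDEN_PARAMS.intersection(names))


def process_with_toggle(use_gpu=True):  # the rejected style
    ...


def process_gpu_only(device_id=None):  # the accepted style
    ...
```

A unit test could then assert that `cpu_toggle_params(f)` is empty for every public processing function.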

## Failed Attempts (Critical)

| Attempt | Why it Failed | Lesson Learned |
|---------|---------------|----------------|
| Added `BASIC_FAST_MODE` with reduced iterations (200 max / 10 reweighting, versus the validated 500 / 25) | User rejected - quality is non-negotiable | Never propose quality/speed tradeoffs without explicit request |
| Added `use_gpu_basic=True/False` parameter | Creates temptation to use CPU | Remove CPU options entirely when GPU is available |
| Proposed "fast mode for testing" | Testing should use production parameters | Issues must surface under the same parameters used in production, or they will be missed |
| Suggested relaxed tolerances (1e-5, 1e-2) | Even "slightly" relaxed tolerances compound errors | Keep validated parameters exactly as specified |

## Acceptable Optimizations

These optimizations improve speed WITHOUT sacrificing quality:

| Optimization | Impact | Safe? |
|-------------|--------|-------|
| Parallel image loading (ThreadPoolExecutor) | 10-20x faster I/O | YES - same data, faster loading |
| Parallel image resizing | 10-20x faster preprocessing | YES - same resize algorithm |
| GPU-accelerated computation | 10-50x faster | YES - same algorithm, faster hardware |
| Multi-GPU parallelism | Linear scaling | YES - same computation, more hardware |
| Optimized DCT (dctn vs sequential dct) | 2-3x faster | YES - mathematically equivalent |
| Power iteration for SVD | 10x faster | YES - sufficient for top singular value |
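
To illustrate the last row: when only the largest singular value is needed, power iteration on `A.T @ A` avoids computing a full SVD. The following is a generic NumPy sketch, not KINTSUGI's implementation; `top_singular_value` and its defaults (`n_iter`, `tol`, `seed`) are assumptions chosen for the example.

```python
import numpy as np


def top_singular_value(a: np.ndarray, n_iter: int = 200,
                       tol: float = 1e-10, seed: int = 0) -> float:
    """Estimate the largest singular value of a 2-D array by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(a.shape[1])
    v /= np.linalg.norm(v)
    sigma = 0.0
    for _ in range(n_iter):
        w = a.T @ (a @ v)  # apply a.T @ a without forming it explicitly
        norm_w = np.linalg.norm(w)
        if norm_w == 0.0:
            return 0.0  # a maps v to zero; treat as a zero matrix
        v = w / norm_w
        new_sigma = float(np.linalg.norm(a @ v))
        converged = abs(new_sigma - sigma) <= tol * max(new_sigma, 1.0)
        sigma = new_sigma
        if converged:
            break
    return sigma
```

Convergence speed depends on the gap between the two largest singular values, so any such shortcut should be checked against `np.linalg.svd(a, compute_uv=False)[0]` on representative data before it is treated as equivalent.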

## Key Insights

- **Quality is non-negotiable** - Scientific imaging requires accurate quantification
- **Speed comes from better hardware, not shortcuts** - Invest in GPUs, not reduced iterations
- **Errors compound** - An error introduced at illumination correction grows at each later stage (for example, 5% can become 10% or more by segmentation)
- **"Fast mode for testing" is a trap** - Test with production parameters or you'll miss production issues
- **CPU fallback is never needed** - If no GPU is present, the user should find out immediately rather than get silent degradation

## Implementation Pattern

```python
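from typing import Optional

# USE_GPU, GPU_DEVICE_IDS and KCorrectGPU are assumed to be defined elsewhere in the pipeline.

# GPU enforcement at module level
if not USE_GPU:
    raise RuntimeError("GPU required for KINTSUGI processing")

# Function signatures - no CPU options
def process_zplane(
    images,  # other arguments omitted
    device_id: Optional[int] = None,  # Which GPU, not whether to use GPU
):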
    """GPU is REQUIRED - no CPU fallback."""
    if device_id is None:
        device_id = GPU_DEVICE_IDS[0]

    # Use validated quality parameters
    corrector = KCorrectGPU(use_gpu=True, device_id=device_id)
    flatfield, darkfield = corrector.fit(
        images,
        max_iterations=500,           # Quality parameter - DO NOT REDUCE
        max_reweight_iterations=25,   # Quality parameter - DO NOT REDUCE
        optimization_tolerance=1e-6,  # Quality parameter - DO NOT REDUCE
    )
```
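With `device_id` as the only hardware knob, multi-GPU parallelism reduces to deciding which plane goes to which device. A minimal sketch of round-robin dispatch, assuming a worker with the shape `worker(z, device_id)`: `assign_devices` and `run_all_planes` are hypothetical helpers, and the real pipeline may schedule work differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Sequence


def assign_devices(n_planes: int, device_ids: Sequence[int]) -> Dict[int, int]:
    """Map each z-plane index to a GPU device ID in round-robin order."""
    if not device_ids:
        raise RuntimeError("GPU required for KINTSUGI processing")
    return {z: device_ids[z % len(device_ids)] for z in range(n_planes)}


def run_all_planes(n_planes: int, device_ids: Sequence[int],
                   worker: Callable[[int, int], Any]) -> Dict[int, Any]:
    """Run worker(z, device_id) for every plane and collect results by plane."""
    plan = assign_devices(n_planes, device_ids)
    # At most len(device_ids) planes are in flight at any one time.
    with ThreadPoolExecutor(max_workers=len(device_ids)) as pool:
        futures = {z: pool.submit(worker, z, dev) for z, dev in plan.items()}
        return {z: f.result() for z, f in futures.items()}
```

Every plane still runs with the same validated parameters; only the placement changes, which is why this kind of speedup is considered safe.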

## References

- KINTSUGI Notebook 2: Cycle Processing
- BaSiC paper: Peng et al., Nature Communications 2017
- Skills Registry: `basic-caching-evaluation` (another quality-compromising approach that failed)