--- name: axiom-swift-performance description: Use when optimizing Swift code performance, reducing memory usage, improving runtime efficiency, dealing with COW, ARC overhead, generics specialization, or collection optimization license: MIT metadata: version: "1.2.0" --- # Swift Performance Optimization ## Purpose **Core Principle**: Optimize Swift code by understanding language-level performance characteristics—value semantics, ARC behavior, generic specialization, and memory layout—to write fast, efficient code without premature micro-optimization. **Swift Version**: Swift 6.2+ (for InlineArray, Span, `@concurrent`) **Xcode**: 16+ **Platforms**: iOS 18+, macOS 15+ **Related Skills**: - `axiom-performance-profiling` — Use Instruments to measure (do this first!) - `axiom-swiftui-performance` — SwiftUI-specific optimizations - `axiom-build-performance` — Compilation speed - `axiom-swift-concurrency` — Correctness-focused concurrency patterns ## When to Use This Skill ### ✅ Use this skill when - App profiling shows Swift code as the bottleneck (Time Profiler hotspots) - Excessive memory allocations or retain/release traffic - Implementing performance-critical algorithms or data structures - Writing framework or library code with performance requirements - Optimizing tight loops or frequently called methods - Dealing with large data structures or collections - Code review identifying performance anti-patterns ## Quick Decision Tree ``` Performance issue identified? │ ├─ Profiler shows excessive copying? │ └─ → Part 1: Noncopyable Types │ └─ → Part 2: Copy-on-Write │ ├─ Retain/release overhead in Time Profiler? │ └─ → Part 4: ARC Optimization │ ├─ Generic code in hot path? │ └─ → Part 5: Generics & Specialization │ ├─ Collection operations slow? │ └─ → Part 7: Collection Performance │ ├─ Async/await overhead visible? │ └─ → Part 8: Concurrency Performance │ ├─ Struct vs class decision? │ └─ → Part 3: Value vs Reference │ └─ Memory layout concerns? └─ → Part 9: Memory Layout ``` --- ## The Four Principles of Swift Performance From WWDC 2024-10217: Swift's low-level performance characteristics come down to four areas. Each maps to a Part in this skill. | Principle | What It Costs | Skill Coverage | |-----------|--------------|----------------| | **Function Calls** | Dispatch overhead, optimization barriers | Part 5 (Generics), Part 6 (Inlining) | | **Memory Allocation** | Stack vs heap, allocation frequency | Part 3 (Value vs Reference), Part 7 (Collections) | | **Memory Layout** | Cache locality, padding, contiguity | Part 9 (Memory Layout), Part 11 (Span) | | **Value Copying** | COW triggers, defensive copies, ARC traffic | Part 1 (Noncopyable), Part 2 (COW), Part 4 (ARC) | Understanding which principle is causing your bottleneck determines which Part to use. --- ## Part 1: Noncopyable Types (~Copyable) **Swift 6.0+** introduces noncopyable types for performance-critical scenarios where you want to avoid implicit copies. ### When to Use - Large types that should never be copied (file handles, GPU buffers) - Types with ownership semantics (must be explicitly consumed) - Performance-critical code where copies are expensive ### Basic Pattern ```swift // Noncopyable type struct FileHandle: ~Copyable { private let fd: Int32 init(path: String) throws { self.fd = open(path, O_RDONLY) guard fd != -1 else { throw FileError.openFailed } } deinit { close(fd) } // Must explicitly consume consuming func close() { _ = consume self } } // Usage func processFile() throws { let handle = try FileHandle(path: "/data.txt") // handle is automatically consumed at end of scope // Cannot accidentally copy handle } ``` ### Ownership Annotations ```swift // consuming - takes ownership, caller cannot use after func process(consuming data: [UInt8]) { // data is consumed } // borrowing - temporary access without ownership func validate(borrowing data: [UInt8]) -> Bool { // data can still be used by caller return data.count > 0 } // inout - mutable access func modify(inout data: [UInt8]) { data.append(0) } ``` ### Performance Impact - **Eliminates implicit copies**: Compiler error instead of runtime copy - **Zero-cost abstraction**: Same performance as manual memory management - **Use when**: Type is expensive to copy (>64 bytes) and copies are rare --- ## Part 2: Copy-on-Write (COW) Swift collections use COW for efficient memory sharing. Understanding when copies happen is critical for performance. ### How COW Works ```swift var array1 = [1, 2, 3] // Single allocation var array2 = array1 // Share storage (no copy) array2.append(4) // Now copies (array1 modified array2) ``` For custom COW implementation, see Copy-Paste Pattern 1 (COW Wrapper) below. ### Performance Tips ```swift // ❌ Accidental copy in loop for i in 0.. 64 bytes or contains large data | | **Identity** | No identity needed | Needs identity (===) | | **Inheritance** | Not needed | Inheritance required | | **Mutation** | Infrequent | Frequent in-place updates | | **Sharing** | No sharing needed | Must be shared across scope | ### Small Structs (Fast) ```swift // ✅ Fast - fits in registers, no heap allocation struct Point { var x: Double // 8 bytes var y: Double // 8 bytes } // Total: 16 bytes - excellent for struct struct Color { var r, g, b, a: UInt8 // 4 bytes total - perfect for struct } ``` ### Large Structs (Slow) ```swift // ❌ Slow - excessive copying struct HugeData { var buffer: [UInt8] // 1MB var metadata: String } func process(_ data: HugeData) { // Copies 1MB! // ... } // ✅ Use reference semantics for large data final class HugeData { var buffer: [UInt8] var metadata: String } func process(_ data: HugeData) { // Only copies pointer (8 bytes) // ... } ``` ### Indirect Storage for Flexibility For large data that needs value semantics externally with reference storage internally, use the COW Wrapper pattern — see Copy-Paste Pattern 1 below. --- ## Part 4: ARC Optimization Automatic Reference Counting adds overhead. Minimize it where possible. ### Weak vs Unowned Performance ```swift class Parent { var child: Child? } class Child { // ❌ Weak adds overhead (optional, thread-safe zeroing) weak var parent: Parent? } // ✅ Unowned when you know lifetime guarantees class Child { unowned let parent: Parent // No overhead, crashes if parent deallocated } ``` **Performance**: `unowned` is ~2x faster than `weak` (no atomic operations). **Use when**: Child lifetime < Parent lifetime (guaranteed). ### Closure Capture Optimization ```swift class DataProcessor { var data: [Int] // ❌ Captures self strongly, then uses weak - unnecessary weak overhead func process(completion: @escaping () -> Void) { DispatchQueue.global().async { [weak self] in guard let self else { return } self.data.forEach { print($0) } completion() } } // ✅ Capture only what you need func process(completion: @escaping () -> Void) { let data = self.data // Copy value type DispatchQueue.global().async { data.forEach { print($0) } // No self captured completion() } } } ``` ### Closure Capture Costs From WWDC 2024-10217: Closures have different performance profiles depending on whether they escape. ```swift // Non-escaping closure — stack-allocated context, zero ARC overhead func processItems(_ items: [Item], using transform: (Item) -> Result) -> [Result] { items.map(transform) // Closure context lives on stack } // Escaping closure — heap-allocated context, ARC on every captured reference func processItemsLater(_ items: [Item], transform: @escaping (Item) -> Result) { // Closure context heap-allocated as anonymous class instance // Each captured reference gets retain/release self.pending = { items.map(transform) } } ``` **Why this matters**: `@Sendable` closures are always escaping, meaning every Task closure heap-allocates its capture context. **In hot paths**: Prefer non-escaping closures. If you see `swift_allocObject` in Time Profiler for closure contexts, look for escaping closures that could be non-escaping. ### Observable Object Lifetimes **From WWDC 2021-10216**: Object lifetimes end at **last use**, not at closing brace. ```swift // ❌ Relying on observed lifetime is fragile class Traveler { weak var account: Account? deinit { print("Deinitialized") // May run BEFORE expected with ARC optimizations! } } func test() { let traveler = Traveler() let account = Account(traveler: traveler) // traveler's last use is above - may deallocate here! account.printSummary() // weak reference may be nil! } // ✅ Explicitly extend lifetime when needed func test() { let traveler = Traveler() let account = Account(traveler: traveler) withExtendedLifetime(traveler) { account.printSummary() // traveler guaranteed to live } } ``` Object lifetimes can change between Xcode versions, Debug vs Release, and unrelated code changes. Enable "Optimize Object Lifetimes" (Xcode 13+) during development to expose hidden lifetime bugs early. --- ## Part 5: Generics & Specialization Generic code can be fast or slow depending on specialization. ### Specialization Basics ```swift // Generic function func process(_ value: T) { print(value) } // Calling with concrete type process(42) // Compiler specializes: process_Int(42) process("hello") // Compiler specializes: process_String("hello") ``` ### Existential Overhead ```swift protocol Drawable { func draw() } // ❌ Existential container - expensive (heap allocation, indirection) func drawAll(shapes: [any Drawable]) { for shape in shapes { shape.draw() // Dynamic dispatch through witness table } } // ✅ Generic with constraint - can specialize func drawAll(shapes: [T]) { for shape in shapes { shape.draw() // Static dispatch after specialization } } ``` **Performance**: Generic version ~10x faster (eliminates witness table overhead). ### Existential Container Overhead **From WWDC 2016-416**: `any Protocol` uses a 40-byte existential container (5 words on 64-bit). The container stores type metadata + protocol witness table (16 bytes) plus a 24-byte inline value buffer. Types ≤24 bytes are stored directly in the buffer (fast, ~5ns access); larger types require a heap allocation with pointer indirection (slower, ~15ns). `some Protocol` eliminates all container overhead (~2ns). **When `some` isn't available** (heterogeneous collections require `any`): - **Reduce type sizes to ≤24 bytes** — keep protocol-conforming types small enough for inline storage (3 words: e.g., `Point { x, y, z: Double }` fits exactly) - **Use enum dispatch instead** — eliminates containers entirely, trades open extensibility for performance: ```swift // ❌ Existential: 40 bytes/element, witness table dispatch let shapes: [any Drawable] = [circle, rect] // ✅ Enum: value-sized, static dispatch via switch enum Shape { case circle(Circle), rect(Rect) } func draw(_ shape: Shape) { switch shape { case .circle(let c): c.draw() case .rect(let r): r.draw() } } ``` - **Batch operations** — amortize per-element existential overhead by processing in chunks rather than one-at-a-time - **Measure first** — existential overhead (~10ns/access) only matters in tight loops; for UI-level code it's negligible ### `@_specialize` Attribute Force specialization for common types when the compiler doesn't do it automatically: ```swift @_specialize(where T == Int) @_specialize(where T == String) func process(_ value: T) -> T { value } // Generates specialized versions + generic fallback ``` --- ## Part 6: Inlining Inlining eliminates function call overhead but increases code size. ### When to Inline ```swift // ✅ Small, frequently called functions @inlinable public func fastAdd(_ a: Int, _ b: Int) -> Int { return a + b } // ❌ Large functions - code bloat @inlinable // Don't do this! public func complexAlgorithm() { // 100 lines of code... } ``` ### Cross-Module Optimization ```swift // Framework code public struct Point { public var x: Double public var y: Double // ✅ Inlinable for cross-module optimization @inlinable public func distance(to other: Point) -> Double { let dx = x - other.x let dy = y - other.y return sqrt(dx*dx + dy*dy) } } // Client code let p1 = Point(x: 0, y: 0) let p2 = Point(x: 3, y: 4) let d = p1.distance(to: p2) // Inlined across module boundary ``` ### `@usableFromInline` ```swift // Internal helper that can be inlined @usableFromInline internal func helperFunction() { } // Public API that uses it @inlinable public func publicAPI() { helperFunction() // Can inline internal function } ``` **Trade-off**: `@inlinable` exposes implementation, prevents future optimization. --- ## Part 7: Collection Performance Choosing the right collection and using it correctly matters. ### Array vs ContiguousArray ```swift // ❌ Array - may use NSArray bridging (Swift/ObjC interop) let array: Array = [1, 2, 3] // ✅ ContiguousArray - guaranteed contiguous memory (no bridging) let array: ContiguousArray = [1, 2, 3] ``` **Use `ContiguousArray` when**: No ObjC bridging needed (pure Swift), ~15% faster. ### Reserve Capacity ```swift // ❌ Multiple reallocations var array: [Int] = [] for i in 0..<10000 { array.append(i) // Reallocates ~14 times } // ✅ Single allocation var array: [Int] = [] array.reserveCapacity(10000) for i in 0..<10000 { array.append(i) // No reallocations } ``` ### Dictionary Hashing ```swift struct BadKey: Hashable { var data: [Int] // ❌ Expensive hash (iterates entire array) func hash(into hasher: inout Hasher) { for element in data { hasher.combine(element) } } } struct GoodKey: Hashable { var id: UUID // Fast hash var data: [Int] // Not hashed // ✅ Hash only the unique identifier func hash(into hasher: inout Hasher) { hasher.combine(id) } } ``` ### InlineArray (Swift 6.2) Fixed-size arrays stored directly on the stack—no heap allocation, no COW overhead. Uses value generics to encode size in the type. ```swift // Traditional Array - heap allocated, COW overhead var sprites: [Sprite] = Array(repeating: .default, count: 40) // InlineArray - stack allocated, no COW (value generic syntax) var sprites = InlineArray<40, Sprite>(repeating: .default) ``` **Conformances**: `RandomAccessCollection`, `MutableCollection`, `BitwiseCopyable`, `Sendable`. Supports `~Copyable` element types. **When to Use InlineArray**: - Fixed size known at compile time - Performance-critical paths (tight loops, hot paths) - Want to avoid heap allocation entirely - Small to medium sizes (practical limit ~1KB stack usage) InlineArray is stack-allocated (no heap), eagerly copied (not COW), and provides `.span`/`.mutableSpan` for zero-copy access. Measure your own benchmarks for allocation/copy/mutation trade-offs vs Array. **Copy Semantics Warning**: ```swift // ❌ Unexpected: InlineArray copies eagerly func processLarge(_ data: InlineArray<1000, UInt8>) { // Copies all 1000 bytes on call! } // ✅ Use Span to avoid copy func processLarge(_ data: Span) { // Zero-copy view, no matter the size } // Best practice: Store InlineArray, pass Span struct Buffer { var storage = InlineArray<1000, UInt8>(repeating: 0) func process() { helper(storage.span) // Pass view, not copy } } ``` **When NOT to Use InlineArray**: - Dynamic sizes (use Array) - Large data (>1KB stack usage risky) - Frequently passed by value (use Span instead) - Need COW semantics (use Array) ### Lazy Sequences ```swift // ❌ Eager evaluation - processes entire array let result = array .map { expensive($0) } .filter { $0 > 0 } .first // Only need first element! // ✅ Lazy evaluation - stops at first match let result = array .lazy .map { expensive($0) } .filter { $0 > 0 } .first // Only evaluates until first match ``` --- ## Part 8: Concurrency Performance Async/await and actors add overhead. Use appropriately. ### Actor Isolation Overhead ```swift actor Counter { private var value = 0 // ❌ Actor call overhead for simple operation func increment() { value += 1 } } // Calling from different isolation domain for _ in 0..<10000 { await counter.increment() // 10,000 actor hops! } // ✅ Batch operations to reduce actor overhead actor Counter { private var value = 0 func incrementBatch(_ count: Int) { value += count } } await counter.incrementBatch(10000) // Single actor hop ``` ### Async Overhead Each async suspension costs ~20-30μs. Keep synchronous operations synchronous—don't mark a function `async` if it doesn't need to await. ### Task Creation Cost ```swift // ❌ Creating task per item (~100μs overhead each) for item in items { Task { await process(item) } } // ✅ Single task for batch Task { for item in items { await process(item) } } // ✅ Or use TaskGroup for parallelism await withTaskGroup(of: Void.self) { group in for item in items { group.addTask { await process(item) } } } ``` ### `@concurrent` Attribute (Swift 6.2) ```swift // Force background execution @concurrent func expensiveComputation() -> Int { // Always runs on background thread, even if called from MainActor return complexCalculation() } // Safe to call from main actor without blocking @MainActor func updateUI() async { let result = await expensiveComputation() // Guaranteed off main thread label.text = "\(result)" } ``` For `nonisolated` performance patterns and detailed actor isolation guidance, see `axiom-swift-concurrency`. --- ## Part 9: Memory Layout Understanding memory layout helps optimize cache performance and reduce allocations. ### Struct Padding ```swift // ❌ Poor layout (24 bytes due to padding) struct BadLayout { var a: Bool // 1 byte + 7 padding var b: Int64 // 8 bytes var c: Bool // 1 byte + 7 padding } print(MemoryLayout.size) // 24 bytes // ✅ Optimized layout (16 bytes) struct GoodLayout { var b: Int64 // 8 bytes var a: Bool // 1 byte var c: Bool // 1 byte + 6 padding } print(MemoryLayout.size) // 16 bytes ``` ### Alignment ```swift // Query alignment print(MemoryLayout.alignment) // 8 print(MemoryLayout.alignment) // 4 // Structs align to largest member struct Mixed { var int32: Int32 // 4 bytes, 4-byte aligned var double: Double // 8 bytes, 8-byte aligned } print(MemoryLayout.alignment) // 8 (largest member) ``` ### Cache-Friendly Data Structures ```swift // ❌ Poor cache locality struct PointerBased { var next: UnsafeMutablePointer? // Pointer chasing } // ✅ Array-based for cache locality struct ArrayBased { var data: ContiguousArray // Contiguous memory } // Array iteration ~10x faster due to cache prefetching ``` ### Exclusivity Checks From WWDC 2025-312: Runtime exclusivity enforcement (`swift_beginAccess`/`swift_endAccess`) appears in Time Profiler when the compiler cannot prove memory safety statically. **What they are**: Swift enforces that no two accesses to the same variable overlap if one is a write. For struct properties, this is checked at compile time. For class stored properties, runtime checks are inserted. **How to identify**: Look for `swift_beginAccess` and `swift_endAccess` in Time Profiler or Processor Trace flame graphs. ```swift // ❌ Class properties require runtime exclusivity checks class Parser { var state: ParserState var cache: [Int: Pixel] func parse() { state.advance() // swift_beginAccess / swift_endAccess cache[key] = pixel // swift_beginAccess / swift_endAccess } } // ✅ Struct properties checked at compile time — zero runtime cost struct Parser { var state: ParserState var cache: InlineArray<64, Pixel> mutating func parse() { state.advance() // No runtime check cache[key] = pixel // No runtime check } } ``` **Real-world impact**: In WWDC 2025-312's QOI image parser, moving properties from a class to a struct eliminated all runtime exclusivity checks, contributing to a measurable speedup as part of a >700x total improvement. --- ## Part 10: Typed Throws (Swift 6) Typed throws can be faster than untyped by avoiding existential overhead. ### Untyped vs Typed ```swift // Untyped - existential container for error func fetchData() throws -> Data { // Can throw any Error throw NetworkError.timeout } // Typed - concrete error type func fetchData() throws(NetworkError) -> Data { // Can only throw NetworkError throw NetworkError.timeout } ``` ### Performance Impact ```swift // Measure with tight loop func untypedThrows() throws -> Int { throw GenericError.failed } func typedThrows() throws(GenericError) -> Int { throw GenericError.failed } // Benchmark: typed ~5-10% faster (no existential overhead) ``` ### When to Use - **Typed**: Library code with well-defined error types, hot paths - **Untyped**: Application code, error types unknown at compile time --- ## Part 11: Span Types **Swift 6.2+** introduces Span—a non-escapable, non-owning view into memory that provides safe, efficient access to contiguous data. ### What is Span? Span is a modern replacement for `UnsafeBufferPointer` that provides: - **Spatial safety**: Bounds-checked operations prevent out-of-bounds access - **Temporal safety**: Lifetime inherited from source, preventing use-after-free - **Zero overhead**: No heap allocation, no reference counting - **Non-escapable**: Cannot outlive the data it references ```swift // Traditional unsafe approach func processUnsafe(_ data: UnsafeMutableBufferPointer) { data[100] = 0 // Crashes if out of bounds! } // Safe Span approach func processSafe(_ data: MutableSpan) { data[100] = 0 // Traps with clear error if out of bounds } ``` ### When to Use Span vs Array vs UnsafeBufferPointer | Use Case | Recommendation | |----------|---------------| | **Own the data** | Array (full ownership, COW) | | **Temporary view for reading** | Span (safe, fast) | | **Temporary view for writing** | MutableSpan (safe, fast) | | **C interop, performance-critical** | RawSpan (untyped bytes) | | **Unsafe performance** | UnsafeBufferPointer (legacy, avoid) | ### Basic Span Usage ```swift let array = [1, 2, 3, 4, 5] let span = array.span // Read-only view print(span[0]) // Subscript access for element in span { } // Safe iteration let slice = span[1..<3] // Span slice, no copy ``` ### MutableSpan for Modifications ```swift var array = [10, 20, 30, 40, 50] let mutableSpan = array.mutableSpan mutableSpan[0] = 100 // Modifies array in-place, bounds-checked ``` ### RawSpan for Untyped Bytes ```swift func parsePacket(_ data: RawSpan) -> PacketHeader? { guard data.count >= MemoryLayout.size else { return nil } // Safe byte-level access via subscript return PacketHeader(version: data[0], flags: data[1], length: UInt16(data[3]) << 8 | UInt16(data[2])) } let header = parsePacket(bytes.rawSpan) // .rawSpan on any [UInt8] ``` All Swift 6.2 collections provide `.span` and `.mutableSpan` properties, including `Array`, `ContiguousArray`, and `UnsafeBufferPointer` (migration path). Span access speed matches `UnsafeBufferPointer` (~2ns) with bounds checking. ### Non-Escapable Lifetime Safety Span's lifetime is bound to its source. The compiler prevents returning a Span from a function where the source would be deallocated — unlike `UnsafeBufferPointer`, which allows this bug silently. ```swift func dangerousSpan() -> Span { let array = [1, 2, 3] return array.span // ❌ Error: Cannot return non-escapable value } ``` InlineArray also provides `.span`/`.mutableSpan` — see Part 7 for InlineArray usage and copy-avoidance via Span. ### Migration from UnsafeBufferPointer ```swift // ❌ Old: unsafe, no bounds checking func parseLegacy(_ buffer: UnsafeBufferPointer) -> Header { Header(magic: buffer[0], version: buffer[1]) // Silent OOB crash } // ✅ New: safe, bounds-checked, same performance func parseModern(_ span: Span) -> Header { Header(magic: span[0], version: span[1]) // Traps on OOB } // Bridge: existing UnsafeBufferPointer → Span let span = buffer.span // Wrap unsafe in safe span parseModern(span) ``` ### OutputSpan — Safe Initialization OutputSpan/OutputRawSpan replace `UnsafeMutableBufferPointer` for initializing new collections without intermediate allocations. ```swift // Binary serialization: write header bytes safely @lifetime(&output) func writeHeader(to output: inout OutputRawSpan) { output.append(0x01) // version output.append(0x00) // flags output.append(UInt16(42)) // length (type-safe) } ``` Use for building byte arrays, binary serialization, image pixel data. Apple's open-source [Swift Binary Parsing](https://github.com/apple/swift-binary-parsing) library is built entirely on Span types. ### When NOT to Use Span - **Ownership**: Span can't be stored in structs/classes — use Array for owned data, provide `.span` access via computed property - **Return values**: Span is non-escapable — process in scope, return owned data - **Long-lived references**: Span lifetime is bound to source — use Array if data must outlive the current scope --- ## Copy-Paste Patterns ### Pattern 1: COW Wrapper ```swift final class Storage { var value: T init(_ value: T) { self.value = value } } struct COWWrapper { private var storage: Storage init(_ value: T) { storage = Storage(value) } var value: T { get { storage.value } set { if !isKnownUniquelyReferenced(&storage) { storage = Storage(newValue) } else { storage.value = newValue } } } } ``` ### Pattern 2: Performance-Critical Loop ```swift func processLargeArray(_ input: [Int]) -> [Int] { var result = ContiguousArray() result.reserveCapacity(input.count) for element in input { result.append(transform(element)) } return Array(result) } ``` ### Pattern 3: Inline Cache Lookup ```swift private var cache: [Key: Value] = [:] @inlinable func getCached(_ key: Key) -> Value? { return cache[key] // Inlined across modules } ``` --- ## Anti-Patterns | Anti-Pattern | Problem | Fix | |---|---|---| | **Premature optimization** | Complex COW/ContiguousArray with no profiling data | Start simple, profile, optimize what matters | | **Weak everywhere** | `weak` on every delegate (atomic overhead) | Use `unowned` when lifetime is guaranteed (see Part 4) | | **Actor for everything** | Actor isolation on simple counters (~100μs/call) | Use lock-free atomics (`ManagedAtomic`) for simple sync data | --- ## Code Review Checklist ### Memory Management - [ ] Large structs (>64 bytes) use indirect storage or are classes - [ ] COW types use `isKnownUniquelyReferenced` before mutation - [ ] Collections use `reserveCapacity` when size is known - [ ] Weak references only where needed (prefer unowned when safe) ### Generics - [ ] Protocol types use `some` instead of `any` where possible - [ ] Hot paths use concrete types or `@_specialize` - [ ] Generic constraints are as specific as possible ### Collections - [ ] Pure Swift code uses `ContiguousArray` over `Array` - [ ] Dictionary keys have efficient `hash(into:)` implementations - [ ] Lazy evaluation used for short-circuit operations ### Concurrency - [ ] Synchronous operations don't use `async` - [ ] Actor calls are batched when possible - [ ] Task creation is minimized (use TaskGroup) - [ ] CPU-intensive work uses `@concurrent` (Swift 6.2) ### Optimization - [ ] Profiling data exists before optimization - [ ] Inlining only for small, frequently called functions - [ ] Memory layout optimized for cache locality (large structs) --- ## Pressure Scenarios ### Scenario 1: "Just make it faster, we ship tomorrow" **The Pressure**: Manager sees "slow" in profiler, demands immediate action. **Red Flags**: - No baseline measurements - No Time Profiler data showing hotspots - "Make everything faster" without targets **Time Cost Comparison**: - Premature optimization: 2 days of work, no measurable improvement - Profile-guided optimization: 2 hours profiling + 4 hours fixing actual bottleneck = 40% faster **How to Push Back Professionally**: ``` "I want to optimize effectively. Let me spend 30 minutes with Instruments to find the actual bottleneck. This prevents wasting time on code that's not the problem. I've seen this save days of work." ``` ### Scenario 2: "Use actors everywhere for thread safety" **The Pressure**: Team adopts Swift 6, decides "everything should be an actor." **Red Flags**: - Actor for simple value types - Actor for synchronous-only operations - Async overhead in tight loops **Time Cost Comparison**: - Actor everywhere: 100μs overhead per operation, janky UI - Appropriate isolation: 10μs overhead, smooth 60fps **How to Push Back Professionally**: ``` "Actors are great for isolation, but they add overhead. For this simple counter, lock-free atomics are 10x faster. Let's use actors where we need them—shared mutable state—and avoid them for pure value types." ``` ### Scenario 3: "Inline everything for speed" **The Pressure**: Someone reads that inlining is faster, marks everything `@inlinable`. **Red Flags**: - Large functions marked `@inlinable` - Internal implementation details exposed - Binary size increases 50% **Time Cost Comparison**: - Inline everything: Code bloat, slower app launch (3s → 5s) - Selective inlining: Fast launch, actual hotspots optimized **How to Push Back Professionally**: ``` "Inlining trades code size for speed. The compiler already inlines when beneficial. Manual @inlinable should be for small, frequently called functions. Let's profile and inline the 3 actual hotspots, not everything." ``` --- ## Real-World Examples ### Example 1: Image Processing Pipeline **Problem**: Processing 1000 images takes 30 seconds. **Investigation**: ```swift // Original code func processImages(_ images: [UIImage]) -> [ProcessedImage] { var results: [ProcessedImage] = [] for image in images { results.append(expensiveProcess(image)) // Reallocations! } return results } ``` **Solution**: ```swift func processImages(_ images: [UIImage]) -> [ProcessedImage] { var results = ContiguousArray() results.reserveCapacity(images.count) // Single allocation for image in images { results.append(expensiveProcess(image)) } return Array(results) } ``` **Result**: 30s → 8s (73% faster) by eliminating reallocations. ### Example 2: Generic Specialization **Problem**: Protocol-based rendering is slow. **Investigation**: ```swift // Original - existential overhead func render(shapes: [any Shape]) { for shape in shapes { shape.draw() // Dynamic dispatch } } ``` **Solution**: ```swift // Specialized generic func render(shapes: [S]) { for shape in shapes { shape.draw() // Static dispatch after specialization } } // Or use @_specialize @_specialize(where S == Circle) @_specialize(where S == Rectangle) func render(shapes: [S]) { } ``` **Result**: 100ms → 10ms (10x faster) by eliminating witness table overhead. --- ## Resources **WWDC**: 2025-312, 2024-10217, 2024-10170, 2021-10216, 2016-416 **Docs**: /swift/inlinearray, /swift/span, /swift/outputspan **Skills**: axiom-performance-profiling, axiom-swift-concurrency, axiom-swiftui-performance ---