---
name: coreml
description: "Integrate and optimize Core ML models in iOS apps for on-device machine learning inference. Covers model loading (.mlmodelc, .mlpackage), predictions with auto-generated classes and MLFeatureProvider, compute unit configuration (CPU, GPU, Neural Engine), MLTensor, VNCoreMLRequest, MLComputePlan, multi-model pipelines, and deployment strategies. Use when loading Core ML models, making predictions, configuring compute units, or profiling model performance."
---

# Core ML Swift Integration

Load, configure, and run Core ML models in iOS apps. This skill covers the Swift side: model loading, prediction, MLTensor, profiling, and deployment. Target iOS 26+ with Swift 6.3, backward-compatible to iOS 14 unless noted.

> **Scope boundary:** Python-side model conversion, optimization (quantization,
> palettization, pruning), and framework selection live in the `apple-on-device-ai`
> skill. This skill owns Swift integration only.

See [references/coreml-swift-integration.md](references/coreml-swift-integration.md) for complete code patterns including actor-based caching, batch inference, image preprocessing, and testing.

## Contents

- [Loading Models](#loading-models)
- [Model Configuration](#model-configuration)
- [Making Predictions](#making-predictions)
- [MLTensor (iOS 18+)](#mltensor-ios-18)
- [Working with MLMultiArray](#working-with-mlmultiarray)
- [Image Preprocessing](#image-preprocessing)
- [Multi-Model Pipelines](#multi-model-pipelines)
- [Vision Integration](#vision-integration)
- [Performance Profiling](#performance-profiling)
- [Model Deployment](#model-deployment)
- [Memory Management](#memory-management)
- [Common Mistakes](#common-mistakes)
- [Review Checklist](#review-checklist)
- [References](#references)

## Loading Models

### Auto-Generated Classes

When you drag a `.mlpackage` or `.mlmodelc` into Xcode, it generates a Swift class with typed input/output. Use this whenever possible.

```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .all
let model = try MyImageClassifier(configuration: config)
```

### Manual Loading

Load from a URL when the model is downloaded at runtime or stored outside the bundle.

```swift
let modelURL = Bundle.main.url(
    forResource: "MyModel",
    withExtension: "mlmodelc"
)!
let model = try MLModel(contentsOf: modelURL, configuration: config)
```

### Async Loading (iOS 16+)

Load models without blocking the main thread. Prefer this for large models.

```swift
let model = try await MLModel.load(
    contentsOf: modelURL,
    configuration: config
)
```

### Compile at Runtime

Compile a `.mlpackage` or `.mlmodel` to `.mlmodelc` on device. Useful for models downloaded from a server.

```swift
let compiledURL = try await MLModel.compileModel(at: packageURL)
let model = try MLModel(contentsOf: compiledURL, configuration: config)
```

Cache the compiled URL -- recompiling on every launch wastes time. Copy `compiledURL` to a persistent location (e.g., Application Support).

## Model Configuration

`MLModelConfiguration` controls compute units, GPU access, and model parameters.

### Compute Units Decision Table

| Value | Uses | When to Choose |
|---|---|---|
| `.all` | CPU + GPU + Neural Engine | Default. Let the system decide. |
| `.cpuOnly` | CPU | Background tasks, audio sessions, or when GPU is busy. |
| `.cpuAndGPU` | CPU + GPU | Need GPU but model has ops unsupported by ANE. |
| `.cpuAndNeuralEngine` | CPU + Neural Engine | Best energy efficiency for compatible models. |
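To see which of these devices the current hardware actually exposes before settling on a value, you can query `MLModel.availableComputeDevices` (iOS 17+). A minimal sketch:

```swift
import CoreML

// List the compute devices Core ML can use in this process (iOS 17+).
for device in MLModel.availableComputeDevices {
    switch device {
    case .cpu:
        print("CPU available")
    case .gpu:
        print("GPU available")
    case .neuralEngine(let ane):
        print("Neural Engine available (\(ane.totalCoreCount) cores)")
    @unknown default:
        break
    }
}
```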
Set the chosen value on the configuration:

```swift
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// Or, to allow low-priority background inference:
config.computeUnits = .cpuOnly
```

### Configuration Properties

```swift
let config = MLModelConfiguration()
config.computeUnits = .all
config.allowLowPrecisionAccumulationOnGPU = true // faster, slight precision loss
```

## Making Predictions

### With Auto-Generated Classes

The generated class provides typed input/output structs.

```swift
let model = try MyImageClassifier(configuration: config)
let input = MyImageClassifierInput(image: pixelBuffer)
let output = try model.prediction(input: input)

print(output.classLabel)      // "golden_retriever"
print(output.classLabelProbs) // ["golden_retriever": 0.95, ...]
```

### With MLDictionaryFeatureProvider

Use when inputs are dynamic or not known at compile time.

```swift
let inputFeatures = try MLDictionaryFeatureProvider(dictionary: [
    "image": MLFeatureValue(pixelBuffer: pixelBuffer),
    "confidence_threshold": MLFeatureValue(double: 0.5),
])
let output = try model.prediction(from: inputFeatures)
let label = output.featureValue(for: "classLabel")?.stringValue
```

### Async Prediction (iOS 17+)

```swift
let output = try await model.prediction(from: inputFeatures)
```

### Batch Prediction

Process multiple inputs in one call for better throughput.

```swift
let batchInputs = try MLArrayBatchProvider(array: inputs.map { input in
    try MLDictionaryFeatureProvider(dictionary: ["image": MLFeatureValue(pixelBuffer: input)])
})
let batchOutput = try model.predictions(from: batchInputs)

for i in 0..<batchOutput.count {
    let features = batchOutput.features(at: i)
    // Read the outputs for input i, e.g. a classification label.
    print(features.featureValue(for: "classLabel")?.stringValue ?? "unknown")
}
```

## Image Preprocessing

Core ML image models take `CVPixelBuffer` inputs. When you are not going through Vision, convert a `CGImage` to a pixel buffer manually:

```swift
func pixelBuffer(from cgImage: CGImage, width: Int, height: Int) -> CVPixelBuffer? {
    var pixelBuffer: CVPixelBuffer?
    let attrs: [CFString: Any] = [
        kCVPixelBufferCGImageCompatibilityKey: true,
        kCVPixelBufferCGBitmapContextCompatibilityKey: true,
    ]
    CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                        kCVPixelFormatType_32ARGB, attrs as CFDictionary, &pixelBuffer)
    guard let buffer = pixelBuffer else { return nil }

    CVPixelBufferLockBaseAddress(buffer, [])
    let context = CGContext(
        data: CVPixelBufferGetBaseAddress(buffer),
        width: width,
        height: height,
        bitsPerComponent: 8,
        bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
    )
    context?.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
    CVPixelBufferUnlockBaseAddress(buffer, [])
    return buffer
}
```

For additional preprocessing patterns (normalization, center-cropping), see [references/coreml-swift-integration.md](references/coreml-swift-integration.md).

## Multi-Model Pipelines

Chain models when preprocessing or postprocessing requires a separate model.

```swift
// Sequential inference: preprocessor -> main model -> postprocessor
let preprocessed = try preprocessor.prediction(from: rawInput)
let mainOutput = try mainModel.prediction(from: preprocessed)
let finalOutput = try postprocessor.prediction(from: mainOutput)
```

For Xcode-managed pipelines, use the pipeline model type in the `.mlpackage`. Each sub-model runs on its optimal compute unit.

## Vision Integration

Use Vision to run Core ML image models with automatic image preprocessing (resizing, normalization, color space, orientation).

### Modern: CoreMLRequest (iOS 18+)

```swift
import Vision
import CoreML

let model = try MLModel(contentsOf: modelURL, configuration: config)
let container = try CoreMLModelContainer(model: model)
let request = CoreMLRequest(model: container)

let results = try await request.perform(on: cgImage)
if let classification = results.first as? ClassificationObservation {
    print("\(classification.identifier): \(classification.confidence)")
}
```

### Legacy: VNCoreMLRequest

```swift
let vnModel = try VNCoreMLModel(for: model)
let request = VNCoreMLRequest(model: vnModel) { request, error in
    guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
    for observation in results {
        let label = observation.labels.first?.identifier ?? "unknown"
        let confidence = observation.labels.first?.confidence ?? 0
        let boundingBox = observation.boundingBox // normalized coordinates
        print("\(label): \(confidence) at \(boundingBox)")
    }
}
request.imageCropAndScaleOption = .scaleFill

let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer)
try handler.perform([request])
```
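When frames come straight from the camera, pass the image orientation to the request handler so Vision rotates the buffer before inference. A sketch; the orientation value below assumes portrait capture from the back camera and should be adjusted for your setup:

```swift
// Same request as above, with explicit orientation for camera frames.
// .right is typical for portrait capture from the back camera.
let handler = VNImageRequestHandler(
    cvPixelBuffer: pixelBuffer,
    orientation: .right,
    options: [:]
)
try handler.perform([request])
```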
> For complete Vision framework patterns (text recognition, barcode detection,
> document scanning), see the `vision-framework` skill.

## Performance Profiling

### MLComputePlan (iOS 17.4+)

Inspect which compute device each operation will use before running predictions.

```swift
let computePlan = try await MLComputePlan.load(
    contentsOf: modelURL,
    configuration: config
)
guard case let .program(program) = computePlan.modelStructure else { return }
guard let mainFunction = program.functions["main"] else { return }

for operation in mainFunction.block.operations {
    let deviceUsage = computePlan.deviceUsage(for: operation)
    let estimatedCost = computePlan.estimatedCost(of: operation)
    let device = deviceUsage.map { String(describing: $0.preferredComputeDevice) } ?? "unknown"
    print("\(operation.operatorName): device \(device), cost \(String(describing: estimatedCost))")
}
```

### Instruments

Use the **Core ML** instrument template in Instruments to profile:

- Model load time
- Prediction latency (per-operation breakdown)
- Compute device dispatch (CPU/GPU/ANE per operation)
- Memory allocation

Run outside the debugger for accurate results (Xcode: Product > Profile).

## Model Deployment

### Bundle vs On-Demand Resources

| Strategy | Pros | Cons |
|---|---|---|
| Bundle in app | Instant availability, works offline | Increases app download size |
| On-demand resources | Smaller initial download | Requires download before first use |
| Background Assets (iOS 16+) | Downloads ahead of time | More complex setup |
| CloudKit / server | Maximum flexibility | Requires network, longer setup |

### Size Considerations

- App Store limit: 4 GB for app bundle
- Cellular download limit: 200 MB (can request exception)
- Use ODR tags for models > 50 MB
- Pre-compile to `.mlmodelc` to skip on-device compilation

```swift
// On-demand resource loading
let request = NSBundleResourceRequest(tags: ["ml-model-v2"])
try await request.beginAccessingResources()

let modelURL = Bundle.main.url(forResource: "LargeModel", withExtension: "mlmodelc")!
let model = try await MLModel.load(contentsOf: modelURL, configuration: config)

// Call request.endAccessingResources() when done
```

## Memory Management

- **Unload on background:** Release model references when the app enters background to free GPU/ANE memory. Reload on foreground return.
- **Use `.cpuOnly` for background tasks:** Background processing cannot use GPU or ANE; setting `.cpuOnly` avoids silent fallback and resource contention.
- **Share model instances:** Never create multiple `MLModel` instances from the same compiled model. Use an actor to provide shared access (see the sketch after this list).
- **Monitor memory pressure:** Large models (>100 MB) can trigger memory warnings. Register for `UIApplication.didReceiveMemoryWarningNotification` and release cached models when under pressure.
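A minimal sketch of the shared-instance pattern (the type and method names here are illustrative, not taken from the reference guide):

```swift
import CoreML

// Illustrative actor that loads a model once and hands the same instance to all callers.
actor SharedModelProvider {
    private let modelURL: URL
    private var cachedModel: MLModel?

    init(modelURL: URL) {
        self.modelURL = modelURL
    }

    func model() async throws -> MLModel {
        if let cachedModel { return cachedModel }
        let config = MLModelConfiguration()
        config.computeUnits = .all
        let loaded = try await MLModel.load(contentsOf: modelURL, configuration: config)
        cachedModel = loaded
        return loaded
    }

    // Call on background transition or memory pressure to release GPU/ANE resources.
    func unload() {
        cachedModel = nil
    }
}
```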
See [references/coreml-swift-integration.md](references/coreml-swift-integration.md) for an actor-based model manager with lifecycle-aware loading and cache eviction.

## Common Mistakes

**DON'T:** Load models on the main thread.
**DO:** Use the async `MLModel.load(contentsOf:configuration:)` API or load on a background actor.
**Why:** Large models can take seconds to load, freezing the UI.

**DON'T:** Recompile `.mlpackage` to `.mlmodelc` on every app launch.
**DO:** Compile once with `MLModel.compileModel(at:)` and cache the compiled URL persistently.
**Why:** Compilation is expensive. Cache the `.mlmodelc` in Application Support.

**DON'T:** Hardcode `.cpuOnly` unless you have a specific reason.
**DO:** Use `.all` and let the system choose the optimal compute unit.
**Why:** `.all` enables the Neural Engine and GPU, which are faster and more energy-efficient.

**DON'T:** Ignore `MLFeatureValue` type mismatches between input and model expectations.
**DO:** Match types exactly -- use `MLFeatureValue(pixelBuffer:)` for images, not raw data.
**Why:** Type mismatches cause cryptic runtime crashes or silently incorrect results.

**DON'T:** Create a new `MLModel` instance for every prediction.
**DO:** Load once and reuse. Use an actor to manage the model lifecycle.
**Why:** Model loading allocates significant memory and compute resources.

**DON'T:** Skip error handling for model loading and prediction.
**DO:** Catch errors and provide fallback behavior when the model fails.
**Why:** Models can fail to load on older devices or when resources are constrained.

**DON'T:** Assume all operations run on the Neural Engine.
**DO:** Use `MLComputePlan` (iOS 17.4+) to verify device dispatch per operation.
**Why:** Unsupported operations fall back to CPU, which may bottleneck the pipeline.

**DON'T:** Process images manually before passing to Vision + Core ML.
**DO:** Use `CoreMLRequest` (iOS 18+) or `VNCoreMLRequest` (legacy) to let Vision handle preprocessing.
**Why:** Vision handles orientation, scaling, and pixel format conversion correctly.

## Review Checklist

- [ ] Model loaded asynchronously (not blocking main thread)
- [ ] `MLModelConfiguration.computeUnits` set appropriately for use case
- [ ] Model instance reused across predictions (not recreated each time)
- [ ] Auto-generated class used when available (typed inputs/outputs)
- [ ] Error handling for model loading and prediction failures
- [ ] Compiled model cached persistently if compiled at runtime
- [ ] Image inputs use Vision pipeline (`CoreMLRequest` iOS 18+ or `VNCoreMLRequest`) for correct preprocessing
- [ ] `MLComputePlan` checked to verify compute device dispatch (iOS 17.4+)
- [ ] Batch predictions used when processing multiple inputs
- [ ] Model size appropriate for deployment strategy (bundle vs ODR)
- [ ] Memory tested on target devices (especially older devices with less RAM)
- [ ] Predictions run outside debugger for accurate performance measurement

## References

- Patterns and code: [references/coreml-swift-integration.md](references/coreml-swift-integration.md)
- Model conversion and optimization (Python-side): covered in the `apple-on-device-ai` skill
- Apple docs: [Core ML](https://sosumi.ai/documentation/coreml) | [MLModel](https://sosumi.ai/documentation/coreml/mlmodel) | [MLComputePlan](https://sosumi.ai/documentation/coreml/mlcomputeplan-1w21n)