--- name: natural-language description: "Tokenize, tag, and analyze natural language text using Apple's NaturalLanguage framework and translate between languages with the Translation framework. Use when adding language identification, sentiment analysis, named entity recognition, part-of-speech tagging, text embeddings, or in-app translation to iOS/macOS/visionOS apps." --- # NaturalLanguage + Translation Analyze natural language text for tokenization, part-of-speech tagging, named entity recognition, sentiment analysis, language identification, and word/sentence embeddings. Translate text between languages with the Translation framework. Targets Swift 6.2 / iOS 26+. > This skill covers two related frameworks: **NaturalLanguage** (`NLTokenizer`, `NLTagger`, `NLEmbedding`) for on-device text analysis, and **Translation** (`TranslationSession`, `LanguageAvailability`) for language translation. ## Contents - [Setup](#setup) - [Tokenization](#tokenization) - [Language Identification](#language-identification) - [Part-of-Speech Tagging](#part-of-speech-tagging) - [Named Entity Recognition](#named-entity-recognition) - [Sentiment Analysis](#sentiment-analysis) - [Text Embeddings](#text-embeddings) - [Translation](#translation) - [Common Mistakes](#common-mistakes) - [Review Checklist](#review-checklist) - [References](#references) ## Setup Import `NaturalLanguage` for text analysis and `Translation` for language translation. No special entitlements or capabilities are required for NaturalLanguage. Translation requires iOS 17.4+ / macOS 14.4+. ```swift import NaturalLanguage import Translation ``` NaturalLanguage classes (`NLTokenizer`, `NLTagger`) are **not thread-safe**. Use each instance from one thread or dispatch queue at a time. ## Tokenization Segment text into words, sentences, or paragraphs with `NLTokenizer`. ```swift import NaturalLanguage func tokenizeWords(in text: String) -> [String] { let tokenizer = NLTokenizer(unit: .word) tokenizer.string = text let range = text.startIndex.. NLLanguage? { NLLanguageRecognizer.dominantLanguage(for: text) } // Multiple hypotheses with confidence scores func languageHypotheses(for text: String, max: Int = 5) -> [NLLanguage: Double] { let recognizer = NLLanguageRecognizer() recognizer.processString(text) return recognizer.languageHypotheses(withMaximum: max) } ``` Constrain the recognizer to expected languages for better accuracy on short text. ```swift let recognizer = NLLanguageRecognizer() recognizer.languageConstraints = [.english, .french, .spanish] recognizer.processString(text) let detected = recognizer.dominantLanguage ``` ## Part-of-Speech Tagging Identify nouns, verbs, adjectives, and other lexical classes with `NLTagger`. ```swift func tagPartsOfSpeech(in text: String) -> [(String, NLTag)] { let tagger = NLTagger(tagSchemes: [.lexicalClass]) tagger.string = text var results: [(String, NLTag)] = [] let range = text.startIndex.. [(String, NLTag)] { let tagger = NLTagger(tagSchemes: [.nameType]) tagger.string = text var entities: [(String, NLTag)] = [] let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames] tagger.enumerateTags( in: text.startIndex.. Double? { let tagger = NLTagger(tagSchemes: [.sentimentScore]) tagger.string = text let (tag, _) = tagger.tag( at: text.startIndex, unit: .paragraph, scheme: .sentimentScore ) return tag.flatMap { Double($0.rawValue) } } ``` ## Text Embeddings Measure semantic similarity between words or sentences with `NLEmbedding`. ```swift func wordSimilarity(_ word1: String, _ word2: String) -> Double? { guard let embedding = NLEmbedding.wordEmbedding(for: .english) else { return nil } return embedding.distance(between: word1, and: word2, distanceType: .cosine) } func findSimilarWords(to word: String, count: Int = 5) -> [(String, Double)] { guard let embedding = NLEmbedding.wordEmbedding(for: .english) else { return [] } return embedding.neighbors(for: word, maximumCount: count, distanceType: .cosine) } ``` Sentence embeddings compare entire sentences. ```swift func sentenceSimilarity(_ s1: String, _ s2: String) -> Double? { guard let embedding = NLEmbedding.sentenceEmbedding(for: .english) else { return nil } return embedding.distance(between: s1, and: s2, distanceType: .cosine) } ``` ## Translation ### System Translation Overlay Show the built-in translation UI with `.translationPresentation()`. ```swift import SwiftUI import Translation struct TranslatableView: View { @State private var showTranslation = false let text = "Hello, how are you?" var body: some View { Text(text) .onTapGesture { showTranslation = true } .translationPresentation( isPresented: $showTranslation, text: text ) } } ``` ### Programmatic Translation Use `.translationTask()` for programmatic translations within a view context. ```swift struct TranslatingView: View { @State private var translatedText = "" @State private var configuration: TranslationSession.Configuration? var body: some View { VStack { Text(translatedText) Button("Translate") { configuration = .init(source: Locale.Language(identifier: "en"), target: Locale.Language(identifier: "es")) } } .translationTask(configuration) { session in let response = try await session.translate("Hello, world!") translatedText = response.targetText } } } ``` ### Batch Translation Translate multiple strings in a single session. ```swift .translationTask(configuration) { session in let requests = texts.enumerated().map { index, text in TranslationSession.Request(sourceText: text, clientIdentifier: "\(index)") } let responses = try await session.translations(from: requests) for response in responses { print("\(response.sourceText) -> \(response.targetText)") } } ``` ### Checking Language Availability ```swift let availability = LanguageAvailability() let status = await availability.status( from: Locale.Language(identifier: "en"), to: Locale.Language(identifier: "ja") ) switch status { case .installed: break // Ready to translate offline case .supported: break // Needs download case .unsupported: break // Language pair not available } ``` ## Common Mistakes ### DON'T: Share NLTagger/NLTokenizer across threads These classes are not thread-safe and will produce incorrect results or crash. ```swift // WRONG let sharedTagger = NLTagger(tagSchemes: [.lexicalClass]) DispatchQueue.concurrentPerform(iterations: 10) { _ in sharedTagger.string = someText // Data race } // CORRECT await withTaskGroup(of: Void.self) { group in for _ in 0..<10 { group.addTask { let tagger = NLTagger(tagSchemes: [.lexicalClass]) tagger.string = someText // process... } } } ``` ### DON'T: Confuse NaturalLanguage with Core ML NaturalLanguage provides built-in linguistic analysis. Use Core ML for custom trained models. They complement each other via `NLModel`. ```swift // WRONG: Trying to do NER with raw Core ML let coreMLModel = try MLModel(contentsOf: modelURL) // CORRECT: Use NLTagger for built-in NER let tagger = NLTagger(tagSchemes: [.nameType]) // Or load a custom Core ML model via NLModel let nlModel = try NLModel(mlModel: coreMLModel) tagger.setModels([nlModel], forTagScheme: .nameType) ``` ### DON'T: Assume embeddings exist for all languages Not all languages have word or sentence embeddings available on device. ```swift // WRONG: Force unwrap let embedding = NLEmbedding.wordEmbedding(for: .japanese)! // CORRECT: Handle nil guard let embedding = NLEmbedding.wordEmbedding(for: .japanese) else { // Embedding not available for this language return } ``` ### DON'T: Create a new tagger per token Creating and configuring a tagger is expensive. Reuse it for the same text. ```swift // WRONG: New tagger per word for word in words { let tagger = NLTagger(tagSchemes: [.lexicalClass]) tagger.string = word } // CORRECT: Set string once, enumerate let tagger = NLTagger(tagSchemes: [.lexicalClass]) tagger.string = fullText tagger.enumerateTags(in: fullText.startIndex..