# Sanitization System

The Slothlet sanitization system transforms arbitrary strings (file names, path segments) into valid JavaScript identifiers suitable for dot-notation property access. It provides sophisticated rule-based transformation with pattern matching, case preservation, and intelligent segment handling.

## Table of Contents

- [Overview](#overview)
- [Basic Usage](#basic-usage)
- [Core Concepts](#core-concepts)
- [Options](#options)
- [Rules](#rules)
- [Pattern Matching](#pattern-matching)
- [Rule Precedence](#rule-precedence)
- [Examples](#examples)
- [Runtime Convenience Method](#runtime-convenience-method)
- [Testing](#testing)
- [Implementation Notes](#implementation-notes)
- [API Reference](#api-reference)

## Overview

The sanitization function converts filenames into camelCase identifiers while respecting various transformation rules and case preservation options:

```javascript
import { sanitizePropertyName } from "@cldmv/slothlet/helpers/sanitize";

sanitizePropertyName("auto-ip"); // "autoIp"
sanitizePropertyName("parse-JSON-data"); // "parseJSONData"
sanitizePropertyName("get-api-status"); // "getApiStatus"
```

## Basic Usage

```javascript
import { sanitizePropertyName } from "@cldmv/slothlet/helpers/sanitize";

// Simple camelCase conversion
sanitizePropertyName("my-module"); // "myModule"
sanitizePropertyName("get-user-data"); // "getUserData"

// With options
sanitizePropertyName("MyModule", { lowerFirst: false }); // "MyModule"
sanitizePropertyName("COMMON_APPS", { preserveAllUpper: true }); // "COMMON_APPS"

// With rules
sanitizePropertyName("auto-ip", {
	rules: { upper: ["*-ip"] }
}); // "autoIP"
```

## Core Concepts

### Segmentation Levels

The sanitization system uses **two-level segmentation**:

1. **Primary Segments**: Split by hyphens and other non-identifier characters (NOT underscores)
   - Example: `"get-api-status"` → `["get", "api", "status"]`
   - Primary segments are joined with camelCase

2. **Sub-Segments**: Within each primary segment, split by underscores
   - Example: `"Mixed_APPS"` → `["Mixed", "APPS"]`
   - Sub-segments are preserved with underscores
   - Individual sub-segments can have rules applied

### Transformation Order

1. **Pre-split pattern matching**: Check if patterns match the original string
2. **Segment-level rules**: Apply preservation and transformation rules to each segment
3. **CamelCase transformation**: Apply camelCase at the primary segment level
4. **Final cleanup**: Ensure valid JavaScript identifier

## Options

### `lowerFirst` (default: `true`)

Lowercase the first character of the first primary segment for camelCase convention.

```javascript
sanitizePropertyName("MyModule"); // "myModule"
sanitizePropertyName("MyModule", { lowerFirst: false }); // "MyModule"
sanitizePropertyName("parse-json"); // "parseJson"
```

### `preserveAllUpper` (default: `false`)

Automatically preserve sub-segments that are entirely uppercase.

```javascript
sanitizePropertyName("COMMON_APPS", { preserveAllUpper: true }); // "COMMON_APPS"
sanitizePropertyName("Mixed_APPS", { preserveAllUpper: true }); // "mixed_APPS"
sanitizePropertyName("get-API-status", { preserveAllUpper: true }); // "getAPIStatus"
```

**Note**: Only applies to sub-segments that are **entirely** uppercase. Mixed case like `"Mixed"` will still be transformed.

### `preserveAllLower` (default: `false`)

Automatically preserve sub-segments that are entirely lowercase.

```javascript
sanitizePropertyName("common_apps", { preserveAllLower: true }); // "common_apps"
sanitizePropertyName("Mixed_apps", { preserveAllLower: true }); // "mixed_apps"
```

> [!NOTE]
> `preserveAllLower` operates at the **sub-segment level** (split by underscores). Hyphens still cause primary-segment splitting, which are then joined without camelCase capitalization:
>
> ```javascript
> sanitizePropertyName("parse-xml-data", { preserveAllLower: true }); // "parsexmldata" (joined, no caps)
> sanitizePropertyName("parse-xml-data", {}); // "parseXmlData" (normal camelCase)
> sanitizePropertyName("common_apps", { preserveAllLower: true }); // "common_apps" (underscores preserved)
> ```

## Rules

Rules provide fine-grained control over segment transformation. All rules support glob patterns and are case-insensitive by default.

### `leave` (case-sensitive)

Preserve segments exactly as-is. Case-sensitive matching.

```javascript
sanitizePropertyName("autoIP", {
	rules: { leave: ["autoIP"] }
}); // "autoIP"

sanitizePropertyName("auto-ip", {
	rules: { leave: ["ip"] }
}); // "autoip" (preserves "ip" segment)

// Case mismatch - no preservation
sanitizePropertyName("auto-ip", {
	rules: { leave: ["IP"] }
}); // "autoIp"
```

> [!IMPORTANT]
> `leave` is **case-sensitive**. This was a bug in older versions where it behaved case-insensitively.

### `leaveInsensitive` (case-insensitive)

Preserve segments exactly as-is. Case-insensitive matching.

```javascript
sanitizePropertyName("autoIP", {
	rules: { leaveInsensitive: ["autoip"] }
}); // "autoIP"

sanitizePropertyName("AutoIP", {
	rules: { leaveInsensitive: ["autoip"] }
}); // "AutoIP"
```

### `upper`

Force segments to UPPERCASE. Supports exact matches, glob patterns, and boundary patterns.

```javascript
// Exact match
sanitizePropertyName("get-http-status", {
	rules: { upper: ["http"] }
}); // "getHTTPStatus"

// Multiple segments
sanitizePropertyName("parse-json-xml-data", {
	rules: { upper: ["json", "xml"] }
}); // "parseJSONXMLData"
```

### `lower`

Force segments to lowercase. Pattern-matched segments are **preserved in lowercase** through the camelCase phase - `lower` rules take full effect symmetrically with `upper`.

```javascript
sanitizePropertyName("validate-USER-id", {
	rules: { lower: ["user"] }
}); // "validateUserId"  (exact match, no pattern - camelCase applies first char)

// Pattern-based lower - segment stays fully lowercase (Bug #6 fix)
sanitizePropertyName("get-API-status", {
	rules: { lower: ["*-api-*"] }
}); // "getapiStatus"  (api stays lowercase, not capitalized)

sanitizePropertyName("foo-API-json", {
	rules: { lower: ["json"] }
}); // "fooAPIjson"  (json stays lowercase)
```

## Pattern Matching

The sanitization system supports three types of pattern matching:

### 1. Exact Match

Simple string matching (case-insensitive for `upper`/`lower` rules).

```javascript
sanitizePropertyName("get-api-status", {
	rules: { upper: ["api"] }
}); // "getAPIStatus"
```

### 2. Glob Patterns (`*` and `?`)

Match patterns before string splitting using wildcards.

#### Pre-split patterns (with hyphens)

```javascript
// *-ip matches strings ending with "-ip"
sanitizePropertyName("auto-ip", {
	rules: { upper: ["*-ip"] }
}); // "autoIP"

// *-api-* matches strings with "-api-" in the middle
sanitizePropertyName("get-api-status", {
	rules: { upper: ["*-api-*"] }
}); // "getAPIStatus"

// Multiple patterns
sanitizePropertyName("get-http-api-status", {
	rules: { upper: ["http", "*-api-*"] }
}); // "getHTTPAPIStatus"
```

#### Underscore patterns

```javascript
// api_* matches strings starting with "api_"
sanitizePropertyName("api_helper", {
	rules: { upper: ["api_*"] }
}); // "API_helper"

// *_api_* matches strings with "_api_" in the middle
sanitizePropertyName("get_api_data", {
	rules: { upper: ["*_api_*"] }
}); // "get_API_data"
```

#### Within-segment patterns

Transform parts within already camelCased identifiers.

```javascript
// *URL* matches "url" anywhere in the segment
sanitizePropertyName("buildUrlWithParams", {
	rules: { upper: ["*URL*"] }
}); // "buildURLWithParams"

sanitizePropertyName("parseUrl", {
	rules: { upper: ["*URL*"] }
}); // "parseURL"

sanitizePropertyName("parseUrlFromUrlString", {
	rules: { upper: ["*URL*"] }
}); // "parseURLFromURLString"
```

### 3. Boundary Patterns (`**STRING**`)

Match only when surrounded by other characters (requires positive lookbehind/ahead).

```javascript
// **url** only matches "url" when it has characters before AND after
sanitizePropertyName("buildUrlWithParams", {
	rules: { upper: ["**url**"] }
}); // "buildURLWithParams"

// Standalone "url" is NOT matched
sanitizePropertyName("url", {
	rules: { upper: ["**url**"] }
}); // "url"

// Multiple boundary patterns
sanitizePropertyName("buildApiUrlParser", {
	rules: { upper: ["**api**", "**url**"] }
}); // "buildAPIURLParser"
```

## Rule Precedence

When multiple rules could apply to the same segment, they are evaluated in this order:

1. **`leave` (case-sensitive)** - Highest priority
2. **`leaveInsensitive` (case-insensitive)**
3. **`preserveAllUpper` option** - Overrides transformation rules
4. **`preserveAllLower` option** - Overrides transformation rules
5. **`upper` rules** - Takes precedence over `lower`
6. **`lower` rules**
7. **Default camelCase transformation** - Lowest priority

### Examples

```javascript
// leave overrides upper
sanitizePropertyName("autoIP", {
	rules: {
		leave: ["autoIP"],
		upper: ["ip"]
	}
}); // "autoIP"

// preserveAllUpper overrides lower
sanitizePropertyName("COMMON_APPS", {
	preserveAllUpper: true,
	rules: { lower: ["apps"] }
}); // "COMMON_APPS"

// upper overrides lower
sanitizePropertyName("foo-api", {
	rules: {
		upper: ["api"],
		lower: ["api"]
	}
}); // "fooAPI"
```

## Examples

### Basic Transformations

```javascript
// Simple camelCase
sanitizePropertyName("auto-ip"); // "autoIp"
sanitizePropertyName("root-math"); // "rootMath"
sanitizePropertyName("get-api-status"); // "getApiStatus"

// Underscore preservation
sanitizePropertyName("my_module"); // "my_module"
sanitizePropertyName("common_apps"); // "common_apps"

// Mixed hyphens and underscores
sanitizePropertyName("Mixed_APPS_some-thing"); // "mixed_APPS_someThing"
```

### Special Characters and Edge Cases

```javascript
// Special characters removed
sanitizePropertyName("my file!.mjs"); // "myFileMjs"

// Leading numbers stripped
sanitizePropertyName("2autoIP"); // "autoIP"

// Leading underscores preserved
sanitizePropertyName("_test"); // "_test"
sanitizePropertyName("__private"); // "__private"

// Empty/whitespace becomes underscore
sanitizePropertyName(""); // "_"
sanitizePropertyName("   "); // "_"

// Dollar signs preserved
sanitizePropertyName("$scope"); // "$scope"
```

### Complex Combinations

```javascript
// Multiple options and rules
sanitizePropertyName("Mixed_API_some-json-DATA", {
	lowerFirst: true,
	preserveAllUpper: true,
	rules: {
		upper: ["json"],
		lower: ["data"],
		leave: ["API"]
	}
}); // "mixed_API_someJSONDATA"
// Note: preserveAllUpper keeps "DATA" uppercase, overriding lower:["data"]

// Complex pattern matching
sanitizePropertyName("get-http-api-status", {
	rules: { upper: ["http", "*-api-*"] }
}); // "getHTTPAPIStatus"

// Multiple boundary patterns
sanitizePropertyName("test-api-url-parser", {
	rules: { upper: ["**api**", "**url**"] }
}); // "testAPIURLParser"
```

### Real-World Use Cases

```javascript
// API endpoint naming
sanitizePropertyName("get-user-api"); // "getUserApi"
sanitizePropertyName("post-json-data", {
	rules: { upper: ["json"] }
}); // "postJSONData"

// File-based API generation
sanitizePropertyName("http-client.mjs", {
	rules: { upper: ["http"] }
}); // "HTTPClient"

// Database models
sanitizePropertyName("user_profile"); // "user_profile"
sanitizePropertyName("order_item_details"); // "order_item_details"

// Technical acronyms
sanitizePropertyName("parse-xml-to-json", {
	rules: { upper: ["xml", "json"] }
}); // "parseXMLToJSON"
```

---

## Runtime Convenience Method

When working with a live Slothlet API instance, a convenience method is available on `api.slothlet` that sanitizes a string using the **same sanitize configuration the instance was initialized with** - identical to what Slothlet uses when building API paths from filenames:

```javascript
const api = await slothlet({
	dir: "./api",
	sanitize: {
		rules: { upper: ["http", "api"] }
	}
});

// Uses the same sanitize config as the instance
api.slothlet.sanitize("get-http-status"); // "getHTTPStatus"
api.slothlet.sanitize("post-api-data"); // "postAPIData"
api.slothlet.sanitize("my-module"); // "myModule"
```

This is useful for predicting exactly what API path a given filename will produce at runtime.

> [!NOTE]
> `api.slothlet.sanitize()` only accepts a string. It does not accept an options object - the options come from the instance config. Use the standalone `sanitizePropertyName` export if you need to pass custom options directly.

---

## Testing

The sanitization system is covered by a single comprehensive test suite:

| Suite                      | Tests | Focus                                                                    |
| -------------------------- | ----- | ------------------------------------------------------------------------ |
| `sanitize.test.vitest.mjs` | 104   | camelCase, all options, all rule types, patterns, precedence, edge cases |

Tests are located at:

- `tests/vitests/suites/sanitization/sanitize.test.vitest.mjs`

---

## Implementation Notes

### Internal Architecture

1. **Pattern Compilation**: Converts glob patterns to regex with proper escaping
2. **Pre-split Matching**: Evaluates patterns against original string before segmentation
3. **Segment Rules Application**: Applies transformation rules to individual segments; tracks `lower` rule applications
4. **Within-Segment Patterns**: Transforms parts within already-processed segments
5. **CamelCase Transformation**: Applies camelCase based on segment position, skipping segments marked by `lower` rules
6. **Cleanup**: Ensures valid JavaScript identifier output

### Performance Characteristics

- **Pattern caching**: Regex patterns are compiled once per call
- **Early returns**: Preservation rules short-circuit processing
- **Efficient splitting**: Two-level segmentation minimizes regex operations
- **Minimal allocations**: In-place transformations where possible

---

## API Reference

### Function Signature

```typescript
function sanitizePropertyName(
	input: string,
	options?: {
		lowerFirst?: boolean;
		preserveAllUpper?: boolean;
		preserveAllLower?: boolean;
		rules?: {
			leave?: string[];
			leaveInsensitive?: string[];
			upper?: string[];
			lower?: string[];
		};
	}
): string;
```

### Parameters

- **`input`** (string): The string to sanitize (filename, path segment, etc.)
- **`options`** (object, optional): Configuration options
  - **`lowerFirst`** (boolean, default: `true`): Lowercase first character
  - **`preserveAllUpper`** (boolean, default: `false`): Preserve all-uppercase segments
  - **`preserveAllLower`** (boolean, default: `false`): Preserve all-lowercase segments
  - **`rules`** (object, optional): Transformation rules
    - **`leave`** (string[], case-sensitive): Preserve segments exactly
    - **`leaveInsensitive`** (string[], case-insensitive): Preserve segments exactly
    - **`upper`** (string[]): Force segments to UPPERCASE
    - **`lower`** (string[]): Force segments to lowercase (pattern matches preserved through camelCase)

### Returns

- **string**: Valid JavaScript identifier safe for dot-notation access

### Throws

- **TypeError**: If input is not a string (rare, as input is coerced to string)

---

## See Also

- [API-FLATTENING.md](./API-FLATTENING.md) - How sanitization integrates with API generation
- [MODULE-STRUCTURE.md](./MODULE-STRUCTURE.md) - Module loading and naming conventions
- [API-RULES.md](./API-RULES.md) - Complete API generation rule system
- [v3/changes/sanitization.md](./v3/changes/sanitization.md) - V2 → V3 migration and behavior changes