# CamoText CLI Documentation
*version 1.1.0*

## Features

- **Headless Operation**: Run without GUI dependencies (CLI only)
- **Native Batch Processing**: Built-in directory processing with parallel execution
- **Flexible Input**: Accept single files, raw text strings, or entire directories
- **Multiple File Formats**: Handles `.txt`, `.pdf`, `.docx`, `.xlsx`, `.csv`, and `.rtf` files (the default `--extensions` set)
- **Entity Detection**: List detected entity types before processing
- **Key Management**: Export anonymization keys for audit trails
- **Configurable Hashing**: Customize hash length for anonymization tags
- **Parallel Processing**: Multi-threaded batch processing for faster execution
- **Recursive Processing**: Process nested directory structures
- **Progress Reporting**: Real-time progress updates during batch operations
- **Comprehensive Help**: Built-in help system with organized argument documentation
- **Error Handling**: Clear error messages with suggestions for valid arguments


### Setting PATH


- Once PATH is set, use **camo** to invoke CLI mode (headless, scriptable) instead of spelling out the full path to the CamoTextCLI executable (camo.exe) on every call

*Setting PATH on Windows:*

1) Win+R → type sysdm.cpl → Enter.

2) “Advanced” tab → Environment Variables…

3) Under User variables (or System variables for all users) select Path → Edit.

4) New → paste C:\full\path\to\CamoTextCLI (your folder containing camo.exe) → OK all the way out.



*Setting PATH on macOS:*

***Option A***: Edit your shell’s startup file. Recent macOS versions default to zsh.

1) Open your shell config in a text editor. Run:
```bash
nano ~/.zshrc
```

2) Add this line at the bottom, replacing /full/path/to with the folder containing the camo executable (for an app bundle, that is camo.app/Contents/MacOS):
```bash
export PATH="/full/path/to:$PATH"
```

3) Save (Ctrl + O, Enter) and exit (Ctrl + X).

4) Reload the file so you don’t have to restart Terminal:
```bash
source ~/.zshrc
```

***Option B***: Symlink into a directory that’s already on your PATH

Most macOS systems already include /usr/local/bin (or /opt/homebrew/bin) in your shell’s PATH. Just create a symbolic link:

1) Create /usr/local/bin if needed:
```bash
sudo mkdir -p /usr/local/bin
```

2) Link the camo executable (adjust the source path if camo.app is not on your Desktop):
```bash
sudo ln -s ~/Desktop/camo.app/Contents/MacOS/camo /usr/local/bin/camo
```

3) Open a new terminal and run `camo --help`.
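
Once PATH has been configured on either platform, a script can confirm that the executable resolves before running any batch job. The following is a minimal sketch using Python's standard `shutil.which`; the helper name `find_camo` is hypothetical and not part of CamoText.

```python
import shutil
from typing import Optional


def find_camo(name: str = "camo") -> Optional[str]:
    """Return the full path of the CLI if it resolves on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_camo()
    if location:
        print(f"camo found at {location}")
    else:
        print("camo is not on PATH; re-check the PATH steps for your platform")
```

If the lookup returns `None` right after editing a shell startup file, opening a new terminal (or re-running `source ~/.zshrc`) is usually the missing step.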



---

## Quick Start

For immediate help and argument reference (CLI):

```bash
# Get comprehensive help (CLI)
camo --help    # Windows
./camo --help  # macOS/Linux

# Get help in short form (CLI)
camo -h        # Windows
./camo -h      # macOS/Linux
```

## System Requirements

CamoText is distributed as self-contained executables with all dependencies bundled:

- **No Python installation required** - All dependencies included in executables
- **No additional downloads** - NLP model bundled internally
- **Cross-platform support** - Native executables for Windows, macOS, and Linux
- **No environment setup** - Ready to run immediately after download


### CLI Mode (camo)

When launched as `camo`, `camo.exe`, `camotextcli`, or `camotextcli.exe`, the command-line interface runs in headless mode:

```bash
# CLI executable, always headless
camo --input document.txt --output anonymized.txt       # Windows
camo --input-dir ./docs --output-dir ./processed

./camo --input document.txt --output anonymized.txt     # macOS/Linux
./camo --input-dir ./docs --output-dir ./processed
```

**CLI Features:**

- Batch directory processing with parallel execution
- Scriptable automation and CI/CD integration
- Progress reporting and error handling
- JSON key export for audit trails
- Entity analysis across multiple files
- No GUI dependencies required

---

| Aspect           | GUI Mode (camotext)            | CLI Mode (camo/camotextcli/camotextcli.py) |
| ---------------- | ------------------------------ | ------------------------------------------ |
| **Execution**    | Interactive window opens       | Runs in terminal/command prompt            |
| **Dependencies** | Requires display/window system | Headless compatible                        |
| **Automation**   | Manual operation               | Fully scriptable                           |
| **Output**       | Visual interface               | STDOUT/files                               |
| **Packaging**    | Separate executable            | Separate executable                        |
| **Server Use**   | Not suitable                   | Ideal for server environments              |


### Usage Examples

> **Note:**
>
> - Use `camo` or `camo.exe` for the CLI (headless, scriptable, batch, or automation
>   workflows).

```bash

# CLI mode
camo --input file.txt --output out.txt           # Windows
./camo --input file.txt --output out.txt         # macOS/Linux
```

## Command Syntax


### Error Handling

If invalid arguments are provided, CamoText displays helpful error messages:

```bash

camo --invalid-arg
# Output:
# Error: Invalid argument(s): --invalid-arg
#
# Supported CLI arguments: --dump-key, --extensions, --hash-length, --help, --ignore-category, --input, --input-dir, --key-dir, --list-entities, --output, --output-dir, --priority, --redact, --revert, --recursive, --workers, -h
#
# Use --help or -h for detailed usage information.
```

### Required Arguments

Each mode requires a specific set of arguments. `--input` accepts either a file path or a raw text string:

| Mode             | Required Arguments               |
| ---------------- | -------------------------------- |
| Single File      | `--input`                        |
| Batch Processing | `--input-dir` AND `--output-dir` |
| De-Anonymization | `--input-dir` AND `--deanon`     |
| Term Reversion   | `--input-dir` AND `--revert`     |

### Argument Reference

All CLI arguments, organized into groups:

#### Input/Output Options

| Argument           | Type   | Metavar | Description                                                       |
| ------------------ | ------ | ------- | ----------------------------------------------------------------- |
| `-i` OR `--input`  | string | FILE    | Input file path or raw text string                                |
| `-o` OR `--output` | string | FILE    | Output file path. If omitted, prints to STDOUT                    |
| `--input-dir`      | string | DIR     | Input directory for batch processing                              |
| `--output-dir`     | string | DIR     | Output directory for batch processing (required with --input-dir) |

#### Anonymization Options

| Argument             | Type    | Metavar  | Default | Description                                                                                                                                  |
| -------------------- | ------- | -------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `-p` OR `--priority` | string  | TEXT     | none    | Text that should be anonymized with priority. Case-insensitive. Can be used multiple times                                                   |
| `-r` OR `--revert`   | string  | TEXT     | none    | Text to revert from anonymized files. Case-insensitive. Can be used multiple times. Requires --input-dir                                     |
| `--hash-length`      | integer | N        | 8       | Length of the anonymization hashes                                                                                                           |
| `--ignore-category`  | string  | CATEGORY | none    | Category to ignore (revert after anonymization). Case-insensitive. Can be used multiple times. Examples: PERSON, EMAIL_ADDRESS, PHONE_NUMBER |
| `--config`           | string  | FILE     | none    | Path to JSON configuration file containing anonymization settings                                                                            |
| `--redact`           | flag    | -        | false   | Replace all anonymized items with "[REDACTED]" instead of hash tags                                                                        |
| `--deanon`           | flag    | -        | false   | De-anonymize files by restoring hash tags to original text. Requires --input-dir                                                           |

#### Key Management

| Argument     | Type   | Metavar | Description                                                                                                       |
| ------------ | ------ | ------- | ----------------------------------------------------------------------------------------------------------------- |
| `--dump-key` | string | FILE    | Path to write the anonymization key as a JSON file. If no value provided, uses default naming in output directory |
| `--key-dir`  | string | DIR     | Directory to save anonymization keys during batch processing. If no value provided, uses output directory         |

#### Batch Processing Options

| Argument       | Type    | Metavar | Default                         | Description                                                |
| -------------- | ------- | ------- | ------------------------------- | ---------------------------------------------------------- |
| `--recursive`  | flag    | -       | false                           | Process subdirectories recursively during batch processing |
| `--extensions` | list    | EXT     | .txt .pdf .docx .xlsx .csv .rtf | File extensions to process                                 |
| `--workers`    | integer | N       | 1                               | Number of parallel workers for batch processing            |

#### Analysis Options

| Argument          | Type | Metavar | Description                                                      |
| ----------------- | ---- | ------- | ---------------------------------------------------------------- |
| `--list-entities` | flag | -       | List detected entity types and exit (no anonymization performed) |

## Usage Examples

### Getting Help and Information

```bash
# Display comprehensive help with all arguments organized by category
camo --help                # Windows
./camo --help              # macOS/Linux

# Short form help
camo -h                    # Windows
./camo -h                  # macOS/Linux

# Example help output structure:
# CamoText: Anonymize text from files or strings.
#
# Input/Output Options:
#   --input FILE            Input file path or raw text string
#   --output FILE           Output file path. If omitted, prints to STDOUT
#   --input-dir DIR         Input directory for batch processing
#   --output-dir DIR        Output directory for batch processing
#
# Anonymization Options:
#   --priority TEXT         Text that should be anonymized with priority
#   --revert TEXT           Text to revert from anonymized files
#   --hash-length N         Length of the anonymization hashes (default: 8)
#   --ignore-category CAT   Category to ignore (revert after anonymization)
#
# Key Management:
#   --dump-key [FILE]       Path to write the anonymization key as a JSON file
#   --key-dir [DIR]         Directory to save anonymization keys
#
# Batch Processing Options:
#   --recursive             Process subdirectories recursively
#   --extensions EXT        File extensions to process
#   --workers N             Number of parallel workers (default: 1)
#
# Analysis Options:
#   --list-entities         List detected entity types and exit
```

### Error Handling Examples

```bash
# Example 1: Invalid argument
camo --unknown-option file.txt
# Output:
# Error: Invalid argument(s): --unknown-option
#
# Supported CLI arguments: --dump-key, --extensions, --hash-length, --help, --ignore-category, --input, --input-dir, --key-dir, --list-entities, --output, --output-dir, --priority, --redact, --revert, --recursive, --workers, -h
#
# Use --help or -h for detailed usage information.

# Example 2: Missing required arguments (argparse error)
camo --output file.txt
# Output:
# Error: Invalid argument(s) provided.
#
# Supported CLI arguments: --dump-key, --extensions, --hash-length, --help, --ignore-category, --input, --input-dir, --key-dir, --list-entities, --output, --output-dir, --priority, --redact, --revert, --recursive, --workers, -h
#
# Use --help or -h for detailed usage information.

# Example 3: Get help to see proper usage
camo --help
# Shows complete help documentation with examples
```

### Basic File Anonymization

```bash
# Anonymize a text file to STDOUT (using long forms)
camo --input document.txt                               # Windows
./camo --input document.txt                             # macOS/Linux

# Anonymize a text file to STDOUT (using short forms)
camo -i document.txt                                    # Windows
./camo -i document.txt                                  # macOS/Linux

# Anonymize a PDF and save to file (using long forms)
camo --input report.pdf --output anonymized_report.pdf   # Windows
./camo --input report.pdf --output anonymized_report.pdf # macOS/Linux

# Anonymize a PDF and save to file (using short forms)
camo -i report.pdf -o anonymized_report.pdf              # Windows
./camo -i report.pdf -o anonymized_report.pdf            # macOS/Linux
```

### Raw Text Processing

```bash
# Process raw text string (Windows)
camo --input "John Doe works at Acme Corp and can be reached at john@acme.com"

# Process raw text string (macOS/Linux)
./camo --input "John Doe works at Acme Corp and can be reached at john@acme.com"

# Process with custom hash length
camo --input "Sensitive data here" --hash-length 12     # Windows
./camo --input "Sensitive data here" --hash-length 12   # macOS/Linux

# Process with redaction mode
camo --input "John Doe works at Acme Corp" --redact     # Windows
./camo --input "John Doe works at Acme Corp" --redact   # macOS/Linux

# Process with configuration file (includes exclusions)
camo --config config.json --input document.txt          # Windows
./camo --config config.json --input document.txt        # macOS/Linux
```

### Priority Text Processing

```bash
# Process with priority text that must be anonymized first (Windows)
camo --priority "confidential" --input "This is confidential information"

# Process with multiple priority texts (macOS/Linux)
./camo --priority "classified" --priority "top secret" --input document.txt

# Priority text with file output
camo --priority "internal use only" --input report.pdf --output anonymized_report.pdf

# Batch processing with priority text
camo --priority "proprietary" --input-dir ./documents --output-dir ./anonymized
```

### Term Reversion

```bash
# Revert specific terms from anonymized files (Windows)
camo --input-dir ./anonymized --revert "John Doe" --revert "Acme Corp"

# Revert specific terms (macOS/Linux)
./camo --input-dir ./anonymized --revert "John Doe" --revert "Acme Corp"

# Revert with short form
camo --input-dir ./anonymized -r "John Doe" -r "Acme Corp"

# Revert with custom key directory
camo --input-dir ./anonymized --key-dir ./keys --revert "confidential"

# Revert with recursive processing
camo --input-dir ./anonymized --revert "internal" --recursive
```

### Exclusions

The `exclusions` configuration field allows you to specify text that should be excluded from anonymization and preserved in the output. This is useful when you want to keep certain terms visible while anonymizing other sensitive information.

**How Exclusions Work:**
- **Case-insensitive matching**: Exclusions are matched regardless of case
- **Partial matching**: If any part of a detected entity contains an exclusion term, the entire entity is preserved
- **Priority over anonymization**: Excluded text is never anonymized, even if it would normally be detected as PII
- **Configuration-based**: Set via JSON configuration files for consistent behavior

```json
{
  "exclusions": ["confidential", "internal use only", "proprietary", "public domain"]
}
```

**Example:**
- Input: "This confidential document contains John Doe's personal information"
- Without exclusions: "This <PERSON_a1b2c3d4> document contains <PERSON_e5f6g7h8>'s personal information"
- With exclusions: "This confidential document contains <PERSON_e5f6g7h8>'s personal information"
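
The matching rules above (case-insensitive, partial match preserves the whole entity) can be sketched in a few lines of Python. This illustrates the documented behavior only and is not CamoText's actual implementation; `is_excluded` and `build_config` are hypothetical helper names.

```python
import json
from typing import List


def is_excluded(entity_text: str, exclusions: List[str]) -> bool:
    """True if any exclusion term occurs inside the entity text, ignoring case."""
    lowered = entity_text.lower()
    return any(term.lower() in lowered for term in exclusions)


def build_config(exclusions: List[str]) -> str:
    """Serialize a config containing only the documented `exclusions` field."""
    return json.dumps({"exclusions": exclusions}, indent=2)


if __name__ == "__main__":
    terms = ["confidential", "internal use only"]
    for entity in ["Confidential", "John Doe"]:
        action = "preserve" if is_excluded(entity, terms) else "anonymize"
        print(f"{entity!r}: {action}")
    print(build_config(terms))
```

The string returned by `build_config` can be saved to a file such as `config.json` and passed with `--config`.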

### Redaction Mode

The `--redact` option replaces all anonymized items with "[REDACTED]" instead of hash tags. This provides a cleaner, more uniform appearance while still maintaining the anonymization key for reference.

```bash
# Basic redaction (Windows)
camo --input document.txt --redact

# Redaction with file output (macOS/Linux)
./camo --input report.pdf --output redacted_report.pdf --redact

# Batch processing with redaction
camo --input-dir ./docs --output-dir ./redacted --redact

# Redaction with priority text
camo --priority "confidential" --input document.txt --redact
```

**Key Features:**
- **Uniform appearance**: All anonymized content appears as "[REDACTED]"
- **Key preservation**: Anonymization keys still contain original mappings
- **Clean output**: No hash tags visible in the final document
- **Compatible with all modes**: Works with single file, batch, and priority processing

### Category Ignoring (Reversion)

The `--ignore-category` option allows you to specify entity categories that should be ignored (reverted to original text
after anonymization). This is useful when you want to anonymize most entities but keep certain types visible.

```bash
# Ignore organization names (Windows)
camo --ignore-category "organization" --input document.txt --output anonymized.txt

# Ignore multiple categories (macOS/Linux)
./camo --ignore-category "person" --ignore-category "location" --input report.pdf --output clean_report.pdf

# Case-insensitive category matching
camo --ignore-category "ORGANIZATION" --ignore-category "person" --input data.txt

# Batch processing with ignored categories
./camo --input-dir ./docs --output-dir ./processed --ignore-category "organization" --ignore-category "location"

# Combine with priority text
camo --priority "confidential" --ignore-category "organization" --input document.txt
```

**Supported Categories for --ignore-category:**

- `PERSON` - Personal names
- `EMAIL_ADDRESS` - Email addresses
- `PHONE_NUMBER` / `CONTACT_NUMBER` - Phone numbers
- `ORGANIZATION` / `ENTITY` - Company and organization names
- `LOCATION` / `ADDRESS` / `STREET_ADDRESS` - Geographic locations
- `DATE_TIME` - Dates and times
- `MONEY` - Monetary amounts
- `CREDIT_CARD` - Credit card numbers
- `US_SSN` - Social Security Numbers
- `IP_ADDRESS` - IP addresses
- `URL` - Website URLs
- `FILE` - File paths and names
- `GPS` - GPS coordinates
- `ACCOUNT` - Account handles and numbers
- `UUID` - Universal unique identifiers
- `CRYPTO_ADDRESS` - Cryptocurrency addresses
- And more... (see main documentation for complete list)

### Key File Placement

When `--dump-key` is used together with an output path, the key file is written next to the output rather than in the current directory, unless `--key-dir` is given:

```bash
# Single file: key placed in same directory as output
camo --input document.txt --output ./processed/anonymized.txt --dump-key key.json
# Result: key saved to ./processed/key.json (not ./key.json)

# Batch processing: keys placed in output directory when --key-dir not specified
./camo --input-dir ./docs --output-dir ./processed --dump-key keys.json
# Result: individual key files placed in ./processed/ directory

# Explicit key directory overrides automatic placement
camo --input-dir ./docs --output-dir ./processed --key-dir ./audit --dump-key keys.json
# Result: keys saved to ./audit/ directory (--key-dir takes precedence)
```

### Entity Detection

```bash
# List detected entity types without anonymization
camo --input document.txt --list-entities               # Windows
./camo --input document.txt --list-entities                 # macOS/Linux

# Example output:
# ["PERSON", "EMAIL_ADDRESS", "ORGANIZATION"]
```
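
The example output above has the shape of a JSON array, so a script can parse it and decide whether a file needs processing. This sketch assumes the single-file output is valid JSON exactly as shown; the format of batch output is not documented here and may differ. `parse_entity_types` is a hypothetical helper name.

```python
import json
from typing import Set


def parse_entity_types(stdout_text: str) -> Set[str]:
    """Parse the entity list printed by --list-entities (assumed to be a JSON array)."""
    return set(json.loads(stdout_text))


# Sample copied from the example output above, not from a real run.
sample = '["PERSON", "EMAIL_ADDRESS", "ORGANIZATION"]'
found = parse_entity_types(sample)

if "EMAIL_ADDRESS" in found:
    print("File contains email addresses; anonymize before sharing")
```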

### Key Management

```bash
# Generate anonymization key file for single file
camo --input document.txt --output anonymized.txt --dump-key key.json  # Windows
./camo --input document.txt --output anonymized.txt --dump-key key.json    # macOS/Linux

# Use default key naming (no value provided)
camo --input document.txt --output anonymized.txt --dump-key
# Result: Creates anonymized_key.json in the same directory as output

# Batch processing with centralized key directory
camo --input-dir ./docs --output-dir ./anonymized --key-dir ./keys --dump-key batch_key.json

# Batch processing with automatic individual key files (no --key-dir specified)
camo --input-dir ./docs --output-dir ./anonymized --dump-key keys.json
# Result: Creates individual key files like document1_key.json, document2_key.json, etc. in ./anonymized/

# Batch processing with default key directory (no value for --key-dir)
camo --input-dir ./docs --output-dir ./anonymized --key-dir --dump-key
# Result: Keys saved to ./anonymized/ directory with default naming

# Batch processing with explicit key directory (overrides automatic behavior)
camo --input-dir ./docs --output-dir ./anonymized --key-dir ./audit --dump-key keys.json
# Result: Keys saved to ./audit/ directory as specified by --key-dir
```

**Key File Behavior:**

- **Single file mode**: `--dump-key` saves the key under the given file name; when `--output` is set, the key is placed in the same directory as the output file
- **Single file mode with no value**: `--dump-key` saves to output directory with name `{output_filename}_key.json`
- **Batch mode with `--key-dir`**: Keys saved to the specified key directory
- **Batch mode with `--key-dir` (no value)**: Keys saved to output directory
- **Batch mode without `--key-dir`**: Creates individual key files for each processed file in the output directory
- **Key file naming**: Individual key files use the pattern `{original_filename}_key.json`
- **Default naming**: When no filename is specified, uses `{document_name}_key.json` format
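
For scripts that need to locate the key afterwards, the single-file rules above can be expressed as a small path calculation. This is a sketch of the documented behavior, not the tool's code: `single_file_key_path` is a hypothetical helper, and it assumes that only the file name of the `--dump-key` value is used when an output path is given.

```python
from pathlib import Path
from typing import Optional


def single_file_key_path(output: str, dump_key: Optional[str]) -> Path:
    """Predict where the key file lands in single-file mode.

    `dump_key` is the value passed to --dump-key, or None when the flag
    is given without a value.
    """
    out = Path(output)
    if dump_key is None:
        # Default naming: {output_filename}_key.json next to the output
        return out.parent / f"{out.stem}_key.json"
    # Assumption: the key is placed beside the output under the given name
    return out.parent / Path(dump_key).name


if __name__ == "__main__":
    print(single_file_key_path("./processed/anonymized.txt", "key.json"))
    print(single_file_key_path("anonymized.txt", None))
```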

**Key file format (key.json):**

```json
{
  "John Doe": "<PERSON_a1b2c3d4>",
  "john@acme.com": "<EMAIL_ADDRESS_e5f6g7h8>",
  "Acme Corp": "<ORGANIZATION_i9j0k1l2>"
}
```
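
Because the key file is plain JSON mapping original text to tags, it is easy to consume in audit scripts. The sketch below inverts the mapping and counts entries per category. It assumes tags always follow the `<CATEGORY_hash>` shape shown above, with no underscore inside the hash; the helper names are hypothetical.

```python
import json
from collections import Counter
from typing import Dict

# Sample copied from the key file format shown above.
KEY_JSON = """
{
  "John Doe": "<PERSON_a1b2c3d4>",
  "john@acme.com": "<EMAIL_ADDRESS_e5f6g7h8>",
  "Acme Corp": "<ORGANIZATION_i9j0k1l2>"
}
"""


def tag_to_original(key: Dict[str, str]) -> Dict[str, str]:
    """Invert the key so that tags can be looked up directly."""
    return {tag: original for original, tag in key.items()}


def category_of(tag: str) -> str:
    """Extract the category: everything before the last underscore."""
    return tag.strip("<>").rsplit("_", 1)[0]


key = json.loads(KEY_JSON)
lookup = tag_to_original(key)
counts = Counter(category_of(tag) for tag in key.values())

print(lookup["<PERSON_a1b2c3d4>"])  # John Doe
print(dict(counts))  # {'PERSON': 1, 'EMAIL_ADDRESS': 1, 'ORGANIZATION': 1}
```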

### Native Batch Processing

```bash
# Process entire directory (Windows)
camo --input-dir ./documents --output-dir ./anonymized --key-dir ./keys

# Process entire directory (macOS/Linux)
./camo --input-dir ./documents --output-dir ./anonymized --key-dir ./keys

# Process with specific file types
camo --input-dir ./docs --output-dir ./anon --extensions .txt .pdf

# Recursive processing with parallel workers (Windows; use \ instead of ^ on macOS/Linux)
camo --input-dir ./projects --output-dir ./anonymized ^
             --key-dir ./keys --recursive --workers 4

# List all entities in a directory
camo --input-dir ./documents --list-entities
```

### Advanced Batch Examples

```bash
# Large dataset processing with 8 workers (Windows)
camo --input-dir C:\data\sensitive --output-dir C:\data\anonymized ^
             --key-dir C:\data\keys --workers 8 --hash-length 12

# Large dataset processing (macOS/Linux)
./camo --input-dir /data/sensitive --output-dir /data/anonymized \
           --key-dir /data/keys --workers 8 --hash-length 12

# Process only specific file types recursively (Windows)
camo --input-dir ./mixed_files --output-dir ./cleaned ^
             --extensions .docx .pdf --recursive

# Batch entity analysis
camo --input-dir ./compliance_docs --list-entities > entity_report.json

# Advanced priority text processing (Windows)
camo --priority "Operation Blackbird" --priority "classified" ^
             --input-dir ./sensitive --output-dir ./sanitized ^
             --key-dir ./keys --workers 4

# Complex processing with category ignoring (Windows)
camo --priority "confidential" --ignore-category "organization" ^
             --ignore-category "location" --input-dir ./documents ^
             --output-dir ./processed --dump-key audit.json --workers 6

# Batch processing with redaction mode
camo --input-dir ./sensitive --output-dir ./redacted --redact --workers 4

# Batch processing with configuration file (includes exclusions)
camo --config config.json --input-dir ./docs --output-dir ./processed --workers 4

# Selective anonymization keeping organizations visible
./camo --input-dir ./compliance_docs --output-dir ./redacted \
           --ignore-category "organization" --ignore-category "entity" \
           --recursive --workers 8
```

### De-Anonymization

The `--deanon` flag allows you to fully de-anonymize files by restoring all hash tags to their original text using anonymization key files. This is useful when you need to restore anonymized documents to their original state.

**How De-Anonymization Works:**

1. **Directory Scan**: Scans the input directory for JSON anonymization key files
2. **Key Combination**: Combines all JSON key files found (skips duplicates, keeps first occurrence)
3. **File Processing**: Processes all compatible input files (non-JSON files with supported extensions)
4. **Hash Restoration**: Replaces all hash tags with their original text from the combined key
5. **Output**: Saves de-anonymized files with "clean_" prefix
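
The five steps above can be sketched as follows. This is an illustration of the described flow for plain `.txt` files only, not CamoText's implementation. It assumes "duplicates" means the same tag appearing in more than one key file; `combine_keys`, `restore`, and `deanonymize_dir` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def combine_keys(key_files: Iterable[Path]) -> Dict[str, str]:
    """Merge key files into a tag -> original map; the first occurrence wins."""
    combined: Dict[str, str] = {}
    for key_file in key_files:
        data = json.loads(Path(key_file).read_text(encoding="utf-8"))
        for original, tag in data.items():
            combined.setdefault(tag, original)
    return combined


def restore(text: str, combined: Dict[str, str]) -> str:
    """Replace every known hash tag with its original text."""
    for tag, original in combined.items():
        text = text.replace(tag, original)
    return text


def deanonymize_dir(input_dir: Path) -> List[Path]:
    """Restore all .txt files in a directory, writing clean_-prefixed copies."""
    root = Path(input_dir)
    combined = combine_keys(sorted(root.glob("*.json")))
    written = []
    for path in sorted(root.glob("*.txt")):
        if path.name.startswith("clean_"):
            continue  # skip results of an earlier run
        target = path.with_name(f"clean_{path.name}")
        restored = restore(path.read_text(encoding="utf-8"), combined)
        target.write_text(restored, encoding="utf-8")
        written.append(target)
    return written
```

Formats such as PDF or DOCX need format-aware readers and writers, which this sketch deliberately leaves out.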

**Key Features:**

- **Automatic key discovery**: Finds and combines all JSON key files in the directory
- **Recursive processing**: Use `--recursive` to process subdirectories
- **Extension filtering**: Use `--extensions` to filter file types
- **Parallel processing**: Supports `--workers` for faster execution
- **Output flexibility**: Save to same directory (with "clean_" prefix) or specify `--output-dir`
- **Warning system**: Warns if no key files are found and asks for confirmation before continuing

**Basic Usage:**

```bash
# De-anonymize files in directory (Windows)
camo --input-dir ./anonymized --deanon

# De-anonymize files (macOS/Linux)
./camo --input-dir ./anonymized --deanon

# De-anonymize with recursive processing
camo --input-dir ./anonymized --deanon --recursive

# De-anonymize with specific output directory
camo --input-dir ./anonymized --deanon --output-dir ./cleaned

# De-anonymize with parallel processing
camo --input-dir ./anonymized --deanon --workers 4

# De-anonymize specific file types only
camo --input-dir ./anonymized --deanon --extensions .txt .pdf

# De-anonymize with recursive and parallel processing
camo --input-dir ./anonymized --deanon --recursive --workers 4
```

**Example Workflow:**

```bash
# 1. Initial anonymization (keys are written into ./anonymized so --deanon can find them)
camo --input-dir ./documents --output-dir ./anonymized --dump-key keys.json

# 2. Later, fully de-anonymize all files
camo --input-dir ./anonymized --deanon --recursive

# 3. Result: All files restored with "clean_" prefix in same directory
#    Example: document1.txt -> clean_document1.txt
```

**Output Behavior:**

- **Without `--output-dir`**: Files saved in same directory with "clean_" prefix
- **With `--output-dir`**: Files saved to output directory maintaining structure, with "clean_" prefix
- **Recursive mode**: Maintains directory structure in output

**Warning System:**

If no JSON key files are detected, the tool will warn and ask for confirmation:

```bash
Warning: No anonymization key .json files detected.
Continue anyway? (y/n): 
```

**Progress Reporting:**

```bash
# Example output during de-anonymization:
Found 3 JSON key file(s) with 45 combined entries
Found 12 input file(s) to process...

De-anonymization complete:
  ✓ Successful: 12
  📁 Processed directory: ./anonymized
  📝 Output files saved with 'clean_' prefix
```

### Term Reversion Feature

CamoText supports selective term reversion, which allows you to revert specific terms from previously anonymized files
back to their original text. This is useful when you need to selectively restore certain information that was previously
anonymized.

### How Term Reversion Works

1. **Directory Scan**: Scans the input directory for files with supported extensions
2. **Hash Detection**: Identifies anonymization hash patterns (`<ENTITY_TYPE_hash>`) in each file
3. **Key File Loading**: Loads key files (either from `--key-dir` or the same directory as input files)
4. **Term Matching**: Matches revert terms against original text in key files (case-insensitive)
5. **Hash Replacement**: Replaces all matching hash patterns with their original text
6. **File Overwrite**: Saves the reverted content back to the original file location
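
Steps 2, 4, and 5 above can be illustrated with a short sketch. It shows the matching logic only (case-insensitive, partial match against the original text) and is not the tool's implementation. The regular expression is an assumption about the `<ENTITY_TYPE_hash>` tag shape, and `revert_terms` is a hypothetical name.

```python
import re
from typing import Dict, List

# Assumed tag shape: <ENTITY_TYPE_hash>, e.g. <PERSON_a1b2c3d4>
TAG_PATTERN = re.compile(r"<[A-Z_]+_[0-9A-Za-z]+>")


def revert_terms(text: str, key: Dict[str, str], terms: List[str]) -> str:
    """Restore only those tags whose original text contains one of the terms."""
    wanted = [term.lower() for term in terms]
    selected = {
        tag: original
        for original, tag in key.items()
        if any(term in original.lower() for term in wanted)
    }
    # Tags that are not selected are left untouched
    return TAG_PATTERN.sub(lambda m: selected.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    key = {"John Doe": "<PERSON_a1b2c3d4>", "Acme Corp": "<ORGANIZATION_i9j0k1l2>"}
    text = "<PERSON_a1b2c3d4> works at <ORGANIZATION_i9j0k1l2>"
    print(revert_terms(text, key, ["john"]))
```

Only the person tag is restored in this usage example, because "john" matches "John Doe" but not "Acme Corp".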

### Key Features

- **Case-insensitive matching**: Terms are matched regardless of case
- **Partial matching**: Terms can match any part of the original text
- **Multiple terms**: Can specify multiple terms to revert
- **Directory processing**: Processes all files in the specified directory
- **Key file integration**: Automatically finds and uses key files to map hashes to original text
- **Overwrite mode**: Reverted files overwrite the original files
- **Parallel processing**: Supports multi-threaded processing for faster execution
- **Recursive processing**: Can process nested directory structures

### Key File Discovery

- **With `--key-dir`**: Looks for `*_key.json` files in the specified key directory
- **Without `--key-dir`**: Looks for `*_key.json` files in the same directory as each input file
- **Multiple keys**: Combines all found key files to build a comprehensive mapping
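
The discovery rules above amount to a glob over one directory. A minimal sketch, with the hypothetical helper `discover_key_files`, might look like this:

```python
from pathlib import Path
from typing import List, Optional


def discover_key_files(input_file: Path,
                       key_dir: Optional[Path] = None) -> List[Path]:
    """List *_key.json files in key_dir, or beside the input file if key_dir is unset."""
    search_dir = key_dir if key_dir is not None else input_file.parent
    return sorted(search_dir.glob("*_key.json"))
```

JSON files that do not end in `_key.json` are ignored by this pattern, so unrelated configuration files in the same folder are not mistaken for keys.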

### Term Reversion Examples

```bash
# Basic reversion (Windows)
camo --input-dir ./anonymized --revert "John Doe" --revert "Acme Corp"

# Basic reversion (macOS/Linux)
./camo --input-dir ./anonymized --revert "John Doe" --revert "Acme Corp"

# Revert with short form
camo --input-dir ./anonymized -r "John Doe" -r "Acme Corp"

# Revert with custom key directory
camo --input-dir ./anonymized --key-dir ./keys --revert "confidential"

# Revert with recursive processing
camo --input-dir ./anonymized --revert "internal" --recursive

# Revert with parallel processing
camo --input-dir ./anonymized --revert "sensitive" --workers 4

# Revert multiple terms in one command
camo --input-dir ./anonymized --revert "John Doe" --revert "Acme Corp" --revert "confidential"

# Revert with specific file types
camo --input-dir ./anonymized --revert "classified" --extensions .txt .pdf
```

### Example Workflow

```bash
# 1. Initial anonymization
camo --input-dir ./documents --output-dir ./anonymized --key-dir ./keys

# 2. Later, revert specific terms
camo --input-dir ./anonymized --key-dir ./keys --revert "John Doe"

# 3. Result: All instances of <PERSON_abc123de> become "John Doe" again
```

### Progress Reporting

```bash
# Example output during reversion:
Found 15 files to process for reversion...
[1/15] ✓ document1.txt (reverted 3 instances of 2 terms)
[2/15] ⚠ document2.txt - No anonymization hashes found
[3/15] ✓ document3.txt (reverted 1 instances of 1 terms)
[4/15] ⚠ document4.txt - No terms matching ['John Doe'] found in key files

Reversion complete:
  ✓ Successful: 12
  ✗ Failed: 0
  ⚠ No hashes found: 2
  ⚠ No matching terms: 1
  📁 Processed directory: ./anonymized
```

### Use Cases

- **Redact all PII, then selectively restore specific names or organizations for reporting**
- **Batch anonymize large datasets, then revert only certain terms for compliance**
- **De-anonymize documents using key files**
- **Automate anonymization and reversion in CI/CD pipelines**
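
For pipeline automation, the CLI can be driven from Python with the standard `subprocess` module. The sketch below only assembles flags documented in this reference. The helper names are hypothetical, and treating a non-zero exit status as failure is an assumption that this document does not confirm.

```python
import subprocess
from typing import List, Optional, Sequence


def build_batch_command(input_dir: str, output_dir: str,
                        key_dir: Optional[str] = None,
                        workers: int = 1, recursive: bool = False,
                        extensions: Sequence[str] = (),
                        executable: str = "camo") -> List[str]:
    """Assemble an argument list for batch anonymization."""
    cmd = [executable, "--input-dir", input_dir, "--output-dir", output_dir]
    if key_dir:
        cmd += ["--key-dir", key_dir]
    if recursive:
        cmd.append("--recursive")
    if workers != 1:
        cmd += ["--workers", str(workers)]
    if extensions:
        cmd += ["--extensions", *extensions]
    return cmd


def run_camo(cmd: List[str]) -> bool:
    """Run the command; assumes a non-zero exit status signals failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stderr or result.stdout)
    return result.returncode == 0
```

A pipeline step could then call `run_camo(build_batch_command("./data", "./anonymized", key_dir="./keys", workers=4, recursive=True))` and fail the job when it returns `False`. Passing the arguments as a list avoids shell quoting issues on both Windows and macOS/Linux.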

### Processing Modes

The CLI supports three processing modes:

#### 1. Single File Processing

Process individual files or raw text strings:

```bash
# File processing
camo --input C:\path\to\document.pdf                    # Windows
./camo --input /path/to/document.pdf                      # macOS/Linux

# Raw text processing
camo --input "This is raw text to anonymize"            # Windows
./camo --input "This is raw text to anonymize"            # macOS/Linux
```

#### 2. Batch Directory Processing

Process entire directories with built-in parallelization:

```bash
# Basic directory processing
camo --input-dir ./source --output-dir ./processed      # Windows
./camo --input-dir ./source --output-dir ./processed    # macOS/Linux

# Advanced batch processing (Windows)
camo --input-dir ./data --output-dir ./anonymized ^
             --key-dir ./keys --recursive --workers 4 ^
             --extensions .txt .pdf .docx

# Advanced batch processing (macOS/Linux)
./camo --input-dir ./data --output-dir ./anonymized \
           --key-dir ./keys --recursive --workers 4 \
           --extensions .txt .pdf .docx
```

#### 3. Entity Analysis Mode

Analyze entity types without anonymization:

```bash
# Single file analysis
camo --input document.txt --list-entities               # Windows
./camo --input document.txt --list-entities             # macOS/Linux

# Batch entity analysis
camo --input-dir ./documents --list-entities            # Windows
./camo --input-dir ./documents --list-entities          # macOS/Linux
```

### Output Handling

#### Single File Output

##### Standard Output (Default)

When `--output` is not specified, anonymized text is printed to STDOUT:

```bash
camo --input document.txt      # Windows
./camo --input document.txt    # macOS/Linux
```

##### File Output

```bash
camo --input document.txt --output anonymized.txt      # Windows
./camo --input document.txt --output anonymized.txt    # macOS/Linux
```

##### Key File Output

```bash
camo --input document.txt --output anonymized.txt --dump-key key.json  # Windows
./camo --input document.txt --output anonymized.txt --dump-key key.json    # macOS/Linux

# Use default key naming (no value provided)
camo --input document.txt --output anonymized.txt --dump-key
# Result: Creates anonymized_key.json in the same directory as output

# Batch processing with centralized key directory
camo --input-dir ./docs --output-dir ./anonymized --key-dir ./keys --dump-key batch_key.json

# Batch processing with automatic individual key files (no --key-dir specified)
camo --input-dir ./docs --output-dir ./anonymized --dump-key keys.json
# Result: Creates individual key files like document1_key.json, document2_key.json, etc. in ./anonymized/

# Batch processing with default key directory (no value for --key-dir)
camo --input-dir ./docs --output-dir ./anonymized --key-dir --dump-key
# Result: Keys saved to ./anonymized/ directory with default naming

# Batch processing with explicit key directory (overrides automatic behavior)
camo --input-dir ./docs --output-dir ./anonymized --key-dir ./audit --dump-key keys.json
# Result: Keys saved to ./audit/ directory as specified by --key-dir
```

**Key File Behavior:**

- **Single file mode**: `--dump-key` saves to the specified path
- **Single file mode with no value**: `--dump-key` saves to output directory with name `{output_filename}_key.json`
- **Batch mode with `--key-dir`**: Keys saved to the specified key directory
- **Batch mode with `--key-dir` (no value)**: Keys saved to output directory
- **Batch mode without `--key-dir`**: Creates individual key files for each processed file in the output directory
- **Key file naming**: Individual key files use the pattern `{original_filename}_key.json`
- **Default naming**: When no filename is specified, uses `{document_name}_key.json` format
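
When a script needs to locate a key file afterwards, the naming rules above can be mirrored in code. This is a minimal sketch, assuming the file extension is dropped before `_key.json` is appended, as the `anonymized.txt` to `anonymized_key.json` example suggests; `default_key_path` and `batch_key_path` are hypothetical helpers, and behavior for unusual filenames is not confirmed.

```python
from pathlib import Path


def default_key_path(output_path):
    """Single file mode, --dump-key without a value: key sits next to the output."""
    output = Path(output_path)
    return output.with_name(f"{output.stem}_key.json")


def batch_key_path(input_file, output_dir):
    """Batch mode without --key-dir: one key per input file, in the output directory."""
    return Path(output_dir) / f"{Path(input_file).stem}_key.json"
```

For example, `default_key_path("anonymized.txt")` yields `anonymized_key.json`, and `batch_key_path("docs/document1.pdf", "anonymized")` yields `anonymized/document1_key.json`.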

**Key file format (key.json):**

```json
{
  "John Doe": "<PERSON_a1b2c3d4>",
  "john@acme.com": "<EMAIL_ADDRESS_e5f6g7h8>",
  "Acme Corp": "<ORGANIZATION_i9j0k1l2>"
}
```
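
Because the key file maps each original value to its tag, a script can read it directly. The sketch below shows one way to restore selected terms from such a mapping; it is only an illustration of the file format and is not how `--revert` is implemented. The helper name `restore_text` is hypothetical.

```python
import json


def restore_text(anonymized_text, key_mapping, terms=None):
    """Replace tags with their original values; limit to `terms` if given."""
    restored = anonymized_text
    for original, tag in key_mapping.items():
        if terms is None or original in terms:
            restored = restored.replace(tag, original)
    return restored


# Same structure as the key file shown above.
key_mapping = json.loads("""{
  "John Doe": "<PERSON_a1b2c3d4>",
  "Acme Corp": "<ORGANIZATION_i9j0k1l2>"
}""")

text = "<PERSON_a1b2c3d4> works at <ORGANIZATION_i9j0k1l2>."
partially_restored = restore_text(text, key_mapping, terms={"John Doe"})
fully_restored = restore_text(text, key_mapping)
```

Here `partially_restored` keeps the organization tag in place while the person's name is restored.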

### Integration Examples

#### CI/CD Pipeline

```yaml
# GitHub Actions example
- name: Anonymize documents
  run: |
    camo --input sensitive_report.pdf \
         --output public_report.pdf \
         --dump-key audit_key.json \
         --ignore-category "organization"
```

#### Shell Script Integration

```bash
#!/bin/bash
# batch_anonymize.sh - Using native batch processing

INPUT_DIR="./sensitive_docs"
OUTPUT_DIR="./anonymized_docs"
KEY_DIR="./keys"

# Single command for entire directory (assuming CamoText executable in PATH)
camo \
    --input-dir "$INPUT_DIR" \
    --output-dir "$OUTPUT_DIR" \
    --key-dir "$KEY_DIR" \
    --recursive \
    --workers 4 \
    --hash-length 10 \
    --ignore-category "organization" \
    --ignore-category "location"
```

### Performance Considerations

#### File Size Limits

CamoText enforces a maximum file size and page count for processing. See the main documentation for details.

### Supported Entity Types

The entity types detected and anonymized by the system are listed in the CamoText User Guide.

### AI Agent Integration

CamoText is well-suited for AI agents and local bots due to its executable-based CLI design.

#### Excellent AI Agent Compatibility

##### ✅ **Perfect CLI Interface**

- **No GUI dependencies** - Runs headless in any environment
- **Structured input/output** - Predictable command syntax and responses
- **Standard exit codes** - Proper success/failure signaling
- **JSON output** - Machine-readable entity lists and anonymization keys

##### ✅ **Zero-Setup Deployment**

```bash
# AI agent can simply invoke the executable
./camo --input "sensitive data" --output result.txt --dump-key key.json
```

- **Self-contained executable** - No Python environment or dependency management
- **Cross-platform** - Same interface on Windows, macOS, Linux
- **Immediate availability** - Download and run, no installation steps

##### ✅ **Batch Processing Power**

```bash
# Process entire directories with parallel workers
./camo --input-dir ./data --output-dir ./anonymized --workers 8 --key-dir ./keys
```

- **Native parallelization** - Built-in multi-threading for large datasets
- **Progress reporting** - Real-time status updates for monitoring
- **Error resilience** - Continues processing even if individual files fail

##### ✅ **Flexible Integration Patterns**

**1. Direct subprocess calls:**

```python
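import json
import subprocess

text = "John Doe works at Acme Corp"  # example input

# check=True raises CalledProcessError if camo exits with a non-zero code
result = subprocess.run(['./camo', '--input', text, '--list-entities'],
                       capture_output=True, text=True, check=True)
entities = json.loads(result.stdout)  # assumes --list-entities prints JSON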
```

**2. File-based workflows:**

```python
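import json
import subprocess

sensitive_data = "John Doe works at Acme Corp"  # example input

# Agent writes files, processes them, reads results
with open('temp_input.txt', 'w') as f:
    f.write(sensitive_data)

# check=True raises CalledProcessError if camo exits with a non-zero code
subprocess.run(['./camo', '--input', 'temp_input.txt',
                '--output', 'temp_output.txt', '--dump-key', 'temp_key.json'],
               check=True)

with open('temp_output.txt') as f:
    anonymized = f.read()
with open('temp_key.json') as f:
    key = json.load(f)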
```

#### Sample AI Agent Integration

```python
import json
import os
import subprocess
import tempfile


class CamoTextAgent:
    def __init__(self, executable_path='camo'):
        self.executable = executable_path

    def anonymize(self, text, hash_length=8, ignore_categories=None):
        """Anonymize text and return both result and key mapping."""
        with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
            f.write(text)
            input_file = f.name

        output_file = input_file + '.anon'
        key_file = input_file + '.key'

        try:
            cmd = [self.executable, '--input', input_file,
                   '--output', output_file, '--dump-key', key_file,
                   '--hash-length', str(hash_length)]

            if ignore_categories:
                for category in ignore_categories:
                    cmd.extend(['--ignore-category', category])

            result = subprocess.run(cmd, capture_output=True, text=True)

            if result.returncode != 0:
                raise RuntimeError(f"CamoText error: {result.stderr}")

            with open(output_file, 'r') as f:
                anonymized_text = f.read()
            with open(key_file, 'r') as f:
                key_mapping = json.load(f)
            return anonymized_text, key_mapping
        finally:
            # Clean up temporary files; some may not exist if camo failed
            for file_path in [input_file, output_file, key_file]:
                try:
                    os.unlink(file_path)
                except OSError:
                    pass

    def batch_process(self, input_dir, output_dir, workers=4, ignore_categories=None):
        """Process directory of files with parallel workers."""
        cmd = [self.executable, '--input-dir', input_dir,
               '--output-dir', output_dir, '--workers', str(workers)]

        if ignore_categories:
            for category in ignore_categories:
                cmd.extend(['--ignore-category', category])

        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.returncode == 0, result.stdout, result.stderr

    def detect_entities(self, text):
        """Get list of entity types without anonymization."""
        cmd = [self.executable, '--input', text, '--list-entities']
        result = subprocess.run(cmd, capture_output=True, text=True)

        if result.returncode == 0:
            return json.loads(result.stdout)
        else:
            raise RuntimeError(f"Entity detection error: {result.stderr}")

# Usage example
agent = CamoTextAgent(executable_path='camo.exe')  # Windows
anonymized, key_map = agent.anonymize("John Doe works at Acme Corp", ignore_categories=['organization'])
entities = agent.detect_entities("Analyze this sensitive document")
batch_success, output, errors = agent.batch_process('./docs', './processed',
                                                   ignore_categories=['organization', 'location'])
```

## License and Support

CamoText CLI is subject to the CamoText EULA. For technical support or feature requests, please refer to the main
documentation or contact support channels.

---

### Configuration File

The `--config` argument specifies a JSON configuration file containing anonymization settings, including custom
priority tags. This is useful for keeping settings consistent across multiple runs. The configuration file has the
following format:

```json
{
  "priority": ["John Doe", "Acme Corp", "123-456-7890"],
  "priority_tags": {
    "John Doe": "PERSON",
    "Acme Corp": "FIRM"
  },
  "hash_length": 12,
  "ignore_category": ["PERSON", "EMAIL_ADDRESS"],
  "redact": true,
  "exclusions": ["confidential", "internal use only", "proprietary"]
}
```

All fields in the configuration file are optional. If a field is omitted, the corresponding command-line argument is
used if one is given; otherwise the default value applies.

#### Configuration Fields

- **`priority`** or **`priorities`**: Array of strings that should be anonymized with priority (both formats supported for backwards compatibility)
- **`priority_tags`**: Dictionary mapping priority text to custom category tags (e.g., `{"John Doe": "PERSON", "Acme Corp": "FIRM"}`). Custom tags must be letters only, 1-16 characters long, and are automatically converted to uppercase. Priorities without a custom tag default to the `PRIORITY` tag.
- **`hash_length`**: Integer length of anonymization hashes (default: 8)
- **`ignore_category`**: Array of entity categories to ignore (revert after anonymization)
- **`redact`**: Boolean to replace hash tags with "[REDACTED]" (default: false)
- **`exclusions`**: Array of text strings to exclude from anonymization (preserve in output)
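
The `priority_tags` rules can be checked before a configuration file is handed to `camo`. This is a minimal sketch of such a pre-check, assuming "letters only" means ASCII letters; `normalize_priority_tags` is a hypothetical helper and does not reflect CamoText's internal validation.

```python
import re

# Letters only, 1-16 characters (ASCII letters are an assumption).
TAG_PATTERN = re.compile(r"[A-Za-z]{1,16}")


def normalize_priority_tags(config):
    """Map each priority string to an uppercase tag, defaulting to PRIORITY."""
    # Both key spellings are accepted, as described in the field list above.
    priorities = config.get("priority", config.get("priorities", []))
    custom = config.get("priority_tags", {})
    tags = {}
    for text in priorities:
        tag = custom.get(text, "PRIORITY")
        if not TAG_PATTERN.fullmatch(tag):
            raise ValueError(
                f"Invalid tag {tag!r} for {text!r}: letters only, 1-16 characters")
        tags[text] = tag.upper()
    return tags


config = {
    "priority": ["John Doe", "Acme Corp", "123-456-7890"],
    "priority_tags": {"John Doe": "person", "Acme Corp": "FIRM"},
}
tags = normalize_priority_tags(config)
```

In this example the lowercase `person` tag is uppercased and the phone number, which has no custom tag, falls back to `PRIORITY`.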

#### Configuration File Examples

1. **Basic Configuration**

```json
{
  "priority": ["confidential", "top secret"],
  "priority_tags": {
    "confidential": "CLASSIFIED",
    "top secret": "SECRET"
  },
  "hash_length": 10
}
```

Usage:

```bash
camo --config config.json --input document.txt
```

2. **Complex Configuration**

```json
{
  "priority": [
    "Project Aurora",
    "Operation Blackbird",
    "classified information"
  ],
  "priority_tags": {
    "Project Aurora": "PROJECT",
    "Operation Blackbird": "OPERATION"
  },
  "hash_length": 12,
  "ignore_category": ["ORGANIZATION", "LOCATION", "DATE_TIME"],
  "redact": false,
  "exclusions": ["public domain", "open source", "general knowledge"]
}
```

Usage:

```bash
camo --config config.json --input-dir ./documents --output-dir ./anonymized
```

3. **Minimal Configuration**

```json
{
  "ignore_category": ["PERSON"]
}
```

Usage:

```bash
camo --config config.json --input document.txt --output anonymized.txt
```

#### Configuration File Benefits

1. **Consistency**: Maintain the same anonymization rules across multiple runs
2. **Reusability**: Share configuration files across team members
3. **Version Control**: Track anonymization settings in source control
4. **Reduced Command Length**: Avoid typing long command lines repeatedly

#### Configuration File Location

The configuration file can be placed anywhere on your system. Common locations include:

- Project root directory
- Configuration directory
- User's home directory

Example paths:

```bash
# Windows
camo --config C:\config\anonymization.json --input document.txt
camo --config .\config.json --input document.txt

# macOS/Linux
./camo --config /etc/camotext/config.json --input document.txt
./camo --config ./config.json --input document.txt
```

#### Configuration File Override

Command-line arguments take precedence over configuration file settings. This allows you to:

1. Use a base configuration file for common settings
2. Override specific settings via command line when needed

Example:

```bash
# config.json has hash_length: 8, but command line overrides it to 12
camo --config config.json --hash-length 12 --input document.txt
```
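
The precedence order can be pictured as a layered merge: defaults first, then the configuration file, then any command-line values. The sketch below is an illustration of that ordering only, not CamoText's actual code; `resolve_settings` is a hypothetical helper, and it assumes list-valued options such as `priority` are replaced rather than combined, which the documentation does not specify.

```python
# Defaults taken from the Configuration Fields list.
DEFAULTS = {"hash_length": 8, "redact": False}


def resolve_settings(config, cli_args):
    """Merge settings: CLI values win over config values, which win over defaults."""
    settings = dict(DEFAULTS)
    settings.update(config)
    # Only options the user actually passed (not None) override the file.
    settings.update({k: v for k, v in cli_args.items() if v is not None})
    return settings


resolved = resolve_settings(
    {"hash_length": 8, "redact": True},      # from config.json
    {"hash_length": 12, "redact": None},     # --hash-length 12, no redact flag given
)
```

Here `resolved` ends up with `hash_length` 12 from the command line and `redact` true from the configuration file.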

### Examples

Basic file anonymization:

```bash
# Using long form arguments
camo --input document.pdf --output anonymized.txt

# Using short form arguments
camo -i document.pdf -o anonymized.txt

# Using configuration file
camo --config config.json --input document.txt
```

Anonymization with priority text:

```bash
# Using long form arguments
camo --input file.txt --priority "John Doe" --priority "Acme Corp"

# Using short form arguments
camo -i file.txt -p "John Doe" -p "Acme Corp"

# Using configuration file
camo --config config.json --input file.txt
```

Batch processing:

```bash
# Basic batch processing
camo --input-dir ./docs --output-dir ./output --recursive

# Batch processing with configuration
camo --config config.json --input-dir ./docs --output-dir ./output
```

List entities without anonymization:

```bash
camo --input file.txt --list-entities
```

Ignore specific categories:

```bash
# Using command line arguments
camo --input file.txt --ignore-category PERSON --ignore-category EMAIL_ADDRESS

# Using configuration file
camo --config config.json --input file.txt
```

Supported file types: `.txt`, `.pdf`, `.docx`, `.xlsx`, `.csv`, `.rtf`
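
Before a batch run, a script can check which files in a directory fall under the supported types. This is a minimal sketch, assuming extensions are matched case-insensitively (not confirmed by this documentation); `is_supported` and `find_supported_files` are hypothetical helpers.

```python
from pathlib import Path

# Extensions listed above.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".docx", ".xlsx", ".csv", ".rtf"}


def is_supported(path):
    """Return True if the file extension is one of the supported types."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def find_supported_files(directory, recursive=False):
    """Return sorted paths under `directory` whose extension is supported."""
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and is_supported(p))
```

The resulting list can be used to estimate the size of a batch before invoking `camo --input-dir`, with `recursive=True` corresponding to the `--recursive` flag.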