# Voxtype

[![Voxtype - Voice to Text for Linux](website/images/og-preview.png)](https://voxtype.io)

**[voxtype.io](https://voxtype.io)**

Voice-to-text for Linux. 9-11× realtime on your CPU. Local by default.

Hold a hotkey (default: ScrollLock) while speaking, release to transcribe and output the text at your cursor position. Voxtype runs Cohere Transcribe (#1 on the Open ASR Leaderboard) faster than realtime on a plain Zen 4 CPU. Parakeet, Whisper, and five more engines if you want them. No cloud, no subscription, no telemetry.

## Features

### Speed and engines

- **Cohere Transcribe at 9-11× realtime — on your CPU.** Quantized to 1.5 GB (q4f16). Punctuation, capitalization, and inverse text normalization out of the box. Sits at #1 on the Open ASR Leaderboard. *(New in 0.7.0)*
- **Parakeet on AMD and NVIDIA GPUs.** MIGraphX 7.2 for Radeon, separate CUDA 12 and CUDA 13 binaries for every NVIDIA driver generation, Vulkan for Whisper across vendors. *(MIGraphX new in 0.7.0)*
- **Text processing built in.** Spoken punctuation (`"comma"` → `,`), per-user replacement tables for common mistranscriptions, and an optional post-processing pipe through any LLM or shell script. Fix domain terms, drop filler words, polish grammar — all without leaving voxtype.
- **Dynamic per-engine model loading.** Configure all 7 engines, pay memory only for the active one. Models load on first use and unload when idle.
- **Seven transcription engines.** Whisper, Parakeet, Moonshine, SenseVoice, Paraformer, Dolphin, Omnilingual. Switch with `voxtype configure` or one config line. CJK and 1600+ languages covered by the multilingual engines.
- **Meeting mode.** Continuous transcription with chunked processing, speaker attribution, and export to Markdown, JSON, SRT, or VTT.

### Native Linux integration

- **Hyprland, Niri, Sway, River, GNOME, KDE.** Compositor keybindings everywhere, evdev fallback for X11, Wayland-first typing via wtype with full CJK support. Falls back through dotool → ydotool → clipboard if any layer is unavailable.
- **Pauses your music.** Auto-pauses Spotify, Plasma media players, anything that speaks MPRIS the moment you start dictating. Resumes on release.
- **Floating waveform OSD.** Matches your swayosd band by default — same vertical position as volume and brightness — so the level meter sits where you already look for system feedback.
- **Interactive TUI configure.** `voxtype configure` (also surfaces in Walker / fuzzel / rofi) edits every option in `~/.config/voxtype/config.toml` for you — no hand-editing TOML. Auto-downloads missing models, swaps GPU binaries via pkexec, restarts the daemon when needed.
- **Push-to-talk or toggle.** Hold to record, or press once to start/stop. Optional audio cues when recording starts/stops.

### Trust

- **Local by default. No cloud. No subscription. No telemetry.** Optional remote Whisper servers when you want them. Your audio stays on your machine until you choose otherwise.
- **MIT licensed.** AUR (`voxtype`, `voxtype-bin`, plus `voxtype-bin-rc` for testers who want pre-release builds — see [docs/INSTALL.md](docs/INSTALL.md#arch-linux)), `.deb`, `.rpm`, Homebrew on macOS. Signed release binaries from a reproducible Docker pipeline.

## Quick Start

Most users should install a [pre-built package](docs/INSTALL.md). The steps below are for building from source.

```bash
# 1. Install build dependencies
# Fedora:
sudo dnf install rust cargo alsa-lib-devel clang-devel cmake pkgconf
# Arch:
sudo pacman -S rustup alsa-lib clang cmake pkgconf
# Debian/Ubuntu:
sudo apt install cargo libasound2-dev libclang-dev cmake pkg-config

# 2. Build
cargo build --release

# 3. Install typing backend (Wayland)
# Fedora:
sudo dnf install wtype
# Arch:
sudo pacman -S wtype
# Ubuntu:
sudo apt install wtype

# 4. Download whisper model
./target/release/voxtype setup --download

# 5. Add keybinding to your compositor
# See "Compositor Keybindings" section below

# 6. Run
./target/release/voxtype
```

For the full per-distro dependency matrix (including GPU backends), see [docs/INSTALL.md](docs/INSTALL.md#build-dependencies-source-builds-only).

### Compositor Keybindings

Voxtype works best with your compositor's native keybindings. Add these to your compositor config.

> **Not sure which compositor you have?** Run `echo $XDG_CURRENT_DESKTOP` in a terminal. Common values: `Hyprland`, `sway`, `river`, `KDE`, `GNOME`.

**Hyprland** (`~/.config/hypr/hyprland.conf`):
```
bind = SUPER, V, exec, voxtype record start
bindr = SUPER, V, exec, voxtype record stop
```

**Sway** (`~/.config/sway/config`):
```
bindsym --no-repeat $mod+v exec voxtype record start
bindsym --release $mod+v exec voxtype record stop
```

**River** (`~/.config/river/init`):
```bash
riverctl map normal Super V spawn 'voxtype record start'
riverctl map -release normal Super V spawn 'voxtype record stop'
```

**KDE Plasma (KWin):**

KDE does not support key-release events, so use toggle mode. Open **System Settings > Shortcuts > Custom Shortcuts**, create a new shortcut, and set the command to:
```
voxtype record toggle
```

Assign your preferred key combination (e.g., Meta+V). Since KDE handles the keybinding, the built-in hotkey should be disabled (see below).

Then disable the built-in hotkey in your config:
```toml
# ~/.config/voxtype/config.toml
[hotkey]
enabled = false
```

> **X11 / Built-in hotkey fallback:** If you're on X11 or prefer voxtype's built-in hotkey (ScrollLock by default), add yourself to the `input` group: `sudo usermod -aG input $USER` and log out/in. See the [User Manual](docs/USER_MANUAL.md) for details.

> **Omarchy / Multi-modifier keybindings:** If using keybindings with multiple modifiers (e.g., `SUPER+CTRL+X`), releasing keys slowly can cause typed text to trigger window manager shortcuts instead of inserting text. See [Modifier Key Interference](docs/TROUBLESHOOTING.md#modifier-key-interference-hyprlandsway) in the troubleshooting guide for the solution using output hooks and Hyprland submaps.

## Usage

1. Run `voxtype` (it runs as a foreground daemon)
2. Hold **ScrollLock** (or your configured hotkey)
3. Speak
4. Release the key
5. Text appears at your cursor (or in clipboard if typing isn't available)

Press Ctrl+C to stop the daemon.

### Toggle Mode

If you prefer to press once to start recording and again to stop (instead of holding):

```bash
# Via command line
voxtype --toggle

# Or in config.toml
[hotkey]
key = "SCROLLLOCK"
mode = "toggle"
```

### Meeting Mode

For longer recordings like meetings and interviews, meeting mode provides continuous transcription with automatic chunking, speaker attribution, and export.

```bash
# Start a meeting
voxtype meeting start --title "Weekly standup"

# Check status
voxtype meeting status

# Stop and export
voxtype meeting stop
voxtype meeting export latest --format markdown --speakers --timestamps
```

Meetings are stored locally and can be exported to Markdown, plain text, JSON, SRT, or VTT. Use `voxtype meeting list` to see past meetings, and `voxtype meeting summarize latest` to generate an AI summary via Ollama.

## Configuration

Config file location: `~/.config/voxtype/config.toml`

See [`config/default.toml`](config/default.toml) for the full annotated default configuration.

```toml
# State file for Waybar/polybar integration (enabled by default)
state_file = "auto"  # Or custom path, or "disabled" to turn off

[hotkey]
key = "SCROLLLOCK"  # Or: PAUSE, F13-F24, RIGHTALT, etc.
modifiers = []      # Optional: ["LEFTCTRL", "LEFTALT"]
# mode = "toggle"   # Uncomment for toggle mode (press to start/stop)

[audio]
device = "default"  # Or specific device from `pactl list sources short`
sample_rate = 16000
max_duration_secs = 60

# Audio feedback (sound cues when recording starts/stops)
# [audio.feedback]
# enabled = true
# theme = "default"   # "default", "subtle", "mechanical", or path to custom dir
# volume = 0.7        # 0.0 to 1.0

[whisper]
model = "base.en"   # tiny, base, small, medium, large-v3, large-v3-turbo
language = "en"     # Or "auto" for detection, or language code (es, fr, de, etc.)
translate = false   # Translate non-English speech to English
# threads = 4       # CPU threads for inference (omit for auto-detect)
# on_demand_loading = true  # Load model only when recording (saves memory)

[output]
mode = "type"       # "type", "clipboard", or "paste"
fallback_to_clipboard = true
type_delay_ms = 0   # Increase if characters are dropped
# auto_submit = true  # Send Enter after transcription (for chat apps, terminals)
# Note: "paste" mode copies to clipboard then simulates Ctrl+V
#       Useful for non-US keyboard layouts where ydotool typing fails

[output.notification]
on_recording_start = false  # Notify when PTT activates
on_recording_stop = false   # Notify when transcribing
on_transcription = true     # Show transcribed text

# Text processing (word replacements, spoken punctuation)
# [text]
# spoken_punctuation = true  # Say "period" → ".", "open paren" → "("
# replacements = { "vox type" = "voxtype", "oh marky" = "Omarchy" }
```

### Audio Feedback

Enable audio feedback to hear a sound when recording starts and stops:

```toml
[audio.feedback]
enabled = true
theme = "default"  # Built-in themes: default, subtle, mechanical
volume = 0.7       # 0.0 to 1.0
```

**Built-in themes:**
- `default` - Clear, pleasant two-tone beeps
- `subtle` - Quiet, unobtrusive clicks
- `mechanical` - Typewriter/keyboard-like sounds

**Custom themes:** Point `theme` to a directory containing `start.wav`, `stop.wav`, and `error.wav` files.

### Text Processing

Voxtype can post-process transcribed text with word replacements and spoken punctuation.

**Word replacements** fix commonly misheard words:

```toml
[text]
replacements = { "vox type" = "voxtype", "oh marky" = "Omarchy" }
```

**Spoken punctuation** (opt-in) converts spoken words to symbols - useful for developers:

```toml
[text]
spoken_punctuation = true
```

With this enabled, saying "function open paren close paren" outputs `function()`. Supports period, comma, brackets, braces, newlines, and many more. See [CONFIGURATION.md](docs/CONFIGURATION.md#text) for the full list.

### Post-Processing Command (Advanced)

For advanced cleanup, you can pipe transcriptions through an external command
like a local LLM for grammar correction, filler word removal, or text formatting:

```toml
[output.post_process]
command = "ollama run llama3.2:1b 'Clean up this dictation. Fix grammar, remove filler words:'"
timeout_ms = 30000  # 30 second timeout for LLM
```

The command receives text on stdin and outputs cleaned text on stdout. On any
failure (timeout, error), Voxtype gracefully falls back to the original transcription.

See [CONFIGURATION.md](docs/CONFIGURATION.md#outputpost_process) for more examples including scripts for LM Studio, Ollama, and llama.cpp.

## CLI Options

```
voxtype [OPTIONS] [COMMAND]

Commands:
  daemon      Run as background daemon (default)
  transcribe  Transcribe an audio file
  setup       Setup and installation utilities
  config      Show current configuration
  status      Show daemon status (for Waybar/polybar integration)
  record      Control recording from external sources (compositor keybindings, scripts)
  meeting     Meeting transcription (start, stop, export, summarize)

Setup subcommands:
  voxtype setup              Run basic dependency checks (default)
  voxtype setup --download   Download the configured Whisper model
  voxtype setup systemd      Install/manage systemd user service
  voxtype setup waybar       Generate Waybar module configuration
  voxtype setup model        Interactive model selection and download
  voxtype setup gpu          Manage GPU acceleration (switch CPU/Vulkan)
  voxtype setup onnx         Switch between Whisper and ONNX engines

Status options:
  voxtype status --format json       Output as JSON (for Waybar)
  voxtype status --follow            Continuously output on state changes
  voxtype status --extended          Include model, device, backend in JSON
  voxtype status --icon-theme THEME  Icon theme (emoji, nerd-font, material, etc.)

Record subcommands (for compositor keybindings):
  voxtype record start                     Start recording (send SIGUSR1 to daemon)
  voxtype record start --output-file PATH  Write transcription to a file
  voxtype record stop                      Stop recording and transcribe (send SIGUSR2 to daemon)
  voxtype record toggle                    Toggle recording state

Options:
  -c, --config <FILE>    Path to config file
  -v, --verbose          Increase verbosity (-v, -vv)
  -q, --quiet            Quiet mode (errors only)
  --clipboard            Force clipboard mode
  --paste                Force paste mode (clipboard + Ctrl+V)
  --model <MODEL>        Override transcription model
  --engine <ENGINE>      Override transcription engine (whisper, parakeet, moonshine, sensevoice, paraformer, dolphin, omnilingual)
  --hotkey <KEY>         Override hotkey
  --toggle               Use toggle mode (press to start/stop)
  -h, --help             Print help
  -V, --version          Print version
```

## Whisper Models

| Model | Size | English WER | Speed |
|-------|------|-------------|-------|
| tiny.en | 39 MB | ~10% | Fastest |
| base.en | 142 MB | ~8% | Fast |
| small.en | 466 MB | ~6% | Medium |
| medium.en | 1.5 GB | ~5% | Slow |
| large-v3 | 3 GB | ~4% | Slowest |
| large-v3-turbo | 1.6 GB | ~4% | Fast |

For most uses, `base.en` provides a good balance of speed and accuracy. If you have a GPU, `large-v3-turbo` offers excellent accuracy with fast inference.

### Multilingual Support

The `.en` models are English-only but faster and more accurate for English. For other languages, use `large-v3` which supports 99 languages.

**Use Case 1: Transcribe in the spoken language** (speak French, output French)
```toml
[whisper]
model = "large-v3"
language = "auto"     # Auto-detect and transcribe in that language
translate = false
```

**Use Case 2: Translate to English** (speak French, output English)
```toml
[whisper]
model = "large-v3"
language = "auto"     # Auto-detect the spoken language
translate = true      # Translate output to English
```

**Use Case 3: Force a specific language** (always transcribe as Spanish)
```toml
[whisper]
model = "large-v3"
language = "es"       # Force Spanish transcription
translate = false
```

With GPU acceleration, `large-v3` achieves sub-second inference while supporting all languages.

## Supported Engines

Voxtype ships separate binaries for Whisper and ONNX engines. Use `voxtype setup onnx --enable` to switch to the ONNX binary, or `--disable` to switch back.

| Engine | Languages | Architecture | Best For |
|--------|-----------|-------------|----------|
| **Whisper** (default) | 99 languages | Encoder-decoder (whisper.cpp) | General use, multilingual |
| **Parakeet** | English | FastConformer TDT (ONNX) | Fast English transcription |
| **Moonshine** | English | Encoder-decoder (ONNX) | Edge devices, low memory |
| **SenseVoice** | zh, en, ja, ko, yue | CTC encoder (ONNX) | Chinese, Japanese, Korean |
| **Paraformer** | zh+en, zh+yue+en | Non-autoregressive (ONNX) | Chinese-English bilingual |
| **Dolphin** | 40 languages + 22 Chinese dialects | CTC E-Branchformer (ONNX) | Eastern languages (no English) |
| **Omnilingual** | 1600+ languages | wav2vec2 CTC (ONNX) | Low-resource and rare languages |

To set the engine in your config:

```toml
engine = "sensevoice"  # or: whisper, parakeet, moonshine, paraformer, dolphin, omnilingual
```

Or override on the command line:

```bash
voxtype --engine sensevoice
```

## GPU Acceleration

Voxtype supports optional GPU acceleration for significantly faster inference. With GPU acceleration, even the `large-v3` model can achieve sub-second inference times.

### Vulkan (AMD, NVIDIA, Intel)

Packages include a Vulkan binary. To enable GPU acceleration:

```bash
# Install Vulkan runtime (if not already installed)
# Arch:
sudo pacman -S vulkan-icd-loader

# Ubuntu/Debian:
sudo apt install libvulkan1

# Fedora:
sudo dnf install vulkan-loader

# Enable GPU acceleration
sudo voxtype setup gpu --enable

# Check status
voxtype setup gpu
```

To switch back to CPU: `sudo voxtype setup gpu --disable`

### Building from Source (CUDA, Metal, ROCm)

For other GPU backends, build from source with the appropriate feature flag:

**CUDA (NVIDIA)**
```bash
# Install CUDA toolkit first, then:
cargo build --release --features gpu-cuda
```

**Metal (macOS/Apple Silicon)**
```bash
cargo build --release --features gpu-metal
```

**HIP/ROCm (AMD alternative)**
```bash
cargo build --release --features gpu-hipblas
```

### Performance Comparison

Results vary by hardware. Example on AMD RX 6800:

| Model | CPU | Vulkan GPU |
|-------|-----|------------|
| base.en | ~7x realtime | ~35x realtime |
| large-v3 | ~1x realtime | ~5x realtime |

## Requirements

### System Requirements

- **Linux** with glibc 2.38+ (Ubuntu 24.04+, Fedora 39+, Arch, Debian Trixie+)
- **Wayland or X11** desktop (GNOME, KDE, Sway, Hyprland, River, i3, etc.)

### Runtime Dependencies

- **PipeWire** or **PulseAudio** (for audio capture)
- **wtype** (for typing output on Wayland) - *recommended, best CJK/Unicode support*
- **dotool** - *for non-US keyboard layouts (German, French, etc.) - supports XKB layouts*
- **ydotool** + daemon - *for X11 or as Wayland fallback*
- **wl-clipboard** (for clipboard fallback on Wayland)

### Permissions

- **Wayland compositors:** No special permissions needed when using compositor keybindings
- **Built-in hotkey / X11:** User must be in the `input` group (for evdev access)

### Installing Dependencies

**Fedora:**
```bash
sudo dnf install wtype wl-clipboard
```

**Ubuntu/Debian:**
```bash
sudo apt install wtype wl-clipboard
```

**Arch:**
```bash
sudo pacman -S wtype wl-clipboard
```

## Building from Source

```bash
# Install Rust if needed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install build dependencies
# Fedora:
sudo dnf install alsa-lib-devel

# Ubuntu:
sudo apt install libasound2-dev

# Build (Whisper engine only)
cargo build --release

# Build with ONNX engines (Parakeet, Moonshine, SenseVoice, etc.)
cargo build --release --features parakeet,moonshine,sensevoice,paraformer,dolphin

# Or just the engine you need
cargo build --release --features parakeet

# Binary is at: target/release/voxtype
```

ONNX engines require the corresponding Cargo feature at build time. Without it, setting
`engine = "parakeet"` in your config will fail with an error. The prebuilt release binaries
(`-onnx-avx2`, `-onnx-cuda`, etc.) include all ONNX engines.

## AppImage (Universal)

AppImage works on any Linux distribution without installation:

```bash
# Download the appropriate AppImage from the GitHub release
chmod +x voxtype-*-x86_64.AppImage

# Move to a permanent location
mv voxtype-*-x86_64.AppImage ~/.local/bin/voxtype

# Run setup (downloads model, configures service)
~/.local/bin/voxtype setup
```

Available AppImage variants:
- `voxtype-{ver}-x86_64.AppImage` - Whisper engine with CPU and Vulkan GPU support (recommended)
- `voxtype-{ver}-onnx-x86_64.AppImage` - ONNX engines (Parakeet, Moonshine, etc.) + Vulkan Whisper
- `voxtype-{ver}-onnx-cuda-x86_64.AppImage` - ONNX engines with NVIDIA CUDA + Vulkan Whisper

Each ONNX AppImage also includes the Vulkan Whisper binary, so you can switch between
engines via `engine = "whisper"` or `engine = "parakeet"` in your config without changing
AppImages. For GPU-accelerated Whisper in the Whisper-only AppImage, set `VOXTYPE_GPU=1`.

## Waybar Integration

Add to your Waybar config:

```json
"custom/voxtype": {
    "exec": "voxtype status --follow --format json",
    "return-type": "json",
    "format": "{}",
    "tooltip": true
}
```

The state file is enabled by default (`state_file = "auto"`). If you've disabled it, re-enable it:

```toml
state_file = "auto"
```

### Extended Status Info

Use `--extended` to include model, device, and backend in the JSON output:

```bash
voxtype status --format json --extended
```

Output:
```json
{
  "text": "🎙️",
  "class": "idle",
  "tooltip": "Voxtype ready\nModel: base.en\nDevice: default\nBackend: CPU (AVX-512)",
  "model": "base.en",
  "device": "default",
  "backend": "CPU (AVX-512)"
}
```

Waybar config with model display:
```json
"custom/voxtype": {
    "exec": "voxtype status --follow --format json --extended",
    "return-type": "json",
    "format": "{} [{}]",
    "format-alt": "{model}",
    "tooltip": true
}
```

## Troubleshooting

### "Cannot open input device" error

This only affects the built-in evdev hotkey. You have two options:

**Option 1: Use compositor keybindings (recommended)**
Configure your compositor to call `voxtype record start/stop` and disable the built-in hotkey. See "Compositor Keybindings" above.

**Option 2: Add yourself to the input group**
```bash
sudo usermod -aG input $USER
# Log out and back in
```

### Text not appearing / typing not working

Voxtype uses wtype (preferred), dotool, or ydotool for typing output:

```bash
# Check available typing backends
which wtype dotool ydotool

# For non-US keyboard layouts, install dotool and configure:
# In ~/.config/voxtype/config.toml:
# [output]
# dotool_xkb_layout = "de"  # Your layout (de, fr, es, etc.)

# If using ydotool fallback (X11/TTY), start the daemon:
systemctl --user start ydotool
systemctl --user enable ydotool  # Start on login
```

**KDE Plasma / GNOME users:** wtype does not work on these desktops. Voxtype automatically falls back to dotool (recommended for non-US layouts) or ydotool. See [Troubleshooting](docs/TROUBLESHOOTING.md#wtype-not-working-on-kde-plasma-or-gnome-wayland) for setup instructions.

### No audio captured

Check your default audio input:

```bash
# List audio sources
pactl list sources short

# Test recording
arecord -d 3 -f S16_LE -r 16000 test.wav
aplay test.wav
```

### Text appears slowly

If characters are being dropped, increase the delay:

```toml
[output]
type_delay_ms = 10
```

## Architecture

```mermaid
flowchart LR
    subgraph Input
        Hotkey["Hotkey<br/>(compositor/evdev)"] --> Audio["Audio<br/>(cpal)"]
    end
    subgraph Transcription
        Audio --> Engine{Engine?}
        Engine -->|whisper| WhisperBackend{Backend?}
        Engine -->|onnx| ONNX["ONNX Engine<br/>(Parakeet, Moonshine,<br/>SenseVoice, Paraformer,<br/>Dolphin, Omnilingual)"]
        WhisperBackend -->|local| Whisper["Whisper<br/>(whisper-rs)"]
        WhisperBackend -->|cli| CLI["whisper-cli<br/>(subprocess)"]
        WhisperBackend -->|remote| Remote["Remote Server<br/>(HTTP API)"]
    end
    subgraph Output
        Whisper --> PostProcess["Post-Process<br/>(optional)"]
        CLI --> PostProcess
        Remote --> PostProcess
        ONNX --> PostProcess
        PostProcess --> PreHook["Pre-Output Hook"]
        PreHook --> TextOutput["Output<br/>(wtype/dotool/ydotool)"]
        TextOutput --> PostHook["Post-Output Hook"]
        PreHook -.-> Compositor["Compositor<br/>(submap/mode)"]
        PostHook -.-> Compositor
    end
```

**Multiple transcription engines.** Voxtype supports 7 transcription engines across two runtime backends:
- **Whisper** (default): OpenAI's Whisper model via whisper.cpp. Supports local in-process, CLI subprocess, and remote HTTP backends. 99 languages.
- **ONNX engines** (via ONNX Runtime): Parakeet (English), Moonshine (English), SenseVoice (zh/en/ja/ko/yue), Paraformer (zh+en bilingual), Dolphin (40 languages + Chinese dialects, no English), Omnilingual (1600+ languages). Switch engines with `voxtype setup onnx`.

**Why compositor keybindings?** Wayland compositors like Hyprland, Sway, and River support key-release events, enabling push-to-talk without special permissions. Voxtype's `record start/stop` commands integrate directly with your compositor's keybinding system.

**Fallback: evdev hotkey.** For X11 or compositors without key-release support, voxtype includes a built-in hotkey using evdev (the Linux input subsystem). This requires the user to be in the `input` group.

**Why wtype + dotool + ydotool?** On Wayland, wtype uses the virtual-keyboard protocol for text input, with excellent Unicode/CJK support and no daemon required. When wtype fails (KDE/GNOME), dotool provides keyboard layout support via XKB for non-US layouts. As a final fallback, ydotool uses uinput for text injection on X11/TTY. This combination ensures Voxtype works on any Linux desktop with proper keyboard layout support.

**Post-processing.** Transcriptions can optionally be piped through an external command before output. Use this to integrate local LLMs (Ollama, llama.cpp) for grammar correction, text expansion, or domain-specific vocabulary. Any command that reads stdin and writes stdout works.

## Feedback

We want to hear from you! Voxtype is a young project and your feedback helps make it better.

- **Something not working?** If Voxtype doesn't install cleanly, doesn't work on your system, or is buggy in any way, please [open an issue](https://github.com/peteonrails/voxtype/issues). I actively monitor and respond to issues.
- **Like Voxtype?** I don't accept donations, but if you find it useful:
  - A [GitHub star](https://github.com/peteonrails/voxtype) helps others discover the project
  - Arch users: a vote on the [AUR package](https://aur.archlinux.org/packages/voxtype) helps keep it maintained

## Contributors

- [Peter Jackson](https://github.com/peteonrails) - Creator and maintainer
- [jvantillo](https://github.com/jvantillo) - GPU acceleration patch, whisper-rs 0.15.1 compatibility
- [materemias](https://github.com/materemias) - Paste output mode, on-demand model loading, single-instance safeguard, meeting mode post-processing, PKGBUILD fix
- [Dan Heuckeroth](https://github.com/danheuck) - NixOS Home Manager module design
- [Kevin Miller](https://github.com/digunix) - NixOS module enhancements, ROCm support
- [reisset](https://github.com/reisset) - Testing and feedback on post-processing feature
- [Goodroot](https://github.com/goodroot) - Testing, feedback, and documentation updates
- [robzolkos](https://github.com/robzolkos) - Auto-submit feature for AI agent workflows
- [konnsim](https://github.com/konnsim) - Modifier key interference bug report
- [IgorWarzocha](https://github.com/IgorWarzocha) - Hyprland submap solution for modifier key fix
- [Zubair](https://github.com/mzubair481) - dotool output driver with keyboard layout support
- [ayoahha](https://github.com/ayoahha) - CLI backend for whisper-cli subprocess transcription
- [Loki Coyote](https://github.com/lokkju) - eitype output driver for KDE/GNOME support, media keys and numeric keycode hotkey support
- [Christopher Albert](https://github.com/krystophny) - macOS port foundation, CoreAudio capture, CGEvent output, Homebrew packaging
- [Umesh](https://github.com/radiorambo) - Documentation website
- [Sami Jawhar](https://github.com/sjawhar) - Eager input processing wiring
- [KaiStarkk](https://github.com/KaiStarkk) - Post-process trim and fallback_on_empty options
- [graysky](https://github.com/graysky2) - Flash attention config fix

## License

MIT