Handy: An Open-Source Offline Speech-to-Text Application
Handy is a cross-platform desktop application for simple, privacy-focused speech transcription. Built with Tauri (Rust for the backend, React/TypeScript for the frontend), it lets users dictate into text fields in any application using a keyboard shortcut, with all processing done locally on the user's device. No cloud service is involved, so audio stays private and the app works offline.
Core Functionality
The workflow is straightforward:
- Activation: Press a configurable global keyboard shortcut to start recording, or hold it down in push-to-talk mode.
- Recording: Speak while recording is active; silence is filtered out by Voice Activity Detection (VAD) powered by Silero.
- Processing: When recording stops (on key release in push-to-talk mode), Handy transcribes the audio with the selected local model.
- Output: The transcribed text is pasted directly into the active text field via clipboard or simulated keystrokes.
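The four steps above can be sketched as a small state machine. This is a minimal illustration, not Handy's actual code: `PushToTalkSession`, `is_speech`, and the 0.02 threshold are hypothetical names and values, and the energy gate is only a crude stand-in for the Silero model that Handy uses for VAD.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one short chunk of mono samples in [-1.0, 1.0]


def is_speech(frame: Frame, threshold: float = 0.02) -> bool:
    """Crude energy gate (assumption: a stand-in for a real VAD such as Silero)."""
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms >= threshold


class PushToTalkSession:
    """Hypothetical model of the press -> speak -> release -> paste cycle."""

    def __init__(self, transcribe: Callable[[List[float]], str]) -> None:
        self._transcribe = transcribe
        self._recording = False
        self._samples: List[float] = []

    def press(self) -> None:
        # Activation: start a fresh recording.
        self._recording = True
        self._samples = []

    def feed(self, frame: Frame) -> None:
        # Recording: keep only the frames the gate classifies as speech.
        if self._recording and is_speech(frame):
            self._samples.extend(frame)

    def release(self) -> str:
        # Processing: hand the retained audio to the transcriber and reset.
        # The caller would then paste the returned text (Output).
        self._recording = False
        samples, self._samples = self._samples, []
        return self._transcribe(samples) if samples else ""
```

A caller would invoke `press()` on key-down, `feed()` for each captured audio frame, and `release()` on key-up, then paste the returned string into the active field.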
Handy supports multiple transcription engines:
- Whisper Models: Options include Small, Medium, Turbo, and Large variants from OpenAI's Whisper, with GPU acceleration on compatible hardware (NVIDIA, AMD, Intel on Windows/Linux; Apple Silicon on macOS).
- Parakeet V3: A CPU-optimized model with excellent performance, automatic language detection, and no GPU requirement, achieving up to 5x real-time speed on mid-range CPUs.
Because processing is local, there are no network round trips and no audio leaves the device, which suits users in sensitive environments or with unreliable internet.
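As a worked reading of the "up to 5x real-time" figure quoted for Parakeet V3: a speed factor of 5 means a clip is transcribed in one fifth of its duration. The helper names below are illustrative only, and actual throughput depends on the CPU and the audio.

```python
def processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Time needed to transcribe a clip at a given multiple of real time."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


def measured_speed(audio_seconds: float, elapsed_seconds: float) -> float:
    """Speed factor observed for a clip, as a multiple of real time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return audio_seconds / elapsed_seconds


# A 30-second dictation at 5x real time takes 30 / 5 = 6 seconds to process.
print(processing_seconds(30.0, 5.0))
```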
Why Choose Handy?
Developed to address the lack of truly open-source, forkable speech-to-text tools, Handy emphasizes:
- Free and Accessible: No paywalls; designed for broad adoption, including accessibility needs.
- Open Source: Licensed under MIT, encouraging community contributions and custom extensions.
- Privacy-First: All audio stays on-device; nothing is sent to a server for transcription.
- Simplicity: Focused solely on transcription-to-text pasting, avoiding feature bloat.
- Extensibility: A modular architecture lets developers replace or modify components, for example by adding new models or integrations.
Unlike proprietary tools such as Dragon NaturallySpeaking or cloud-based services (e.g., Google Voice Typing), Handy prioritizes user control and offline reliability. It aims to be not the most accurate tool out of the box, but the most adaptable one for personal or community customization.
Technical Architecture
Handy's stack includes:
- Frontend: React with TypeScript and Tailwind CSS for the settings interface.
- Backend: Rust for audio I/O (via CPAL), ML inference (whisper-rs and transcription-rs), VAD (vad-rs), and system events (rdev for shortcuts).
- Additional Libraries: Rubato for audio resampling, and support for tools like wtype or dotool on Linux for Wayland compatibility.
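To illustrate what the resampling step does: Whisper models expect 16 kHz mono audio, while microphones often capture at a higher rate such as 48 kHz (an assumed, typical value). The sketch below uses plain linear interpolation, which is a simpler technique than the band-limited resampling a dedicated library like Rubato provides. `resample_linear` is a hypothetical helper and is not part of Handy.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample mono audio by linear interpolation (no anti-aliasing filter)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out: List[float] = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighbouring input samples around the target position.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Downsampling 48 kHz to 16 kHz keeps roughly every third sample.
print(resample_linear([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], 48_000, 16_000))
```

Without a low-pass filter, downsampling this way can alias high frequencies into the speech band, which is one reason to prefer a proper resampling library in practice.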
Handy targets x64 on Windows and Linux, and both Intel and Apple Silicon on macOS. Modern hardware is recommended: an Intel Skylake or newer CPU for Parakeet, and a GPU for Whisper. Debug mode (Ctrl/Cmd + Shift + D) aids troubleshooting, including log generation.
Installation and Setup
Download Handy from GitHub Releases or handy.computer. After installing:
- Grant microphone and accessibility permissions.
- Configure shortcuts in Settings.
- Select a model; models download automatically, or can be installed manually on restricted networks.
Manual model installation involves downloading .bin files for Whisper or .tar.gz for Parakeet into the app data directory (e.g., ~/Library/Application Support/com.pais.handy/models on macOS).
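After a manual install, a quick check of what landed in the models folder can help. The sketch below sorts file names by the two extensions mentioned above. The function names and the example file names are hypothetical, only the macOS path comes from the text, and the folder location on other platforms is not assumed here.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def classify_model_names(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group file names by the extensions described for each engine."""
    found: Dict[str, List[str]] = {"whisper": [], "parakeet": [], "other": []}
    for name in sorted(names):
        if name.endswith(".bin"):
            found["whisper"].append(name)
        elif name.endswith(".tar.gz"):
            found["parakeet"].append(name)
        else:
            found["other"].append(name)
    return found


def scan_models_dir(models_dir: Path) -> Dict[str, List[str]]:
    """Classify the regular files in a models folder; empty if it is missing."""
    if not models_dir.is_dir():
        return classify_model_names([])
    return classify_model_names(p.name for p in models_dir.iterdir() if p.is_file())


def macos_models_dir(home: Optional[Path] = None) -> Path:
    """macOS models folder as quoted in the text."""
    base = home if home is not None else Path.home()
    return base / "Library" / "Application Support" / "com.pais.handy" / "models"


if __name__ == "__main__":
    print(scan_models_dir(macos_models_dir()))
```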
Platform-Specific Notes
- macOS: Full support for Intel and Apple Silicon; upcoming Globe key integration.
- Windows: x64 only; GPU acceleration works well with NVIDIA/AMD.
- Linux: x64; Ubuntu 22.04/24.04 recommended. Wayland support has limitations: use clipboard pasting, or install wtype or dotool for better compatibility. The recording overlay is disabled by default to avoid focus issues. In window managers such as Sway, recording can be toggled by sending a signal (e.g., SIGUSR2) to the Handy process.
Known issues include occasional Whisper crashes on certain system configurations and incomplete Wayland support; fixes for both are in active development.
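The signal-based toggle mentioned for Linux can be illustrated with a small, Unix-only sketch of the mechanism. It is not Handy's implementation: `RecordingToggle`, `install_toggle`, and `send_toggle` are hypothetical names, and the example sends the signal to its own process purely for demonstration.

```python
import os
import signal


class RecordingToggle:
    """Flips a recording flag each time the signal arrives."""

    def __init__(self) -> None:
        self.recording = False

    def handle(self, signum, frame) -> None:
        # Signal handler: toggle between recording and idle.
        self.recording = not self.recording


def install_toggle(toggle: RecordingToggle) -> None:
    # Must be called from the main thread of the receiving process.
    signal.signal(signal.SIGUSR2, toggle.handle)


def send_toggle(pid: int) -> None:
    # What a window-manager keybinding would ultimately do: signal the process.
    os.kill(pid, signal.SIGUSR2)


if __name__ == "__main__":
    toggle = RecordingToggle()
    install_toggle(toggle)
    send_toggle(os.getpid())  # demonstration only: signal ourselves
    print("recording:", toggle.recording)
```

In practice the sender and receiver are separate processes, and the keybinding runs a command that delivers the signal to the running application.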
Roadmap and Community
Ongoing work includes debug logging, macOS shortcut enhancements, opt-in (privacy-respecting) analytics, and settings refactoring. Contributions are welcome: check the existing issues first, then open a pull request. Join the Discord for discussions.
Sponsors like Wordcab and Epicenter support development. Related projects: Handy CLI. Acknowledgments go to OpenAI (Whisper), whisper.cpp/ggml, Silero VAD, and Tauri.
In summary, Handy fills the gap for offline, open-source dictation: users can tailor it to their own needs, and the community can extend it.
