Everything you need to get started.
From installation to advanced configuration — this guide covers every feature in Kalam. New here? Start with the Quick Start below.
Quick Start
Get from zero to dictating in under a minute. Kalam runs on Windows, macOS, and Linux.
Download & install
Grab the installer for your OS from the download page. No account or sign-up required.
Grant permissions
Kalam needs microphone access to hear you and accessibility / input permissions to type into other apps. The onboarding wizard walks you through each one.
Hold, speak, release
Press and hold your dictation hotkey (default: Ctrl + Win on Windows, Ctrl + Super on Linux/macOS), speak naturally, then release. Your words appear wherever your cursor is — any app, any text field.
Installation
Platform-specific guides for installing Kalam. All installers are available on the download page.
Windows 10+
Download the .exe (NSIS) installer from the download page. Run the installer and follow the prompts.
Permissions needed
- Microphone — Windows Settings → Privacy → Microphone. Ensure "Allow apps to access your microphone" is on.
- Accessibility — No extra setup on Windows. Kalam uses standard input APIs.
macOS 11+
Download the .pkg or .dmg file. For .pkg, double-click to install. For .dmg, drag Kalam to Applications.
Permissions needed
- Microphone — System Settings → Privacy & Security → Microphone. Toggle Kalam on.
- Accessibility — System Settings → Privacy & Security → Accessibility. Add Kalam to the list.
- Input Monitoring — System Settings → Privacy & Security → Input Monitoring. This allows Kalam to detect your global hotkey.
Linux (Ubuntu, Debian, and others)
Download the .AppImage or .deb package.
# AppImage
chmod +x Kalam*.AppImage
./Kalam*.AppImage
# Debian / Ubuntu
sudo dpkg -i kalam*.deb
Permissions needed
- Microphone — Usually granted automatically. Check PulseAudio/PipeWire settings if audio isn't detected.
- Accessibility — On X11, Kalam uses xdotool. On Wayland, some compositors may require XDG portal configuration.
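To see which of these two cases applies on a given machine, a quick check along the following lines may help. This is only a sketch and not part of Kalam: it assumes your desktop session sets the `XDG_SESSION_TYPE` variable (most do), and `describe_session` is a name made up for this example.

```shell
# Hypothetical helper (not shipped with Kalam): report the display server
# type and, on X11, whether xdotool is available on PATH.
describe_session() {
  session="${XDG_SESSION_TYPE:-unknown}"
  case "$session" in
    x11)
      if command -v xdotool >/dev/null 2>&1; then
        echo "x11: xdotool found"
      else
        echo "x11: xdotool missing"
      fi
      ;;
    wayland)
      echo "wayland: check your compositor's XDG portal setup"
      ;;
    *)
      echo "unknown session type: $session"
      ;;
  esac
}

describe_session
```

If the output reports that xdotool is missing, your distribution's package manager can usually install it.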
Your First Dictation
Once installed, here's what a typical dictation flow looks like.
- Click into any text field in any application — a browser, Word, Slack, VS Code, anything.
- Press and hold your dictation hotkey (default: Ctrl + Win on Windows, Ctrl + Super on Linux/macOS).
- A small floating pill appears on screen with a waveform animation — this means Kalam is listening.
- Speak naturally. Kalam captures your audio in real time.
- Release the key. Your transcribed text is typed into the active field.
Speech-to-Text Modes
Kalam supports four transcription modes. Choose the one that fits your workflow and privacy needs.
Local (Offline)
Uses a Whisper model running on your machine. Audio never leaves your device. No internet required.
Cloud
Sends audio to your configured provider (Groq or OpenAI) for faster, more accurate transcription. Requires an API key.
Hybrid
Uses cloud by default but switches to local processing when a sensitive app is detected. Similar to Auto with broader sensitivity coverage.
Auto
Uses cloud normally but automatically switches to local when a sensitive app is detected (banking, password managers, etc.).
Choosing the right mode
| Priority | Recommended Mode | Why |
|---|---|---|
| Maximum privacy | Local | Audio never leaves your device |
| Speed & accuracy | Cloud | Near-instant results with Groq |
| Privacy + speed | Auto | Cloud speed with automatic local fallback for sensitive apps |
| Balanced | Hybrid | Cloud by default, local for sensitive apps |
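The routing rule behind these four modes can be sketched as a small function. This illustrates the behavior described above and is not Kalam's actual code: the function name and the `yes`/`no` argument are invented for the example, and it assumes that plain Cloud mode does not switch engines for sensitive apps.

```shell
# Illustrative sketch only; names are hypothetical, not Kalam internals.
# usage: pick_engine <local|cloud|hybrid|auto> <sensitive app focused: yes|no>
pick_engine() {
  mode=$1
  sensitive=$2
  case "$mode" in
    local) echo "local" ;;
    cloud) echo "cloud" ;;
    hybrid|auto)
      # Both modes fall back to local transcription for sensitive apps.
      if [ "$sensitive" = "yes" ]; then
        echo "local"
      else
        echo "cloud"
      fi
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}

pick_engine auto yes    # prints "local"
pick_engine cloud yes   # prints "cloud"
```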
Hotkeys & Controls
Kalam supports two dictation modes, hold-to-dictate and toggle, which can also be enabled together, plus customizable hotkeys. Configure everything in Settings → General.
Dictation modes
| Mode | How it works | Best for |
|---|---|---|
| Hold-to-dictate | Press and hold your hotkey. Speak. Release to stop and insert text. | Quick phrases, short messages |
| Toggle | Press once to start dictating, press again to stop. | Longer dictation sessions, hands-free use |
| Both | Registers both a hold hotkey and a separate toggle hotkey. Use either depending on the situation. | Flexibility — the default setting |
Default hotkeys
| Action | Default Key | Notes |
|---|---|---|
| Dictate (hold) | Ctrl + Win | Hold to record, release to transcribe. On macOS/Linux: Ctrl + Super |
| Dictate (toggle) | Not set by default | Assign a separate toggle key in Settings → General |
| Command mode | Not set by default | Enable and assign a key in Settings → Command Mode |
| Language toggle | Not set by default | Switch between configured languages mid-session |
Overview Dashboard
The Overview is your home screen — a snapshot of your dictation activity and productivity stats.
The dashboard shows:
- 7-day word chart — daily dictation volume with estimated time saved vs. typing at 40 WPM.
- Total words — lifetime word count across all dictations.
- Time saved — estimated hours saved by dictating instead of typing.
- Top destinations — which apps you dictate into most often.
- Recent dictations — quick access to your latest transcriptions.
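As a rough illustration of the time-saved estimate, the sketch below compares how long the same words would take to type at 40 WPM with how long the dictation audio lasted. The exact formula Kalam uses is not documented here, so treat the subtraction and the floor at zero as assumptions; `time_saved_seconds` is a made-up name.

```shell
# Assumed formula: (time to type the words at 40 WPM) - (audio length),
# never below zero. Integer arithmetic, results in seconds.
# usage: time_saved_seconds <words> <audio_seconds>
time_saved_seconds() {
  words=$1
  audio_seconds=$2
  typing_wpm=40
  typing_seconds=$(( words * 60 / typing_wpm ))
  saved=$(( typing_seconds - audio_seconds ))
  if [ "$saved" -lt 0 ]; then
    saved=0
  fi
  echo "$saved"
}

# 400 words take 600 s to type at 40 WPM; with 180 s of audio this prints 420.
time_saved_seconds 400 180
```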
Dictation History
Every dictation is logged with its transcription, timestamp, target app, and metadata. Access it from the sidebar.
Each history entry includes:
- Full transcription — the complete text that was dictated.
- Target app — which application received the text, with app icon and display name.
- Metadata — word count, audio length, response latency, recognition mode (cloud/local), and language.
- Sensitive app toggle — mark the target app as sensitive directly from the history detail view.
Use the search bar to find past dictations by content. Sort by newest or oldest. To clear all history, use the Clear all button (with confirmation).
Workspace
Beyond the Overview and History, Kalam includes four voice-powered productivity tools. Access them from the sidebar.
Notes
Color-coded cards with labels, pins, and rich text editing. Great for capturing ideas by voice. Supports archive and trash.
Tasks
Task management with open/closed states, priority levels (Low, Medium, High), subtasks, and drag-to-reorder.
Snippets
Reusable text shortcuts. Define a trigger phrase and Kalam expands it into your full text automatically during dictation.
Reminders
Set due dates and get notified. Works with notes and tasks. Supports recurring reminders.
Voice commands
Command mode lets you create workspace items by voice. First, enable it in Settings → Command Mode and assign a hotkey. Then hold your command hotkey and say a command:
| Say this | What happens |
|---|---|
| "New note buy groceries" | Creates a note titled "buy groceries" |
| "New task review PR" | Creates a task titled "review PR" |
| "New reminder call dentist" | Creates a note (set the reminder time in the note editor) |
| "Online search Rust async patterns" | Opens a DuckDuckGo search in your browser |
Command Mode
Use your voice to create notes, tasks, and run web searches — without touching the keyboard. Configure in Settings → Command Mode.
Getting started
- Open Settings → Command Mode.
- Toggle Enable command mode on.
- Assign a command hotkey (e.g. Right Alt).
- Hold the command hotkey, speak your command, release.
Basic commands
Without LLM enabled, commands must start with a specific prefix:
| Say this | What happens |
|---|---|
| "New note [content]" | Creates a note with the spoken content as the title |
| "New task [content]" | Creates a task with the spoken content as the title |
| "New reminder [content]" | Creates a note (set the reminder time in the editor) |
| "Online search [query]" | Opens a DuckDuckGo search in your default browser |
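To make the prefix rule concrete, here is a minimal sketch of how a transcribed phrase could be mapped to an action. It is not Kalam's parser: the function name, the case-insensitive matching, and the tab-separated output are all assumptions made for illustration.

```shell
# Hypothetical prefix matcher. Prints "<action><TAB><content>" or returns 1
# when the phrase does not start with a known prefix.
# usage: parse_command "<spoken text>"
parse_command() {
  text=$1
  lower=$(printf '%s' "$text" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    "new note "*)      action=note;   prefix_len=9 ;;
    "new task "*)      action=task;   prefix_len=9 ;;
    # Per the table above, a spoken reminder is created as a note.
    "new reminder "*)  action=note;   prefix_len=13 ;;
    "online search "*) action=search; prefix_len=14 ;;
    *) return 1 ;;
  esac
  # Take the content from the original text so its casing is preserved.
  content=$(printf '%s' "$text" | cut -c "$((prefix_len + 1))-")
  printf '%s\t%s\n' "$action" "$content"
}

parse_command "New task review PR"   # prints "task", a tab, then "review PR"
```

A phrase such as "buy groceries" matches no prefix, which is the case the LLM option is meant to cover.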
LLM-powered commands (optional)
For natural language parsing, enable the LLM option in Settings → Command Mode and add an API key for one of the supported providers:
- Groq, OpenAI, Anthropic, OpenRouter, or Google Gemini
With LLM enabled, you can speak naturally without fixed prefixes — Kalam infers the entry type and extracts fields automatically. For example, "remind me to call the dentist tomorrow at 3pm" creates a note with a reminder set.
Dictionary
Add custom words, names, and technical terms to improve transcription accuracy. Manage your dictionary in Settings → Dictionary.
The dictionary feeds custom vocabulary to the cloud STT provider, helping it recognize words it might otherwise miss or misspell — like proper nouns, brand names, acronyms, or domain-specific jargon.
Adding terms
- Open Settings → Dictionary.
- Click Add term.
- Type the word or phrase exactly as you want it transcribed.
- The term is saved and included in future cloud transcription requests.
You can also edit existing terms inline or delete terms you no longer need.
Privacy Settings
Kalam gives you granular control over your data. Everything is configurable in Settings → Privacy.
Sensitive app detection
Define which applications trigger automatic offline mode. When using Hybrid or Auto STT mode and you focus a sensitive app (like a banking site or password manager), Kalam switches to local processing — your audio never leaves your device.
You can add apps from:
- Currently running processes — pick from what's open right now
- Installed applications — browse your installed apps
- Browse for executable — select any .exe / .app file
History retention
Choose how long dictation history is kept: 7 days, 30 days, 90 days, 1 year, or Forever. Entries older than your selected period are automatically removed on startup and after each dictation.
Telemetry
Anonymous usage analytics are opt-in only and disabled by default. Telemetry never includes audio, transcription text, or personal data. See the full Privacy Policy for details.
API Keys
Cloud, Hybrid, and Auto modes send audio to a cloud provider and therefore require an API key from a supported provider. Kalam stores keys locally; they never touch our servers.
Supported providers
Groq
Blazing-fast inference. Free tier available. Get your key at console.groq.com.
OpenAI
Industry-standard Whisper API. Get your key at platform.openai.com.
Adding your key
- Open Settings → Audio & Dictation.
- Select your Cloud Provider (Groq or OpenAI).
- Paste your API key in the key field.
- Click Validate to confirm the key works.
- A ✓ Configured badge appears when the key is saved.
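If validation fails and you want to rule out the key itself, you can test it outside Kalam. The sketch below calls each provider's public model-listing endpoint with `curl`. The URLs are the providers' OpenAI-style endpoints as publicly documented at the time of writing, the helper names are made up, and this is not necessarily what the Validate button does internally.

```shell
# usage: models_url <groq|openai>
models_url() {
  case "$1" in
    groq)   echo "https://api.groq.com/openai/v1/models" ;;
    openai) echo "https://api.openai.com/v1/models" ;;
    *)      return 1 ;;
  esac
}

# Prints the HTTP status code returned by the provider.
# 200 means the key was accepted; 401 means it was rejected.
# usage: check_key <groq|openai> <api-key>
check_key() {
  url=$(models_url "$1") || return 1
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H "Authorization: Bearer $2" "$url"
}
```

For example, `check_key groq "$GROQ_API_KEY"` prints `200` for a working key, assuming the key is stored in an environment variable of that name.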
Troubleshooting
Common issues and how to fix them. Can't find your answer? Open an issue on GitHub.
Microphone not detected
Ensure Kalam has microphone permission in your OS settings:
- Windows: Settings → Privacy → Microphone → ensure "Allow apps to access your microphone" is on.
- macOS: System Settings → Privacy & Security → Microphone → toggle Kalam on.
- Linux: Check PulseAudio / PipeWire settings. Run pavucontrol and verify the input device.
You can also change your input device in Settings → Audio & Dictation or from the status bar microphone selector.
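On Linux, the input devices can also be listed from a terminal. The sketch below is a suggestion rather than a Kalam feature: it uses `pactl` (available with PulseAudio and with PipeWire's PulseAudio compatibility layer) and falls back to ALSA's `arecord`. The helper names are invented for this example.

```shell
# Reads `pactl list short sources` output on stdin and drops monitor
# sources, which are loopbacks of output devices rather than microphones.
real_inputs() {
  grep -v '\.monitor'
}

# Lists capture devices with whichever tool is installed.
list_mics() {
  if command -v pactl >/dev/null 2>&1; then
    pactl list short sources | real_inputs
  elif command -v arecord >/dev/null 2>&1; then
    arecord -l   # ALSA fallback: lists capture hardware
  else
    echo "neither pactl nor arecord found" >&2
    return 1
  fi
}
```

Run `list_mics` after defining both functions. An empty result suggests that the system does not see a microphone at all, in which case the problem lies outside Kalam.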
Text not appearing in target app
Kalam needs accessibility permissions to type into other apps:
- macOS: System Settings → Privacy & Security → Accessibility → add Kalam. Also check Input Monitoring.
- Linux (Wayland): Some compositors may require XDG portal configuration. X11 uses xdotool, which works out of the box.
- Windows: If running as a standard user and the target app is elevated (admin), Kalam may not be able to inject text. Run Kalam as administrator in that case.
Offline mode is slow
Local Whisper model performance depends on your hardware (CPU and available RAM). To improve speed:
- Try a smaller model size in Settings → Audio & Dictation → Local Model.
- Close resource-heavy applications to free up RAM.
- Switch to Cloud mode with a Groq API key for near-instant transcription.
Dictation cuts off or misses words
This can happen with quiet microphones or high background noise:
- Enable the audio filter in Settings → Audio & Dictation. The "Light" preset applies peak normalization and a noise gate.
- Use the Test Microphone feature to record a sample and play it back — check if your voice is coming through clearly.
- Try a different microphone or move closer to your current one.
App won't launch / crashes on startup
- Make sure you're running a supported OS version (Windows 10+, macOS 11+, or a recent Linux distro).
- Try deleting the app data folder and relaunching:
  - Windows: %USERPROFILE%\.kalam
  - macOS: ~/.kalam
  - Linux: ~/.kalam and ~/.local/share/kalam
- Check Settings → About → Logs for error details, or export logs and attach them to a GitHub issue.
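Deleting the app data folder is not reversible, so on macOS and Linux you may prefer to rename it first. The sketch below moves the folders listed above to timestamped backups. It is a suggestion, not an official tool, and `backup_kalam_data` is a made-up name. Quit Kalam before running it.

```shell
# Renames each existing Kalam data folder to <name>.bak-<timestamp>
# instead of deleting it, so it can be restored later.
# usage: backup_kalam_data
backup_kalam_data() {
  stamp=$(date +%Y%m%d-%H%M%S)
  for dir in "$HOME/.kalam" "$HOME/.local/share/kalam"; do
    if [ -d "$dir" ]; then
      mv "$dir" "$dir.bak-$stamp"
      echo "moved $dir -> $dir.bak-$stamp"
    fi
  done
}
```

Call `backup_kalam_data`, relaunch Kalam, and remove the `.bak-*` folders once the app starts cleanly.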