Documentation — Kalam Voice Dictation

Quick Start

Get from zero to dictating in under a minute. Kalam runs on Windows, macOS, and Linux.

Download & install

Grab the installer for your OS from the download page. No account or sign-up required.

Grant permissions

Kalam needs microphone access to hear you and accessibility / input permissions to type into other apps. The onboarding wizard walks you through each one.

Hold, speak, release

Press and hold your dictation hotkey (default: Ctrl + Win on Windows, Ctrl + Super on Linux/macOS), speak naturally, then release. Your words appear wherever your cursor is — any app, any text field.

Screenshot: Onboarding wizard The first-run setup showing permission grants and STT mode selection.

Kalam's onboarding walks you through permissions and engine setup

Tip: Kalam defaults to Cloud mode with Groq for fast transcription — just add your API key in Settings → Audio & Dictation. For fully offline use, switch to Local mode and download a Whisper model. No internet required after that.

Installation

Platform-specific guides for installing Kalam. All installers are available on the download page.

Windows 10+

Download the .exe (NSIS) installer from the download page. Run the installer and follow the prompts.

Windows SmartScreen: Because Kalam is a new open-source app, Windows may show a "Windows protected your PC" warning. Click More info → Run anyway to proceed.

Screenshot: Windows SmartScreen prompt The "More info → Run anyway" dialog that appears on first install.

Permissions needed

Microphone — Windows Settings → Privacy → Microphone. Ensure "Allow apps to access your microphone" is on.
Accessibility — No extra setup on Windows. Kalam uses standard input APIs.

macOS 11+

Download the .pkg or .dmg file. For .pkg, double-click to install. For .dmg, drag Kalam to Applications.

Gatekeeper: If macOS blocks the app, go to System Settings → Privacy & Security and click Open Anyway next to the Kalam message.

Permissions needed

Microphone — System Settings → Privacy & Security → Microphone. Toggle Kalam on.
Accessibility — System Settings → Privacy & Security → Accessibility. Add Kalam to the list.
Input Monitoring — System Settings → Privacy & Security → Input Monitoring. This allows Kalam to detect your global hotkey.

Screenshot: macOS Privacy & Security The Accessibility and Input Monitoring toggles for Kalam in System Settings.

Linux (Ubuntu, Debian, and others)

Download the .AppImage or .deb package.

# AppImage
chmod +x Kalam*.AppImage
./Kalam*.AppImage

# Debian / Ubuntu
sudo dpkg -i kalam*.deb

Permissions needed

Microphone — Usually granted automatically. Check PulseAudio/PipeWire settings if audio isn't detected.
Accessibility — On X11, Kalam uses xdotool. On Wayland, some compositors may require XDG portal configuration.

Your First Dictation

Once installed, here's what a typical dictation flow looks like.

Screenshot: Dictation in action The overlay pill showing the recording waveform while dictating into a text editor.

The floating overlay pill appears when you hold your hotkey

Click into any text field in any application — a browser, Word, Slack, VS Code, anything.
Press and hold your dictation hotkey (default: Ctrl + Win on Windows, Ctrl + Super on Linux/macOS).
A small floating pill appears on screen with a waveform animation — this means Kalam is listening.
Speak naturally. Kalam captures your audio in real time.
Release the key. Your transcribed text is typed into the active field.

Toggle mode: Prefer hands-free? Switch to Toggle mode in Settings → General. Press once to start, press again to stop — no holding required.

Speech-to-Text Modes

Kalam supports four transcription modes. Choose the one that fits your workflow and privacy needs.

Local (Offline)

Uses a Whisper model running on your machine. Audio never leaves your device. No internet required.

Cloud

Sends audio to your configured provider (Groq or OpenAI) for faster, more accurate transcription. Requires an API key.

Hybrid

Uses cloud by default but switches to local processing when a sensitive app is detected. Similar to Auto with broader sensitivity coverage.

Auto

Uses cloud normally but automatically switches to local when a sensitive app is detected (banking, password managers, etc.).

Screenshot: Settings → Audio & Dictation The STT mode selector showing the four mode cards (Auto, Hybrid, Cloud, Local) with the description panel below.

Select your preferred STT mode in Settings → Audio & Dictation

Choosing the right mode

Priority	Recommended Mode	Why
Maximum privacy	Local	Audio never leaves your device
Speed & accuracy	Cloud	Near-instant results with Groq
Privacy + speed	Auto	Cloud speed with automatic local fallback for sensitive apps
Balanced	Hybrid	Cloud by default, local for sensitive apps

BYOK (Bring Your Own Key): Kalam never provides API keys. You bring your own from Groq or OpenAI. Keys are stored locally on your device and never sent to Kalam servers.

Hotkeys & Controls

Kalam supports hold-to-dictate and toggle dictation, which can also be enabled together, each with a customizable hotkey. Configure everything in Settings → General.

Dictation modes

Mode	How it works	Best for
Hold-to-dictate	Press and hold your hotkey. Speak. Release to stop and insert text.	Quick phrases, short messages
Toggle	Press once to start dictating, press again to stop.	Longer dictation sessions, hands-free use
Both	Registers both a hold hotkey and a separate toggle hotkey. Use either depending on the situation.	Flexibility — the default setting

Screenshot: Settings → General → Dictation Hotkeys The hotkey configuration section showing the recording mode selector (Hold / Toggle / Both) and the hotkey capture fields.

Customize your dictation hotkey and recording mode in Settings

Default hotkeys

Action	Default Key	Notes
Dictate (hold)	Ctrl + Win	Hold to record, release to transcribe. On macOS/Linux: Ctrl + Super
Dictate (toggle)	Not set by default	Assign a separate toggle key in Settings → General
Command mode	Not set by default	Enable and assign a key in Settings → Command Mode
Language toggle	Not set by default	Switch between configured languages mid-session

Custom hotkeys: You can remap any hotkey in Settings → General → Dictation Hotkeys. Click the capture field and press your preferred key combination.

Overview Dashboard

The Overview is your home screen — a snapshot of your dictation activity and productivity stats.

Screenshot: Overview dashboard The home screen showing the 7-day words chart, estimated time saved, word count stats, and recent dictation entries.

The Overview dashboard with dictation stats and activity charts

The dashboard shows:

7-day word chart — daily dictation volume with estimated time saved vs. typing at 40 WPM.
Total words — lifetime word count across all dictations.
Time saved — estimated hours saved by dictating instead of typing.
Top destinations — which apps you dictate into most often.
Recent dictations — quick access to your latest transcriptions.
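
The time-saved figure can be approximated by hand. The sketch below is only an illustration: it assumes the estimate is the time needed to type the same words at 40 WPM minus the time spent speaking. The function name and the subtraction of audio length are assumptions, not Kalam's confirmed formula.

```python
TYPING_WPM = 40  # typing baseline quoted on the dashboard


def estimated_minutes_saved(word_count: int, audio_seconds: float) -> float:
    """Return minutes saved by dictating instead of typing.

    Assumption: saved time = typing time at 40 WPM minus recording time.
    The result is clamped at zero so a slow dictation never counts as a loss.
    """
    typing_minutes = word_count / TYPING_WPM
    speaking_minutes = audio_seconds / 60
    return max(0.0, typing_minutes - speaking_minutes)


# 200 words dictated in 90 seconds: 5.0 min of typing - 1.5 min of speaking
print(estimated_minutes_saved(200, 90))  # 3.5
```

Under these assumptions, a 200-word dictation recorded in 90 seconds counts as roughly three and a half minutes saved.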

Dictation History

Every dictation is logged with its transcription, timestamp, target app, and metadata. Access it from the sidebar.

Screenshot: History page The dictation history list grouped by date, showing transcription previews, target app icons, and timestamps.

Dictation history grouped by date with searchable entries

Each history entry includes:

Full transcription — the complete text that was dictated.
Target app — which application received the text, with app icon and display name.
Metadata — word count, audio length, response latency, recognition mode (cloud/local), and language.
Sensitive app toggle — mark the target app as sensitive directly from the history detail view.

Use the search bar to find past dictations by content. Sort by newest or oldest. To clear all history, use the Clear all button (with confirmation).

Retention: History is automatically pruned based on your retention setting in Settings → Privacy. Audio is never written to disk — only the transcribed text is stored.

Workspace

Beyond the Overview and History, Kalam includes four voice-powered productivity tools. Access them from the sidebar.

Notes

Color-coded cards with labels, pins, and rich text editing. Great for capturing ideas by voice. Supports archive and trash.

Tasks

Task management with open/closed states, priority levels (Low, Medium, High), subtasks, and drag-to-reorder.

Snippets

Reusable text shortcuts. Define a trigger phrase and Kalam expands it into your full text automatically during dictation.
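
To picture how a trigger phrase might turn into full text, here is a minimal sketch. It assumes case-insensitive, whole-word matching and a simple trigger-to-text mapping. The function name, the matching rules, and the sample snippet are hypothetical and may differ from what Kalam does internally.

```python
import re


def expand_snippets(text: str, snippets: dict[str, str]) -> str:
    """Replace each trigger phrase found in `text` with its full expansion.

    Assumptions: matching ignores case and only whole words match, so the
    trigger "sig" does not fire inside "signal".
    """
    if not snippets:
        return text
    lookup = {trigger.lower(): value for trigger, value in snippets.items()}
    # Longer triggers go first so "my address" wins over a shorter "my".
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    # A single pass means expanded text is never scanned for triggers again.
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


snippets = {"my address": "1 Example Street, Springfield"}  # hypothetical snippet
print(expand_snippets("Please ship it to My Address", snippets))
# Please ship it to 1 Example Street, Springfield
```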

Reminders

Set due dates and get notified. Works with notes and tasks. Supports recurring reminders.

Screenshot: Notes workspace The Notes view showing color-coded cards with labels, pins, and the search/filter toolbar.

Notes with color-coded cards, labels, and pin support

Screenshot: Tasks workspace The Tasks view showing open and completed tasks with priority badges, subtasks, and the scope/filter toolbar.

Tasks with priorities, subtasks, and drag-to-reorder

Voice commands

Command mode lets you create workspace items by voice. First, enable it in Settings → Command Mode and assign a hotkey. Then hold your command hotkey and say a command:

Say this	What happens
"New note buy groceries"	Creates a note titled "buy groceries"
"New task review PR"	Creates a task titled "review PR"
"New reminder call dentist"	Creates a note (set the reminder time in the note editor)
"Online search Rust async patterns"	Opens a DuckDuckGo search in your browser

Command Mode

Use your voice to create notes, tasks, and run web searches — without touching the keyboard. Configure in Settings → Command Mode.

Getting started

Open Settings → Command Mode.
Toggle Enable command mode on.
Assign a command hotkey (e.g. Right Alt).
Hold the command hotkey, speak your command, release.

Basic commands

Without LLM enabled, commands must start with a specific prefix:

Say this	What happens
"New note [content]"	Creates a note with the spoken content as the title
"New task [content]"	Creates a task with the spoken content as the title
"New reminder [content]"	Creates a note (set the reminder time in the editor)
"Online search [query]"	Opens a DuckDuckGo search in your default browser

LLM-powered commands (optional)

For natural language parsing, enable the LLM option in Settings → Command Mode and add an API key for one of the supported providers:

Groq, OpenAI, Anthropic, OpenRouter, or Google Gemini

With LLM enabled, you can speak naturally without fixed prefixes — Kalam infers the entry type and extracts fields automatically. For example, "remind me to call the dentist tomorrow at 3pm" creates a note with a reminder set.

Note: Command mode is disabled by default and has no hotkey assigned. You must enable it and set a hotkey before it will work.
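
The fixed-prefix commands listed under Basic commands can be pictured as a simple prefix match on the transcript. The sketch below is an illustration under assumptions: the action labels, the trailing-punctuation cleanup, and the exact search URL format are hypothetical and not taken from Kalam's source.

```python
from urllib.parse import quote_plus

# Spoken prefix -> action label. The labels are hypothetical.
PREFIXES = {
    "new note": "note",
    "new task": "task",
    "new reminder": "reminder_note",
    "online search": "search",
}


def parse_command(transcript: str):
    """Return (action, content), or None when no known prefix matches."""
    # Speech engines often append punctuation, so trim it before matching.
    cleaned = transcript.strip().rstrip(".!?")
    lowered = cleaned.lower()
    for prefix, action in PREFIXES.items():
        if lowered == prefix or lowered.startswith(prefix + " "):
            return action, cleaned[len(prefix):].strip()
    return None


def search_url(query: str) -> str:
    """Build a DuckDuckGo search URL for the spoken query."""
    return "https://duckduckgo.com/?q=" + quote_plus(query)


print(parse_command("New note buy groceries"))  # ('note', 'buy groceries')
print(parse_command("Please take a note"))      # None
print(search_url("Rust async patterns"))
# https://duckduckgo.com/?q=Rust+async+patterns
```

A phrase such as "Please take a note" matches no prefix here, which is the kind of input the optional LLM parsing is meant to handle.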

Dictionary

Add custom words, names, and technical terms to improve transcription accuracy. Manage your dictionary in Settings → Dictionary.

The dictionary feeds custom vocabulary to the cloud STT provider, helping it recognize words it might otherwise miss or misspell — like proper nouns, brand names, acronyms, or domain-specific jargon.

Adding terms

Open Settings → Dictionary.
Click Add term.
Type the word or phrase exactly as you want it transcribed.
The term is saved and included in future cloud transcription requests.

You can also edit existing terms inline or delete terms you no longer need.

Tip: Dictionary terms are most effective with cloud STT. They're sent as vocabulary hints to the provider. Local (offline) Whisper models have limited support for custom vocabulary.

Privacy Settings

Kalam gives you granular control over your data. Everything is configurable in Settings → Privacy.

Screenshot: Settings → Privacy The Privacy settings panel showing sensitive app detection, history retention, and telemetry toggles.

Privacy controls in Settings

Sensitive app detection

Define which applications trigger automatic offline mode. When you use Hybrid or Auto STT mode and focus a sensitive app (like a banking app or password manager), Kalam switches to local processing, so your audio never leaves your device.

You can add apps from:

Currently running processes — pick from what's open right now
Installed applications — browse your installed apps
Browse for executable — select any .exe / .app file

Screenshot: Sensitive app picker The full-screen sensitive app picker showing the three tabs (Running / Installed / Browse) with app icons and names.

Add sensitive apps from running processes, installed apps, or by browsing
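
The switching rule described in this section can be sketched as a small decision function. This is an illustration only: it assumes apps are compared by lowercase executable name and treats Hybrid and Auto identically, since their exact differences are not spelled out here. The function and app names are hypothetical.

```python
def choose_engine(mode: str, focused_app: str, sensitive_apps: set[str]) -> str:
    """Return "local" or "cloud" for the next dictation.

    Assumptions: apps are compared by lowercase executable name, and Hybrid
    and Auto share the same fallback rule in this sketch.
    """
    mode = mode.lower()
    if mode == "local":
        return "local"
    if mode == "cloud":
        return "cloud"
    if mode in ("hybrid", "auto"):
        known = {app.lower() for app in sensitive_apps}
        return "local" if focused_app.lower() in known else "cloud"
    raise ValueError(f"unknown STT mode: {mode!r}")


sensitive = {"KeePassXC.exe", "bank-app"}  # hypothetical entries
print(choose_engine("Auto", "keepassxc.exe", sensitive))   # local
print(choose_engine("Auto", "slack.exe", sensitive))       # cloud
print(choose_engine("Cloud", "keepassxc.exe", sensitive))  # cloud
```

Note that in this sketch plain Cloud mode never falls back, so the sensitive list only matters in Hybrid or Auto mode.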

History retention

Choose how long dictation history is kept: 7 days, 30 days, 90 days, 1 year, or Forever. Entries older than your selected period are automatically removed on startup and after each dictation.
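
As a rough picture of what pruning does, the sketch below filters entries against a cutoff date. It assumes each entry is a (timestamp, text) pair and that "1 year" means 365 days. Both are assumptions for illustration, as are the sample entries.

```python
from datetime import datetime, timedelta

# Retention options from Settings -> Privacy; None means keep forever.
RETENTION_DAYS = {
    "7 days": 7,
    "30 days": 30,
    "90 days": 90,
    "1 year": 365,  # assumption: a year is counted as 365 days
    "Forever": None,
}


def prune_history(entries, retention, now):
    """Return the entries that are still inside the retention window.

    Each entry is a (timestamp, text) tuple; this shape is an assumption.
    """
    days = RETENTION_DAYS[retention]
    if days is None:
        return list(entries)
    cutoff = now - timedelta(days=days)
    return [entry for entry in entries if entry[0] >= cutoff]


now = datetime(2025, 1, 31, 12, 0)
history = [
    (datetime(2025, 1, 30, 9, 0), "draft reply"),
    (datetime(2025, 1, 10, 9, 0), "meeting notes"),
    (datetime(2024, 6, 1, 9, 0), "old idea"),
]
print(len(prune_history(history, "7 days", now)))   # 1
print(len(prune_history(history, "30 days", now)))  # 2
print(len(prune_history(history, "Forever", now)))  # 3
```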

Telemetry

Anonymous usage analytics are opt-in only and disabled by default. No audio, no transcription text, and no personal data is ever included in telemetry. See the full Privacy Policy for details.

API Keys

Cloud, Hybrid, and Auto modes all send audio to a cloud provider and therefore require an API key from a supported provider. Kalam stores keys locally, and they never touch our servers.

Supported providers

Groq

Blazing-fast inference. Free tier available. Get your key at console.groq.com.

OpenAI

Industry-standard Whisper API. Get your key at platform.openai.com.

Adding your key

Open Settings → Audio & Dictation.
Select your Cloud Provider (Groq or OpenAI).
Paste your API key in the key field.
Click Validate to confirm the key works.
A ✓ Configured badge appears when the key is saved.

Screenshot: Settings → Audio & Dictation → API Key The cloud provider selector and API key field with the "✓ Configured" badge visible.

API key configuration with provider selection and validation

Keep your keys safe. API keys are stored in your local app config. Kalam never transmits them to any server other than the STT provider you selected. If you suspect a key is compromised, regenerate it on the provider's dashboard.

Troubleshooting

Common issues and how to fix them. Can't find your answer? Open an issue on GitHub.

Microphone not detected

Ensure Kalam has microphone permission in your OS settings:

Windows: Settings → Privacy → Microphone → ensure "Allow apps to access your microphone" is on.
macOS: System Settings → Privacy & Security → Microphone → toggle Kalam on.
Linux: Check PulseAudio / PipeWire settings. Run pavucontrol and verify the input device.

You can also change your input device in Settings → Audio & Dictation or from the status bar microphone selector.

Text not appearing in target app

Kalam needs accessibility permissions to type into other apps:

macOS: System Settings → Privacy & Security → Accessibility → add Kalam. Also check Input Monitoring.
Linux (Wayland): Some compositors may require XDG portal configuration. X11 uses xdotool, which works out of the box.
Windows: If running as a standard user and the target app is elevated (admin), Kalam may not be able to inject text. Run Kalam as administrator in that case.

Offline mode is slow

Local Whisper model performance depends on your hardware (CPU and available RAM). To improve speed:

Try a smaller model size in Settings → Audio & Dictation → Local Model.
Close resource-heavy applications to free up RAM.
Switch to Cloud mode with a Groq API key for near-instant transcription.

Dictation cuts off or misses words

This can happen with quiet microphones or high background noise:

Enable the audio filter in Settings → Audio & Dictation. The "Light" preset applies peak normalization and a noise gate.
Use the Test Microphone feature to record a sample and play it back — check if your voice is coming through clearly.
Try a different microphone or move closer to your current one.
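
To show what peak normalization and a noise gate do to a quiet recording, here is a simplified sketch on a handful of sample values. The threshold, target peak, processing order, and per-sample gating are all assumptions chosen for illustration. Real noise gates usually track a smoothed signal level, and Kalam's "Light" preset may work differently.

```python
def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def noise_gate(samples, threshold=0.05):
    """Mute every sample whose magnitude is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


# Hypothetical quiet take: two speech samples surrounded by low-level noise.
quiet_take = [0.01, -0.02, 0.30, -0.45, 0.02]
processed = noise_gate(peak_normalize(quiet_take))
print(processed)
```

In this example the two speech samples are boosted to use most of the available range, while the three low-level samples stay under the gate threshold and are muted.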

App won't launch / crashes on startup

Make sure you're running a supported OS version (Windows 10+, macOS 11+, or a recent Linux distro).
Try deleting the app data folder and relaunching:
- Windows: %USERPROFILE%\.kalam
- macOS: ~/.kalam
- Linux: ~/.kalam and ~/.local/share/kalam
Check Settings → About → Logs for error details, or export logs and attach them to a GitHub issue.

Still stuck? Open an issue on GitHub or contact support by email. Include your OS, Kalam version (Settings → About), and any error messages or logs.
