Cloud-Powered Dictation: Fast STT for Old Linux Hardware with Copilot & Claude
Context
I work daily with GitHub Copilot and Claude Code, and like many developers, I often think faster than I type. Speech‑to‑Text (STT) is an obvious productivity multiplier — but there’s a catch:
Unlike traditional local STT solutions that struggle on weak hardware, this approach leverages cloud processing for near-instant results.
-
My main Linux laptop is old (Dell Vostro 14‑3468)
-
CPU is weak, no GPU acceleration
-
I want speed and accuracy, not fancy UI
-
I want it to work everywhere, not inside a specific app
This post documents a minimal, fast, and reliable STT setup that works perfectly on old hardware and integrates seamlessly into a modern coding workflow.
Design Goals
Before touching any tools, I defined strict constraints:
-
⚡ Low latency (sub‑second for short dictations)
-
🧠 High accuracy (English & Spanish)
-
🖥️ Works on old CPUs
-
⌨️ Keyboard‑driven (no mouse, no UI)
-
🌍 Global (works in Copilot, Claude Code, editors, terminals)
-
🧩 Wayland‑compatible (no X11 hacks)
Offline STT was nice to have, but not required.
Why Local Whisper GUIs Didn’t Work
I tested several local Whisper-based tools (GUI and CLI wrappers). They worked, but:
-
❌ Slow on CPU‑only machines
-
❌ Heavy UIs
-
❌ Poor integration with keyboard workflows
On old hardware, local inference is the bottleneck, not disk or RAM.
So instead of fighting physics, I changed the architecture.
Architecture: Thin Client + API STT
The winning approach:
Mic → WAV → OpenAI STT API → Clipboard → Paste AnywhereKey idea:
Let the laptop do only audio capture. Let the cloud do inference.
This gives:
-
Near‑instant transcription for short prompts
-
Consistent accuracy
-
Minimal local resource usage
Core Tools
-
Ubuntu Linux (Wayland)
-
sox (rec, play) for audio capture & beeps
-
OpenAI STT API (gpt-4o-mini-transcribe)
-
wl-copy for clipboard integration
-
Keyboard shortcuts (system‑level)
No window focus tricks. No UI automation. No hacks.
The Dictation Script
This Python script does exactly one thing:
-
Record audio until I stop it
-
Transcribe it
-
Put the text in the clipboard
-
Signal readiness with a sound
import subprocessimport tempfilefrom openai import OpenAIfrom dotenv import load_dotenv
load_dotenv()client = OpenAI()
def beep(freq, dur=0.12): subprocess.run( ["play", "-nq", "synth", str(dur), "sine", str(freq)], check=False, )
with tempfile.NamedTemporaryFile(suffix=".wav") as f: beep(1200) # start recording
try: subprocess.run( ["rec", "-q", f.name, "rate", "16000", "channels", "1"], check=True, ) except KeyboardInterrupt: pass
beep(600) # stop recording
with open(f.name, "rb") as audio: r = client.audio.transcriptions.create( model="gpt-4o-mini-transcribe", file=audio, ) text = r.text.strip()
subprocess.run( ["wl-copy"], input=text, text=True, check=True,)
beep(1500) # clipboard readyPush‑to‑Talk With Two Shortcuts
To make this frictionless, I use two global keyboard shortcuts:
▶ Start Dictation
#!/usr/bin/env bashsource ~/tools/stt-dictation/.venv/bin/activatepython ~/tools/stt-dictation/dictate.pyShortcut example:
- Super + D
⏹ Stop Dictation
#!/usr/bin/env bashpkill -INT recShortcut example:
- Super + S
This cleanly stops recording without killing Python, allowing transcription to finish.
Audio Feedback (Why It Matters)
I intentionally avoided notifications or UI.
Instead, sound cues communicate state instantly:
-
🔊 High beep → recording started
-
🔊 Low beep → recording stopped
-
🔔 Single high beep → text ready to paste
This avoids a classic failure mode:
Pasting before transcription finishes.
With the final beep, I know exactly when Ctrl+V is safe.
Sound cues work particularly well for coding workflows because they don’t require visual attention or interrupt your focus on the screen.
Performance & Cost
For short prompts (
-
⏱️ End‑to‑end latency: ~1 second
-
💲 Cost per dictation: ~$0.0005 (at $0.006 per minute for gpt-4o-mini-transcribe, a 5-second dictation costs ~$0.0005)
Even with heavy daily usage, monthly cost is negligible.
At this scale, latency matters far more than cost.
Why This Works Well for Copilot & Claude Code
-
Global clipboard → works everywhere
-
No dependency on editor plugins
-
Perfect for:
Long prompts
-
Refactoring instructions
-
Natural language problem descriptions
It feels like talking to the IDE, not typing into it.
Lessons Learned
-
Old hardware is fine if you keep it thin
-
Cloud inference beats local CPU every time
-
Keyboard‑first workflows matter
-
Audio feedback > visual feedback
-
Simple pipelines beat complex tools
Final Thoughts
This setup turned an aging laptop into a high‑quality dictation workstation for modern AI‑assisted development.
No heavyweight apps. No vendor lock‑in. No UI friction.
Just:
Think → Speak → Paste → Code
Key Takeaways:
-
Cloud processing beats local CPU for speed on old hardware
-
Keyboard-driven workflows integrate seamlessly with coding
-
Audio feedback provides instant state awareness
-
Minimalist design ensures reliability and low maintenance
If you’re working on Linux with limited hardware and heavy AI usage, this approach is worth copying.