Kokoro TTS — Local AI

Voice Studio

Type any text or speak into the mic — hear it voiced by one of 60+ AI voices across 9 languages, running entirely on local infrastructure.

Kokoro TTS · Kokoro-FastAPI · Whisper STT · FastAPI · 60+ Voices · 9 Languages · 100% Local
Choose a Voice
Voice:
Text to Speech
Speak & Convert
0 / 2500
Speed: 1.00×

Press Record, speak, then press Stop. The audio is sent to the backend, where Whisper transcribes it locally; Kokoro then re-synthesises the text in your chosen voice.

Click to start
Your speech will appear here after recording…

How it works

Step 1
Pick a Voice
60+ voices across 9 languages served by Kokoro TTS running locally on the home server.
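As a rough illustration of how a voice picker could organise that many voices, the sketch below groups voice IDs by language. It assumes Kokoro-style IDs such as `af_bella`, where the first letter encodes the language; the prefix table and the `group_voices` helper are illustrative assumptions, not code taken from this app.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: the first letter of a Kokoro voice ID encodes its language.
LANGUAGE_BY_PREFIX = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}


def group_voices(voice_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group voice IDs by language, with unknown prefixes under 'Other'."""
    groups = defaultdict(list)
    for voice_id in voice_ids:
        language = LANGUAGE_BY_PREFIX.get(voice_id[:1], "Other")
        groups[language].append(voice_id)
    # Sort languages and the voices inside each language for a stable menu order.
    return {language: sorted(ids) for language, ids in sorted(groups.items())}
```

Calling `group_voices(["af_bella", "bm_george", "af_heart"])` would place the two `af_` voices under "American English" and `bm_george` under "British English".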
Step 2
Text to Speech
Type any text — the FastAPI backend proxies to Kokoro-FastAPI's OpenAI-compatible /v1/audio/speech endpoint.
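A minimal sketch of what such a proxy call might look like, using only the Python standard library. The host, port, `"kokoro"` model name, and `mp3` output format are assumptions about a typical Kokoro-FastAPI setup; the 2500-character cap mirrors the counter shown in the text box. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Kokoro-FastAPI is reachable at this address on the home server.
KOKORO_URL = "http://localhost:8880/v1/audio/speech"
MAX_CHARS = 2500  # mirrors the character counter in the UI


def build_speech_request(text: str, voice: str, speed: float = 1.0) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style speech request."""
    text = text.strip()
    if not text:
        raise ValueError("text is empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    payload = {
        "model": "kokoro",         # assumed model name
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": "mp3",  # assumed output format
    }
    return urllib.request.Request(
        KOKORO_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, speed: float = 1.0) -> bytes:
    """Send the request and return the raw audio bytes."""
    request = build_speech_request(text, voice, speed)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

Keeping request construction separate from sending makes the validation easy to check without a running TTS server; a FastAPI route could call `synthesize` and return the bytes to the browser as an audio response.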
Step 3
Speak and Convert
The browser's MediaRecorder API captures audio; Whisper (via faster-whisper) transcribes it in the backend, then Kokoro re-synthesises the text in your chosen voice.
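The transcription half of this step could look roughly like the sketch below. It assumes the recorded audio has already been saved to a file on the server; the model size, device, and compute type are illustrative choices, and both function names are hypothetical.

```python
from typing import Iterable

# Assumption: a small CPU-friendly model; the real deployment may differ.
MODEL_SIZE = "base"


def join_segment_texts(texts: Iterable[str]) -> str:
    """Join per-segment transcripts into one line, dropping empty pieces."""
    parts = [text.strip() for text in texts]
    return " ".join(part for part in parts if part)


def transcribe_file(audio_path: str) -> str:
    """Transcribe a recorded audio file locally with faster-whisper."""
    # Imported lazily so this module still loads where faster-whisper
    # is not installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(MODEL_SIZE, device="cpu", compute_type="int8")
    # transcribe() returns a lazy iterator of segments plus metadata.
    segments, _info = model.transcribe(audio_path, beam_size=5)
    return join_segment_texts(segment.text for segment in segments)
```

A long-running backend would normally load the model once at startup rather than on every call. The returned transcript can then be passed to the text-to-speech request with the voice selected in the UI.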
Step 4
100% Local
No external APIs. Kokoro-FastAPI and Whisper run on the home server. Zero data leaves the network.