Text to Speech API for Discord Bots
Low-latency WAV over a single REST call. No SDK. No per-voice licensing. Your bot is speaking in under 200 ms.
ElevenLabs rate-limits burst traffic
Discord voice channels spike: a single /tts command fires a cold HTTP round-trip, and popular bots hit ElevenLabs' concurrency ceiling during peak hours. Audexum queues requests on your key with no artificial concurrency cap on Starter and above.
Opus encoding is your bottleneck, not the TTS
discord.js consumes Opus natively. Audexum returns raw WAV (PCM 24 kHz), which you transcode on your own machine: ffmpeg decodes and resamples, and @discordjs/opus encodes to Opus. You control the buffer, with no hidden re-encoding on a third-party server adding 300 ms.
43 voices, one endpoint, no per-voice fee
ElevenLabs charges per voice clone; Google TTS bills per language as a separate SKU. Audexum's 43 voices across 33 languages are all on the same price schedule — a multilingual /tts command costs the same as an English-only one.
Free tier covers a small bot's entire month
10,000 free characters per month equals roughly 1,500 short TTS commands. Most hobby bots never leave the free tier. When you do scale, PAYG charges €8 per million characters with no monthly commitment.
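To see where a given bot lands, the arithmetic above can be sketched in a few lines. This is a rough estimate only: estimateUsage is a hypothetical helper, and it assumes every character is billed at the PAYG rate, which the pricing text does not spell out. Note that 10,000 characters spread over 1,500 commands works out to about 6.7 characters per command, so longer messages use up the allowance faster.

```javascript
// Figures taken from the text above:
// 10,000 free characters per month, PAYG at EUR 8 per million characters.
const FREE_CHARS_PER_MONTH = 10_000;
const PAYG_EUR_PER_MILLION = 8;

// Hypothetical helper, not part of any Audexum SDK.
function estimateUsage(commandsPerMonth, avgCharsPerCommand) {
  const characters = commandsPerMonth * avgCharsPerCommand;
  return {
    characters,
    fitsFreeTier: characters <= FREE_CHARS_PER_MONTH,
    // Assumption: every character is billed at the PAYG rate.
    paygCostEur: (characters / 1_000_000) * PAYG_EUR_PER_MILLION,
  };
}

// Example: 1,500 commands averaging 40 characters each.
console.log(estimateUsage(1500, 40));
```

Under these assumptions, a bot that speaks 1,500 forty-character messages a month exceeds the free allowance but stays well under one euro on PAYG.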
First audio in 60 seconds.
No SDK — one POST request, binary audio in the response body.
// tts.js: Audexum TTS helper for discord.js v14
// Requires Node.js 18+ (global fetch, Readable.fromWeb).
const { Readable } = require("stream");

const API = "https://audexum.com/api/synthesize";

/**
 * Returns a Node.js Readable stream of PCM WAV audio.
 * Pass it directly to createAudioResource() with StreamType.Arbitrary.
 */
async function synthesize(text, voice = "af_heart") {
  const res = await fetch(API, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.AUDEXUM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, voice, format: "wav" }),
  });
  // Surface the HTTP status and response body on failure.
  if (!res.ok) throw new Error(`Audexum ${res.status}: ${await res.text()}`);
  // Convert the WHATWG ReadableStream into a Node.js stream.
  return Readable.fromWeb(res.body);
}

module.exports = { synthesize };

Full parameter reference: audexum.com/docs. Supported formats: wav, mp3, ogg. Supported voices: F1–F5, M1–M5 (43 voices total). Supported languages: 33.
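For context, here is a minimal sketch of how the returned stream might be played in a voice channel with @discordjs/voice. The speak function is illustrative rather than part of Audexum's API, and it assumes that `connection` is an already established VoiceConnection (for example from joinVoiceChannel) and that ffmpeg and an Opus encoder such as @discordjs/opus are installed. The timeouts are arbitrary choices.

```javascript
// Sketch: play a synthesized audio stream through an existing voice connection.
async function speak(connection, audioStream) {
  // Required lazily so this file still loads where @discordjs/voice is absent.
  const {
    createAudioPlayer,
    createAudioResource,
    entersState,
    AudioPlayerStatus,
    StreamType,
  } = require("@discordjs/voice");

  const player = createAudioPlayer();

  // StreamType.Arbitrary makes the library transcode the input via ffmpeg.
  const resource = createAudioResource(audioStream, {
    inputType: StreamType.Arbitrary,
  });

  connection.subscribe(player);
  player.play(resource);

  // Wait for playback to start (5 s cap), then to finish (60 s cap).
  // Both caps are arbitrary; raise the second one for long messages.
  await entersState(player, AudioPlayerStatus.Playing, 5_000);
  await entersState(player, AudioPlayerStatus.Idle, 60_000);
}

module.exports = { speak };
```

A slash-command handler could then call `await speak(connection, await synthesize("Hello"))`, with `synthesize` being the helper from tts.js.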
Transparent, no per-use-case surcharge.
Every plan covers every use case at the same character rate. PAYG credits never expire.
All plans include STT (speech-to-text dictation) at no extra cost. Full pricing details →
Tools and guides for this use case.
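What audio format should I use for Discord?
Request format: "wav" (PCM 24 kHz, mono or stereo). Discord's voice gateway expects 48 kHz Opus, so the WAV stream has to be transcoded. With @discordjs/voice, passing the stream to createAudioResource() with StreamType.Arbitrary does this through ffmpeg and an Opus encoder such as @discordjs/opus, provided both are installed. Starting from WAV avoids stacking two lossy codecs (MP3, then Opus).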
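When debugging playback speed or pitch problems, it can help to confirm what the WAV header actually declares before the audio is transcoded. The sketch below reads the standard RIFF "fmt " chunk from a buffer holding the start of the response. readWavFormat is a hypothetical helper, and nothing here is specific to Audexum beyond the 24 kHz figure quoted above.

```javascript
// Hypothetical helper: read format fields from the start of a WAV buffer.
function readWavFormat(buf) {
  if (
    buf.length < 12 ||
    buf.toString("ascii", 0, 4) !== "RIFF" ||
    buf.toString("ascii", 8, 12) !== "WAVE"
  ) {
    throw new Error("Not a RIFF/WAVE buffer");
  }

  // Walk the chunk list until the "fmt " chunk is found.
  let offset = 12;
  while (offset + 8 <= buf.length) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === "fmt ") {
      if (offset + 24 > buf.length) break; // fmt data is cut off
      return {
        audioFormat: buf.readUInt16LE(offset + 8), // 1 means PCM
        channels: buf.readUInt16LE(offset + 10),
        sampleRate: buf.readUInt32LE(offset + 12),
        bitsPerSample: buf.readUInt16LE(offset + 22),
      };
    }
    // Chunks are padded to an even number of bytes.
    offset += 8 + size + (size % 2);
  }
  throw new Error("fmt chunk not found or truncated");
}
```

For the format described above, the expected result is `sampleRate: 24000` with `channels` of 1 or 2. The bit depth is not stated in the text, so treat `bitsPerSample` as something to observe rather than assume.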
How fast is the first-byte response?
Audexum begins streaming audio within ~80–150 ms of receiving a request for texts under 200 characters (EU region). The full WAV for a 100-character phrase arrives in under 400 ms on a typical VPS in Frankfurt.
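Because the first-byte figure is quoted for texts under 200 characters, one option for long messages is to split them at sentence boundaries and synthesize them chunk by chunk. The following is a minimal sketch of such a splitter. splitText and its default limit are illustrative, and whether chunking actually improves perceived latency for a given bot is an assumption to verify, not something the figures above guarantee.

```javascript
// Hypothetical helper: split text into chunks of at most maxLen characters,
// preferring sentence boundaries and falling back to word boundaries.
// A single word longer than maxLen is kept whole as its own chunk.
function splitText(text, maxLen = 200) {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [];
  const chunks = [];
  let current = "";

  const flush = () => {
    if (current.trim()) chunks.push(current.trim());
    current = "";
  };

  for (const sentence of sentences) {
    // Keep appending sentences while the chunk stays within the limit.
    if ((current + sentence).trim().length <= maxLen) {
      current += sentence;
      continue;
    }
    flush();
    if (sentence.trim().length <= maxLen) {
      current = sentence;
      continue;
    }
    // The sentence alone is too long: pack it word by word.
    for (const word of sentence.trim().split(/\s+/)) {
      if (current && (current + " " + word).length > maxLen) flush();
      current = current ? current + " " + word : word;
    }
    flush();
  }
  flush();
  return chunks;
}
```

Each chunk can then be passed to the synthesis call in order, so the first chunk starts playing while later ones are still being requested.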
Can I run multiple concurrent /tts commands?
Yes. Audexum imposes no per-key concurrency limit on Starter and above. Free-tier requests are queued but not rejected. For very high-throughput bots (>50 simultaneous voice channels), PAYG or Scale is recommended.
Is there a Discord-specific SDK?
No SDK is needed. A single fetch() or requests.post() call returns audio. The blog post at /blog/tts-api-for-discord-bot-2026 contains a complete working bot with slash commands, join/leave logic, and a queue.
Same API, every use case.
One endpoint handles Discord bots, podcast narration, e-learning courses, accessibility audio, and newsletter editions.
Start free. Ship fast.
10,000 characters per month, no credit card required. First audio in your terminal in 60 seconds.
Questions? [email protected]