Use case

Text to Speech API for Discord Bots

Low-latency WAV over a single REST call. No SDK. No per-voice licensing. Your bot starts speaking in under 200 ms.

Why it matters

ElevenLabs rate-limits burst traffic

Discord voice traffic is bursty: every /tts command fires a cold HTTP round-trip, and popular bots hit ElevenLabs' concurrency ceiling during peak hours. Audexum queues requests on your key, with no artificial concurrency cap on Starter and above.

Opus encoding is your bottleneck, not the TTS

discord.js consumes Opus natively. Audexum returns raw WAV (PCM, 24 kHz), which ffmpeg or @discordjs/opus can transcode on your own machine. You control the buffer: there is no hidden re-encoding on a third-party server adding 300 ms.

43 voices, one endpoint, no per-voice fee

ElevenLabs charges per voice clone; Google TTS bills per language as a separate SKU. Audexum's 43 voices across 33 languages are all on the same price schedule — a multilingual /tts command costs the same as an English-only one.

Free tier covers a small bot's entire month

10,000 free characters per month equals roughly 1,500 short TTS commands. Most hobby bots never leave the free tier. When you do scale, PAYG charges €8 per million characters with no monthly commitment.
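The arithmetic is easy to rerun for your own bot. Below is a minimal sketch that uses only the figures quoted on this page (10,000 free characters per month, €8 per million characters on PAYG). The function names and sample message lengths are illustrative assumptions, not part of the Audexum API. Note that 1,500 commands per month works out to about 7 characters each, so longer messages use up the allowance faster.

```javascript
// Figures quoted on this page; adjust them if the pricing changes.
const FREE_CHARS_PER_MONTH = 10_000;
const PAYG_EUR_PER_MILLION = 8;

// How many commands of a given average length fit in the free tier.
function freeCommandsPerMonth(avgCharsPerCommand) {
  return Math.floor(FREE_CHARS_PER_MONTH / avgCharsPerCommand);
}

// PAYG cost in euros for a given monthly character volume.
function paygCostEur(charsPerMonth) {
  return (charsPerMonth / 1_000_000) * PAYG_EUR_PER_MILLION;
}

console.log(freeCommandsPerMonth(40)); // 250 commands at 40 characters each
console.log(paygCostEur(250_000));     // 2 (euros)
```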

Integration

First audio in 60 seconds.

No SDK — one POST request, binary audio in the response body.

discord.js — /tts slash command (Node.js)
```javascript
// tts.js: Audexum TTS helper for discord.js v14
// Requires Node.js 18+ (global fetch and Readable.fromWeb).
const { Readable } = require("stream");

const API = "https://audexum.com/api/synthesize";

/**
 * Returns a Node.js Readable stream of WAV audio.
 * Pass it directly to createAudioResource() with StreamType.Arbitrary.
 */
async function synthesize(text, voice = "af_heart") {
  const res = await fetch(API, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.AUDEXUM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, voice, format: "wav" }),
  });
  // Surface the HTTP status and error body so failures are easy to debug.
  if (!res.ok) throw new Error(`Audexum ${res.status}: ${await res.text()}`);
  // Convert the web ReadableStream into a Node.js stream for @discordjs/voice.
  return Readable.fromWeb(res.body);
}

module.exports = { synthesize };
```

Full parameter reference: audexum.com/docs. Supported formats: wav, mp3, ogg. Voices: 43 in total across 33 languages, with IDs such as af_heart (used in the example above) and F1–F5, M1–M5.
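To play the result in a voice channel, the stream from synthesize() can be handed to @discordjs/voice. The following is a sketch, not the complete bot: makeSpeak and speak are hypothetical names, and the voice library is passed in as an argument so the helper can be exercised without a live Discord connection. The calls it uses (createAudioPlayer, createAudioResource, StreamType.Arbitrary, connection.subscribe) are the standard @discordjs/voice ones, and ffmpeg must be installed for StreamType.Arbitrary to work.

```javascript
// play.js: hypothetical glue between synthesize() and @discordjs/voice.
// Typical wiring (assumed, adjust to your project layout):
//   const speak = makeSpeak({
//     synthesize: require("./tts").synthesize,
//     voiceLib: require("@discordjs/voice"),
//   });
function makeSpeak({ synthesize, voiceLib }) {
  const { createAudioPlayer, createAudioResource, StreamType } = voiceLib;

  // connection: a VoiceConnection returned by joinVoiceChannel().
  return async function speak(connection, text, voice) {
    const stream = await synthesize(text, voice);
    // Arbitrary input makes @discordjs/voice transcode the WAV to Opus via ffmpeg.
    const resource = createAudioResource(stream, {
      inputType: StreamType.Arbitrary,
    });
    const player = createAudioPlayer();
    connection.subscribe(player);
    player.play(resource);
    return player; // caller can listen for state changes or stop playback
  };
}
```

Creating one player per call keeps the sketch short. A long-running bot would more likely keep one player per guild and reuse it.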

Pricing

Transparent, no per-use-case surcharge.

Every plan covers every use case at the same character rate. PAYG credits never expire.

Plan            Chars/mo     Price
Free            10,000       €0 / mo
Starter         100,000      €4 / mo
Pro             500,000      €12 / mo
Scale           2,000,000    €30 / mo
Pay-as-you-go   Unlimited    €8 / 1M chars

All plans include STT (speech-to-text dictation) at no extra cost. Full pricing details →

FAQ
What audio format should I use for Discord?

Request format: "wav" (PCM, 24 kHz, mono or stereo). Discord's voice gateway expects Opus, so transcode the WAV stream with @discordjs/opus or ffmpeg, or pass it to createAudioResource() with StreamType.Arbitrary and let @discordjs/voice run ffmpeg for you. Starting from WAV avoids stacking a lossy MP3 encode under the Opus encode.
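If you transcode by hand, it helps to confirm what the WAV actually contains first, because Discord's Opus stream runs at 48 kHz stereo and 24 kHz audio has to be resampled. The sketch below reads the format fields from the first bytes of a response. readWavFormat is a hypothetical helper, and it assumes the canonical RIFF layout where the "fmt " chunk comes directly after the "WAVE" tag. This page does not confirm that Audexum uses that layout, so treat it as an assumption.

```javascript
// Reads format fields from a canonical RIFF/WAVE header (assumed layout:
// "RIFF" <size> "WAVE" "fmt " <fmtSize> <format fields...>).
function readWavFormat(header) {
  if (header.length < 36) {
    throw new Error("Need at least 36 bytes of WAV header");
  }
  if (
    header.toString("ascii", 0, 4) !== "RIFF" ||
    header.toString("ascii", 8, 12) !== "WAVE"
  ) {
    throw new Error("Not a RIFF/WAVE file");
  }
  if (header.toString("ascii", 12, 16) !== "fmt ") {
    throw new Error("fmt chunk is not first; a full chunk scan is needed");
  }
  return {
    audioFormat: header.readUInt16LE(20), // 1 means uncompressed PCM
    channels: header.readUInt16LE(22),
    sampleRate: header.readUInt32LE(24),
    bitsPerSample: header.readUInt16LE(34),
  };
}

// Demo with a hand-built header: 24 kHz, mono, 16-bit PCM.
const demo = Buffer.alloc(44);
demo.write("RIFF", 0, "ascii");
demo.write("WAVE", 8, "ascii");
demo.write("fmt ", 12, "ascii");
demo.writeUInt32LE(16, 16);    // fmt chunk size
demo.writeUInt16LE(1, 20);     // PCM
demo.writeUInt16LE(1, 22);     // mono
demo.writeUInt32LE(24000, 24); // sample rate
demo.writeUInt16LE(16, 34);    // bits per sample
console.log(readWavFormat(demo));
```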

How fast is the first-byte response?

Audexum begins streaming audio within ~80–150 ms of receiving a request for texts under 200 characters (EU region). The full WAV for a 100-character phrase arrives in under 400 ms on a typical VPS in Frankfurt.
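These numbers depend on where your bot runs, so it is worth measuring from your own host. Here is a minimal sketch, assuming the response body is an async-iterable stream (the Readable returned by synthesize() is one). measureFirstByte is a hypothetical helper name.

```javascript
const { performance } = require("node:perf_hooks");

// Times how long a request takes to deliver its first chunk and its last.
// makeRequest: async function that resolves to an async iterable of chunks.
async function measureFirstByte(makeRequest) {
  const start = performance.now();
  const body = await makeRequest();
  let firstByteMs = null;
  let bytes = 0;
  for await (const chunk of body) {
    if (firstByteMs === null) firstByteMs = performance.now() - start;
    bytes += chunk.length;
  }
  return { firstByteMs, totalMs: performance.now() - start, bytes };
}
```

Usage, with the helper from tts.js: `measureFirstByte(() => synthesize("Hello from Audexum")).then(console.log)`. The result includes your own network round-trip, so expect it to be higher outside the EU.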

Can I run multiple concurrent /tts commands?

Yes. Audexum imposes no per-key concurrency limit on Starter and above. Free-tier requests are queued but not rejected. For very high-throughput bots (>50 simultaneous voice channels), PAYG or Scale is recommended.
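Even with no cap on the API side, a single voice connection plays one clip at a time, so bots usually serialize commands per guild and let different guilds run in parallel. The sketch below shows one way to do that with promise chaining. The enqueue helper and the guild IDs are illustrative assumptions, not part of Audexum or discord.js.

```javascript
// guildId -> promise for the most recently queued task in that guild
const chains = new Map();

// Runs task() after every earlier task for the same guild has settled.
// Tasks for different guilds do not wait on each other.
function enqueue(guildId, task) {
  const tail = chains.get(guildId) ?? Promise.resolve();
  // A failed task must not block the ones queued behind it.
  const run = tail.catch(() => {}).then(() => task());
  chains.set(guildId, run);
  // Drop the entry once the queue for this guild has drained.
  const cleanup = () => {
    if (chains.get(guildId) === run) chains.delete(guildId);
  };
  run.then(cleanup, cleanup);
  return run; // caller should handle rejection, e.g. reply with an error
}
```

In a command handler, the task would typically synthesize the text, play it, and resolve when the player goes idle.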

Is there a Discord-specific SDK?

No SDK is needed. A single fetch() or requests.post() call returns audio. The blog post at /blog/tts-api-for-discord-bot-2026 contains a complete working bot with slash commands, join/leave logic, and a queue.

Other use cases

Same API, every use case.

One endpoint handles Discord bots, podcast narration, e-learning courses, accessibility audio, and newsletter editions.

Start free. Ship fast.

10,000 characters per month, no credit card required. First audio in your terminal in 60 seconds.

Questions? [email protected]