TTS API for Discord Bot: Complete Setup Guide with discord.js (2026)
Build a Discord bot that synthesizes speech and plays it in a voice channel using Audexum's TTS API and discord.js. Full working code included.
Discord's built-in TTS (/tts) is useful for five minutes before it becomes annoying. If you want a bot that speaks in a voice channel with an actual voice — not the robotic browser TTS — you need an external TTS API. This guide covers the full setup: picking a TTS API, integrating it with discord.js v14, and playing audio in a voice channel. All code is copy-paste ready.
What You Need
- Node.js 18+
- A Discord bot application and token
- A TTS API key (Audexum free tier works — 10,000 chars/month, no credit card)
- The discord.js, @discordjs/voice, @discordjs/opus, ffmpeg-static, and dotenv packages
The bot flow is: user types a command → bot calls TTS API → receives MP3 audio → plays it in the user's current voice channel.
Choosing a TTS API for Your Discord Bot
Your main constraints for Discord bot use are: latency (voice channels are interactive), cost at scale (bots can generate a lot of audio fast), and simplicity (you do not need voice cloning — you need a reliable REST endpoint).
| Provider | Latency (typical) | Free Tier | Cost at 1M chars | API Simplicity |
|---|---|---|---|---|
| Audexum | ~300–700ms | 10K/mo | €8 | Simple REST, Bearer auth |
| ElevenLabs | ~400–900ms | 10K/mo (no commercial) | $11–$330 | Good docs |
| OpenAI TTS | ~600–1200ms | None | $15 | Simple REST |
| Google Cloud TTS | ~200–500ms | 1M/mo (billing req.) | $4–$16 | SDK or REST |
| AWS Polly | ~200–600ms | 5M (12-mo trial) | $4 | SDK required |
For Discord bots, latency matters more than it does for batch pipelines: anything over about 1 second feels sluggish in an interactive voice session, and the typical figures above put some providers past that mark. Audexum's REST API is also the simplest of these to integrate: no SDK required, Bearer token auth, and a binary audio response.
See Audexum vs OpenAI TTS for a detailed latency and cost breakdown.
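The latency figures in the table are typical values, so it is worth measuring what your bot actually observes from its own host. A minimal sketch for timing any async call; `measureLatency` is a hypothetical helper name, and `fakeSynthesize` is a stand-in for a real TTS request:

```javascript
// Hypothetical helper: times a single async call in milliseconds.
const { performance } = require("node:perf_hooks");

async function measureLatency(fn) {
  const start = performance.now();
  const result = await fn();
  return { result, ms: performance.now() - start };
}

// Stand-in for a real TTS request; resolves after roughly 50 ms.
const fakeSynthesize = () =>
  new Promise((resolve) => setTimeout(() => resolve("audio"), 50));

measureLatency(fakeSynthesize).then(({ ms }) => {
  console.log(`Round trip: ${ms.toFixed(0)} ms`);
});
```

In a real bot the wrapped call would be the function that performs the TTS request. If that function resolves as soon as the HTTP response starts, the number reflects time to first byte rather than the full audio download.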
Project Setup
mkdir discord-tts-bot && cd discord-tts-bot
npm init -y
npm install discord.js @discordjs/voice @discordjs/opus ffmpeg-static dotenv
Create a .env file:
DISCORD_TOKEN=your_discord_bot_token
DISCORD_CLIENT_ID=your_application_client_id
AUDEXUM_API_KEY=your_audexum_api_key
Get your Audexum API key from audexum.com/signup. The free tier (10K chars/month) is enough to prototype a bot.
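A missing variable in .env tends to surface later as a confusing authentication error, so a startup check can save some debugging. A minimal sketch, assuming the three variable names from the .env file above; `missingEnv` is a hypothetical helper, not part of dotenv:

```javascript
// Hypothetical startup check: returns the names of required variables
// that are unset or contain only whitespace.
function missingEnv(names, env = process.env) {
  return names.filter((name) => !env[name] || env[name].trim() === "");
}

const REQUIRED = ["DISCORD_TOKEN", "DISCORD_CLIENT_ID", "AUDEXUM_API_KEY"];

const missing = missingEnv(REQUIRED);
if (missing.length > 0) {
  // In a real bot you might call process.exit(1) here instead of only logging.
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```

The check would run right after `require("dotenv").config()`, before the client logs in.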
The TTS Helper
Create tts.js — this handles the API call and returns a readable stream for audio playback:
// tts.js
const { Readable } = require("stream");
const AUDEXUM_API = "https://audexum.com/api/synthesize";
async function synthesize(text, voiceId = "af_heart") {
  const response = await fetch(AUDEXUM_API, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.AUDEXUM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text,
      voice: voiceId,
      format: "mp3",
    }),
  });
  if (!response.ok) {
    const body = await response.text();
    throw new Error(`Audexum API error ${response.status}: ${body}`);
  }
  // Convert the fetch body (Web ReadableStream) to a Node.js Readable
  return Readable.fromWeb(response.body);
}
module.exports = { synthesize };
The Bot: Voice Channel Playback
Create bot.js — handles Discord commands and pipes TTS audio into a voice channel:
// bot.js
require("dotenv").config();
const {
  Client, GatewayIntentBits, REST, Routes, SlashCommandBuilder,
} = require("discord.js");
const {
  joinVoiceChannel, createAudioPlayer, createAudioResource,
  AudioPlayerStatus, StreamType,
} = require("@discordjs/voice");
const { synthesize } = require("./tts");
const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildVoiceStates,
    GatewayIntentBits.GuildMessages,
  ],
});
// Register the /speak slash command globally for this application.
async function registerCommands() {
  const commands = [
    new SlashCommandBuilder()
      .setName("speak")
      .setDescription("Speak text in your current voice channel")
      .addStringOption((opt) =>
        opt.setName("text").setDescription("What to say").setRequired(true).setMaxLength(500)
      )
      .addStringOption((opt) =>
        opt.setName("voice").setDescription("Voice ID (optional)").setRequired(false)
      )
      .toJSON(),
  ];
  const rest = new REST().setToken(process.env.DISCORD_TOKEN);
  await rest.put(Routes.applicationCommands(process.env.DISCORD_CLIENT_ID), { body: commands });
  console.log("Slash commands registered.");
}
client.once("ready", async () => {
  console.log(`Logged in as ${client.user.tag}`);
  await registerCommands();
});
client.on("interactionCreate", async (interaction) => {
  if (!interaction.isChatInputCommand() || interaction.commandName !== "speak") return;
  // The caller must be in a voice channel so the bot knows where to join.
  const voiceChannel = interaction.member?.voice?.channel;
  if (!voiceChannel) {
    return interaction.reply({ content: "You need to be in a voice channel first.", ephemeral: true });
  }
  const text = interaction.options.getString("text");
  const voiceId = interaction.options.getString("voice") ?? "af_heart";
  // Synthesis can take longer than Discord's 3-second reply window, so defer first.
  await interaction.deferReply({ ephemeral: true });
  try {
    const audioStream = await synthesize(text, voiceId);
    const connection = joinVoiceChannel({
      channelId: voiceChannel.id,
      guildId: interaction.guildId,
      adapterCreator: interaction.guild.voiceAdapterCreator,
      selfDeaf: false,
    });
    const player = createAudioPlayer();
    // StreamType.Arbitrary routes the MP3 stream through FFmpeg for transcoding to Opus.
    const resource = createAudioResource(audioStream, { inputType: StreamType.Arbitrary });
    player.play(resource);
    connection.subscribe(player);
    // Leave the channel once playback finishes or fails.
    player.on(AudioPlayerStatus.Idle, () => connection.destroy());
    player.on("error", (err) => { console.error("Player error:", err); connection.destroy(); });
    await interaction.editReply({ content: `Speaking: "${text}"` });
  } catch (err) {
    console.error("TTS error:", err);
    await interaction.editReply({ content: "Failed to synthesize audio. Check logs." });
  }
});
client.login(process.env.DISCORD_TOKEN);
Running the Bot
node bot.js
Join a voice channel in Discord, then type /speak text:Hello, this is your TTS bot. The bot joins the channel, speaks the text, then leaves.
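One thing this flow does not handle is two /speak commands arriving close together: each command creates its own player, so the second can interrupt the first. A minimal sketch of a FIFO queue that runs one job at a time; `createSpeechQueue` and `playFn` are hypothetical names, and `playFn` is assumed to return a promise that resolves when playback finishes:

```javascript
// Hypothetical helper: runs async jobs one at a time, in arrival order.
function createSpeechQueue(playFn) {
  let tail = Promise.resolve();
  return function enqueue(item) {
    // Chain onto the previous job; keep going even if the previous job failed.
    const run = tail.then(() => playFn(item));
    tail = run.catch(() => {});
    return run;
  };
}

// Usage with a stand-in player that "speaks" for 20 ms per item.
const spoken = [];
const enqueue = createSpeechQueue(async (text) => {
  await new Promise((resolve) => setTimeout(resolve, 20));
  spoken.push(text);
});

enqueue("first");
enqueue("second").then(() => console.log(spoken.join(", "))); // prints "first, second"
```

In a bot, a Map of such queues keyed by guild ID would keep servers independent, and `playFn` would wrap the join, play, and wait-for-Idle steps.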
Character Budgeting for Discord Bots
Discord bots can use characters faster than you expect. A few scenarios:
| Use Case | Avg chars/command | Commands/day | Monthly total |
|---|---|---|---|
| Server announcements | 200 | 5 | ~30,000 |
| Game event narration | 150 | 20 | ~90,000 |
| Music bot track announces | 80 | 100 | ~240,000 |
| Full assistant bot | 300 | 50 | ~450,000 |
For a small community server with occasional use, Audexum's free 10,000 chars/month covers it. For an active gaming server, the €4/month plan (100K chars) or €12/month (500K chars) is the realistic range. See audexum.com/pricing for full plan details.
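The monthly totals in the table are average characters × commands per day × 30 days. A small sketch of that arithmetic, using the plan figures quoted in this guide; the plan labels and helper names are hypothetical:

```javascript
// Plan figures are the ones quoted in this guide; confirm them at audexum.com/pricing.
const PLANS = [
  { name: "Free", chars: 10_000, eurPerMonth: 0 },
  { name: "100K", chars: 100_000, eurPerMonth: 4 },
  { name: "500K", chars: 500_000, eurPerMonth: 12 },
];

// Hypothetical helper: monthly characters for a steady daily command rate.
function monthlyChars(avgCharsPerCommand, commandsPerDay, days = 30) {
  return avgCharsPerCommand * commandsPerDay * days;
}

// Returns the smallest listed plan that covers the estimate, or null if none does.
function smallestPlanFor(chars) {
  return PLANS.find((plan) => plan.chars >= chars) ?? null;
}

const narration = monthlyChars(150, 20); // game event narration row
console.log(narration, smallestPlanFor(narration)?.name); // prints "90000 100K"
```

By the same arithmetic, the free 10,000 characters cover about 50 commands of 200 characters each per month.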
The referral program also applies here: share your Audexum referral code with your community. Each signup using your code gives both accounts +10,000 free characters. For a Discord server, this can mean sustained free usage.
Adding Voice Selection for Users
A common pattern is letting users choose from a voice list. Extend the slash command options:
.addStringOption((opt) =>
  opt
    .setName("voice")
    .setDescription("Choose a voice")
    .setRequired(false)
    .addChoices(
      { name: "Heart (American · F)", value: "af_heart" },
      { name: "Michael (American · M)", value: "am_michael" },
      { name: "Emma (British · F)", value: "bf_emma" },
      { name: "George (British · M)", value: "bm_george" }
    )
)
Check the full voice list at audexum.com/docs. With 43 voices across 33 languages, you can support multilingual communities in a single bot.
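If the voice list grows, keeping it in one array avoids repeating IDs in the command definition and in the fallback logic. A minimal sketch using the four voice IDs shown above; the array shape and the `resolveVoice` helper are hypothetical:

```javascript
// Voice IDs are the four shown above; the list shape and helper names are hypothetical.
const VOICES = [
  { id: "af_heart", label: "Heart (American · F)" },
  { id: "am_michael", label: "Michael (American · M)" },
  { id: "bf_emma", label: "Emma (British · F)" },
  { id: "bm_george", label: "George (British · M)" },
];
const DEFAULT_VOICE = "af_heart";

// Shape expected by addChoices(): { name, value } objects.
const voiceChoices = VOICES.map(({ id, label }) => ({ name: label, value: id }));

// Falls back to the default when the ID is missing or not in the list.
function resolveVoice(id) {
  return VOICES.some((voice) => voice.id === id) ? id : DEFAULT_VOICE;
}

console.log(resolveVoice("bf_emma"), resolveVoice("unknown")); // prints "bf_emma af_heart"
```

The choices can then be passed as `.addChoices(...voiceChoices)`, and `resolveVoice` guards any code path where a voice ID arrives from somewhere other than the slash command.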
Handling Rate Limits and Errors Gracefully
Add a simple retry wrapper around the API call for production bots:
async function synthesizeWithRetry(text, voiceId, maxRetries = 2) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await synthesize(text, voiceId);
    } catch (err) {
      // Out of retries: surface the last error to the caller.
      if (attempt === maxRetries) throw err;
      // Fixed 1-second pause before the next attempt.
      await new Promise((r) => setTimeout(r, 1000));
    }
  }
}
For high-traffic bots, track character usage in your own database and alert before hitting plan limits.
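Usage tracking can start as a counter with a warning threshold. A minimal sketch with an in-memory counter standing in for the database; `createUsageTracker` is a hypothetical helper, and counting `text.length` is an assumption about how characters are billed:

```javascript
// Hypothetical in-memory tracker; a real bot would persist the count in a database.
function createUsageTracker(monthlyLimit, alertRatio = 0.8, onAlert = console.warn) {
  let used = 0;
  let alerted = false;
  return {
    // Adds the text length to the running total and returns the new total.
    record(text) {
      used += text.length;
      if (!alerted && used >= monthlyLimit * alertRatio) {
        alerted = true; // Alert once per period, not on every call.
        onAlert(`TTS usage at ${used}/${monthlyLimit} characters`);
      }
      return used;
    },
    // Call at the start of each billing period.
    reset() {
      used = 0;
      alerted = false;
    },
  };
}

const tracker = createUsageTracker(100_000);
tracker.record("Hello, this is your TTS bot");
```

Calling `record(text)` just before each synthesis request keeps the count close to what the provider bills, though the provider's own usage figures remain the source of truth.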
Alternatives for Discord Bot TTS
- Google Cloud TTS — Best free volume (1M chars/month), requires a billing account. Slightly more complex setup with the SDK, but reliable at scale.
- OpenAI TTS — No free tier, $15/1M chars. Justified if you are already paying for OpenAI APIs and want to consolidate billing. Good voice quality for longer phrases.
- ElevenLabs — High voice quality, 10K free chars with no commercial rights. At bot scale, it gets expensive fast. Full comparison here.
- Self-hosted (Coqui/Piper) — Zero API cost once running, but requires a server with GPU or CPU resources, ongoing maintenance, and model management.
Deploying the Bot
For production, run the bot as a process managed by pm2:
pm2 start bot.js --name discord-tts-bot
pm2 save
Make sure your .env file is protected (chmod 600 .env) and not committed to version control.
By Petar, founder of Audexum. The Discord bot use case was one of the first things people asked about after launch — this guide covers exactly what I wish had existed when I was building the first integration.