2026-05-18·9 min read·By Petar

TTS API for Discord Bot: Complete Setup Guide with discord.js (2026)

Build a Discord bot that synthesizes speech and plays it in a voice channel using Audexum's TTS API and discord.js. Full working code included.

Discord's built-in TTS (/tts) is useful for five minutes before it becomes annoying. If you want a bot that speaks in a voice channel with an actual voice — not the robotic browser TTS — you need an external TTS API. This guide covers the full setup: picking a TTS API, integrating it with discord.js v14, and playing audio in a voice channel. All code is copy-paste ready.

What You Need

Node.js 18+
A Discord bot application and token
A TTS API key (Audexum free tier works — 30,000 credits/month, no credit card)
@discordjs/voice, discord.js, @discordjs/opus, and ffmpeg-static packages

The bot flow is: user types a command → bot calls TTS API → receives MP3 audio → plays it in the user's current voice channel.

Choosing a TTS API for Your Discord Bot

Your main constraints for Discord bot use are: latency (voice channels are interactive), cost at scale (bots can generate a lot of audio fast), and simplicity (you do not need voice cloning — you need a reliable REST endpoint).

Provider	Latency (typical)	Free Tier	Cost at 1M chars	API Simplicity
Audexum	~300–700ms	30K/mo	€20	Simple REST, Bearer auth
ElevenLabs	~400–900ms	10K/mo (no commercial)	$11–$330	Good docs
OpenAI TTS	~600–1200ms	None	$15	Simple REST
Google Cloud TTS	~200–500ms	1M/mo (billing req.)	$4–$16	SDK or REST
AWS Polly	~200–600ms	5M (12-mo trial)	$4	SDK required

For Discord bots, latency matters more than it does for batch pipelines: anything over 1 second feels sluggish in an interactive voice session, so the gaps between the providers in the table above are noticeable. Audexum's REST API is also straightforward to integrate, with no SDK required, Bearer token auth, and a binary audio response.

See Audexum vs OpenAI TTS for a detailed latency and cost breakdown.
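The latency figures above are typical values, so it can be worth timing calls from wherever your bot is hosted. A minimal sketch, assuming a hypothetical helper named timed that is not part of any library used in this guide:

```javascript
// Hypothetical timing helper: runs an async function and logs how long it took.
async function timed(label, fn) {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    // Logged on success and on failure, so slow errors show up too.
    const ms = Math.round(performance.now() - start);
    console.log(`${label} took ${ms}ms`);
  }
}
```

Wrapping the synthesize helper defined later in this guide would look like `await timed("synthesize", () => synthesize(text, voiceId))`. Because that helper returns a stream as soon as the response starts, this measures roughly the time to the first byte rather than the full download.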

Project Setup

bash

mkdir discord-tts-bot && cd discord-tts-bot
npm init -y
npm install discord.js @discordjs/voice @discordjs/opus ffmpeg-static dotenv

Create a .env file:

env

DISCORD_TOKEN=your_discord_bot_token
DISCORD_CLIENT_ID=your_application_client_id
AUDEXUM_API_KEY=your_audexum_api_key

Get your Audexum API key from audexum.com/signup — the free tier (30K credits/month) is enough to prototype a bot.
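Since the bot depends on all three variables, it can help to fail fast at startup when one is missing. A minimal sketch, assuming a hypothetical helper named missingEnv (it is not part of dotenv or discord.js):

```javascript
// env-check.js (hypothetical helper, not part of dotenv or discord.js)
const REQUIRED_VARS = ["DISCORD_TOKEN", "DISCORD_CLIENT_ID", "AUDEXUM_API_KEY"];

// Returns the names of required variables that are unset or blank in `env`.
function missingEnv(env = process.env, required = REQUIRED_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// Possible use in bot.js, right after require("dotenv").config():
//   const missing = missingEnv();
//   if (missing.length > 0) {
//     console.error(`Missing environment variables: ${missing.join(", ")}`);
//     process.exit(1);
//   }
```

Checking up front gives a clearer error than a failed login or a 401 from the TTS API later on.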

The TTS Helper

Create tts.js — this handles the API call and returns a readable stream for audio playback:

javascript

// tts.js
const { Readable } = require("stream");

const AUDEXUM_API = "https://audexum.com/api/synthesize";

async function synthesize(text, voiceId = "af_heart") {
  const response = await fetch(AUDEXUM_API, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.AUDEXUM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text,
      voice: voiceId,
      format: "mp3",
    }),
  });

  if (!response.ok) {
    const body = await response.text();
    throw new Error(`Audexum API error ${response.status}: ${body}`);
  }

  // Convert the fetch body (Web ReadableStream) to a Node.js Readable
  return Readable.fromWeb(response.body);
}

module.exports = { synthesize };

The Bot: Voice Channel Playback

Create bot.js — handles Discord commands and pipes TTS audio into a voice channel:

javascript

// bot.js
require("dotenv").config();
const {
  Client, GatewayIntentBits, REST, Routes, SlashCommandBuilder,
} = require("discord.js");
const {
  joinVoiceChannel, createAudioPlayer, createAudioResource,
  AudioPlayerStatus, StreamType,
} = require("@discordjs/voice");
const { synthesize } = require("./tts");

const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildVoiceStates,
    GatewayIntentBits.GuildMessages,
  ],
});

async function registerCommands() {
  const commands = [
    new SlashCommandBuilder()
      .setName("speak")
      .setDescription("Speak text in your current voice channel")
      .addStringOption((opt) =>
        opt.setName("text").setDescription("What to say").setRequired(true).setMaxLength(500)
      )
      .addStringOption((opt) =>
        opt.setName("voice").setDescription("Voice ID (optional)").setRequired(false)
      )
      .toJSON(),
  ];
  const rest = new REST().setToken(process.env.DISCORD_TOKEN);
  await rest.put(Routes.applicationCommands(process.env.DISCORD_CLIENT_ID), { body: commands });
  console.log("Slash commands registered.");
}

client.once("ready", async () => {
  console.log(`Logged in as ${client.user.tag}`);
  await registerCommands();
});

client.on("interactionCreate", async (interaction) => {
  if (!interaction.isChatInputCommand() || interaction.commandName !== "speak") return;

  const voiceChannel = interaction.member?.voice?.channel;
  if (!voiceChannel) {
    return interaction.reply({ content: "You need to be in a voice channel first.", ephemeral: true });
  }

  const text = interaction.options.getString("text");
  const voiceId = interaction.options.getString("voice") ?? "af_heart";

  await interaction.deferReply({ ephemeral: true });

  try {
    const audioStream = await synthesize(text, voiceId);

    const connection = joinVoiceChannel({
      channelId: voiceChannel.id,
      guildId: interaction.guildId,
      adapterCreator: interaction.guild.voiceAdapterCreator,
      selfDeaf: false,
    });

    const player = createAudioPlayer();
    const resource = createAudioResource(audioStream, { inputType: StreamType.Arbitrary });
    player.play(resource);
    connection.subscribe(player);

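    // Leave the channel once playback ends. A player error is followed by a
    // transition to Idle, and destroying a connection twice throws, so guard it.
    const leave = () => {
      if (connection.state.status !== "destroyed") connection.destroy();
    };
    player.on(AudioPlayerStatus.Idle, leave);
    player.on("error", (err) => { console.error("Player error:", err); leave(); });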

    await interaction.editReply({ content: `Speaking: "${text}"` });
  } catch (err) {
    console.error("TTS error:", err);
    await interaction.editReply({ content: "Failed to synthesize audio. Check logs." });
  }
});

client.login(process.env.DISCORD_TOKEN);

Running the Bot

bash

node bot.js

While connected to a voice channel in Discord, run /speak with the text option set to "Hello, this is your TTS bot". The bot joins, speaks the text, then leaves.
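The bot.js above creates a new player for every command, so two /speak commands issued close together in the same server can cut each other off. One way to handle this is to run playback jobs for each server one at a time. A minimal sketch, assuming a hypothetical enqueue helper and a job function that resolves only once playback has finished:

```javascript
// Hypothetical per-guild queue: runs playback jobs for one guild strictly in order.
const guildQueues = new Map(); // guildId -> promise for the most recently queued job

function enqueue(guildId, job) {
  const previous = guildQueues.get(guildId) ?? Promise.resolve();
  // Start the job after the previous one settles, whether it succeeded or failed.
  const next = previous.catch(() => {}).then(job);
  guildQueues.set(guildId, next);
  // Drop the map entry once the queue drains so idle guilds do not accumulate.
  next.catch(() => {}).finally(() => {
    if (guildQueues.get(guildId) === next) guildQueues.delete(guildId);
  });
  return next; // rejects if the job itself fails
}
```

In the command handler, the synthesize-and-play steps would move into the job, for example `enqueue(interaction.guildId, () => playOnce(voiceChannel, text, voiceId))`, where playOnce is a hypothetical function wrapping the existing playback code. For the queue to work, that function has to wait for the player to return to Idle before resolving; the entersState helper from @discordjs/voice is one way to do that.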

Character Budgeting for Discord Bots

Discord bots can use characters faster than you expect. A few scenarios:

Use Case	Avg chars/command	Commands/day	Monthly total
Server announcements	200	5	~30,000
Game event narration	150	20	~90,000
Music bot track announces	80	100	~240,000
Full assistant bot	300	50	~450,000

For a small community server with occasional use, Audexum's free 30,000 credits/month covers it. For an active gaming server, the €4/month plan (250K credits) or €12/month (1.2M credits) is the realistic range. See audexum.com/pricing for full plan details.
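The monthly totals in the table are average characters per command × commands per day × 30 days. The same arithmetic can be sketched in code to estimate your own usage. The plan figures below are the ones quoted in this guide, and the function names are hypothetical; the sketch also assumes one credit corresponds to one character.

```javascript
// Plan figures as quoted in this guide; check audexum.com/pricing for current values.
const PLANS = [
  { name: "Free", priceEur: 0, credits: 30_000 },
  { name: "€4/month", priceEur: 4, credits: 250_000 },
  { name: "€12/month", priceEur: 12, credits: 1_200_000 },
];

// Estimated characters per month, assuming a 30-day month by default.
function monthlyChars(avgCharsPerCommand, commandsPerDay, days = 30) {
  return avgCharsPerCommand * commandsPerDay * days;
}

// Smallest listed plan whose credits cover the estimate, or null if none does.
// Assumption: 1 credit = 1 character.
function smallestPlanFor(chars) {
  return PLANS.find((plan) => plan.credits >= chars) ?? null;
}
```

For example, `monthlyChars(150, 20)` gives 90,000 for the game narration row, which `smallestPlanFor` maps to the €4/month plan.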

The referral program also applies here: share your Audexum referral code with your community. Each signup using your code gives both accounts bonus free credits on top of the monthly grant. For a Discord server, this can mean sustained free usage.

Adding Voice Selection for Users

A common pattern is letting users choose from a fixed voice list. In bot.js, replace the free-text voice option with one that offers choices:

javascript

.addStringOption((opt) =>
  opt
    .setName("voice")
    .setDescription("Choose a voice")
    .setRequired(false)
    .addChoices(
      { name: "Heart (American · F)", value: "af_heart" },
      { name: "Michael (American · M)", value: "am_michael" },
      { name: "Emma (British · F)", value: "bf_emma" },
      { name: "George (British · M)", value: "bm_george" }
    )
)

Check the full voice list at audexum.com/docs. With 43 voices across 33 languages, you can support multilingual communities in a single bot.
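Discord allows at most 25 choices per option, so a fixed choice list cannot hold the full catalogue. One alternative is an autocomplete option that filters voices as the user types. A minimal sketch, assuming a hypothetical matchVoices helper and a VOICES array that contains only the four voices listed above:

```javascript
// Voice IDs copied from the choices above; a real list would come from audexum.com/docs.
const VOICES = [
  { name: "Heart (American · F)", value: "af_heart" },
  { name: "Michael (American · M)", value: "am_michael" },
  { name: "Emma (British · F)", value: "bf_emma" },
  { name: "George (British · M)", value: "bm_george" },
];

// Returns up to `limit` voices whose name or ID contains the query (case-insensitive).
function matchVoices(query, voices = VOICES, limit = 25) {
  const q = query.trim().toLowerCase();
  return voices
    .filter((v) => v.name.toLowerCase().includes(q) || v.value.toLowerCase().includes(q))
    .slice(0, limit);
}

// Wiring in discord.js v14, shown as comments because it needs a live client:
//   opt.setName("voice").setDescription("Choose a voice").setAutocomplete(true)
//
//   client.on("interactionCreate", async (interaction) => {
//     if (!interaction.isAutocomplete()) return;
//     await interaction.respond(matchVoices(interaction.options.getFocused()));
//   });
```

An autocomplete option cannot also define fixed choices, and users can still submit arbitrary text, so the handler should validate the voice ID before calling the API.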

Handling Rate Limits and Errors Gracefully

For production bots, wrap the API call in a simple retry helper. This version retries any failure up to two more times, waiting one second between attempts:

javascript

async function synthesizeWithRetry(text, voiceId, maxRetries = 2) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await synthesize(text, voiceId);
    } catch (err) {
      if (attempt === maxRetries) throw err;
      await new Promise((r) => setTimeout(r, 1000));
    }
  }
}
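The wrapper above retries every failure after a fixed delay, including errors that will not succeed on a second attempt, such as an invalid API key. A variant with exponential backoff that only retries transient failures could look like the sketch below. The helper names are hypothetical, and it assumes the thrown error carries a numeric status property, which the tts.js shown earlier does not set by default.

```javascript
// Hypothetical helper: retry an async function with exponential backoff.
// `isRetryable` decides which errors are worth another attempt.
async function withBackoff(fn, { retries = 3, baseDelayMs = 500, isRetryable = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      const delay = baseDelayMs * 2 ** attempt; // 500ms, 1s, 2s with the defaults
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Assumption: errors carry a numeric `status`. For tts.js that would mean setting
// `err.status = response.status` before throwing. Errors without a status
// (for example network failures) are treated as retryable too.
function isTransient(err) {
  return err.status === undefined || err.status === 429 || err.status >= 500;
}
```

Usage would be along the lines of `withBackoff(() => synthesize(text, voiceId), { isRetryable: isTransient })`. Keep the total wait short: Discord gives a deferred reply a limited window, and users in a voice channel will not wait long for audio.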

For high-traffic bots, track character usage in your own database and alert before hitting plan limits.
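As a starting point for that tracking, the sketch below counts characters per calendar month in memory and flags when usage crosses a threshold. The UsageTracker class is hypothetical; a production bot would persist the counts in a database, and the provider may count characters differently from JavaScript's string length.

```javascript
// Hypothetical in-memory tracker; swap the Map for a database table in production.
class UsageTracker {
  constructor(monthlyLimit, alertRatio = 0.8) {
    this.monthlyLimit = monthlyLimit; // e.g. 30000 for the free tier quoted above
    this.alertRatio = alertRatio;     // alert once this fraction of the limit is used
    this.used = new Map();            // "YYYY-MM" -> characters used that month
  }

  // Month bucket in UTC, e.g. "2026-05".
  static monthKey(date = new Date()) {
    return date.toISOString().slice(0, 7);
  }

  // Adds text.length to the month's total and reports whether to alert.
  record(text, date = new Date()) {
    const key = UsageTracker.monthKey(date);
    const total = (this.used.get(key) ?? 0) + text.length;
    this.used.set(key, total);
    return { total, shouldAlert: total >= this.monthlyLimit * this.alertRatio };
  }
}
```

Calling `tracker.record(text)` just before each synthesize call keeps the count close to what is actually sent; failed requests may or may not be billed, so treat the total as an estimate.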

Alternatives for Discord Bot TTS

Google Cloud TTS — Best free volume (1M chars/month), requires a billing account. Slightly more complex setup with the SDK, but reliable at scale.
OpenAI TTS — No free tier, $15/1M chars. Justified if you are already paying for OpenAI APIs and want to consolidate billing. Good voice quality for longer phrases.
ElevenLabs — High voice quality, 10K free chars with no commercial rights. At bot scale, it gets expensive fast. Full comparison here.
Self-hosted (Coqui/Piper) — Zero API cost once running, but requires a server with GPU or CPU resources, ongoing maintenance, and model management.

Deploying the Bot

For production, run the bot as a process managed by pm2 (install it globally with npm install -g pm2 if needed):

bash

pm2 start bot.js --name discord-tts-bot
pm2 save

Make sure your .env file is protected (chmod 600 .env) and not committed to version control.

By Petar, founder of Audexum. The Discord bot use case was one of the first things people asked about after launch — this guide covers exactly what I wish had existed when I was building the first integration.