9 min read · By Petar

TTS API for Discord Bot: Complete Setup Guide with discord.js (2026)

Build a Discord bot that synthesizes speech and plays it in a voice channel using Audexum's TTS API and discord.js. Full working code included.


Discord's built-in TTS (/tts) is useful for five minutes before it becomes annoying. If you want a bot that speaks in a voice channel with an actual voice — not the robotic browser TTS — you need an external TTS API. This guide covers the full setup: picking a TTS API, integrating it with discord.js v14, and playing audio in a voice channel. All code is copy-paste ready.

What You Need

  • Node.js 18+ (the code relies on the built-in fetch and Readable.fromWeb)
  • A Discord bot application and token
  • A TTS API key (Audexum free tier works: 10,000 chars/month, no credit card)
  • The discord.js, @discordjs/voice, @discordjs/opus, ffmpeg-static, and dotenv packages

The bot flow is: user types a command → bot calls TTS API → receives MP3 audio → plays it in the user's current voice channel.

Choosing a TTS API for Your Discord Bot

Your main constraints for Discord bot use are: latency (voice channels are interactive), cost at scale (bots can generate a lot of audio fast), and simplicity (you do not need voice cloning — you need a reliable REST endpoint).

| Provider | Latency (typical) | Free Tier | Cost at 1M chars | API Simplicity |
| --- | --- | --- | --- | --- |
| Audexum | ~300–700ms | 10K/mo | €8 | Simple REST, Bearer auth |
| ElevenLabs | ~400–900ms | 10K/mo (no commercial) | $11–$330 | Good docs |
| OpenAI TTS | ~600–1200ms | None | $15 | Simple REST |
| Google Cloud TTS | ~200–500ms | 1M/mo (billing req.) | $4–$16 | SDK or REST |
| AWS Polly | ~200–600ms | 5M (12-mo trial) | $4 | SDK required |

For Discord bots, latency matters more than it does for batch pipelines: a reply that takes more than about a second feels sluggish in an interactive voice session. By the typical figures above, Google Cloud TTS and AWS Polly respond fastest, while Audexum's REST API is the simplest to integrate: no SDK required, Bearer token auth, binary audio response.

See Audexum vs OpenAI TTS for a detailed latency and cost breakdown.

Project Setup

bash
mkdir discord-tts-bot && cd discord-tts-bot
npm init -y
npm install discord.js @discordjs/voice @discordjs/opus ffmpeg-static dotenv

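Depending on your @discordjs/voice version and Node.js build, voice connections may also need an encryption package such as libsodium-wrappers. If the bot joins a channel but fails with an encryption-related error, log generateDependencyReport() from @discordjs/voice to see which optional libraries were detected.

Create a .env file: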

env
DISCORD_TOKEN=your_discord_bot_token
DISCORD_CLIENT_ID=your_application_client_id
AUDEXUM_API_KEY=your_audexum_api_key

Get your Audexum API key from audexum.com/signup — the free tier (10K chars/month) is enough to prototype a bot.
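A missing variable in .env tends to surface later as a confusing authentication error. As a minimal sketch, the bot could check for the three variables above at startup. The helper names `missingEnv` and `assertEnv` are my own and are not part of dotenv or discord.js.

```javascript
// Names of the variables defined in the .env file above.
const REQUIRED_ENV = ["DISCORD_TOKEN", "DISCORD_CLIENT_ID", "AUDEXUM_API_KEY"];

// Hypothetical helper: return the required names that are unset or blank.
function missingEnv(env = process.env, names = REQUIRED_ENV) {
  return names.filter((name) => !env[name] || env[name].trim() === "");
}

// Hypothetical helper: stop early with a readable message instead of failing on first use.
function assertEnv(env = process.env) {
  const missing = missingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

Calling `assertEnv()` right after `require("dotenv").config()` would make a misconfigured deployment fail immediately with the list of missing names.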

The TTS Helper

Create tts.js — this handles the API call and returns a readable stream for audio playback:

javascript
// tts.js
const { Readable } = require("stream");

const AUDEXUM_API = "https://audexum.com/api/synthesize";

async function synthesize(text, voiceId = "af_heart") {
  const response = await fetch(AUDEXUM_API, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.AUDEXUM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text,
      voice: voiceId,
      format: "mp3",
    }),
  });

  if (!response.ok) {
    const body = await response.text();
    throw new Error(`Audexum API error ${response.status}: ${body}`);
  }

  // Convert the fetch body (Web ReadableStream) to a Node.js Readable
  return Readable.fromWeb(response.body);
}

module.exports = { synthesize };
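Before wiring in Discord, it can help to confirm that the API key and endpoint work by writing one clip to disk. The sketch below is a smoke test under stated assumptions: `saveSpeech` is a hypothetical helper, and it takes the synthesize function as a parameter so it works with any function that resolves to a Node.js Readable.

```javascript
const fs = require("fs");
const { pipeline } = require("stream/promises");

// Hypothetical helper: stream the audio from a synthesize-style function into a file.
// Resolves to the number of bytes written.
async function saveSpeech(synthesizeFn, text, outPath) {
  const audio = await synthesizeFn(text);
  // pipeline() waits for the write to finish and rejects if either stream errors.
  await pipeline(audio, fs.createWriteStream(outPath));
  return fs.statSync(outPath).size;
}

// Usage with the helper above (needs AUDEXUM_API_KEY in the environment):
//   const { synthesize } = require("./tts");
//   saveSpeech(synthesize, "Hello from the bot", "hello.mp3").then(console.log);
```

If the resulting file plays in a normal audio player, any later problem is on the Discord side rather than in the TTS call.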

The Bot: Voice Channel Playback

Create bot.js — handles Discord commands and pipes TTS audio into a voice channel:

javascript
// bot.js
require("dotenv").config();
const {
  Client, GatewayIntentBits, REST, Routes, SlashCommandBuilder,
} = require("discord.js");
const {
  joinVoiceChannel, createAudioPlayer, createAudioResource,
  AudioPlayerStatus, StreamType,
} = require("@discordjs/voice");
const { synthesize } = require("./tts");

const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildVoiceStates, // needed to see which voice channel a member is in
  ],
});

async function registerCommands() {
  const commands = [
    new SlashCommandBuilder()
      .setName("speak")
      .setDescription("Speak text in your current voice channel")
      .addStringOption((opt) =>
        opt.setName("text").setDescription("What to say").setRequired(true).setMaxLength(500)
      )
      .addStringOption((opt) =>
        opt.setName("voice").setDescription("Voice ID (optional)").setRequired(false)
      )
      .toJSON(),
  ];
  const rest = new REST().setToken(process.env.DISCORD_TOKEN);
  await rest.put(Routes.applicationCommands(process.env.DISCORD_CLIENT_ID), { body: commands });
  console.log("Slash commands registered.");
}

client.once("ready", async () => {
  console.log(`Logged in as ${client.user.tag}`);
  await registerCommands();
});

client.on("interactionCreate", async (interaction) => {
  if (!interaction.isChatInputCommand() || interaction.commandName !== "speak") return;

  const voiceChannel = interaction.member?.voice?.channel;
  if (!voiceChannel) {
    return interaction.reply({ content: "You need to be in a voice channel first.", ephemeral: true });
  }

  const text = interaction.options.getString("text");
  const voiceId = interaction.options.getString("voice") ?? "af_heart";

  await interaction.deferReply({ ephemeral: true });

  try {
    const audioStream = await synthesize(text, voiceId);

    const connection = joinVoiceChannel({
      channelId: voiceChannel.id,
      guildId: interaction.guildId,
      adapterCreator: interaction.guild.voiceAdapterCreator,
      selfDeaf: false,
    });

    const player = createAudioPlayer();
    const resource = createAudioResource(audioStream, { inputType: StreamType.Arbitrary });
    player.play(resource);
    connection.subscribe(player);

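    // A stream error also moves the player to Idle, so clean up in one place only.
    // destroy() throws if the connection was already destroyed, hence the status check.
    player.on(AudioPlayerStatus.Idle, () => {
      if (connection.state.status !== "destroyed") connection.destroy();
    });
    player.on("error", (err) => console.error("Player error:", err));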

    await interaction.editReply({ content: `Speaking: "${text}"` });
  } catch (err) {
    console.error("TTS error:", err);
    await interaction.editReply({ content: "Failed to synthesize audio. Check logs." });
  }
});

client.login(process.env.DISCORD_TOKEN);

Running the Bot

bash
node bot.js

While you are in a voice channel, type /speak text:Hello, this is your TTS bot in any text channel on the server. The bot joins your voice channel, speaks the text, then leaves.
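One limitation of the bot as written: each /speak call creates its own player and destroys the connection when that player goes idle, so two commands issued close together in the same server can cut each other off. One way to avoid that is to chain requests per guild so each clip starts only after the previous one has finished. The sketch below uses a hypothetical `enqueue` helper that is not part of discord.js.

```javascript
// guildId -> promise for the most recently queued task in that guild
const tails = new Map();

// Hypothetical helper: run `task` only after earlier tasks for the same guild have settled.
function enqueue(guildId, task) {
  const previous = tails.get(guildId) ?? Promise.resolve();
  // Ignore the previous task's failure so one bad request does not block the queue.
  const next = previous.catch(() => {}).then(() => task());
  tails.set(guildId, next);
  // Remove the entry once this task has settled and nothing newer was queued.
  const cleanup = () => {
    if (tails.get(guildId) === next) tails.delete(guildId);
  };
  next.then(cleanup, cleanup);
  return next;
}
```

For this to serialize playback, the task passed to `enqueue` has to resolve only when the clip ends, for example by wrapping the player's Idle event in a Promise.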

Character Budgeting for Discord Bots

Discord bots can use characters faster than you expect. A few scenarios:

| Use Case | Avg chars/command | Commands/day | Monthly total |
| --- | --- | --- | --- |
| Server announcements | 200 | 5 | ~30,000 |
| Game event narration | 150 | 20 | ~90,000 |
| Music bot track announces | 80 | 100 | ~240,000 |
| Full assistant bot | 300 | 50 | ~450,000 |

For a small community server with occasional use, Audexum's free 10,000 chars/month covers it: that is about 50 commands of 200 characters each. For an active gaming server, the €4/month plan (100K chars) or €12/month (500K chars) is the realistic range. See audexum.com/pricing for full plan details.
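The monthly totals in the table are average characters per command × commands per day × 30 days. A small estimator makes it easy to plug in your own numbers. The function names are illustrative, and the plan sizes are the ones quoted in this section, so check the pricing page before relying on them.

```javascript
// Estimate monthly character usage, assuming a 30-day month as the table does.
function monthlyChars(avgCharsPerCommand, commandsPerDay, daysPerMonth = 30) {
  return avgCharsPerCommand * commandsPerDay * daysPerMonth;
}

// Plan sizes quoted in this section; prices and limits may change.
const PLANS = [
  { name: "free", monthlyChars: 10_000 },
  { name: "€4/month", monthlyChars: 100_000 },
  { name: "€12/month", monthlyChars: 500_000 },
];

// Return the smallest listed plan that covers the estimate, or null if none does.
function smallestPlanFor(chars) {
  return PLANS.find((plan) => plan.monthlyChars >= chars) ?? null;
}

console.log(monthlyChars(150, 20)); // 90000 (game event narration)
console.log(smallestPlanFor(monthlyChars(150, 20)).name); // "€4/month"
```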

The referral program also applies here: share your Audexum referral code with your community. Each signup using your code gives both accounts +10,000 free characters. For a Discord server, this can mean sustained free usage.

Adding Voice Selection for Users

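A common pattern is letting users choose from a voice list. Replace the free-form voice option in registerCommands with one that offers a fixed set of choices (a command cannot have two options with the same name):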

javascript
.addStringOption((opt) =>
  opt
    .setName("voice")
    .setDescription("Choose a voice")
    .setRequired(false)
    .addChoices(
      { name: "Heart (American · F)", value: "af_heart" },
      { name: "Michael (American · M)", value: "am_michael" },
      { name: "Emma (British · F)", value: "bf_emma" },
      { name: "George (British · M)", value: "bm_george" }
    )
)

Check the full voice list at audexum.com/docs. With 43 voices across 33 languages, you can support multilingual communities in a single bot.
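To keep the choice list and the default voice from drifting apart, one option is to define the voices once and derive everything else from that list. This is a sketch: `VOICES`, `voiceChoices`, and `resolveVoice` are hypothetical names, and the four entries are the ones shown in the choices above.

```javascript
// Voice IDs and labels taken from the choices above; the full list is in the Audexum docs.
const VOICES = [
  { id: "af_heart", label: "Heart (American · F)" },
  { id: "am_michael", label: "Michael (American · M)" },
  { id: "bf_emma", label: "Emma (British · F)" },
  { id: "bm_george", label: "George (British · M)" },
];
const DEFAULT_VOICE = "af_heart";

// Shape expected by addChoices(): objects with a name and a value.
const voiceChoices = VOICES.map(({ id, label }) => ({ name: label, value: id }));

// Hypothetical helper: fall back to the default when the ID is missing or unknown.
function resolveVoice(id) {
  return VOICES.some((voice) => voice.id === id) ? id : DEFAULT_VOICE;
}
```

The command builder would then call `.addChoices(...voiceChoices)`, and the handler would pass `resolveVoice(interaction.options.getString("voice"))` to the TTS helper. Discord caps a single option at 25 choices, so a bot that exposes a longer voice list needs autocomplete or several commands instead.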

Handling Rate Limits and Errors Gracefully

Add a simple retry wrapper around the API call for production bots:

javascript
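// Assumes synthesize() from tts.js is in scope.
async function synthesizeWithRetry(text, voiceId, maxRetries = 2) {
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await synthesize(text, voiceId);
    } catch (err) {
      // Out of attempts: surface the last error to the caller.
      if (attempt === maxRetries) throw err;
      // Wait one second before trying again.
      await new Promise((r) => setTimeout(r, 1000));
    }
  }
}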
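The wrapper above retries every failure, including ones that cannot succeed on a second attempt, such as a rejected API key. A possible refinement is to retry only on rate limits, server errors, and network failures. This sketch assumes errors keep the `Audexum API error <status>: ...` message format thrown by tts.js and that the API uses conventional HTTP status codes. I have not verified which codes Audexum returns when rate limiting.

```javascript
// Extract the HTTP status from an error thrown by synthesize(), or null if there is none
// (for example, a network failure raised by fetch itself).
function statusFromError(err) {
  const match = /Audexum API error (\d{3})/.exec(err?.message ?? "");
  return match ? Number(match[1]) : null;
}

// Hypothetical policy: retry rate limits (429), server errors (5xx), and errors without a status.
function isRetryable(err) {
  const status = statusFromError(err);
  return status === null || status === 429 || status >= 500;
}

// Exponential backoff: 1s, 2s, 4s, then capped at 8s.
function backoffMs(attempt, baseMs = 1000, capMs = 8000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

Inside the retry loop, the catch block would then rethrow when `attempt === maxRetries || !isRetryable(err)` and otherwise wait `backoffMs(attempt)` milliseconds.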

For high-traffic bots, track character usage in your own database and alert before hitting plan limits.
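As a starting point for that tracking, here is an in-memory sketch. `UsageTracker` is a hypothetical class, a real bot would persist the counters in a database, and the calendar-month reset is an assumption about how the plan quota renews.

```javascript
// In-memory sketch; counters are lost on restart.
class UsageTracker {
  constructor(monthlyLimit, alertRatio = 0.8) {
    this.monthlyLimit = monthlyLimit;
    this.alertRatio = alertRatio; // fraction of the limit at which to start alerting
    this.used = new Map(); // "YYYY-MM" -> characters used that month
  }

  // Key usage by calendar month (UTC) so counters roll over on their own.
  static monthKey(date = new Date()) {
    return date.toISOString().slice(0, 7);
  }

  // Record a request and report whether the alert threshold has been reached.
  // text.length counts UTF-16 code units, which may differ from how the provider counts.
  record(text, date = new Date()) {
    const key = UsageTracker.monthKey(date);
    const total = (this.used.get(key) ?? 0) + text.length;
    this.used.set(key, total);
    return {
      total,
      remaining: Math.max(0, this.monthlyLimit - total),
      alert: total >= this.monthlyLimit * this.alertRatio,
    };
  }
}
```

Calling `record(text)` just before each synthesize call gives the bot a running total it can log or post to an admin channel once `alert` turns true.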

Alternatives for Discord Bot TTS

  • Google Cloud TTS — Best free volume (1M chars/month), requires a billing account. Slightly more complex setup with the SDK, but reliable at scale.
  • OpenAI TTS — No free tier, $15/1M chars. Justified if you are already paying for OpenAI APIs and want to consolidate billing. Good voice quality for longer phrases.
  • ElevenLabs — High voice quality, 10K free chars with no commercial rights. At bot scale, it gets expensive fast. Full comparison here.
  • Self-hosted (Coqui/Piper) — Zero API cost once running, but requires a server with GPU or CPU resources, ongoing maintenance, and model management.

Deploying the Bot

For production, run the bot as a process managed by pm2:

bash
npm install -g pm2
pm2 start bot.js --name discord-tts-bot
pm2 save

Make sure your .env file is protected (chmod 600 .env) and not committed to version control.


By Petar, founder of Audexum. The Discord bot use case was one of the first things people asked about after launch — this guide covers exactly what I wish had existed when I was building the first integration.
