Tutorial · Python · API

How to Add Text-to-Speech to Your Python App in 5 Minutes

March 28, 2026 · 6 min read

Adding text-to-speech to a Python app used to mean installing heavy ML libraries, downloading multi-gigabyte model weights, and fighting GPU drivers — just to hear a single sentence. The OuteAI API gives you studio-quality speech from a simple HTTP call. The official Python SDK wraps those calls so you can ship in minutes, not days.

This guide walks through everything from installation to streaming real-time audio, using real code you can drop straight into a project.

Why text-to-speech belongs in your Python app

Voice interfaces have gone from a nice-to-have to a genuine user expectation. A few concrete places where TTS integration pays off:

  • Accessibility — users with visual impairments or reading difficulties can consume your content by ear.
  • Content automation — turn articles, newsletters, or documentation into audio with zero manual recording.
  • AI assistants — give your chatbot or LLM a voice so responses feel immediate and natural.
  • E-learning — narrate course modules automatically, keeping your voice consistent across every lesson.
  • Notifications — read out alerts, summaries, or status updates in ambient environments where screens aren't practical.

All of the above are one API call away.

Prerequisites

Before writing any code, make sure you have:

  • Python 3.9 or newer
  • An OuteAI account — sign up here if you don't have one
  • An API token from your Account page
  • Credits in your account (generation costs 0.001 credits / second of audio)

No GPU required on your end. Everything runs on OuteAI's infrastructure.
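At 0.001 credits per second of audio, you can estimate cost before generating anything. A rough back-of-envelope helper (the words-per-minute figure is an assumption about typical speaking pace, not a value from the API):

```python
def estimate_credits(text: str, words_per_minute: int = 150) -> float:
    """Rough credit estimate at 0.001 credits / second of audio.

    Assumes ~150 spoken words per minute; actual duration depends
    on the voice and the text itself.
    """
    words = len(text.split())
    seconds = words / words_per_minute * 60
    return seconds * 0.001

# ~300 words at ~150 wpm is roughly 120 seconds, so about 0.12 credits:
print(round(estimate_credits("word " * 300), 3))
```

Treat this as a sanity check, not billing math: the API charges on the actual audio length it produces.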

Install the SDK

The outeai package is on PyPI. One line to install:

pip install outeai

If you need the async client for use with asyncio or FastAPI, install the optional extra:

pip install "outeai[async]"

That's it. No CUDA setup, no model downloads, no native dependencies.

List available voices

Every generation call requires a voice_id. OuteAI ships with a library of built-in voices, and any voice clones you create live alongside them. Before generating audio, list what's available on your account:

from outeai import OuteAI

client = OuteAI("oute_xxxxxxxxxxxxxxxxxxxx")
voices = client.list_voices()

for v in voices:
    print(v["voice_id"], "—", v["speaker_name"])

Each entry has a voice_id (a UUID string you pass to generation calls) and a human-readable speaker_name. Copy the voice_id of whichever voice fits your use case.

Built-in voices cover English, Spanish, French, German, Japanese, Chinese, Arabic, and many more. See the full language list on the home page.
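Since each entry is a plain dict with voice_id and speaker_name, picking a voice by its human-readable name is a few lines. A small sketch (the sample data below is made up; only the two field names come from the API response described above):

```python
def find_voice_id(voices, speaker_name):
    """Return the voice_id of the first voice whose speaker_name matches."""
    for v in voices:
        if v["speaker_name"] == speaker_name:
            return v["voice_id"]
    raise KeyError(f"No voice named {speaker_name!r}")

# Dummy data shaped like the list_voices() response:
voices = [
    {"voice_id": "11111111-aaaa", "speaker_name": "Aria"},
    {"voice_id": "22222222-bbbb", "speaker_name": "Kenji"},
]
print(find_voice_id(voices, "Kenji"))
```

In a real script you would pass the output of client.list_voices() instead of the dummy list.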

Generate speech and save to a file

Pass your text and voice_id to generate_speech(). The call blocks until the full audio is ready, then returns a SpeechResult containing the raw WAV bytes.

from outeai import OuteAI

client = OuteAI("oute_xxxxxxxxxxxxxxxxxxxx")

result = client.generate_speech(
    text="Hello! This audio was generated by OuteAI.",
    voice_id="your-voice-uuid-here",
)

result.save("output.wav")
print("Saved to output.wav")

The .save() method writes the WAV file to disk. If you'd rather work with the raw bytes — for example, to serve the audio directly over HTTP — use result.wav_bytes.
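Because wav_bytes is a standard WAV payload, you can inspect it in memory with Python's built-in wave module, for example to log the clip's duration before serving it. A sketch (in real code the bytes would come from result.wav_bytes):

```python
import io
import wave

def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Compute duration from in-memory WAV bytes without touching disk."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()
```

This avoids a temp-file round trip when all you need is metadata.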

One-liner shortcut

Skip the intermediate object entirely with generate_speech_to_file():

client.generate_speech_to_file(
    "output.wav",
    text="Saving directly to disk in a single call.",
    voice_id="your-voice-uuid-here",
)

Serving audio over HTTP (Flask example)

To return audio directly from a web endpoint without touching the filesystem:

from flask import Flask, Response
from outeai import OuteAI
import os

app = Flask(__name__)
client = OuteAI(os.environ["OUTEAI_API_KEY"])

@app.route("/speak")
def speak():
    result = client.generate_speech(
        text="Welcome to my app.",
        voice_id="your-voice-uuid-here",
    )
    return Response(result.wav_bytes, mimetype="audio/wav")

Stream audio in real time

Non-streaming generation waits for the full audio clip before returning anything. For longer texts or latency-sensitive applications — live assistants, real-time voice interfaces, audio players — streaming starts delivering chunks as soon as generation begins.

from outeai import OuteAI

client = OuteAI("oute_xxxxxxxxxxxxxxxxxxxx")

with client.stream_speech(
    text="Streaming lets you start playing audio before generation is finished.",
    voice_id="your-voice-uuid-here",
) as stream:
    with open("streamed_output.wav", "wb") as f:
        for chunk in stream:
            f.write(chunk)

Each chunk is a bytes object containing a fragment of WAV audio. You can pipe these chunks directly to a player, a WebSocket connection, or any downstream consumer — no need to wait for the loop to finish before starting playback.
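One common pattern is to decouple the chunk loop from playback by pushing chunks onto a queue that another thread drains. A minimal sketch of that pattern, with simulated chunks standing in for the stream:

```python
import queue
import threading

def consume(q: queue.Queue, sink: list) -> None:
    """Drain chunks until the None sentinel arrives."""
    while True:
        chunk = q.get()
        if chunk is None:
            break
        sink.append(chunk)  # real code: hand the chunk to an audio player

q: queue.Queue = queue.Queue()
received: list = []
t = threading.Thread(target=consume, args=(q, received))
t.start()

# In real code this loop is `for chunk in stream:` from stream_speech().
for chunk in [b"RIFF", b"data", b"\x00\x01"]:
    q.put(chunk)
q.put(None)  # signal end of stream
t.join()

print(len(received), "chunks received")
```

The producer (the stream loop) never blocks on the player, and the sentinel cleanly shuts the consumer down when generation ends.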

The convenience method does the same thing in one call:

client.stream_speech_to_file(
    "streamed_output.wav",
    text="Your narration text goes here.",
    voice_id="your-voice-uuid-here",
)

GPU cold starts: If the OuteAI GPU hasn't been used recently, the first request may take 10–40 seconds before audio begins. Subsequent requests within the same active window are fast. For latency-sensitive apps, send a short warm-up request at startup.

Secure your API key

Never hardcode an API key in source code. Load it from an environment variable instead:

import os
from outeai import OuteAI

client = OuteAI(os.environ["OUTEAI_API_KEY"])

Set the variable in your shell before running:

export OUTEAI_API_KEY="oute_xxxxxxxxxxxxxxxxxxxx"
python your_script.py

For projects that use .env files, python-dotenv handles loading automatically:

pip install python-dotenv

from dotenv import load_dotenv
load_dotenv()  # reads .env before the rest of your script
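Either way, a small fail-fast check gives a clearer error than a raw KeyError when the variable is missing. A sketch (the helper name is made up):

```python
import os

def load_api_key(var: str = "OUTEAI_API_KEY") -> str:
    """Fail fast with a readable message if the key isn't set."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running.")
    return key
```

Calling load_api_key() at the top of your script surfaces a missing key immediately instead of at the first API call.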

You can create, rename, and revoke API tokens at any time from your Account page. If a token is ever exposed, revoke it immediately and issue a new one.

What's next

You now have everything needed to generate high-quality speech from Python. A few natural directions from here:

  • Clone a voice — upload a 10-second audio sample and get a voice_id you can reuse across any call. Read the voice cloning guide.
  • Explore the full API reference — the REST API lets you do everything the SDK does, from any language or environment. See the API docs.
  • Check your credit balance — each second of audio costs 0.001 credits. Track usage from your account page.
  • Try Studio — if you want to generate audio without any code, the Studio interface has the same voices and model in a browser UI.

Get started

Ready to build with speech AI?

No subscription, no seats. Top up credits and spend them only when you generate audio. Credits never expire.