Introduction
OuteAI provides high-quality text-to-speech generation through two interfaces:
- Studio: a browser-based interface for generating and managing audio. No code required. Well suited to one-off generations, content production, and voice experimentation.
- API / SDK: a REST API with an official Python SDK for building applications, automating workflows, or integrating TTS into your product.
Both interfaces use the same underlying model, voice library, and credits balance. Audio is billed per second of generated audio at 0.001 credits / second.
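Because billing is a flat per-second rate, costs can be estimated before generating anything. The following is a minimal sketch of that arithmetic; the helper name `estimate_credits` is hypothetical and not part of the SDK.

```python
from decimal import Decimal

# Rate quoted above: 0.001 credits per second of generated audio.
CREDITS_PER_SECOND = Decimal("0.001")


def estimate_credits(duration_seconds: float) -> Decimal:
    """Return the credits billed for the given length of generated audio."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Decimal avoids binary floating-point rounding in the result.
    return CREDITS_PER_SECOND * Decimal(str(duration_seconds))


print(estimate_credits(60))    # one minute of audio
print(estimate_credits(3600))  # one hour of audio
```

At this rate, one minute of audio comes to 0.06 credits and one hour to 3.6 credits.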
GPU Cold Starts
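The first request after a period of inactivity can take noticeably longer while a GPU starts up (a cold start). This applies to both Studio and the API. Subsequent requests within the same active window are fast.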
In practice: if you are building a latency-sensitive application, consider sending a short warm-up request (e.g., a single word) when your application starts, then making real requests immediately after. You will still be billed normally for the warm-up audio.
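The warm-up idea above could be sketched as follows. This is an illustration, not an SDK feature: `warm_up` is a hypothetical helper, and it assumes only that the client exposes the `generate_speech(text=..., voice_id=...)` call shown in the Quick Start section.

```python
import time


def warm_up(client, voice_id: str, text: str = "Hi") -> float:
    """Send one short request to absorb any cold start; return elapsed seconds."""
    started = time.perf_counter()
    # The generated audio is discarded; only the side effect of waking the GPU matters.
    client.generate_speech(text=text, voice_id=voice_id)
    return time.perf_counter() - started
```

Calling `warm_up(client, voice_id)` once at application start-up, and logging the returned duration, gives a rough picture of how long cold starts take in your environment.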
Authentication
All API requests are authenticated with a Bearer token. You can create and manage API tokens from your Account page.
Authorization: Bearer oute_xxxxxxxxxxxxxxxxxxxxxxxx
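If you call the HTTP API without the SDK, the header can be attached with Python's standard library. The sketch below only builds the request and does not send it; the helper name `authorized_request` is hypothetical, while the base URL and the `/api/v1/voices` path are the ones given in the HTTP API Reference.

```python
import urllib.request

BASE_URL = "https://outeai.com"


def authorized_request(token: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the Bearer token."""
    request = urllib.request.Request(BASE_URL + path)
    request.add_header("Authorization", f"Bearer {token}")
    return request


request = authorized_request("oute_xxxxxxxxxxxxxxxxxxxxxxxx", "/api/v1/voices")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call.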
Studio
Using Studio
Studio is a browser-based interface at /studio. Sign in to get started — no setup required.
1. Select a voice
Choose from built-in voices or any voice clone you have created. Voices are shown in the composer at the bottom of the page.
2. Type your text
Enter your text in the composer. Studio supports up to 5,000 characters per generation. Press Enter to generate, or Shift+Enter for a new line.
3. Play, download, or continue
Generated audio appears in your session feed with an inline waveform player. Sessions persist — come back any time to continue generating or download previous audio.
Voice Cloning
Create a custom voice from a reference audio sample in the Voice Clone section of Studio.
Audio requirements
| Requirement | Value |
|---|---|
| Format | WAV or MP3 |
| Duration | 5 – 10 seconds |
| Max file size | 10 MB |
| Speaker | Single speaker, clean audio, minimal background noise |
Once cloned, your voice appears in Studio's voice selector and is available via the API using its UUID. Clones can be deleted at any time from the Voice Clone page.
Voice cloning is billed at 0.025 credits / voice. Clones are stored until you delete them.
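Checking a sample locally before uploading can save a failed request. The sketch below covers only WAV files (the standard library cannot read MP3) and only the duration and size limits from the table above; it cannot judge speaker count or background noise. The helper name is hypothetical, and treating "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
import os
import wave

MIN_SECONDS, MAX_SECONDS = 5.0, 10.0
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" interpreted as 10 MiB


def check_wav_sample(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file passes."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 10 MB")
    with wave.open(path, "rb") as wav:
        # Duration is the frame count divided by the frame rate.
        seconds = wav.getnframes() / wav.getframerate()
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration is {seconds:.1f} s, expected 5 to 10 s")
    return problems
```

An empty return value from `check_wav_sample("sample.wav")` means the file is within both limits.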
Python SDK
Installation
Install the outeai package from PyPI. Python 3.9+ is required.
# Synchronous client only
pip install outeai
# Synchronous + async client (installs httpx)
pip install "outeai[async]"
Quick Start
List available voices, generate speech, and save it to a file.
from outeai import OuteAI
# Initialise the client; use a context manager to auto-close
with OuteAI("oute_xxxxxxxxxxxxxxxxxxxxxxxx") as client:
    # List all voices available on your account
    voices = client.list_voices()
    for v in voices:
        print(v["voice_id"], v["speaker_name"])
    # Generate speech (non-streaming, returns full WAV)
    audio = client.generate_speech(
        text="Hello, world! This is OuteAI.",
        voice_id=voices[0]["voice_id"],
    )
    # Save to a WAV file
    audio.save("hello.wav")
Generate and save in one call
client.generate_speech_to_file(
    "output.wav",
    text="Saving directly to a file.",
    voice_id="your-voice-uuid",
)
SpeechResult object
generate_speech() returns a SpeechResult with the following attributes:
| Attribute | Type | Description |
|---|---|---|
| wav_bytes | bytes | Raw WAV file bytes |
| content_type | str | Always audio/wav |
| sample_rate | int | 24000 Hz |
| channels | int | 1 (mono) |
| sample_width | int | 2 bytes (16-bit PCM) |
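Since `wav_bytes` is a complete WAV file, the format fields in the table can be cross-checked, and the clip duration derived, with the standard `wave` module. This is a sketch; `describe_wav` is a hypothetical helper, not an SDK method.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Read format fields and duration from in-memory WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }
```

With a result from `generate_speech()`, `describe_wav(audio.wav_bytes)["duration_seconds"]` gives the length of the clip, which is also the quantity billing is based on.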
Streaming
Use stream_speech() to receive audio chunks as they are generated. This is useful for real-time playback or piping audio to another process.
with client.stream_speech(
    text="Streaming audio chunk by chunk.",
    voice_id="your-voice-uuid",
) as stream:
    # Iterate over raw WAV chunks
    for chunk in stream:
        my_audio_pipe.write(chunk)
Stream directly to a file
with client.stream_speech(
    text="Saving a stream.",
    voice_id="your-voice-uuid",
) as stream:
    stream.save("streamed.wav")
# Or use the one-liner shorthand
client.stream_speech_to_file(
    "streamed.wav",
    text="Saving a stream.",
    voice_id="your-voice-uuid",
)
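When piping a stream somewhere other than a file, a small helper that forwards chunks and counts bytes can be handy. The sketch below assumes only that the stream is iterable and yields `bytes`, as in the examples above; `pipe_chunks` is a hypothetical name.

```python
from typing import BinaryIO, Iterable


def pipe_chunks(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write every chunk to sink as it arrives and return the total byte count."""
    total = 0
    for chunk in chunks:
        sink.write(chunk)
        total += len(chunk)
    return total
```

Inside a `with client.stream_speech(...) as stream:` block, `pipe_chunks(stream, sys.stdout.buffer)` would forward the audio to standard output, for example to feed an external player.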
Async Client
Install with async support and use AsyncOuteAI for non-blocking generation in async applications.
pip install "outeai[async]"
import asyncio
from outeai import AsyncOuteAI
async def main():
    async with AsyncOuteAI("oute_xxxxxxxxxxxxxxxxxxxxxxxx") as client:
        voices = await client.list_voices()
        audio = await client.generate_speech(
            text="Hello from async!",
            voice_id=voices[0]["voice_id"],
        )
        audio.save("async-output.wav")
asyncio.run(main())
Async streaming
async with client.stream_speech(
    text="Async streaming example.",
    voice_id="your-voice-uuid",
) as stream:
    async for chunk in stream:
        my_pipe.write(chunk)
# Or save directly
await client.stream_speech_to_file(
    "streamed.wav",
    text="Async streaming to file.",
    voice_id="your-voice-uuid",
)
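One common reason to use the async client is to generate several clips concurrently. The following sketch bounds concurrency with a semaphore; `generate_many` is a hypothetical helper, and the default limit of 2 is an arbitrary assumption, since this document does not state the actual rate limits.

```python
import asyncio


async def generate_many(client, texts, voice_id, max_concurrent=2):
    """Generate speech for each text concurrently, preserving input order."""
    limit = asyncio.Semaphore(max_concurrent)

    async def one(text):
        # At most max_concurrent requests are in flight at any moment.
        async with limit:
            return await client.generate_speech(text=text, voice_id=voice_id)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(one(t) for t in texts))
```

Within an `async with AsyncOuteAI(...) as client:` block, `await generate_many(client, ["One.", "Two."], voice_id)` would return one result per input text.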
Error Handling
The SDK raises two exception types:
from outeai import OuteAI, OuteAIError, OuteAIAPIError
try:
    audio = client.generate_speech(text="Hello", voice_id="...")
except OuteAIAPIError as e:
    # HTTP error returned by the API
    print(e.status_code)  # e.g. 402, 429, 400
    print(e.code)  # e.g. "INSUFFICIENT_CREDITS"
    print(e.message)  # human-readable description
except OuteAIError as e:
    # Network error or SDK-level validation failure
    print(e)
| Exception | When raised |
|---|---|
| OuteAIAPIError | The API returned a non-2xx response. Has .status_code, .code, and .message. |
| OuteAIError | Base class. Raised for network failures, invalid arguments, or missing optional dependencies. |
HTTP API Reference
Base URL: https://outeai.com. All endpoints require Authorization: Bearer <token>.
List Voices
GET /api/v1/voices returns built-in voices plus any voice clones you have created. Each voice includes a voice_id (UUID) used when generating speech.
Example response
{
  "success": true,
  "data": {
    "voices": [
      {
        "voice_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "speaker_name": "My Cloned Voice",
        "is_clone": true
      }
    ]
  }
}
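A response of this shape can be filtered with a few lines of Python. The sketch below picks out the IDs of cloned voices; `clone_voice_ids` is a hypothetical helper, and it relies only on the `success`, `data.voices`, `voice_id`, and `is_clone` fields shown in the example response.

```python
import json


def clone_voice_ids(response_body: str) -> list[str]:
    """Return the voice_id of every cloned voice in a List Voices response."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise ValueError("response does not report success")
    return [v["voice_id"] for v in payload["data"]["voices"] if v.get("is_clone")]


sample = (
    '{"success": true, "data": {"voices": [{'
    '"voice_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", '
    '"speaker_name": "My Cloned Voice", "is_clone": true}]}}'
)
print(clone_voice_ids(sample))
```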
Generate Speech
POST /api/v1/tts_non_stream generates speech and returns the complete WAV file once synthesis is done. Suitable for short texts or when you need the full file before playback.
Request body (JSON)
| Field | Required | Type | Notes |
|---|---|---|---|
| text | Yes | string | Text to synthesise. Also accepts input (OpenAI-compatible). |
| voice_id | Yes | string | UUID of the voice from GET /api/v1/voices. Also accepts voice. |
| model_name | Yes | string | Model identifier. Currently: outetts_1_pro. Also accepts model. |
| response_format | No | string | Must be wav. Default: wav. |
Example request
curl -X POST https://outeai.com/api/v1/tts_non_stream \
-H "Authorization: Bearer oute_xxxx" \
-H "Content-Type: application/json" \
-H "Accept: audio/wav" \
-d '{
"text": "Hello, world!",
"voice_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"model_name": "outetts_1_pro"
}' \
--output hello.wav
Returns Content-Type: audio/wav binary data on success.
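The same request can be built from Python without the SDK. This sketch mirrors the curl example using only the standard library and stops short of sending anything; `build_tts_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_tts_request(token: str, text: str, voice_id: str,
                      model_name: str = "outetts_1_pro") -> urllib.request.Request:
    """Build (but do not send) a POST to the non-streaming endpoint."""
    body = json.dumps({"text": text, "voice_id": voice_id, "model_name": model_name})
    return urllib.request.Request(
        "https://outeai.com/api/v1/tts_non_stream",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "audio/wav",
        },
        method="POST",
    )
```

To perform the call, pass the request to `urllib.request.urlopen` and write the bytes returned by `response.read()` to a `.wav` file.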
Streaming Speech
POST /api/v1/tts_stream accepts the same request body as tts_non_stream. The response is a chunked HTTP transfer where audio data arrives incrementally. The stream begins with a WAV header followed by PCM chunks as they are generated.
Example request
curl -X POST https://outeai.com/api/v1/tts_stream \
-H "Authorization: Bearer oute_xxxx" \
-H "Content-Type: application/json" \
-H "Accept: audio/wav" \
-d '{
"text": "Streaming audio in real time.",
"voice_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"model_name": "outetts_1_pro"
}' \
--output streamed.wav
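Because the stream is a header followed by raw PCM, the amount of audio received so far can be estimated from the byte count. The sketch below uses the 24000 Hz, mono, 16-bit format listed in this document; the 44-byte header size is an assumption (the canonical PCM WAV header) and is not confirmed by this document.

```python
SAMPLE_RATE = 24000   # Hz
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)
HEADER_BYTES = 44     # assumption: canonical PCM WAV header size


def seconds_received(total_bytes: int) -> float:
    """Estimate how much audio has arrived, given total bytes read from the stream."""
    pcm_bytes = max(0, total_bytes - HEADER_BYTES)
    # One second of audio occupies rate * channels * width bytes.
    return pcm_bytes / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)
```

At this format one second of audio is 48,000 bytes, so a player can start once a comfortable buffer (say, a few tenths of a second) has accumulated.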
Voice Clones
Listing voice clones returns all clones on your account, plus quota information.
To create a clone, send POST /api/v1/voice-clones as either multipart/form-data (file upload) or JSON (base64-encoded audio).
Multipart form (recommended)
| Field | Required | Type | Notes |
|---|---|---|---|
| speaker_name | Yes | string | 2–80 characters. Display name for this voice. |
| audio | Yes | file | WAV or MP3, 5–10 seconds, max 10 MB. |
curl -X POST https://outeai.com/api/v1/voice-clones \
-H "Authorization: Bearer oute_xxxx" \
-F "speaker_name=My Voice" \
-F "audio=@/path/to/sample.wav"
JSON (base64)
curl -X POST https://outeai.com/api/v1/voice-clones \
-H "Authorization: Bearer oute_xxxx" \
-H "Content-Type: application/json" \
-d '{
"speaker_name": "My Voice",
"audio_base64": "<base64-encoded-audio>"
}'
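For the JSON variant, the audio file has to be base64-encoded first. A minimal sketch of building that body follows; `build_clone_payload` is a hypothetical helper, and the name-length check simply mirrors the 2–80 character rule from the multipart table, on the assumption that it applies to JSON requests as well.

```python
import base64
import json


def build_clone_payload(speaker_name: str, audio_bytes: bytes) -> str:
    """Return the JSON body for a base64 voice-clone upload."""
    if not 2 <= len(speaker_name) <= 80:
        raise ValueError("speaker_name must be 2 to 80 characters")
    return json.dumps({
        "speaker_name": speaker_name,
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
    })
```

Reading the sample with `open(path, "rb").read()` and passing the bytes to this function yields a string suitable for the request body.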
Deleting a voice clone permanently removes it. Audio previously generated with this voice is not affected.
API Tokens
Listing tokens returns all active tokens on your account (secrets are masked).
Creating a token returns the full secret only once, so store it securely.
Revoking a token takes effect immediately. Requests using a revoked token will return 401.
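Since the secret is shown only once, one common practice is to keep it out of source code and read it from the environment at start-up. The sketch below illustrates that; the variable name `OUTEAI_API_TOKEN` and the helper `load_token` are assumptions for illustration, not names defined by OuteAI.

```python
import os


def load_token(env_var: str = "OUTEAI_API_TOKEN") -> str:
    """Read the API token from the environment instead of hard-coding it."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"set {env_var} to your API token")
    return token
```

The returned string can then be passed to the client constructor or placed in the Authorization header.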
Reference
Models
| ID | Display name | Sample rate | Channels |
|---|---|---|---|
| outetts_1_pro | OuteTTS-1-Pro | 24000 Hz | 1 (mono) |
Additional models may be added in future. Always pass the model ID string in your requests.
Error Codes
All error responses share the same JSON envelope:
{
  "success": false,
  "code": "INSUFFICIENT_CREDITS",
  "message": "You do not have enough credits to start this request."
}
| HTTP | Code | Meaning |
|---|---|---|
| 400 | VALIDATION_ERROR | Missing or invalid request field |
| 401 | UNAUTHORIZED | Missing or invalid API token |
| 402 | INSUFFICIENT_CREDITS | Not enough credits to generate audio. Top up in your account. |
| 404 | NOT_FOUND | Resource (voice, session) not found |
| 429 | TTS_RATE_LIMITED | Too many requests per minute. Wait and retry. |
| 502 | TTS_PROVIDER_UNAVAILABLE | Upstream GPU provider is unavailable. Retry after a short wait. |
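The table suggests retrying on 429 and 502 only. One way to do that is exponential backoff, sketched below. To stay self-contained, the helper catches any exception and inspects a `status_code` attribute; with the SDK you would more likely catch `OuteAIAPIError` directly. The name `call_with_retry` and the default delays are assumptions.

```python
import time

RETRYABLE_STATUSES = {429, 502}


def call_with_retry(fn, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(); retry with exponential backoff when the error looks retryable."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as error:
            status = getattr(error, "status_code", None)
            last_attempt = attempt == attempts - 1
            # Errors such as 400, 401, 402, and 404 will not succeed on retry.
            if status not in RETRYABLE_STATUSES or last_attempt:
                raise
            sleep(base_delay * 2 ** attempt)
```

For example, `call_with_retry(lambda: client.generate_speech(text="Hello", voice_id="..."))` waits 1 second and then 2 seconds between attempts before giving up.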