When generating text-to-speech audio with the WellSaid API, you can select which voice model to use. The model determines the voice performance engine that powers the audio, and some models support enhanced features such as AI Director.


Available Models

Tag (Optional) | Description
--- | ---
legacy | The original WellSaid voice model. If no model is specified in your API request, this model will be used automatically. It delivers natural-sounding speech and supports most core voice features.
caruso | The next-generation WellSaid voice model, built to support advanced performance cues via AI Director. It enables dynamic control over pitch, tempo, loudness, and respelling, giving you more expressive and natural results.

How to Specify the Model in the API

To choose a model, include the model field in the JSON payload sent to any /tts endpoint. Here's an example using caruso:

{
  "speaker_id": 26,
  "model": "caruso",
  "text": "<pitch value=\"-250\">This sentence feels deeper and more serious.</pitch>"
}

If the model field is omitted, the API will default to the legacy model:

{
  "speaker_id": 26,
  "text": "This uses the legacy model because no model is specified."
}
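As a rough illustration of the two payloads above, the following Python sketch builds the request body and leaves out the model field when no model is given, so the API falls back to legacy. The helper names build_tts_payload and build_tts_request are hypothetical. The endpoint URL and the X-Api-Key header are assumptions that do not appear on this page, so verify them against your account's API reference before use. The request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Optional

# Assumption: endpoint URL and auth header name are not stated in this
# document; confirm both in the API reference for your account.
TTS_URL = "https://api.wellsaidlabs.com/v1/tts/stream"


def build_tts_payload(text: str, speaker_id: int, model: Optional[str] = None) -> dict:
    """Build the JSON body; omit "model" so the API defaults to legacy."""
    payload = {"speaker_id": speaker_id, "text": text}
    if model is not None:
        payload["model"] = model
    return payload


def build_tts_request(payload: dict, api_key: str, url: str = TTS_URL) -> urllib.request.Request:
    """Wrap the payload in a POST request object (hypothetical helper, nothing is sent)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


legacy_payload = build_tts_payload("This uses the legacy model.", 26)
caruso_payload = build_tts_payload("This uses caruso.", 26, model="caruso")
request = build_tts_request(caruso_payload, api_key="YOUR_API_KEY")
```

Passing the resulting request to urllib.request.urlopen would send it. Keeping payload construction separate from sending makes the default-model behavior easy to check without a network call.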


Why Choose the Caruso Model?
The caruso model unlocks AI Director capabilities, allowing you to fine-tune the emotional and stylistic delivery of voiceovers using inline markup tags. With caruso, you can modify:

Pitch (e.g. deeper or higher tone)

Tempo (slower or faster pace)

Loudness (quiet to commanding projection)

These advanced controls give you studio-level flexibility directly through the API.
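A minimal sketch of generating the inline markup programmatically, based only on the pitch tag shown in the caruso example earlier on this page. The helper names with_pitch and caruso_payload are hypothetical, and the accepted range of pitch values is not stated here, so the sketch does not validate it. Tag names for tempo and loudness are not given in this document and are deliberately left out.

```python
import json


def with_pitch(text: str, value: int) -> str:
    """Wrap text in a pitch tag of the form used in the caruso example."""
    # Assumption: value is an integer like the -250 in the example above;
    # the valid range is not documented on this page.
    return f'<pitch value="{value}">{text}</pitch>'


def caruso_payload(text: str, speaker_id: int) -> dict:
    """Build a payload that selects caruso, which AI Director markup requires."""
    return {"speaker_id": speaker_id, "model": "caruso", "text": text}


marked_up = with_pitch("This sentence feels deeper and more serious.", -250)
body = json.dumps(caruso_payload(marked_up, 26))
```

Serializing with json.dumps escapes the double quotes inside the tag, which yields the same \"-250\" form shown in the raw JSON example.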

Learn more: Using AI Director with the API