POST /v1/audio/stt
Speech to Text
curl --request POST \
  --url https://apis.finevoice.ai/v1/audio/stt \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "https://example.com/audio.mp3",
  "language": "en",
  "title": "<string>",
  "format": "json",
  "engine": "whisper",
  "useAsync": true,
  "word_level_timestamp_alignment": false,
  "speaker_diarization": false,
  "min_speakers": 123,
  "max_speakers": 123,
  "batch_size": 123,
  "script_target": "<string>"
}
'
{
  "status": 123,
  "url": "<string>",
  "taskId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "error": {
    "code": "<string>",
    "message": "<string>"
  },
  "urls": [
    "<string>"
  ],
  "service": "<string>",
  "port": "<string>",
  "timestamp": "<string>"
}

Authorizations

Authorization
string
header
required

Bearer token (API key). Format: Bearer {your_api_key}
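
As a minimal sketch of the header format above, the following builds (but does not send) an authenticated request using only the Python standard library. The helper name `build_stt_request` and the placeholder key `YOUR_API_KEY` are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

# Endpoint taken from this page.
API_URL = "https://apis.finevoice.ai/v1/audio/stt"


def build_stt_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request without sending it (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Documented format: Bearer {your_api_key}
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_stt_request("YOUR_API_KEY", {"url": "https://example.com/audio.mp3"})
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access or a real key.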

Body

application/json

The speech-to-text request payload.

url
string

The source audio or video URL.

Example:

"https://example.com/audio.mp3"

language
string

The source language code (e.g. en, zh, ja).

Example:

"en"

title
string

An optional task title for reference.

format
string

Expected transcript output format: srt, vtt, json, txt.

Example:

"json"

engine
string

The transcription engine to use. Supported: whisper, funasr.

Example:

"whisper"

useAsync
boolean

Set to true to process the task asynchronously. The response then carries a taskId for polling.

Example:

true

word_level_timestamp_alignment
boolean

Whether to include word-level timestamp alignment.

Example:

false

speaker_diarization
boolean

Whether to enable speaker diarization (identify multiple speakers).

Example:

false

min_speakers
integer<int32>

Minimum number of speakers for diarization.

max_speakers
integer<int32>

Maximum number of speakers for diarization.
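
The three diarization fields are meant to be used together. The sketch below assembles them with a few client-side sanity checks. The checks (bounds of at least 1, minimum not above maximum) are assumptions about sensible input, not documented server rules, and `diarization_options` is a hypothetical helper.

```python
def diarization_options(min_speakers=None, max_speakers=None) -> dict:
    """Return the diarization fields for a request body (hypothetical helper)."""
    options = {"speaker_diarization": True}
    for name, value in (("min_speakers", min_speakers), ("max_speakers", max_speakers)):
        if value is None:
            continue  # leave unset bounds out of the payload
        if value < 1:
            # Assumed client-side check, not a documented API rule.
            raise ValueError(f"{name} must be at least 1")
        options[name] = value
    if min_speakers is not None and max_speakers is not None and min_speakers > max_speakers:
        # Assumed client-side check, not a documented API rule.
        raise ValueError("min_speakers must not exceed max_speakers")
    return options


opts = diarization_options(min_speakers=2, max_speakers=4)
print(opts)
```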

batch_size
integer<int32>

The transcription batch size.

script_target
string

The target output script style or format.
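
Putting the body fields together, a request payload could be assembled roughly as follows. The allowed values for format and engine come from the descriptions above; the helper `build_payload`, its defaults, and the choice to drop unset fields are illustrative assumptions, since this page does not state which fields are required.

```python
# Values listed in the field descriptions on this page.
ALLOWED_FORMATS = {"srt", "vtt", "json", "txt"}
ALLOWED_ENGINES = {"whisper", "funasr"}


def build_payload(url, output_format="json", engine="whisper", **optional) -> dict:
    """Assemble a speech-to-text request body (hypothetical helper)."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {output_format!r}")
    if engine not in ALLOWED_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    payload = {"url": url, "format": output_format, "engine": engine}
    # Pass remaining fields (language, title, useAsync, ...) through unchanged,
    # skipping any that were left as None.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_payload(
    "https://example.com/audio.mp3",
    language="en",
    useAsync=True,
    title=None,  # dropped because it is unset
)
print(payload)
```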

Response

Task accepted. Returns a taskId for async polling or the result URL directly.

Standard response for audio processing tasks.

status
integer<int32>

HTTP-style status code (200 for success, 202 for in-progress).
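
A client might branch on this field along the following lines. Only 200 and 202 are documented here; treating a non-empty error object as a failure and everything else as unknown is an assumption, and `classify_response` is a hypothetical helper.

```python
def classify_response(body: dict) -> str:
    """Map a response body to a coarse state (hypothetical helper)."""
    if body.get("error"):
        # Assumption: a populated error object signals failure.
        return "failed"
    status = body.get("status")
    if status == 200:
        return "completed"    # documented: 200 for success
    if status == 202:
        return "in_progress"  # documented: 202 for in-progress
    return "unknown"          # any other code is not described on this page


print(classify_response({"status": 202, "taskId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"}))
```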

url
string

Download URL of the generated output file (available once the task has completed).

taskId
string

Task identifier for async polling. Use with GET /v1/task/{task_id}.

Example:

"a1b2c3d4-e5f6-7890-abcd-ef1234567890"
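
Polling with the taskId could be sketched as below. This assumes the task endpoint lives on the same host as this endpoint and returns the same standard response shape, neither of which this page confirms. The HTTP call is injected as a `fetch` function so the loop can be exercised with a stub; `poll_task` and its parameters are hypothetical.

```python
import time
from typing import Callable

# Assumption: same host as the speech-to-text endpoint.
TASK_URL_TEMPLATE = "https://apis.finevoice.ai/v1/task/{task_id}"


def poll_task(
    task_id: str,
    fetch: Callable[[str], dict],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(url) until the task stops reporting 202 (hypothetical helper)."""
    url = TASK_URL_TEMPLATE.format(task_id=task_id)
    for _ in range(max_attempts):
        body = fetch(url)
        if body.get("error"):
            # Assumption: a populated error object signals failure.
            raise RuntimeError(f"task failed: {body['error']}")
        if body.get("status") != 202:
            return body  # no longer in progress
        sleep(interval)
    raise TimeoutError(f"task {task_id} still in progress after {max_attempts} attempts")


# Stub standing in for a real HTTP GET: in progress twice, then done.
_responses = iter([{"status": 202}, {"status": 202}, {"status": 200, "url": "https://example.com/out.json"}])
result = poll_task(
    "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    fetch=lambda url: next(_responses),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(result)
```

In real use, `fetch` would perform an authenticated GET and decode the JSON body; the interval and attempt limit are arbitrary starting points.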

error
object

urls
string[]

Multiple output URLs (e.g. for separation stems).

service
string

port
string

timestamp
string