POST /v1/enhancer/process/pipeline
Process Pipeline
curl --request POST \
  --url https://apis.finevoice.ai/v1/enhancer/process/pipeline \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "https://example.com/audio.mp3",
  "step_speech_enhancement": true,
  "step_remove_mouth_sounds": false,
  "step_remove_long_silences": false,
  "step_super_resolution": false,
  "step_filler_words_remove": false,
  "step_stuttering_remove": false,
  "step_audio_normalization": false,
  "enhancement_model": "FRCRN_SE_16K",
  "enhancement_use_vad": false,
  "silence_threshold_db": -42,
  "long_silence_ms": 1200,
  "keep_silence_ms": 180,
  "filler_use_whisper": false,
  "filler_language": "en",
  "filler_words_list": "um,uh,er,hmm,hm,ah,eh,mhm",
  "stutter_min_similarity": 0.75,
  "norm_method": "peak",
  "norm_peak_db": -1,
  "norm_rms_db": -20,
  "output_format": "wav"
}
'
{}

Authorizations

Authorization
string
header
required

Bearer token (API key). Format: Bearer {your_api_key}
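
As a rough illustration of the header format above, the following Python sketch builds the same request as the curl example using only the standard library. The helper name `build_request` and the `YOUR_API_KEY` placeholder are assumptions made for this example; the endpoint URL and header names come from this page. The request is constructed but not sent.

```python
import json
import urllib.request

ENDPOINT = "https://apis.finevoice.ai/v1/enhancer/process/pipeline"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for the pipeline endpoint."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    headers = {
        # Format documented above: "Bearer {your_api_key}"
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")


# "YOUR_API_KEY" is a placeholder, not a real key.
request = build_request("YOUR_API_KEY", {"url": "https://example.com/audio.mp3"})
# To actually send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the headers and body before any network call is made.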

Body

application/json

Request payload for the all-in-one audio processing pipeline.

url
string

Audio file URL (http/https).

Example:

"https://example.com/audio.mp3"

step_speech_enhancement
boolean
default:true

Enable Speech Enhancement step (noise reduction).

step_remove_mouth_sounds
boolean
default:false

Enable Remove Mouth Sounds step.

step_remove_long_silences
boolean
default:false

Enable Remove Long Silences step.

step_super_resolution
boolean
default:false

Enable Super Resolution step (8 kHz to 48 kHz).

step_filler_words_remove
boolean
default:false

Enable Filler Words Removal step.

step_stuttering_remove
boolean
default:false

Enable Stuttering Removal step.

step_audio_normalization
boolean
default:false

Enable Audio Normalization step.
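
Because `step_speech_enhancement` defaults to true while the other six flags default to false, a request that omits the flags still runs speech enhancement. One way to avoid surprises is to send all seven flags explicitly. The sketch below does that; `build_step_flags` is a hypothetical helper name, and the flag names are taken from the list above.

```python
STEP_FLAGS = (
    "step_speech_enhancement",
    "step_remove_mouth_sounds",
    "step_remove_long_silences",
    "step_super_resolution",
    "step_filler_words_remove",
    "step_stuttering_remove",
    "step_audio_normalization",
)


def build_step_flags(enabled):
    """Return all seven step flags, true only for the names in `enabled`."""
    enabled = set(enabled)
    unknown = enabled - set(STEP_FLAGS)
    if unknown:
        raise ValueError(f"unknown step flags: {sorted(unknown)}")
    return {flag: flag in enabled for flag in STEP_FLAGS}


# Only silence removal and normalization; speech enhancement is
# switched off explicitly instead of relying on its default.
payload = {"url": "https://example.com/audio.mp3"}
payload.update(build_step_flags({"step_remove_long_silences", "step_audio_normalization"}))
```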

enhancement_model
string
default:FRCRN_SE_16K

Speech enhancement model: MossFormer2_SE_48K, FRCRN_SE_16K, or MossFormerGAN_SE_16K.

enhancement_use_vad
boolean
default:false

Enable VAD preprocessing for speech enhancement.

silence_threshold_db
number
default:-42

Silence detection threshold in dBFS.

long_silence_ms
integer
default:1200

Silences longer than this value (ms) are trimmed.

keep_silence_ms
integer
default:180

Silence padding to preserve at boundaries (ms).
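
To get a feel for the `silence_threshold_db` value, it can help to convert dBFS to a linear amplitude relative to full scale using the standard relation amplitude = 10^(dB/20). The sketch below applies that formula; `dbfs_to_amplitude` is a hypothetical helper, and how the service itself measures levels is not documented on this page.

```python
def dbfs_to_amplitude(level_db: float) -> float:
    """Convert a level in dBFS to a linear amplitude (1.0 = full scale)."""
    return 10 ** (level_db / 20)


# The default threshold of -42 dBFS is a little under 1% of full scale.
default_threshold = dbfs_to_amplitude(-42)

# Example settings for the silence-removal step, using the documented defaults.
silence_settings = {
    "step_remove_long_silences": True,
    "silence_threshold_db": -42,
    "long_silence_ms": 1200,
    "keep_silence_ms": 180,
}
```

A less negative threshold (for example -35) treats more of the signal as silence, while a more negative one (for example -50) treats less of it as silence.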

filler_use_whisper
boolean
default:false

Use Whisper for filler word detection.

filler_language
string
default:en

Language code for filler word detection (e.g. en, zh).

filler_words_list
string
default:um,uh,er,hmm,hm,ah,eh,mhm

Comma-separated list of filler words to remove. If empty, the built-in defaults are used.
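
Since `filler_words_list` is a single comma-separated string rather than a JSON array, a small helper can build it from a list. The sketch below is one possible approach; `filler_words_param` is a hypothetical name, and trimming whitespace and dropping duplicates are choices made for this example, not documented service behavior.

```python
def filler_words_param(words):
    """Join filler words into the comma-separated string the API expects."""
    cleaned = []
    for word in words:
        token = word.strip()
        if not token:
            continue  # skip empty entries
        if "," in token:
            raise ValueError(f"filler word must not contain a comma: {token!r}")
        if token not in cleaned:
            cleaned.append(token)  # keep first occurrence, preserve order
    return ",".join(cleaned)


filler_settings = {
    "step_filler_words_remove": True,
    "filler_language": "en",
    "filler_words_list": filler_words_param(["um", "uh", "er"]),
}
```

An empty input yields an empty string, which per the description above selects the built-in defaults.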

stutter_min_similarity
number
default:0.75

Minimum similarity for stutter detection.

Required range: 0.5 <= x <= 1
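
The documented range can be checked on the client before a request is sent. This is a minimal sketch; `check_stutter_similarity` is a hypothetical helper, and how the service responds to out-of-range values is not described on this page.

```python
def check_stutter_similarity(value: float) -> float:
    """Return `value` if it lies in the documented range 0.5 <= x <= 1."""
    if not 0.5 <= value <= 1:
        raise ValueError(f"stutter_min_similarity must be in [0.5, 1], got {value}")
    return value


stutter_settings = {
    "step_stuttering_remove": True,
    "stutter_min_similarity": check_stutter_similarity(0.75),
}
```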
norm_method
string
default:peak

Normalization method: peak, rms, or both.

norm_peak_db
number
default:-1

Target peak level in dBFS.

norm_rms_db
number
default:-20

Target RMS level in dBFS.
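
As a worked example of what a target level in dBFS implies, the sketch below computes the gain needed to move a measured level to a target level, using the standard decibel relations. The function names and the measured peak of -6 dBFS are assumptions for illustration; the service's actual normalization algorithm is not documented on this page.

```python
def gain_to_target_db(measured_db: float, target_db: float) -> float:
    """Gain in dB that moves a measured level to the target level."""
    return target_db - measured_db


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


# Hypothetical file whose loudest sample sits at -6 dBFS,
# normalized to the default norm_peak_db of -1 dBFS.
gain_db = gain_to_target_db(measured_db=-6, target_db=-1)
multiplier = db_to_linear(gain_db)

norm_settings = {
    "step_audio_normalization": True,
    "norm_method": "peak",
    "norm_peak_db": -1,
}
```

In this example the gain is 5 dB, which corresponds to multiplying every sample by roughly 1.78.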

output_format
string
default:wav

Output format: wav, mp3, flac, or m4a.
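
Three of the string fields accept only a fixed set of values. A client-side check like the sketch below can catch typos before a request is sent. `check_enums` is a hypothetical helper; the allowed values are copied from the field descriptions on this page.

```python
ALLOWED_VALUES = {
    "enhancement_model": {"MossFormer2_SE_48K", "FRCRN_SE_16K", "MossFormerGAN_SE_16K"},
    "norm_method": {"peak", "rms", "both"},
    "output_format": {"wav", "mp3", "flac", "m4a"},
}


def check_enums(payload: dict) -> list:
    """Return a list of error messages for fields with unsupported values."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        # Fields that are absent fall back to the documented defaults.
        if field in payload and payload[field] not in allowed:
            errors.append(
                f"{field}={payload[field]!r} is not one of {sorted(allowed)}"
            )
    return errors


problems = check_enums({"output_format": "ogg", "norm_method": "rms"})
```

Here `problems` contains a single message about `output_format`, since `ogg` is not among the listed formats.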

Response

Returns a download URL for the processed audio.

The response is of type object.
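
This page does not list the field names of the response object, so the sketch below avoids guessing them and instead collects every http/https string found anywhere in the decoded JSON. `find_urls` is a hypothetical helper, and the sample response, including the `data` and `download_url` keys, is invented purely for illustration.

```python
import json


def find_urls(value):
    """Recursively collect http/https strings from decoded JSON."""
    if isinstance(value, str):
        return [value] if value.startswith(("http://", "https://")) else []
    if isinstance(value, dict):
        return [url for item in value.values() for url in find_urls(item)]
    if isinstance(value, list):
        return [url for item in value for url in find_urls(item)]
    return []


# Hypothetical response body; the real field names may differ.
sample_body = '{"data": {"download_url": "https://example.com/processed.wav"}}'
urls = find_urls(json.loads(sample_body))
```

Once the real response schema is known, reading the documented field directly is preferable to scanning.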