ASR WebSocket API Reference

Endpoint: ws://<host>:8765/ws/transcribe

Protocol Overview

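The ASR API uses a simple three-step exchange over a short-lived WebSocket connection: the client sends a JSON metadata frame (Step 1) and a binary audio frame (Step 2), and the server replies with a JSON result (Step 3).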


Step 1: Metadata (Client → Server)

Send a JSON text frame with transcription parameters.

{
  "type": "transcribe",
  "language": "bg",
  "task": "transcribe",
  "api_key": "your-api-key",
  "machine_id": "a1b2c3d4e5f6...",
  "postprocess": true
}

Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| type | string | yes | | "transcribe" for transcription, "ping" for health check |
| language | string \| null | no | null | ISO 639-1 language code, e.g. "en" |
| task | string | no | "transcribe" | "transcribe" (keep language) or "translate" (to English) |
| api_key | string | conditional | | Required if the server has SERVER_API_KEY configured |
| machine_id | string | no | "unknown" | SHA256 hardware fingerprint for telemetry |
| postprocess | bool | no | server default | Overrides LLM text correction for this request |
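
As a minimal sketch of the field rules above, the helper below assembles the Step 1 metadata frame. The name `build_metadata` is hypothetical and not part of the API. It assumes that omitting an optional field lets the server apply the default listed in the table.

```python
import json


def build_metadata(language=None, task="transcribe", api_key=None,
                   machine_id=None, postprocess=None):
    """Return the Step 1 metadata frame as a JSON string (hypothetical helper)."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")

    # "language" may be null according to the table, so None is passed through.
    frame = {"type": "transcribe", "language": language, "task": task}

    # Assumption: optional fields are left out when unset so that the
    # server-side defaults from the table apply.
    if api_key is not None:
        frame["api_key"] = api_key
    if machine_id is not None:
        frame["machine_id"] = machine_id
    if postprocess is not None:
        frame["postprocess"] = bool(postprocess)

    return json.dumps(frame)
```

The returned string can be sent as-is in a text frame, for example `ws.send(build_metadata(language="bg", api_key="your-api-key"))`.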

Supported Languages

bg, en, de, fr, es, it, pt, nl, pl, ru, cs, da, sv, hr, el, hu, ro, sk, sl, et, fi, lv, lt, mt, uk
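
A client may want to reject unsupported codes before opening a connection. The sketch below copies the list above into a set; `check_language` is a hypothetical helper, and how the server itself treats codes outside the list is not stated in this reference.

```python
# Codes copied from the "Supported Languages" list above.
SUPPORTED_LANGUAGES = frozenset(
    "bg en de fr es it pt nl pl ru cs da sv hr el hu ro sk sl et fi lv lt mt uk".split()
)


def check_language(code):
    """Normalize and validate a language code; None is passed through as null."""
    if code is None:
        return None
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized
```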

Machine ID Computation

SHA256 hash of hardware components:

machine_id = SHA256(
    bios_serial +
    board_serial +
    board_manufacturer +
    board_product +
    cpu_id +
    disk_serial
)

Sources (Windows WMI): Win32_BIOS.SerialNumber, Win32_BaseBoard.*, Win32_Processor.ProcessorId, Win32_DiskDrive.SerialNumber
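
The pseudocode above could be implemented along these lines. This is a sketch under stated assumptions: the components are concatenated in the listed order with no separator, encoded as UTF-8, and the result is sent as a lowercase hex digest. None of these details is confirmed by this reference, and reading the values from WMI is left to the caller.

```python
import hashlib


def compute_machine_id(bios_serial, board_serial, board_manufacturer,
                       board_product, cpu_id, disk_serial):
    """Hash the hardware identifiers into a hex fingerprint (illustrative)."""
    # Assumption: plain concatenation in the documented order, no separator.
    joined = "".join([
        bios_serial,
        board_serial,
        board_manufacturer,
        board_product,
        cpu_id,
        disk_serial,
    ])
    # Assumption: UTF-8 encoding and hex output (64 characters).
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```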


Step 2: Audio Data (Client → Server)

Send the audio as a binary WebSocket frame containing a WAV file.

Audio Requirements

| Property | Value |
|---|---|
| Format | WAV (RIFF) |
| Sample rate | Any (server resamples to 16 kHz; 16 kHz preferred) |
| Channels | Mono |
| Bit depth | Float32 or Int16 |
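
For testing a client without a recording at hand, a WAV payload that meets these requirements (mono, Int16, 16 kHz) can be generated with the Python standard library. The function name `make_test_wav` and the sine-tone content are illustrative only.

```python
import io
import math
import struct
import wave


def make_test_wav(seconds=1.0, sample_rate=16000, freq=440.0):
    """Return the bytes of a mono, 16-bit PCM WAV file containing a sine tone."""
    n_frames = int(seconds * sample_rate)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / sample_rate))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample = Int16
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()
```

The returned bytes can be sent directly as the binary frame in Step 2.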

Step 3: Result (Server → Client)

Success Response

{
  "text": "Здравейте, как сте?",
  "segments": [
    {
      "text": "Здравейте, как сте?",
      "start": 0.0,
      "end": 2.5
    }
  ],
  "language": "bg",
  "postprocessed_text": "Здравейте, как сте?",
  "postprocess_ms": 150,
  "postprocess_error": null
}

Response Fields

| Field | Type | Description |
|---|---|---|
| text | string | Raw transcription/translation from ASR |
| segments | array | Timed segments: {text, start, end} (seconds) |
| language | string | Detected/used source language code |
| postprocessed_text | string \| null | LLM-corrected text (null if disabled or failed) |
| postprocess_ms | number | Post-processing latency in milliseconds |
| postprocess_error | string \| null | Error message if post-processing failed |
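
Because `postprocessed_text` is null when post-processing is disabled or fails, clients typically fall back to `text`. The sketch below shows that fallback and one way to render the timed segments; both function names are hypothetical.

```python
def preferred_text(result):
    """Return the LLM-corrected text when present, otherwise the raw ASR text."""
    return result.get("postprocessed_text") or result["text"]


def format_segments(result):
    """Render each segment as '[start-end] text', with times in seconds."""
    return [
        f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text']}"
        for seg in result.get("segments", [])
    ]
```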

Error Response

{
  "error": "unauthorized"
}

Common errors: "unauthorized" (invalid API key)
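
One way to separate the two response shapes on the client is to raise an exception whenever the reply carries an `error` key. This sketch assumes every error reply is a JSON object of the form shown above; `AsrError` and `parse_response` are hypothetical names.

```python
import json


class AsrError(Exception):
    """Raised when the server replies with an error object."""


def parse_response(raw):
    """Decode a Step 3 frame and raise AsrError for error replies."""
    payload = json.loads(raw)
    # Assumption: error replies look like {"error": "<message>"}.
    if isinstance(payload, dict) and "error" in payload:
        raise AsrError(payload["error"])
    return payload
```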


Ping / Health Check

WebSocket Ping

{"type": "ping"}

Response:

{"type": "pong", "status": "ok"}
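
A minimal sketch of a ping round trip, using the same `websockets` sync client as the Python example in this reference. The function names, the default URL, and the timeout values are illustrative assumptions.

```python
import json


def is_pong(raw):
    """Return True if raw is the documented pong reply."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return False
    return (
        isinstance(payload, dict)
        and payload.get("type") == "pong"
        and payload.get("status") == "ok"
    )


def ping(url="ws://localhost:8765/ws/transcribe", timeout=5.0):
    """Open a connection, send a ping frame, and check the reply."""
    # Imported lazily so is_pong can be used without the websockets package.
    from websockets.sync.client import connect

    with connect(url, open_timeout=timeout) as ws:
        ws.send(json.dumps({"type": "ping"}))
        return is_pong(ws.recv(timeout=timeout))
```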

HTTP Health

GET /health
{
  "status": "ok",
  "postprocess_enabled": false
}
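
The HTTP check can be scripted with the standard library, as sketched below. The base URL is an assumption to be supplied by the caller, since this reference does not state the host and port that serve `/health`; the function names are hypothetical.

```python
import json
import urllib.request


def parse_health(body):
    """Return (is_ok, postprocess_enabled) from a /health response body."""
    payload = json.loads(body)
    return (
        payload.get("status") == "ok",
        bool(payload.get("postprocess_enabled", False)),
    )


def fetch_health(base_url, timeout=5.0):
    """GET <base_url>/health and parse the reply.

    base_url is an assumption, for example "http://localhost:8765".
    """
    url = f"{base_url.rstrip('/')}/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_health(resp.read().decode("utf-8"))
```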

Prometheus Metrics

Available at /metrics (via nginx proxy).

| Metric | Type | Labels | Description |
|---|---|---|---|
| asr_requests_total | counter | machine_id, task, status | Total request count |
| asr_audio_seconds_total | counter | machine_id, task | Audio duration processed |
| asr_processing_seconds | histogram | machine_id, task | End-to-end processing latency |
| asr_postprocess_seconds | histogram | machine_id, task | LLM post-processing latency |

Example PromQL Queries

# Total requests per machine
sum by (machine_id) (asr_requests_total)

# Total audio minutes per machine
sum by (machine_id) (asr_audio_seconds_total) / 60

# Request rate over last hour
rate(asr_requests_total[1h])

# Average processing time
rate(asr_processing_seconds_sum[5m]) / rate(asr_processing_seconds_count[5m])

Code Examples

Python

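import io
import json

import soundfile as sf
from websockets.sync.client import connect

# Load the recording as float32 samples; sr is the file's sample rate
audio, sr = sf.read("recording.wav", dtype="float32")

with connect("ws://localhost:8765/ws/transcribe") as ws:
    # Step 1: metadata as a JSON text frame
    ws.send(json.dumps({
        "type": "transcribe",
        "language": "bg",
        "task": "transcribe",
        "api_key": "your-key",
        "machine_id": "a1b2c3...",
    }))

    # Step 2: audio as a single binary frame containing a WAV file
    buf = io.BytesIO()
    sf.write(buf, audio, sr, format="WAV")
    ws.send(buf.getvalue())

    # Step 3: result as a JSON text frame
    result = json.loads(ws.recv())

if "error" in result:
    raise RuntimeError(result["error"])

# Prefer the LLM-corrected text; fall back to the raw transcription
text = result.get("postprocessed_text") or result["text"]
print(text)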

JavaScript

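// wavArrayBuffer is assumed to be an ArrayBuffer holding a complete WAV file
const ws = new WebSocket("ws://localhost:8765/ws/transcribe");

ws.onopen = () => {
  // Step 1: metadata as a JSON text frame
  ws.send(JSON.stringify({
    type: "transcribe",
    language: "bg",
    api_key: "your-key",
  }));
  // Step 2: audio as a binary frame
  ws.send(wavArrayBuffer);
};

ws.onmessage = (event) => {
  // Step 3: result as a JSON text frame
  const result = JSON.parse(event.data);
  if (result.error) {
    console.error(result.error);
    return;
  }
  // Prefer the LLM-corrected text; fall back to the raw transcription
  const text = result.postprocessed_text || result.text;
  console.log(text);
};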

Rust (tungstenite)

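use serde_json::json;
use tungstenite::{connect, Message};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read the WAV file to transcribe (the path is a placeholder)
    let wav_bytes = std::fs::read("recording.wav")?;

    let (mut ws, _) = connect("ws://localhost:8765/ws/transcribe")?;

    // Send metadata
    ws.send(Message::Text(json!({
        "type": "transcribe",
        "language": "bg",
        "api_key": "your-key",
    }).to_string().into()))?;

    // Send WAV audio
    ws.send(Message::Binary(wav_bytes.into()))?;

    // Receive result (a JSON text frame) and print it
    let result = ws.read()?;
    println!("{}", result.to_text()?);
    Ok(())
}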