ASR WebSocket API Reference
Endpoint: ws://<host>:8765/ws/transcribe
Protocol Overview
The ASR API uses a simple three-step request/response exchange over a short-lived WebSocket connection:
Step 1: Metadata (Client → Server)
Send a JSON text frame with transcription parameters.
{
"type": "transcribe",
"language": "bg",
"task": "transcribe",
"api_key": "your-api-key",
"machine_id": "a1b2c3d4e5f6...",
"postprocess": true
}
Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| type | string | yes | — | "transcribe" for transcription, "ping" for health check |
| language | string or null | no | null → "en" | ISO 639-1 language code |
| task | string | no | "transcribe" | "transcribe" (keep language) or "translate" (to English) |
| api_key | string | conditional | — | Required if the server has SERVER_API_KEY configured |
| machine_id | string | no | "unknown" | SHA256 hardware fingerprint for telemetry |
| postprocess | bool | no | server default | Override LLM text correction per request |
Supported Languages
bg, en, de, fr, es, it, pt, nl, pl, ru, cs, da, sv, hr, el, hu, ro, sk, sl, et, fi, lv, lt, mt, uk
Machine ID Computation
SHA256 hash of hardware components:
machine_id = SHA256(
bios_serial +
board_serial +
board_manufacturer +
board_product +
cpu_id +
disk_serial
)
Sources (Windows WMI): Win32_BIOS.SerialNumber, Win32_BaseBoard.*, Win32_Processor.ProcessorId, Win32_DiskDrive.SerialNumber
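As an illustrative sketch of the formula above (the function name and placeholder values are assumptions; real values come from WMI on Windows), the fingerprint can be computed like this:

```python
import hashlib

def compute_machine_id(bios_serial, board_serial, board_manufacturer,
                       board_product, cpu_id, disk_serial):
    """Concatenate the hardware identifiers and hash them with SHA256."""
    raw = (bios_serial + board_serial + board_manufacturer +
           board_product + cpu_id + disk_serial)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

# Placeholder values for illustration only
machine_id = compute_machine_id("BIOS123", "BOARD456", "ACME", "X570",
                                "CPU789", "DISK000")
```

The result is a 64-character lowercase hex digest, which is what the machine_id field in Step 1 carries.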
Step 2: Audio Data (Client → Server)
Send the audio as a binary WebSocket frame containing a WAV file.
Audio Requirements
| Property | Value |
|---|---|
| Format | WAV (RIFF) |
| Sample rate | Any (server resamples to 16kHz; 16kHz preferred) |
| Channels | Mono |
| Bit depth | Float32 or Int16 |
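These requirements can be met with only the Python standard library; the sketch below wraps a mono int16 PCM buffer in a RIFF/WAV container (the sine tone is placeholder audio, not part of the API):

```python
import io
import math
import struct
import wave

def pcm16_to_wav_bytes(samples, sample_rate=16000):
    """Wrap mono int16 PCM samples in a RIFF/WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # int16
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()

# Placeholder audio: 0.5 s of a 440 Hz tone at 16 kHz
tone = [int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / 16000))
        for n in range(8000)]
wav_bytes = pcm16_to_wav_bytes(tone)
```

The resulting wav_bytes is exactly what Step 2 sends as a binary frame.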
Step 3: Result (Server → Client)
Success Response
{
"text": "Здравейте, как сте?",
"segments": [
{
"text": "Здравейте, как сте?",
"start": 0.0,
"end": 2.5
}
],
"language": "bg",
"postprocessed_text": "Здравейте, как сте?",
"postprocess_ms": 150,
"postprocess_error": null
}
Response Fields
| Field | Type | Description |
|---|---|---|
| text | string | Raw transcription/translation from ASR |
| segments | array | Timed segments: {text, start, end} (seconds) |
| language | string | Detected/used source language code |
| postprocessed_text | string or null | LLM-corrected text (null if disabled or failed) |
| postprocess_ms | number | Post-processing latency in milliseconds |
| postprocess_error | string or null | Error message if post-processing failed |
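The segments array can be rendered as subtitle-style lines; a minimal sketch (the timestamp format is a presentation choice, not part of the API):

```python
def format_segments(segments):
    """Render each {text, start, end} segment as '[start -> end] text'."""
    lines = []
    for seg in segments:
        lines.append(f"[{seg['start']:6.2f} -> {seg['end']:6.2f}] {seg['text']}")
    return "\n".join(lines)

segments = [{"text": "Hello, how are you?", "start": 0.0, "end": 2.5}]
print(format_segments(segments))
```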
Error Response
{
"error": "unauthorized"
}
Common errors: "unauthorized" (invalid API key)
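Since both success and error payloads arrive as a single JSON text frame, a client can branch on the presence of the error key. A minimal sketch (the ASRError class and parse_result helper are illustrative, not part of the API):

```python
import json

class ASRError(Exception):
    """Raised when the server returns an error payload."""

def parse_result(frame):
    """Decode a server frame; raise ASRError on an error payload."""
    result = json.loads(frame)
    if "error" in result:
        raise ASRError(result["error"])
    # Prefer the LLM-corrected text when available
    return result.get("postprocessed_text") or result["text"]

text = parse_result('{"text": "hello", "postprocessed_text": null}')
```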
Ping / Health Check
WebSocket Ping
{"type": "ping"}
Response:
{"type": "pong", "status": "ok"}
HTTP Health
GET /health
{
"status": "ok",
"postprocess_enabled": false
}
Prometheus Metrics
Available at /metrics (via nginx proxy).
| Metric | Type | Labels | Description |
|---|---|---|---|
| asr_requests_total | counter | machine_id, task, status | Total request count |
| asr_audio_seconds_total | counter | machine_id, task | Audio duration processed |
| asr_processing_seconds | histogram | machine_id, task | End-to-end processing latency |
| asr_postprocess_seconds | histogram | machine_id, task | LLM post-processing latency |
Example PromQL Queries
# Total requests per machine
sum by (machine_id) (asr_requests_total)
# Total audio minutes per machine
sum by (machine_id) (asr_audio_seconds_total) / 60
# Request rate over last hour
rate(asr_requests_total[1h])
# Average processing time
rate(asr_processing_seconds_sum[5m]) / rate(asr_processing_seconds_count[5m])
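These queries can also be issued programmatically against Prometheus's instant-query HTTP API (/api/v1/query). The sketch below only builds the request URL; the Prometheus host is an assumption, and fetching is left to any HTTP client:

```python
from urllib.parse import urlencode

def prometheus_query_url(base_url, promql):
    """Build an instant-query URL for Prometheus's HTTP API."""
    return f"{base_url}/api/v1/query?" + urlencode({"query": promql})

url = prometheus_query_url(
    "http://prometheus:9090",  # assumed Prometheus address
    "sum by (machine_id) (asr_requests_total)",
)
```

The JSON response carries the samples under data.result.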
Code Examples
Python
import json, io
import soundfile as sf
from websockets.sync.client import connect

audio, sr = sf.read("recording.wav", dtype="float32")

with connect("ws://localhost:8765/ws/transcribe") as ws:
    # Step 1: metadata as a JSON text frame
    ws.send(json.dumps({
        "type": "transcribe",
        "language": "bg",
        "task": "transcribe",
        "api_key": "your-key",
        "machine_id": "a1b2c3...",
    }))

    # Step 2: audio as a single binary WAV frame
    buf = io.BytesIO()
    sf.write(buf, audio, sr, format="WAV")
    ws.send(buf.getvalue())

    # Step 3: JSON result
    result = json.loads(ws.recv())

text = result.get("postprocessed_text") or result["text"]
print(text)
JavaScript
const ws = new WebSocket("ws://localhost:8765/ws/transcribe");
ws.onopen = () => {
ws.send(JSON.stringify({
type: "transcribe",
language: "bg",
api_key: "your-key",
}));
  // wavArrayBuffer: an ArrayBuffer containing the WAV file, prepared elsewhere
  ws.send(wavArrayBuffer);
};
ws.onmessage = (event) => {
const result = JSON.parse(event.data);
const text = result.postprocessed_text || result.text;
console.log(text);
};
Rust (tungstenite)
use tungstenite::{connect, Message};
use serde_json::json;

// (inside a function that returns a Result, so `?` works)
let (mut ws, _) = connect("ws://localhost:8765/ws/transcribe")?;
// Send metadata
ws.send(Message::Text(json!({
"type": "transcribe",
"language": "bg",
"api_key": "your-key",
}).to_string()))?;
// Send WAV audio
ws.send(Message::Binary(wav_bytes))?;
// Receive result
let result = ws.read()?;