ASR WebSocket API Reference
Endpoint: ws://<host>:8765/ws/transcribe
Protocol Overview
The ASR API uses a simple three-step request/response exchange over a short-lived WebSocket connection:
Step 1: Metadata (Client → Server)
Send a JSON text frame with transcription parameters.
{
"type": "transcribe",
"language": "bg",
"task": "transcribe",
"api_key": "your-api-key",
"machine_id": "a1b2c3d4e5f6...",
"postprocess": true
}
Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| type | string | yes | — | "transcribe" for transcription, "ping" for health check |
| language | string or null | no | null → "en" | ISO 639-1 language code |
| task | string | no | "transcribe" | "transcribe" (keep language) or "translate" (to English) |
| api_key | string | conditional | — | Required if the server has SERVER_API_KEY configured |
| machine_id | string | no | "unknown" | SHA256 hardware fingerprint for telemetry |
| postprocess | bool | no | server default | Override LLM text correction per request |
Supported Languages
bg, en, de, fr, es, it, pt, nl, pl, ru, cs, da, sv, hr, el, hu, ro, sk, sl, et, fi, lv, lt, mt, uk
Machine ID Computation
SHA256 hash of hardware components:
machine_id = SHA256(
bios_serial +
board_serial +
board_manufacturer +
board_product +
cpu_id +
disk_serial
)
Sources (Windows WMI): Win32_BIOS.SerialNumber, Win32_BaseBoard.*, Win32_Processor.ProcessorId, Win32_DiskDrive.SerialNumber
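As an illustrative sketch of the formula above (the function name and placeholder values are assumptions; real values come from WMI on Windows), the fingerprint can be computed like this:

```python
import hashlib

def compute_machine_id(bios_serial, board_serial, board_manufacturer,
                       board_product, cpu_id, disk_serial):
    """Concatenate the hardware identifiers and hash them with SHA256."""
    raw = (bios_serial + board_serial + board_manufacturer +
           board_product + cpu_id + disk_serial)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

# Placeholder values for illustration only
machine_id = compute_machine_id("BIOS123", "BOARD456", "ACME", "X570",
                                "CPU789", "DISK000")
```

The result is a 64-character lowercase hex digest, which is what the machine_id field in Step 1 carries.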
Step 2: Audio Data (Client → Server)
Send the audio as a binary WebSocket frame containing a WAV file.
Audio Requirements
| Property | Value |
|---|---|
| Format | WAV (RIFF) |
| Sample rate | Any (server resamples to 16kHz; 16kHz preferred) |
| Channels | Mono |
| Bit depth | Float32 or Int16 |
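These requirements can be met with only the Python standard library; the sketch below wraps a mono int16 PCM buffer in a RIFF/WAV container (the sine tone is placeholder audio, not part of the API):

```python
import io
import math
import struct
import wave

def pcm16_to_wav_bytes(samples, sample_rate=16000):
    """Wrap mono int16 PCM samples in a RIFF/WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # int16
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()

# Placeholder audio: 0.5 s of a 440 Hz tone at 16 kHz
tone = [int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / 16000))
        for n in range(8000)]
wav_bytes = pcm16_to_wav_bytes(tone)
```

The resulting wav_bytes is exactly what Step 2 sends as a binary frame.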
Step 3: Result (Server → Client)
Success Response
{
"text": "Здравейте, как сте?",
"segments": [
{
"text": "Здравейте, как сте?",
"start": 0.0,
"end": 2.5
}
],
"language": "bg",
"postprocessed_text": "Здравейте, как сте?",
"postprocess_ms": 150,
"postprocess_error": null
}
Response Fields
| Field | Type | Description |
|---|---|---|
| text | string | Raw transcription/translation from ASR |
| segments | array | Timed segments: {text, start, end} (seconds) |
| language | string | Detected/used source language code |
| postprocessed_text | string or null | LLM-corrected text (null if disabled or failed) |
| postprocess_ms | number | Post-processing latency in milliseconds |
| postprocess_error | string or null | Error message if post-processing failed |
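The segments array can be rendered as subtitle-style lines; a minimal sketch (the timestamp format is a presentation choice, not part of the API):

```python
def format_segments(segments):
    """Render each {text, start, end} segment as '[start -> end] text'."""
    lines = []
    for seg in segments:
        lines.append(f"[{seg['start']:6.2f} -> {seg['end']:6.2f}] {seg['text']}")
    return "\n".join(lines)

segments = [{"text": "Hello, how are you?", "start": 0.0, "end": 2.5}]
print(format_segments(segments))
```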
Error Response
{
"error": "unauthorized"
}
Common errors: "unauthorized" (invalid API key)
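Since both success and error payloads arrive as a single JSON text frame, a client can branch on the presence of the error key. A minimal sketch (the ASRError class and parse_result helper are illustrative, not part of the API):

```python
import json

class ASRError(Exception):
    """Raised when the server returns an error payload."""

def parse_result(frame):
    """Decode a server frame; raise ASRError on an error payload."""
    result = json.loads(frame)
    if "error" in result:
        raise ASRError(result["error"])
    # Prefer the LLM-corrected text when available
    return result.get("postprocessed_text") or result["text"]

text = parse_result('{"text": "hello", "postprocessed_text": null}')
```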
Ping / Health Check
WebSocket Ping
{"type": "ping"}
Response:
{"type": "pong", "status": "ok"}
HTTP Health
GET /health
{
"status": "ok",
"postprocess_enabled": false
}
Prometheus Metrics
Available at /metrics (via nginx proxy).
| Metric | Type | Labels | Description |
|---|---|---|---|
| asr_requests_total | counter | machine_id, task, status | Total request count |
| asr_audio_seconds_total | counter | machine_id, task | Audio duration processed |
| asr_processing_seconds | histogram | machine_id, task | End-to-end processing latency |
| asr_postprocess_seconds | histogram | machine_id, task | LLM post-processing latency |
Example PromQL Queries
# Total requests per machine
sum by (machine_id) (asr_requests_total)
# Total audio minutes per machine
sum by (machine_id) (asr_audio_seconds_total) / 60
# Request rate over last hour
rate(asr_requests_total[1h])
# Average processing time
rate(asr_processing_seconds_sum[5m]) / rate(asr_processing_seconds_count[5m])
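These queries can also be issued programmatically against Prometheus's instant-query HTTP API (/api/v1/query). The sketch below only builds the request URL; the Prometheus host is an assumption, and fetching is left to any HTTP client:

```python
from urllib.parse import urlencode

def prometheus_query_url(base_url, promql):
    """Build an instant-query URL for Prometheus's HTTP API."""
    return f"{base_url}/api/v1/query?" + urlencode({"query": promql})

url = prometheus_query_url(
    "http://prometheus:9090",  # assumed Prometheus address
    "sum by (machine_id) (asr_requests_total)",
)
```

The JSON response carries the samples under data.result.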
Code Examples
Python
import json, io
import soundfile as sf
from websockets.sync.client import connect

audio, sr = sf.read("recording.wav", dtype="float32")

with connect("ws://localhost:8765/ws/transcribe") as ws:
    # Step 1: metadata as a JSON text frame
    ws.send(json.dumps({
        "type": "transcribe",
        "language": "bg",
        "task": "transcribe",
        "api_key": "your-key",
        "machine_id": "a1b2c3...",
    }))

    # Step 2: audio as a single binary WAV frame
    buf = io.BytesIO()
    sf.write(buf, audio, sr, format="WAV")
    ws.send(buf.getvalue())

    # Step 3: JSON result
    result = json.loads(ws.recv())

text = result.get("postprocessed_text") or result["text"]
print(text)
JavaScript
const ws = new WebSocket("ws://localhost:8765/ws/transcribe");
ws.onopen = () => {
ws.send(JSON.stringify({
type: "transcribe",
language: "bg",
api_key: "your-key",
}));
  // wavArrayBuffer: an ArrayBuffer containing the WAV file, prepared elsewhere
  ws.send(wavArrayBuffer);
};
ws.onmessage = (event) => {
const result = JSON.parse(event.data);
const text = result.postprocessed_text || result.text;
console.log(text);
};
Rust (tungstenite)
use tungstenite::{connect, Message};
use serde_json::json;

// (inside a function that returns a Result, so `?` works)
let (mut ws, _) = connect("ws://localhost:8765/ws/transcribe")?;
// Send metadata
ws.send(Message::Text(json!({
"type": "transcribe",
"language": "bg",
"api_key": "your-key",
}).to_string()))?;
// Send WAV audio
ws.send(Message::Binary(wav_bytes))?;
// Receive result
let result = ws.read()?;