Scripture Listener (Spec)

Status: Finalized (v1)
Last updated: 2026-03-03

Goal

Build a lightweight service that listens to live speech, detects explicit Bible references, and instantly outputs:

canonical Route Bible link
QR code
embeddable snippet

Product outcome

Church teams, speakers, and media operators can generate accurate, shareable scripture links from spoken content in real time without manually typing references.

Non-goals (v1)

Paraphrase / semantic verse matching
Full sermon transcript storage/search product
Multi-speaker diarization
Automatic slide deck generation
AI-generated theological commentary

Future scope (post-v1)

Exedra-powered paraphrase detection and ranked semantic candidates
Candidate confirmation flow for ambiguous paraphrase matches

Locked Decisions (v1)

Product domain: cue.selah.tools
Scope: direct-reference parsing only (no paraphrase triggers in v1)
Default confidence threshold: 0.70
UI model: start-first screen -> live slide screen with explicit stop
Slide requirement: verse text + verse reference + small Route Bible QR on every slide
Transport: WebSocket streaming for live session

Backend

Architecture overview

Use an event-driven backend with three main components:

ASR ingress service (Rust)
- Receives live audio stream (WebSocket)
- Runs VAD/chunking
- Sends utterance chunks to ASR worker
- Maintains per-session state
ASR worker (Parakeet runtime)
- Returns incremental transcript chunks and final utterance text
Scripture match + action pipeline (Rust)
- Resolves direct references first
- Applies direct-reference confidence policy
- Calls Route Bible parse/QR APIs
- Emits match events

Reuse of existing Selah infrastructure

Route Bible outputs
- Reuse canonical parsing endpoint (/parse)
- Reuse QR generation endpoint (/qr)
- Reuse Route Bible canonical URL conventions
Reference parsing
- Reuse grab-bcv parsing rules and canonicalization behavior
- Use local grab-bcv parse as the primary v1 match detector for explicit references

Deferred Exedra integration

Paraphrase trigger support is intentionally deferred to post-v1:

reuse Exedra hybrid retrieval (resolve_query/search)
reuse semantic query candidate expansion strategy
reuse existing paraphrase fixtures for quality gates

Trigger policy (when search runs)

Run direct-reference resolution on utterance boundaries and controlled rolling windows:

Trigger when silence is detected for 500-800ms
Also trigger every 1.5-2s during long uninterrupted speech
Skip if transcript delta is low (near-duplicate suppression)
Require minimum query signal:
- at least 6 words, or
- at least 40 characters
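
The trigger conditions above can be sketched as a single predicate. This is a minimal illustration, not the service's implementation. The `TriggerInput` struct, its field names, and the `MIN_NEW_WORDS` delta rule are assumptions, and the timing constants are simply the low end of each range given above.

```rust
/// Hypothetical inputs for one trigger decision; all names are illustrative.
pub struct TriggerInput<'a> {
    /// Transcript text in the current window.
    pub transcript: &'a str,
    /// Words added since the last resolution attempt (the transcript delta).
    pub new_words: usize,
    /// Trailing silence reported by VAD, in milliseconds.
    pub silence_ms: u64,
    /// Time since the last resolution attempt, in milliseconds.
    pub since_last_attempt_ms: u64,
}

// Assumed values: the low end of each range given in the spec.
const SILENCE_TRIGGER_MS: u64 = 500;
const ROLLING_WINDOW_MS: u64 = 1_500;
const MIN_WORDS: usize = 6;
const MIN_CHARS: usize = 40;
// Assumption: fewer than two new words counts as a near-duplicate.
const MIN_NEW_WORDS: usize = 2;

/// Returns true when direct-reference resolution should run for this window.
pub fn should_trigger(input: &TriggerInput<'_>) -> bool {
    // Utterance boundary (silence) or rolling window during long speech.
    let at_boundary = input.silence_ms >= SILENCE_TRIGGER_MS;
    let window_elapsed = input.since_last_attempt_ms >= ROLLING_WINDOW_MS;
    if !at_boundary && !window_elapsed {
        return false;
    }
    // Near-duplicate suppression: skip when little new text has arrived.
    if input.new_words < MIN_NEW_WORDS {
        return false;
    }
    // Minimum query signal: at least 6 words, or at least 40 characters.
    let text = input.transcript.trim();
    text.split_whitespace().count() >= MIN_WORDS || text.chars().count() >= MIN_CHARS
}
```

Keeping the predicate free of I/O and clocks means it can be exercised directly against transcript fixtures.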

Resolution order:

Direct reference parse attempt
If parse fails, no match is emitted in v1

Confidence and match policy

Target behavior: only emit final matches when direct parse confidence is high enough to avoid false triggers.

Scoring model

Compute match_confidence in [0, 1] from direct-reference signals:

parse validity (canonical parse succeeds)
parser certainty (single unambiguous parse result)
ASR confidence for relevant token span (if available)
short-window stability (same canonical ref in 2 of last 3 windows)
transcript quality guard (minimum token quality, no severe truncation)
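
One way to combine these signals is a gated weighted sum, compared against the default threshold of 0.70 from the locked decisions. The sketch below is an assumption throughout: the spec does not define weights, so the values here are placeholders, as are the `Signals` field names, the choice to treat a failed parse or a bad transcript as a hard gate, and the neutral 0.5 used when ASR confidence is unavailable.

```rust
/// Default confidence threshold from the locked v1 decisions.
pub const DEFAULT_THRESHOLD: f64 = 0.70;

/// Direct-reference signals from the list above. Field names are illustrative.
#[derive(Debug, Clone, Copy)]
pub struct Signals {
    /// Canonical parse succeeded.
    pub parse_valid: bool,
    /// Parser returned a single unambiguous result.
    pub unambiguous: bool,
    /// ASR confidence for the relevant token span, if the worker reports one.
    pub asr_confidence: Option<f64>,
    /// How many of the last three windows produced the same canonical ref.
    pub stable_windows: u8,
    /// Transcript passed the quality guard (no severe truncation).
    pub transcript_ok: bool,
}

/// Computes match_confidence in [0, 1]. Weights are hypothetical placeholders.
pub fn match_confidence(s: &Signals) -> f64 {
    // Assumption: a failed parse or a bad transcript is a hard gate.
    if !s.parse_valid || !s.transcript_ok {
        return 0.0;
    }
    let certainty = if s.unambiguous { 1.0 } else { 0.0 };
    // Assumption: fall back to a neutral 0.5 when ASR confidence is missing.
    let asr = s.asr_confidence.unwrap_or(0.5).clamp(0.0, 1.0);
    let stability = if s.stable_windows >= 2 { 1.0 } else { 0.0 };
    let score = 0.3 + 0.2 * certainty + 0.2 * asr + 0.3 * stability;
    score.clamp(0.0, 1.0)
}

/// Maps a score to the server event that would be emitted.
pub fn decision_event(confidence: f64, threshold: f64) -> &'static str {
    // The comparison is inclusive, so a score equal to the threshold emits.
    if confidence >= threshold {
        "match.final"
    } else {
        "match.suppressed"
    }
}
```

With these placeholder weights, a valid and unambiguous parse that has not yet stabilized across windows stays below 0.70 even at high ASR confidence. Real weights would need tuning against the direct-reference fixtures.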

Decision thresholds

>= 0.70: emit final match with QR/snippet
< 0.70: suppress match

Anti-noise guardrails

session cooldown: suppress duplicate canonical result for 20-30s
stability guard: require consistent canonical result across short window
max output rate: at most one auto-match per N seconds (configurable)
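
The first two guardrails can be sketched with two small in-memory structures. This is an illustration under assumptions: the type and method names are hypothetical, the stability rule reuses the "2 of last 3 windows" signal from the scoring model, and the caller supplies the clock so the logic stays testable.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Stability guard: remembers the canonical refs from the last three windows
/// that produced a parse (windows without a parse are not recorded here).
pub struct StabilityWindow {
    recent: VecDeque<String>,
}

impl StabilityWindow {
    pub fn new() -> Self {
        Self { recent: VecDeque::with_capacity(3) }
    }

    /// Records `canonical` and returns true once it has appeared in at
    /// least 2 of the last 3 recorded windows.
    pub fn observe(&mut self, canonical: &str) -> bool {
        if self.recent.len() == 3 {
            self.recent.pop_front();
        }
        self.recent.push_back(canonical.to_string());
        self.recent.iter().filter(|c| c.as_str() == canonical).count() >= 2
    }
}

/// Session cooldown: suppresses a repeat of the same canonical result
/// until `window` has elapsed since it was last emitted.
pub struct Cooldown {
    window: Duration,
    last_emitted: HashMap<String, Instant>,
}

impl Cooldown {
    pub fn new(window: Duration) -> Self {
        Self { window, last_emitted: HashMap::new() }
    }

    /// Returns true and records the emission if `canonical` is outside its
    /// cooldown window at `now`; returns false for a duplicate.
    pub fn allow(&mut self, canonical: &str, now: Instant) -> bool {
        let in_cooldown = self
            .last_emitted
            .get(canonical)
            .map_or(false, |&last| now.saturating_duration_since(last) < self.window);
        if in_cooldown {
            return false;
        }
        self.last_emitted.insert(canonical.to_string(), now);
        true
    }
}
```

The max output rate is not shown. One option would be a second `Cooldown` keyed by a fixed string with the window set to N seconds.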

APIs

Ingress API (new service)

WS /v1/listen
- client sends audio frames + control events
- server emits transcript and match events

Example server events:

transcript.partial
transcript.final
match.final
match.suppressed

Match output payload

```json
{
  "session_id": "abc123",
  "utterance_id": "utt_0042",
  "mode": "direct_reference",
  "confidence": 0.81,
  "canonical": "JHN.3.16",
  "display": "John 3:16",
  "route_url": "https://route.bible/jhn.3.16?src=listener",
  "qr_url": "https://route.bible/qr?passage=JHN.3.16&format=svg&download=false",
  "snippet_html": "<a href=\"https://route.bible/jhn.3.16\">John 3:16</a>",
  "needs_confirmation": false,
  "reason": "direct_parse"
}
```
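
The three share fields in the payload can be derived from the canonical passage and its display label. The sketch below infers the URL patterns from the example payload only; it is not a statement of the Route Bible API contract. The `ShareArtifacts` and `build_artifacts` names are hypothetical, and the code assumes the canonical id needs no URL encoding.

```rust
/// Share artifacts derived from a canonical passage. Names are illustrative.
#[derive(Debug, PartialEq)]
pub struct ShareArtifacts {
    pub route_url: String,
    pub qr_url: String,
    pub snippet_html: String,
}

/// Escapes the characters that matter inside HTML text content.
fn escape_html(text: &str) -> String {
    text.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

/// Builds the link, QR URL, and plain anchor snippet for one match.
pub fn build_artifacts(canonical: &str, display: &str) -> ShareArtifacts {
    // Assumption: the route path is the lowercased canonical id, as in the
    // example payload, and needs no percent-encoding.
    let path = canonical.to_lowercase();
    ShareArtifacts {
        route_url: format!("https://route.bible/{}?src=listener", path),
        qr_url: format!(
            "https://route.bible/qr?passage={}&format=svg&download=false",
            canonical
        ),
        snippet_html: format!(
            "<a href=\"https://route.bible/{}\">{}</a>",
            path,
            escape_html(display)
        ),
    }
}
```

Calling `build_artifacts("JHN.3.16", "John 3:16")` reproduces the `route_url`, `qr_url`, and `snippet_html` values shown in the example payload.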

Route Bible integration contract

For final canonical match:

Parse transcript text with grab-bcv (local) to produce canonical passage (for example JHN.3.16)
POST /parse (or GET /parse?q=...) for route-normalized target and compatibility validation
GET/POST /qr for QR asset generation
Build snippet variants:
- plain anchor snippet
- dynamic badge snippet

Parsing source of truth (v1)

Listener-side explicit reference detection: grab-bcv
Route construction + downstream share format: Route Bible canonical conventions and APIs
In v1, if grab-bcv parse fails, no semantic/paraphrase fallback is attempted

Data model

Core entities:

Session: stream metadata, language, source
Utterance: transcript text, timestamps, ASR confidence
DirectMatch: parsed canonical passage + confidence diagnostics
MatchEvent: emitted artifact payload and suppression reason if blocked
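
As a rough sketch, the core entities could map to Rust types like the following. Every field name and type here is an assumption made for illustration. The spec only fixes which information each entity carries, and the suppression reasons mirror the ones surfaced to operators (duplicate, low confidence, unstable).

```rust
use std::time::Duration;

/// One live listening session. Field names are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Session {
    pub id: String,
    pub language: String,
    pub source: String,
}

/// One utterance produced by the ASR worker.
#[derive(Debug, Clone, PartialEq)]
pub struct Utterance {
    pub id: String,
    pub session_id: String,
    pub text: String,
    /// Offsets from session start.
    pub started_at: Duration,
    pub ended_at: Duration,
    /// None when the ASR worker reports no confidence.
    pub asr_confidence: Option<f64>,
}

/// A parsed canonical passage plus its confidence diagnostics.
#[derive(Debug, Clone, PartialEq)]
pub struct DirectMatch {
    pub utterance_id: String,
    pub canonical: String,
    pub confidence: f64,
    /// Per-signal diagnostics as (signal name, value) pairs.
    pub diagnostics: Vec<(String, f64)>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SuppressionReason {
    Duplicate,
    LowConfidence,
    Unstable,
}

/// The emitted artifact, with a suppression reason if it was blocked.
#[derive(Debug, Clone, PartialEq)]
pub struct MatchEvent {
    pub direct_match: DirectMatch,
    pub suppressed: Option<SuppressionReason>,
}

impl MatchEvent {
    /// Server event name this record corresponds to.
    pub fn event_name(&self) -> &'static str {
        match self.suppressed {
            None => "match.final",
            Some(_) => "match.suppressed",
        }
    }
}
```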

Storage strategy (v1):

keep in-memory session state
optional short-lived event log (24h) for debugging/QA
no long-term raw audio retention by default

Observability

Required metrics:

ASR latency p50/p95
match latency p50/p95 (utterance end -> emitted match)
direct-parse hit rate
parse failure rate
false positive rate from operator feedback
dedupe suppression count
threshold bucket distribution (>=0.70, <0.70)
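
For reference, p50/p95 over a batch of latency samples can be computed with the nearest-rank method, as in the sketch below. This is one common definition and an assumption here; a production deployment would more likely rely on the histogram type of its metrics library. The `percentile` function name is hypothetical.

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// Returns None for an empty slice or a percentile outside 1..=100.
pub fn percentile(samples_ms: &[u64], pct: u32) -> Option<u64> {
    if samples_ms.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_unstable();
    // rank = ceil(pct / 100 * n), computed in integers to avoid float error.
    let rank = (pct as usize * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}
```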

Structured logs must include:

session id, utterance id
parsed canonical ref (if any)
confidence breakdown
final decision reason

Security and privacy

TLS only
authenticated ingest keys for non-local deployments
PII minimization: do not persist full transcripts by default
configurable retention for diagnostics
explicit user disclosure that microphone input is processed

Deployment plan

Public app host: https://cue.selah.tools
Deploy Rust ingress/matcher service as long-running instances (recommended: Fly.io in US-East; equivalent container platform acceptable)
Run Parakeet ASR as a separate GPU-backed service (for example Runpod/Modal/Lambda Labs class infrastructure)
Use managed Redis for short-lived session/cooldown state
Depend on existing Route Bible public endpoints (or internal mirror) for parse/QR generation
Keep ingress and ASR services independently scalable

UI/UX

v1 surface

UI uses two explicit states:

Pre-start state (default)
- Nearly the entire screen is a single primary CTA: Start listening
- No dense controls shown before session start
Live state (after start)
- Real-time sermon slide view auto-follows confirmed scripture matches
- Persistent Stop button is visible and immediately ends the live connection

Core live behavior:

full-screen slide presentation mode
each confirmed match becomes one rendered slide
slide updates in real time as new references are detected

Session controls and connection lifecycle

Start listening:
- opens the live session UI
- establishes the server stream connection (WS /v1/listen)
Stop:
- explicitly terminates the active stream connection
- halts further transcript/match events
- returns UI to pre-start state

Connection states to render:

idle (pre-start)
connecting
live
stopping
disconnected (unexpected loss, with retry/start action)
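
The lifecycle above can be read as a small state machine. The sketch below is an interpretation, not a prescribed design: the event names are hypothetical and the transition table is inferred from the Start/Stop behavior described in this section. Rust is used only to match the backend examples; the same table would carry over to whatever language the client UI is written in.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnState {
    Idle,
    Connecting,
    Live,
    Stopping,
    Disconnected,
}

/// Hypothetical events driving the connection lifecycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UiEvent {
    StartPressed,
    SocketOpened,
    StopPressed,
    SocketClosed,
    SocketLost,
}

/// Returns the next state; events that do not apply leave the state unchanged.
pub fn next_state(state: ConnState, event: UiEvent) -> ConnState {
    match (state, event) {
        // Start (or retry after an unexpected loss) opens WS /v1/listen.
        (ConnState::Idle, UiEvent::StartPressed)
        | (ConnState::Disconnected, UiEvent::StartPressed) => ConnState::Connecting,
        (ConnState::Connecting, UiEvent::SocketOpened) => ConnState::Live,
        // Stop explicitly terminates the stream, then returns to pre-start.
        (ConnState::Connecting, UiEvent::StopPressed)
        | (ConnState::Live, UiEvent::StopPressed) => ConnState::Stopping,
        (ConnState::Stopping, UiEvent::SocketClosed) => ConnState::Idle,
        // Unexpected loss while connecting or live.
        (ConnState::Connecting, UiEvent::SocketLost)
        | (ConnState::Live, UiEvent::SocketLost) => ConnState::Disconnected,
        (unchanged, _) => unchanged,
    }
}
```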

Slide composition (required)

Every slide must include:

scripture reference label (for example John 3:16) in a clear, readable position
verse text as the visual focus (large type, high contrast, sermon-readable)
a small Route Bible QR code in a corner on every slide

Layout constraints:

verse text occupies primary visual area
reference stays visible even for long verse text
QR is present but non-dominant (~64-96px target size on 1080p output)
safe-area padding so projector/stream crop does not hide text or QR

Slide transition behavior

new confirmed match triggers slide update with subtle transition (no flashy animation)
duplicate canonical passage inside cooldown window does not create a new slide
if no new confirmed match, current slide remains pinned

Secondary controls (operator)

Use a minimal control strip or panel for:

Stop (in live state only)
live transcript pane
recent match log (Confirmed / Suppressed)
copy link, copy snippet, open QR actions for latest match

Interaction design rules

Do not interrupt operator flow with modal dialogs
Prioritize slide readability over control density
Keep controls visually secondary to the slide canvas
Pre-start screen should feel intentionally sparse, with Start listening as the dominant action
Keep match log compact and timestamped
Highlight confidence and mode (Direct) in control view
Show clear reason on suppression (duplicate, low confidence, unstable)

Error states

ASR unavailable
Route Bible QR generation failure
Degraded mode: if QR generation fails, still show the canonical text match

Acceptance

Functional acceptance criteria

Direct spoken references are detected and resolved to canonical passage links.
Every confirmed match renders a sermon-style slide containing verse text, verse reference, and a small Route Bible QR in the corner.
Initial screen presents Start listening as the dominant, near-only UI action.
Pressing Start listening transitions to live slide UI and opens server stream connection.
Pressing Stop closes the active stream connection and returns to pre-start UI.
Matches at or above configured confidence floor generate Route Bible link + QR + snippet.
Duplicate suppression prevents repeated fire for same canonical passage in cooldown window.
Suppressed matches include clear reason codes for operator debugging.
Service remains responsive under continuous speech sessions.

Quality gates

Use direct-reference transcript fixtures as baseline (clear references, abbreviated references, noisy-ASR references).

Initial target (v1):

Direct parse precision: >= 0.95
Direct parse recall on clean references: >= 0.90
Auto-fire precision (confidence >= 0.70): >= 0.95
End-to-end match latency p95: <= 1200ms after utterance boundary
Slide update latency p95 (confirmed match -> rendered slide): <= 300ms
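
The precision and recall targets can be checked from fixture counts using the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below assumes a hypothetical `FixtureCounts` tally produced by a fixture run. Per the targets above, the recall gate would be evaluated on the clean-reference subset.

```rust
/// Tally from one fixture run. The struct and its fields are illustrative.
#[derive(Debug, Clone, Copy)]
pub struct FixtureCounts {
    pub true_positives: u32,
    pub false_positives: u32,
    pub false_negatives: u32,
}

/// TP / (TP + FP); None when nothing was emitted.
pub fn precision(c: &FixtureCounts) -> Option<f64> {
    let emitted = c.true_positives + c.false_positives;
    if emitted == 0 {
        None
    } else {
        Some(f64::from(c.true_positives) / f64::from(emitted))
    }
}

/// TP / (TP + FN); None when the fixtures contain no references.
pub fn recall(c: &FixtureCounts) -> Option<f64> {
    let expected = c.true_positives + c.false_negatives;
    if expected == 0 {
        None
    } else {
        Some(f64::from(c.true_positives) / f64::from(expected))
    }
}

/// Checks the v1 targets: precision >= 0.95 and recall >= 0.90.
pub fn passes_v1_gates(c: &FixtureCounts) -> bool {
    matches!(
        (precision(c), recall(c)),
        (Some(p), Some(r)) if p >= 0.95 && r >= 0.90
    )
}
```

With made-up counts of 96 true positives, 4 false positives, and 8 false negatives, precision is 0.96 and recall is about 0.923, so both gates pass.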

Rollout plan

Phase 1 (v1): direct references only (auto-fire, no paraphrase)
Phase 2: improve direct reference robustness (abbreviations, partial chapter/verse wording)
Phase 3: add paraphrase suggestions using Exedra retrieval (needs_confirmation=true)
Phase 4: optional paraphrase auto-fire for high-confidence band

Open Questions

Should v1 support one language (en) only, or include multilingual ASR/parsing?
Should confidence thresholds be global or configurable per organization/session?
Should QR payload be returned as URL only, or inline SVG/PNG bytes for low-latency clients?
What retention policy is required for transcripts/audio in production deployments?
Do we need a fallback when Parakeet/GPU is unavailable (alternate ASR provider)?
For long passages, should v1 render one slide per verse or a single condensed slide block?
When paraphrase mode is added later, should it always require explicit operator confirmation?
