Scripture Listener (Spec)
Status: Finalized (v1)
Last updated: 2026-03-03
Goal
Build a lightweight service that listens to live speech, detects explicit Bible references, and instantly outputs:
- canonical Route Bible link
- QR code
- embeddable snippet
Product outcome
Church teams, speakers, and media operators can generate accurate, shareable scripture links from spoken content in real time without manually typing references.
Non-goals (v1)
- Paraphrase / semantic verse matching
- Full sermon transcript storage/search product
- Multi-speaker diarization
- Automatic slide deck generation
- AI-generated theological commentary
Future scope (post-v1)
- Exedra-powered paraphrase detection and ranked semantic candidates
- Candidate confirmation flow for ambiguous paraphrase matches
Locked Decisions (v1)
- Product domain: cue.selah.tools
- Scope: direct-reference parsing only (no paraphrase triggers in v1)
- Default confidence threshold: 0.70
- UI model: start-first screen -> live slide screen with explicit stop
- Slide requirement: verse text + verse reference + small Route Bible QR on every slide
- Transport: WebSocket streaming for live session
Backend
Architecture overview
Use an event-driven backend with three main components:
- ASR ingress service (Rust)
  - Receives live audio stream (WebSocket)
  - Runs VAD/chunking
  - Sends utterance chunks to ASR worker
  - Maintains per-session state
- ASR worker (Parakeet runtime)
  - Returns incremental transcript chunks and final utterance text
- Scripture match + action pipeline (Rust)
  - Resolves direct references first
  - Applies direct-reference confidence policy
  - Calls Route Bible parse/QR APIs
  - Emits match events
Reuse of existing Selah infrastructure
- Route Bible outputs
  - Reuse canonical parsing endpoint (/parse)
  - Reuse QR generation endpoint (/qr)
  - Reuse Route Bible canonical URL conventions
- Reference parsing
  - Reuse grab-bcv parsing rules and canonicalization behavior
  - Use local grab-bcv parse as the primary v1 match detector for explicit references
Deferred Exedra integration
Paraphrase trigger support is intentionally deferred to post-v1:
- reuse Exedra hybrid retrieval (resolve_query/search)
- reuse semantic query candidate expansion strategy
- reuse existing paraphrase fixtures for quality gates
Trigger policy (when search runs)
Run direct-reference resolution on utterance boundaries and controlled rolling windows:
- Trigger when silence is detected for 500-800ms
- Also trigger every 1.5-2s during long uninterrupted speech
- Skip if transcript delta is low (near-duplicate suppression)
- Require minimum query signal:
  - at least 6 words, or
  - at least 40 characters
Resolution order:
- Direct reference parse attempt
- If parse fails, no match is emitted in v1
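The trigger conditions above can be sketched as a single predicate. This is a minimal illustration, not the service's implementation: the type and function names are hypothetical, the two timing constants are arbitrary picks from the ranges stated above, and the near-duplicate check is a deliberately crude stand-in for whatever transcript-delta measure is finally chosen.

```rust
/// Inputs for one trigger decision. All names here are illustrative.
pub struct TriggerInput<'a> {
    /// Milliseconds of silence observed since the last speech frame.
    pub silence_ms: u64,
    /// Milliseconds since direct-reference resolution last ran.
    pub since_last_run_ms: u64,
    /// Transcript text accumulated for the current window.
    pub window_text: &'a str,
    /// Text that was submitted on the previous run.
    pub previous_text: &'a str,
}

// Assumed values picked from the ranges in the spec (500-800ms, 1.5-2s).
const SILENCE_TRIGGER_MS: u64 = 600;
const ROLLING_TRIGGER_MS: u64 = 1_500;
const MIN_WORDS: usize = 6;
const MIN_CHARS: usize = 40;

/// True when the window has at least 6 words or at least 40 characters.
fn has_min_signal(text: &str) -> bool {
    text.split_whitespace().count() >= MIN_WORDS || text.trim().chars().count() >= MIN_CHARS
}

/// Crude near-duplicate check (assumption): the window text is unchanged
/// from the previous run, ignoring ASCII case and surrounding whitespace.
fn is_near_duplicate(current: &str, previous: &str) -> bool {
    current.trim().eq_ignore_ascii_case(previous.trim())
}

/// Decides whether direct-reference resolution should run for this window.
pub fn should_run_resolution(input: &TriggerInput<'_>) -> bool {
    let utterance_boundary = input.silence_ms >= SILENCE_TRIGGER_MS;
    let rolling_window = input.since_last_run_ms >= ROLLING_TRIGGER_MS;
    (utterance_boundary || rolling_window)
        && has_min_signal(input.window_text)
        && !is_near_duplicate(input.window_text, input.previous_text)
}
```

Keeping the decision in one pure function makes it easy to replay transcript fixtures against different timing constants before fixing them.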
Confidence and match policy
Target behavior: only emit final matches when direct parse confidence is high enough to avoid false triggers.
Scoring model
Compute match_confidence in [0, 1] from direct-reference signals:
- parse validity (canonical parse succeeds)
- parser certainty (single unambiguous parse result)
- ASR confidence for relevant token span (if available)
- short-window stability (same canonical ref in 2 of last 3 windows)
- transcript quality guard (minimum token quality, no severe truncation)
Decision thresholds
- >= 0.70: emit final match with QR/snippet
- < 0.70: suppress match
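The spec lists the signals and the 0.70 floor but not how the signals combine, so the following is only one possible shape. The weights, the neutral default for missing ASR confidence, and every name in the block are assumptions made for illustration; only the signal list and the threshold come from the text above.

```rust
/// Direct-reference signals from the scoring model. Field names are illustrative.
pub struct Signals {
    pub parse_valid: bool,
    pub unambiguous: bool,
    /// ASR confidence for the reference token span, if the runtime reports one.
    pub asr_span_confidence: Option<f64>,
    /// How many of the last three windows produced the same canonical reference.
    pub matching_windows_of_last_3: u8,
    pub transcript_quality_ok: bool,
}

/// Default confidence threshold locked for v1.
pub const CONFIDENCE_FLOOR: f64 = 0.70;

#[derive(Debug, PartialEq)]
pub enum Decision {
    EmitFinal,
    Suppress,
}

/// Hypothetical weighting: the 0.4 / 0.3 / 0.3 weights are placeholders,
/// not values taken from the spec.
pub fn match_confidence(s: &Signals) -> f64 {
    // An invalid parse or a failed quality guard can never produce a match.
    if !s.parse_valid || !s.transcript_quality_ok {
        return 0.0;
    }
    let certainty = if s.unambiguous { 1.0 } else { 0.5 };
    // Missing ASR confidence is treated as a neutral 0.75 (assumption).
    let asr = s.asr_span_confidence.unwrap_or(0.75).clamp(0.0, 1.0);
    // Stability guard: same canonical ref in at least 2 of the last 3 windows.
    let stability = if s.matching_windows_of_last_3 >= 2 { 1.0 } else { 0.4 };
    (0.4 * certainty + 0.3 * asr + 0.3 * stability).clamp(0.0, 1.0)
}

/// Applies the decision thresholds to a computed confidence.
pub fn decide(confidence: f64) -> Decision {
    if confidence >= CONFIDENCE_FLOOR {
        Decision::EmitFinal
    } else {
        Decision::Suppress
    }
}
```

Treating parse validity and transcript quality as hard gates, rather than weighted terms, keeps a high ASR score from rescuing a reference that never parsed.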
Anti-noise guardrails
- session cooldown: suppress duplicate canonical result for 20-30s
- stability guard: require consistent canonical result across short window
- max output rate: at most one auto-match per N seconds (configurable)
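A rough sketch of the cooldown and output-rate guardrails as per-session in-memory state follows. The `Guardrails` type, its method names, and the caller-supplied millisecond clock are hypothetical; the stability guard is left out because it depends on the window history rather than on emit times.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum Gate {
    Allow,
    SuppressDuplicate,
    SuppressRateLimit,
}

/// Per-session guardrail state. Times are milliseconds from any monotonic clock.
pub struct Guardrails {
    cooldown_ms: u64,
    min_interval_ms: u64,
    last_emit_by_ref: HashMap<String, u64>,
    last_emit_ms: Option<u64>,
}

impl Guardrails {
    pub fn new(cooldown_ms: u64, min_interval_ms: u64) -> Self {
        Guardrails {
            cooldown_ms,
            min_interval_ms,
            last_emit_by_ref: HashMap::new(),
            last_emit_ms: None,
        }
    }

    /// Checks a candidate at `now_ms` and records it only when it is allowed.
    pub fn check(&mut self, canonical: &str, now_ms: u64) -> Gate {
        // Session cooldown: same canonical passage emitted too recently.
        if let Some(&last) = self.last_emit_by_ref.get(canonical) {
            if now_ms.saturating_sub(last) < self.cooldown_ms {
                return Gate::SuppressDuplicate;
            }
        }
        // Max output rate: any auto-match emitted too recently.
        if let Some(last) = self.last_emit_ms {
            if now_ms.saturating_sub(last) < self.min_interval_ms {
                return Gate::SuppressRateLimit;
            }
        }
        self.last_emit_by_ref.insert(canonical.to_string(), now_ms);
        self.last_emit_ms = Some(now_ms);
        Gate::Allow
    }
}
```

For example, `Guardrails::new(25_000, 3_000)` would model a 25s cooldown with at most one auto-match every 3s; both numbers are placeholders within or alongside the ranges above.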
APIs
Ingress API (new service)
WS /v1/listen
- client sends audio frames + control events
- server emits transcript and match events
Example server events:
- transcript.partial
- transcript.final
- match.final
- match.suppressed
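One way to keep the four event names consistent between the ingress service and its clients is a closed enum with a string mapping. The enum and method names below are hypothetical; only the four wire names come from the list above.

```rust
/// Server event kinds emitted over WS /v1/listen.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServerEvent {
    TranscriptPartial,
    TranscriptFinal,
    MatchFinal,
    MatchSuppressed,
}

impl ServerEvent {
    /// Wire name of the event, as listed in the spec.
    pub fn name(self) -> &'static str {
        match self {
            ServerEvent::TranscriptPartial => "transcript.partial",
            ServerEvent::TranscriptFinal => "transcript.final",
            ServerEvent::MatchFinal => "match.final",
            ServerEvent::MatchSuppressed => "match.suppressed",
        }
    }

    /// Parses a wire name; unknown names yield `None` instead of an error.
    pub fn from_name(name: &str) -> Option<Self> {
        match name {
            "transcript.partial" => Some(ServerEvent::TranscriptPartial),
            "transcript.final" => Some(ServerEvent::TranscriptFinal),
            "match.final" => Some(ServerEvent::MatchFinal),
            "match.suppressed" => Some(ServerEvent::MatchSuppressed),
            _ => None,
        }
    }
}
```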
Match output payload
{
  "session_id": "abc123",
  "utterance_id": "utt_0042",
  "mode": "direct_reference",
  "confidence": 0.81,
  "canonical": "JHN.3.16",
  "display": "John 3:16",
  "route_url": "https://route.bible/jhn.3.16?src=listener",
  "qr_url": "https://route.bible/qr?passage=JHN.3.16&format=svg&download=false",
  "snippet_html": "<a href=\"https://route.bible/jhn.3.16\">John 3:16</a>",
  "needs_confirmation": false,
  "reason": "direct_parse"
}
Route Bible integration contract
For final canonical match:
- Parse transcript text with grab-bcv (local) to produce canonical passage (for example JHN.3.16)
- POST /parse (or GET /parse?q=...) for route-normalized target and compatibility validation
- GET/POST /qr for QR asset generation
- Build snippet variants:
  - plain anchor snippet
  - dynamic badge snippet
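As an illustration of the final step, the sketch below assembles the link, QR URL, and plain anchor snippet from an already-canonical passage. It makes no network calls and does not replace the /parse validation step. The URL shapes are inferred solely from the sample match payload (lowercased passage in the path, `src=listener`, `format=svg&download=false`) and should be treated as assumptions, as should the `Artifacts` and `build_artifacts` names and the premise that canonical IDs contain only URL-safe characters.

```rust
/// Shareable outputs for one canonical match. Names are illustrative.
pub struct Artifacts {
    pub route_url: String,
    pub qr_url: String,
    pub snippet_html: String,
}

/// Escapes the characters that matter inside HTML text and attribute values.
fn escape_html(text: &str) -> String {
    text.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

/// Builds the link, QR URL, and plain anchor snippet for a canonical passage
/// such as "JHN.3.16". URL shapes mirror the sample payload only (assumption).
pub fn build_artifacts(canonical: &str, display: &str) -> Artifacts {
    let path = canonical.to_ascii_lowercase();
    Artifacts {
        route_url: format!("https://route.bible/{}?src=listener", path),
        qr_url: format!(
            "https://route.bible/qr?passage={}&format=svg&download=false",
            canonical
        ),
        snippet_html: format!(
            "<a href=\"https://route.bible/{}\">{}</a>",
            path,
            escape_html(display)
        ),
    }
}
```

Calling `build_artifacts("JHN.3.16", "John 3:16")` reproduces the three URL and snippet fields shown in the sample payload.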
Parsing source of truth (v1)
- Listener-side explicit reference detection: grab-bcv
- Route construction + downstream share format: Route Bible canonical conventions and APIs
- In v1, if grab-bcv parse fails, no semantic/paraphrase fallback is attempted
Data model
Core entities:
- Session: stream metadata, language, source
- Utterance: transcript text, timestamps, ASR confidence
- DirectMatch: parsed canonical passage + confidence diagnostics
- MatchEvent: emitted artifact payload and suppression reason if blocked
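The core entities might map to types along these lines. Only the four entity names and their broad contents come from the list above; every field name and type is a guess, and the suppression reasons are borrowed from the operator-facing reasons named in the interaction design rules (duplicate, low confidence, unstable).

```rust
/// Stream-level metadata for one listening session.
pub struct Session {
    pub id: String,
    pub language: String,
    pub source: String,
    pub started_at_ms: u64,
}

/// One transcribed utterance with timing and optional ASR confidence.
pub struct Utterance {
    pub id: String,
    pub text: String,
    pub start_ms: u64,
    pub end_ms: u64,
    pub asr_confidence: Option<f32>,
}

/// A parsed canonical passage plus its confidence score.
pub struct DirectMatch {
    pub canonical: String,
    pub display: String,
    pub confidence: f32,
}

/// Why a match was blocked (reasons taken from the UI rules).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SuppressionReason {
    Duplicate,
    LowConfidence,
    Unstable,
}

/// The emitted record: a match and, if blocked, the reason.
pub struct MatchEvent {
    pub session_id: String,
    pub utterance_id: String,
    pub direct_match: DirectMatch,
    pub suppressed: Option<SuppressionReason>,
}

impl MatchEvent {
    /// A match event is final when nothing suppressed it.
    pub fn is_final(&self) -> bool {
        self.suppressed.is_none()
    }
}
```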
Storage strategy (v1):
- keep in-memory session state
- optional short-lived event log (24h) for debugging/QA
- no long-term raw audio retention by default
Observability
Required metrics:
- ASR latency p50/p95
- match latency p50/p95 (utterance end -> emitted match)
- direct-parse hit rate
- parse failure rate
- false positive rate from operator feedback
- dedupe suppression count
- threshold bucket distribution (>=0.70, <0.70)
Structured logs must include:
- session id, utterance id
- parsed canonical ref (if any)
- confidence breakdown
- final decision reason
Security and privacy
- TLS only
- authenticated ingest keys for non-local deployments
- PII minimization: do not persist full transcripts by default
- configurable retention for diagnostics
- explicit user disclosure that microphone input is processed
Deployment plan
- Public app host: https://cue.selah.tools
- Deploy Rust ingress/matcher service as long-running instances (recommended: Fly.io in US-East; equivalent container platform acceptable)
- Run Parakeet ASR as a separate GPU-backed service (for example Runpod/Modal/Lambda Labs class infrastructure)
- Use managed Redis for short-lived session/cooldown state
- Depend on existing Route Bible public endpoints (or internal mirror) for parse/QR generation
- Keep ingress and ASR services independently scalable
UI/UX
v1 surface
UI uses two explicit states:
- Pre-start state (default)
  - Nearly the entire screen is a single primary CTA: Start listening
  - No dense controls shown before session start
- Live state (after start)
  - Real-time sermon slide view auto-follows confirmed scripture matches
  - Persistent Stop button is visible and immediately ends the live connection
Core live behavior:
- full-screen slide presentation mode
- each confirmed match becomes one rendered slide
- slide updates in real time as new references are detected
Session controls and connection lifecycle
Start listening:
- opens the live session UI
- establishes the server stream connection (WS /v1/listen)
Stop:
- explicitly terminates the active stream connection
- halts further transcript/match events
- returns UI to pre-start state
Connection states to render:
- idle (pre-start)
- connecting
- live
- stopping
- disconnected (unexpected loss, with retry/start action)
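The connection lifecycle above reads naturally as a small state machine. The sketch below is written in Rust for consistency with the backend, although the client would more likely carry the same table in its own language. The five states come from the list above; the event names and the exact transition table are assumptions.

```rust
/// Connection states the UI renders.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnState {
    Idle,
    Connecting,
    Live,
    Stopping,
    Disconnected,
}

/// Hypothetical inputs that move the connection between states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UiEvent {
    StartPressed,
    SocketOpened,
    StopPressed,
    SocketClosed,
    SocketLost,
}

/// Returns the next state; events that do not apply leave the state unchanged.
pub fn next_state(state: ConnState, event: UiEvent) -> ConnState {
    match (state, event) {
        // Start is offered from idle and as the retry action after a loss.
        (ConnState::Idle, UiEvent::StartPressed)
        | (ConnState::Disconnected, UiEvent::StartPressed) => ConnState::Connecting,
        (ConnState::Connecting, UiEvent::SocketOpened) => ConnState::Live,
        // Stop explicitly terminates the stream, even while still connecting.
        (ConnState::Connecting, UiEvent::StopPressed)
        | (ConnState::Live, UiEvent::StopPressed) => ConnState::Stopping,
        // Once the socket has closed, the UI returns to the pre-start state.
        (ConnState::Stopping, UiEvent::SocketClosed) => ConnState::Idle,
        // Unexpected loss while connecting or live.
        (ConnState::Connecting, UiEvent::SocketLost)
        | (ConnState::Live, UiEvent::SocketLost) => ConnState::Disconnected,
        (current, _) => current,
    }
}
```

Ignoring inapplicable events, rather than panicking, means a late socket callback that arrives after Stop cannot push the UI back into a live state.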
Slide composition (required)
Every slide must include:
- scripture reference label (for example John 3:16) in a clear, readable position
- verse text as the visual focus (large type, high contrast, sermon-readable)
- a small Route Bible QR code in a corner on every slide
Layout constraints:
- verse text occupies primary visual area
- reference stays visible even for long verse text
- QR is present but non-dominant (~64-96px target size on 1080p output)
- safe-area padding so projector/stream crop does not hide text or QR
Slide transition behavior
- new confirmed match triggers slide update with subtle transition (no flashy animation)
- duplicate canonical passage inside cooldown window does not create a new slide
- if no new confirmed match, current slide remains pinned
Secondary controls (operator)
Use a minimal control strip or panel for:
- Stop (in live state only)
- live transcript pane
- recent match log (Confirmed/Suppressed)
- copy link, copy snippet, open QR actions for latest match
Interaction design rules
- Do not interrupt operator flow with modal dialogs
- Prioritize slide readability over control density
- Keep controls visually secondary to the slide canvas
- Pre-start screen should feel intentionally sparse, with Start listening as the dominant action
- Keep match log compact and timestamped
- Highlight confidence and mode (Direct) in control view
- Show clear reason on suppression (duplicate, low confidence, unstable)
Error states
- ASR unavailable
- Route Bible QR generation failure
  - degraded mode should still show canonical text match if QR fails
Acceptance
Functional acceptance criteria
- Direct spoken references are detected and resolved to canonical passage links.
- Every confirmed match renders a sermon-style slide containing verse text, verse reference, and a small Route Bible QR in the corner.
- Initial screen presents Start listening as the dominant, near-only UI action.
- Pressing Start listening transitions to live slide UI and opens server stream connection.
- Pressing Stop closes the active stream connection and returns to pre-start UI.
- Matches at or above configured confidence floor generate Route Bible link + QR + snippet.
- Duplicate suppression prevents repeated fire for same canonical passage in cooldown window.
- Suppressed matches include clear reason codes for operator debugging.
- Service remains responsive under continuous speech sessions.
Quality gates
Use direct-reference transcript fixtures as baseline (clear references, abbreviated references, noisy-ASR references).
Initial target (v1):
- Direct parse precision: >= 0.95
- Direct parse recall on clean references: >= 0.90
- Auto-fire precision (confidence >= 0.70): >= 0.95
- End-to-end match latency p95: <= 1200ms after utterance boundary
- Slide update latency p95 (confirmed match -> rendered slide): <= 300ms
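For checking fixture runs against these targets, precision, recall, and p95 can be computed as below. This is a sketch under assumptions: the `FixtureCounts` type and function names are hypothetical, and the p95 uses the nearest-rank definition, which is one common choice the spec does not mandate.

```rust
/// Outcome counts from one fixture run. Names are illustrative.
pub struct FixtureCounts {
    pub true_positives: u32,
    pub false_positives: u32,
    pub false_negatives: u32,
}

impl FixtureCounts {
    /// TP / (TP + FP); `None` when nothing was emitted.
    pub fn precision(&self) -> Option<f64> {
        let emitted = self.true_positives + self.false_positives;
        if emitted == 0 {
            None
        } else {
            Some(f64::from(self.true_positives) / f64::from(emitted))
        }
    }

    /// TP / (TP + FN); `None` when the fixtures contain no references.
    pub fn recall(&self) -> Option<f64> {
        let expected = self.true_positives + self.false_negatives;
        if expected == 0 {
            None
        } else {
            Some(f64::from(self.true_positives) / f64::from(expected))
        }
    }
}

/// Nearest-rank 95th percentile of latency samples in milliseconds.
/// Sorts the slice in place; returns `None` for an empty slice.
pub fn p95_ms(samples: &mut [u64]) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    // 1-based rank = ceil(0.95 * n), computed with integer arithmetic.
    let rank = (samples.len() * 95 + 99) / 100;
    Some(samples[rank - 1])
}
```

A run would pass the latency gate when, for instance, `p95_ms(&mut latencies)` returns a value no greater than 1200.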
Rollout plan
- Phase 1 (v1): direct references only (auto-fire, no paraphrase)
- Phase 2: improve direct reference robustness (abbreviations, partial chapter/verse wording)
- Phase 3: add paraphrase suggestions using Exedra retrieval (needs_confirmation=true)
- Phase 4: optional paraphrase auto-fire for high-confidence band
Open Questions
- Should v1 support one language (en) only, or include multilingual ASR/parsing?
- Should confidence thresholds be global or configurable per organization/session?
- Should QR payload be returned as URL only, or inline SVG/PNG bytes for low-latency clients?
- What retention policy is required for transcripts/audio in production deployments?
- Do we need a fallback when Parakeet/GPU is unavailable (alternate ASR provider)?
- For long passages, should v1 render one slide per verse or a single condensed slide block?
- When paraphrase mode is added later, should it always require explicit operator confirmation?