Stoned.AI — live-streamed human + AI conversation show, both sides voiced via local Kokoro TTS. Governance docs 00-09, README, .gitignore. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implementation Plan
Scope For This Implementation
- Included:
  - Project scaffold (`pyproject.toml`, `src/stoned_ai/`, `tests/`)
  - TTS layer wrapping Arena's Kokoro backend
  - Cleaning engine for AI CLI output noise
  - AI backend abstraction supporting Codex and Gemini CLI backends
  - Web server with SSE delivery
  - Host view (`/host`) with text input, send, voice selection, session control
  - Broadcast view (`/broadcast`) styled for OBS browser source capture
  - WAV audio serving for both views
  - Per-speaker voice assignment (host voice + AI voice)
  - `install.sh` script and `~/.local/bin/stoned-web` link
- Excluded from initial build:
  - Claude API backend (Phase 5, post-launch)
  - Visual avatar or waveform animation overlay
  - YouTube chat integration
  - Persistent conversation logging (nice to have, not required for launch)
  - Mobile-responsive host view (desktop only for now)
Phases
Phase 1: Project Scaffold and Core Backend
- Objective: Establish the Python package, TTS layer, cleaning engine, and AI backend abstraction.
- Files likely affected:
  - `pyproject.toml`
  - `src/stoned_ai/__init__.py`
  - `src/stoned_ai/tts.py`
  - `src/stoned_ai/clean.py`
  - `src/stoned_ai/ai.py`
  - `scripts/install.sh`
- Risks: `pykokoro` import paths may differ slightly from Arena's. Verify import compatibility before writing the TTS layer.
- Exit criteria: `stoned_ai.tts` can synthesize a WAV from text using a Kokoro voice. `stoned_ai.ai` can call Codex or Gemini and return a clean string.
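As a rough illustration of what "return a clean string" could involve, here is a minimal sketch of a cleaning pass. The function name `clean_output` and the entries in `NOISE_PATTERNS` are hypothetical; the real patterns would have to be derived from captured Codex and Gemini output rather than assumed.

```python
import re

# Matches ANSI CSI escape sequences such as colour codes and cursor movement.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

# Hypothetical noise patterns; real ones must come from captured CLI output.
NOISE_PATTERNS = (
    re.compile(r"^\s*$"),                      # blank lines
    re.compile(r"^\[\d{4}-\d{2}-\d{2}"),       # timestamped log lines (assumed format)
    re.compile(r"^tokens used\b", re.IGNORECASE),  # usage summary line (assumed format)
)


def clean_output(raw: str) -> str:
    """Strip escape codes and drop lines that match a noise pattern."""
    text = ANSI_CSI.sub("", raw).replace("\r", "")
    kept = [
        line.strip()
        for line in text.splitlines()
        if not any(pattern.search(line) for pattern in NOISE_PATTERNS)
    ]
    return "\n".join(kept)
```

Keeping the patterns in one tuple means a change in CLI output format can be handled by editing data rather than control flow, which may make the fixture-based unit tests easier to maintain.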
Phase 2: Web Server and SSE Delivery
- Objective: Build the HTTP server, session state management, SSE event stream, and WAV file serving.
- Files likely affected:
  - `src/stoned_ai/web.py`
- Risks: Session state must be thread-safe. SSE connections from both `/host` and `/broadcast` must receive the same events.
- Exit criteria: A session can be started. A host message can be submitted. The AI responds. Both turns are pushed over SSE. Both turns are voiced.
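One way to meet both risks (thread safety, and identical events on `/host` and `/broadcast`) is a small fan-out hub that gives each SSE connection its own queue. The sketch below is an assumption about how `web.py` might be structured, not a description of it; `EventHub`, `format_sse`, and the `turn` event name are hypothetical.

```python
import json
import queue
import threading


def format_sse(event: str, data: dict) -> str:
    """Encode one Server-Sent Events frame (event name plus JSON data)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class EventHub:
    """Fan out each published frame to every subscribed connection."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: set[queue.Queue] = set()

    def subscribe(self) -> queue.Queue:
        """Create and register a queue for one SSE connection."""
        q: queue.Queue = queue.Queue()
        with self._lock:
            self._subscribers.add(q)
        return q

    def unsubscribe(self, q: queue.Queue) -> None:
        """Remove a connection's queue, for example when the client disconnects."""
        with self._lock:
            self._subscribers.discard(q)

    def publish(self, event: str, data: dict) -> None:
        """Deliver the same encoded frame to every current subscriber."""
        frame = format_sse(event, data)
        # Copy under the lock so delivery never iterates a set that is changing.
        with self._lock:
            targets = list(self._subscribers)
        for q in targets:
            q.put(frame)
```

Each request handler would call `subscribe()`, write frames from its queue to the response, and call `unsubscribe()` on disconnect. Because the frame is encoded once and then copied to every queue, both views see byte-identical events.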
Phase 3: Host View (/host)
- Objective: Build the host's control panel HTML/CSS/JS page.
- Files likely affected:
  - `src/stoned_ai/web.py` (inline HTML or template)
- Risks: Voice selection dropdown must populate from the live Kokoro voice list. If the voice list is slow to load, display a loading state.
- Exit criteria: Jason can open `/host`, start a session, pick voices, type and send a message, hear his voice, hear the AI's voice, and stop the session.
Phase 4: Broadcast View (/broadcast)
- Objective: Build the clean, OBS-capturable broadcast page.
- Files likely affected:
  - `src/stoned_ai/web.py` (inline HTML or template)
- Risks: OBS browser source must auto-play audio. Verify OBS audio capture works with the WAV playback approach before marking complete.
- Exit criteria: `/broadcast` shows only conversation cards. No controls are visible. OBS captures the page. Audio plays in OBS without manual permission prompts.
Phase 5: Claude API Backend (Post-Launch)
- Objective: Add a Claude backend using the `anthropic` SDK as an alternative to Codex/Gemini.
- Files likely affected:
  - `src/stoned_ai/ai.py`
  - `pyproject.toml` (add `anthropic` dependency)
- Risks: Requires a valid `ANTHROPIC_API_KEY` environment variable on `svc-ai`. Must not break existing Codex/Gemini backends.
- Exit criteria: The host view offers a Claude model option. A full conversation runs using the Claude API backend.
Order Of Operations
- Create `pyproject.toml` and package scaffold.
- Implement `tts.py` (Kokoro wrapper).
- Implement `clean.py` (noise stripping for Codex and Gemini).
- Implement `ai.py` (Codex and Gemini backends).
- Implement `web.py`: server core, session state, SSE stream, WAV serving.
- Implement `/host` view in `web.py`.
- Implement `/broadcast` view in `web.py`.
- Write `scripts/install.sh`.
scripts/install.sh. - Smoke test: full end-to-end conversation from host view to broadcast view.
- Verify OBS browser source audio capture.
Testing Expectations
- Unit tests: `tts.py` voice listing. `clean.py` noise stripping against fixture strings. `ai.py` CLI argument construction (mock subprocess).
- Integration tests: Full SSE event sequence from host message submit to broadcast card render. Requires a live Codex or Gemini CLI.
- Manual verification: OBS audio capture. Visual broadcast layout on stream. Per-speaker voice differentiation.
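The "CLI argument construction (mock subprocess)" unit test might look roughly like the sketch below. `build_argv`, `ask`, and the `ARGV_PREFIX` table are hypothetical names, and the argv prefixes shown are assumptions that should be checked against the installed Codex and Gemini CLIs before use.

```python
import subprocess
from unittest import mock

# Assumed argv prefixes; verify the real subcommands and flags against the
# installed Codex and Gemini CLIs.
ARGV_PREFIX = {
    "codex": ["codex", "exec"],
    "gemini": ["gemini", "-p"],
}


def build_argv(backend: str, prompt: str) -> list[str]:
    """Return the argument vector for one non-interactive CLI call."""
    return [*ARGV_PREFIX[backend], prompt]


def ask(backend: str, prompt: str) -> str:
    """Run the CLI once and return its stdout with surrounding whitespace removed."""
    result = subprocess.run(
        build_argv(backend, prompt),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def test_ask_passes_prompt_as_single_argument() -> None:
    fake = subprocess.CompletedProcess(
        args=[], returncode=0, stdout=" hello \n", stderr=""
    )
    with mock.patch("subprocess.run", return_value=fake) as run:
        reply = ask("gemini", "say hello; rm -rf /")
    assert reply == "hello"
    # The prompt stays one argv element, so no shell ever interprets it.
    assert run.call_args.args[0] == ["gemini", "-p", "say hello; rm -rf /"]
```

Patching `subprocess.run` keeps the test independent of whether either CLI is installed, which is what separates it from the integration tests that need a live backend.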
Documentation Expectations
- `README.md` must be updated with usage instructions after Phase 2 is complete.
- `docs/09-PROJECT-STATUS.md` must be updated after each phase completes.
- `docs/06-WORKER-HANDOFF.md` must be updated before handing off to the implementation model.
Escalation Conditions
- Stop and raise a change request if:
  - `pykokoro` cannot be imported without installing the full Arena package.
  - The Kokoro voice pipeline requires GPU on the current hardware and fails on CPU.
  - OBS cannot capture audio from a browser source pointing at `svc-ai` without additional configuration.
  - The Codex or Gemini CLI output format has changed in a way that breaks the cleaning engine.
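The first condition can be given a quick preliminary check before any TTS code is written. The sketch below only reports whether a module can be located on the current import path; it does not prove that importing succeeds or that Arena is unnecessary, so a real import in a clean virtual environment is still needed. The helper name `can_import` is hypothetical.

```python
import importlib.util


def can_import(module_name: str) -> bool:
    """Report whether a module can be located on the current import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the module has no spec.
        return False


if __name__ == "__main__":
    for name in ("pykokoro",):
        status = "found" if can_import(name) else "missing"
        print(f"{name}: {status}")
```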
Signature
- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision