stoned-ai/docs/03-IMPLEMENTATION-PLAN.md
Jason Hall fcd93ee0af Initialize project governance and baseline structure
Stoned.AI — live-streamed human + AI conversation show, both sides voiced
via local Kokoro TTS. Governance docs 00-09, README, .gitignore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 21:55:46 +00:00


Implementation Plan

Scope For This Implementation

  • Included:

    • Project scaffold (pyproject.toml, src/stoned_ai/, tests/)
    • TTS layer wrapping Arena's Kokoro backend
    • Cleaning engine for AI CLI output noise
    • AI backend abstraction supporting Codex and Gemini CLI backends
    • Web server with SSE delivery
    • Host view (/host) with text input, send, voice selection, session control
    • Broadcast view (/broadcast) styled for OBS browser source capture
    • WAV audio serving for both views
    • Per-speaker voice assignment (host voice + AI voice)
    • install.sh script and ~/.local/bin/stoned-web link
  • Excluded from initial build:

    • Claude API backend (deferred to Phase 5, post-launch)
    • Visual avatar or waveform animation overlay
    • YouTube chat integration
    • Persistent conversation logging (nice to have, not required for launch)
    • Mobile-responsive host view (desktop only for now)

Phases

Phase 1: Project Scaffold and Core Backend

  • Objective: Establish the Python package, TTS layer, cleaning engine, and AI backend abstraction.
  • Files likely affected:
    • pyproject.toml
    • src/stoned_ai/__init__.py
    • src/stoned_ai/tts.py
    • src/stoned_ai/clean.py
    • src/stoned_ai/ai.py
    • scripts/install.sh
  • Risks: pykokoro import paths may differ slightly from Arena's. Verify import compatibility before writing the TTS layer.
  • Exit criteria: stoned_ai.tts can synthesize a WAV from text using a Kokoro voice. stoned_ai.ai can call Codex or Gemini and return a clean string.
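
As a rough illustration of the "clean string" exit criterion, the cleaning engine could be sketched as below. This is a minimal sketch, not the planned implementation: the function name clean_output and the timestamped-log pattern are assumptions, and the real noise patterns would have to be derived from captured Codex and Gemini output.

```python
import re

# Matches ANSI CSI escape sequences such as colour codes and cursor moves.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

# Hypothetical noise patterns for illustration only; the real list must be
# built from fixture strings captured from the Codex and Gemini CLIs.
NOISE_LINE_PATTERNS = (
    re.compile(r"^\s*$"),                 # blank lines
    re.compile(r"^\[\d{4}-\d{2}-\d{2}"),  # timestamped log lines (assumed format)
)


def clean_output(raw: str) -> str:
    """Strip escape codes and noise lines, returning one speakable string."""
    text = ANSI_CSI.sub("", raw).replace("\r", "")
    kept = [
        line.strip()
        for line in text.splitlines()
        if not any(pattern.search(line) for pattern in NOISE_LINE_PATTERNS)
    ]
    return " ".join(kept)
```

Joining the surviving lines with spaces keeps the result in a single line, which is convenient when the string is handed straight to the TTS layer.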

Phase 2: Web Server and SSE Delivery

  • Objective: Build the HTTP server, session state management, SSE event stream, and WAV file serving.
  • Files likely affected:
    • src/stoned_ai/web.py
  • Risks: Session state must be thread-safe. SSE connections from both /host and /broadcast must receive the same events.
  • Exit criteria: A session can be started. A host message can be submitted. The AI responds. Both turns are pushed over SSE. Both turns are voiced.
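
The two risks above (thread safety, and identical events reaching both /host and /broadcast) can be illustrated with a small fan-out broker. This is a sketch under assumptions: the names EventBroker and format_sse are hypothetical, and the plan does not specify how web.py will be structured.

```python
import json
import queue
import threading


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events frame (event name plus JSON data)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class EventBroker:
    """Fans each published event out to every subscribed connection."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers = []  # one queue.Queue per open SSE connection

    def subscribe(self) -> queue.Queue:
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def unsubscribe(self, q: queue.Queue) -> None:
        with self._lock:
            if q in self._subscribers:
                self._subscribers.remove(q)

    def publish(self, event: str, data: dict) -> None:
        frame = format_sse(event, data)
        # Copy the list under the lock so subscribers can come and go
        # while frames are being delivered.
        with self._lock:
            targets = list(self._subscribers)
        for q in targets:
            q.put(frame)
```

Each SSE request handler would call subscribe(), write frames from its queue to the response, and call unsubscribe() when the client disconnects. Because every subscriber receives the same pre-formatted frame, the host and broadcast views cannot drift apart.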

Phase 3: Host View (/host)

  • Objective: Build the host's control panel HTML/CSS/JS page.
  • Files likely affected:
    • src/stoned_ai/web.py (inline HTML or template)
  • Risks: Voice selection dropdown must populate from the live Kokoro voice list. If the voice list is slow to load, display a loading state.
  • Exit criteria: Jason can open /host, start a session, pick voices, type and send a message, hear his voice, hear the AI's voice, and stop the session.
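
One way to support the loading state mentioned in the risks is to load the voice list in a background thread and let the host page poll a small status payload. The sketch below assumes a hypothetical VoiceCatalog class and a loader callable supplied by tts.py; neither is defined by this plan.

```python
import threading
from typing import Callable, Optional


class VoiceCatalog:
    """Loads the voice list in the background and reports its status."""

    def __init__(self, loader: Callable[[], list]) -> None:
        self._loader = loader  # hypothetical: whatever tts.py exposes
        self._lock = threading.Lock()
        self._voices: Optional[list] = None
        self._error: Optional[str] = None
        self._thread = threading.Thread(target=self._load, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def wait(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def _load(self) -> None:
        try:
            voices, error = sorted(self._loader()), None
        except Exception as exc:  # surface loader failures to the page
            voices, error = [], str(exc)
        with self._lock:
            self._voices, self._error = voices, error

    def snapshot(self) -> dict:
        """Return a JSON-serializable payload for the voice dropdown."""
        with self._lock:
            if self._voices is None:
                return {"status": "loading", "voices": []}
            if self._error is not None:
                return {"status": "error", "voices": [], "detail": self._error}
            return {"status": "ready", "voices": list(self._voices)}
```

The host page could show a disabled dropdown while the status is "loading" and populate it once the status becomes "ready".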

Phase 4: Broadcast View (/broadcast)

  • Objective: Build the clean, OBS-capturable broadcast page.
  • Files likely affected:
    • src/stoned_ai/web.py (inline HTML or template)
  • Risks: OBS browser source must auto-play audio. Verify OBS audio capture works with the WAV playback approach before marking complete.
  • Exit criteria: /broadcast shows only conversation cards. No controls are visible. OBS captures the page. Audio plays in OBS without manual permission prompts.

Phase 5: Claude API Backend (Post-Launch)

  • Objective: Add a Claude backend using the anthropic SDK as an alternative to Codex/Gemini.
  • Files likely affected:
    • src/stoned_ai/ai.py
    • pyproject.toml (add anthropic dependency)
  • Risks: Requires a valid ANTHROPIC_API_KEY environment variable on svc-ai. Must not break existing Codex/Gemini backends.
  • Exit criteria: The host view offers a Claude model option. A full conversation runs using the Claude API backend.
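
To keep the existing backends unaffected, the Claude option could be offered only when the API key is present. The following is a minimal sketch; the function name backend_options and the backend labels are assumptions, and it deliberately avoids calling the anthropic SDK.

```python
import os
from typing import Mapping, Optional

# Backends that are always offered, independent of any API key.
BASE_BACKENDS = ("codex", "gemini")


def backend_options(env: Optional[Mapping[str, str]] = None) -> list:
    """List the backends the host view should offer for this environment."""
    env = os.environ if env is None else env
    options = list(BASE_BACKENDS)
    # Only advertise Claude when a non-empty key is configured.
    if env.get("ANTHROPIC_API_KEY", "").strip():
        options.append("claude")
    return options
```

Passing the environment as a parameter keeps the function easy to unit test without touching the real process environment on svc-ai.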

Order Of Operations

  1. Create pyproject.toml and package scaffold.
  2. Implement tts.py (Kokoro wrapper).
  3. Implement clean.py (noise stripping for Codex and Gemini).
  4. Implement ai.py (Codex and Gemini backends).
  5. Implement web.py — server core, session state, SSE stream, WAV serving.
  6. Implement /host view in web.py.
  7. Implement /broadcast view in web.py.
  8. Write scripts/install.sh.
  9. Smoke test: full end-to-end conversation from host view to broadcast view.
  10. Verify OBS browser source audio capture.

Testing Expectations

  • Unit tests: tts.py voice listing; clean.py noise stripping against fixture strings; ai.py CLI argument construction (with subprocess mocked).
  • Integration tests: Full SSE event sequence, from host message submission to broadcast card rendering. Requires a live Codex or Gemini CLI.
  • Manual verification: OBS audio capture. Visual broadcast layout on stream. Per-speaker voice differentiation.
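
The ai.py unit test described above might look roughly like this. It is a sketch only: build_argv, ask, and the argv templates in COMMANDS are assumptions for illustration, and the actual CLI invocations must be checked against the installed Codex and Gemini versions.

```python
import subprocess
from unittest import mock

# Assumed argv prefixes; verify against the installed CLI versions.
COMMANDS = {
    "codex": ["codex", "exec"],
    "gemini": ["gemini", "-p"],
}


def build_argv(backend: str, prompt: str) -> list:
    """Build the argument list; the prompt is always one argv element."""
    if backend not in COMMANDS:
        raise ValueError(f"unknown backend: {backend}")
    return [*COMMANDS[backend], prompt]


def ask(backend: str, prompt: str, timeout: float = 120.0) -> str:
    """Run the CLI without a shell and return its trimmed stdout."""
    result = subprocess.run(
        build_argv(backend, prompt),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()


def test_ask_passes_prompt_as_single_argument() -> None:
    fake = subprocess.CompletedProcess(
        args=[], returncode=0, stdout=" hi there \n", stderr=""
    )
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert ask("codex", "hello; echo injected") == "hi there"
    argv = run.call_args.args[0]
    assert argv[-1] == "hello; echo injected"  # no shell splitting
    assert "shell" not in run.call_args.kwargs
```

Mocking subprocess.run keeps this test independent of a live CLI, which is reserved for the integration tests.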

Documentation Expectations

  • README.md must be updated with usage instructions after Phase 2 is complete.
  • docs/09-PROJECT-STATUS.md must be updated after each phase completes.
  • docs/06-WORKER-HANDOFF.md must be updated before handing off to the implementation model.

Escalation Conditions

  • Stop and raise a change request if:
    • pykokoro cannot be imported without installing the full Arena package.
    • The Kokoro voice pipeline requires GPU on the current hardware and fails on CPU.
    • OBS cannot capture audio from a browser source pointing at svc-ai without additional configuration.
    • The Codex or Gemini CLI output format has changed in a way that breaks the cleaning engine.
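
The first escalation condition can be checked early with a small import probe, run in a virtual environment that does not have Arena installed. This is a sketch; the helper name can_import is hypothetical, and pykokoro is taken from the text above as the module name.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Run inside a fresh environment without the Arena package installed.
    status = "ok" if can_import("pykokoro") else "MISSING: raise a change request"
    print(f"pykokoro: {status}")
```

A False result here points to the escalation path rather than a workaround, since the plan treats a hard dependency on the full Arena package as a stop condition.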

Signature

  • Document role: governing
  • Created by: Claude (supervisor)
  • Created at: 2026-04-12
  • Revision status: initial
  • Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision