# Implementation Plan

## Scope For This Implementation

- Included:
  - Project scaffold (`pyproject.toml`, `src/stoned_ai/`, `tests/`)
  - TTS layer wrapping Arena's Kokoro backend
  - Cleaning engine for AI CLI output noise
  - AI backend abstraction supporting Codex and Gemini CLI backends
  - Web server with SSE delivery
  - Host view (`/host`) with text input, send, voice selection, session control
  - Broadcast view (`/broadcast`) styled for OBS browser source capture
  - WAV audio serving for both views
  - Per-speaker voice assignment (host voice + AI voice)
  - `install.sh` script and `~/.local/bin/stoned-web` link
- Excluded from initial build:
  - Claude API backend (deferred to Phase 5, post-launch)
  - Visual avatar or waveform animation overlay
  - YouTube chat integration
  - Persistent conversation logging (nice to have, not required for launch)
  - Mobile-responsive host view (desktop only for now)

## Phases

### Phase 1: Project Scaffold and Core Backend

- Objective: Establish the Python package, TTS layer, cleaning engine, and AI backend abstraction.
- Files likely affected:
  - `pyproject.toml`
  - `src/stoned_ai/__init__.py`
  - `src/stoned_ai/tts.py`
  - `src/stoned_ai/clean.py`
  - `src/stoned_ai/ai.py`
  - `scripts/install.sh`
- Risks: `pykokoro` import paths may differ slightly from Arena's. Verify import compatibility before writing the TTS layer.
- Exit criteria: `stoned_ai.tts` can synthesize a WAV from text using a Kokoro voice. `stoned_ai.ai` can call Codex or Gemini and return a clean string.

### Phase 2: Web Server and SSE Delivery

- Objective: Build the HTTP server, session state management, SSE event stream, and WAV file serving.
- Files likely affected:
  - `src/stoned_ai/web.py`
- Risks: Session state must be thread-safe. SSE connections from both `/host` and `/broadcast` must receive the same events.
- Exit criteria: A session can be started. A host message can be submitted. The AI responds. Both turns are pushed over SSE. Both turns are voiced.
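
The Phase 2 risk (thread-safe state, identical events on `/host` and `/broadcast`) can be illustrated with a small fan-out hub. This is only a sketch under stated assumptions: the names `EventHub` and `format_sse`, the `turn` event name, and the payload fields are hypothetical and not part of the plan. It assumes each SSE connection runs in its own thread and drains its own queue.

```python
import json
import queue
import threading


def format_sse(event, data):
    """Encode one event in the text/event-stream wire format."""
    # json.dumps escapes newlines, so the payload stays on a single data: line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class EventHub:
    """Hypothetical helper: deliver every published event to all subscribers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = []  # one queue.Queue per open SSE connection

    def subscribe(self):
        """Register a new SSE connection and return its private queue."""
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def unsubscribe(self, q):
        """Remove a connection's queue once the client disconnects."""
        with self._lock:
            if q in self._subscribers:
                self._subscribers.remove(q)

    def publish(self, event, data):
        """Push the same encoded frame to every currently subscribed queue."""
        frame = format_sse(event, data)
        with self._lock:
            targets = list(self._subscribers)  # snapshot under the lock
        for q in targets:
            q.put(frame)
```

In this sketch, the handler for each view would call `subscribe()` once, write frames from its queue to the response, and call `unsubscribe()` on disconnect. Because both views read from queues fed by the same `publish()` call, they see the same events in the same order.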

### Phase 3: Host View (`/host`)

- Objective: Build the host's control panel HTML/CSS/JS page.
- Files likely affected:
  - `src/stoned_ai/web.py` (inline HTML or template)
- Risks: Voice selection dropdown must populate from the live Kokoro voice list. If the voice list is slow to load, display a loading state.
- Exit criteria: Jason can open `/host`, start a session, pick voices, type and send a message, hear his voice, hear the AI's voice, and stop the session.

### Phase 4: Broadcast View (`/broadcast`)

- Objective: Build the clean, OBS-capturable broadcast page.
- Files likely affected:
  - `src/stoned_ai/web.py` (inline HTML or template)
- Risks: OBS browser source must auto-play audio. Verify OBS audio capture works with the WAV playback approach before marking complete.
- Exit criteria: `/broadcast` shows only conversation cards. No controls are visible. OBS captures the page. Audio plays in OBS without manual permission prompts.

### Phase 5: Claude API Backend (Post-Launch)

- Objective: Add a Claude backend using the `anthropic` SDK as an alternative to Codex/Gemini.
- Files likely affected:
  - `src/stoned_ai/ai.py`
  - `pyproject.toml` (add `anthropic` dependency)
- Risks: Requires a valid `ANTHROPIC_API_KEY` environment variable on `svc-ai`. Must not break existing Codex/Gemini backends.
- Exit criteria: The host view offers a Claude model option. A full conversation runs using the Claude API backend.

## Order Of Operations

1. Create `pyproject.toml` and package scaffold.
2. Implement `tts.py` (Kokoro wrapper).
3. Implement `clean.py` (noise stripping for Codex and Gemini).
4. Implement `ai.py` (Codex and Gemini backends).
5. Implement `web.py`: server core, session state, SSE stream, WAV serving.
6. Implement `/host` view in `web.py`.
7. Implement `/broadcast` view in `web.py`.
8. Write `scripts/install.sh`.
9. Smoke test: full end-to-end conversation from host view to broadcast view.
10. Verify OBS browser source audio capture.
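
Step 3 above (noise stripping in `clean.py`) could take roughly the following shape. This is a sketch, not the actual cleaning engine: the function name `clean_output` and the entries in `NOISE_LINE_PATTERNS` are hypothetical placeholders, and the real patterns would need to be derived from captured Codex and Gemini CLI output. Only the ANSI escape stripping reflects a well-known terminal convention.

```python
import re

# Matches ANSI CSI escape sequences such as color codes and cursor movement.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

# Hypothetical noise patterns for illustration only; replace with patterns
# taken from real CLI output fixtures.
NOISE_LINE_PATTERNS = [
    re.compile(r"^\[\d{4}-\d{2}-\d{2}T[^\]]*\]"),    # bracketed timestamp lines
    re.compile(r"^tokens used\b", re.IGNORECASE),    # usage summary lines
]


def clean_output(raw):
    """Strip ANSI escapes and drop lines that match a known noise pattern."""
    text = ANSI_CSI.sub("", raw)
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if any(p.search(stripped) for p in NOISE_LINE_PATTERNS):
            continue  # skip CLI chatter, keep only the model's reply text
        kept.append(stripped)
    return "\n".join(kept).strip()
```

Keeping the patterns in a module-level list means the unit tests can run fixture strings through `clean_output` and new noise formats can be handled by adding one pattern rather than changing control flow.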

## Testing Expectations

- Unit tests: `tts.py` voice listing. `clean.py` noise stripping against fixture strings. `ai.py` CLI argument construction (mock subprocess).
- Integration tests: Full SSE event sequence from host message submit to broadcast card render. Requires a live Codex or Gemini CLI.
- Manual verification: OBS audio capture. Visual broadcast layout on stream. Per-speaker voice differentiation.

## Documentation Expectations

- `README.md` must be updated with usage instructions after Phase 2 is complete.
- `docs/09-PROJECT-STATUS.md` must be updated after each phase completes.
- `docs/06-WORKER-HANDOFF.md` must be updated before handing off to the implementation model.

## Escalation Conditions

- Stop and raise a change request if:
  - `pykokoro` cannot be imported without installing the full Arena package.
  - The Kokoro voice pipeline requires GPU on the current hardware and fails on CPU.
  - OBS cannot capture audio from a browser source pointing at `svc-ai` without additional configuration.
  - The Codex or Gemini CLI output format has changed in a way that breaks the cleaning engine.

## Signature

- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision