Initialize project governance and baseline structure

Stoned.AI: live-streamed human + AI conversation show, both sides voiced via local Kokoro TTS. Governance docs 00-09, README, .gitignore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 21:55:46 +00:00
commit fcd93ee0af
12 changed files with 740 additions and 0 deletions
--- a/docs/02-ARCHITECTURE-PLAN.md
+++ b/docs/02-ARCHITECTURE-PLAN.md
@@ -0,0 +1,112 @@
+# Architecture Plan
+
+## Current State
+
+- No implementation exists yet. This is a greenfield project.
+- The Arena project (`/home/svc-admin/ai-projects/projects/arena`) provides reusable infrastructure:
+  - `src/arena/tts.py` — Kokoro TTS backend (`ArenaTTSManager`, `KokoroBackend`)
+  - `/opt/models/kokoro` — downloaded Kokoro voice models
+  - `pykokoro` — installed Python package
+  - Pattern for SSE-based real-time conversation delivery
+  - Pattern for WAV serving and browser audio playback
+
+## Target State
+
+A lightweight Python web server (`stoned-web`) with two browser-facing views:
+
+1. **Host view** (`/host`) — Jason's control panel. Text input box, send button, voice selection per speaker, session start/stop, status display.
+2. **Broadcast view** (`/broadcast`) — Clean, OBS-capturable page. Scrolling conversation cards only. No controls. Styled for stream.
+
+Both views receive conversation turns over Server-Sent Events. The broadcast view is the OBS browser source. The host view is what Jason operates on his own screen.
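
As a rough illustration of the SSE delivery described above, the sketch below frames one conversation turn as a Server-Sent Events message and fans it out to every connected client. The `Turn` fields, the `turn` event name, and the `TurnBroadcaster` class are assumptions made for this example; the plan does not define them. Only the `event:` / `data:` wire format comes from the SSE specification.

```python
import json
import queue
import threading
from dataclasses import asdict, dataclass


@dataclass
class Turn:
    # Hypothetical turn shape; the plan does not specify these fields.
    speaker: str    # "host" or "ai"
    text: str
    audio_url: str  # browser-fetchable WAV path


def format_sse(turn: Turn, event: str = "turn") -> bytes:
    """Encode one turn as a single Server-Sent Events frame."""
    # json.dumps escapes newlines, so one data: line is always enough.
    payload = json.dumps(asdict(turn))
    return f"event: {event}\ndata: {payload}\n\n".encode("utf-8")


class TurnBroadcaster:
    """Fan each turn out to every subscribed client (host and broadcast views)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._clients = []  # one queue.Queue of encoded frames per SSE connection

    def subscribe(self) -> queue.Queue:
        q = queue.Queue()
        with self._lock:
            self._clients.append(q)
        return q

    def unsubscribe(self, q: queue.Queue) -> None:
        with self._lock:
            if q in self._clients:
                self._clients.remove(q)

    def publish(self, turn: Turn) -> int:
        """Queue the encoded frame for every client; return the client count."""
        frame = format_sse(turn)
        with self._lock:
            clients = list(self._clients)
        for q in clients:
            q.put(frame)
        return len(clients)
```

In this shape, each SSE request handler would call `subscribe()`, write frames from its queue to the response until the client disconnects, and then call `unsubscribe()`.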
+
+## Design Principles
+
+- Principle 1: **Text-in, voice-out for both sides.** The host types; the system voices. The AI generates text; the system voices. No microphone dependency.
+- Principle 2: **Reuse Arena TTS infrastructure.** Do not reimplement Kokoro synthesis. Import and use `ArenaTTSManager` directly from the arena package or copy the relevant module.
+- Principle 3: **Broadcast view is read-only.** The `/broadcast` URL has zero interactive elements. It exists only for OBS to consume.
+- Principle 4: **One AI at a time.** The session has exactly one human speaker and one AI speaker. Multi-AI is not in scope.
+
+## Major Components
+
+- Component: **Web Server (`src/stoned_ai/web.py`)**
+  - Purpose: HTTP server handling both views, SSE streams, session state, and audio file serving.
+  - Responsibilities: Accept host message submissions. Dispatch AI calls. Trigger TTS for both sides. Serve WAV files. Push turns to connected SSE clients.
+  - Dependencies: `stoned_ai/tts.py`, `stoned_ai/ai.py`, and either `http.server` from the standard library or a lightweight framework.
+
+- Component: **TTS Layer (`src/stoned_ai/tts.py`)**
+  - Purpose: Synthesize WAV audio for any speaker given a voice ID and text.
+  - Responsibilities: Wrap `ArenaTTSManager` (or import the Arena `tts.py` module directly). Store generated WAVs in a session-scoped directory. Return a browser-fetchable path.
+  - Dependencies: `pykokoro`, `/opt/models/kokoro`.
+
+- Component: **AI Backend (`src/stoned_ai/ai.py`)**
+  - Purpose: Call the configured AI model and return a clean text response.
+  - Responsibilities: Accept conversation history and a prompt. Call the model CLI or API. Return cleaned text. Initially wraps `codex exec` or `gemini -p`. The Claude API backend is added in Phase 2.
+  - Dependencies: `subprocess` (for CLI backends), `anthropic` SDK (for Claude backend, Phase 2).
+
+- Component: **Cleaning Engine (`src/stoned_ai/clean.py`)**
+  - Purpose: Strip CLI noise from AI responses.
+  - Responsibilities: Apply regex filters for Codex and Gemini banner lines, warnings, token counts.
+  - Dependencies: None beyond stdlib. Can be copied from Arena's `clean.py` and extended.
+
+- Component: **Broadcast View (`/broadcast`)**
+  - Purpose: Clean, OBS-capturable HTML page.
+  - Responsibilities: Connect to the SSE stream. Render conversation cards. Play audio. Never show controls.
+  - Dependencies: Browser-side JavaScript only.
+
+- Component: **Host View (`/host`)**
+  - Purpose: Jason's control panel for operating the show.
+  - Responsibilities: Text input and send. Voice selection per speaker. Session start/stop. Status display. Mirrors the conversation feed.
+  - Dependencies: Browser-side JavaScript only.
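
The TTS Layer's responsibilities (wrap a synthesizer, store WAVs in a session-scoped directory, return a browser-fetchable path) could be sketched as follows. The `Synthesizer` interface, the `SessionAudioStore` class, and the `/audio/<session>/<file>` URL layout are hypothetical; the real `ArenaTTSManager` API is not shown in this plan and may look quite different. `SilentSynth` is a stand-in so the sketch runs without Kokoro installed.

```python
import io
import re
import tempfile
import wave
from pathlib import Path
from typing import Protocol


class Synthesizer(Protocol):
    # Hypothetical interface; the real ArenaTTSManager API may differ.
    def synthesize(self, text: str, voice: str) -> bytes: ...


class SilentSynth:
    """Stand-in backend that returns a short silent WAV instead of speech."""

    def synthesize(self, text: str, voice: str) -> bytes:
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)       # 16-bit samples
            wav.setframerate(24000)   # placeholder sample rate
            wav.writeframes(b"\x00\x00" * 2400)  # 0.1 s of silence
        return buf.getvalue()


_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


class SessionAudioStore:
    """Write one WAV per turn under root/<session_id>/ and return its URL path."""

    def __init__(self, root: Path, session_id: str, synth: Synthesizer) -> None:
        # Reject ids that could escape the audio root (for example "../x").
        if not _SAFE_ID.fullmatch(session_id):
            raise ValueError(f"unsafe session id: {session_id!r}")
        self._dir = root / session_id
        self._dir.mkdir(parents=True, exist_ok=True)
        self._session_id = session_id
        self._synth = synth
        self._count = 0

    def speak(self, text: str, voice: str) -> str:
        self._count += 1
        name = f"{self._count:04d}.wav"
        (self._dir / name).write_bytes(self._synth.synthesize(text, voice))
        return f"/audio/{self._session_id}/{name}"


# Usage: a throwaway directory stands in for the real audio root.
root = Path(tempfile.mkdtemp())
store = SessionAudioStore(root, "demo-session", SilentSynth())
url = store.speak("Welcome to the show.", voice="voice-a")
```

Keeping the synthesizer behind a small interface means the web server only depends on `speak()`, whether the backend is imported from Arena or copied into this project.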
+
+## Data Flow
+
+1. Jason opens `/host` in his browser and `/broadcast` in OBS as a browser source.
+2. Jason starts a session, selects voices for himself and the AI, enters the opening topic or first message.
+3. Jason types his message and hits send.
+4. Server receives the message, queues it as a "host turn."
+5. Server calls Kokoro TTS for Jason's voice, stores the WAV, pushes the turn to all SSE clients.
+6. Both views render the host card. Both play the WAV audio.
+7. Server calls the AI backend with the conversation history.
+8. AI returns a text response. Server cleans it.
+9. Server calls Kokoro TTS for the AI voice, stores the WAV, pushes the AI turn to all SSE clients.
+10. Both views render the AI card. Both play the WAV audio.
+11. Repeat from step 3.
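
Steps 4 through 10 above can be sketched as a single handler with the real components injected as callables. The `Show` class, its field names, and the stub lambdas are assumptions for illustration only; the actual server may split this work across threads or async tasks.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Turn:
    speaker: str
    text: str
    audio_url: str


@dataclass
class Show:
    # All four callables are hypothetical seams for the real components.
    speak: Callable[[str, str], str]   # (text, voice) -> audio URL
    ask_ai: Callable[[list], str]      # history -> raw model output
    clean: Callable[[str], str]        # raw output -> clean text
    publish: Callable[[Turn], None]    # push a turn to all SSE clients
    host_voice: str = "host-voice"
    ai_voice: str = "ai-voice"
    history: list = field(default_factory=list)

    def _emit(self, speaker: str, text: str, voice: str) -> Turn:
        # Synthesize first so the turn carries its audio URL when pushed.
        turn = Turn(speaker, text, self.speak(text, voice))
        self.history.append(turn)
        self.publish(turn)
        return turn

    def handle_host_message(self, text: str) -> Turn:
        self._emit("host", text, self.host_voice)       # steps 4-6
        raw = self.ask_ai(self.history)                 # step 7
        reply = self.clean(raw)                         # step 8
        return self._emit("ai", reply, self.ai_voice)   # steps 9-10


# Usage with stubs in place of Kokoro, the AI CLI, and the SSE layer.
sent = []
show = Show(
    speak=lambda text, voice: f"/audio/demo/{voice}-{len(sent) + 1}.wav",
    ask_ai=lambda history: f"  echo: {history[-1].text}\n",
    clean=str.strip,
    publish=sent.append,
)
reply = show.handle_host_message("What is a thought?")
```

The host turn is published before the AI call starts, so both views can render and play the host card while the model is still generating.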
+
+## Key Decisions
+
+- Decision 1: **Copy or import Arena's TTS module rather than reimplementing Kokoro synthesis.**
+  - Why: `ArenaTTSManager` is already tested and handles session audio, path safety, and pipeline caching.
+  - Tradeoff: Creates a dependency on Arena's internal code. Mitigated by treating it as a stable utility layer.
+
+- Decision 2: **Two separate URLs for host and broadcast.**
+  - Why: The host needs controls. OBS must not capture controls. Mixing them on one page creates layout complexity and accidental capture risk.
+  - Tradeoff: Two SSE connections instead of one. Acceptable at this scale.
+
+- Decision 3: **Start with CLI-based AI backends (Codex/Gemini), add Claude API in Phase 2.**
+  - Why: Both CLIs are already present and working on `svc-ai`. Fastest path to a functional prototype.
+  - Tradeoff: CLI output noise requires cleaning. Claude API (Phase 2) is cleaner but needs an API key and the `anthropic` SDK.
+
+- Decision 4: **No speech-to-text. Host types.**
+  - Why: Eliminates microphone capture, audio routing, and STT accuracy problems. Aligns with how Jason already works.
+  - Tradeoff: Host must type during the live stream. This is the intended format — the typing is part of the show.
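
To make the Decision 3 tradeoff concrete, here is a minimal sketch of a CLI backend call followed by line-based cleaning. The noise patterns are invented placeholders, not real Codex or Gemini output; actual filters would have to be derived from sampled CLI output. `run_cli_backend` and `clean_response` are hypothetical names, and the sketch assumes the CLI accepts the prompt as its final argument. A fake command built on the Python interpreter stands in for the real CLIs so the example runs anywhere.

```python
import re
import subprocess
import sys

# Hypothetical noise patterns; real Codex/Gemini banners must be sampled
# from actual CLI output before these filters can be trusted.
NOISE_PATTERNS = [
    re.compile(r"^\s*warning:", re.IGNORECASE),
    re.compile(r"^\s*tokens used\b", re.IGNORECASE),
    re.compile(r"^\s*\[banner\]"),
]


def clean_response(raw: str) -> str:
    """Drop lines matching any noise pattern, then trim outer whitespace."""
    kept = [
        line for line in raw.splitlines()
        if not any(p.search(line) for p in NOISE_PATTERNS)
    ]
    return "\n".join(kept).strip()


def run_cli_backend(argv_prefix: list, prompt: str, timeout: float = 120.0) -> str:
    """Run a CLI with the prompt as its last argument and return cleaned stdout."""
    result = subprocess.run(
        [*argv_prefix, prompt],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return clean_response(result.stdout)


# Stand-in command so the sketch runs without Codex or Gemini installed.
FAKE_CLI = [
    sys.executable, "-c",
    "import sys; print('[banner] fake-cli'); print('Warning: demo');"
    " print('Reply to: ' + sys.argv[1]); print('tokens used: 12')",
]
reply = run_cli_backend(FAKE_CLI, "hello")
```

With the real backends, `argv_prefix` would presumably be `["codex", "exec"]` or `["gemini", "-p"]`, matching the commands named in the AI Backend component.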
+
+## Rejected Alternatives
+
+- Alternative: Using Arena's existing `arena-web` server with modifications.
+  - Why rejected: Arena is an AI-to-AI tool. Retrofitting a human-in-the-loop mode and a separate broadcast view would require significant changes to Arena's core, risking regressions. A clean separate project is lower risk and lower coupling.
+
+- Alternative: Streaming audio from `svc-ai` to a Windows machine via virtual audio cable.
+  - Why rejected: The browser-source approach in OBS is simpler, more reliable, and already proven in the Arena project. All audio plays in the browser, which OBS captures directly.
+
+## Open Questions
+
+- Question 1: Should the Claude API backend use `claude-sonnet-4-6` as the default, or should the model be configurable per session?
+- Question 2: Should conversation history be capped at a rolling window to prevent prompt length creep, or left unbounded for the initial version?
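
For Question 2, one way to keep both options open is a history buffer whose cap is a parameter. This is only a sketch of the rolling-window option, not a decision; `make_history`, `build_prompt`, the `(speaker, text)` tuple shape, and the system prompt string are all hypothetical.

```python
from collections import deque


def make_history(max_turns=None):
    """Return a history buffer; max_turns=None leaves it unbounded."""
    return deque(maxlen=max_turns)


def build_prompt(history, system_prompt: str) -> str:
    """Flatten the retained turns into a single prompt string."""
    lines = [system_prompt]
    lines += [f"{speaker}: {text}" for speaker, text in history]
    return "\n".join(lines)


# Usage: with a cap of 4, only the four most recent turns survive.
history = make_history(max_turns=4)
for i in range(1, 7):
    speaker = "Host" if i % 2 else "AI"
    history.append((speaker, f"message {i}"))
prompt = build_prompt(history, "You are the AI co-host.")
```

A `deque` with `maxlen` discards the oldest turn automatically on append, so switching between capped and unbounded history is a one-argument change.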
+
+## Signature
+
+- Document role: governing
+- Created by: Claude (supervisor)
+- Created at: 2026-04-12
+- Revision status: initial
+- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision