Initialize project governance and baseline structure
Stoned.AI: live-streamed human + AI conversation show, both sides voiced via local Kokoro TTS.
Governance docs 00-09, README, .gitignore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
111
docs/03-IMPLEMENTATION-PLAN.md
Normal file
@@ -0,0 +1,111 @@
# Implementation Plan

## Scope For This Implementation

- Included:
  - Project scaffold (`pyproject.toml`, `src/stoned_ai/`, `tests/`)
  - TTS layer wrapping Arena's Kokoro backend
  - Cleaning engine for AI CLI output noise
  - AI backend abstraction supporting Codex and Gemini CLI backends
  - Web server with SSE delivery
  - Host view (`/host`) with text input, send, voice selection, session control
  - Broadcast view (`/broadcast`) styled for OBS browser source capture
  - WAV audio serving for both views
  - Per-speaker voice assignment (host voice + AI voice)
  - `install.sh` script and `~/.local/bin/stoned-web` link

- Excluded from initial build:
  - Claude API backend (Phase 5, post-launch)
  - Visual avatar or waveform animation overlay
  - YouTube chat integration
  - Persistent conversation logging (nice to have, not required for launch)
  - Mobile-responsive host view (desktop only for now)

## Phases

### Phase 1: Project Scaffold and Core Backend

- Objective: Establish the Python package, TTS layer, cleaning engine, and AI backend abstraction.
- Files likely affected:
  - `pyproject.toml`
  - `src/stoned_ai/__init__.py`
  - `src/stoned_ai/tts.py`
  - `src/stoned_ai/clean.py`
  - `src/stoned_ai/ai.py`
  - `scripts/install.sh`
- Risks: `pykokoro` import paths may differ slightly from Arena's. Verify import compatibility before writing TTS layer.
- Exit criteria: `stoned_ai.tts` can synthesize a WAV from text using a Kokoro voice. `stoned_ai.ai` can call Codex or Gemini and return a clean string.
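
As a rough illustration of the "clean string" part of the exit criteria, the sketch below strips ANSI escape sequences and drops noise lines before the text reaches TTS. The function name `clean_output` and the noise prefixes are hypothetical placeholders; the real patterns depend on what the Codex and Gemini CLIs actually emit and should come from captured fixture strings.

```python
import re

# Matches ANSI CSI escape sequences such as colour codes (ESC [ ... final byte).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

# Hypothetical noise prefixes, for illustration only. Replace them with
# patterns observed in real CLI output.
NOISE_PREFIXES = ("[debug]", "Loading", "tokens used:")


def clean_output(raw: str, noise_prefixes: tuple[str, ...] = NOISE_PREFIXES) -> str:
    """Return reply text with escape codes, noise lines, and blank lines removed."""
    kept = []
    for line in ANSI_CSI.sub("", raw).splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith(noise_prefixes):
            continue  # skip lines that look like CLI chatter
        kept.append(stripped)
    # Join into one line; a single string is convenient input for speech synthesis.
    return " ".join(kept)
```

Keeping the noise patterns in one tuple means a change in CLI output format can be handled by editing data rather than logic.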

### Phase 2: Web Server and SSE Delivery

- Objective: Build the HTTP server, session state management, SSE event stream, and WAV file serving.
- Files likely affected:
  - `src/stoned_ai/web.py`
- Risks: Session state must be thread-safe. SSE connections from both `/host` and `/broadcast` must receive the same events.
- Exit criteria: A session can be started. A host message can be submitted. The AI responds. Both turns are pushed over SSE. Both turns are voiced.
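
One way to address the two risks above (thread safety, and both views receiving the same events) is a small fan-out hub with one queue per SSE connection. This is a minimal sketch under assumptions: the names `EventHub` and `format_sse`, the event name, and the payload fields are all hypothetical, and the HTTP handler that drains each queue is not shown.

```python
import json
import queue
import threading


def format_sse(event: str, payload: dict) -> str:
    """Encode one Server-Sent Events frame: event name, JSON data, blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


class EventHub:
    """Fan out each published event to every connected subscriber."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: list[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        """Register a new connection and return the queue it should read from."""
        q: queue.Queue = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def unsubscribe(self, q: queue.Queue) -> None:
        """Remove a connection's queue, for example when the client disconnects."""
        with self._lock:
            if q in self._subscribers:
                self._subscribers.remove(q)

    def publish(self, event: str, payload: dict) -> None:
        """Deliver the same encoded frame to every current subscriber."""
        frame = format_sse(event, payload)
        with self._lock:
            targets = list(self._subscribers)  # copy so delivery happens outside the lock
        for q in targets:
            q.put(frame)
```

Each `/host` or `/broadcast` connection would call `subscribe()`, write frames from its queue to the response, and call `unsubscribe()` on disconnect.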

### Phase 3: Host View (`/host`)

- Objective: Build the host's control panel HTML/CSS/JS page.
- Files likely affected:
  - `src/stoned_ai/web.py` (inline HTML or template)
- Risks: Voice selection dropdown must populate from the live Kokoro voice list. If the voice list is slow to load, display a loading state.
- Exit criteria: Jason can open `/host`, start a session, pick voices, type and send a message, hear his voice, hear the AI's voice, and stop the session.

### Phase 4: Broadcast View (`/broadcast`)

- Objective: Build the clean, OBS-capturable broadcast page.
- Files likely affected:
  - `src/stoned_ai/web.py` (inline HTML or template)
- Risks: OBS browser source must auto-play audio. Verify OBS audio capture works with the WAV playback approach before marking complete.
- Exit criteria: `/broadcast` shows only conversation cards. No controls are visible. OBS captures the page. Audio plays in OBS without manual permission prompts.

### Phase 5: Claude API Backend (Post-Launch)

- Objective: Add a Claude backend using the `anthropic` SDK as an alternative to Codex/Gemini.
- Files likely affected:
  - `src/stoned_ai/ai.py`
  - `pyproject.toml` (add `anthropic` dependency)
- Risks: Requires a valid `ANTHROPIC_API_KEY` environment variable on `svc-ai`. Must not break existing Codex/Gemini backends.
- Exit criteria: The host view offers a Claude model option. A full conversation runs using the Claude API backend.
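
A simple guard against the risks above is to offer the Claude option only when the API key is present, leaving the CLI backends untouched otherwise. The sketch below is an assumption about how the selection list could be built; the function name `available_backends` and the backend labels are hypothetical, and it does not call the `anthropic` SDK or validate the key.

```python
import os
from typing import Mapping, Optional

# Backend labels are illustrative, not identifiers taken from the project.
CLI_BACKENDS = ("codex", "gemini")


def available_backends(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List selectable backends; Claude appears only when an API key is set."""
    env = os.environ if env is None else env
    backends = list(CLI_BACKENDS)
    # A missing or whitespace-only key hides the Claude option instead of failing later.
    if env.get("ANTHROPIC_API_KEY", "").strip():
        backends.append("claude")
    return backends
```

Passing the environment as a parameter keeps the function easy to unit test without modifying `os.environ`.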

## Order Of Operations

1. Create `pyproject.toml` and package scaffold.
2. Implement `tts.py` (Kokoro wrapper).
3. Implement `clean.py` (noise stripping for Codex and Gemini).
4. Implement `ai.py` (Codex and Gemini backends).
5. Implement `web.py`: server core, session state, SSE stream, WAV serving.
6. Implement `/host` view in `web.py`.
7. Implement `/broadcast` view in `web.py`.
8. Write `scripts/install.sh`.
9. Smoke test: full end-to-end conversation from host view to broadcast view.
10. Verify OBS browser source audio capture.

## Testing Expectations

- Unit tests: `tts.py` voice listing. `clean.py` noise stripping against fixture strings. `ai.py` CLI argument construction (mock subprocess).
- Integration tests: Full SSE event sequence from host message submit to broadcast card render. Requires a live Codex or Gemini CLI.
- Manual verification: OBS audio capture. Visual broadcast layout on stream. Per-speaker voice differentiation.
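
The "mock subprocess" unit test mentioned above could take roughly the following shape. The executable name `example-ai-cli` and its `--prompt` flag are hypothetical stand-ins, not the real Codex or Gemini invocation; only the testing pattern is the point here.

```python
import subprocess
from unittest import mock


def build_command(prompt: str) -> list[str]:
    """Build the argv list. The executable and flag are hypothetical placeholders."""
    return ["example-ai-cli", "--prompt", prompt]


def ask(prompt: str) -> str:
    """Run the CLI once and return its stdout with surrounding whitespace removed."""
    result = subprocess.run(
        build_command(prompt), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def test_ask_passes_prompt_as_single_argument() -> None:
    fake = subprocess.CompletedProcess(
        args=[], returncode=0, stdout="  hello\n", stderr=""
    )
    # Patch subprocess.run so no real CLI is needed for the unit test.
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert ask("hi there") == "hello"
    # The prompt must arrive as one argv element, never split or shell-interpolated.
    run.assert_called_once_with(
        ["example-ai-cli", "--prompt", "hi there"],
        capture_output=True,
        text=True,
        check=True,
    )
```

Because the subprocess call is mocked, this test runs without a live CLI, which keeps it separate from the integration tests that do require one.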

## Documentation Expectations

- `README.md` must be updated with usage instructions after Phase 2 is complete.
- `docs/09-PROJECT-STATUS.md` must be updated after each phase completes.
- `docs/06-WORKER-HANDOFF.md` must be updated before handing off to the implementation model.

## Escalation Conditions

- Stop and raise a change request if:
  - `pykokoro` cannot be imported without installing the full Arena package.
  - The Kokoro voice pipeline requires GPU on the current hardware and fails on CPU.
  - OBS cannot capture audio from a browser source pointing at `svc-ai` without additional configuration.
  - The Codex or Gemini CLI output format has changed in a way that breaks the cleaning engine.
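
The first escalation condition can be checked early with a small preflight script. This is only a sketch: the helper names `module_available` and `preflight` are hypothetical, and locating a module does not prove that it imports cleanly without Arena, so a real check would still need an actual import attempt afterwards.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Report whether a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def preflight(required: tuple[str, ...] = ("pykokoro",)) -> list[str]:
    """Return the names of required top-level modules that cannot be found."""
    return [name for name in required if not module_available(name)]
```

An empty list from `preflight()` means every required module was found; any returned name is a candidate for a change request.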

## Signature

- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision