Initialize project governance and baseline structure

Stoned.AI — live-streamed human + AI conversation show, both sides voiced
via local Kokoro TTS. Governance docs 00-09, README, .gitignore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 21:55:46 +00:00
commit fcd93ee0af
12 changed files with 740 additions and 0 deletions

.gitignore

@@ -0,0 +1,11 @@
__pycache__/
.pytest_cache/
.mypy_cache/
.venv/
node_modules/
dist/
build/
.DS_Store
Thumbs.db
*.wav
*.pyc

README.md

@@ -0,0 +1,32 @@
# stoned-ai
**Stoned.AI** is a live-streamed, unscripted conversation show between a human host and an AI.
Both the host and the AI are voiced through local Kokoro TTS. The host types their side of the conversation. The AI generates its response. Both are read aloud and displayed as a scrolling conversation feed that OBS captures as a browser source for streaming to YouTube.
No microphone required. No script. No agenda.
## Project Docs
- [Governance Rules](docs/00-GOVERNANCE-RULES.md)
- [Project Charter](docs/01-PROJECT-CHARTER.md)
- [Architecture Plan](docs/02-ARCHITECTURE-PLAN.md)
- [Implementation Plan](docs/03-IMPLEMENTATION-PLAN.md)
- [Acceptance Criteria](docs/04-ACCEPTANCE-CRITERIA.md)
- [Risk Register](docs/05-RISK-REGISTER.md)
- [Worker Handoff](docs/06-WORKER-HANDOFF.md)
- [Review Checklist](docs/07-REVIEW-CHECKLIST.md)
- [Change Request](docs/08-CHANGE-REQUEST.md)
- [Project Status](docs/09-PROJECT-STATUS.md)
## Layout
```text
stoned-ai/
├── docs/
│ └── 00 through 09 governing docs
├── src/
│ └── stoned_ai/
├── tests/
└── README.md
```

docs/00-GOVERNANCE-RULES.md

@@ -0,0 +1,58 @@
# Governance Rules
## Core Rule
The documents in this project set the governing plan for the project.
They are **never to be rewritten by implementation models** unless:
- the user explicitly requests a revision
- a supervisor model proposes a revision
- the user accepts the revision
## Allowed Editors
The following may change governing documents:
- the user
- a supervisor model acting under explicit user direction
The following may not change governing documents without approval:
- local coding models
- implementation agents
- documentation drafting agents
- background automation
## Allowed Actions For Implementation Models
- read governing docs
- restate requirements
- create implementation artifacts
- create subordinate notes
- create code and tests
- create proposed change requests
## Forbidden Actions For Implementation Models
- change project scope
- weaken constraints
- redefine acceptance criteria
- override architecture decisions
- silently reinterpret requirements
## Revision Process
If implementation reveals the plan is wrong or incomplete:
1. create a change request using `08-CHANGE-REQUEST.md`
2. stop changing the governing plan directly
3. escalate the change request to the user or a supervisor model
4. revise governing docs only after approval
## Decision Authority
- **Primary Stakeholder**: Jason
- **Supervisor**: Claude (planning, architecture, review)
- **Workhorse**: Implementation model (code, tests, scaffolding)
- **Reviewer**: Claude or a second model comparing implementation to docs

docs/01-PROJECT-CHARTER.md

@@ -0,0 +1,65 @@
# Project Charter
## Project Name
- Name: Stoned.AI
## Purpose
- Stoned.AI is a live-streamed, unscripted conversation show between a human host (Jason) and an AI.
- It fills a gap in the AI content space: instead of productivity tutorials, it is genuine, funny, and curious conversation — going wherever the discussion leads.
- It exists now because the concept was proven in a single conversation on April 1, 2026, and the local TTS and AI infrastructure to support it already exists on `svc-ai`.
## Goals
- Goal 1: Build a browser-based host interface where Jason types his side of the conversation and the AI responds, with both sides voiced via local Kokoro TTS.
- Goal 2: Provide a clean OBS-capturable broadcast view showing only the scrolling conversation feed — no controls, styled for streaming.
- Goal 3: Support at least one AI backend for responses (Codex or Gemini initially, Claude added after initial launch).
## Non-Goals
- Not an AI-to-AI debate tool. One human, one AI. That is the format.
- Not a productivity or workflow tool. The output is entertainment and conversation, not work product.
- Not a replacement for Arena. This is a separate project with a separate purpose.
- Not building a mobile app, desktop app, or browser extension. Web only for the initial version.
## Users / Stakeholders
- Primary user: Jason (host)
- Audience: YouTube live stream viewers
- Secondary stakeholders: none currently
## Constraints
- Must run on `svc-ai` (AMD Ryzen 5 3600, ~14 GiB RAM, no GPU).
- Must reuse the existing Kokoro TTS stack from the Arena project (`/opt/models/kokoro`, `pykokoro`).
- Must be capturable by OBS Studio as a browser source.
- No microphone dependency — both sides are text-in, voice-out.
- Host types their side; the system voices it. No speech-to-text.
## Deliverables
- Deliverable 1: A working `stoned-web` server with host input, AI response, and Kokoro TTS for both sides.
- Deliverable 2: A `/broadcast` view (no controls, OBS-ready) and a `/host` control view (input box, voice selection, session management).
- Deliverable 3: At least one wired AI backend capable of generating conversational responses.
- Deliverable 4: Per-speaker Kokoro voice assignment (host voice and AI voice are independently selectable).
## Success Definition
- Jason can type a message, hear it spoken in his chosen voice, the AI responds, and the AI response is spoken in its chosen voice.
- The broadcast view displays cleanly in OBS as a browser source.
- A full test conversation runs end to end without manual intervention.
- Jason can go live on YouTube using this as the audio and visual source.
## Authority
- User approval required: yes
- Supervisor revision required for charter changes: yes
## Signature
- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision

docs/02-ARCHITECTURE-PLAN.md

@@ -0,0 +1,112 @@
# Architecture Plan
## Current State
- No implementation exists yet. This is a greenfield project.
- The Arena project (`/home/svc-admin/ai-projects/projects/arena`) provides reusable infrastructure:
- `src/arena/tts.py` — Kokoro TTS backend (`ArenaTTSManager`, `KokoroBackend`)
- `/opt/models/kokoro` — downloaded Kokoro voice models
- `pykokoro` — installed Python package
- Pattern for SSE-based real-time conversation delivery
- Pattern for WAV serving and browser audio playback
## Target State
A lightweight Python web server (`stoned-web`) with two browser-facing views:
1. **Host view** (`/host`) — Jason's control panel. Text input box, send button, voice selection per speaker, session start/stop, status display.
2. **Broadcast view** (`/broadcast`) — Clean, OBS-capturable page. Scrolling conversation cards only. No controls. Styled for stream.
Both views receive conversation turns over Server-Sent Events. The broadcast view is the OBS browser source. The host view is what Jason operates on his own screen.
## Design Principles
- Principle 1: **Text-in, voice-out for both sides.** The host types; the system voices. The AI generates text; the system voices. No microphone dependency.
- Principle 2: **Reuse Arena TTS infrastructure.** Do not reimplement Kokoro synthesis. Import and use `ArenaTTSManager` directly from the arena package or copy the relevant module.
- Principle 3: **Broadcast view is read-only.** The `/broadcast` URL has zero interactive elements. It exists only for OBS to consume.
- Principle 4: **One AI at a time.** The session has exactly one human speaker and one AI speaker. Multi-AI is not in scope.
## Major Components
- Component: **Web Server (`src/stoned_ai/web.py`)**
- Purpose: HTTP server handling both views, SSE streams, session state, and audio file serving.
- Responsibilities: Accept host message submissions. Dispatch AI calls. Trigger TTS for both sides. Serve WAV files. Push turns to connected SSE clients.
- Dependencies: `stoned_ai/tts.py`, `stoned_ai/ai.py`, standard library (`http.server` or a lightweight framework).
- Component: **TTS Layer (`src/stoned_ai/tts.py`)**
- Purpose: Synthesize WAV audio for any speaker given a voice ID and text.
- Responsibilities: Wrap `ArenaTTSManager` (or import the Arena `tts.py` module directly). Store generated WAVs in a session-scoped directory. Return a browser-fetchable path.
- Dependencies: `pykokoro`, `/opt/models/kokoro`.
- Component: **AI Backend (`src/stoned_ai/ai.py`)**
- Purpose: Call the configured AI model and return a clean text response.
- Responsibilities: Accept conversation history and a prompt. Call the model CLI or API. Return cleaned text. Initially wraps `codex exec` or `gemini -p`. Claude API added later.
- Dependencies: `subprocess` (for CLI backends), `anthropic` SDK (for Claude backend, Phase 2).
- Component: **Cleaning Engine (`src/stoned_ai/clean.py`)**
- Purpose: Strip CLI noise from AI responses.
- Responsibilities: Apply regex filters for Codex and Gemini banner lines, warnings, token counts.
- Dependencies: None beyond stdlib. Can be copied from Arena's `clean.py` and extended.
- Component: **Broadcast View (`/broadcast`)**
- Purpose: Clean, OBS-capturable HTML page.
- Responsibilities: Connect to the SSE stream. Render conversation cards. Play audio. Never show controls.
- Dependencies: Browser-side JavaScript only.
- Component: **Host View (`/host`)**
- Purpose: Jason's control panel for operating the show.
- Responsibilities: Text input and send. Voice selection per speaker. Session start/stop. Status display. Mirrors the conversation feed.
- Dependencies: Browser-side JavaScript only.
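The swappable AI-backend boundary among these components can be sketched as below. This is a minimal illustration, not mandated code; the names `AIBackend`, `EchoBackend`, and `run_turn` are placeholders chosen for this sketch:

```python
# Sketch of the backend boundary: web.py calls through a protocol,
# so swapping Codex/Gemini/Claude touches only ai.py.
from typing import Protocol


class AIBackend(Protocol):
    def respond(self, history: list[dict[str, str]], prompt: str) -> str:
        """Return cleaned response text for the given conversation."""
        ...


class EchoBackend:
    """Stand-in backend for testing the web layer without a live CLI."""

    def respond(self, history: list[dict[str, str]], prompt: str) -> str:
        return f"echo: {prompt}"


def run_turn(backend: AIBackend, history: list[dict[str, str]], prompt: str) -> str:
    # The web server would call this without knowing which backend is configured.
    return backend.respond(history, prompt)
```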
## Data Flow
1. Jason opens `/host` in his browser and `/broadcast` in OBS as a browser source.
2. Jason starts a session, selects voices for himself and the AI, enters the opening topic or first message.
3. Jason types his message and hits send.
4. Server receives the message, queues it as a "host turn."
5. Server calls Kokoro TTS for Jason's voice, stores the WAV, pushes the turn to all SSE clients.
6. Both views render the host card. Both play the WAV audio.
7. Server calls the AI backend with the conversation history.
8. AI returns a text response. Server cleans it.
9. Server calls Kokoro TTS for the AI voice, stores the WAV, pushes the AI turn to all SSE clients.
10. Both views render the AI card. Both play the WAV audio.
11. Repeat from step 3.
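The fan-out in steps 5 and 9 (pushing one turn to every connected SSE client) can be sketched with a per-client queue; all names here are illustrative assumptions, not the planned API:

```python
# Minimal SSE fan-out sketch: each connected view (host or broadcast)
# gets its own queue; pushing a turn enqueues one formatted SSE frame
# per client.
import json
import queue


class TurnBroadcaster:
    def __init__(self) -> None:
        self.clients: list[queue.Queue] = []

    def connect(self) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self.clients.append(q)
        return q

    def push_turn(self, speaker: str, text: str, wav_url: str) -> None:
        payload = json.dumps({"speaker": speaker, "text": text, "wav": wav_url})
        frame = f"event: turn\ndata: {payload}\n\n"  # SSE wire format
        for q in self.clients:
            q.put(frame)
```

Both `/host` and `/broadcast` would call `connect()` and drain their queue from the request handler, which is how the two views receive identical events.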
## Key Decisions
- Decision 1: **Copy or import Arena's TTS module rather than duplicating Kokoro logic.**
- Why: `ArenaTTSManager` is already tested and handles session audio, path safety, and pipeline caching.
- Tradeoff: Creates a dependency on Arena's internal code. Mitigated by treating it as a stable utility layer.
- Decision 2: **Two separate URLs for host and broadcast.**
- Why: The host needs controls. OBS must not capture controls. Mixing them on one page creates layout complexity and accidental capture risk.
- Tradeoff: Two SSE connections instead of one. Acceptable at this scale.
- Decision 3: **Start with CLI-based AI backends (Codex/Gemini), add Claude API in Phase 2.**
- Why: Both CLIs are already present and working on `svc-ai`. Fastest path to a functional prototype.
- Tradeoff: CLI output noise requires cleaning. Claude API (Phase 2) is cleaner but needs an API key and the `anthropic` SDK.
- Decision 4: **No speech-to-text. Host types.**
- Why: Eliminates microphone capture, audio routing, and STT accuracy problems. Aligns with how Jason already works.
- Tradeoff: Host must type during the live stream. This is the intended format — the typing is part of the show.
## Rejected Alternatives
- Alternative: Using Arena's existing `arena-web` server with modifications.
- Why rejected: Arena is an AI-to-AI tool. Retrofitting a human-in-the-loop mode and a separate broadcast view would require significant changes to Arena's core, risking regressions. A clean separate project is lower risk and lower coupling.
- Alternative: Streaming audio from `svc-ai` to a Windows machine via virtual audio cable.
- Why rejected: The browser-source approach in OBS is simpler, more reliable, and already proven in the Arena project. All audio plays in the browser, which OBS captures directly.
## Open Questions
- Question 1: Should the Claude API backend use claude-sonnet-4-6 as the default, or should the model be configurable per session?
- Question 2: Should conversation history be capped at a rolling window to prevent prompt length creep, or left unbounded for the initial version?
## Signature
- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision

docs/03-IMPLEMENTATION-PLAN.md

@@ -0,0 +1,111 @@
# Implementation Plan
## Scope For This Implementation
- Included:
- Project scaffold (`pyproject.toml`, `src/stoned_ai/`, `tests/`)
- TTS layer wrapping Arena's Kokoro backend
- Cleaning engine for AI CLI output noise
- AI backend abstraction supporting Codex and Gemini CLI backends
- Web server with SSE delivery
- Host view (`/host`) with text input, send, voice selection, session control
- Broadcast view (`/broadcast`) styled for OBS browser source capture
- WAV audio serving for both views
- Per-speaker voice assignment (host voice + AI voice)
- `install.sh` script and `~/.local/bin/stoned-web` link
- Excluded from initial build:
- Claude API backend (Phase 2)
- Visual avatar or waveform animation overlay
- YouTube chat integration
- Persistent conversation logging (nice to have, not required for launch)
- Mobile-responsive host view (desktop only for now)
## Phases
### Phase 1: Project Scaffold and Core Backend
- Objective: Establish the Python package, TTS layer, cleaning engine, and AI backend abstraction.
- Files likely affected:
- `pyproject.toml`
- `src/stoned_ai/__init__.py`
- `src/stoned_ai/tts.py`
- `src/stoned_ai/clean.py`
- `src/stoned_ai/ai.py`
- `scripts/install.sh`
- Risks: `pykokoro` import paths may differ slightly from Arena's. Verify import compatibility before writing TTS layer.
- Exit criteria: `stoned_ai.tts` can synthesize a WAV from text using a Kokoro voice. `stoned_ai.ai` can call Codex or Gemini and return a clean string.
### Phase 2: Web Server and SSE Delivery
- Objective: Build the HTTP server, session state management, SSE event stream, and WAV file serving.
- Files likely affected:
- `src/stoned_ai/web.py`
- Risks: Session state must be thread-safe. SSE connections from both `/host` and `/broadcast` must receive the same events.
- Exit criteria: A session can be started. A host message can be submitted. The AI responds. Both turns are pushed over SSE. Both turns are voiced.
### Phase 3: Host View (`/host`)
- Objective: Build the host's control panel HTML/CSS/JS page.
- Files likely affected:
- `src/stoned_ai/web.py` (inline HTML or template)
- Risks: Voice selection dropdown must populate from the live Kokoro voice list. If the voice list is slow to load, display a loading state.
- Exit criteria: Jason can open `/host`, start a session, pick voices, type and send a message, hear his voice, hear the AI's voice, and stop the session.
### Phase 4: Broadcast View (`/broadcast`)
- Objective: Build the clean, OBS-capturable broadcast page.
- Files likely affected:
- `src/stoned_ai/web.py` (inline HTML or template)
- Risks: OBS browser source must auto-play audio. Verify OBS audio capture works with the WAV playback approach before marking complete.
- Exit criteria: `/broadcast` shows only conversation cards. No controls are visible. OBS captures the page. Audio plays in OBS without manual permission prompts.
### Phase 5: Claude API Backend (Post-Launch)
- Objective: Add a Claude backend using the `anthropic` SDK as an alternative to Codex/Gemini.
- Files likely affected:
- `src/stoned_ai/ai.py`
- `pyproject.toml` (add `anthropic` dependency)
- Risks: Requires a valid `ANTHROPIC_API_KEY` environment variable on `svc-ai`. Must not break existing Codex/Gemini backends.
- Exit criteria: The host view offers a Claude model option. A full conversation runs using the Claude API backend.
## Order Of Operations
1. Create `pyproject.toml` and package scaffold.
2. Implement `tts.py` (Kokoro wrapper).
3. Implement `clean.py` (noise stripping for Codex and Gemini).
4. Implement `ai.py` (Codex and Gemini backends).
5. Implement `web.py` — server core, session state, SSE stream, WAV serving.
6. Implement `/host` view in `web.py`.
7. Implement `/broadcast` view in `web.py`.
8. Write `scripts/install.sh`.
9. Smoke test: full end-to-end conversation from host view to broadcast view.
10. Verify OBS browser source audio capture.
## Testing Expectations
- Unit tests: `tts.py` voice listing. `clean.py` noise stripping against fixture strings. `ai.py` CLI argument construction (mock subprocess).
- Integration tests: Full SSE event sequence from host message submit to broadcast card render. Requires a live Codex or Gemini CLI.
- Manual verification: OBS audio capture. Visual broadcast layout on stream. Per-speaker voice differentiation.
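A fixture-string unit test for the cleaning engine might look like the following. The noise patterns here are placeholders for illustration only; the real Codex/Gemini banner formats live in Arena's `clean.py`:

```python
# Illustrative fixture-based test shape for clean.py. The two patterns
# below are assumptions, not the actual CLI banner formats.
import re

NOISE_PATTERNS = [
    re.compile(r"^\[\d{4}-\d{2}-\d{2}.*\]$"),   # timestamped banner lines
    re.compile(r"^tokens used:\s*\d+$", re.I),  # token-count footers
]


def clean(raw: str) -> str:
    kept = [
        line for line in raw.splitlines()
        if not any(p.match(line.strip()) for p in NOISE_PATTERNS)
    ]
    return "\n".join(kept).strip()


def test_clean_strips_noise() -> None:
    raw = "[2026-04-12 banner]\nHello there.\ntokens used: 42"
    assert clean(raw) == "Hello there."
```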
## Documentation Expectations
- `README.md` must be updated with usage instructions after Phase 2 is complete.
- `docs/09-PROJECT-STATUS.md` must be updated after each phase completes.
- `docs/06-WORKER-HANDOFF.md` must be updated before handing off to the implementation model.
## Escalation Conditions
- Stop and raise a change request if:
- `pykokoro` cannot be imported without installing the full Arena package.
- The Kokoro voice pipeline requires GPU on the current hardware and fails on CPU.
- OBS cannot capture audio from a browser source pointing at `svc-ai` without additional configuration.
- The Codex or Gemini CLI output format has changed in a way that breaks the cleaning engine.
## Signature
- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision

docs/04-ACCEPTANCE-CRITERIA.md

@@ -0,0 +1,51 @@
# Acceptance Criteria
## Functional Criteria
- Criterion 1: The host can open `/host`, start a session, type a message, and submit it.
- Criterion 2: The host's typed message is synthesized to WAV audio using the host's selected Kokoro voice and played back.
- Criterion 3: After the host submits, the AI backend generates a response and it is synthesized to WAV audio using the AI's selected Kokoro voice and played back.
- Criterion 4: Both the host message and the AI response appear as conversation cards in the feed on both `/host` and `/broadcast`.
- Criterion 5: The `/broadcast` view contains no interactive controls — only the scrolling conversation feed and audio playback.
- Criterion 6: OBS Studio can use `/broadcast` as a browser source and capture the conversation cards and audio.
- Criterion 7: The host and AI each have an independently selectable Kokoro voice.
- Criterion 8: The host can stop an in-progress session from the host view.
- Criterion 9: A new session can be started after a previous one ends without restarting the server.
## Non-Functional Criteria
- Performance: TTS synthesis must begin immediately after a turn is received. The next card must not appear until the current turn's audio has finished playing.
- Reliability: The server must not crash if the AI backend times out. A timeout must surface as an error card in the feed, not a silent hang.
- Security: The server is local-network only (`svc-ai`). No authentication is required. The WAV serving path must be sandboxed to the session audio directory to prevent path traversal.
- Maintainability: AI backends must be swappable without changes to the web server or TTS layer. Adding Claude in Phase 2 must require changes only to `ai.py` and `pyproject.toml`.
## Documentation Criteria
- `README.md` updated with installation and usage instructions.
- `docs/09-PROJECT-STATUS.md` updated to reflect completed phases.
- `docs/06-WORKER-HANDOFF.md` updated with current build state before each implementation handoff.
## Validation Criteria
- Test evidence required: unit tests for `tts.py` voice listing, `clean.py` noise filtering, `ai.py` CLI argument construction.
- Review evidence required: supervisor review of Phase 1 and Phase 2 output before Phase 3 begins.
- Manual validation required: OBS browser source audio capture verified by Jason.
## Definition Of Done
The project is done (Phases 1-4 complete) only when:
- all functional criteria above are satisfied
- the server runs stably on `svc-ai`
- OBS can capture the broadcast view end-to-end
- Jason has completed at least one test conversation from start to finish using the host view and the broadcast view simultaneously
- required documentation is updated
- supervisor review is complete
## Signature
- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision

docs/05-RISK-REGISTER.md

@@ -0,0 +1,56 @@
# Risk Register
## Risks
### Risk 1
- Risk: `pykokoro` is installed in Arena's virtualenv only. Stoned.AI may not be able to import it without installing Arena as a dependency or duplicating the virtualenv.
- Impact: High — TTS is a core feature. Without it the project cannot function.
- Likelihood: Medium — the Arena install script installs `pykokoro` into a project-local venv, not system-wide.
- Mitigation: Add `pykokoro` as a direct dependency in `pyproject.toml`. Install into a fresh project-local venv. Verify the import works independently before proceeding with the TTS layer.
- Owner: Implementation model (Phase 1).
- Trigger: `import pykokoro` fails in the Stoned.AI venv.
### Risk 2
- Risk: OBS Studio browser source does not auto-play audio without user interaction, blocked by browser autoplay policy.
- Impact: High — audio playback in the broadcast view is a core feature.
- Likelihood: Medium — OBS uses Chromium internally and has its own audio handling. This is a known issue for some browser source setups.
- Mitigation: Test OBS audio capture during Phase 4 before marking the phase complete. If autoplay is blocked, investigate OBS browser source audio settings (`Enable JavaScript`, `Allow Plugins`, global audio device assignment in OBS scene settings).
- Owner: Jason (manual verification) with implementation support.
- Trigger: Audio does not play in OBS browser source during Phase 4 testing.
### Risk 3
- Risk: Kokoro synthesis is slow on CPU (no GPU on `svc-ai`), causing noticeable latency between turn submission and audio playback.
- Impact: Medium — does not break functionality but degrades the live stream experience.
- Likelihood: Medium — Kokoro is rated CPU-viable but synthesis time varies by voice and text length.
- Mitigation: Gate the next card rendering until audio is ready (already the Arena pattern). Keep host messages and AI responses concise. If latency is unacceptable, investigate Piper TTS as a faster CPU fallback.
- Owner: Implementation model (Phase 2), Jason (evaluation).
- Trigger: Synthesis takes more than 5 seconds for a typical 2-3 sentence response.
### Risk 4
- Risk: AI backend CLI output format changes (Codex or Gemini banner updates) causing the cleaning engine to miss noise or strip dialogue.
- Impact: Medium — visible noise in the conversation feed degrades the stream.
- Likelihood: Low — CLI tools update occasionally, but not frequently.
- Mitigation: Maintain a robust `clean.py` based on Arena's proven patterns. Add a debug mode flag that shows raw output for troubleshooting.
- Owner: Implementation model.
- Trigger: Conversation feed shows CLI banner lines or token counts.
### Risk 5
- Risk: The host and broadcast views share an SSE stream. If the broadcast view reconnects (e.g. OBS browser source refresh), it may miss turns that occurred during the gap.
- Impact: Low — the broadcast feed would fall behind but would recover on the next turn.
- Likelihood: Low — OBS browser sources are generally stable once connected.
- Mitigation: On SSE reconnect, replay the current session's conversation history as catch-up events before resuming the live stream. This is optional for the initial build.
- Owner: Implementation model (Phase 2, optional enhancement).
- Trigger: OBS browser source shows an incomplete conversation after a refresh.
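The catch-up mitigation could be as small as replaying buffered frames before attaching a new client to the live stream. A sketch, with illustrative names:

```python
# Sketch of reconnect catch-up: the broadcaster keeps the session's
# frames and replays them into any newly connected client's queue
# before that client starts receiving live events.
import queue


class ReplayBroadcaster:
    def __init__(self) -> None:
        self.history: list[str] = []
        self.clients: list[queue.Queue] = []

    def push(self, frame: str) -> None:
        self.history.append(frame)
        for q in self.clients:
            q.put(frame)

    def connect(self) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        for frame in self.history:  # catch-up: replay missed turns first
            q.put(frame)
        self.clients.append(q)
        return q
```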
## Signature
- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision

docs/06-WORKER-HANDOFF.md

@@ -0,0 +1,113 @@
# Worker Handoff
## Instructions For Implementation Models
You are an implementation model operating under a supervisor-approved plan for the Stoned.AI project.
## You Must
- follow the governing docs exactly
- implement only the approved scope (Phases 1-4 as defined in `03-IMPLEMENTATION-PLAN.md`)
- report conflicts instead of improvising policy changes
- keep changes aligned to the acceptance criteria in `04-ACCEPTANCE-CRITERIA.md`
- preserve architecture decisions in `02-ARCHITECTURE-PLAN.md` unless a change request is approved
## You Must Not
- rewrite governing docs
- change scope on your own
- add features not listed in the implementation plan
- weaken constraints (e.g. do not add microphone input, do not skip the broadcast view)
- invent acceptance criteria
## Inputs You Should Read First
1. `00-GOVERNANCE-RULES.md`
2. `01-PROJECT-CHARTER.md`
3. `02-ARCHITECTURE-PLAN.md`
4. `03-IMPLEMENTATION-PLAN.md`
5. `04-ACCEPTANCE-CRITERIA.md`
6. `05-RISK-REGISTER.md`
## Critical Context
### What This Project Is
Stoned.AI is a live-streamed, unscripted conversation show. One human host (Jason) types his side. One AI generates responses. **Both sides are voiced via local Kokoro TTS.** The conversation displays as scrolling cards in a browser.
There are two browser views:
- `/host` — Jason's control panel (input, voice selection, session control)
- `/broadcast` — Clean OBS-capturable feed (no controls, cards and audio only)
### What Already Exists (Do Not Rebuild)
The Arena project at `/home/svc-admin/ai-projects/projects/arena` contains:
- A working Kokoro TTS backend: `src/arena/tts.py` — class `ArenaTTSManager`
- WAV file generation, session audio directories, path safety logic
- Cleaning engine patterns: `src/arena/clean.py`
- Proven SSE delivery pattern: `src/arena/web.py`
**Reuse these patterns.** Do not reinvent Kokoro integration from scratch. Import or copy the relevant code.
### Package Layout
```text
stoned-ai/
├── pyproject.toml
├── README.md
├── scripts/
│ └── install.sh
├── src/
│ └── stoned_ai/
│ ├── __init__.py
│ ├── ai.py — AI backend (Codex, Gemini)
│ ├── clean.py — CLI noise stripping
│ ├── tts.py — Kokoro TTS wrapper
│ └── web.py — HTTP server, SSE, host and broadcast views
└── tests/
```
### Entry Point
`pyproject.toml` should define:
```toml
[project.scripts]
stoned-web = "stoned_ai.web:main"
```
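A minimal `main()` satisfying that hook might look like the sketch below. The CLI flags are assumptions; only the 8766 default comes from `09-PROJECT-STATUS.md`:

```python
# Hedged sketch of the stoned-web entry point. Actual server startup
# (session state, SSE routes) belongs to web.py and is not shown here.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="stoned-web")
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("--port", type=int, default=8766)  # 8765 is Arena's
    return parser


def main() -> None:
    args = build_parser().parse_args()
    print(f"stoned-web would listen on {args.host}:{args.port}")


if __name__ == "__main__":
    main()
```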
### AI Backends
Phase 1 requires Codex and Gemini CLI backends only.
Codex call pattern (from Arena):
```shell
codex exec --skip-git-repo-check --color never -o <output_file> <prompt>
```
Gemini call pattern (from Arena):
```shell
gemini -p <prompt>
```
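Both call patterns above can sit behind one dispatch function. The command arguments below match the patterns as given; the timeout and temp-file handling are assumptions for this sketch:

```python
# Sketch dispatching to the Codex or Gemini CLI. Codex writes its
# response to an output file (-o); Gemini prints to stdout.
import os
import subprocess
import tempfile
from pathlib import Path


def call_cli(backend: str, prompt: str, timeout: int = 120) -> str:
    if backend == "codex":
        fd, out_path = tempfile.mkstemp(suffix=".txt")
        os.close(fd)
        cmd = ["codex", "exec", "--skip-git-repo-check",
               "--color", "never", "-o", out_path, prompt]
        subprocess.run(cmd, check=True, timeout=timeout)
        return Path(out_path).read_text()
    if backend == "gemini":
        result = subprocess.run(["gemini", "-p", prompt], check=True,
                                timeout=timeout, capture_output=True, text=True)
        return result.stdout
    raise ValueError(f"unknown backend: {backend}")
```

The returned string would then pass through `clean.py` before TTS and display.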
### TTS Path
Generated WAV files live under:
```text
/opt/models/arena-voices/generated/session-<id>/
```
This path is already used by Arena. Use the same root to avoid duplication.
### Known Environment
- Host: `svc-ai`
- Python: 3.12
- Kokoro models: `/opt/models/kokoro/cache`
- Arena venv (reference only): `/home/svc-admin/ai-projects/projects/arena/.venv`
## Output Expectations
After each phase, report:
- what was changed
- what was not changed
- what remains blocked or needs escalation
- any change requests needed

docs/07-REVIEW-CHECKLIST.md

@@ -0,0 +1,60 @@
# Review Checklist
## Review Goal
Compare the implementation against the governing docs. Confirm the build matches the charter, architecture, and acceptance criteria before Jason signs off.
## Check These First
- Does the implementation match the charter? (One human, one AI, both voiced, two browser views)
- Does it respect architecture decisions? (Reused Kokoro backend, separate host/broadcast URLs, CLI backends first)
- Does it remain inside scope? (No microphone, no STT, no Claude API yet)
- Are acceptance criteria satisfied? (See `04-ACCEPTANCE-CRITERIA.md` for full list)
- Are risks handled or explicitly accepted? (See `05-RISK-REGISTER.md`)
## Technical Review
- Correctness:
- Does `/host` accept host text input and submit it to the server?
- Does the server call the AI backend and return a clean response?
- Does TTS synthesis run for both host and AI turns?
- Are WAV files served correctly and played in sequence?
- Do both `/host` and `/broadcast` receive SSE events?
- Behavioral regressions:
- Does a timeout on the AI backend produce a visible error, not a hang?
- Can the host start a new session after ending a previous one without restarting the server?
- Does the broadcast view contain zero interactive elements?
- Missing tests:
- Is there a unit test for `clean.py` noise stripping?
- Is there a unit test for `ai.py` CLI argument construction?
- Is there a unit test for `tts.py` voice listing?
- Hidden complexity:
- Is session state thread-safe?
- Is the WAV serving path sandboxed to the session directory?
- Security concerns:
- Path traversal: can a crafted URL escape the generated audio directory?
- Operational concerns:
- Is `scripts/install.sh` present and functional?
- Does `stoned-web` start cleanly and bind to `0.0.0.0:8766` (or similar, not conflicting with Arena's 8765)?
## Documentation Review
- `README.md` updated with install and usage instructions: yes/no
- `docs/09-PROJECT-STATUS.md` updated to reflect completed phases: yes/no
- `docs/06-WORKER-HANDOFF.md` updated with current build state: yes/no
## Final Decision
- Accept
- Reject
- Request revision
## Notes
- Findings:
- Required follow-up:

docs/08-CHANGE-REQUEST.md

@@ -0,0 +1,33 @@
# Change Request
## Summary
- Proposed change: (none pending)
## Reason
- Why is the current governing plan insufficient or wrong?
## Requested Document Changes
- Document:
- Proposed revision:
## Impact
- Scope impact:
- Architecture impact:
- Risk impact:
- Testing impact:
- Timeline impact:
## Recommendation
- Approve
- Reject
- Defer
## Approval
- User decision:
- Supervisor recommendation:

docs/09-PROJECT-STATUS.md

@@ -0,0 +1,38 @@
# Project Status
## Current Status
- Phase: Pre-implementation
- State: Governance scaffold complete. Ready for implementation handoff.
- Last updated: 2026-04-12
## Completed
- Project directory created
- Governance documentation written (docs 00-09)
- Git initialized
- README and .gitignore created
## In Progress
- Nothing. Awaiting implementation model handoff.
## Blocked
- Nothing currently blocked.
## Next Actions
1. Hand off to implementation model with `06-WORKER-HANDOFF.md` as the entry point.
2. Implementation model completes Phase 1 (scaffold + TTS + AI + clean).
3. Supervisor reviews Phase 1 output.
4. Implementation model proceeds to Phase 2 (web server + SSE).
5. Continue through Phase 3 (host view) and Phase 4 (broadcast view).
6. Jason tests end-to-end with OBS.
7. Create Gitea repo (`AccursedBinkie/stoned-ai`) and push baseline.
## Notes
- Port assignment: use `8766` to avoid conflict with Arena's default `8765`.
- The Arena project at `/home/svc-admin/ai-projects/projects/arena` is the reference implementation for TTS, SSE, and cleaning patterns.
- Gitea remote has not been created yet. Do not push until the Gitea repo exists.