Initialize project governance and baseline structure
Stoned.AI: live-streamed human + AI conversation show, both sides voiced via local Kokoro TTS.
Governance docs 00-09, README, .gitignore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
58
docs/00-GOVERNANCE-RULES.md
Normal file
@@ -0,0 +1,58 @@
# Governance Rules

## Core Rule

The documents in this project set the governing plan for the project.

They are **never to be rewritten by implementation models** unless:

- the user explicitly requests a revision
- a supervisor model proposes a revision
- the user accepts the revision

## Allowed Editors

The following may change governing documents:

- the user
- a supervisor model acting under explicit user direction

The following may not change governing documents without approval:

- local coding models
- implementation agents
- documentation drafting agents
- background automation

## Allowed Actions For Implementation Models

- read governing docs
- restate requirements
- create implementation artifacts
- create subordinate notes
- create code and tests
- create proposed change requests

## Forbidden Actions For Implementation Models

- change project scope
- weaken constraints
- redefine acceptance criteria
- override architecture decisions
- silently reinterpret requirements

## Revision Process

If implementation reveals the plan is wrong or incomplete:

1. create a change request using `08-CHANGE-REQUEST.md`
2. stop changing the governing plan directly
3. escalate the change request to the user or a supervisor model
4. revise governing docs only after approval

## Decision Authority

- **Primary Stakeholder**: Jason
- **Supervisor**: Claude (planning, architecture, review)
- **Workhorse**: Implementation model (code, tests, scaffolding)
- **Reviewer**: Claude or a second model comparing implementation to docs
65
docs/01-PROJECT-CHARTER.md
Normal file
@@ -0,0 +1,65 @@
# Project Charter

## Project Name

- Name: Stoned.AI

## Purpose

- Stoned.AI is a live-streamed, unscripted conversation show between a human host (Jason) and an AI.
- It fills a gap in the AI content space: instead of productivity tutorials, it is genuine, funny, and curious conversation that goes wherever the discussion leads.
- It exists now because the concept was proven in a single conversation on April 1, 2026, and the local TTS and AI infrastructure to support it already exists on `svc-ai`.

## Goals

- Goal 1: Build a browser-based host interface where Jason types his side of the conversation and the AI responds, with both sides voiced via local Kokoro TTS.
- Goal 2: Provide a clean OBS-capturable broadcast view showing only the scrolling conversation feed: no controls, styled for streaming.
- Goal 3: Support at least one AI backend for responses (Codex or Gemini initially, Claude added after initial launch).

## Non-Goals

- Not an AI-to-AI debate tool. One human, one AI. That is the format.
- Not a productivity or workflow tool. The output is entertainment and conversation, not work product.
- Not a replacement for Arena. This is a separate project with a separate purpose.
- Not building a mobile app, desktop app, or browser extension. Web only for the initial version.

## Users / Stakeholders

- Primary user: Jason (host)
- Audience: YouTube live stream viewers
- Secondary stakeholders: none currently

## Constraints

- Must run on `svc-ai` (AMD Ryzen 5 3600, ~14 GiB RAM, no GPU).
- Must reuse the existing Kokoro TTS stack from the Arena project (`/opt/models/kokoro`, `pykokoro`).
- Must be capturable by OBS Studio as a browser source.
- No microphone dependency: both sides are text-in, voice-out.
- Host types their side; the system voices it. No speech-to-text.

## Deliverables

- Deliverable 1: A working `stoned-web` server with host input, AI response, and Kokoro TTS for both sides.
- Deliverable 2: A `/broadcast` view (no controls, OBS-ready) and a `/host` control view (input box, voice selection, session management).
- Deliverable 3: At least one wired AI backend capable of generating conversational responses.
- Deliverable 4: Per-speaker Kokoro voice assignment (host voice and AI voice are independently selectable).

## Success Definition

- Jason can type a message, hear it spoken in his chosen voice, the AI responds, and the AI response is spoken in its chosen voice.
- The broadcast view displays cleanly in OBS as a browser source.
- A full test conversation runs end to end without manual intervention.
- Jason can go live on YouTube using this as the audio and visual source.

## Authority

- User approval required: yes
- Supervisor revision required for charter changes: yes

## Signature

- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision
112
docs/02-ARCHITECTURE-PLAN.md
Normal file
@@ -0,0 +1,112 @@
# Architecture Plan

## Current State

- No implementation exists yet. This is a greenfield project.
- The Arena project (`/home/svc-admin/ai-projects/projects/arena`) provides reusable infrastructure:
  - `src/arena/tts.py`: Kokoro TTS backend (`ArenaTTSManager`, `KokoroBackend`)
  - `/opt/models/kokoro`: downloaded Kokoro voice models
  - `pykokoro`: installed Python package
  - Pattern for SSE-based real-time conversation delivery
  - Pattern for WAV serving and browser audio playback

## Target State

A lightweight Python web server (`stoned-web`) with two browser-facing views:

1. **Host view** (`/host`): Jason's control panel. Text input box, send button, voice selection per speaker, session start/stop, status display.
2. **Broadcast view** (`/broadcast`): Clean, OBS-capturable page. Scrolling conversation cards only. No controls. Styled for stream.

Both views receive conversation turns over Server-Sent Events. The broadcast view is the OBS browser source. The host view is what Jason operates on his own screen.

## Design Principles

- Principle 1: **Text-in, voice-out for both sides.** The host types; the system voices. The AI generates text; the system voices. No microphone dependency.
- Principle 2: **Reuse Arena TTS infrastructure.** Do not reimplement Kokoro synthesis. Import and use `ArenaTTSManager` directly from the arena package or copy the relevant module.
- Principle 3: **Broadcast view is read-only.** The `/broadcast` URL has zero interactive elements. It exists only for OBS to consume.
- Principle 4: **One AI at a time.** The session has exactly one human speaker and one AI speaker. Multi-AI is not in scope.

## Major Components

- Component: **Web Server (`src/stoned_ai/web.py`)**
  - Purpose: HTTP server handling both views, SSE streams, session state, and audio file serving.
  - Responsibilities: Accept host message submissions. Dispatch AI calls. Trigger TTS for both sides. Serve WAV files. Push turns to connected SSE clients.
  - Dependencies: `stoned_ai/tts.py`, `stoned_ai/ai.py`, standard library (`http.server` or a lightweight framework).

- Component: **TTS Layer (`src/stoned_ai/tts.py`)**
  - Purpose: Synthesize WAV audio for any speaker given a voice ID and text.
  - Responsibilities: Wrap `ArenaTTSManager` (or import the Arena `tts.py` module directly). Store generated WAVs in a session-scoped directory. Return a browser-fetchable path.
  - Dependencies: `pykokoro`, `/opt/models/kokoro`.

- Component: **AI Backend (`src/stoned_ai/ai.py`)**
  - Purpose: Call the configured AI model and return a clean text response.
  - Responsibilities: Accept conversation history and a prompt. Call the model CLI or API. Return cleaned text. Initially wraps `codex exec` or `gemini -p`. Claude API added later.
  - Dependencies: `subprocess` (for CLI backends), `anthropic` SDK (for Claude backend, Phase 5).

- Component: **Cleaning Engine (`src/stoned_ai/clean.py`)**
  - Purpose: Strip CLI noise from AI responses.
  - Responsibilities: Apply regex filters for Codex and Gemini banner lines, warnings, token counts.
  - Dependencies: None beyond stdlib. Can be copied from Arena's `clean.py` and extended.

- Component: **Broadcast View (`/broadcast`)**
  - Purpose: Clean, OBS-capturable HTML page.
  - Responsibilities: Connect to the SSE stream. Render conversation cards. Play audio. Never show controls.
  - Dependencies: Browser-side JavaScript only.

- Component: **Host View (`/host`)**
  - Purpose: Jason's control panel for operating the show.
  - Responsibilities: Text input and send. Voice selection per speaker. Session start/stop. Status display. Mirrors the conversation feed.
  - Dependencies: Browser-side JavaScript only.

## Data Flow

1. Jason opens `/host` in his browser and `/broadcast` in OBS as a browser source.
2. Jason starts a session, selects voices for himself and the AI, enters the opening topic or first message.
3. Jason types his message and hits send.
4. Server receives the message, queues it as a "host turn."
5. Server calls Kokoro TTS for Jason's voice, stores the WAV, pushes the turn to all SSE clients.
6. Both views render the host card. Both play the WAV audio.
7. Server calls the AI backend with the conversation history.
8. AI returns a text response. Server cleans it.
9. Server calls Kokoro TTS for the AI voice, stores the WAV, pushes the AI turn to all SSE clients.
10. Both views render the AI card. Both play the WAV audio.
11. Repeat from step 3.
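
The "push the turn to all SSE clients" steps above could be sketched roughly as follows. This is a minimal illustration, not Arena's actual SSE code: the `TurnBroadcaster` class, the `turn` event name, and the payload keys are all hypothetical names chosen for the example. Each connected view (`/host` or `/broadcast`) would hold one queue and write the frames it receives to its HTTP response.

```python
import json
import queue
import threading


def format_sse(event: str, data: dict) -> str:
    """Encode one Server-Sent Events frame: event name, JSON payload, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class TurnBroadcaster:
    """Fan each turn out to every connected SSE client (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._clients: list[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        """Register a new client (one per open /host or /broadcast stream)."""
        client: queue.Queue = queue.Queue()
        with self._lock:
            self._clients.append(client)
        return client

    def unsubscribe(self, client: queue.Queue) -> None:
        """Drop a client whose connection has closed."""
        with self._lock:
            if client in self._clients:
                self._clients.remove(client)

    def publish(self, speaker: str, text: str, audio_url: str) -> None:
        """Send the same frame to every client so both views stay in step."""
        frame = format_sse("turn", {"speaker": speaker, "text": text, "audio": audio_url})
        with self._lock:
            clients = list(self._clients)  # copy so we do not hold the lock while queueing
        for client in clients:
            client.put(frame)
```

Because every subscriber receives the identical frame, the host view and the broadcast view render the same card and fetch the same WAV URL.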

## Key Decisions

- Decision 1: **Copy or import Arena's TTS module rather than duplicating Kokoro logic.**
  - Why: `ArenaTTSManager` is already tested and handles session audio, path safety, and pipeline caching.
  - Tradeoff: Creates a dependency on Arena's internal code. Mitigated by treating it as a stable utility layer.

- Decision 2: **Two separate URLs for host and broadcast.**
  - Why: The host needs controls. OBS must not capture controls. Mixing them on one page creates layout complexity and accidental capture risk.
  - Tradeoff: Two SSE connections instead of one. Acceptable at this scale.

- Decision 3: **Start with CLI-based AI backends (Codex/Gemini), add Claude API in Phase 5.**
  - Why: Both CLIs are already present and working on `svc-ai`. Fastest path to a functional prototype.
  - Tradeoff: CLI output noise requires cleaning. Claude API (Phase 5) is cleaner but needs an API key and the `anthropic` SDK.

- Decision 4: **No speech-to-text. Host types.**
  - Why: Eliminates microphone capture, audio routing, and STT accuracy problems. Aligns with how Jason already works.
  - Tradeoff: Host must type during the live stream. This is the intended format: the typing is part of the show.

## Rejected Alternatives

- Alternative: Using Arena's existing `arena-web` server with modifications.
  - Why rejected: Arena is an AI-to-AI tool. Retrofitting a human-in-the-loop mode and a separate broadcast view would require significant changes to Arena's core, risking regressions. A clean separate project is lower risk and lower coupling.

- Alternative: Streaming audio from `svc-ai` to a Windows machine via virtual audio cable.
  - Why rejected: The browser-source approach in OBS is simpler, more reliable, and already proven in the Arena project. All audio plays in the browser, which OBS captures directly.

## Open Questions

- Question 1: Should the Claude API backend use claude-sonnet-4-6 as the default, or should the model be configurable per session?
- Question 2: Should conversation history be capped at a rolling window to prevent prompt length creep, or left unbounded for the initial version?

## Signature

- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision
111
docs/03-IMPLEMENTATION-PLAN.md
Normal file
@@ -0,0 +1,111 @@
# Implementation Plan

## Scope For This Implementation

- Included:
  - Project scaffold (`pyproject.toml`, `src/stoned_ai/`, `tests/`)
  - TTS layer wrapping Arena's Kokoro backend
  - Cleaning engine for AI CLI output noise
  - AI backend abstraction supporting Codex and Gemini CLI backends
  - Web server with SSE delivery
  - Host view (`/host`) with text input, send, voice selection, session control
  - Broadcast view (`/broadcast`) styled for OBS browser source capture
  - WAV audio serving for both views
  - Per-speaker voice assignment (host voice + AI voice)
  - `install.sh` script and `~/.local/bin/stoned-web` link

- Excluded from initial build:
  - Claude API backend (Phase 5, post-launch)
  - Visual avatar or waveform animation overlay
  - YouTube chat integration
  - Persistent conversation logging (nice to have, not required for launch)
  - Mobile-responsive host view (desktop only for now)

## Phases

### Phase 1: Project Scaffold and Core Backend

- Objective: Establish the Python package, TTS layer, cleaning engine, and AI backend abstraction.
- Files likely affected:
  - `pyproject.toml`
  - `src/stoned_ai/__init__.py`
  - `src/stoned_ai/tts.py`
  - `src/stoned_ai/clean.py`
  - `src/stoned_ai/ai.py`
  - `scripts/install.sh`
- Risks: `pykokoro` import paths may differ slightly from Arena's. Verify import compatibility before writing TTS layer.
- Exit criteria: `stoned_ai.tts` can synthesize a WAV from text using a Kokoro voice. `stoned_ai.ai` can call Codex or Gemini and return a clean string.

### Phase 2: Web Server and SSE Delivery

- Objective: Build the HTTP server, session state management, SSE event stream, and WAV file serving.
- Files likely affected:
  - `src/stoned_ai/web.py`
- Risks: Session state must be thread-safe. SSE connections from both `/host` and `/broadcast` must receive the same events.
- Exit criteria: A session can be started. A host message can be submitted. The AI responds. Both turns are pushed over SSE. Both turns are voiced.
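
One way to approach the thread-safety risk named for Phase 2 is to guard all session state behind a single lock. The sketch below is only an assumption about how that might look; `SessionState`, `Turn`, and their methods are hypothetical names, not part of the approved plan or of Arena.

```python
import threading
import uuid
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "host" or "ai"
    text: str


class SessionState:
    """Lock-guarded session state (illustrative names, not from the plan)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._session_id: str | None = None
        self._turns: list[Turn] = []

    def start(self) -> str:
        """Begin a fresh session; refuses to start while one is running."""
        with self._lock:
            if self._session_id is not None:
                raise RuntimeError("a session is already running")
            self._session_id = uuid.uuid4().hex
            self._turns = []
            return self._session_id

    def stop(self) -> None:
        """End the session so a new one can start without a server restart."""
        with self._lock:
            self._session_id = None

    def add_turn(self, speaker: str, text: str) -> None:
        """Record a turn; only valid while a session is active."""
        with self._lock:
            if self._session_id is None:
                raise RuntimeError("no active session")
            self._turns.append(Turn(speaker, text))

    def history(self) -> list[Turn]:
        """Return a copy so callers never iterate the live list unlocked."""
        with self._lock:
            return list(self._turns)
```

Request-handler threads would share one `SessionState` instance; because `history()` returns a copy, the AI call can read the conversation while other threads keep appending turns.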

### Phase 3: Host View (`/host`)

- Objective: Build the host's control panel HTML/CSS/JS page.
- Files likely affected:
  - `src/stoned_ai/web.py` (inline HTML or template)
- Risks: Voice selection dropdown must populate from the live Kokoro voice list. If the voice list is slow to load, display a loading state.
- Exit criteria: Jason can open `/host`, start a session, pick voices, type and send a message, hear his voice, hear the AI's voice, and stop the session.

### Phase 4: Broadcast View (`/broadcast`)

- Objective: Build the clean, OBS-capturable broadcast page.
- Files likely affected:
  - `src/stoned_ai/web.py` (inline HTML or template)
- Risks: OBS browser source must auto-play audio. Verify OBS audio capture works with the WAV playback approach before marking complete.
- Exit criteria: `/broadcast` shows only conversation cards. No controls are visible. OBS captures the page. Audio plays in OBS without manual permission prompts.

### Phase 5: Claude API Backend (Post-Launch)

- Objective: Add a Claude backend using the `anthropic` SDK as an alternative to Codex/Gemini.
- Files likely affected:
  - `src/stoned_ai/ai.py`
  - `pyproject.toml` (add `anthropic` dependency)
- Risks: Requires a valid `ANTHROPIC_API_KEY` environment variable on `svc-ai`. Must not break existing Codex/Gemini backends.
- Exit criteria: The host view offers a Claude model option. A full conversation runs using the Claude API backend.

## Order Of Operations

1. Create `pyproject.toml` and package scaffold.
2. Implement `tts.py` (Kokoro wrapper).
3. Implement `clean.py` (noise stripping for Codex and Gemini).
4. Implement `ai.py` (Codex and Gemini backends).
5. Implement `web.py`: server core, session state, SSE stream, WAV serving.
6. Implement `/host` view in `web.py`.
7. Implement `/broadcast` view in `web.py`.
8. Write `scripts/install.sh`.
9. Smoke test: full end-to-end conversation from host view to broadcast view.
10. Verify OBS browser source audio capture.

## Testing Expectations

- Unit tests: `tts.py` voice listing. `clean.py` noise stripping against fixture strings. `ai.py` CLI argument construction (mock subprocess).
- Integration tests: Full SSE event sequence from host message submit to broadcast card render. Requires a live Codex or Gemini CLI.
- Manual verification: OBS audio capture. Visual broadcast layout on stream. Per-speaker voice differentiation.

## Documentation Expectations

- `README.md` must be updated with usage instructions after Phase 2 is complete.
- `docs/09-PROJECT-STATUS.md` must be updated after each phase completes.
- `docs/06-WORKER-HANDOFF.md` must be updated before handing off to the implementation model.

## Escalation Conditions

- Stop and raise a change request if:
  - `pykokoro` cannot be imported without installing the full Arena package.
  - The Kokoro voice pipeline requires GPU on the current hardware and fails on CPU.
  - OBS cannot capture audio from a browser source pointing at `svc-ai` without additional configuration.
  - The Codex or Gemini CLI output format has changed in a way that breaks the cleaning engine.

## Signature

- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision
51
docs/04-ACCEPTANCE-CRITERIA.md
Normal file
@@ -0,0 +1,51 @@
# Acceptance Criteria

## Functional Criteria

- Criterion 1: The host can open `/host`, start a session, type a message, and submit it.
- Criterion 2: The host's typed message is synthesized to WAV audio using the host's selected Kokoro voice and played back.
- Criterion 3: After the host submits, the AI backend generates a response and it is synthesized to WAV audio using the AI's selected Kokoro voice and played back.
- Criterion 4: Both the host message and the AI response appear as conversation cards in the feed on both `/host` and `/broadcast`.
- Criterion 5: The `/broadcast` view contains no interactive controls, only the scrolling conversation feed and audio playback.
- Criterion 6: OBS Studio can use `/broadcast` as a browser source and capture the conversation cards and audio.
- Criterion 7: The host and AI each have an independently selectable Kokoro voice.
- Criterion 8: The host can stop an in-progress session from the host view.
- Criterion 9: A new session can be started after a previous one ends without restarting the server.

## Non-Functional Criteria

- Performance: TTS synthesis must begin immediately after a turn is received. The next card must not appear until the current turn's audio has finished playing.
- Reliability: The server must not crash if the AI backend times out. A timeout must surface as an error card in the feed, not a silent hang.
- Security: The server is local-network only (`svc-ai`). No authentication is required. The WAV serving path must be sandboxed to the session audio directory to prevent path traversal.
- Maintainability: AI backends must be swappable without changes to the web server or TTS layer. Adding Claude in Phase 5 must require changes only to `ai.py` and `pyproject.toml`.
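
The reliability and maintainability criteria above could be met with a small backend interface plus a wrapper that turns timeouts into error cards. The following is a sketch under stated assumptions: the `AIBackend` protocol, the `reply` signature, the card dictionary shape, and the 60-second default are all hypothetical, and the two stand-in backends exist only to make the example runnable.

```python
import subprocess
from typing import Protocol


class AIBackend(Protocol):
    """Anything with a reply() method can serve as a backend (hypothetical interface)."""

    def reply(self, history: list[str], timeout: float) -> str: ...


class EchoBackend:
    """Stand-in backend for the sketch; a real one would wrap a CLI or SDK."""

    def reply(self, history: list[str], timeout: float) -> str:
        return f"You said: {history[-1]}"


class SlowBackend:
    """Simulates a CLI-based backend whose call exceeds the timeout."""

    def reply(self, history: list[str], timeout: float) -> str:
        raise subprocess.TimeoutExpired(cmd="backend", timeout=timeout)


def next_ai_card(backend: AIBackend, history: list[str], timeout: float = 60.0) -> dict:
    """Return an AI card, or an error card if the backend times out."""
    try:
        return {"kind": "ai", "text": backend.reply(history, timeout)}
    except subprocess.TimeoutExpired:
        # Surface the failure in the feed instead of hanging or crashing.
        return {"kind": "error", "text": f"AI backend timed out after {timeout:.0f}s"}
```

With this shape the web server only ever calls `next_ai_card`, so adding another backend class would not require touching the server or the TTS layer.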

## Documentation Criteria

- `README.md` updated with installation and usage instructions.
- `docs/09-PROJECT-STATUS.md` updated to reflect completed phases.
- `docs/06-WORKER-HANDOFF.md` updated with current build state before each implementation handoff.

## Validation Criteria

- Test evidence required: unit tests for `tts.py` voice listing, `clean.py` noise filtering, `ai.py` CLI argument construction.
- Review evidence required: supervisor review of Phase 1 and Phase 2 output before Phase 3 begins.
- Manual validation required: OBS browser source audio capture verified by Jason.

## Definition Of Done

The project is done (Phases 1–4 complete) only when:

- all functional criteria above are satisfied
- the server runs stably on `svc-ai`
- OBS can capture the broadcast view end-to-end
- Jason has completed at least one test conversation from start to finish using the host view and the broadcast view simultaneously
- required documentation is updated
- supervisor review is complete

## Signature

- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision
56
docs/05-RISK-REGISTER.md
Normal file
@@ -0,0 +1,56 @@
# Risk Register

## Risks

### Risk 1

- Risk: `pykokoro` is installed in Arena's virtualenv only. Stoned.AI may not be able to import it without installing Arena as a dependency or duplicating the virtualenv.
- Impact: High. TTS is a core feature. Without it the project cannot function.
- Likelihood: Medium. The Arena install script installs `pykokoro` into a project-local venv, not system-wide.
- Mitigation: Add `pykokoro` as a direct dependency in `pyproject.toml`. Install into a fresh project-local venv. Verify the import works independently before proceeding with the TTS layer.
- Owner: Implementation model (Phase 1).
- Trigger: `import pykokoro` fails in the Stoned.AI venv.

### Risk 2

- Risk: The OBS Studio browser source may not auto-play audio without user interaction because of the browser autoplay policy.
- Impact: High. Audio playback in the broadcast view is a core feature.
- Likelihood: Medium. OBS uses Chromium internally and has its own audio handling. This is a known issue for some browser source setups.
- Mitigation: Test OBS audio capture during Phase 4 before marking the phase complete. If autoplay is blocked, investigate the OBS browser source audio settings (for example the `Control audio via OBS` option and the audio device assignment in OBS).
- Owner: Jason (manual verification) with implementation support.
- Trigger: Audio does not play in OBS browser source during Phase 4 testing.

### Risk 3

- Risk: Kokoro synthesis is slow on CPU (no GPU on `svc-ai`), causing noticeable latency between turn submission and audio playback.
- Impact: Medium. Does not break functionality but degrades the live stream experience.
- Likelihood: Medium. Kokoro is rated CPU-viable but synthesis time varies by voice and text length.
- Mitigation: Gate the next card rendering until audio is ready (already the Arena pattern). Keep host messages and AI responses concise. If latency is unacceptable, investigate Piper TTS as a faster CPU fallback.
- Owner: Implementation model (Phase 2), Jason (evaluation).
- Trigger: Synthesis takes more than 5 seconds for a typical 2–3 sentence response.
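
The 5-second trigger for Risk 3 could be checked with a small timing wrapper around whatever synthesis function the TTS layer ends up exposing. This is only a sketch: `timed_synthesis` and the `synthesize` callable are hypothetical, and the real `ArenaTTSManager` interface is not shown in these docs.

```python
import time
from typing import Callable

SYNTHESIS_BUDGET_SECONDS = 5.0  # trigger threshold stated in Risk 3


def timed_synthesis(synthesize: Callable[[str], bytes], text: str) -> tuple[bytes, float, bool]:
    """Run one synthesis call and report (audio, elapsed seconds, over budget?)."""
    start = time.perf_counter()
    audio = synthesize(text)  # assumed to return WAV bytes; adapt to the real TTS layer
    elapsed = time.perf_counter() - start
    return audio, elapsed, elapsed > SYNTHESIS_BUDGET_SECONDS
```

Logging the elapsed time for each turn during the Phase 2 smoke test would give Jason concrete numbers for the latency evaluation.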

### Risk 4

- Risk: AI backend CLI output format changes (Codex or Gemini banner updates) causing the cleaning engine to miss noise or strip dialogue.
- Impact: Medium. Visible noise in the conversation feed degrades the stream.
- Likelihood: Low. CLI tools update occasionally, but not frequently.
- Mitigation: Maintain a robust `clean.py` based on Arena's proven patterns. Add a debug mode flag that shows raw output for troubleshooting.
- Owner: Implementation model.
- Trigger: Conversation feed shows CLI banner lines or token counts.
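
The mitigation for Risk 4 (regex-based cleaning plus a debug flag that exposes raw output) might look like the sketch below. The patterns are placeholders invented for illustration; the real Codex and Gemini banner formats are not given in these docs and would need to come from captured CLI output or from Arena's `clean.py`.

```python
import re

# Placeholder patterns for illustration only. The actual Codex and Gemini
# noise lines are not specified here and must be taken from real CLI output.
NOISE_PATTERNS = [
    re.compile(r"^\s*tokens used\b", re.IGNORECASE),
    re.compile(r"^\s*warning:", re.IGNORECASE),
]


def clean_response(raw: str, debug: bool = False) -> str:
    """Drop lines matching any noise pattern; in debug mode return raw output untouched."""
    if debug:
        return raw
    kept = [
        line
        for line in raw.splitlines()
        if not any(pattern.search(line) for pattern in NOISE_PATTERNS)
    ]
    return "\n".join(kept).strip()
```

Filtering line by line keeps the dialogue intact when a new banner appears: an unmatched noise line leaks through visibly rather than causing dialogue to be stripped.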

### Risk 5

- Risk: The host and broadcast views share an SSE stream. If the broadcast view reconnects (e.g. OBS browser source refresh), it may miss turns that occurred during the gap.
- Impact: Low. The broadcast feed would fall behind but would recover on the next turn.
- Likelihood: Low. OBS browser sources are generally stable once connected.
- Mitigation: On SSE reconnect, replay the current session's conversation history as catch-up events before resuming the live stream. This is optional for the initial build.
- Owner: Implementation model (Phase 2, optional enhancement).
- Trigger: OBS browser source shows an incomplete conversation after a refresh.

## Signature

- Document role: governing
- Created by: Claude (supervisor)
- Created at: 2026-04-12
- Revision status: initial
- Future revision rule: this document may be revised only by the user or by an explicitly authorized supervisor revision
113
docs/06-WORKER-HANDOFF.md
Normal file
@@ -0,0 +1,113 @@
# Worker Handoff

## Instructions For Implementation Models

You are an implementation model operating under a supervisor-approved plan for the Stoned.AI project.

## You Must

- follow the governing docs exactly
- implement only the approved scope (Phases 1–4 as defined in `03-IMPLEMENTATION-PLAN.md`)
- report conflicts instead of improvising policy changes
- keep changes aligned to the acceptance criteria in `04-ACCEPTANCE-CRITERIA.md`
- preserve architecture decisions in `02-ARCHITECTURE-PLAN.md` unless a change request is approved

## You Must Not

- rewrite governing docs
- change scope on your own
- add features not listed in the implementation plan
- weaken constraints (e.g. do not add microphone input, do not skip the broadcast view)
- invent acceptance criteria

## Inputs You Should Read First

1. `00-GOVERNANCE-RULES.md`
2. `01-PROJECT-CHARTER.md`
3. `02-ARCHITECTURE-PLAN.md`
4. `03-IMPLEMENTATION-PLAN.md`
5. `04-ACCEPTANCE-CRITERIA.md`
6. `05-RISK-REGISTER.md`

## Critical Context

### What This Project Is

Stoned.AI is a live-streamed, unscripted conversation show. One human host (Jason) types his side. One AI generates responses. **Both sides are voiced via local Kokoro TTS.** The conversation displays as scrolling cards in a browser.

There are two browser views:
- `/host`: Jason's control panel (input, voice selection, session control)
- `/broadcast`: Clean OBS-capturable feed (no controls, cards and audio only)

### What Already Exists (Do Not Rebuild)

The Arena project at `/home/svc-admin/ai-projects/projects/arena` contains:
- A working Kokoro TTS backend: `src/arena/tts.py` (class `ArenaTTSManager`)
- WAV file generation, session audio directories, path safety logic
- Cleaning engine patterns: `src/arena/clean.py`
- Proven SSE delivery pattern: `src/arena/web.py`

**Reuse these patterns.** Do not reinvent Kokoro integration from scratch. Import or copy the relevant code.

### Package Layout

```text
stoned-ai/
├── pyproject.toml
├── README.md
├── scripts/
│   └── install.sh
├── src/
│   └── stoned_ai/
│       ├── __init__.py
│       ├── ai.py        # AI backend (Codex, Gemini)
│       ├── clean.py     # CLI noise stripping
│       ├── tts.py       # Kokoro TTS wrapper
│       └── web.py       # HTTP server, SSE, host and broadcast views
└── tests/
```

### Entry Point

`pyproject.toml` should define:
```toml
[project.scripts]
stoned-web = "stoned_ai.web:main"
```

### AI Backends

Phase 1 requires Codex and Gemini CLI backends only.

Codex call pattern (from Arena):
```
codex exec --skip-git-repo-check --color never -o <output_file> <prompt>
```

Gemini call pattern (from Arena):
```
gemini -p <prompt>
```
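
The two call patterns above might be wrapped in `ai.py` along these lines. The argument lists mirror the patterns exactly as written in this handoff; everything else is an assumption made for the sketch: the function names, the 60-second default timeout, and the idea that the file passed to `-o` contains the reply text once the command finishes. Keeping argument construction in separate pure functions is what makes the "CLI argument construction" unit tests possible without a live CLI.

```python
import subprocess
import tempfile
from pathlib import Path


def codex_argv(prompt: str, output_file: str) -> list[str]:
    """Build the Codex command line from the documented call pattern."""
    return ["codex", "exec", "--skip-git-repo-check", "--color", "never",
            "-o", output_file, prompt]


def gemini_argv(prompt: str) -> list[str]:
    """Build the Gemini command line from the documented call pattern."""
    return ["gemini", "-p", prompt]


def run_gemini(prompt: str, timeout: float = 60.0) -> str:
    """Call Gemini and return raw stdout (cleaning happens elsewhere)."""
    result = subprocess.run(gemini_argv(prompt), capture_output=True,
                            text=True, timeout=timeout, check=True)
    return result.stdout


def run_codex(prompt: str, timeout: float = 60.0) -> str:
    """Call Codex and read the reply from the output file.

    Assumption: the file given to -o holds the reply text after the run.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "reply.txt"
        subprocess.run(codex_argv(prompt, str(out)), capture_output=True,
                       text=True, timeout=timeout, check=True)
        return out.read_text(encoding="utf-8")
```

Passing the prompt as a list element rather than through a shell string avoids quoting problems when the host's message contains quotes or other shell metacharacters. `subprocess.run` raises `subprocess.TimeoutExpired` when the timeout elapses, which the caller can turn into an error card.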

### TTS Path

Generated WAV files live under:
```
/opt/models/arena-voices/generated/session-<id>/
```
This path is already used by Arena. Use the same root to avoid duplication.

### Known Environment

- Host: `svc-ai`
- Python: 3.12
- Kokoro models: `/opt/models/kokoro/cache`
- Arena venv (reference only): `/home/svc-admin/ai-projects/projects/arena/.venv`

## Output Expectations

After each phase, report:
- what was changed
- what was not changed
- what remains blocked or needs escalation
- any change requests needed
60
docs/07-REVIEW-CHECKLIST.md
Normal file
@@ -0,0 +1,60 @@
# Review Checklist

## Review Goal

Compare the implementation against the governing docs. Confirm the build matches the charter, architecture, and acceptance criteria before Jason signs off.

## Check These First

- Does the implementation match the charter? (One human, one AI, both voiced, two browser views)
- Does it respect architecture decisions? (Reused Kokoro backend, separate host/broadcast URLs, CLI backends first)
- Does it remain inside scope? (No microphone, no STT, no Claude API yet)
- Are acceptance criteria satisfied? (See `04-ACCEPTANCE-CRITERIA.md` for full list)
- Are risks handled or explicitly accepted? (See `05-RISK-REGISTER.md`)

## Technical Review

- Correctness:
  - Does `/host` accept host text input and submit it to the server?
  - Does the server call the AI backend and return a clean response?
  - Does TTS synthesis run for both host and AI turns?
  - Are WAV files served correctly and played in sequence?
  - Do both `/host` and `/broadcast` receive SSE events?

- Behavioral regressions:
  - Does a timeout on the AI backend produce a visible error, not a hang?
  - Can the host start a new session after ending a previous one without restarting the server?
  - Does the broadcast view contain zero interactive elements?

- Missing tests:
  - Is there a unit test for `clean.py` noise stripping?
  - Is there a unit test for `ai.py` CLI argument construction?
  - Is there a unit test for `tts.py` voice listing?

- Hidden complexity:
  - Is session state thread-safe?
  - Is the WAV serving path sandboxed to the session directory?

- Security concerns:
  - Path traversal: can a crafted URL escape the generated audio directory?

- Operational concerns:
  - Is `scripts/install.sh` present and functional?
  - Does `stoned-web` start cleanly and bind to `0.0.0.0:8766` (or similar, not conflicting with Arena's 8765)?

## Documentation Review

- `README.md` updated with install and usage instructions: yes/no
- `docs/09-PROJECT-STATUS.md` updated to reflect completed phases: yes/no
- `docs/06-WORKER-HANDOFF.md` updated with current build state: yes/no

## Final Decision

- Accept
- Reject
- Request revision

## Notes

- Findings:
- Required follow-up:
33
docs/08-CHANGE-REQUEST.md
Normal file
@@ -0,0 +1,33 @@
# Change Request

## Summary

- Proposed change: (none pending)

## Reason

- Why is the current governing plan insufficient or wrong?

## Requested Document Changes

- Document:
- Proposed revision:

## Impact

- Scope impact:
- Architecture impact:
- Risk impact:
- Testing impact:
- Timeline impact:

## Recommendation

- Approve
- Reject
- Defer

## Approval

- User decision:
- Supervisor recommendation:
38
docs/09-PROJECT-STATUS.md
Normal file
@@ -0,0 +1,38 @@
# Project Status

## Current Status

- Phase: Pre-implementation
- State: Governance scaffold complete. Ready for implementation handoff.
- Last updated: 2026-04-12

## Completed

- Project directory created
- Governance documentation written (docs 00–09)
- Git initialized
- README and .gitignore created

## In Progress

- Nothing. Awaiting implementation model handoff.

## Blocked

- Nothing currently blocked.

## Next Actions

1. Hand off to implementation model with `06-WORKER-HANDOFF.md` as the entry point.
2. Implementation model completes Phase 1 (scaffold + TTS + AI + clean).
3. Supervisor reviews Phase 1 output.
4. Implementation model proceeds to Phase 2 (web server + SSE).
5. Continue through Phase 3 (host view) and Phase 4 (broadcast view).
6. Jason tests end-to-end with OBS.
7. Create Gitea repo (`AccursedBinkie/stoned-ai`) and push baseline.

## Notes

- Port assignment: use `8766` to avoid conflict with Arena's default `8765`.
- The Arena project at `/home/svc-admin/ai-projects/projects/arena` is the reference implementation for TTS, SSE, and cleaning patterns.
- Gitea remote has not been created yet. Do not push until the Gitea repo exists.