ActVoice

Audio drama studio for humans and AI agents.

ActVoice turns a text project manifest into a rendered audio drama: characters, voices, scenes, dialogue, ambience, sound cues, and a final MP3 artifact.

Screen-reader friendly workflow

No visual timeline required. The core workflow is text-first: create a project, add characters, add scenes, add dialogue lines, add semantic sound cues, then render.

Everything important is available through the REST API and MCP tools, so a blind creator can work through a screen reader, a terminal, or an AI agent.

Quick start for agents

  1. Register an agent. Call POST /api/agents/register and receive an ActVoice API key.
  2. Connect with MCP. Local clients can run python -m app.mcp_server. Future remote clients will connect to https://actvoice.xyz/mcp.
  3. Create a project. Use the MCP tool create_audio_drama_project or the REST endpoint POST /api/projects.
  4. Build the script. Add characters, scenes, dialogue lines, and semantic sound cues such as footsteps, brook, birds, or laptop_close.
  5. Place sounds with timing anchors. Agents can use an absolute start_ms or relative anchors such as after_line plus line_id and offset_ms. ActVoice measures rendered lines and writes a timing map; no AI runs inside the core service.
  6. Render. Call render_final_mix or POST /api/projects/{project_id}/render. REST rendering is queued and returns a job id; poll GET /api/jobs/{job_id}.
  7. Download artifacts. When the job is done, fetch metadata or files from /api/projects/{project_id}/artifact, /artifact.mp3, /artifact.wav, or /render-manifest.json.
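The REST side of the steps above can be sketched as a short sequence of request descriptions. The endpoint paths for registration, project creation, rendering, and job polling are the ones listed above; the cue endpoint path, the project and job ids, and the JSON field names ("title", "sound", "anchor", "line_id", "offset_ms") are illustrative assumptions, not the documented schema.

```python
"""Sketch of the agent quick-start flow as plain request descriptions.

Field names and the /cues path are assumptions for illustration;
consult the actual API schema before relying on them.
"""
import json

BASE = "https://actvoice.xyz"  # or a local development server

def build_request(method, path, api_key=None, body=None):
    """Describe one API call: method, full URL, headers, encoded body."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    data = json.dumps(body) if body is not None else None
    return {"method": method, "url": BASE + path, "headers": headers, "body": data}

key = "key-from-register"  # returned by POST /api/agents/register

# Step 3: create a project (title field assumed).
create = build_request("POST", "/api/projects", key, {"title": "Night Train"})

# Step 5: place a sound cue with a relative timing anchor (path and fields assumed).
cue = build_request("POST", "/api/projects/p1/cues", key, {
    "sound": "footsteps",    # semantic cue name
    "anchor": "after_line",  # relative anchor instead of absolute start_ms
    "line_id": "line-42",
    "offset_ms": 250,
})

# Step 6: queue a render, then poll the returned job id.
render = build_request("POST", "/api/projects/p1/render", key)
poll = build_request("GET", "/api/jobs/job-1", key)
```

Each dict is what an HTTP client would send; an agent would feed them to its own transport and read the job id and artifact URLs out of the JSON responses.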

Authentication

Write and render actions require a bearer key:

Authorization: Bearer [REDACTED]

For local stdio MCP, the same key can be supplied via the ACTVOICE_API_KEY environment variable. For remote HTTP MCP, the same key is sent as header-based transport authentication, i.e. a bearer token in the Authorization header of the MCP connection.
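A minimal sketch of resolving the key in both modes: the Authorization header format and the ACTVOICE_API_KEY variable come from this section; the helper function itself is illustrative, not part of ActVoice.

```python
"""Resolve an ActVoice API key from an explicit value or the environment."""
import os

def auth_headers(explicit_key=None):
    """Prefer an explicitly passed key; fall back to ACTVOICE_API_KEY."""
    key = explicit_key or os.environ.get("ACTVOICE_API_KEY")
    if not key:
        raise RuntimeError("no ActVoice API key configured")
    return {"Authorization": f"Bearer {key}"}

# Local stdio MCP style: key comes from the environment.
os.environ["ACTVOICE_API_KEY"] = "example-key"
env_headers = auth_headers()

# Remote HTTP style: key passed explicitly and sent as a header.
http_headers = auth_headers("other-key")
```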

Voice and rendering modes

Service endpoints