Crabyard.ai Spec
Status: draft. Deployed subset: Cloudflare Worker, D1/Kysely persistence, GitHub OAuth, admin allowlists, card/run state, repo workflow evaluation, issue/PR previews, diffs, and Ghostty WASM log grid.
Crabyard.ai is a Cloudflare-native control plane for running Codex sessions in cloud workspaces. It gives OpenClaw maintainers a Linear-like board where each card represents an intent, a live run, and its durable history.
Decisions
- Product name: Crabyard.ai.
- Primary object: card.
- UI direction: Linear-like, minimal, dense, subtle crustacean branding.
- Access: OpenClaw GitHub org plus admin-managed allowlists for users/teams and repos.
- Runtime default: Cloudflare Containers for jobs that fit; Crabbox for manual Codex, VNC, testing, heavier jobs, and performance.
- Interactive mode: full Codex CLI in browser through Ghostty WebAssembly.
- Autonomous mode: Codex app-server where structured turns/events are better than terminal scraping.
- Logs: 30-day retention by default.
- Secrets: per OpenClaw org, never stored in D1/R2/log bodies.
- Merge: Crabyard may merge directly; ClawSweeper remains available and preferred for review/fix/automerge loops.
- Runtime cap: configurable, default 20 concurrent Codex runs per org.
- GitHub comments: Crabyard does not post progress comments by default.
- VNC: available only for Crabbox-backed leases.
- Repo-owned workflow file: optional
CRABYARD.md.
Goals
- Start a Codex from a prompt, GitHub issue, GitHub PR, or issue/PR picker.
- Watch active Codex sessions live.
- Attach to a full Codex CLI through a browser terminal.
- Take over an active session when needed.
- Create PRs, fix PRs, merge PRs, or hand off to ClawSweeper depending on policy.
- Persist logs, terminal replay, session events, diffs, and run metadata.
- Keep access restricted to allowlisted OpenClaw org members and allowlisted repos.
- Run fully on Cloudflare for the product/control plane.
Current Implementation Status
Implemented now:
- Empty-by-default D1-backed board with Todo, Running, Human Review, Done.
- GitHub OAuth, bootstrap login, org membership check, team/user allowlists, repo allowlists.
- Prompt cards and GitHub
#numberpreviews across enabled repos. - Optional titles derived from prompt.
- D1
run_attemptswith heartbeat, stall reconciliation, runtime selection reason, lease fields, operator intent, and runtime capabilities. - Runtime adapter descriptor for Container and Crabbox policy.
- Ghostty WASM fullscreen session grid with D1 event replay and text fallback.
CRABYARD.mdfetch/parse/evaluate admin surface.- Card diff metadata and compact patch preview.
- Worker docs route at
/docs/plus GitHub Pages docs.
Not wired yet:
- Real Cloudflare Container or Crabbox lease creation.
- Live PTY/app-server transport and interactive stdin.
- R2 terminal/artifact archival.
- Durable Object WebSocket fanout.
- Direct merge execution or ClawSweeper handoff.
Non-Goals
- Rebuild Crabbox machine leasing.
- Rebuild ClawSweeper's review/automerge safety loop.
- Post progress comments to GitHub by default.
- Replace GitHub Issues or PRs as source systems.
- Build a generic CI provider.
Product Model
Card is the primary object.
Card source types:
- Freeform prompt
- GitHub issue
- GitHub PR
- Picker result from issue/PR hot search
Each card owns:
- Prompt or source reference
- Repo target
- Lane/state
- Merge policy
- Runtime preference
- Run attempts
- Logs and artifacts
- Current operator/takeover state
Cards may produce:
- Branch
- PR
- Review result
- Merge
- Human handoff
- Failure/stall record
Card policy fields:
- Runtime mode:
auto,container,crabbox - Interaction mode:
cli,app_server,auto - Merge policy:
open_pr,merge_when_green,fix_until_green_and_merge - Automerge enabled: boolean
- Max runtime override
- Retry limit override
- Required validation override
Card source rules:
- Freeform prompt cards require repo selection before running.
- Issue cards inherit repo and issue metadata from GitHub.
- PR cards inherit repo, PR number, head SHA, base branch, and mergeability metadata.
- Picker-created cards should store the original GitHub URL and normalized IDs.
- Batch cards are deferred; MVP cards target one repo and one source item.
Board
Design: Linear-like, minimal, dense, subtle crustacean branding.
Default visible lanes:
- Backlog
- Todo
- Running
- Human Review
Default hidden lanes:
- Rework
- Merging
- Done
- Canceled
- Duplicate
Lane means workflow state. Automerge is not a lane.
Automerge is a card policy:
open_prmerge_when_greenfix_until_green_and_merge
The UI shows merge policy as a chip on the card. Subtle claw/check icon when armed.
Card rendering:
- Title
- Source badge: prompt, issue, PR
- Repo badge
- Status chip
- Merge policy chip when non-default
- Runtime chip when not
auto - Active timer while running
- Last event summary
- Attach button when live
- PR link when available
Primary views:
- Board
- Card detail drawer
- Run detail drawer
- Terminal view
- Logs/replay view
- Admin view
Top bar:
- Hot search for issues/PRs/cards
- New prompt
- Active Codex count
- Queue count
- Filter
- Display options
Branding:
- Small shell/claw mark.
- Accent colors only for active status/merge intent.
- No cartoon shellfish UI.
- Product language remains professional.
Runtime Modes
Default runtime:
- Cloudflare Containers when the job fits.
- Crabbox for testing, manual Codex, VNC, heavier jobs, or higher performance.
Runtime mode choices:
container: Cloudflare Containercrabbox: Crabbox leaseauto: scheduler chooses
Manual Codex preference:
- Full Codex CLI in a browser terminal.
- Ghostty WebAssembly terminal receives PTY bytes.
- App-server PTY APIs or a Crabbox-side PTY bridge provide stdin/output/resize.
Autonomous Codex preference:
- Codex app-server
thread/startandturn/start. - Structured app-server events feed the run log.
turn/steerandturn/interruptsupport live steering/takeover.
Runtime selection:
- If card requires VNC, choose Crabbox.
- If card is manual CLI takeover first, prefer Crabbox unless container PTY is explicitly allowed.
- If repo/job fits configured container limits, choose Cloudflare Container.
- If validation requires unsupported system dependencies, choose Crabbox.
- If org cap is reached, leave card queued.
Cloudflare Container expectations:
- Image includes Codex CLI, Git, GitHub CLI, Node/pnpm, common build tools.
- Container starts app-server or PTY bridge on a local authenticated port.
- Worker/RunDO talks to the container only through controlled bindings/routes.
- Runtime filesystem may be disposable; durable history is in R2/D1.
Crabbox expectations:
- Crabyard requests a lease from Crabbox.
- Crabbox owns provider credentials, SSH, VNC/noVNC, lease lifecycle, and cleanup.
- Crabyard records the Crabbox lease ID and uses Crabbox attach/VNC affordances.
- Crabyard does not duplicate Crabbox provider logic.
Attach Semantics
Attach modes:
- Watch: user sees live terminal/events, no input.
- Take over: user can type into the Codex CLI PTY session or steer app-server turn.
When a user takes over:
- Record operator, time, and mode.
- Continue log persistence.
- Autonomous loop may pause or continue depending on card policy.
- Default: pause autonomous auto-advance while operator has control.
VNC
VNC is available only for Crabbox-backed leases.
VNC is post-MVP unless Crabbox already exposes it for a lease.
Expected flow:
- Card/run detail shows VNC button when lease supports desktop.
- Browser uses Crabbox noVNC/bridge path.
- Crabyard records VNC attach events but does not proxy pixels unless needed.
Auth
Login:
- GitHub OAuth.
- Bootstrap token login for first admin setup and break-glass recovery.
- GitHub OAuth sessions are short-lived verified sessions; OAuth tokens are not stored in D1.
- Bootstrap sessions are short-lived and bound to the current bootstrap token hash.
Access gate:
- User must be in OpenClaw GitHub org.
- User must be allowlisted directly or through an allowlisted GitHub team.
Admin UI:
- Manage allowed GitHub users/teams.
- Manage allowed repos.
- Manage org runtime caps.
- Manage merge permissions.
- Writes persist through the Worker API to D1; browser storage is not authoritative.
Repo gate:
- Only allowlisted repos appear in search/picker.
- Only allowlisted repos can start cards/runs.
- GitHub webhooks ignored unless repo is allowlisted.
Admin roles:
- Owner: manage org settings, users/teams, repos, caps, secrets, merge policy.
- Maintainer: create cards, start/stop/take over runs, approve direct merge if allowed.
- Viewer: watch board, open logs, attach read-only.
Admin UI sections:
- Users and teams
- Repos
- Runtime caps
- Merge policy
- Secrets status
- Audit log
Secrets
Secrets are per org.
Secret types:
- Codex auth / agent identity
- GitHub app/private key/token material
- Crabbox broker credentials
- Provider/runtime credentials when needed
Storage:
- Cloudflare Secrets for Worker-level secrets.
- Org secret references in D1.
- No secret values in D1, logs, R2, or card metadata.
Policy:
- Runs receive only scoped secrets required for selected repo/runtime.
- Secret usage is logged as redacted metadata.
Secret delivery:
- Worker reads secret by binding/reference.
- Dispatcher passes a scoped session token or ephemeral lease credential to runtime.
- Runtime never receives broad Cloudflare credentials.
- Logs redact common token shapes and known secret labels before R2 writes.
Merge Authority
Crabyard can merge directly when policy allows.
Merge paths:
- Direct Crabyard merge
- ClawSweeper handoff
- Manual human merge
Direct merge guardrails:
- Repo allowlisted.
- User/org policy allows direct merge.
- Exact head SHA known.
- Required checks green.
- Branch up to date or merge queue accepted.
- No active takeover.
- No unresolved required review state.
- Merge action logged.
Direct merge sequence:
- Refresh PR state from GitHub.
- Verify repo/user/policy allow direct merge.
- Verify exact head SHA matches reviewed/validated SHA.
- Verify required checks and required review state.
- Verify branch update/merge queue requirements.
- Perform merge through GitHub App credentials.
- Record audit event with actor, PR, SHA, checks snapshot, and merge commit.
- Move card to
Done.
ClawSweeper handoff:
- Preferred for
fix_until_green_and_merge. - Preferred when PR needs review/fix loop.
- Crabyard sends intent; ClawSweeper owns deterministic GitHub mutation sequence.
Crabyard direct merge should stay conservative. If any guardrail is unknown, block direct merge and offer ClawSweeper handoff or human review.
Logs And Retention
Retention: 30 days by default.
Persist:
- Run events
- App-server JSON-RPC events
- Terminal stdout/stderr/PTY stream
- Operator attach/takeover events
- GitHub actions
- Lease lifecycle events
- Final summaries
- Artifact references
Storage:
- D1: searchable event index and metadata.
- R2: full logs, terminal replay, transcripts, diffs, screenshots, artifacts.
After 30 days:
- Delete R2 log bodies/artifacts by lifecycle job.
- Keep minimal D1 run summary unless admin policy deletes all history.
Log streams:
events.ndjson: normalized lifecycle/app events.terminal.raw: PTY byte stream for replay.terminal.ndjson: timestamped decoded chunks.app-server.ndjson: JSON-RPC messages and notifications with secrets redacted.summary.json: final status, timings, artifacts, PR links.
Replay requirements:
- Completed sessions remain readable until retention expiry.
- Terminal replay preserves timing enough to scrub through a run.
- Search index stores compact event text, not full raw logs.
Resilience
Default org cap:
- 20 concurrent Codex runs.
Caps are configurable:
- Max concurrent runs
- Max runtime per run
- Max retries
- Max terminal idle time
- Max log bytes per run
- Runtime class allowlist
Stall handling:
- Detect no output/event heartbeat.
- Mark run
stalled. - Attempt graceful interrupt/terminate.
- Retry according to card policy.
- Preserve workspace when useful.
- Show stall reason and last activity in UI.
Run lifecycle:
- Queued
- Leasing
- Bootstrapping
- Running
- Waiting
- TakingOver
- Stalled
- Retrying
- Succeeded
- Failed
- Canceled
- Merging
- Done
Retry policy:
- Clean exit with active card state may enqueue continuation.
- Transient runtime failure retries with exponential backoff.
- Validation failure does not retry blindly unless policy says repair.
- Stalls retry only after workspace/log preservation.
- Manual cancel does not retry.
Heartbeat sources:
- App-server notification
- PTY output
- Runtime heartbeat
- Crabbox lease heartbeat
- Container health probe
Workflow File
Crabyard may support repo-owned workflow config.
Recommended name: CRABYARD.md.
Purpose:
- Version the repo's Codex operating policy with the code.
- Provide prompt template.
- Define validation commands.
- Define branch/PR conventions.
- Define merge/review policy hints.
- Define runtime preference and caps.
It is optional for MVP. Dashboard/org settings provide defaults.
Example:
---
runtime:
default: auto
allow:
- container
- crabbox
validation:
required:
- pnpm check
- pnpm test
merge:
default_policy: open_pr
allow_direct_merge: false
codex:
mode: cli
model: gpt-5.5
reasoning_effort: high
---
You are working in this repository for Crabyard.
Read AGENTS.md first.
Keep changes scoped.
Open a PR unless the card explicitly allows direct merge.
Resolution order:
- Card explicit settings
- Repo
CRABYARD.md - Org defaults
- Product defaults
Invalid CRABYARD.md:
- Do not block dashboard access.
- Show validation error in repo settings.
- Ignore invalid file for new runs and use org defaults.
Tech Stack
Language/tooling:
- TypeScript strict
- pnpm
- oxlint
- oxfmt
- Valibot
- Vitest
- Playwright
Frontend/app:
- TanStack Start
- React
- TanStack Router
- TanStack Query
- Tailwind
- Radix primitives where useful
Cloudflare:
- Workers
- Durable Objects
- D1
- R2
- Queues
- Workflows
- Containers
- Service bindings
Runtime integrations:
- Codex app-server JSON-RPC
- Codex CLI in PTY
- Ghostty WebAssembly terminal
- Crabbox
- ClawSweeper
- GitHub App/OAuth
Package Layout
apps/
web/ TanStack Start app
control/ Worker API, Durable Objects, Queues
packages/
schema/ Valibot schemas and inferred types
ui/ Crabyard UI components
github/ GitHub adapters, issue/PR search
crabbox/ Crabbox client
codex-app-server/ JSON-RPC client and generated protocol types
workflow/ CRABYARD.md parser/evaluator
Rule:
- TanStack Start handles app shell and UI-adjacent server functions.
- Control Worker owns orchestration.
- Durable Objects own live session state.
- D1 owns canonical metadata.
- R2 owns bulky history.
Data Model
Core D1 tables:
orgs: GitHub org, display name, caps, defaults.users: GitHub user ID/login, display name.org_users: org membership, role, allowlist source.teams: GitHub team ID/slug allowlist entries.repos: GitHub repo ID/name, allowlist status, defaults.cards: source, title, prompt, repo, lane, policy, position.runs: card ID, attempt, runtime, status, timestamps, lease/container refs.run_events: compact searchable event index.artifacts: R2 keys and retention metadata.github_refs: issue/PR normalized metadata and cache timestamps.audit_events: admin, merge, secret-use, takeover, and policy changes.
Durable Object state:
- BoardDO keeps hot board ordering and connected viewers.
- RunDO keeps live run status, attached clients, stream cursors, heartbeat, and current process/thread refs.
R2 layout:
orgs/{org}/runs/{run_id}/events.ndjson
orgs/{org}/runs/{run_id}/terminal.raw
orgs/{org}/runs/{run_id}/terminal.ndjson
orgs/{org}/runs/{run_id}/app-server.ndjson
orgs/{org}/runs/{run_id}/summary.json
orgs/{org}/runs/{run_id}/artifacts/{name}
API Surface
Representative endpoints:
GET /api/boards/:repoPOST /api/cardsPATCH /api/cards/:idPOST /api/cards/:id/startPOST /api/cards/:id/cancelPOST /api/cards/:id/mergePOST /api/cards/:id/clawsweeperGET /api/runs/:idGET /api/runs/:id/eventsGET /api/runs/:id/logsGET /api/search/github?q=...GET /api/admin/usersPUT /api/admin/users/:loginGET /api/admin/reposPUT /api/admin/repos/:owner/:repo
WebSockets:
/ws/board/:board_id/ws/run/:run_id/ws/terminal/:run_id
Service bindings:
- Web app calls Control Worker.
- Control Worker calls BoardDO/RunDO.
- Dispatcher calls runtime adapters.
- MergeController calls GitHub and ClawSweeper bridge.
Run Flow
Freeform prompt:
- User creates card with prompt and repo.
- Card enters
Todo. - Scheduler claims card when capacity exists.
- Dispatcher chooses runtime.
- Runtime checks out repo.
- Codex starts in CLI PTY or app-server mode.
- RunDO streams events/logs to browser and R2.
- Codex creates branch/PR or returns result.
- Card moves to
Human Review,Merging,Done, orRework.
Issue:
- User picks issue.
- Prompt is built from issue title/body/comments plus repo policy.
- Codex works from new branch.
- Result is PR or blocker.
PR:
- User picks PR.
- Runtime checks out PR head.
- Codex reviews/fixes/rebases according to card policy.
- Crabyard direct merge or ClawSweeper handoff handles merge.
Manual takeover:
- User attaches to live run.
- Watch mode streams only.
- Takeover mode grants input.
- Run records takeover.
- Auto-advance pauses by default.
GitHub Integration
Use GitHub OAuth for login and GitHub App for repo operations.
GitHub App permissions, draft:
- Metadata: read
- Contents: read/write
- Issues: read
- Pull requests: read/write
- Checks/statuses: read
- Actions: read
- Commit statuses: read
Optional/conditional:
- Administration: avoid unless required.
- Members: use OAuth/org APIs for membership where possible.
Picker behavior:
- Search allowlisted repos only.
- Support
repo#123,owner/repo#123, issue URL, PR URL, title text. - Multi-select can create multiple cards; batch card deferred.
Deployment
Target domain:
https://crabyard.openclaw.ai
Current DNS expectation:
- Domain must be in Cloudflare or CNAME/A routed through Cloudflare before custom domain deploy is complete.
- If domain remains at another DNS provider, deploy can still publish a workers.dev URL but not final production domain.
Cloudflare account:
- Use
services@openclaw.org. - Deployment should use a scoped Cloudflare API token, not a personal browser session.
Worker names:
crabyard-webcrabyard-control
Initial deployable artifact:
- Static/spec site can ship first.
- Product app replaces it when TanStack Start scaffold lands.
Required deploy checks:
npx wrangler whoamishows expected Cloudflare account.crabyard.openclaw.aiis routed to deployed Worker.curl -I https://crabyard.openclaw.ai/healthzreturns 200./docs/specrenders this spec.
Security
- All state-changing API calls require authenticated user and org allowlist.
- Repo operations require repo allowlist.
- Direct merge requires maintainer role and repo policy.
- Runtime tokens are scoped and short-lived.
- WebSocket attach checks authorization at connect and on takeover.
- Takeover creates audit event.
- R2 object keys are unguessable and never public by default.
- Admin changes are audit logged.
- Secrets are never echoed to browser clients.
Observability
Dashboard metrics:
- Active Codex count
- Queue count
- Runs by state
- Average runtime
- Stall count
- Runtime selection split
- Merge success/failure
- R2 retention backlog
Per-run timeline:
- Card created
- Runtime selected
- Lease/container started
- Repo checkout complete
- Codex started
- First output/event
- PR opened/updated
- Validation started/finished
- Takeover/attach events
- Merge/handoff
- Completion/failure
Cloudflare Components
BoardDO:
- Per org/repo board.
- Owns card ordering and live board fanout.
- Tracks active counts and lane changes.
RunDO:
- Per active run.
- Owns live WebSocket fanout.
- Proxies app-server or PTY stream.
- Tracks heartbeat, elapsed timer, stall state.
- Writes compact event index to D1 and full stream to R2.
Dispatcher:
- Queue consumer.
- Enforces caps.
- Chooses runtime.
- Allocates Cloudflare Container or Crabbox lease.
- Starts Codex.
GitHubSync:
- Webhook receiver.
- Issue/PR search cache.
- Status/check refresh.
MergeController:
- Direct merge guardrails.
- ClawSweeper handoff.
- Merge audit log.
RetentionJob:
- Deletes expired R2 logs/artifacts.
- Compacts D1 summaries.
MVP
- Auth and admin allowlists.
- Board with cards and lanes.
- Freeform prompt card.
- GitHub issue/PR picker.
- Fake runner live logs.
- Cloudflare Container runner.
- Codex CLI PTY in Ghostty WASM.
- R2 log persistence and replay.
- Crabbox runner option.
- Direct PR creation.
- Direct merge with guardrails.
- ClawSweeper handoff for review/automerge.
Open Decisions
- Exact GitHub App permission set.
- Direct merge default per repo.
- Whether
CRABYARD.mdships in MVP or phase 2. - Whether app-server PTY is enough for Ghostty or a small sidecar PTY bridge is needed.
- Cloudflare Container image shape and Codex auth injection path.
- R2 lifecycle implementation details for 30-day retention.