Prepare

agentv prepare materializes one eval case without launching the target provider. Use it when a human, a separate agent process, or another harness should attempt the task in the same workspace state AgentV would have provided immediately before target execution.

This is the manual-attempt workflow:

agentv prepare evals/foo.eval.yaml --test-id case-1 --target codex --out /tmp/agentv-case-1

The prepared directory contains:

/tmp/agentv-case-1/
  workspace/              # materialized template/repos/hooks state
  prompt.md               # safe task prompt for the human or external agent
  agentv_prepare.json     # snake_case manifest for audit and later grading

prepare runs setup only: workspace before_all, target before_all, workspace before_each, and target before_each. It does not launch the agent, run graders, or mark an eval complete, and prompt.md does not expose hidden expected outputs or grader internals.
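Before handing the directory to a human or external agent, it can help to confirm that the three entries listed above exist. The following is a minimal sketch, not part of AgentV; the helper name missing_entries is hypothetical, and it only checks the layout shown in this page.

```python
from pathlib import Path

# Entries this page says a prepared directory contains.
EXPECTED_ENTRIES = {
    "workspace": "dir",
    "prompt.md": "file",
    "agentv_prepare.json": "file",
}


def missing_entries(prepared_dir):
    """Return the names of expected entries that are absent (hypothetical helper)."""
    root = Path(prepared_dir)
    missing = []
    for name, kind in EXPECTED_ENTRIES.items():
        path = root / name
        present = path.is_dir() if kind == "dir" else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

An empty list means the layout matches; any returned names suggest that prepare did not finish or that the directory was modified afterwards.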

Grade the Attempt

After the human or external agent finishes editing files in workspace/, grade the final state without running the target:

agentv grade evals/foo.eval.yaml \
  --test-id case-1 \
  --prepared /tmp/agentv-case-1 \
  --output .agentv/results/runs/manual-case-1

grade reads agentv_prepare.json, verifies that it matches the eval and test ID, captures workspace changes relative to the prepared baseline when one is available, and runs the eval’s graders against the final workspace. The target provider is not invoked.
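The manifest check can be reproduced outside AgentV when debugging a mismatch. This is a rough sketch under assumptions: it compares only eval_path and test_id, and it resolves both paths before comparing because the manifest stores an absolute path. AgentV's own verification may compare more or different fields; the function names are hypothetical.

```python
import json
from pathlib import Path


def load_manifest(prepared_dir):
    """Read agentv_prepare.json from a prepared directory."""
    manifest_path = Path(prepared_dir) / "agentv_prepare.json"
    with open(manifest_path, encoding="utf-8") as handle:
        return json.load(handle)


def manifest_mismatches(manifest, eval_path, test_id):
    """Return the manifest fields that disagree with the requested eval/test.

    Assumption: matching means equal resolved eval paths and equal test IDs.
    """
    problems = []
    if Path(manifest["eval_path"]).resolve() != Path(eval_path).resolve():
        problems.append("eval_path")
    if manifest["test_id"] != test_id:
        problems.append("test_id")
    return problems
```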

If the external agent produced a final answer outside the workspace, pass it as a text file with --response:

agentv grade evals/foo.eval.yaml \
  --test-id case-1 \
  --prepared /tmp/agentv-case-1 \
  --response /tmp/agentv-case-1/final-response.md

Add Trace or Session Evidence

Trace-aware graders can use a local trace or session artifact from the manual attempt, passed with --trace:

agentv grade evals/foo.eval.yaml \
  --test-id case-1 \
  --prepared /tmp/agentv-case-1 \
  --trace /tmp/agentv-case-1/session.jsonl
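
When an external harness drives grading, the invocations above can be assembled as an argument list. The sketch below uses only the flags shown on this page; the function name build_grade_command is hypothetical, and combining --output, --response, and --trace in one call is an assumption rather than documented behavior.

```python
def build_grade_command(eval_path, test_id, prepared_dir,
                        output=None, response=None, trace=None):
    """Build an argv list for `agentv grade` from the flags shown on this page."""
    command = [
        "agentv", "grade", eval_path,
        "--test-id", test_id,
        "--prepared", prepared_dir,
    ]
    # Optional flags are appended only when a value was supplied.
    optional = (("--output", output), ("--response", response), ("--trace", trace))
    for flag, value in optional:
        if value is not None:
            command += [flag, value]
    return command
```

The resulting list can be passed to subprocess.run(command, check=True), which avoids shell quoting problems with paths that contain spaces.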

Supported --trace inputs:

Format	Typical source
`agentv.trace.v1` JSON or JSONL	`outputs/trace.json` from an AgentV run or replay/export workflow
AgentV transcript JSONL	`agentv import claude`, `agentv import codex`, or `agentv import copilot` output

Single-record trace files are accepted directly. In multi-record files, the record is selected by matching test_id and target. The selected trace is projected into AgentV’s normal trace and messages grader context, so tool-trajectory, execution-metrics, and code graders receive the same shape they see during eval runs.
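
The selection rule can be illustrated as follows. This is a sketch, not AgentV's implementation: it assumes each record is a JSON object with top-level test_id and target keys, which the source does not confirm, and the function names are hypothetical.

```python
import json


def parse_records(text):
    """Parse a JSON document or JSONL text into a list of records."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document, so treat it as one JSON value per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return parsed if isinstance(parsed, list) else [parsed]


def select_record(records, test_id, target):
    """Pick the record for one test; a lone record is accepted as-is."""
    if len(records) == 1:
        return records[0]
    # Assumption: records carry top-level "test_id" and "target" keys.
    matches = [
        record for record in records
        if record.get("test_id") == test_id and record.get("target") == target
    ]
    if len(matches) != 1:
        raise ValueError(f"expected one matching record, found {len(matches)}")
    return matches[0]
```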

Use --response when the final answer text should be graded independently of the trace. If --response is omitted and the trace contains an assistant message with content, AgentV uses the last assistant message as the candidate answer.
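
The fallback described above amounts to a short precedence rule. The sketch below assumes messages are dictionaries with role and content keys and reads "the last assistant message" as the last one with non-empty content; both are assumptions for illustration, and candidate_answer is a hypothetical name.

```python
def candidate_answer(messages, response_text=None):
    """Choose the answer text to grade.

    An explicit response wins; otherwise fall back to the last assistant
    message that has content, or None when there is no such message.
    """
    if response_text is not None:
        return response_text
    for message in reversed(messages):
        if message.get("role") == "assistant" and message.get("content"):
            return message["content"]
    return None
```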

Observability Boundary

prepare is not a replacement for live observability. Configure live tracing in the harness or target itself:

- Use provider-native settings, target hooks, or environment variables to enable session logs.
- Use AgentV’s OTLP options during normal eval runs, such as --otel-file or --export-otel, when AgentV is the runner.
- For Opik, Phoenix, Langfuse, or another backend, treat their traces as external artifacts that can be imported or projected back into AgentV later.

AgentV remains responsible for eval definitions, workspace setup, grading, result bundles, and CI gates. Live trace storage, dashboards, and provider-specific run monitoring belong in the observability backend or the external harness.

There is no agentv watch command.

Manifest

agentv_prepare.json uses snake_case keys because it is a disk artifact:

{
  "schema_version": 1,
  "eval_path": "/repo/evals/foo.eval.yaml",
  "test_id": "case-1",
  "target": "codex",
  "workspace_path": "/tmp/agentv-case-1/workspace",
  "prompt_path": "/tmp/agentv-case-1/prompt.md",
  "setup_status": "ok",
  "setup_steps": [],
  "repo_pins": [],
  "baseline": { "status": "initialized", "commit": "..." },
  "created_at": "2026-06-18T00:00:00.000Z"
}
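
A reviewer receiving a prepared directory may want a quick sanity check of the manifest before grading. The sketch below treats every key in the example above as required and treats "ok" as the only successful setup_status; both are assumptions drawn from the sample, not a published schema, and check_manifest is a hypothetical name.

```python
# Keys taken from the example manifest; assumed (not confirmed) to be required.
REQUIRED_KEYS = (
    "schema_version", "eval_path", "test_id", "target",
    "workspace_path", "prompt_path", "setup_status",
    "setup_steps", "repo_pins", "baseline", "created_at",
)


def check_manifest(manifest):
    """Return a list of human-readable problems; empty means nothing was flagged."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in manifest]
    status = manifest.get("setup_status")
    if status is not None and status != "ok":
        problems.append(f"setup_status is {status!r}")
    return problems
```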

Keep the prepared directory with the generated run directory when sharing review evidence. The index.jsonl row written by grade includes metadata.prepared_attempt with the manifest path, workspace path, prompt path, baseline status, and optional trace path.