Prepare
`agentv prepare` materializes one eval case without launching the target provider. Use it when a human, a separate agent process, or another harness should attempt the task in the same workspace state that AgentV would have provided immediately before target execution.
This is the manual-attempt workflow:
```sh
agentv prepare evals/foo.eval.yaml --test-id case-1 --target codex --out /tmp/agentv-case-1
```

The prepared directory contains:
```text
/tmp/agentv-case-1/
  workspace/            # materialized template/repos/hooks state
  prompt.md             # safe task prompt for the human or external agent
  agentv_prepare.json   # snake_case manifest for audit and later grading
```

`prepare` runs setup only: workspace `before_all`, target `before_all`, workspace `before_each`, and target `before_each`. It does not launch the agent, run graders, mark an eval complete, or expose hidden expected outputs or grader internals in `prompt.md`.
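To confirm that a prepared directory has the layout described above before handing it to a person or another agent, a small check like the following can help. This is a minimal Python sketch, not part of AgentV; the `missing_prepared_entries` helper is hypothetical and only tests for the three entries listed in the tree.

```python
from pathlib import Path

# Entries listed for a prepared directory; this helper is hypothetical, not an AgentV API.
EXPECTED_ENTRIES = {
    "workspace": "dir",             # materialized template/repos/hooks state
    "prompt.md": "file",            # safe task prompt
    "agentv_prepare.json": "file",  # snake_case manifest
}


def missing_prepared_entries(prepared_dir: str) -> list[str]:
    """Return the expected entries that are absent or have the wrong type."""
    root = Path(prepared_dir)
    missing = []
    for name, kind in EXPECTED_ENTRIES.items():
        entry = root / name
        present = entry.is_dir() if kind == "dir" else entry.is_file()
        if not present:
            missing.append(name)
    return missing
```

For example, `missing_prepared_entries("/tmp/agentv-case-1")` should return an empty list right after a successful `agentv prepare` run.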
Grade the Attempt
After the human or external agent finishes editing files in `workspace/`, grade the final state without rerunning the target:
```sh
agentv grade evals/foo.eval.yaml \
  --test-id case-1 \
  --prepared /tmp/agentv-case-1 \
  --output .agentv/results/runs/manual-case-1
```

`grade` reads `agentv_prepare.json`, verifies that it matches the eval and test, captures workspace changes relative to the prepared baseline when one is available, and runs the eval’s graders against the final workspace. The target provider is not invoked.
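The manifest check that `grade` performs can be pictured roughly as follows. This is a hypothetical Python sketch, not AgentV’s implementation: it assumes a match means the same resolved `eval_path` and the same `test_id`, and that `schema_version` 1 is the only version to accept. AgentV’s real rules may be stricter or different.

```python
import json
from pathlib import Path


def manifest_matches(prepared_dir: str, eval_path: str, test_id: str) -> bool:
    """Hypothetical check: does agentv_prepare.json describe this eval file and test?"""
    manifest = json.loads((Path(prepared_dir) / "agentv_prepare.json").read_text())
    # Assumption: only schema_version 1 is understood by this sketch.
    if manifest.get("schema_version") != 1:
        return False
    # Compare resolved paths so relative and absolute spellings of the same file agree.
    same_eval = Path(manifest["eval_path"]).resolve() == Path(eval_path).resolve()
    return same_eval and manifest.get("test_id") == test_id
```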
If the external agent produced a final answer outside the workspace, pass it as a text file:
```sh
agentv grade evals/foo.eval.yaml \
  --test-id case-1 \
  --prepared /tmp/agentv-case-1 \
  --response /tmp/agentv-case-1/final-response.md
```

Add Trace or Session Evidence
Trace-aware graders can use a local trace/session artifact from the manual attempt:
```sh
agentv grade evals/foo.eval.yaml \
  --test-id case-1 \
  --prepared /tmp/agentv-case-1 \
  --trace /tmp/agentv-case-1/session.jsonl
```

Supported `--trace` inputs:
| Format | Typical source |
|---|---|
| `agentv.trace.v1` JSON or JSONL | `outputs/trace.json` from an AgentV run or a replay/export workflow |
| AgentV transcript JSONL | Output of `agentv import claude`, `agentv import codex`, or `agentv import copilot` |
Single-record trace files are accepted directly. Multi-record files are matched by `test_id` and `target`. The selected trace is projected into AgentV’s normal `trace` and `messages` grader context, so tool-trajectory, execution-metrics, and code graders receive the same shape they see during eval runs.
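One way to picture the matching rule is the Python sketch below. It assumes each record is a JSON object with top-level `test_id` and `target` keys, that a whole-file JSON document (object or array) is tried before falling back to JSONL, and that zero or multiple matches is an error. The `load_trace_records` and `select_trace_record` names are hypothetical, and AgentV’s actual parsing and projection into grader context are not shown.

```python
import json


def load_trace_records(text: str) -> list[dict]:
    """Parse a trace file as one JSON document, else as JSONL (assumed layout)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: treat as JSONL, one record per non-blank line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def select_trace_record(text: str, test_id: str, target: str) -> dict:
    """Pick the record for one test/target pair; single-record files pass through."""
    records = load_trace_records(text)
    if len(records) == 1:
        return records[0]
    matches = [r for r in records if r.get("test_id") == test_id and r.get("target") == target]
    if len(matches) != 1:
        raise ValueError(f"expected one record for {test_id}/{target}, found {len(matches)}")
    return matches[0]
```

Usage would look like `select_trace_record(Path("/tmp/agentv-case-1/session.jsonl").read_text(), "case-1", "codex")`.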
Use `--response` when the final answer text should be graded independently of the trace. If `--response` is omitted and the trace contains an assistant message with content, AgentV uses the last assistant message as the candidate answer.
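The fallback order for the candidate answer can be sketched as below. This is an illustrative Python sketch that assumes messages are dicts with `role` and `content` string fields; it also assumes assistant messages with empty content are skipped when looking for the last one, which the description above does not spell out. `candidate_answer` is a hypothetical name.

```python
from typing import Optional


def candidate_answer(messages: list[dict], response_text: Optional[str] = None) -> Optional[str]:
    """Prefer --response text; otherwise fall back to the last assistant message with content."""
    if response_text is not None:
        return response_text
    # Walk backwards so the most recent assistant message wins.
    for message in reversed(messages):
        content = message.get("content")
        if message.get("role") == "assistant" and content:
            return content
    return None  # nothing usable to grade as a final answer
```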
Observability Boundary
`prepare` is not a replacement for live observability. Configure live tracing in the harness or target itself:
- Use provider-native settings, target hooks, or environment variables to enable session logs.
- Use AgentV’s OTLP options during normal eval runs, such as `--otel-file` or `--export-otel`, when AgentV is the runner.
- For Opik, Phoenix, Langfuse, or another backend, treat their traces as external artifacts that can be imported or projected back into AgentV later.
AgentV remains responsible for eval definitions, workspace setup, grading, result bundles, and CI gates. Live trace storage, dashboards, and provider-specific run monitoring belong in the observability backend or the external harness.
There is no `agentv watch` command.
Manifest
`agentv_prepare.json` uses snake_case keys because it is a disk artifact:
```json
{
  "schema_version": 1,
  "eval_path": "/repo/evals/foo.eval.yaml",
  "test_id": "case-1",
  "target": "codex",
  "workspace_path": "/tmp/agentv-case-1/workspace",
  "prompt_path": "/tmp/agentv-case-1/prompt.md",
  "setup_status": "ok",
  "setup_steps": [],
  "repo_pins": [],
  "baseline": { "status": "initialized", "commit": "..." },
  "created_at": "2026-06-18T00:00:00.000Z"
}
```

Keep the prepared directory alongside the generated run directory when sharing review evidence. The `index.jsonl` row written by `grade` includes `metadata.prepared_attempt` with the manifest path, workspace path, prompt path, baseline status, and optional trace path.
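When collecting review evidence from several runs, the `metadata.prepared_attempt` entries can be pulled out of `index.jsonl` with a few lines of Python. This sketch makes no assumption about the key names inside `prepared_attempt`, since they are not listed here; the `prepared_attempts` helper is hypothetical, and where `index.jsonl` sits inside the run directory should be checked against your actual output.

```python
import json


def prepared_attempts(index_jsonl: str) -> list[dict]:
    """Collect metadata.prepared_attempt from each index.jsonl row that has one."""
    attempts = []
    for line in index_jsonl.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        # Rows from non-prepared runs may lack metadata or the prepared_attempt key.
        attempt = (row.get("metadata") or {}).get("prepared_attempt")
        if attempt is not None:
            attempts.append(attempt)
    return attempts
```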