Manual Verification Workflow
Depth:
Some behavior cannot be covered by automated tests: TUI flows, tmux-driven agents, multi-screen navigation, on-disk artifact inspection, live LLM calls. /aitask-qa handles everything that is script-testable; manual verification covers the rest as first-class tasks with their own checklist, lock, and archival gate.
A task with issue_type: manual_verification is a human-checklist runner. When /aitask-pick picks one, Step 3 Check 3 routes it into a dedicated Pass/Fail/Skip/Defer loop instead of the normal plan-and-implement flow. Failing items auto-generate pre-populated bug tasks; deferred items block archival or spawn carry-over tasks for later.
The Checklist Format
A manual-verification task body contains one H2 section:
## Verification Checklist
- [ ] Open the brainstorm TUI and confirm the left pane renders
- [ ] Ctrl+N in the monitor creates a new task
- [ ] Agent spawn lands in a fresh tmux window
Each item is stateful. The parser rewrites item lines in place as the picker marks them:
| State | Line format |
|---|---|
| pending | - [ ] text |
| pass | - [x] text — PASS YYYY-MM-DD HH:MM |
| fail | - [fail] text — FAIL YYYY-MM-DD HH:MM follow-up t<new_id> |
| skip | - [skip] text — SKIP YYYY-MM-DD HH:MM <reason> |
| defer | - [defer] text — DEFER YYYY-MM-DD HH:MM |
Annotations live after — (em dash + spaces); the parser strips and rewrites the suffix on each state change, so repeated marks do not accumulate trailing text. Inspect .aitask-scripts/aitask_verification_parse.sh --help for the full subcommand surface (parse, set, summary, terminal_only, seed).
Where Checklists Come From — Two Generation Paths
Manual-verification tasks are not written by hand. Two places in the /aitask-pick workflow seed them automatically, both shelling out to the same ./.aitask-scripts/aitask_create_manual_verification.sh helper.
Aggregate-Sibling (Parent → Children Planning)
When /aitask-pick on a parent task splits it into two or more children during planning, after the child plans are committed the skill offers to add a sibling manual-verification task that verifies some or all of the children.
Three choices:
- No, not needed — skip; the normal flow continues to the child-task checkpoint.
- Yes, aggregate sibling covering all children — recommended for TUI/UX-heavy work.
- Yes, but let me choose which children it verifies — multi-select a subset of the children.
The seeder extracts each selected child’s plan ## Verification bullets, prefixes each with [t<parent>_<child>] for at-a-glance origin, and creates the sibling with verifies: [<selected_child_ids>]. Skipped when only one child was created — a post-implementation follow-up covers that case instead.
Post-Implementation Follow-up (Step 8c)
After the “Commit changes” branch of /aitask-pick’s review step, Step 8c offers a standalone manual-verification follow-up. The new task is picked after the current one archives and verifies behavior introduced by this task.
The prompt is skipped when:
- The current task is a child (aggregate-sibling covers children).
- The current task’s
issue_typeis itselfmanual_verification. - An aggregate sibling was already created during the same session.
Candidate checklist bullets are assembled from four de-duplicated sources:
- The task’s
## Verification StepsH2 section. - The plan’s
## VerificationH2 section. - The plan’s Final Implementation Notes — bullets under
**Deviations from plan:**and**Issues encountered:**. - A diff scan of the task’s commits (files matching interactive surfaces — TUI code,
textualimports,fzf/gum-driven scripts — emitTODO: verify <file> end-to-end in tmuxbullets).
If all four sources are empty, a single TODO: define verification for t<id> stub is written — the picker fills it in interactively when they later pick the follow-up.
Set manual_verification_followup_mode: never in an active profile to skip Step 8c entirely. See Execution Profiles.
Running a Manual-Verification Task
When /aitask-pick picks a task whose issue_type is manual_verification, Step 3 Check 3 dispatches to the Manual Verification Procedure — replacing Steps 6 (plan), 7 (implement), and 8 (review). Steps 4 (ownership lock) and 5 (worktree) still run first: manual verification is owned work that should be locked against concurrent pickers.
For each pending or deferred item, the procedure renders the item text (Item 3: ...) and prompts for an outcome:
| Choice | Effect |
|---|---|
| Pass | Marks the item pass with timestamp. |
| Fail | Runs aitask_verification_followup.sh — creates a pre-populated bug task with commit hashes, touched files, verbatim failing text, and depends: [<origin>]. The item is annotated follow-up t<new_id>. |
| Skip (with reason) | Prompts for a free-text reason; marks the item skip with the reason appended. |
| Defer | Marks the item defer — the task will not archive cleanly while any item is in this state. |
| Other (free text) | Treated as a normal chat message — answer a question about the item, perform an investigation, apply a correction, etc. The item index does not advance until a terminal outcome is recorded. Express “pause” / “stop” / “abort” intent in the free-text field to end the loop without mutating state; the task stays Implementing with the lock held so only the original picker can resume. |
The “Other” path intentionally has no fixed keyword list — intent is judged from the user’s phrasing, not string-matched. This is how conversational corrections (“check this one more time with the minimonitor open”) stay inside the loop without interrupting the verification flow.
Fail → Follow-up Bug Task
On Fail, .aitask-scripts/aitask_verification_followup.sh creates a pre-populated bug task and links it back to the origin. The helper does five things:
- Resolves the origin task.
- User-supplied
--originwins. - Otherwise, the single entry in the current task’s
verifies:frontmatter list is used. - Empty
verifies:falls back to the current task itself as origin. - Multiple entries in
verifies:(aggregate-sibling case) emitORIGIN_AMBIGUOUS:<csv>(exit 2). The picker prompts the user with oneAskUserQuestionoption per candidate and re-invokes with--origin <chosen>.
- User-supplied
- Collects commits for the origin:
git log --oneline --all --grep="(t<origin>)". - Collects touched files from those commits:
git show --name-only, deduped. - Creates a
bugtask viaaitask_create.sh --batchwithdepends: [<origin>],labels: verification,bug, and a description containing the verbatim failing item text, commit list, touched-file list, and a Source block (MV task path, origin task ID, origin archived plan path). - Best-effort back-reference. Appends a bullet under the origin’s archived plan
## Final Implementation Notessection:
Skipped silently if the archived plan cannot be found or- **Manual-verification failure:** item "<text>" failed; follow-up task t<new_id>../ait gitis unavailable.
On success, stdout ends with FOLLOWUP_CREATED:<new_id>:<path> and the picker announces the new task id.
The verifies: Field
verifies: is an optional list of task IDs that a manual-verification task validates — populated automatically by the aggregate-sibling seeder (one entry per selected child). It drives origin disambiguation in the Fail flow: a single entry auto-resolves, multiple entries trigger ORIGIN_AMBIGUOUS.
Edit via ait update:
./.aitask-scripts/aitask_update.sh --batch <id> --verifies 571_4,571_5,571_6 # replace list
./.aitask-scripts/aitask_update.sh --batch <id> --add-verifies 571_7 # append one
./.aitask-scripts/aitask_update.sh --batch <id> --remove-verifies 571_4 # drop one
See the field reference in Task Format and the full CLI surface in Task Management.
Defer and Carry-over
Deferred items block clean archival. aitask_archive.sh <id> errors out on a manual-verification task while any item is still pending or defer. Two resolutions are offered at the post-loop checkpoint when DEFER > 0:
- Archive with carry-over. Sets an internal flag so Step 9 calls:The script archives the current task and creates a fresh
./.aitask-scripts/aitask_archive.sh --with-deferred-carryover <id>manual_verificationtask seeded with only the deferred items. The originalverifies:list is copied forward so Fail attribution still works on the carry-over. - Stop without archiving. Leaves the task
Implementingwith the lock held so only the original picker can resume it. Re-pick later with/aitask-pick <id>to continue the remaining items — no new plan, no re-verification; the checklist state is durable on disk.
If all items are terminal (pass, fail, or skip), the procedure proceeds to standard Step 9 archival without the carry-over prompt.
Example End-to-End
A parent task t571 splits into three children during planning: t571_4, t571_5, t571_6. At the aggregate prompt, answer “Yes, aggregate sibling covering all children”. The seeder creates t571_7_manual_verification_structured_brainstorming.md with verifies: [571_4, 571_5, 571_6] and a pre-seeded checklist assembled from the three child plans’ ## Verification bullets.
The children implement and archive normally. Eventually:
/aitask-pick 571_7
Step 3 Check 3 routes to the Manual Verification Procedure. Three items, rendered one at a time:
- Item 1: “Brainstorm left pane renders on Ctrl+B” → Pass.
- Item 2: “Ctrl+N spawns a new task from the brainstorm view” → Fail. The helper emits
ORIGIN_AMBIGUOUS:t571_4,t571_5,t571_6; pick571_5as the attribution.FOLLOWUP_CREATED:612:aitasks/t612_fix_failed_verification_t571_7_item2.md. The item line is now[fail] ... follow-up t612. - Item 3: “Agent spawn lands in a fresh tmux window” → Defer. Waiting on the tmux wrapper landing in a sibling task.
Post-loop: DEFER=1 → “Archive with carry-over” → t571_7 archives as Done. A new t613_manual_verification_structured_brainstorming_deferred.md is created with only item 3 pending. Pick it later with /aitask-pick 613 once the tmux wrapper ships.
Tips
- Seed aggregate siblings from child plans. The aggregate-sibling prompt pre-fills bullets from each child plan’s
## VerificationH2 — make sure child plans write concrete, testable verification bullets during planning. Stub items become friction at verification time. - Use “Other” for conversational adjustments. Ask clarifying questions, request follow-up investigations, or instruct one-off commands from inside the verification loop without leaving the picker. The item index does not advance until a terminal outcome is recorded.
- Set
manual_verification_followup_mode: neverfor batched work. When you are archiving many small commits, the per-task Step 8c prompt can be noisy. Profile the prompt off until you hit a larger feature that warrants one. - Child tasks don’t get Step 8c prompts. The aggregate-sibling path during parent planning is the correct place to cover children. If a single child needs post-implementation verification, add it during the parent’s planning phase with the “let me choose” option.
- Deferred items keep the task implementing. Re-pick the same task later with
/aitask-pick <id>(same owner, same PC) to resume rather than forcing an archive.