graph: per-thread goal + task board (goals/todos) as tinyagent features#4
Conversation
Introduce the per-thread goal primitive under graph::goals: a single durable objective per thread with a small lifecycle (Active/Paused/BudgetLimited/ Complete), an optional token budget, and the pure GoalProgress accounting shape plus active_goal_context_block prompt renderer. Ported from OpenHuman's thread_goals, minus app-specific coupling. Claude-Session: https://claude.ai/code/session_01NRL2bim9Gz3UN3SRtpeXEX
Add graph::goals::store: namespaced (graph.goals) Store-backed CRUD keyed by hex(thread_id), serialised per thread by an async mutex map since the Store trait offers no CAS. Preserves OpenHuman's goal_id compare-and-set stale-write guard (account_usage / set_continuation_suppressed_if) and the budget cap that flips an active goal to BudgetLimited. Uses next_seq() for goal ids and a local SystemTime clock (no uuid/chrono deps). Documents the single-process RMW caveat. Claude-Session: https://claude.ai/code/session_01NRL2bim9Gz3UN3SRtpeXEX
Add graph::goals::tool: GoalTool dispatching on GoalToolKind, mirroring OrchestrationTool. The default model-facing set is goal_get/goal_set/ goal_complete (asymmetric ownership); pause/resume/clear are constructible host controls. thread_id is resolved from ToolExecutionContext (never a tool arg), so a model cannot address another thread's goal; the bare call() entry point errors without an active thread. Store is captured as Arc<dyn Store> at construction. Claude-Session: https://claude.ai/code/session_01NRL2bim9Gz3UN3SRtpeXEX
Re-express OpenHuman's heartbeat auto-continuation on the graph runtime: - goal_gate_node: a command-routing node forming a self-driving bounded loop. After each work iteration it folds the iteration's GoalProgress via account_usage and routes back to the work node while the goal is Active and under budget, else to END. recursion_limit is the hard backstop; a zero- progress iteration sets the one-shot suppression and stops. - run_continuation_tick: a faithful port of the heartbeat for callers with an external scheduler, generic over a run-turn closure (idle filter, oldest-first, max_per_tick, one-shot suppression). - note_user_turn: clears one-shot suppression and reactivates a paused goal on a user-initiated run. Token accounting is explicit at the work-node boundary since the graph runtime does not meter tokens per node. Tests build real graphs on InMemoryStore. Claude-Session: https://claude.ai/code/session_01NRL2bim9Gz3UN3SRtpeXEX
Re-export the goal surface at the crate root (goal_store CRUD alias, GoalTool, goal_gate_node, run_continuation_tick, note_user_turn, ThreadGoal, ...) and note the primitive in the graph-surface doc. Add src/graph/goals/README.md covering the data model, Store key scheme, the single-process RMW/CAS caveat, the tool surface, and the continuation mapping. Claude-Session: https://claude.ai/code/session_01NRL2bim9Gz3UN3SRtpeXEX
Introduce the per-thread kanban board under graph::todos: TaskCardStatus, TaskApprovalMode, TaskBoardCard, TaskBoard, plus the pure helpers parse_status, render_markdown (GitHub-flavored markers [ ]/[x]/[~]/[!]/[?]/[-] + indented metadata), normalise_board (id generation, order recompute, empty-title drop, blocker-from-notes), CardPatch, and TodosSnapshot. Ported from OpenHuman's task board; uses next_seq() ids and a dependency-free millis timestamp. Claude-Session: https://claude.ai/code/session_01NRL2bim9Gz3UN3SRtpeXEX
…rogress Add graph::todos::store: namespaced (graph.todos) Store-backed board CRUD keyed by hex(thread_id), serialised per thread by an async mutex. Ports add/edit/ update_status/decide_plan/revise_plan/remove/replace/clear/list/claim_card/ set_session_thread, each returning a TodosSnapshot (normalised cards + markdown). Enforces the single-in-progress invariant (errors, never auto-fixes), guards decide_plan to AwaitingApproval cards, and does an atomic CAS claim_card. Claude-Session: https://claude.ai/code/session_01NRL2bim9Gz3UN3SRtpeXEX
Add graph::todos::tool: a single `todo` harness Tool dispatching on the `op` field over add/edit/update_status/decide_plan/revise_plan/remove/replace/clear/ list, mirroring OpenHuman's todo tool. The board is bound to ToolExecutionContext.thread_id (never a tool arg); the bare call() entry point errors without a thread. Domain errors (unknown id, invariant violation) are surfaced to the model as tool errors rather than failing the run. Captures Arc<dyn Store> at construction. Claude-Session: https://claude.ai/code/session_01NRL2bim9Gz3UN3SRtpeXEX
Re-export the task-board surface at the crate root (todo_store CRUD alias, TodoTool, todo_tools, register_todo_tools, TaskBoard, TaskBoardCard, ...) and note the primitive in the graph-surface docs. Add src/graph/todos/README.md covering the data model, Store key scheme, invariants (single-in-progress, decide_plan guard, CAS claim_card), and the tool surface. Claude-Session: https://claude.ai/code/session_01NRL2bim9Gz3UN3SRtpeXEX
Add integration tests exercising the public crate surface end-to-end:
tests/e2e_graph_goals.rs drives a self-driving goal loop (goal_gate_node +
goal_store on a shared InMemoryStore) until the work node completes the goal;
tests/e2e_graph_todos.rs drives the `todo` tool through a MockModel agent loop
and asserts the board persists under the run's thread id. Add
docs/modules/graph/{goals,todos}.md and link them from the graph module index.
Claude-Session: https://claude.ai/code/session_01NRL2bim9Gz3UN3SRtpeXEX
📝 WalkthroughWalkthroughThis PR adds two new per-thread graph primitives: ChangesGraph Goals feature
Graph Todos (kanban) feature
Estimated code review effort: 4 (Complex) | ~75 minutes Sequence Diagram(s)sequenceDiagram
participant WorkNode
participant GoalGateNode
participant GoalStore
participant Scheduler
WorkNode->>GoalGateNode: iteration result (GoalProgress)
GoalGateNode->>GoalStore: account_usage(goal_id, tokens, secs)
GoalStore-->>GoalGateNode: updated ThreadGoal
alt no progress
GoalGateNode->>GoalStore: set_continuation_suppressed_if(true)
GoalGateNode-->>WorkNode: route to END
else progress made
GoalGateNode-->>WorkNode: loop back to work_node
end
Scheduler->>GoalStore: list_all()
GoalStore-->>Scheduler: eligible idle goals
Scheduler->>WorkNode: run_turn(goal)
WorkNode-->>Scheduler: TurnOutcome
Scheduler->>GoalStore: account_usage(goal_id, outcome)
sequenceDiagram
participant Model
participant TodoTool
participant TodoStore
Model->>TodoTool: call_with_context(op="add", args)
TodoTool->>TodoStore: add(thread_id, content, patch)
TodoStore-->>TodoTool: TodosSnapshot
TodoTool-->>Model: success payload (cards, markdown)
Model->>TodoTool: call_with_context(op="list")
TodoTool->>TodoStore: list(thread_id)
TodoStore-->>TodoTool: TodosSnapshot
TodoTool-->>Model: cards + markdown
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a891732356
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ToolResult { | ||
| call_id, | ||
| name: TODO_TOOL_NAME.to_string(), | ||
| content: String::new(), |
There was a problem hiding this comment.
Put todo tool errors in the message content
When the model makes any invalid todo call (unknown op, missing fields, bad status, or a board invariant violation), this helper stores the message only in ToolResult.error while leaving content empty. The agent loop forwards tool results back to the model using only result.content (Message::tool(result.call_id, result.content)), so these soft errors become blank tool replies and the model has no signal to repair the call. Populate content with the same error text, as ToolResult::error does.
Useful? React with 👍 / 👎.
| if token_delta == 0 && secs_delta == 0 { | ||
| return Ok(Some(goal)); |
There was a problem hiding this comment.
Refresh goal activity for zero-metered turns
If a continuation turn reports made_progress: true but has no token/time metering available (tokens_used == 0 and elapsed_secs == 0), this early return skips updating updated_at_ms. run_continuation_tick uses updated_at_ms to decide whether a goal has been idle, so the same goal remains immediately eligible on every scheduler tick despite just having run, which can create a busy continuation loop in providers or tests that do not report usage. A successful turn should still touch the goal's activity timestamp or the driver should otherwise mark it non-idle.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (3)
src/graph/todos/types.rs (1)
212-226: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoff
parse_statusreturnsResult<T, String>instead of the crate error type.Per repo guidelines, public APIs should prefer
Result<T>using the crate error type exported fromsrc/error.rs, butparse_status(and per the downstreampatch_from_argssnippet, the same String-error pattern) returns a rawString. Since this is a re-exported public function (graph::todos::parse_status, and re-exported at the crate root), locking inStringerrors now makes a later switch to a typed error a breaking change.As per coding guidelines, "Prefer small, typed APIs with
Result<T>using the crate error type exported fromsrc/error.rs."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graph/todos/types.rs` around lines 212 - 226, `parse_status` currently exposes a raw `Result<TaskCardStatus, String>` instead of the crate’s shared error type, which locks the public API into string errors. Update `parse_status` in `TaskCardStatus` handling to return the error type exported from `src/error.rs`, and adjust any downstream callers such as `patch_from_args` to propagate that typed error consistently rather than constructing or returning `String` errors. Keep the status parsing logic the same, but convert the invalid-status branch to use the crate error type so the re-exported `graph::todos::parse_status` API stays typed.Source: Coding guidelines
src/graph/todos/test.rs (1)
356-519: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winMissing tool-layer test coverage for
remove/replace/clear/revise_planops.
store_testsexercises these at the store layer, buttool_testsnever drivesTodoTool::dispatchthrough these ops (e.g.,replace's JSON→Vec<TaskBoardCard>deserialization/error path, orclear/revise_plan's no-arg dispatch) — the exact argument-parsing/validation surface added intool.rs. As per coding guidelines, "Add focused tests when changing serialization, graph routing, tool invocation, or public model request/response shapes."🧪 Example additions
#[tokio::test] async fn replace_via_tool_rejects_invalid_cards_payload() { let tool = TodoTool::new(store()); let res = run(&tool, Some("t"), json!({ "op": "replace", "cards": "not-an-array" })).await; assert!(res.error.unwrap().contains("invalid `cards`")); } #[tokio::test] async fn clear_and_revise_plan_via_tool() { let tool = TodoTool::new(store()); run(&tool, Some("t"), json!({ "op": "add", "content": "A" })).await; let res = run(&tool, Some("t"), json!({ "op": "clear" })).await; assert!(res.raw.unwrap()["cards"].as_array().unwrap().is_empty()); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graph/todos/test.rs` around lines 356 - 519, Add missing tool-level coverage for the TodoTool dispatch paths handling remove, replace, clear, and revise_plan. Update the existing tool_tests module to drive TodoTool::dispatch through these ops, especially the replace JSON-to-Vec<TaskBoardCard> parsing/error path and the no-arg clear/revise_plan cases. Keep the tests focused on tool-layer behavior by using the existing run helper and asserting on the returned ToolResult error/raw shapes.Source: Coding guidelines
src/graph/todos/store.rs (1)
31-36: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winPer-thread lock map grows unbounded for the process lifetime.
Entries in
LOCKSare inserted on first access but never evicted, so every distinctthread_idever seen accumulates a permanentHashMapentry. In a long-running server handling many threads this is an unbounded memory leak. Since the map itself is already guarded by a synchronous mutex, pruning entries with no other holders (Arc::strong_count == 1) before/while inserting is cheap and safe.♻️ Proposed fix to prune stale entries
fn thread_lock(thread_id: &str) -> Arc<Mutex<()>> { static LOCKS: OnceLock<StdMutex<HashMap<String, Arc<Mutex<()>>>>> = OnceLock::new(); let map = LOCKS.get_or_init(|| StdMutex::new(HashMap::new())); let mut guard = map.lock().expect("todo lock map poisoned"); + // Drop entries no longer referenced elsewhere to bound map growth. + guard.retain(|_, lock| Arc::strong_count(lock) > 1); guard.entry(thread_id.to_string()).or_default().clone() }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graph/todos/store.rs` around lines 31 - 36, The per-thread lock cache in thread_lock() never evicts stale Arc<Mutex<()>> entries, so LOCKS can grow without bound over the process lifetime. Update the thread_lock() logic in the LOCKS/HashMap path to prune entries whose only remaining reference is the map itself (for example by checking Arc::strong_count == 1) before or during insert/lookup, then keep reusing the existing lock when present. Ensure the cleanup happens while holding the StdMutex guard so the map stays consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/graph/goals/tool.rs`:
- Around line 146-186: The GoalToolKind::Set, GoalToolKind::Complete,
GoalToolKind::Pause, and GoalToolKind::Resume branches in dispatch are still
using store::set/store::complete/store::pause/store::resume with ?, which turns
validation failures into hard errors. Update these paths in dispatch so store
validation failures are converted into an Ok tool-result message (like the
existing missing objective case) instead of propagating Err, while keeping
store::get and store::clear behavior unchanged.
In `@src/graph/todos/store.rs`:
- Around line 332-345: The replace path in store::replace currently only checks
enforce_single_in_progress and then persists caller-supplied cards, so duplicate
non-blank ids can slip through and break later lookups. Update replace to
validate the incoming Vec<TaskBoardCard> for duplicate ids before calling
save_cards, rejecting duplicates with an error; keep the check near thread_id
validation/enforce_single_in_progress so the behavior is enforced for the todo
tool’s cards input. Use the existing replace, save_cards, and
enforce_single_in_progress flow as the main entry points to locate the change.
In `@src/graph/todos/types.rs`:
- Around line 312-347: The stale blocker value is only being populated in
normalise_board when a card is Blocked, but it is never cleared when the card is
no longer Blocked. Update normalise_board so it explicitly removes or resets
card.blocker for any non-Blocked TaskCardStatus, while preserving the existing
backfill from notes for Blocked cards. Use the TaskBoardCard fields and the
status check in normalise_board to keep serialized TodosSnapshot responses from
carrying an old blocker reason after unblocking.
- Around line 121-143: The task card ID generation is process-local and can
collide with persisted cards after restart; update the ID creation path used by
TaskBoardCard::new and normalise_board so it is seeded from the persisted board
state or replaced with UUID-based IDs. Make sure any helper that mints new card
IDs continues to produce unique values across reloads, and keep the existing
call sites using the new ID source consistently.
---
Nitpick comments:
In `@src/graph/todos/store.rs`:
- Around line 31-36: The per-thread lock cache in thread_lock() never evicts
stale Arc<Mutex<()>> entries, so LOCKS can grow without bound over the process
lifetime. Update the thread_lock() logic in the LOCKS/HashMap path to prune
entries whose only remaining reference is the map itself (for example by
checking Arc::strong_count == 1) before or during insert/lookup, then keep
reusing the existing lock when present. Ensure the cleanup happens while holding
the StdMutex guard so the map stays consistent.
In `@src/graph/todos/test.rs`:
- Around line 356-519: Add missing tool-level coverage for the TodoTool dispatch
paths handling remove, replace, clear, and revise_plan. Update the existing
tool_tests module to drive TodoTool::dispatch through these ops, especially the
replace JSON-to-Vec<TaskBoardCard> parsing/error path and the no-arg
clear/revise_plan cases. Keep the tests focused on tool-layer behavior by using
the existing run helper and asserting on the returned ToolResult error/raw
shapes.
In `@src/graph/todos/types.rs`:
- Around line 212-226: `parse_status` currently exposes a raw
`Result<TaskCardStatus, String>` instead of the crate’s shared error type, which
locks the public API into string errors. Update `parse_status` in
`TaskCardStatus` handling to return the error type exported from `src/error.rs`,
and adjust any downstream callers such as `patch_from_args` to propagate that
typed error consistently rather than constructing or returning `String` errors.
Keep the status parsing logic the same, but convert the invalid-status branch to
use the crate error type so the re-exported `graph::todos::parse_status` API
stays typed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: db5df525-769d-44cb-b818-235d5de6b21a
📒 Files selected for processing (20)
docs/modules/graph/README.mddocs/modules/graph/goals.mddocs/modules/graph/todos.mdsrc/graph/goals/README.mdsrc/graph/goals/continuation.rssrc/graph/goals/mod.rssrc/graph/goals/store.rssrc/graph/goals/test.rssrc/graph/goals/tool.rssrc/graph/goals/types.rssrc/graph/mod.rssrc/graph/todos/README.mdsrc/graph/todos/mod.rssrc/graph/todos/store.rssrc/graph/todos/test.rssrc/graph/todos/tool.rssrc/graph/todos/types.rssrc/lib.rstests/e2e_graph_goals.rstests/e2e_graph_todos.rs
| async fn dispatch(&self, thread_id: &str, args: &Value) -> Result<(String, Option<Value>)> { | ||
| match self.kind { | ||
| GoalToolKind::Get => match store::get(&self.store, thread_id).await? { | ||
| Some(goal) => Ok((render_goal(&goal), Some(serde_json::to_value(&goal)?))), | ||
| None => Ok(("no goal set for this thread".to_string(), None)), | ||
| }, | ||
| GoalToolKind::Set => { | ||
| let Some(objective) = args.get("objective").and_then(Value::as_str) else { | ||
| return Ok(("error: missing 'objective' parameter".to_string(), None)); | ||
| }; | ||
| let token_budget = args.get("token_budget").and_then(Value::as_u64); | ||
| let goal = store::set(&self.store, thread_id, objective, token_budget).await?; | ||
| Ok(( | ||
| format!("Goal set.\n{}", render_goal(&goal)), | ||
| Some(serde_json::to_value(&goal)?), | ||
| )) | ||
| } | ||
| GoalToolKind::Complete => { | ||
| let goal = store::complete(&self.store, thread_id).await?; | ||
| Ok(( | ||
| format!("Goal marked complete.\n{}", render_goal(&goal)), | ||
| Some(serde_json::to_value(&goal)?), | ||
| )) | ||
| } | ||
| GoalToolKind::Pause => { | ||
| let goal = store::pause(&self.store, thread_id).await?; | ||
| Ok((render_goal(&goal), Some(serde_json::to_value(&goal)?))) | ||
| } | ||
| GoalToolKind::Resume => { | ||
| let goal = store::resume(&self.store, thread_id).await?; | ||
| Ok((render_goal(&goal), Some(serde_json::to_value(&goal)?))) | ||
| } | ||
| GoalToolKind::Clear => { | ||
| let removed = store::clear(&self.store, thread_id).await?; | ||
| Ok(( | ||
| format!("Goal cleared (removed={removed})."), | ||
| Some(json!({ "removed": removed })), | ||
| )) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Map relevant files and inspect the goal tool implementation and tests.
git ls-files 'src/graph/goals/*' 'src/graph/*tool*' 'src/*error*' | sort
printf '\n--- outline: src/graph/goals/tool.rs ---\n'
ast-grep outline src/graph/goals/tool.rs --view expanded
printf '\n--- outline: src/graph/goals/test.rs ---\n'
ast-grep outline src/graph/goals/test.rs --view expanded
printf '\n--- search ToolResult / call_with_context / error_result ---\n'
rg -n "call_with_context|error_result|ToolResult|Result::Err|error: Some|soft error|missing 'objective'|goal_complete|goal_pause|goal_resume" src/graph src -g '!**/target/**'Repository: tinyhumansai/tinyagents
Length of output: 18224
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n--- src/graph/goals/tool.rs relevant sections ---\n'
sed -n '146,310p' src/graph/goals/tool.rs
printf '\n--- src/harness/tool/mod.rs ---\n'
sed -n '1,220p' src/harness/tool/mod.rs
printf '\n--- src/harness/tool/types.rs relevant sections ---\n'
sed -n '280,340p' src/harness/tool/types.rs
printf '\n--- src/graph/goals/test.rs relevant tests ---\n'
sed -n '220,405p' src/graph/goals/test.rsRepository: tinyhumansai/tinyagents
Length of output: 19282
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n--- call_with_context call sites and error handling ---\n'
rg -n "call_with_context\\(|\\.call\\(|ToolResult::error\\(|error: Some\\(" src/harness src/graph -g '!**/target/**' | sed -n '1,220p'
printf '\n--- harness agent_loop handling around tool calls ---\n'
sed -n '1000,1105p' src/harness/agent_loop/mod.rs
printf '\n--- harness testkit / middleware handling around tool calls if relevant ---\n'
sed -n '260,340p' src/harness/testkit/mod.rsRepository: tinyhumansai/tinyagents
Length of output: 7481
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Focus on the exact tool-call execution path where Result::Err might be handled.
sed -n '1,220p' src/harness/tool/types.rs
printf '\n---\n'
sed -n '1060,1135p' src/harness/agent_loop/mod.rs
printf '\n---\n'
sed -n '470,520p' src/harness/subagent/mod.rs
printf '\n---\n'
sed -n '1,220p' src/harness/middleware/mod.rsRepository: tinyhumansai/tinyagents
Length of output: 18659
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the implementation of the goal store mutators to see whether the errors are input-validation errors or missing-state errors.
sed -n '1,260p' src/graph/goals/store.rs
printf '\n--- search for docs mentioning goal_complete/goal_pause/goal_resume behavior ---\n'
sed -n '1,180p' src/graph/goals/README.mdRepository: tinyhumansai/tinyagents
Length of output: 15902
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the concrete execution path from agent loop to tool results.
rg -n "call_with_context\\(|call\\(&.*ToolCall|invoke.*tool|ToolResult::text|ToolResult::error|match .*Result<.*ToolResult" src/harness/agent_loop src/harness/subagent src/harness/middleware src/harness/testkit -g '!**/target/**'Repository: tinyhumansai/tinyagents
Length of output: 2337
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check whether goal-tool failures are already covered by tests or expected by docs.
rg -n "goal_complete|goal_pause|goal_resume|error: missing 'objective'|mutators_error_without_a_goal|empty_objective_and_blank_thread_id_rejected" src/graph/goals -g '!**/target/**'Repository: tinyhumansai/tinyagents
Length of output: 1406
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the exact code path that executes tools and handles a returned Result::Err.
python3 - <<'PY'
from pathlib import Path
text = Path('src/harness/agent_loop/mod.rs').read_text()
for needle in ['call_with_context(', 'call(', 'ToolResult', 'Err(', 'error']:
idx = text.find(needle)
print(f'NEEDLE {needle}:', idx)
PY
printf '\n--- agent loop tool execution area ---\n'
sed -n '1030,1125p' src/harness/agent_loop/mod.rsRepository: tinyhumansai/tinyagents
Length of output: 1269
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the tool execution branch in the agent loop and subagent path for Err handling.
sed -n '1040,1125p' src/harness/agent_loop/mod.rs
printf '\n---\n'
sed -n '520,590p' src/harness/subagent/mod.rs
printf '\n---\n'
sed -n '280,360p' src/harness/middleware/mod.rsRepository: tinyhumansai/tinyagents
Length of output: 5614
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the exact tool-call execution block in the agent loop.
awk 'NR>=1040 && NR<=1125 {print NR ":" $0}' src/harness/agent_loop/mod.rsRepository: tinyhumansai/tinyagents
Length of output: 1021
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the agent-loop branch that executes tools and handles tool-call errors/results.
rg -n "run_after_tool|run_on_tool_delta|ToolResult::error|is_error\\(|match .*ToolResult|match .*call_with_context|tool_result|tool call" src/harness/agent_loop/mod.rs src/harness/middleware/mod.rs src/harness/agent_loop/test.rs -g '!**/target/**'
printf '\n--- agent_loop tool execution around the result handling ---\n'
sed -n '850,1035p' src/harness/agent_loop/mod.rs
printf '\n--- any direct handling of Err from tool invocation in tests ---\n'
sed -n '1,260p' src/harness/agent_loop/test.rsRepository: tinyhumansai/tinyagents
Length of output: 17634
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the exact agent-loop branch that consumes tool-call results.
sed -n '600,670p' src/harness/agent_loop/mod.rs
printf '\n--- nearby tool-result tests ---\n'
sed -n '900,960p' src/harness/agent_loop/test.rsRepository: tinyhumansai/tinyagents
Length of output: 5321
Surface goal store validation failures as tool errors
goal_set, goal_complete, goal_pause, and goal_resume still propagate store validation failures via ?, which turns a recoverable tool mistake into a hard Err that aborts the run. Convert these into a tool error result so the model can react and retry.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/graph/goals/tool.rs` around lines 146 - 186, The GoalToolKind::Set,
GoalToolKind::Complete, GoalToolKind::Pause, and GoalToolKind::Resume branches
in dispatch are still using
store::set/store::complete/store::pause/store::resume with ?, which turns
validation failures into hard errors. Update these paths in dispatch so store
validation failures are converted into an Ok tool-result message (like the
existing missing objective case) instead of propagating Err, while keeping
store::get and store::clear behavior unchanged.
| /// Wholesale-replace the board's cards. Cards missing ids get server-generated | ||
| /// ones on normalise. | ||
| pub async fn replace( | ||
| store: &Arc<dyn Store>, | ||
| thread_id: &str, | ||
| cards: Vec<TaskBoardCard>, | ||
| ) -> Result<TodosSnapshot> { | ||
| let thread_id = validate_thread_id(thread_id)?; | ||
| let lock = thread_lock(&thread_id); | ||
| let _guard = lock.lock().await; | ||
| enforce_single_in_progress(&cards)?; | ||
| let cards = save_cards(store, &thread_id, cards).await?; | ||
| Ok(snapshot(&thread_id, cards)) | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
replace doesn't validate caller-supplied card ids for duplicates.
replace persists the caller-supplied cards as-is (only enforce_single_in_progress is checked); normalise_board/save_cards fill in blank ids but never dedupe non-blank ones. Since this op is reachable directly from the model via the todo tool's cards argument, duplicate ids would make subsequent find(|c| c.id == id) lookups in edit/remove/claim_card/decide_plan silently resolve to the first match, leaving the intended card unmodified.
🛡️ Proposed fix to reject duplicate ids
pub async fn replace(
store: &Arc<dyn Store>,
thread_id: &str,
cards: Vec<TaskBoardCard>,
) -> Result<TodosSnapshot> {
let thread_id = validate_thread_id(thread_id)?;
let lock = thread_lock(&thread_id);
let _guard = lock.lock().await;
+ let mut seen = std::collections::HashSet::new();
+ for c in &cards {
+ if !c.id.trim().is_empty() && !seen.insert(c.id.clone()) {
+ return Err(TinyAgentsError::Validation(format!(
+ "duplicate card id '{}' in replace payload",
+ c.id
+ )));
+ }
+ }
enforce_single_in_progress(&cards)?;
let cards = save_cards(store, &thread_id, cards).await?;
Ok(snapshot(&thread_id, cards))
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Wholesale-replace the board's cards. Cards missing ids get server-generated | |
| /// ones on normalise. | |
| pub async fn replace( | |
| store: &Arc<dyn Store>, | |
| thread_id: &str, | |
| cards: Vec<TaskBoardCard>, | |
| ) -> Result<TodosSnapshot> { | |
| let thread_id = validate_thread_id(thread_id)?; | |
| let lock = thread_lock(&thread_id); | |
| let _guard = lock.lock().await; | |
| enforce_single_in_progress(&cards)?; | |
| let cards = save_cards(store, &thread_id, cards).await?; | |
| Ok(snapshot(&thread_id, cards)) | |
| } | |
| /// Wholesale-replace the board's cards. Cards missing ids get server-generated | |
| /// ones on normalise. | |
| pub async fn replace( | |
| store: &Arc<dyn Store>, | |
| thread_id: &str, | |
| cards: Vec<TaskBoardCard>, | |
| ) -> Result<TodosSnapshot> { | |
| let thread_id = validate_thread_id(thread_id)?; | |
| let lock = thread_lock(&thread_id); | |
| let _guard = lock.lock().await; | |
| let mut seen = std::collections::HashSet::new(); | |
| for c in &cards { | |
| if !c.id.trim().is_empty() && !seen.insert(c.id.clone()) { | |
| return Err(TinyAgentsError::Validation(format!( | |
| "duplicate card id '{}' in replace payload", | |
| c.id | |
| ))); | |
| } | |
| } | |
| enforce_single_in_progress(&cards)?; | |
| let cards = save_cards(store, &thread_id, cards).await?; | |
| Ok(snapshot(&thread_id, cards)) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/graph/todos/store.rs` around lines 332 - 345, The replace path in
store::replace currently only checks enforce_single_in_progress and then
persists caller-supplied cards, so duplicate non-blank ids can slip through and
break later lookups. Update replace to validate the incoming Vec<TaskBoardCard>
for duplicate ids before calling save_cards, rejecting duplicates with an error;
keep the check near thread_id validation/enforce_single_in_progress so the
behavior is enforced for the todo tool’s cards input. Use the existing replace,
save_cards, and enforce_single_in_progress flow as the main entry points to
locate the change.
| impl TaskBoardCard { | ||
| /// Creates a minimal `Todo` card with `title`, a fresh id, and no metadata. | ||
| pub fn new(title: impl Into<String>) -> Self { | ||
| Self { | ||
| id: new_card_id(), | ||
| title: title.into(), | ||
| status: TaskCardStatus::Todo, | ||
| objective: None, | ||
| plan: Vec::new(), | ||
| assigned_agent: None, | ||
| allowed_tools: Vec::new(), | ||
| approval_mode: None, | ||
| acceptance_criteria: Vec::new(), | ||
| evidence: Vec::new(), | ||
| notes: None, | ||
| blocker: None, | ||
| session_thread_id: None, | ||
| source_metadata: None, | ||
| order: 0, | ||
| updated_at: now_stamp(), | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect next_seq's implementation to confirm restart-safety.
fd -a ids.rs src/harness
cat -n $(fd -a ids.rs src/harness | head -1)Repository: tinyhumansai/tinyagents
Length of output: 161
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the id generator and the card-claim logic, then inspect the relevant slices.
files=$(git ls-files | rg '(^|/)ids\.rs$|(^|/)store\.rs$|(^|/)types\.rs$|README\.md$')
printf '%s\n' "$files"
echo '--- src/harness/ids.rs ---'
sed -n '1,220p' src/harness/ids.rs
echo '--- src/harness/store.rs (claim_card and nearby) ---'
rg -n -A8 -B8 'claim_card|next_seq|task-|card id|TaskBoard' src/harness/store.rs src/harness/ids.rs src/graph/todos/types.rs README.md
echo '--- README mentions of persistence ---'
rg -n -A3 -B3 'serialized `TaskBoard`|thread|persist|store' README.mdRepository: tinyhumansai/tinyagents
Length of output: 2154
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# If the direct search was inconclusive, print compact context from the files.
for f in src/harness/ids.rs src/harness/store.rs src/graph/todos/types.rs; do
if [ -f "$f" ]; then
echo "--- $f (line count: $(wc -l < "$f")) ---"
nl -ba "$f" | sed -n '1,260p'
fi
doneRepository: tinyhumansai/tinyagents
Length of output: 252
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Narrow structural search for restart-related id generation and persistence boundaries.
rg -n 'next_seq|new_card_id|claim_card|TaskBoard|serialized `TaskBoard`|one serialized `TaskBoard` per thread|process restart|uuid' src README.mdRepository: tinyhumansai/tinyagents
Length of output: 7550
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the actual files and inspect only the relevant ranges.
printf 'FILES:\n'
git ls-files | rg '(^|/)ids\.rs$|(^|/)store\.rs$|(^|/)types\.rs$|(^|/)README\.md$'
echo
echo '--- src/harness/ids.rs ---'
if [ -f src/harness/ids.rs ]; then
nl -ba src/harness/ids.rs | sed -n '1,220p'
fi
echo
echo '--- src/harness/store.rs ---'
if [ -f src/harness/store.rs ]; then
nl -ba src/harness/store.rs | sed -n '1,260p'
fi
echo
echo '--- src/graph/todos/types.rs ---'
if [ -f src/graph/todos/types.rs ]; then
nl -ba src/graph/todos/types.rs | sed -n '1,220p'
fi
echo
echo '--- README.md snippets ---'
if [ -f README.md ]; then
rg -n -A3 -B3 'serialized `TaskBoard`|one serialized `TaskBoard` per thread|persist|thread' README.md
fiRepository: tinyhumansai/tinyagents
Length of output: 2205
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the id generator, persistence layer, and the board/card types.
for f in src/harness/ids.rs src/harness/store.rs src/graph/todos/types.rs README.md; do
if [ -f "$f" ]; then
echo "=== $f ($(wc -l < "$f") lines) ==="
nl -ba "$f" | sed -n '1,260p'
echo
fi
doneRepository: tinyhumansai/tinyagents
Length of output: 246
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- src/harness/ids/mod.rs ---'
sed -n '1,180p' src/harness/ids/mod.rs
echo '--- src/graph/todos/types.rs (relevant range) ---'
sed -n '110,390p' src/graph/todos/types.rs
echo '--- src/graph/todos/store.rs (claim_card + persistence comments) ---'
sed -n '1,430p' src/graph/todos/store.rs
echo '--- src/graph/todos/README.md (id/persistence notes) ---'
sed -n '1,120p' src/graph/todos/README.mdRepository: tinyhumansai/tinyagents
Length of output: 31351
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
from pathlib import Path
for path in [
"src/harness/ids/mod.rs",
"src/graph/todos/types.rs",
"src/graph/todos/store.rs",
"src/graph/todos/README.md",
]:
p = Path(path)
print(f"=== {path} exists={p.exists()} lines={sum(1 for _ in p.open()) if p.exists() else 'NA'} ===")
if p.exists():
lines = p.read_text().splitlines()
# print targeted windows only
if path.endswith("ids/mod.rs"):
rng = (1, 180)
elif path.endswith("types.rs"):
rng = (110, 390)
elif path.endswith("store.rs"):
rng = (1, 430)
else:
rng = (1, 120)
for i in range(rng[0]-1, min(rng[1], len(lines))):
print(f"{i+1:4d}: {lines[i]}")
print()
PYRepository: tinyhumansai/tinyagents
Length of output: 36502
Seed card ids from persisted state or switch to UUIDs. next_seq() is process-local and resets on restart, but graph.todos boards are persisted per thread. After a restart, TaskBoardCard::new and normalise_board can mint task-<n> ids that collide with existing cards, making id-based updates ambiguous and potentially hitting the wrong card.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/graph/todos/types.rs` around lines 121 - 143, The task card ID generation
is process-local and can collide with persisted cards after restart; update the
ID creation path used by TaskBoardCard::new and normalise_board so it is seeded
from the persisted board state or replaced with UUID-based IDs. Make sure any
helper that mints new card IDs continues to produce unique values across
reloads, and keep the existing call sites using the new ID source consistently.
| /// Normalises a board in place: trims fields, generates missing card ids, drops | ||
| /// empty-title cards, backfills a `Blocked` card's blocker from its notes, and | ||
| /// recomputes `order` / `updated_at`. | ||
| pub fn normalise_board(board: &mut TaskBoard) { | ||
| let now = now_stamp(); | ||
| board.thread_id = board.thread_id.trim().to_string(); | ||
| board.updated_at = now.clone(); | ||
|
|
||
| for card in board.cards.iter_mut() { | ||
| card.title = card.title.trim().to_string(); | ||
| if card.id.trim().is_empty() { | ||
| card.id = new_card_id(); | ||
| } else { | ||
| card.id = card.id.trim().to_string(); | ||
| } | ||
| card.notes = trim_opt(card.notes.take()); | ||
| card.objective = trim_opt(card.objective.take()); | ||
| card.assigned_agent = trim_opt(card.assigned_agent.take()); | ||
| trim_string_vec(&mut card.plan); | ||
| trim_string_vec(&mut card.allowed_tools); | ||
| trim_string_vec(&mut card.acceptance_criteria); | ||
| trim_string_vec(&mut card.evidence); | ||
| card.blocker = trim_opt(card.blocker.take()); | ||
| card.session_thread_id = trim_opt(card.session_thread_id.take()); | ||
| if card.status == TaskCardStatus::Blocked && card.blocker.is_none() { | ||
| card.blocker = card.notes.clone(); | ||
| } | ||
| } | ||
|
|
||
| board.cards.retain(|card| !card.title.is_empty()); | ||
|
|
||
| for (idx, card) in board.cards.iter_mut().enumerate() { | ||
| card.order = idx as u32; | ||
| card.updated_at = now.clone(); | ||
| } | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Stale blocker field not cleared when a card leaves Blocked.
Line 336-338 backfills card.blocker from notes whenever status is Blocked and blocker is empty, but there's no corresponding reset when the card transitions to a different status (e.g. InProgress, Done). Since blocker is serialized in TaskBoardCard (skipped only when None), a card that was once blocked and later unblocked will keep surfacing its old blocker reason in every TodosSnapshot/tool response, even though render_markdown no longer prints it for non-Blocked cards.
🐛 Proposed fix to clear stale blocker on unblock
- if card.status == TaskCardStatus::Blocked && card.blocker.is_none() {
- card.blocker = card.notes.clone();
- }
+ if card.status == TaskCardStatus::Blocked {
+ if card.blocker.is_none() {
+ card.blocker = card.notes.clone();
+ }
+ } else {
+ card.blocker = None;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Normalises a board in place: trims fields, generates missing card ids, drops | |
| /// empty-title cards, backfills a `Blocked` card's blocker from its notes, and | |
| /// recomputes `order` / `updated_at`. | |
| pub fn normalise_board(board: &mut TaskBoard) { | |
| let now = now_stamp(); | |
| board.thread_id = board.thread_id.trim().to_string(); | |
| board.updated_at = now.clone(); | |
| for card in board.cards.iter_mut() { | |
| card.title = card.title.trim().to_string(); | |
| if card.id.trim().is_empty() { | |
| card.id = new_card_id(); | |
| } else { | |
| card.id = card.id.trim().to_string(); | |
| } | |
| card.notes = trim_opt(card.notes.take()); | |
| card.objective = trim_opt(card.objective.take()); | |
| card.assigned_agent = trim_opt(card.assigned_agent.take()); | |
| trim_string_vec(&mut card.plan); | |
| trim_string_vec(&mut card.allowed_tools); | |
| trim_string_vec(&mut card.acceptance_criteria); | |
| trim_string_vec(&mut card.evidence); | |
| card.blocker = trim_opt(card.blocker.take()); | |
| card.session_thread_id = trim_opt(card.session_thread_id.take()); | |
| if card.status == TaskCardStatus::Blocked && card.blocker.is_none() { | |
| card.blocker = card.notes.clone(); | |
| } | |
| } | |
| board.cards.retain(|card| !card.title.is_empty()); | |
| for (idx, card) in board.cards.iter_mut().enumerate() { | |
| card.order = idx as u32; | |
| card.updated_at = now.clone(); | |
| } | |
| } | |
| /// Normalises a board in place: trims fields, generates missing card ids, drops | |
| /// empty-title cards, backfills a `Blocked` card's blocker from its notes, and | |
| /// recomputes `order` / `updated_at`. | |
| pub fn normalise_board(board: &mut TaskBoard) { | |
| let now = now_stamp(); | |
| board.thread_id = board.thread_id.trim().to_string(); | |
| board.updated_at = now.clone(); | |
| for card in board.cards.iter_mut() { | |
| card.title = card.title.trim().to_string(); | |
| if card.id.trim().is_empty() { | |
| card.id = new_card_id(); | |
| } else { | |
| card.id = card.id.trim().to_string(); | |
| } | |
| card.notes = trim_opt(card.notes.take()); | |
| card.objective = trim_opt(card.objective.take()); | |
| card.assigned_agent = trim_opt(card.assigned_agent.take()); | |
| trim_string_vec(&mut card.plan); | |
| trim_string_vec(&mut card.allowed_tools); | |
| trim_string_vec(&mut card.acceptance_criteria); | |
| trim_string_vec(&mut card.evidence); | |
| card.blocker = trim_opt(card.blocker.take()); | |
| card.session_thread_id = trim_opt(card.session_thread_id.take()); | |
| if card.status == TaskCardStatus::Blocked { | |
| if card.blocker.is_none() { | |
| card.blocker = card.notes.clone(); | |
| } | |
| } else { | |
| card.blocker = None; | |
| } | |
| } | |
| board.cards.retain(|card| !card.title.is_empty()); | |
| for (idx, card) in board.cards.iter_mut().enumerate() { | |
| card.order = idx as u32; | |
| card.updated_at = now.clone(); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/graph/todos/types.rs` around lines 312 - 347, The stale blocker value is
only being populated in normalise_board when a card is Blocked, but it is never
cleared when the card is no longer Blocked. Update normalise_board so it
explicitly removes or resets card.blocker for any non-Blocked TaskCardStatus,
while preserving the existing backfill from notes for Blocked cards. Use the
TaskBoardCard fields and the status check in normalise_board to keep serialized
TodosSnapshot responses from carrying an old blocker reason after unblocking.
Summary
Exposes OpenHuman's two per-thread productivity primitives as reusable, provider-neutral tinyagents features under the graph module:
graph::goals— one durable objective per thread (a "completion contract") with a small lifecycle (Active/Paused/BudgetLimited/Complete), an optional token budget, model-facing tools, and a graph-native continuation loop.graph::todos— a per-thread kanban task board: an ordered list of task cards with a single-in_progressinvariant, plan-approval transitions, markdown rendering, and atodomultiplexer tool.Both are ported from OpenHuman (
thread_goals/ task board) but drop the app-specific coupling (event bus, RPC envelopes,AgentProgress,Config, heartbeat scheduler) and re-host persistence on the existingharness::store::Storetrait.Design highlights
harness::store::Store. One JSON value per thread undergraph.goals/graph.todos, keyed byhex(thread_id)(valid acrossInMemoryStore/FileStore/Sqlite). SinceStorehas no CAS, every mutation runsload → mutate → putunder a per-thread async mutex (atomic within one process; cross-process lost-update documented). OpenHuman'sgoal_idcompare-and-set stale-write guard and the single-in-progress invariant are preserved. No new deps —next_seq()ids, dependency-free millis timestamps.OrchestrationTool. Each capturesArc<dyn Store>and resolves the thread fromToolExecutionContext::thread_id(never a tool arg), so a model can't address another thread. Goals:goal_get/goal_set/goal_complete(asymmetric ownership; pause/resume/clear are host controls). Todos: a singletodomultiplexer.goal_gate_node(primary) — a command-routing node forming a self-driving bounded loop: after each work iteration it folds usage viaaccount_usageand routes back to the work node while the goal is Active and under budget, else toEND.recursion_limitis the hard backstop; a zero-progress iteration sets one-shot suppression and stops.run_continuation_tick— a faithful heartbeat port for callers with an external scheduler.note_user_turn— clears the one-shot suppression / reactivates a paused goal on a user run.Testing
InMemoryStore.tests/e2e_graph_goals.rs(self-driving goal loop until complete) andtests/e2e_graph_todos.rs(MockModel agent loop drives thetodotool; board persists to the thread).Commands run locally
cargo fmt --check✓cargo clippy --all-targets -- -D warnings✓cargo test✓ (745 lib tests + integration tests pass)cargo run --example basic_graph✓Docs
Module READMEs (
src/graph/{goals,todos}/README.md),docs/modules/graph/{goals,todos}.mdlinked from the graph index, and crate-root/graph-surface doc updates.https://claude.ai/code/session_01NRL2bim9Gz3UN3SRtpeXEX
Summary by CodeRabbit
New Features
Documentation
Bug Fixes