[enhancement] Distinguish unrecoverable host/LLM client errors (4xx) from Gene-repair failures in the failure-loop detector #571

@autogame-17

Summary

When the host agent layer (the IDE or agent that runs evolver) gets an unrecoverable client error from its LLM provider (HTTP 400 / 401 / 403), every evolution attempt fails for a reason that has nothing to do with the Gene being run. Examples are a malformed-request rejection such as "field MaxTokens invalid, should be in [1, 65536]", an auth failure, or a hard quota denial. Today evolver counts each of these toward the consecutive-failure streak, so after ~5 failures it trips failure_loop_detected, bans an innocent Gene (ban_gene:<gene>), and then emits force_innovation_after_repair_loop.

This was surfaced by the auto-report in #534, whose root cause was a host-side LLM 400 on every call — evolver faithfully recorded 8 straight failures and banned gene_gep_repair_from_errors for an error it had no part in.

Current behaviour (no error-class distinction)

  • src/gep/signals.js increments consecutiveFailureCount for any outcome.status === 'failed', with no inspection of the failure cause.
  • At streak ≥ 5 it emits failure_loop_detected + ban_gene:<topGene> (and force_innovation_after_repair_loop).
  • The host's error text is already in hand at the collector (src/evolve/pipeline/collect.js, where a host errorMessage is captured and prefixed [LLM ERROR] …), but it never feeds the streak/ban logic.

The only 4xx classification that exists today lives in unrelated subsystems (hub heartbeat circuit breaker, ATP/stake terminal-status handling) and never touches the evolution-outcome path.
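
To make the problem concrete, here is a minimal sketch of the cause-blind streak logic described in the bullets above. It is not the actual code in src/gep/signals.js: the function name detectFailureLoop, the shape of the outcome objects, the way topGene is picked, and the exact wording of the sample error message (including the "400" after the [LLM ERROR] prefix) are all assumptions made for illustration. Only the threshold of 5 and the signal names come from this issue.

```javascript
// Hypothetical sketch; names mirror this issue's wording, not the real module.
const FAILURE_STREAK_THRESHOLD = 5; // "streak >= 5" as stated in the issue

// outcomes: array ordered oldest to newest, each { status, gene, errorMessage? }
function detectFailureLoop(outcomes) {
  let consecutiveFailureCount = 0;
  const failuresByGene = new Map();

  // Walk backwards from the newest outcome until the first non-failure.
  for (let i = outcomes.length - 1; i >= 0; i--) {
    const outcome = outcomes[i];
    if (outcome.status !== 'failed') break;
    // No inspection of outcome.errorMessage: every failure counts.
    consecutiveFailureCount += 1;
    failuresByGene.set(outcome.gene, (failuresByGene.get(outcome.gene) || 0) + 1);
  }

  if (consecutiveFailureCount < FAILURE_STREAK_THRESHOLD) return [];

  // Assumed rule: the Gene with the most failures in the streak gets banned.
  let topGene = null;
  let topCount = 0;
  for (const [gene, count] of failuresByGene) {
    if (count > topCount) {
      topGene = gene;
      topCount = count;
    }
  }
  return ['failure_loop_detected', `ban_gene:${topGene}`, 'force_innovation_after_repair_loop'];
}

// Eight host-side failures, as in #534 (message text is illustrative).
const hostFailures = Array.from({ length: 8 }, () => ({
  status: 'failed',
  gene: 'gene_gep_repair_from_errors',
  errorMessage: '[LLM ERROR] 400 field MaxTokens invalid, should be in [1, 65536]',
}));
const signals = detectFailureLoop(hostFailures);
console.log(signals);
```

Under these assumptions the Gene is banned even though every failure carries a host-side error message, because the message is never consulted.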

Proposed improvement

Classify an evolution outcome whose underlying failure is an unrecoverable host/LLM client error (4xx — request-invalid / auth / quota) as a distinct, non-Gene-attributable failure that:

  1. Aborts the current intent early instead of retrying into the streak.
  2. Is excluded from consecutiveFailureCount / per-Gene failure frequency, so it cannot trigger ban_gene / failure_loop_detected.
  3. Surfaces a clear, actionable signal to the operator (e.g. host_llm_client_error) that points at the host LLM config rather than at evolver or the Gene.

This keeps the failure-loop detector focused on genuine repair/optimize failures and stops a misconfigured host model from poisoning Gene reputation.
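
One possible shape for the proposal is sketched below. It is an illustration under stated assumptions, not a proposed patch: isHostLlmClientError, shouldAbortIntent, evaluateOutcomes, and the outcome object shape are hypothetical names, and the classifier assumes the host error text starts with the [LLM ERROR] prefix and contains the HTTP status as a standalone three-digit number. A real implementation would preferably read a structured status code if the host exposes one.

```javascript
// Hypothetical sketch of the proposed classification; not the real evolver API.
const FAILURE_STREAK_THRESHOLD = 5;
const HOST_CLIENT_ERROR_STATUSES = new Set([400, 401, 403]);

// Heuristic (assumption): "[LLM ERROR]" prefix plus a standalone 3-digit status.
function isHostLlmClientError(outcome) {
  const message = outcome.errorMessage || '';
  if (!message.startsWith('[LLM ERROR]')) return false;
  const match = message.match(/\b(\d{3})\b/);
  return match !== null && HOST_CLIENT_ERROR_STATUSES.has(Number(match[1]));
}

// Point 1: stop retrying the current intent once the host error is recognised.
function shouldAbortIntent(outcome) {
  return outcome.status === 'failed' && isHostLlmClientError(outcome);
}

// Points 2 and 3: skip host errors when counting, and report them separately.
function evaluateOutcomes(outcomes) {
  let consecutiveFailureCount = 0;
  let hostErrorSeen = false;
  const failuresByGene = new Map();

  for (let i = outcomes.length - 1; i >= 0; i--) {
    const outcome = outcomes[i];
    if (outcome.status !== 'failed') break;
    if (isHostLlmClientError(outcome)) {
      hostErrorSeen = true; // neither extends nor resets the streak
      continue;
    }
    consecutiveFailureCount += 1;
    failuresByGene.set(outcome.gene, (failuresByGene.get(outcome.gene) || 0) + 1);
  }

  const signals = [];
  if (hostErrorSeen) signals.push('host_llm_client_error');
  if (consecutiveFailureCount >= FAILURE_STREAK_THRESHOLD) {
    let topGene = null;
    let topCount = 0;
    for (const [gene, count] of failuresByGene) {
      if (count > topCount) {
        topGene = gene;
        topCount = count;
      }
    }
    signals.push('failure_loop_detected', `ban_gene:${topGene}`, 'force_innovation_after_repair_loop');
  }
  return signals;
}

// Same eight host-side failures as in #534 (message text is illustrative).
const hostFailures = Array.from({ length: 8 }, () => ({
  status: 'failed',
  gene: 'gene_gep_repair_from_errors',
  errorMessage: '[LLM ERROR] 400 field MaxTokens invalid, should be in [1, 65536]',
}));
console.log(evaluateOutcomes(hostFailures));
```

In this sketch a host client error is skipped rather than treated as a streak reset, so genuine Gene failures on either side of a host outage still accumulate. Whether a host error should instead reset the streak is a design choice this issue leaves open.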
