Fix/18 auto continue on usage reset #32
Conversation
| persistErr := s.persistTicketFailure(repoID, repoRoot, ticket, repoRt, job, err) | ||
| if persistErr != nil { | ||
| return fmt.Errorf("%w (also failed to persist ticket failure: %w)", err, persistErr) | ||
| if errors.Is(err, providers.ErrTokensExhausted) { |
There was a problem hiding this comment.
this is already handled inside of the StartFlow -> runState or MoveToState -> transitionTo -> runState so the ticket is updated twice
There was a problem hiding this comment.
Sorry for the late reply — I was busy last week.
If you mean the persistTicketScheduled that I used a couple of lines above this block — you're right, I found it wasn't correct and I removed it. But if you're pointing to persistTicketFailure, that one already existed. By the way, I double-checked it and I don't think they're redundant — they operate at two different layers with different responsibilities.
One more thing: I think this block should check if !errors.Is(err, providers.ErrTokensExhausted), because a quota error isn't exactly an error state. In that case it should return to workerLoop so the requeueing is handled by the top-level caller.
It might even be a good idea to handle quota as a non-error across the whole process.
There was a problem hiding this comment.
It might even be a good idea to handle quota as a non-error across the whole process.
So you mean it is more of a state for the provider interface that needs to be checked in the workerLoop and can be handled there instead of bubbling up as an error?
There was a problem hiding this comment.
Exactly, your point about the Provider is perfect. quota reach is just the window has already closed, and jobs should wait till opens again.
Oncourse, you know how do handle it, but let me a little more work on this idea tomorrow. I'll be back soon
There was a problem hiding this comment.
1.As mentioned following quota status is the provider property, so it's nice to handle it by provider too
2.Current approach still work in single provider running.
3.Current quota reached is shared among all provider, If the infrastructure allow to run multi provider, so there no way just it should handle in provider level.
4.One needed also to mentioned is, Flow go through workerLoop->executeJob->StartFlow->runState->o.Provider.Execute. the status of the job or ticket or state could be change in one of following step, but the only data is return is Error, and i think it cant to carry all status by just an error. (suppose an http request that have vary range of response). so it would not be scalable for future demand too.
So i think i would be nice to return an struct of ProviderResult to allow flexible-handlng in all levels
struct ProviderResult{
Stderr error
QuotaReached bool
RawOutput string
....
}
I saw there is 'ExecuteResult' and we could use it or having something new like
struct SampleResult{
ExecuteResult <- QuotaReached defined in ExecuteResult
....
}
There was a problem hiding this comment.
Sounds like it could make it all much more flexible, lets do it
Neokil
left a comment
There was a problem hiding this comment.
one comment on duplicate state saves and also please execute the golangi-lint run and make sure the issues (like empty lines) are resolved
Summary
quota_monitor.go: background daemon tracksquotaReachedflag and probes the provider every 15 minutes to detect when quota has resetwaitIfQuotaReached()when quota is hit and resume automaticallyErrTokensExhaustedinstead of marking them failedProbeProvider()to Orchestrator for lightweight LLM pingFlowStatusRescheduledstatus for tickets waiting on quota resetrescheduledto OpenAPI schema for future WebUI supportTesting
go test ./...bash -c 'echo "usage limit reached" >&2; exit 1'and run a ticketrescheduledand job re-queues automatically~/.auto-pr/config.yamlNotes
Known Limitations
when a ticket is waiting for quota reset, all workers are paused including system jobs (cleanup, cleanup-all). this will be fixed in a follow-up issue by splitting the job queue into llmJobs and systemJobs channels with dedicated worker pools.