feat: AI-native plan/apply and an interactive assistant #141

Merged
nfebe merged 14 commits into main from feat/ai-native on Jun 7, 2026

@nfebe nfebe commented Jun 7, 2026

Contributor

Makes the agent operable by AI and adds a preview-before-apply layer for humans and agents alike.

Plan/apply

  • Mutations (env, compose, delete, domains, proxy, config, service actions) accept ?plan=true to return a saved preview instead of executing: exact file diffs, affected containers/certs/databases, each with a reason.
  • GET/POST/DELETE /api/plans... to list, inspect, apply or discard. Apply re-checks permissions and protected mode, refuses stale (409) or expired (410) plans, and invalidates siblings.
  • Opt-in require_plan per deployment forces both humans and AI through the plan path.
  • Plans are flat JSON files under the global flatrun dir; service-level start/stop/restart/rebuild/pull added.
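A minimal client-side sketch of how a caller might interpret the outcome of an apply request. Only the meanings of 409 (stale) and 410 (expired) come from the description above; the function and error names are hypothetical, and no endpoint path is assumed.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinel errors a client could map apply responses onto.
var (
	ErrPlanStale   = errors.New("plan is stale: files changed since planning")
	ErrPlanExpired = errors.New("plan has expired")
)

// ApplyError translates the HTTP status of an apply call into an error.
// 409 and 410 follow the PR description; everything else is generic.
func ApplyError(status int) error {
	switch {
	case status >= 200 && status < 300:
		return nil
	case status == http.StatusConflict:
		return ErrPlanStale
	case status == http.StatusGone:
		return ErrPlanExpired
	default:
		return fmt.Errorf("apply failed with unexpected status %d", status)
	}
}

func main() {
	for _, status := range []int{200, 409, 410, 500} {
		fmt.Printf("%d -> %v\n", status, ApplyError(status))
	}
}
```

A client that sees either sentinel would typically create a fresh plan rather than retry the same one.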

AI assistant

  • Provider-agnostic OpenAI-compatible layer (OpenAI, Ollama, vLLM, LM Studio), runtime-configurable, API key never echoed.
  • Interactive sessions with a tool loop: the model investigates with read-only tools (logs, files, metadata, networks, deployments, host info, in-container and host commands) instead of guessing.
  • Per-session auto-run vs approve-each; mutating fixes surface as suggested actions that run only through the existing guarded APIs.
  • Analyses grounded in this installation's live config and routing; secrets redacted before anything leaves the host.

All new packages unit-tested; full go test ./... and make build green.

nfebe added 14 commits June 7, 2026 13:26
Plans are previews of mutations, stored as inspectable JSON files under
the global flatrun state directory, grouped by the resource they affect.
Each plan records the requested change, a content hash of every file it
read, the predicted changes with create/update/replace/delete verbs and
a reason per change, and its lifecycle status. A plan starts as
available and moves through applying to applied or failed; obsolete
marks drift and expired marks a lapsed TTL. Sensitive diffs are masked in API-facing
copies while full values stay in the 0600 file, matching how env files
are already trusted on disk. A prune loop expires stale plans and
removes old history on a configurable retention.
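The plan record described above could look roughly like this. It is a sketch under assumptions: the field names and the `NewPlan` helper are hypothetical, and SHA-256 stands in for whatever content hash the agent actually uses. Only the status names and the change verbs are taken from the commit message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Status values named in the commit message.
type Status string

const (
	StatusAvailable Status = "available"
	StatusApplying  Status = "applying"
	StatusApplied   Status = "applied"
	StatusFailed    Status = "failed"
	StatusObsolete  Status = "obsolete" // underlying files drifted
	StatusExpired   Status = "expired"  // TTL passed
)

// Change is one predicted effect of the plan.
type Change struct {
	Verb   string // create, update, replace or delete
	Target string
	Reason string
}

// Plan is a hypothetical shape for the stored JSON document.
type Plan struct {
	Resource   string
	FileHashes map[string]string // path -> hex hash at plan time
	Changes    []Change
	Status     Status
}

// HashContent returns a hex SHA-256 of b (the algorithm is an assumption).
func HashContent(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// NewPlan records a hash for every file the plan read.
func NewPlan(resource string, files map[string][]byte, changes []Change) *Plan {
	hashes := make(map[string]string, len(files))
	for path, content := range files {
		hashes[path] = HashContent(content)
	}
	return &Plan{Resource: resource, FileHashes: hashes, Changes: changes, Status: StatusAvailable}
}

// Drifted reports whether any recorded file is missing or has new content.
func (p *Plan) Drifted(current map[string][]byte) bool {
	for path, want := range p.FileHashes {
		content, ok := current[path]
		if !ok || HashContent(content) != want {
			return true
		}
	}
	return false
}

func main() {
	files := map[string][]byte{"docker-compose.yml": []byte("image: nginx:1.25\n")}
	plan := NewPlan("web", files, []Change{{Verb: "update", Target: "docker-compose.yml", Reason: "image tag changed"}})
	fmt.Println("drifted:", plan.Drifted(files))
	files["docker-compose.yml"] = []byte("image: nginx:1.26\n")
	fmt.Println("drifted after edit:", plan.Drifted(files))
}
```

Hashing file content rather than comparing timestamps is what lets an apply step notice edits made outside the API.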
Deployment delete, domain add/update/delete, and config key updates now
run through handler-independent functions, and the proxy layer can
render a virtual host without writing or reloading. No behavior change.
This lets the same execution paths be driven by both live requests and
the upcoming plan apply step, and lets plans preview nginx output
without side effects.

High-risk mutations (env vars, compose updates, deployment deletion,
domains, proxy setup, config keys) can now be called with a plan flag
to get a preview instead of execution: what files change with full
before/after content, which containers get recreated and why, which
certificates and databases are affected. The preview is saved as a plan
that can be listed, inspected, applied or discarded.

Applying re-checks the caller's permissions and protected mode, refuses
plans whose underlying files changed since planning (including edits
made outside the API), and refuses expired or already-applied plans.
Applying one plan invalidates other open plans on the same resource.
Sensitive diffs are masked in responses unless explicitly requested
with write-level access. A plan is a prediction: apply can still fail,
and the failure is recorded on the plan.
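Sibling invalidation can be sketched as follows. This assumes open siblings are moved to the obsolete status; the commit message only says they are invalidated, so the chosen status, the `Plan` shape and the function name are all illustrative.

```go
package main

import "fmt"

// Plan is a reduced, hypothetical view of a stored plan.
type Plan struct {
	ID       string
	Resource string
	Status   string
}

// InvalidateSiblings marks every other still-available plan on the same
// resource as obsolete and returns the IDs it touched.
func InvalidateSiblings(plans []*Plan, applied *Plan) []string {
	var touched []string
	for _, p := range plans {
		if p.ID == applied.ID || p.Resource != applied.Resource || p.Status != "available" {
			continue
		}
		p.Status = "obsolete"
		touched = append(touched, p.ID)
	}
	return touched
}

func main() {
	a := &Plan{ID: "p1", Resource: "web", Status: "applied"}
	b := &Plan{ID: "p2", Resource: "web", Status: "available"}
	c := &Plan{ID: "p3", Resource: "db", Status: "available"}
	fmt.Println(InvalidateSiblings([]*Plan{a, b, c}, a), b.Status, c.Status)
}
```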
The agent can now talk to any OpenAI-compatible model server (OpenAI,
Ollama, vLLM, LM Studio or a gateway), configured at runtime with no
restart. The provider sits behind a small internal interface so other
backends can be added without touching callers, keeping the platform
model-agnostic.
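A sketch of what such a small internal interface might look like. The names `Provider`, `Message` and `Ask` are hypothetical, not taken from the code, and the stand-in backend makes no network calls; a real one would post to an OpenAI-compatible chat endpoint.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is one chat message in the OpenAI-style role/content shape.
type Message struct {
	Role    string
	Content string
}

// Provider is a hypothetical version of the internal interface: callers
// depend on this, never on a concrete backend.
type Provider interface {
	Complete(messages []Message) (string, error)
}

// echoProvider is a stand-in backend used only for this example.
type echoProvider struct{}

func (echoProvider) Complete(messages []Message) (string, error) {
	if len(messages) == 0 {
		return "", errors.New("no messages")
	}
	return "echo: " + messages[len(messages)-1].Content, nil
}

// Ask is caller code that works with any Provider.
func Ask(p Provider, question string) (string, error) {
	return p.Complete([]Message{{Role: "user", Content: question}})
}

func main() {
	out, err := Ask(echoProvider{}, "why did the deploy fail?")
	fmt.Println(out, err)
}
```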

First feature: deployment diagnosis. An authenticated caller with read
access can ask for an analysis of a deployment's recent logs or of a
failed operation's output; the agent assembles the context, strips
every known secret value and credential-shaped assignment before
anything leaves the host, and returns a structured Markdown diagnosis
with evidence and a suggested fix, plus a count of redactions made.
Context is truncated newest-first to stay within small local model
windows.
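The two context-preparation steps above can be sketched as below. Both are assumptions about behavior, not the agent's code: the regular expression is one possible definition of a "credential-shaped assignment", and `KeepNewest` reads "truncated newest-first" as keeping the most recent lines that fit a byte budget.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// assignment matches KEY=value or KEY: value where KEY looks like a
// credential name. The keyword list is illustrative.
var assignment = regexp.MustCompile(
	`(?i)\b([a-z0-9_]*(?:password|passwd|secret|token|api_?key)[a-z0-9_]*)(\s*[=:]\s*)(\S+)`)

// RedactAssignments masks the value part of each match and returns the
// number of redactions made.
func RedactAssignments(text string) (string, int) {
	n := len(assignment.FindAllStringIndex(text, -1))
	return assignment.ReplaceAllString(text, "${1}${2}[REDACTED]"), n
}

// KeepNewest returns the longest suffix of lines whose total size
// (including one newline per line) fits within budget bytes.
func KeepNewest(lines []string, budget int) []string {
	used, start := 0, len(lines)
	for i := len(lines) - 1; i >= 0; i-- {
		cost := len(lines[i]) + 1
		if used+cost > budget {
			break
		}
		used += cost
		start = i
	}
	return lines[start:]
}

func main() {
	logs := []string{"boot ok", "DB_PASSWORD=hunter2 loaded", "connection refused"}
	kept := KeepNewest(logs, 48)
	out, n := RedactAssignments(strings.Join(kept, "\n"))
	fmt.Printf("%d redaction(s)\n%s\n", n, out)
}
```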

The model API key is configurable through the config API but never
echoed back; the same masking now also covers the shared database,
Redis and PowerDNS credentials, which previously appeared in plain
text in config reads.
…actions

Plan review is no longer implied as a default workflow. Each deployment
gets a require_plan switch, managed like protected mode with admin
access: when on, env, compose, delete, domain, proxy and service
mutations refuse to run directly and answer with a plan_required error,
so both humans and AI agents are forced through the plan, review, apply
path for that deployment. Plans themselves can always be created, and
the switch takes effect immediately without restart.
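The gate this describes can be sketched as a single check in front of each mutation. The `plan_required` error string comes from the commit message; the `Deployment` and `Request` types and the `Guard` function are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPlanRequired mirrors the plan_required error named in the commit.
var ErrPlanRequired = errors.New("plan_required")

// Deployment carries the per-deployment switch.
type Deployment struct {
	Name        string
	RequirePlan bool
}

// Request describes how a mutation arrived (hypothetical fields).
type Request struct {
	PlanOnly bool // caller asked for a preview instead of execution
	FromPlan bool // request is the apply step of a saved plan
}

// Guard decides whether a mutation may proceed. Creating a plan is
// always allowed; direct execution is refused when the switch is on.
func Guard(d Deployment, r Request) error {
	if r.PlanOnly || r.FromPlan {
		return nil
	}
	if d.RequirePlan {
		return ErrPlanRequired
	}
	return nil
}

func main() {
	d := Deployment{Name: "web", RequirePlan: true}
	fmt.Println(Guard(d, Request{}), Guard(d, Request{PlanOnly: true}))
}
```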

Individual compose services can now be started, stopped, restarted and
rebuilt on their own instead of cycling the whole deployment. These
actions are plannable like the rest, previewing which service gets
recreated and why, and service rebuild honors protected mode the same
way deployment rebuild does.

A single service's image can now be pulled fresh from its registry
without touching the rest of the deployment. The action is plannable
and respects the deployment's require_plan setting like the other
service actions; running containers stay on their current image until
the next deploy.

A diagnosis can now carry up to a few machine-readable remediation
proposals: restart, rebuild or pull a specific service, or run a
command inside a service container. The agent never executes them; it
validates the shape, rejects unknown action kinds, and drops any
suggestion naming a service that does not exist in the compose file,
so a hallucinated target can never be acted on. Clients present the
proposals for explicit approval and run approved ones through the
normal APIs, where access control, protected mode and plan
requirements apply unchanged.
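A sketch of the validation step, assuming a hypothetical `Suggestion` shape and a cap of three proposals (the commit message says only "up to a few"). The four action kinds are the ones listed above.

```go
package main

import (
	"fmt"
	"strings"
)

// Suggestion is a hypothetical shape for one remediation proposal.
type Suggestion struct {
	Kind    string // restart, rebuild, pull or exec
	Service string
	Command string // required for exec
}

var knownKinds = map[string]bool{"restart": true, "rebuild": true, "pull": true, "exec": true}

// maxSuggestions is an assumed cap.
const maxSuggestions = 3

// ValidateSuggestions keeps only well-formed proposals that target a
// service present in the compose file. Nothing is executed here.
func ValidateSuggestions(proposed []Suggestion, services map[string]bool) []Suggestion {
	var kept []Suggestion
	for _, s := range proposed {
		if len(kept) == maxSuggestions {
			break
		}
		if !knownKinds[s.Kind] || !services[s.Service] {
			continue // unknown action kind or hallucinated service
		}
		if s.Kind == "exec" && strings.TrimSpace(s.Command) == "" {
			continue // exec without a command is malformed
		}
		kept = append(kept, s)
	}
	return kept
}

func main() {
	services := map[string]bool{"web": true, "db": true}
	proposed := []Suggestion{
		{Kind: "restart", Service: "web"},
		{Kind: "delete", Service: "web"},
		{Kind: "restart", Service: "ghost"},
		{Kind: "exec", Service: "db", Command: "pg_isready"},
	}
	fmt.Println(ValidateSuggestions(proposed, services))
}
```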
One pipeline now serves every AI interaction: the caller picks an
intent (diagnose, improve, secure or explain) and a list of context
sources; the agent gathers each source, redacts secrets, prompts the
configured model and returns the analysis with validated suggested
actions. Adding a capability is one intent entry and adding a context
kind is one source collector, with no new endpoints or flows.
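The "one intent entry, one source collector" design can be sketched with two lookup tables. The intent names come from the commit message; the source keys `logs` and `compose`, the prompt wording and `BuildPrompt` are hypothetical, and the redaction and model call are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Collector gathers one kind of context for a deployment.
type Collector func(deployment string) (string, error)

// intents maps each intent to its instruction (wording is illustrative).
var intents = map[string]string{
	"diagnose": "Decide whether the context shows a problem, then explain it.",
	"improve":  "Suggest improvements supported by the context.",
	"secure":   "Point out security weaknesses supported by the context.",
	"explain":  "Explain what the context shows.",
}

// collectors maps each source kind to a stub collector.
var collectors = map[string]Collector{
	"logs":    func(d string) (string, error) { return "[logs of " + d + "]", nil },
	"compose": func(d string) (string, error) { return "[compose file of " + d + "]", nil },
}

// BuildPrompt assembles the instruction and the requested sources.
func BuildPrompt(intent, deployment string, sources []string) (string, error) {
	instruction, ok := intents[intent]
	if !ok {
		return "", fmt.Errorf("unknown intent %q", intent)
	}
	var b strings.Builder
	b.WriteString(instruction)
	for _, name := range sources {
		collect, ok := collectors[name]
		if !ok {
			return "", fmt.Errorf("unknown source %q", name)
		}
		text, err := collect(deployment)
		if err != nil {
			return "", fmt.Errorf("collect %s: %w", name, err)
		}
		b.WriteString("\n\n## " + name + "\n" + text)
	}
	return b.String(), nil
}

func main() {
	prompt, err := BuildPrompt("diagnose", "web", []string{"logs", "compose"})
	fmt.Println(prompt, err)
}
```

With this layout a new capability is one more key in `intents`, and a new context kind is one more key in `collectors`.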

Deployment analysis can mix gathered sources (recent logs, the compose
file) with output the caller already has, plus a free-text question.
Host-level analysis accepts caller-provided output only and never
returns runnable suggestions. Status reporting now lists the available
intents so clients can render them dynamically.

Every analysis now carries a platform context section built from the
live agent state: the configured proxy and database network names, the
Docker networks that actually exist on the host, whether the shared
database and managed reverse proxy are available, and for deployment
scopes the domains, databases and virtual host of the deployment under
analysis. The system prompt instructs the model to reconcile its
recommendations against this section and prefer the platform's way
over generic Docker advice. This fixes answers that were technically
correct but useless on FlatRun, such as recommending a new, arbitrary
network where the platform's configured network was meant.

The prompt also references the product documentation site, with the
URL configurable per installation.

The diagnose intent now decides first whether the context shows a
problem at all: normal startups, passing health checks and graceful
shutdowns are to be reported as healthy operation, and restarts
without errors attributed to operator action, instead of being woven
into a speculative failure story. All intents are told that an
unsupported finding is wrong and that normal or inconclusive is a
valid answer.

The platform context now states how each domain is actually routed:
which service and container port the reverse proxy forwards to, as
configured in deployment metadata, with an explicit note that the
compose expose field plays no role in FlatRun routing. This removes
the false signal that made an expose/listen port difference look like
a root cause on a healthy deployment.

The assistant can now hold a session and look things up for itself
instead of relying on pre-gathered context. The model is given
read-only investigation tools: list the networks and deployments that
actually exist, read a deployment's metadata and files (including
app-generated logs and data), and run read-only commands inside
service containers. It calls these in a loop, reasoning over real
results, until it can answer.

Each session chooses how tools run: auto-run, where the agent executes
read-only lookups and reports as it goes, or approve-each, where every
tool call pauses for the operator to allow or decline. Commands that
would change state are refused outright, container exec still honors
protected mode, deployment access is re-checked per tool call, and
secrets are redacted from every result. Sessions are stored as flat
JSON files and pruned by age.
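The loop and the two session modes can be sketched as below. Everything here is illustrative: the types, the scripted stand-in for the model and the `list_networks` tool name are hypothetical, and access checks and redaction are omitted.

```go
package main

import "fmt"

// Mode selects how tool calls run in a session.
type Mode int

const (
	AutoRun Mode = iota
	ApproveEach
)

type ToolCall struct{ Name, Args string }

// Step is what the model returns each round: a tool call or an answer.
type Step struct {
	Call   *ToolCall
	Answer string
}

type Model func(results []string) Step
type Tool func(args string) (string, error)
type Approver func(ToolCall) bool

// RunSession feeds tool results back to the model until it answers or
// the step limit is reached.
func RunSession(model Model, mode Mode, tools map[string]Tool, approve Approver, maxSteps int) (string, []string) {
	var results []string
	for i := 0; i < maxSteps; i++ {
		step := model(results)
		if step.Call == nil {
			return step.Answer, results
		}
		results = append(results, runTool(mode, *step.Call, tools, approve))
	}
	return "stopped: step limit reached", results
}

func runTool(mode Mode, call ToolCall, tools map[string]Tool, approve Approver) string {
	tool, ok := tools[call.Name]
	if !ok {
		return "error: unknown tool " + call.Name
	}
	// In approve-each mode the operator decides on every call.
	if mode == ApproveEach && (approve == nil || !approve(call)) {
		return "declined by operator"
	}
	out, err := tool(call.Args)
	if err != nil {
		return "error: " + err.Error()
	}
	return out
}

// scriptedModel makes one lookup, then answers from the result.
func scriptedModel(results []string) Step {
	if len(results) == 0 {
		return Step{Call: &ToolCall{Name: "list_networks"}}
	}
	return Step{Answer: "saw: " + results[len(results)-1]}
}

var demoTools = map[string]Tool{
	"list_networks": func(string) (string, error) { return "proxy, database", nil },
}

func main() {
	answer, _ := RunSession(scriptedModel, AutoRun, demoTools, nil, 5)
	fmt.Println(answer)
}
```

Feeding a declined call back as a tool result lets the model continue with what it already knows instead of stalling.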
The assistant now reads container logs through a dedicated tool, since
FlatRun logs are captured stdout/stderr rather than files; this stops
it from fruitlessly searching the deployment filesystem and running
random container commands when asked to review logs.

On a fresh analysis it also changes posture: when handed logs or an
operation's output it summarizes them, reports any problems with
likely solutions, and if deeper investigation could help it describes
what it would check and asks first, rather than launching lookups
unprompted. The log view hands the assistant the logs already on
screen so a simple review needs no tool calls at all.

The assistant can now answer questions about the machine FlatRun runs
on, not just its deployments. A get_instance_info tool returns the
hostname and public IP, and a run_host_command tool runs free-form
READ-ONLY commands on the host for anything the structured tools do
not cover.

Host commands are gated: the caller must have system access, the
global system-terminal protected mode is honored (including its
command rules and the disable switch), obviously state-changing
commands are refused, and output is redacted before it reaches the
model. This closes the gap where the assistant wrongly claimed it
could not determine facts like the instance IP.
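A sketch of the gating order described above. The deny-list, type names and error values are hypothetical, and the check is a heuristic: it inspects the first word of each shell segment and any output redirection, so command substitution and similar tricks would need a stricter parser.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

var (
	ErrNoSystemAccess   = errors.New("caller lacks system access")
	ErrTerminalDisabled = errors.New("system terminal is disabled")
	ErrStateChanging    = errors.New("command looks state-changing")
)

// separators splits a shell line on ;, &, | and their doubled forms.
var separators = regexp.MustCompile(`[;&|]+`)

// denied is an illustrative list of obviously state-changing programs.
var denied = map[string]bool{
	"rm": true, "mv": true, "cp": true, "dd": true, "kill": true, "tee": true,
	"chmod": true, "chown": true, "reboot": true, "shutdown": true, "truncate": true,
}

// LooksStateChanging checks the command word of every segment, so
// "grep rm file" passes while "ls; rm file" does not.
func LooksStateChanging(cmd string) bool {
	if strings.Contains(cmd, ">") {
		return true // output redirection writes files
	}
	for _, segment := range separators.Split(cmd, -1) {
		fields := strings.Fields(segment)
		if len(fields) == 0 {
			continue
		}
		name := fields[0]
		if name == "sudo" && len(fields) > 1 {
			name = fields[1]
		}
		if denied[name] {
			return true
		}
	}
	return false
}

type Caller struct{ SystemAccess bool }
type TerminalPolicy struct{ Disabled bool }

// AuthorizeHostCommand applies the gates in order.
func AuthorizeHostCommand(c Caller, p TerminalPolicy, cmd string) error {
	switch {
	case !c.SystemAccess:
		return ErrNoSystemAccess
	case p.Disabled:
		return ErrTerminalDisabled
	case LooksStateChanging(cmd):
		return ErrStateChanging
	}
	return nil
}

func main() {
	admin := Caller{SystemAccess: true}
	for _, cmd := range []string{"df -h", "grep rm /var/log/syslog", "ls; rm -rf /tmp/x"} {
		fmt.Printf("%q -> %v\n", cmd, AuthorizeHostCommand(admin, TerminalPolicy{}, cmd))
	}
}
```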
…ranscript

When the assistant wants to investigate it now makes the tool call
directly instead of describing the command in prose and waiting for a
typed yes; in approve-each mode that call is what the operator allows
or declines, so the approval lives in the interface, not the chat text.

A session turn can also carry bulky context (logs, an operation's
output) that the model needs but the operator should not have to
scroll past: the model receives it, while the transcript shows only
the short prompt. This keeps a log analysis from echoing the entire
log back into the conversation.

sourceant Bot commented Jun 7, 2026

Code Review Summary

This PR introduces a significant enhancement to the FlatRun agent, enabling AI-native operations via a plan/apply workflow and an interactive assistant. The implementation is robust, well-tested, and security-conscious.

🚀 Key Improvements

  • Implemented a structured 'Plan' system to allow human/AI review before executing mutations.
  • Added a provider-agnostic AI layer with built-in secret redaction.
  • Introduced interactive AI sessions with a tool-loop for grounded system investigation.

💡 Minor Suggestions

  • Optimization of file hashing for large files.
  • Stricter regex boundaries for destructive command detection.

@sourceant sourceant Bot left a comment

Review complete. See the overview comment for a summary.

Comment thread internal/ai/redact.go
Comment on lines +22 to +23

func NewRedactor(secrets []string) *Redactor {

The current implementation of the Redactor creates a new slice and map every time NewRedactor is called. Since secrets are sorted by length and filtered, this is efficient for lookups, but ensure that the list of secrets passed in does not grow unbounded to avoid performance degradation in the regex and replacement loops.

Suggested change
func NewRedactor(secrets []string) *Redactor {
	// Deduplicate and drop values too short to redact safely.
	seen := make(map[string]struct{}, len(secrets))
	var filtered []string
	for _, s := range secrets {
		s = strings.TrimSpace(s)
		if len(s) < minSecretLength {
			continue
		}
		if _, dup := seen[s]; dup {
			continue
		}
		seen[s] = struct{}{}
		filtered = append(filtered, s)
	}
	// Longest first, so a secret that contains a shorter one is replaced whole.
	sort.Slice(filtered, func(i, j int) bool {
		return len(filtered[i]) > len(filtered[j])
	})
	return &Redactor{secrets: filtered}
}
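To illustrate why the constructor sorts longest-first, here is a self-contained sketch that pairs it with a replacement method. The `Redact` method, the `[REDACTED]` placeholder and the value of `minSecretLength` are assumptions; only `NewRedactor` mirrors the suggestion in this thread.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// minSecretLength is an assumed threshold; the real value is not shown.
const minSecretLength = 4

type Redactor struct{ secrets []string }

// NewRedactor mirrors the suggested constructor.
func NewRedactor(secrets []string) *Redactor {
	seen := make(map[string]struct{}, len(secrets))
	var filtered []string
	for _, s := range secrets {
		s = strings.TrimSpace(s)
		if len(s) < minSecretLength {
			continue
		}
		if _, dup := seen[s]; dup {
			continue
		}
		seen[s] = struct{}{}
		filtered = append(filtered, s)
	}
	sort.Slice(filtered, func(i, j int) bool {
		return len(filtered[i]) > len(filtered[j])
	})
	return &Redactor{secrets: filtered}
}

// Redact is a hypothetical companion: it replaces each known secret and
// returns how many replacements were made.
func (r *Redactor) Redact(text string) (string, int) {
	count := 0
	for _, s := range r.secrets {
		if n := strings.Count(text, s); n > 0 {
			count += n
			text = strings.ReplaceAll(text, s, "[REDACTED]")
		}
	}
	return text, count
}

func main() {
	r := NewRedactor([]string{"hunter2", "hunter2-extended"})
	fmt.Println(r.Redact("pw=hunter2-extended and hunter2"))
}
```

If the shorter secret were replaced first, the `-extended` suffix of the longer one would be left behind in the output.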

@nfebe nfebe merged commit 25b2aa2 into main Jun 7, 2026
5 checks passed
@nfebe nfebe deleted the feat/ai-native branch June 7, 2026 22:34