feat: AI-native plan/apply and an interactive assistant by nfebe · Pull Request #141 · flatrun/agent

nfebe · 2026-06-07T22:27:07Z

Makes the agent operable by AI and adds a preview-before-apply layer for humans and agents alike.

Plan/apply

Mutations (env, compose, delete, domains, proxy, config, service actions) accept ?plan=true to return a saved preview instead of executing: exact file diffs, affected containers/certs/databases, each with a reason.
GET/POST/DELETE /api/plans... to list, inspect, apply or discard. Apply re-checks permissions and protected mode, refuses stale (409) or expired (410) plans, and invalidates siblings.
Opt-in require_plan per deployment forces both humans and AI through the plan path.
Plans are flat JSON files under the global flatrun dir; service-level start/stop/restart/rebuild/pull added.
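
The apply-time refusals listed above can be pictured as a small gate. This is a minimal sketch rather than the agent's actual code: the `Plan` type, its fields, `base` and `applyStatus` are hypothetical names, and checking staleness before expiry is an assumption. Only the 409 (stale) and 410 (expired) responses come from the description.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Plan is a hypothetical, trimmed-down plan record for this sketch only.
type Plan struct {
	Stale     bool      // a file the plan read has changed since planning
	ExpiresAt time.Time // after this instant the plan may not be applied
}

// base is a fixed reference time so the demo is deterministic.
var base = time.Date(2026, 6, 7, 12, 0, 0, 0, time.UTC)

// applyStatus returns the HTTP status an apply request would receive.
// Checking staleness before expiry is an assumption of this sketch.
func applyStatus(p Plan, now time.Time) int {
	switch {
	case p.Stale:
		return http.StatusConflict // 409: refuse stale plans
	case now.After(p.ExpiresAt):
		return http.StatusGone // 410: refuse expired plans
	default:
		return http.StatusOK
	}
}

func main() {
	fmt.Println(applyStatus(Plan{ExpiresAt: base.Add(time.Hour)}, base))
	fmt.Println(applyStatus(Plan{Stale: true, ExpiresAt: base.Add(time.Hour)}, base))
	fmt.Println(applyStatus(Plan{ExpiresAt: base.Add(-time.Minute)}, base))
}
```

A real apply handler would also re-check permissions and protected mode before reaching this point, as the description states.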

AI assistant

Provider-agnostic OpenAI-compatible layer (OpenAI, Ollama, vLLM, LM Studio), runtime-configurable, API key never echoed.
Interactive sessions with a tool loop: the model investigates with read-only tools (logs, files, metadata, networks, deployments, host info, in-container and host commands) instead of guessing.
Per-session auto-run vs approve-each; mutating fixes surface as suggested actions that run only through the existing guarded APIs.
Analyses grounded in this installation's live config and routing; secrets redacted before anything leaves the host.

All new packages unit-tested; full go test ./... and make build green.

Plans are previews of mutations, stored as inspectable JSON files under the global flatrun state directory, grouped by the resource they affect. Each plan records the requested change, a content hash of every file it read, the predicted changes with create/update/replace/delete verbs and a reason per change, and its lifecycle status. Status moves through available and applying to applied or failed; obsolete marks a plan whose underlying files drifted, and expired marks one whose TTL has passed. Sensitive diffs are masked in API-facing copies while full values stay in the 0600 file, matching how env files are already trusted on disk. A prune loop expires stale plans and removes old history on a configurable retention.
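
The lifecycle described above reads naturally as a small state machine. The sketch below is illustrative only: the status names come from the commit message, but the `Status` type, the `transitions` table and `canMove` are hypothetical, and the exact set of legal moves is an assumption.

```go
package main

import "fmt"

// Status is a plan lifecycle state; the names follow the commit message.
type Status string

const (
	Available Status = "available"
	Applying  Status = "applying"
	Applied   Status = "applied"
	Failed    Status = "failed"
	Obsolete  Status = "obsolete" // underlying files drifted
	Expired   Status = "expired"  // TTL passed
)

// transitions lists the moves this sketch assumes are legal.
// States without an entry are treated as terminal.
var transitions = map[Status][]Status{
	Available: {Applying, Obsolete, Expired},
	Applying:  {Applied, Failed},
}

// canMove reports whether a plan may go from one status to another.
func canMove(from, to Status) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canMove(Available, Applying))
	fmt.Println(canMove(Applied, Applying))
}
```

Keeping the table as data makes it easy to see that an applied, failed, obsolete or expired plan can never be applied again.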

Deployment delete, domain add/update/delete, and config key updates now run through handler-independent functions, and the proxy layer can render a virtual host without writing or reloading. No behavior change. This lets the same execution paths be driven by both live requests and the upcoming plan apply step, and lets plans preview nginx output without side effects.

High-risk mutations (env vars, compose updates, deployment deletion, domains, proxy setup, config keys) can now be called with a plan flag to get a preview instead of execution: what files change with full before/after content, which containers get recreated and why, which certificates and databases are affected. The preview is saved as a plan that can be listed, inspected, applied or discarded. Applying re-checks the caller's permissions and protected mode, refuses plans whose underlying files changed since planning (including edits made outside the API), and refuses expired or already-applied plans. Applying one plan invalidates other open plans on the same resource. Sensitive diffs are masked in responses unless explicitly requested with write-level access. A plan is a prediction: apply can still fail, and the failure is recorded on the plan.
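
Refusing plans whose files changed since planning amounts to re-hashing each recorded file at apply time. The following is a sketch under stated assumptions: SHA-256 is a guess (the source only says "content hash"), and `hashFile`, `drifted` and `demo` are hypothetical helpers, not the agent's functions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// hashFile streams a file through SHA-256 and returns the hex digest.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// drifted returns the recorded paths whose current hash differs from the
// hash stored at planning time. An unreadable file counts as drifted.
func drifted(recorded map[string]string) []string {
	var changed []string
	for path, want := range recorded {
		got, err := hashFile(path)
		if err != nil || got != want {
			changed = append(changed, path)
		}
	}
	return changed
}

// demo plans against a temp file, then edits it "outside the API".
// It returns the number of drifted files before and after the edit.
func demo() (int, int, error) {
	dir, err := os.MkdirTemp("", "plan-sketch")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)

	env := filepath.Join(dir, ".env")
	if err := os.WriteFile(env, []byte("PORT=8080\n"), 0o600); err != nil {
		return 0, 0, err
	}
	sum, err := hashFile(env)
	if err != nil {
		return 0, 0, err
	}
	recorded := map[string]string{env: sum}

	before := len(drifted(recorded))
	if err := os.WriteFile(env, []byte("PORT=9090\n"), 0o600); err != nil {
		return 0, 0, err
	}
	return before, len(drifted(recorded)), nil
}

func main() {
	before, after, err := demo()
	fmt.Println(before, after, err)
}
```

Because the comparison is against file contents rather than API events, an edit made directly on disk is caught the same way as one made through the agent.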

The agent can now talk to any OpenAI-compatible model server (OpenAI, Ollama, vLLM, LM Studio or a gateway), configured at runtime with no restart. The provider sits behind a small internal interface so other backends can be added without touching callers, keeping the platform model-agnostic. First feature: deployment diagnosis. An authenticated caller with read access can ask for an analysis of a deployment's recent logs or of a failed operation's output; the agent assembles the context, strips every known secret value and credential-shaped assignment before anything leaves the host, and returns a structured Markdown diagnosis with evidence and a suggested fix, plus a count of redactions made. Context is truncated newest-first to stay within small local model windows. The model API key is configurable through the config API but never echoed back; the same masking now also covers the shared database, Redis and PowerDNS credentials, which previously appeared in plain text in config reads.
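
One reading of "truncated newest-first" is that the most recent lines are kept and older ones dropped once a size budget is reached. The sketch below assumes that reading; `keepNewest` and the byte-based budget are hypothetical, and the real agent may count tokens or characters differently.

```go
package main

import (
	"fmt"
	"strings"
)

// keepNewest returns the longest suffix of lines whose total size
// (counting one newline per line) fits within budget bytes.
// Order is preserved; the oldest lines are the ones dropped.
func keepNewest(lines []string, budget int) []string {
	used := 0
	start := len(lines)
	for i := len(lines) - 1; i >= 0; i-- {
		cost := len(lines[i]) + 1
		if used+cost > budget {
			break
		}
		used += cost
		start = i
	}
	return lines[start:]
}

func main() {
	logs := []string{"boot", "listening on :8080", "error: db refused"}
	fmt.Println(strings.Join(keepNewest(logs, 40), "\n"))
}
```

Walking backwards from the end means the failure at the tail of a log survives even when the window is small.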

…actions

Plan review is no longer implied as a default workflow. Each deployment gets a require_plan switch, managed like protected mode with admin access: when on, env, compose, delete, domain, proxy and service mutations refuse to run directly and answer with a plan_required error, so both humans and AI agents are forced through the plan, review, apply path for that deployment. Plans themselves can always be created, and the switch takes effect immediately without restart. Individual compose services can now be started, stopped, restarted and rebuilt on their own instead of cycling the whole deployment. These actions are plannable like the rest, previewing which service gets recreated and why, and service rebuild honors protected mode the same way deployment rebuild does.
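
The require_plan behaviour can be summarised as a guard that runs before any direct mutation. This is a sketch only: `Deployment`, `guardMutation` and `ErrPlanRequired` are hypothetical names, and only the `plan_required` error string and the "plans can always be created" rule come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPlanRequired mirrors the plan_required error named in the commit message.
var ErrPlanRequired = errors.New("plan_required")

// Deployment is a hypothetical stand-in carrying only the switch.
type Deployment struct {
	Name        string
	RequirePlan bool
}

// guardMutation decides whether a mutation may proceed.
// planOnly is true when the caller asked for a preview instead of execution.
func guardMutation(d Deployment, planOnly bool) error {
	if planOnly {
		return nil // creating a plan is always allowed
	}
	if d.RequirePlan {
		return ErrPlanRequired // direct execution is refused
	}
	return nil
}

func main() {
	blog := Deployment{Name: "blog", RequirePlan: true}
	fmt.Println(guardMutation(blog, false))
	fmt.Println(guardMutation(blog, true))
}
```

Reading the switch on every request, rather than caching it, is what would let a change take effect without a restart.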

A single service's image can now be pulled fresh from its registry without touching the rest of the deployment. The action is plannable and respects the deployment's require_plan setting like the other service actions; running containers stay on their current image until the next deploy.

A diagnosis can now carry up to a few machine-readable remediation proposals: restart, rebuild or pull a specific service, or run a command inside a service container. The agent never executes them; it validates the shape, rejects unknown action kinds, and drops any suggestion naming a service that does not exist in the compose file, so a hallucinated target can never be acted on. Clients present the proposals for explicit approval and run approved ones through the normal APIs, where access control, protected mode and plan requirements apply unchanged.
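
The validation step described above can be sketched as a filter over the model's proposals. The `Suggestion` type, the kind strings and `filterSuggestions` are hypothetical; in particular this sketch silently drops unknown kinds, whereas the agent may reject them more loudly.

```go
package main

import "fmt"

// Suggestion is a hypothetical shape for one remediation proposal.
type Suggestion struct {
	Kind    string // "restart", "rebuild", "pull" or "exec" (assumed names)
	Service string
	Command string // only meaningful for "exec"
}

// allowedKinds is the closed set of action kinds this sketch accepts.
var allowedKinds = map[string]bool{
	"restart": true, "rebuild": true, "pull": true, "exec": true,
}

// filterSuggestions keeps only well-formed proposals that target a
// service present in the compose file. Nothing is executed here.
func filterSuggestions(in []Suggestion, services map[string]bool) []Suggestion {
	var out []Suggestion
	for _, s := range in {
		if !allowedKinds[s.Kind] {
			continue // unknown action kind
		}
		if !services[s.Service] {
			continue // hallucinated or misspelled service
		}
		if s.Kind == "exec" && s.Command == "" {
			continue // an exec proposal needs a command
		}
		out = append(out, s)
	}
	return out
}

func main() {
	services := map[string]bool{"web": true, "db": true}
	proposals := []Suggestion{
		{Kind: "restart", Service: "web"},
		{Kind: "pull", Service: "cache"}, // not in the compose file
	}
	fmt.Println(filterSuggestions(proposals, services))
}
```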

One pipeline now serves every AI interaction: the caller picks an intent (diagnose, improve, secure or explain) and a list of context sources; the agent gathers each source, redacts secrets, prompts the configured model and returns the analysis with validated suggested actions. Adding a capability is one intent entry and adding a context kind is one source collector, with no new endpoints or flows. Deployment analysis can mix gathered sources (recent logs, the compose file) with output the caller already has, plus a free-text question. Host-level analysis accepts caller-provided output only and never returns runnable suggestions. Status reporting now lists the available intents so clients can render them dynamically.
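
The "one intent entry, one source collector" design suggests two lookup tables feeding a single prompt builder. The sketch below assumes that structure: the intent names come from the text, while `Collector`, the instruction strings, the source names and `buildPrompt` are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// Collector gathers one kind of context for a deployment.
type Collector func(deployment string) (string, error)

// intents maps an intent name to its instruction text (placeholder wording).
var intents = map[string]string{
	"diagnose": "Decide whether the context shows a problem, then explain it.",
	"improve":  "Suggest improvements supported by the context.",
	"secure":   "Point out security issues supported by the context.",
	"explain":  "Explain what the context shows.",
}

// collectors maps a source name to a stub collector for this sketch.
var collectors = map[string]Collector{
	"logs":    func(d string) (string, error) { return "log lines for " + d, nil },
	"compose": func(d string) (string, error) { return "compose file for " + d, nil },
}

// buildPrompt gathers each requested source, redacts it, and appends it
// to the intent's instruction. Unknown intents or sources are errors.
func buildPrompt(intent, deployment string, sources []string, redact func(string) string) (string, error) {
	instruction, ok := intents[intent]
	if !ok {
		return "", fmt.Errorf("unknown intent %q", intent)
	}
	var b strings.Builder
	b.WriteString(instruction)
	for _, name := range sources {
		collect, ok := collectors[name]
		if !ok {
			return "", fmt.Errorf("unknown source %q", name)
		}
		text, err := collect(deployment)
		if err != nil {
			return "", err
		}
		b.WriteString("\n\n## " + name + "\n" + redact(text))
	}
	return b.String(), nil
}

func main() {
	identity := func(s string) string { return s }
	prompt, err := buildPrompt("diagnose", "blog", []string{"logs", "compose"}, identity)
	fmt.Println(prompt, err)
}
```

With this shape, listing the available intents for clients is just iterating the map keys.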

Every analysis now carries a platform context section built from the live agent state: the configured proxy and database network names, the Docker networks that actually exist on the host, whether the shared database and managed reverse proxy are available, and for deployment scopes the domains, databases and virtual host of the deployment under analysis. The system prompt instructs the model to reconcile its recommendations against this section and prefer the platform's way over generic Docker advice, fixing answers that were technically correct but useless on FlatRun, such as recommending creating an arbitrary network where the platform's configured network was meant. The prompt also references the product documentation site, with the URL configurable per installation.

The diagnose intent now decides first whether the context shows a problem at all: normal startups, passing health checks and graceful shutdowns are to be reported as healthy operation, and restarts without errors attributed to operator action, instead of being woven into a speculative failure story. All intents are told that an unsupported finding is wrong and that normal or inconclusive is a valid answer. The platform context now states how each domain is actually routed: which service and container port the reverse proxy forwards to, as configured in deployment metadata, with an explicit note that the compose expose field plays no role in FlatRun routing. This removes the false signal that made an expose/listen port difference look like a root cause on a healthy deployment.

The assistant can now hold a session and look things up for itself instead of relying on pre-gathered context. The model is given read-only investigation tools: list the networks and deployments that actually exist, read a deployment's metadata and files (including app-generated logs and data), and run read-only commands inside service containers. It calls these in a loop, reasoning over real results, until it can answer. Each session chooses how tools run: auto-run, where the agent executes read-only lookups and reports as it goes, or approve-each, where every tool call pauses for the operator to allow or decline. Commands that would change state are refused outright, container exec still honors protected mode, deployment access is re-checked per tool call, and secrets are redacted from every result. Sessions are stored as flat JSON files and pruned by age.
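
The session loop and its two run modes can be sketched as follows. Everything here is a stand-in: `ToolCall`, `Step`, `Mode`, `runSession` and the `list_networks` tool name are hypothetical, the model is a scripted function, and the real agent additionally re-checks access and redacts every result.

```go
package main

import "fmt"

// ToolCall is a request from the model to run one read-only tool.
type ToolCall struct{ Name, Arg string }

// Step is what the model returns each turn: a tool call or a final answer.
type Step struct {
	Call   *ToolCall
	Answer string
}

// Mode selects how tool calls are handled in a session.
type Mode int

const (
	AutoRun     Mode = iota // run read-only lookups without asking
	ApproveEach             // pause for the operator on every call
)

// runSession loops until the model answers or maxSteps is reached.
// Tool results (or refusals) are appended to the history the model sees.
func runSession(
	model func(history []string) Step,
	tools map[string]func(arg string) string,
	mode Mode,
	approve func(ToolCall) bool,
	maxSteps int,
) string {
	var history []string
	for i := 0; i < maxSteps; i++ {
		step := model(history)
		if step.Call == nil {
			return step.Answer
		}
		call := *step.Call
		tool, ok := tools[call.Name]
		switch {
		case !ok:
			history = append(history, "unknown tool: "+call.Name)
		case mode == ApproveEach && !approve(call):
			history = append(history, "declined by operator: "+call.Name)
		default:
			history = append(history, tool(call.Arg))
		}
	}
	return "step limit reached"
}

func main() {
	// A scripted stand-in for the model: one lookup, then an answer.
	model := func(history []string) Step {
		if len(history) == 0 {
			return Step{Call: &ToolCall{Name: "list_networks"}}
		}
		return Step{Answer: "saw: " + history[0]}
	}
	tools := map[string]func(string) string{
		"list_networks": func(string) string { return "proxy, database" },
	}
	allow := func(ToolCall) bool { return true }
	fmt.Println(runSession(model, tools, AutoRun, allow, 5))
}
```

A step limit is an assumption of this sketch; it simply guards against a model that never stops calling tools.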

The assistant now reads container logs through a dedicated tool, since FlatRun logs are captured stdout/stderr rather than files; this stops it from fruitlessly searching the deployment filesystem and running random container commands when asked to review logs. On a fresh analysis it also changes posture: when handed logs or an operation's output it summarizes them, reports any problems with likely solutions, and if deeper investigation could help it describes what it would check and asks first, rather than launching lookups unprompted. The log view hands the assistant the logs already on screen so a simple review needs no tool calls at all.

The assistant can now answer questions about the machine FlatRun runs on, not just its deployments. A get_instance_info tool returns the hostname and public IP, and a run_host_command tool runs free-form READ-ONLY commands on the host for anything the structured tools do not cover. Host commands are gated: the caller must have system access, the global system-terminal protected mode is honored (including its command rules and the disable switch), obviously state-changing commands are refused, and output is redacted before it reaches the model. This closes the gap where the assistant wrongly claimed it could not determine facts like the instance IP.
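
Refusing "obviously state-changing commands" could be done with a pattern that matches command words only at token boundaries. The sketch below is an assumption about one possible approach, not the agent's rule set: the word list, the redirect check and `looksReadOnly` are all hypothetical, and a denylist like this is a coarse first filter rather than a security boundary.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// mutating matches a few state-changing commands as whole tokens:
// at the start of the line or after whitespace or a shell separator,
// and followed by whitespace or the end of the line.
var mutating = regexp.MustCompile(
	`(^|[\s;&|])(rm|mv|dd|kill|reboot|shutdown|chmod|chown)(\s|$)`)

// looksReadOnly reports whether cmd passes this sketch's coarse filter.
// Any output redirection is refused, which is deliberately conservative.
func looksReadOnly(cmd string) bool {
	if strings.Contains(cmd, ">") {
		return false
	}
	return !mutating.MatchString(cmd)
}

func main() {
	for _, cmd := range []string{"df -h", "uptime; reboot", "cat /var/log/alarm"} {
		fmt.Println(cmd, looksReadOnly(cmd))
	}
}
```

The boundary groups are what keep a path such as `/var/log/alarm` from being flagged just because it ends in `rm`.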

…ranscript

When the assistant wants to investigate it now makes the tool call directly instead of describing the command in prose and waiting for a typed yes; in approve-each mode that call is what the operator allows or declines, so the approval lives in the interface, not the chat text. A session turn can also carry bulky context (logs, an operation's output) that the model needs but the operator should not have to scroll past: the model receives it, while the transcript shows only the short prompt. This keeps a log analysis from echoing the entire log back into the conversation.

sourceant · 2026-06-07T22:27:44Z

Code Review Summary

This PR introduces a significant enhancement to the FlatRun agent, enabling AI-native operations via a plan/apply workflow and an interactive assistant. The implementation is robust, well-tested, and security-conscious.

🚀 Key Improvements

Implemented a structured 'Plan' system to allow human/AI review before executing mutations.
Added a provider-agnostic AI layer with built-in secret redaction.
Introduced interactive AI sessions with a tool-loop for grounded system investigation.

💡 Minor Suggestions

Optimization of file hashing for large files.
Stricter regex boundaries for destructive command detection.

sourceant

Review complete. See the overview comment for a summary.

sourceant · 2026-06-07T22:27:44Z

+
+func NewRedactor(secrets []string) *Redactor {


The current implementation of the Redactor creates a new slice and map every time NewRedactor is called. Since secrets are sorted by length and filtered, this is efficient for lookups, but ensure that the list of secrets passed in does not grow unbounded to avoid performance degradation in the regex and replacement loops.

Suggested change

-func NewRedactor(secrets []string) *Redactor {
+func NewRedactor(secrets []string) *Redactor {
+	seen := make(map[string]struct{}, len(secrets))
+	var filtered []string
+	for _, s := range secrets {
+		s = strings.TrimSpace(s)
+		if len(s) < minSecretLength {
+			continue
+		}
+		if _, dup := seen[s]; dup {
+			continue
+		}
+		seen[s] = struct{}{}
+		filtered = append(filtered, s)
+	}
+	sort.Slice(filtered, func(i, j int) bool {
+		return len(filtered[i]) > len(filtered[j])
+	})
+	return &Redactor{secrets: filtered}
+}
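
To show why the constructor sorts secrets longest-first, here is a sketch of how such a redactor might be used. `NewRedactor` follows the suggestion above; the `Redact` method, the `[REDACTED]` marker and the value of `minSecretLength` are assumptions made for illustration and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// minSecretLength is an assumed threshold; the PR does not show its value.
const minSecretLength = 4

// Redactor holds secrets deduplicated and sorted longest-first.
type Redactor struct{ secrets []string }

// NewRedactor mirrors the constructor from the review suggestion.
func NewRedactor(secrets []string) *Redactor {
	seen := make(map[string]struct{}, len(secrets))
	var filtered []string
	for _, s := range secrets {
		s = strings.TrimSpace(s)
		if len(s) < minSecretLength {
			continue
		}
		if _, dup := seen[s]; dup {
			continue
		}
		seen[s] = struct{}{}
		filtered = append(filtered, s)
	}
	sort.Slice(filtered, func(i, j int) bool {
		return len(filtered[i]) > len(filtered[j])
	})
	return &Redactor{secrets: filtered}
}

// Redact is a hypothetical method: it replaces every known secret and
// returns the cleaned text plus the number of replacements made.
func (r *Redactor) Redact(text string) (string, int) {
	count := 0
	for _, s := range r.secrets {
		if n := strings.Count(text, s); n > 0 {
			count += n
			text = strings.ReplaceAll(text, s, "[REDACTED]")
		}
	}
	return text, count
}

func main() {
	r := NewRedactor([]string{"hunter2", "hunter2-prod-key", "hunter2"})
	fmt.Println(r.Redact("key=hunter2-prod-key pw=hunter2"))
}
```

If the shorter secret were replaced first, the longer one would be left as `[REDACTED]-prod-key`, leaking its suffix; sorting by descending length avoids that.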

nfebe added 14 commits June 7, 2026 13:26

sourceant Bot reviewed Jun 7, 2026


nfebe merged commit 25b2aa2 into main Jun 7, 2026
5 checks passed

nfebe deleted the feat/ai-native branch June 7, 2026 22:34

nfebe merged 14 commits into main from feat/ai-native

