Add opt-in RBAC management UI via rbacManagement flag#4865
Add opt-in RBAC management UI via rbacManagement flag#4865dimitri-nicolo wants to merge 10 commits into
Conversation
504568c to
8239694
Compare
| // that need a tenant-scoped Manager (the manager controller itself) must use | ||
| // GetManager from pkg/controller/manager. Non-manager controllers reading | ||
| // zero-tenant-only flags (e.g. spec.rbacManagement) belong here. | ||
| func GetZeroTenantManagerOrNil(ctx context.Context, c client.Client) (*operatorv1.Manager, error) { |
There was a problem hiding this comment.
Can't we move the GetManager func from manager_controller to the utils package and re-use it? There is no reason that apiserver should throw an error if manager is not present. We should however only fetch it when enterprise CRDs exist and we are not running in cloud.
There was a problem hiding this comment.
I see that you moved the func to here, but I also meant to delete this func entirely as well.
There was a problem hiding this comment.
I think the code is more clear if we delete this method, remove any notion of "zero tenant" from the code including comments (it's not yargon that is used today) and use the multitenant booleans explicitly at the caller. I think that will be more understandable to the average reader.
| // +optional | ||
| ManagerDeployment *ManagerDeployment `json:"managerDeployment,omitempty"` | ||
|
|
||
| // RBAC configures the RBAC management UI feature. Only honored in |
There was a problem hiding this comment.
I don't think we should leak cloud (internal) implications into the API comments. Our users don't need to be aware/think about those.
| // Disabled. | ||
| // +optional | ||
| // +kubebuilder:validation:Enum=Enabled;Disabled | ||
| Mode RBACMode `json:"mode,omitempty"` |
There was a problem hiding this comment.
What do you think of the following field names?
- Management: Enabled|Disabled
- UI: Enabled|Disabled
or values: Visible | Hidden
b9493a6 to
94159c6
Compare
|
|
||
| // RBAC management UI: when enabled on the Manager CR, the rbacsync | ||
| // controller runs inside calico-kube-controllers to reconcile the | ||
| // catalog of managed ClusterRoles (per-tier + 32 fine-grained + 6 |
There was a problem hiding this comment.
nit: This number is not likely to hold up over time, best to keep the comment succinct.
|
General comment: the AI generated comments in this code are a bit excessive. An extreme example may be the permissions. The func explains in comments twice which permissions are added + the code to add them. They are making the code less readable. And now we need to also keep the comments up to date when we add more. The operator code is not super complicated, I'd much rather have comments be only used if it adds critical information you couldn't get from simply reading the code (or have your ai read it for you). |
| }, | ||
| // Escalation coverage for webhook-mod resource roles (create/patch on | ||
| // Secrets to wire up webhook auth). | ||
| rbacv1.PolicyRule{ |
There was a problem hiding this comment.
Could we be more specific with secrets and configmaps and bind them to the namespaces we actually need?
There was a problem hiding this comment.
The named-resource access moved to a new Role + RoleBinding in calico-system, where tigera-idp-groups and tigera-idp-ldap-config actually live. The broad create is bound to that one namespace now.
Dropped the tigera-known-oidc-users rule (it lives in tigera-elasticsearch and is already granted by logstorage.go), and tightened the activation gate to !Tenant.MultiTenant() since the IdP resources are pinned to calico-system and the RBAC UI is zero-tenant-only anyway.
Is that what you meant by "bind them to the namespaces we actually need" — or were you pointing at something narrower?
8833288 to
2b749da
Compare
| // management UI's directory cache and the cascading cleanup of managed | ||
| // bindings when a group is removed). | ||
| { | ||
| APIGroups: []string{""}, |
There was a problem hiding this comment.
This one should be in a role.
2b749da to
6af6ea3
Compare
| return reconcile.Result{}, err | ||
| } | ||
|
|
||
| if instance.Spec.Variant.IsEnterprise() { |
There was a problem hiding this comment.
Wondering if you found the variant check to be necessary here?
| rbacv1.PolicyRule{ | ||
| APIGroups: []string{"rbac.authorization.k8s.io"}, | ||
| Resources: []string{"clusterroles", "clusterrolebindings"}, | ||
| Verbs: []string{"escalate", "bind"}, |
There was a problem hiding this comment.
Is it possible to avoid giving escalate and bind privileges? It essentially means that this controller can become a superuser if it chooses to be (i.e. is compromised)
| rules = append(rules, rbacv1.PolicyRule{ | ||
| APIGroups: []string{"rbac.authorization.k8s.io"}, | ||
| Resources: []string{"clusterroles", "roles", "clusterrolebindings", "rolebindings"}, | ||
| Verbs: []string{"get", "list", "watch", "create", "update", "patch", "delete", "bind", "escalate"}, |
There was a problem hiding this comment.
Similar to previous comment - can we avoid bind and escalate? This one seems a bit more serious since it is given to users
| rbacv1.PolicyRule{ | ||
| APIGroups: []string{""}, | ||
| Resources: []string{"configmaps"}, | ||
| ResourceNames: []string{"tigera-known-oidc-users"}, |
There was a problem hiding this comment.
Does this controller access this config map? I thought only the ui-apis endpoint would consult this config map when listing users
| // UI adds to calico-manager-role. Named-resource access is scoped separately | ||
| // on rbacManagementUINamespacedRole. | ||
| func rbacManagementUIRules() []rbacv1.PolicyRule { | ||
| rules := RBACManagementEscalationRules() |
There was a problem hiding this comment.
Does the manager use its own SA when creating RBAC via the UI? Or is it impersonating the user?
| }) | ||
| } | ||
|
|
||
| if c.cfg.Manager.RBACManagementEnabled() && c.cfg.Authentication != nil && c.cfg.Authentication.Spec.OIDC != nil { |
There was a problem hiding this comment.
Why do we gate on OIDC being enabled? I'm wondering what that has do to with the user deciding they want to do an LDAP sync for RBAC UI.
To me I would have expected that this block conditionally renders depending on whether the LDAP secret is placed by the user, and that we could parse the LDAP URL to scope the network policy (we do a similar thing for guardian policy)
…ileges (PR tigera#4865 review) Addresses Pasan's review comment on the tigera-network-admin escalate/bind grant. The RBAC management UI must never let an admin grant more than they already hold, so this grants NO escalate and NO bind. ui-apis performs the /team/* RBAC writes by impersonating the calling user, so the kube-apiserver runs its privilege-escalation check against the admin's own permissions. Without escalate/bind that means: an admin can assign a calico-ui- role to a group only if they personally hold every rule in it, and can edit a role only if they already hold its rules — otherwise the apiserver returns 403. The UI is a convenience layer over the user's own privileges, not an amplifier; deciding which admins may delegate which capabilities is then a matter of what those admins are granted, under the operator's control. The grant is also narrowed to what the handlers actually do (verified against the /team/* implementation): roles are read-only (the UI lists/inspects the catalog, never authors roles); clusterrolebindings get create/update/delete for group and member writes; rolebindings are never created (only cluster-scoped groups are), so they get update/delete only. Gated behind the RBAC-UI feature flag. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…issions instead (PR tigera#4865 review) Addresses Pasan's review comment that the rbacsync controller can become a superuser if compromised, via cluster-wide escalate/bind on RBAC objects. The verbs aren't actually needed: rbacSyncControllerRules already grants the SA the full permission set the calico-ui- roles contain (RBACManagementEscalationRules), so it satisfies Kubernetes' privilege-escalation check by *holding* the rules it writes, rather than by escalate/bind. Removing the verbs means a compromised SA can only assign permissions it already has — it cannot mint an arbitrary cluster-admin role. This relies on RBACManagementEscalationRules staying a superset of the managed roles' contents (which live in calico/kube-controllers). That drift risk is accepted here and is better caught by E2Es than guarded by a standing escalate grant. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing on variant (PR tigera#4865 review) Addresses Pasan's review comment asking whether the variant check before the GetManager read is necessary. It was — but it was the wrong tool. The Manager type is registered in the scheme unconditionally, so on a Calico (OSS) cluster the cached client returns a NoMatchError (not NotFound) when the Manager CRD is absent; the previous error check only tolerated IsNotFound, so without the variant guard the installation controller would degrade on OSS. Drop the redundant variant check (the block already sits under r.enterpriseCRDsExist) and instead tolerate meta.IsNoMatchError alongside IsNotFound, mirroring how this same function handles the calico-system Tier read a few lines below. This is more robust than the variant guard: it also covers the case where enterpriseCRDsExist is true but the Manager CRD has not yet registered. managerCR stays nil on those paths and RBACManagementEnabled() is nil-safe, so behavior is unchanged on OSS. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… SA (PR tigera#4865 review) Addresses Pasan's review comment asking whether the kube-controllers (rbacsync) controller actually accesses the tigera-known-oidc-users ConfigMap — he believed only the ui-apis endpoint consults it. Verified against the RBAC management UI feature branches (calico-private kube-controllers PR #12077 and ui-apis PR #12088): the rbacsync controller never reads this ConfigMap (no direct CoreV1().ConfigMaps() access and no rbaccache path to it; the only ConfigMap it reads is tigera-idp-groups, covered by its own namespaced Role). The actual consumers are in ui-apis — the user-handler middleware writes it and idp/reader.go reads it to surface idp-group members at /team/groups time — using ui-apis's own identity in the tigera-elasticsearch namespace, not the kube-controllers SA. The legitimate kube-controllers/authorization access is already granted separately by logstorage.go. So this rule was dead RBAC on the kube-controllers SA, also gated on the unrelated RBACManagementEnabled flag. Remove it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…not OIDC (PR tigera#4865 review) Addresses Pasan's review comment: the manager's LDAP/AD egress (389/636) was gated on Authentication.Spec.OIDC != nil, which is the wrong signal. Verified against the RBAC management UI feature branches (ui-apis PR #12088): the LDAP dial is performed by ui-apis's rbacmanagement/idp client for the /team/idp-groups directory sync, gated solely on the presence of the tigera-idp-ldap-config Secret (in calico-system) — it does not consult the Authentication CR's OIDC connector at all (that path never dials LDAP from this pod). So the old gate opened the port for pure-OIDC setups that never use it and, worse, withheld it for LDAP-sync setups that need it, leaving the feature broken. Gate the egress on the secret instead: - manager controller discovers the tigera-idp-ldap-config Secret in calico-system and watches it (so the policy re-renders on add/remove), passing presence into the render config as RBACManagementLDAPConfigured. - the egress now renders iff RBACManagementEnabled && RBACManagementLDAPConfigured. Added focused tests including a regression guard that OIDC-configured + no LDAP secret produces no egress. Scoping the destination to the configured LDAP host remains tracked in EV-6665 (it needs the same secret read). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add Manager.spec.rbac.ui (Enabled/Disabled enum, defaults to Disabled), the RBAC/RBACUI types, and the nil-safe Manager.RBACManagementEnabled() helper. Regenerate deepcopy methods and the Manager CRD manifest. This is the opt-in toggle for the RBAC management UI feature; the render and controller wiring that consume it follow in subsequent commits. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add rbac_management.go with the escalation-prevention rule set shared by the calico-manager and calico-kube-controllers roles, and gate the feature's RBAC on Manager.spec.rbac.ui == Enabled: - manager: extend calico-manager-role with the cluster-scoped UI rules and add a calico-system Role+RoleBinding scoping Secret/ConfigMap access to the IdP resources; both gated on !MultiTenant() (the feature is zero-tenant-only). Set RBAC_UI_ENABLED on ui-apis, and allow LDAP egress when OIDC authentication is configured. - apiserver: extend tigera-network-admin with the manage-roles rule. - kube-controllers: enable the rbacsync controller, grant its rules, and add a calico-system Role+RoleBinding reading tigera-idp-groups. The Manager and Authentication CRs are surfaced through the render configs here; the controllers that populate them follow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move GetManager from the manager controller into pkg/controller/utils so the apiserver and installation controllers can read the Manager CR without importing the manager controller package. Behaviour is unchanged: it keys on tigera-secure, namespacing the lookup only in multi-tenant mode, and returns the raw client error so callers decide whether absence is fatal. Update the manager controller and its tests to the new location. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Read the Manager CR in the installation and apiserver controllers and pass RBACManagementEnabled into their render configs, and watch the Manager CR so toggling spec.rbac re-runs each reconcile. The installation controller only reads it for the enterprise variant; absence is treated as RBAC management disabled. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ileges (PR tigera#4865 review) Addresses Pasan's review comment on the tigera-network-admin escalate/bind grant. The RBAC management UI must never let an admin grant more than they already hold, so this grants NO escalate and NO bind. ui-apis performs the /team/* RBAC writes by impersonating the calling user, so the kube-apiserver runs its privilege-escalation check against the admin's own permissions. Without escalate/bind that means: an admin can assign a calico-ui- role to a group only if they personally hold every rule in it, and can edit a role only if they already hold its rules — otherwise the apiserver returns 403. The UI is a convenience layer over the user's own privileges, not an amplifier; deciding which admins may delegate which capabilities is then a matter of what those admins are granted, under the operator's control. The grant is also narrowed to what the handlers actually do (verified against the /team/* implementation): roles are read-only (the UI lists/inspects the catalog, never authors roles); clusterrolebindings get create/update/delete for group and member writes; rolebindings are never created (only cluster-scoped groups are), so they get update/delete only. Gated behind the RBAC-UI feature flag. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…issions instead (PR tigera#4865 review) Addresses Pasan's review comment that the rbacsync controller can become a superuser if compromised, via cluster-wide escalate/bind on RBAC objects. The verbs aren't actually needed: rbacSyncControllerRules already grants the SA the full permission set the calico-ui- roles contain (RBACManagementEscalationRules), so it satisfies Kubernetes' privilege-escalation check by *holding* the rules it writes, rather than by escalate/bind. Removing the verbs means a compromised SA can only assign permissions it already has — it cannot mint an arbitrary cluster-admin role. This relies on RBACManagementEscalationRules staying a superset of the managed roles' contents (which live in calico/kube-controllers). That drift risk is accepted here and is better caught by E2Es than guarded by a standing escalate grant. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing on variant (PR tigera#4865 review) Addresses Pasan's review comment asking whether the variant check before the GetManager read is necessary. It was — but it was the wrong tool. The Manager type is registered in the scheme unconditionally, so on a Calico (OSS) cluster the cached client returns a NoMatchError (not NotFound) when the Manager CRD is absent; the previous error check only tolerated IsNotFound, so without the variant guard the installation controller would degrade on OSS. Drop the redundant variant check (the block already sits under r.enterpriseCRDsExist) and instead tolerate meta.IsNoMatchError alongside IsNotFound, mirroring how this same function handles the calico-system Tier read a few lines below. This is more robust than the variant guard: it also covers the case where enterpriseCRDsExist is true but the Manager CRD has not yet registered. managerCR stays nil on those paths and RBACManagementEnabled() is nil-safe, so behavior is unchanged on OSS. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… SA (PR tigera#4865 review) Addresses Pasan's review comment asking whether the kube-controllers (rbacsync) controller actually accesses the tigera-known-oidc-users ConfigMap — he believed only the ui-apis endpoint consults it. Verified against the RBAC management UI feature branches (calico-private kube-controllers PR #12077 and ui-apis PR #12088): the rbacsync controller never reads this ConfigMap (no direct CoreV1().ConfigMaps() access and no rbaccache path to it; the only ConfigMap it reads is tigera-idp-groups, covered by its own namespaced Role). The actual consumers are in ui-apis — the user-handler middleware writes it and idp/reader.go reads it to surface idp-group members at /team/groups time — using ui-apis's own identity in the tigera-elasticsearch namespace, not the kube-controllers SA. The legitimate kube-controllers/authorization access is already granted separately by logstorage.go. So this rule was dead RBAC on the kube-controllers SA, also gated on the unrelated RBACManagementEnabled flag. Remove it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…not OIDC (PR tigera#4865 review) Addresses Pasan's review comment: the manager's LDAP/AD egress (389/636) was gated on Authentication.Spec.OIDC != nil, which is the wrong signal. Verified against the RBAC management UI feature branches (ui-apis PR #12088): the LDAP dial is performed by ui-apis's rbacmanagement/idp client for the /team/idp-groups directory sync, gated solely on the presence of the tigera-idp-ldap-config Secret (in calico-system) — it does not consult the Authentication CR's OIDC connector at all (that path never dials LDAP from this pod). So the old gate opened the port for pure-OIDC setups that never use it and, worse, withheld it for LDAP-sync setups that need it, leaving the feature broken. Gate the egress on the secret instead: - manager controller discovers the tigera-idp-ldap-config Secret in calico-system and watches it (so the policy re-renders on add/remove), passing presence into the render config as RBACManagementLDAPConfigured. - the egress now renders iff RBACManagementEnabled && RBACManagementLDAPConfigured. Added focused tests including a regression guard that OIDC-configured + no LDAP secret produces no egress. Scoping the destination to the configured LDAP host remains tracked in EV-6665 (it needs the same secret read). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2d72d96 to
4aee36e
Compare
…ks bundles Refresh the bundled enterprise CRDs after rebasing onto the latest master. make gen-files picks up upstream schema changes: istioDSCPMark becomes int-or-string (anyOf integer/string) on felixconfigurations, and ipamblocks allocations drops a stale controller-gen TODO and reorders nullable/type. No operator code change; satisfies make dirty-check / validate-gen-versions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Description
Type: New feature (Calico Enterprise only). Addresses PMREQ-824.
Adds an opt-in RBAC management UI feature, toggled by a new
Manager.spec.rbac.uifield (Enabled|Disabled, defaults toDisabled). When enabled, the operator renders the additional RBAC and network access the UI needs to let an administrator manage role/group assignments from the Manager: maintaining a catalog of managedClusterRoles, binding IdP groups to roles via therbacsynccontroller, and resolving groups from an external LDAP/AD directory.This should be merged because it is the operator-side enablement for the RBAC management UI; without it the UI's backend has no RBAC or egress to function. Gating on
spec.rbac.uikeeps the feature — and its escalation-capable permissions — entirely dormant on clusters that don't opt in.What the flag turns on, by layer
api/v1/manager_types.go) — newRBACspec struct with aUI RBACUIenum field and a nil-safeManager.RBACManagementEnabled()helper (returnsfalsefor a nil Manager or any value other thanEnabled). Regenerated deepcopy and the Manager CRD.calico-managerrole (pkg/render/manager.go): adds the cluster-scoped RBAC management UI rules (full CRUD on managed RBAC objects + escalation-prevention coverage for tier/resource roles), plus a namespacedRole+RoleBindingincalico-systemthat scopes thecreate/named-resource access to the IdPConfigMap/Secrets (tigera-idp-groups,tigera-idp-ldap-config) instead of granting it cluster-wide. SetsRBAC_UI_ENABLEDon theui-apiscontainer and opens LDAP/AD egress (389/636) when OIDC authentication is configured.calico-kube-controllers(pkg/render/kubecontrollers/kube-controllers.go): enables therbacsynccontroller and its RBAC only when the flag is set (dormant otherwise), and adds a namespacedRole+RoleBindinggrantingrbacsyncread on thetigera-idp-groupsConfigMapincalico-system.tigera-network-admin(pkg/render/apiserver.go): grants thebind/escalate-capable rule a network-admin needs to assign managed roles via the UI, also gated on the flag.pkg/render/rbac_management.goso the manager and kube-controllers roles stay aligned (the SA writing a managed role must already hold every permission it grants, or K8s rejects the write as privilege escalation).Why the installation and apiserver controllers read the Manager CR
The flag conceptually belongs to the Manager, but two of the resources it gates are not rendered by the manager controller:
rbacsynccontroller runs insidecalico-kube-controllers, rendered by the installation controller.tigera-network-adminClusterRole is rendered by the apiserver controller.So each of those controllers reads the Manager CR itself and passes a
RBACManagementEnabledbool into its render config. To support this,GetManagerwas moved out of the manager controller intopkg/controller/utilsso the apiserver and installation controllers can reuse it without importing the manager controller package. Both callers treat a missing Manager as not-found (feature off) rather than an error, and the installation controller only reads it for the enterprise variant.Both controllers also add a watch on the
ManagerCR. Without it, togglingspec.rbac.uiwould not re-trigger an installation/apiserver reconcile, so therbacsynccontroller and the network-admin rule wouldn't appear or disappear until some unrelated event re-ran those controllers.Per-cluster behavior
The feature is keyed off each cluster's own
Manager.spec.rbac.ui, so a management cluster and a managed cluster opt in independently. Calico Enterprise evaluates a user's managed-cluster actions against that managed cluster's own RBAC (Voltron proxies the request to the managed cluster's apiserver), so the network-adminbind/escalategrant and the manager-SA rules are intentionally rendered on managed clusters too — that is where a user managing per-cluster RBAC needs them. In multi-tenant management clusters the manager-side rules and namespaced Role are not rendered (gated on!MultiTenant()).Testing: unit tests added for all three render surfaces (manager, apiserver, kube-controllers) covering the enabled/disabled gate and the multi-tenant exclusion;
make gen-filesproduces no drift;go build ./...and the affectedpkg/render*/pkg/controller/apiserversuites pass.Components affected: apiserver (
tigera-network-admin), manager (calico-managerrole + namespaced Role + network policy), kube-controllers (rbacsync), Manager CRD, installation & apiserver controllers,pkg/controller/utils.Release Note
For PR author
make gen-filesmake gen-versionsFor PR reviewers
A note for code reviewers - all pull requests must have the following:
kind/bugif this is a bugfix.kind/enhancementif this is a a new feature.enterpriseif this PR applies to Calico Enterprise only.