ROX-35008: Add GH action to add VMs to existing OCP clusters #21060
vikin91 wants to merge 4 commits
This change is part of a stack managed by git-spice.
Walkthrough
This PR adds a complete GitHub Actions-driven workflow for provisioning RHEL VMs on ACS clusters and installing a StackRox VM agent. It includes a dispatch-triggered workflow, a composite action for infrastructure orchestration, Bash scripts for VM creation, SSH access, and agent installation, and Quadlet systemd unit files for container-based agent deployment.
Changes: VM Provisioning and Agent Installation Workflow
Estimated code review effort: 4 (Complex), ~45 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
Build Images Ready
Images are ready for commit 332f73c. To use with deploy scripts: `export MAIN_IMAGE_TAG=4.12.x-136-g332f73ce15`
@coderabbitai full review
Action performed: Full review finished.
Actionable comments posted: 4
🧹 Nitpick comments (1)
scripts/ci/add-vms/action.yml (1)
71-86: Low value. Consider certificate verification when downloading virtctl.
Line 83 uses `curl -k` (insecure) to download virtctl from the cluster's ConsoleCLIDownload URL. While this is typically necessary for in-cluster service endpoints, it bypasses certificate validation and could expose the download to MitM attacks if the cluster or network is compromised.
If the cluster provides a valid certificate or a trusted CA bundle is available, prefer validating the connection. Otherwise, document this security tradeoff.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/ci/add-vms/action.yml` around lines 71 - 86, The download step "Install virtctl from cluster" currently uses curl -k which disables certificate verification; change it to validate TLS by removing -k and instead allow supplying a CA bundle (e.g., via an environment variable like CA_BUNDLE or KUBECONFIG_CACERT) to curl with --cacert, and only fall back to an explicit opt-in (e.g., SKIP_CERT_VERIFY=true) to keep -k for environments that truly require it; update the shell logic around the DOWNLOAD_URL retrieval and the curl invocation to use the CA_BUNDLE variable when present, and ensure the step still writes the downloaded binary to /usr/local/bin/virtctl and sets executable permissions for virtctl.
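The opt-in logic suggested for the virtctl download could look roughly like the sketch below. It is not the PR's code: `CA_BUNDLE` and `SKIP_CERT_VERIFY` are the example variable names from the review comment, and `download_virtctl` is a hypothetical helper name.

```shell
# Sketch only: CA_BUNDLE and SKIP_CERT_VERIFY are example names taken from the
# review comment, and download_virtctl is a hypothetical helper, not PR code.

# Download virtctl from URL ($1) to DEST ($2) and mark it executable.
# The body runs in a subshell so that `set --` does not touch the caller's
# positional parameters.
download_virtctl() (
  url="$1"
  dest="$2"
  # Start with no extra TLS arguments: curl then verifies the server
  # certificate against the system trust store.
  set --
  if [ -n "${CA_BUNDLE:-}" ]; then
    # Verify against the supplied CA bundle.
    set -- --cacert "${CA_BUNDLE}"
  elif [ "${SKIP_CERT_VERIFY:-false}" = "true" ]; then
    # Explicit opt-in: disable verification only when asked to.
    echo "WARNING: TLS certificate verification is disabled" >&2
    set -- -k
  fi
  curl -fsSL "$@" -o "${dest}" "${url}" && chmod +x "${dest}"
)
```

In the action step this would be called as `download_virtctl "${DOWNLOAD_URL}" /usr/local/bin/virtctl`. With neither variable set, the download fails on an untrusted certificate instead of silently accepting it.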
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/add-vms-to-cluster.yml:
- Around line 49-50: The Checkout step currently uses actions/checkout without
disabling credential persistence; update the step that has uses:
actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 (the "Checkout" step)
to add the input persist-credentials: false so the GitHub token is not written
to .git/config and credentials are not persisted in the workspace.
In `@scripts/ci/add-vms/install-agent-quadlet.sh`:
- Around line 60-61: The temporary directory created in install-agent-quadlet.sh
(RENDERED_QUADLET_DIR="$(mktemp -d)") is never removed; add a cleanup trap that
rm -rfs "$RENDERED_QUADLET_DIR" on EXIT (and on ERR if desired) so the temp dir
is always removed after cp -a "${QUADLET_DIR}/." "${RENDERED_QUADLET_DIR}/";
implement this by defining a cleanup function and registering it with trap
(e.g., trap cleanup EXIT) near where RENDERED_QUADLET_DIR is created to ensure
no resource leak.
- Around line 75-94: The idempotency check in installed_image_tag_matches only
verifies the container Image= line and can skip reinstallation if the container
file exists but the timer/service or prep script weren't installed; modify
installed_image_tag_matches to also verify the presence and/or enabled state of
the agent systemd artifacts before returning true: after validating the
Image=${IMAGE_TAG} line (use IMAGE_TAG/installed_image/desired_line as currently
done), SSH into the VM (reuse the existing virtctl SSH invocation parameters:
NAMESPACE, AUTOMATION_SSH_PRIVKEY, SSH_USER) and assert that the roxagent timer
and service unit files (e.g., the roxagent.timer and roxagent-prep.service names
used in install.sh) and the prep script/binary installed by install.sh exist and
are executable (or that systemctl reports the timer/service enabled), and only
return success when both the image line and these files/services are present and
correct.
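The two findings above for `install-agent-quadlet.sh` can be sketched as follows. This is an illustration under assumptions, not the script's actual code: `run_in_vm` is a hypothetical wrapper around the existing `virtctl ssh` invocation, and `CONTAINER_UNIT_PATH`, `REQUIRED_FILES`, and `PREP_SCRIPT_PATH` are assumed variable names for the installed artifacts.

```shell
# Sketch only. run_in_vm is a hypothetical wrapper around the script's existing
# `virtctl ssh` call; CONTAINER_UNIT_PATH, REQUIRED_FILES and PREP_SCRIPT_PATH
# are assumed names, not variables known to exist in the PR.

# Copy the Quadlet sources into a scratch directory, run the given command with
# that directory appended as its last argument, and always remove the scratch
# directory afterwards. The body runs in a subshell, so the EXIT trap fires when
# the function finishes, on success and on failure alike.
with_rendered_quadlets() (
  quadlet_dir="$1"
  shift
  rendered_dir="$(mktemp -d)" || exit 1
  trap 'rm -rf "${rendered_dir}"' EXIT
  cp -a "${quadlet_dir}/." "${rendered_dir}/" || exit 1
  "$@" "${rendered_dir}"
)

# Succeed only when the installed Image= line matches IMAGE_TAG and every
# expected unit file and the prep script are present inside the VM.
installed_state_matches() {
  local desired_line installed_line file
  desired_line="Image=${IMAGE_TAG}"
  installed_line="$(run_in_vm "grep -m1 '^Image=' '${CONTAINER_UNIT_PATH}'")" || return 1
  [ "${installed_line}" = "${desired_line}" ] || return 1
  # REQUIRED_FILES is a space-separated list, so it is left unquoted on purpose.
  for file in ${REQUIRED_FILES}; do
    run_in_vm "test -f '${file}'" || return 1
  done
  run_in_vm "test -x '${PREP_SCRIPT_PATH}'"
}
```

Scoping the trap to a subshell keeps it from replacing any EXIT trap the calling script may already register. Checking the unit files in addition to the `Image=` line means a partially completed install is retried rather than skipped.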
In `@scripts/ci/add-vms/install-virt-operator.sh`:
- Around line 129-156: After annotating the HyperConverged CR to add the VSOCK
gate (use the vsock_patch / HCO_NAME / OLM_NAMESPACE block), add a wait that
ensures the virt-handler DaemonSet has rolled out with the new config before
proceeding: after detecting VSOCK in kv_gates, call the same rollout check used
in tests/e2e/lib.sh (i.e., wait for the daemonset/virt-handler in the KubeVirt
namespace to reach desired number of available pods or use kubectl rollout
status) and fail via die if the rollout does not complete within a timeout;
ensure this check references the virt-handler daemonset name and the appropriate
KubeVirt namespace.
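A rollout wait of the kind requested above might look like this minimal sketch. The helper name `wait_for_virt_handler_rollout` is hypothetical, the `die` shown here is only a stand-in for the script's own helper, and the namespace is passed in by the caller because the review comment does not name it.

```shell
# Sketch only: wait_for_virt_handler_rollout is a hypothetical helper name, and
# this die is a stand-in for the helper the script already provides.
die() {
  echo "ERROR: $*" >&2
  exit 1
}

# Block until the virt-handler DaemonSet in NAMESPACE ($1) has rolled out,
# or fail after TIMEOUT ($2, default 300s).
wait_for_virt_handler_rollout() {
  local namespace timeout
  namespace="$1"
  timeout="${2:-300s}"
  if ! kubectl rollout status daemonset/virt-handler \
      --namespace "${namespace}" --timeout="${timeout}"; then
    die "virt-handler did not roll out in ${namespace} within ${timeout}"
  fi
}
```

The call would go right after the VSOCK gate is detected in `kv_gates`, so that VM creation does not start while virt-handler pods are still restarting with the new configuration.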
Files selected for processing (13)
- .github/workflows/add-vms-to-cluster.yml
- scripts/ci/add-vms/action.yml
- scripts/ci/add-vms/add-vms.sh
- scripts/ci/add-vms/deploy-vms.sh
- scripts/ci/add-vms/install-agent-native.sh
- scripts/ci/add-vms/install-agent-quadlet.sh
- scripts/ci/add-vms/install-virt-operator.sh
- scripts/ci/add-vms/quadlet/README.md
- scripts/ci/add-vms/quadlet/install.sh
- scripts/ci/add-vms/quadlet/roxagent-prep.service
- scripts/ci/add-vms/quadlet/roxagent-tmpfiles.conf
- scripts/ci/add-vms/quadlet/roxagent.container
- scripts/ci/add-vms/quadlet/roxagent.timer
Split the add-vms work so the base action branch contains the workflow, VM provisioning, virt operator setup, and native agent path only, leaving the Quadlet mode for a stacked follow-up PR.
User request: "okay, we split that. All the non-quadlet action goes to `piotr/ROX-35008-action-add-VMs` all quadlet related to `piotr/ROX-35008-VM-action-quadlet`. Note that the branches are on top of each other. Split that cleanly. You can commit".
Partially generated by AI.
Co-authored-by: Cursor <cursoragent@cursor.com>
Force-pushed from bb8db35 to ff281a0.
@coderabbitai full review
Action performed: Full review finished.
@vikin91: The following tests failed.
Description
Add a GitHub Actions workflow and supporting shell scripts to provision RHEL VMs on an existing OCP cluster and install the native roxagent on them.
This PR is now the native-only base of the stack:
- `/tmp/roxroot` view for scanning
- `workflow_dispatch` action for running the flow from GitHub Actions
Follow-ups
- `roxagent` on the fly
User-facing documentation
Testing and quality
Automated testing
How I validated my change
Running the action:
- Adding a new VM to an existing cluster that already has another VM (KVM and the virt operator are already installed)
  Logs: https://github.com/stackrox/stackrox/actions/runs/27331925416
- Copying to a totally fresh cluster (no ACS):
  Logs: https://github.com/stackrox/stackrox/actions/runs/27337049006 (the action succeeded, but roxagent had a minor failure: the wrong binary was uploaded)
  Logs for the rerun (to overwrite the roxagent): https://github.com/stackrox/stackrox/actions/runs/27338863169/job/80770148617#logs
Manually by running the bash scripts: