Refactor 703 #728
Draft
kvinwang wants to merge 67 commits into
added 30 commits
June 12, 2026 04:14
Unified dstack images now ship both the TDX firmware (ovmf.fd) and the AMD SEV firmware (ovmf-sev.fd), the latter referenced by a new "bios-sev" field in metadata.json. Add ImageInfo::bios_sev and an Image::firmware(is_amd_sev_snp) helper that returns bios-sev for SEV-SNP guests (falling back to bios) and bios for TDX. Use it both when launching QEMU (-bios) and when computing the SEV-SNP OVMF launch measurement, so the measured firmware always matches the launched one. TDX behaviour is unchanged; images without bios-sev fall back to bios.
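The firmware selection described above can be sketched as follows. This is a minimal illustration, not the actual dstack code: the `ImageInfo` struct here is a trimmed-down, hypothetical stand-in, and the commit places the helper on `Image`, whose exact signature and return type are not shown in the message.

```rust
use std::path::PathBuf;

/// Hypothetical, trimmed-down stand-in for the image metadata.
#[derive(Debug, Clone)]
pub struct ImageInfo {
    /// TDX firmware (the `bios` field in metadata.json).
    pub bios: Option<PathBuf>,
    /// SEV firmware (the `bios-sev` field in metadata.json); absent on older images.
    pub bios_sev: Option<PathBuf>,
}

impl ImageInfo {
    /// Returns the firmware that should be both launched and measured.
    /// SEV-SNP guests prefer `bios_sev` and fall back to `bios`;
    /// TDX guests always use `bios`.
    pub fn firmware(&self, is_amd_sev_snp: bool) -> Option<&PathBuf> {
        if is_amd_sev_snp {
            self.bios_sev.as_ref().or(self.bios.as_ref())
        } else {
            self.bios.as_ref()
        }
    }
}
```

Routing both the QEMU `-bios` argument and the launch-measurement computation through one accessor is what keeps the measured firmware identical to the launched one.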
`platform = "auto"` (the default) previously always resolved to TDX, requiring operators to opt into SEV-SNP explicitly. Implement real detection: AMD SEV-SNP hosts advertise the `sev_snp` CPU flag and Intel TDX hosts advertise `tdx_host_platform`; these flags are vendor-exclusive so the flag alone is unambiguous. Unknown hosts still fall back to TDX, and an explicit `platform = "tdx" | "amd-sev-snp"` always overrides detection. Combined with the bios-sev firmware selection, an AMD SEV-SNP host with a default config now auto-launches SEV-SNP guests with the SEV firmware. Verified on real hardware: AMD EPYC SNP host reports `sev_snp`, Intel TDX host reports `tdx_host_platform`. Unit tests cover both plus fallback.
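A rough sketch of the detection logic, under stated assumptions: the commit only says the hosts "advertise" the flags, so reading them from `/proc/cpuinfo`-style text is an assumption here, and `detect_from_cpuinfo` and `resolve` are hypothetical names rather than the functions in the PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TeePlatform {
    Tdx,
    AmdSevSnp,
}

/// Detects the TEE platform from cpuinfo-style text (assumed input format:
/// lines beginning with "flags" followed by whitespace-separated flag names).
pub fn detect_from_cpuinfo(cpuinfo: &str) -> TeePlatform {
    let has_flag = |flag: &str| {
        cpuinfo
            .lines()
            .filter(|line| line.starts_with("flags"))
            .flat_map(|line| line.split_whitespace())
            .any(|token| token == flag)
    };
    match (has_flag("sev_snp"), has_flag("tdx_host_platform")) {
        (true, _) => TeePlatform::AmdSevSnp,
        (false, true) => TeePlatform::Tdx,
        // Unknown hosts keep the previous default.
        (false, false) => TeePlatform::Tdx,
    }
}

/// An explicit setting always wins; detection only runs for the auto case.
pub fn resolve(explicit: Option<TeePlatform>, cpuinfo: &str) -> TeePlatform {
    explicit.unwrap_or_else(|| detect_from_cpuinfo(cpuinfo))
}
```

Matching whole whitespace-separated tokens avoids treating `sev` or `sev_es` as `sev_snp`.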
snp_measurement_os_image_hash hashed the entire MeasurementInput document, which includes per-deployment fields (vcpus, vcpu_type, guest_features, app_id, compose_hash). That made the same OS image hash differently for different vCPU counts, breaking per-image on-chain allow-listing. Hash only the image-determined measurement inputs (rootfs_hash, base_cmdline, ovmf_hash, kernel_hash, initrd_hash, sev_hashes_table_gpa, sev_es_reset_eip, ovmf_sections) via a canonical SevOsImageMeasurement projection. The actual SNP launch measurement (compute_expected_measurement) still uses the full input and is unchanged. Test now asserts image fields change the hash while per-deployment fields do not.
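To illustrate the projection idea, here is a small sketch with hypothetical field types (the real structs and their types are not shown in the commit message). It stops at the projection itself; the canonical serialization and SHA-256 step are omitted because they need external crates.

```rust
/// Hypothetical shape of the full launch-measurement input document.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MeasurementInput {
    // Per-deployment fields: must not influence os_image_hash.
    pub vcpus: u32,
    pub vcpu_type: String,
    pub guest_features: u64,
    pub app_id: String,
    pub compose_hash: String,
    // Image-determined fields: the only inputs to os_image_hash.
    pub rootfs_hash: String,
    pub base_cmdline: String,
    pub ovmf_hash: String,
    pub kernel_hash: String,
    pub initrd_hash: String,
    pub sev_hashes_table_gpa: u64,
    pub sev_es_reset_eip: u32,
    pub ovmf_sections: Vec<(u64, u64)>,
}

/// Image-invariant projection; hashing this (not the full input) yields
/// the same value for every deployment of one OS image.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SevOsImageMeasurement {
    pub rootfs_hash: String,
    pub base_cmdline: String,
    pub ovmf_hash: String,
    pub kernel_hash: String,
    pub initrd_hash: String,
    pub sev_hashes_table_gpa: u64,
    pub sev_es_reset_eip: u32,
    pub ovmf_sections: Vec<(u64, u64)>,
}

impl From<&MeasurementInput> for SevOsImageMeasurement {
    fn from(input: &MeasurementInput) -> Self {
        Self {
            rootfs_hash: input.rootfs_hash.clone(),
            base_cmdline: input.base_cmdline.clone(),
            ovmf_hash: input.ovmf_hash.clone(),
            kernel_hash: input.kernel_hash.clone(),
            initrd_hash: input.initrd_hash.clone(),
            sev_hashes_table_gpa: input.sev_hashes_table_gpa,
            sev_es_reset_eip: input.sev_es_reset_eip,
            ovmf_sections: input.ovmf_sections.clone(),
        }
    }
}
```

Two inputs that differ only in `vcpus` or `app_id` project to equal values, while a different `kernel_hash` produces a different projection, which is the property the updated test asserts on the resulting hash.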
Factor the SEV-SNP os_image_hash projection into a shared dstack_types::SevOsImageMeasurement (canonical JCS + SHA-256). KMS derives it from a verified launch measurement; add a config-free `dstack-vmm sev-os-image-hash <image-dir>` subcommand that computes the same value from the OS image artifacts, so the image build can emit digest.sev.txt that matches what the verifier computes. A cross-check test asserts sev_os_image_hash(image) equals the hash derived from the launch measurement document, guarding against field drift between the build and verify paths.
sha256 is always 32 bytes; use a fixed-size array instead of Vec<u8> for type safety and to avoid the allocation. KMS converts to Vec<u8> at the BootInfo boundary; the VMM tool/test use the array directly.
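A brief sketch of the array-versus-Vec boundary, using hypothetical helper names (`to_boot_info_bytes`, `from_untrusted_bytes`) that are not taken from the PR:

```rust
use std::convert::TryFrom;

/// A SHA-256 digest is always exactly 32 bytes, so the length lives in the type.
pub type Sha256Digest = [u8; 32];

/// Converts to an owned Vec only where an API requires one
/// (for example, at a boundary like BootInfo).
pub fn to_boot_info_bytes(hash: &Sha256Digest) -> Vec<u8> {
    hash.to_vec()
}

/// Going the other way needs a length check, which the fixed-size
/// array makes explicit: any slice that is not 32 bytes is rejected.
pub fn from_untrusted_bytes(bytes: &[u8]) -> Option<Sha256Digest> {
    Sha256Digest::try_from(bytes).ok()
}
```

With the array type, a wrong-length hash becomes a compile-time or conversion-time error instead of a silent runtime mismatch.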
CI runs clippy with -D clippy::expect_used -D clippy::unwrap_used. Replace the two infallible-serialization expect() calls (SevOsImageMeasurement::os_image_hash and MrConfigV3::to_canonical_json) with the repo's or_panic() helper, and add the conventional #[allow(clippy::too_many_arguments)] to the central SNP build_amd_snp_boot_info_with_tcb_status (matching existing usage elsewhere). These were pre-existing rust-checks failures surfaced once expect_used was cleared. Verified: the exact CI clippy command passes clean, fmt --check passes, SNP/types tests pass.
Two tests asserted the old behavior where os_image_hash changed with any MeasurementInput field. Now that os_image_hash is the image-invariant projection, per-deployment fields (app_id, vcpus) must NOT change it:
- app_id_changes_host_data_and_authorization_binding: app_id changes the authorization binding but leaves os_image_hash unchanged.
- measured_input_changes_reject_until_measurement_is_recomputed: assert os_image_hash changes only for image fields (kernel_hash), not vcpus.

(These run under the full test suite; my earlier 'snp'-filtered local run missed them.)
Add host_shared_dir() honoring DSTACK_HOST_SHARED_DIR, and SysConfig::mr_config_document() (top-level mr_config, falling back to the copy embedded in vm_config). These give every reader one accessor so the guest quote path and the config-id verifier cannot disagree about where host-shared files / the mr_config document live.
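The two accessors might look roughly like this. It is a sketch under assumptions: the struct layouts are hypothetical (documents are modeled as plain strings), `host_shared_dir_from` is an illustrative helper added to keep the lookup testable without touching the process environment, and the default path is the `/dstack/.host-shared` location mentioned in the follow-up commit.

```rust
use std::env;
use std::path::PathBuf;

/// Default bind-mount location of the host-shared directory.
pub const DEFAULT_HOST_SHARED_DIR: &str = "/dstack/.host-shared";

/// Pure helper: an override wins, otherwise use the default path.
pub fn host_shared_dir_from(override_dir: Option<String>) -> PathBuf {
    override_dir
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(DEFAULT_HOST_SHARED_DIR))
}

/// Honors DSTACK_HOST_SHARED_DIR when it is set.
pub fn host_shared_dir() -> PathBuf {
    host_shared_dir_from(env::var("DSTACK_HOST_SHARED_DIR").ok())
}

/// Hypothetical, simplified config shapes.
pub struct VmConfig {
    pub mr_config: Option<String>,
}

pub struct SysConfig {
    pub mr_config: Option<String>,
    pub vm_config: VmConfig,
}

impl SysConfig {
    /// Top-level mr_config first, then the copy embedded in vm_config.
    pub fn mr_config_document(&self) -> Option<&str> {
        self.mr_config
            .as_deref()
            .or(self.vm_config.mr_config.as_deref())
    }
}
```

Because every reader goes through the same two functions, the quote path and the verifier resolve the same directory and the same document.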
dstack-util setup runs before /dstack/.host-shared is bind-mounted, so the hardcoded path was empty when dstack-attest built the SEV quote, producing 'amd sev-snp mr_config is missing'. setup now exports DSTACK_HOST_SHARED_DIR pointing at its work-dir copy; dstack-attest and the config-id verifier both resolve via host_shared_dir() + SysConfig::mr_config_document().
make_vm_config wrote image.digest (the generic content digest) into vm_config.os_image_hash for every platform. For AMD SEV-SNP the value must be the launch-measurement-derived hash (== sev-os-image-hash subcommand / digest.sev.txt, and what KMS recomputes from the verified measurement). The mismatch left vm_config and the guest app-info reporting a value inconsistent with digest.sev / the KMS-derived one. Compute sev_os_image_hash(image) for SEV, keep image.digest for TDX.
show-mrs special-cased AMD SEV-SNP to emit null MRs with a note claiming they were TDX-RTMR-only. The app-info path (Attestation::local() -> decode_app_info) computes mr_system/mr_aggregated for SEV too, so drop the special case and report the real values.
ensure_snp_key_release_config_safe refused to start the KMS when sev_snp_key_release was enabled without enforce_self_authorization. The self-authorization requirement is not needed for SEV key release, so remove the startup gate, its helper, and the associated test.
Add a real AMD SEV-SNP attestation captured from a live dstack CVM plus its pinned ASK/VCEK, and an integration test that verifies the full chain offline (builtin ARK -> ASK -> VCEK -> report signature) and asserts the report_data marker, launch measurement, and HOST_DATA. Fully deterministic — nothing is fetched from AMD KDS. See sev_snp_fixture.README.md for provenance.
Move the SEV-SNP launch-measurement recomputation and os_image_hash derivation into a new dstack-mr::sev module so the KMS (key release) and the verifier (attestation verification) compute identical values from a single source of truth, instead of the verifier lacking it entirely. Primitive-typed API (measurement/host_data byte arrays) keeps the module free of attestation/RA-TLS types, avoiding a dependency cycle. Includes a real-fixture regression test that recomputes the captured CVM's launch measurement (7f51e17f...) and os_image_hash (32b47673...).
Replace the in-tree launch-measurement recomputation, os_image_hash derivation, OVMF parsing and mr_config binding with re-exports from dstack-mr::sev. The KMS keeps its authorization BootInfo/policy layer on top. Behaviour is unchanged: all 28 KMS tests (incl. the pinned 88a479... measurement vector) pass against the shared implementation.
verify_os_image_hash previously bailed "Unsupported attestation quote" for DstackAmdSevSnp, so SEV-SNP attestations always returned is_valid=false. Add verify_os_image_hash_for_dstack_sev: recompute the launch measurement from the self-contained sev_snp_measurement inputs carried in the attestation config, require it to equal the hardware-signed MEASUREMENT, require HOST_DATA to bind the MrConfigV3 document, then derive and surface the image-invariant os_image_hash. Also fills tcb_status/advisory_ids for SEV. Same dstack-mr::sev code path the KMS uses for key release, so a quote the KMS would release keys for now verifies here too (is_valid=true).
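The order of checks can be sketched as below. This is not the PR's `verify_os_image_hash_for_dstack_sev`: the function name, error type, and parameters are hypothetical, and the recomputed values are passed in as arguments because the real recomputation lives in dstack-mr::sev. The 48-byte MEASUREMENT and 32-byte HOST_DATA sizes follow the SEV-SNP attestation report layout.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum VerifyError {
    /// sev_snp_measurement or mr_config was absent from the attestation.
    MissingInputs,
    /// Recomputed launch measurement differs from the signed MEASUREMENT.
    MeasurementMismatch,
    /// HOST_DATA does not bind the presented mr_config document.
    HostDataMismatch,
}

/// Returns the derived os_image_hash only after both bindings hold.
pub fn verify_sev_image_binding(
    signed_measurement: &[u8; 48],
    signed_host_data: &[u8; 32],
    recomputed_measurement: Option<[u8; 48]>,
    expected_host_data: Option<[u8; 32]>,
    derived_os_image_hash: [u8; 32],
) -> Result<[u8; 32], VerifyError> {
    // Fail closed when either input document is missing.
    let recomputed = recomputed_measurement.ok_or(VerifyError::MissingInputs)?;
    let expected = expected_host_data.ok_or(VerifyError::MissingInputs)?;

    // 1. The self-described launch inputs must reproduce the hardware value.
    if &recomputed != signed_measurement {
        return Err(VerifyError::MeasurementMismatch);
    }
    // 2. HOST_DATA must bind the mr_config document.
    if &expected != signed_host_data {
        return Err(VerifyError::HostDataMismatch);
    }
    // 3. Only now is the derived hash surfaced to the caller.
    Ok(derived_os_image_hash)
}
```

The hash is derived from verified inputs rather than read from an advertised field, so a bogus top-level os_image_hash in the attestation has no effect on the result.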
dstack-util quote was TDX-only (read the Intel configfs directly and failed on SEV hosts); make it detect the running TEE via Attestation::quote and emit the platform's raw hardware quote (TDX DCAP quote or SNP report). GetQuoteResponse gains an 'attestation' field carrying the platform-adaptive versioned attestation, populated on every platform. On non-TDX (SEV-SNP) the legacy quote/event_log fields are empty, so this is the verifier-ready payload to send to dstack-verifier's /verify 'attestation' field. Populated in the real, simulator and test backends; exposed in the Rust SDK GetQuoteResponse with a decode_attestation helper.
Extend the offline SEV-SNP fixture test to also run the verifier's full binding path with no network: after the hardware report verifies, recompute the launch measurement from the attestation's embedded sev_snp_measurement, confirm HOST_DATA binds the mr_config, and assert the derived os_image_hash (32b47673...) and HOST_DATA-bound app_id. Adds dstack-mr as a dev-dep.
The binary/PEM SEV-SNP fixtures can't carry inline SPDX headers; annotate them in REUSE.toml as CC0-1.0 alongside the existing nitro fixtures so the REUSE compliance check passes.
Adversarial negative tests for the SEV-SNP verification path:

dstack-mr::sev (synthetic, deterministic):
- forged hardware MEASUREMENT and HOST_DATA are rejected
- every measured launch field (ovmf/kernel/initrd hashes, cmdline, hash-table offset, reset eip, section gpa, vcpus, vcpu_type, guest_features) is caught by the measurement-equality check
- substituting a different MrConfigV3 (app/compose/instance id) breaks the HOST_DATA binding
- an advertised top-level os_image_hash is ignored (derived value wins)
- booting a different image cannot present an allow-listed image's inputs
- missing sev_snp_measurement / mr_config fail closed
- documents that rootfs_hash is os_image_hash-only (bound via the measured cmdline), so tampering it changes the derived os_image_hash rather than failing the measurement check

dstack-attest (real fixture, offline):
- flipping any signed report field (report_data/measurement/host_data) or the signature invalidates VCEK verification; zeroed/truncated reports rejected
- wrong collateral (ASK-as-VCEK, malformed VCEK) rejected
- forged measurement/host_data, tampered launch inputs, substituted mr_config and bogus advertised os_image_hash all handled correctly against real data

Derive Debug on SevImageBinding for test ergonomics.
Move the AMD SEV-SNP os_image_hash computation out of dstack-vmm into the dstack-mr crate, and add a `dstack-mr sev-os-image-hash <image_dir>` command that emits the value (digest.sev.txt). dstack-mr now parses metadata.json, measures the SEV firmware (GCTX over ovmf-sev.fd), hashes kernel/initrd and projects them through dstack_types::SevOsImageMeasurement — the single hashing path already shared with KMS/verifier. dstack-vmm no longer recomputes the SEV os_image_hash at deploy: Image::load reads digest.sev.txt and make_vm_config uses it directly (failing closed if the file is absent), mirroring how TDX uses digest.txt. The vmm `sev-os-image-hash` subcommand is removed. Verified the new CLI reproduces the existing digest.sev.txt byte-for-byte (32b47673...) on the nvidia-0.6.0.a2 image, matching the value the verifier and CVM report.
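The deploy-time selection can be pictured like this. The sketch uses a hypothetical `Image` shape and a hypothetical `os_image_hash_for` helper; it only models the rule stated above (TDX uses the digest.txt value, SEV-SNP uses the digest.sev.txt value and fails closed when that file was not present at load time).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TeePlatform {
    Tdx,
    AmdSevSnp,
}

/// Hypothetical, simplified view of a loaded image.
pub struct Image {
    /// Contents of digest.txt (generic content digest).
    pub digest: String,
    /// Contents of digest.sev.txt, if the image ships one.
    pub digest_sev: Option<String>,
}

/// Picks the os_image_hash to record in vm_config for the given platform.
pub fn os_image_hash_for(image: &Image, platform: TeePlatform) -> Result<&str, String> {
    match platform {
        TeePlatform::Tdx => Ok(image.digest.as_str()),
        // Fail closed: never substitute the generic digest for SEV-SNP.
        TeePlatform::AmdSevSnp => image
            .digest_sev
            .as_deref()
            .ok_or_else(|| "digest.sev.txt is missing from the image".to_string()),
    }
}
```

Refusing to fall back to the generic digest keeps vm_config from ever advertising a value that the verifier and KMS could not reproduce from the launch measurement.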
The sev_snp_measurement launch-input document built at deploy time used vmm's own snp_measure.rs (OVMF footer parse + GCTX). That logic is byte-for-byte the same as dstack_mr::sev::ovmf_measurement_info (added for the os_image_hash CLI), so delegate to it and delete the duplicate module. dstack-mr becomes a normal vmm dependency. Output is unchanged: the measurement-doc test and its os_image_hash projection cross-check still pass.
TeePlatform::resolve() folded an 'Auto' variant into the resolved type, so
every match on a resolved platform carried a dead Auto arm (e.g. `Tdx | Auto`
in the -machine selection). Remove the Auto variant: the config field becomes
`Option<TeePlatform>` (None = auto-detect), and CvmConfig::resolved_platform()
returns the pinned platform or TeePlatform::detect(). Matches on the resolved
platform are now exhaustive over {Tdx, AmdSevSnp} with no unreachable arm.
A back-compat deserializer still accepts the literal `platform = "auto"`
(mapped to None) so existing vmm.toml configs keep working.
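A minimal sketch of the resulting shape, assuming a plain string parser in place of the real serde deserializer; `parse_platform`, the free-standing `resolved_platform`, and `config_name` are illustrative names only.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TeePlatform {
    Tdx,
    AmdSevSnp,
}

/// Back-compat parsing: a missing field and the legacy literal "auto"
/// both map to None, meaning "auto-detect at resolve time".
pub fn parse_platform(raw: Option<&str>) -> Result<Option<TeePlatform>, String> {
    match raw {
        None | Some("auto") => Ok(None),
        Some("tdx") => Ok(Some(TeePlatform::Tdx)),
        Some("amd-sev-snp") => Ok(Some(TeePlatform::AmdSevSnp)),
        Some(other) => Err(format!("unknown platform: {}", other)),
    }
}

/// Returns the pinned platform, or runs detection when nothing is pinned.
pub fn resolved_platform(
    pinned: Option<TeePlatform>,
    detect: impl FnOnce() -> TeePlatform,
) -> TeePlatform {
    pinned.unwrap_or_else(detect)
}

/// Matches on a resolved platform are exhaustive with no dead `Auto` arm.
pub fn config_name(platform: TeePlatform) -> &'static str {
    match platform {
        TeePlatform::Tdx => "tdx",
        TeePlatform::AmdSevSnp => "amd-sev-snp",
    }
}
```

Keeping "auto" out of the enum means the compiler flags any match that forgets a real platform, instead of every match needing an unreachable arm.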
No description provided.