Refactor 703 by kvinwang · Pull Request #728 · Dstack-TEE/dstack

kvinwang · 2026-06-15T07:57:16Z

No description provided.

Unified dstack images now ship both the TDX firmware (ovmf.fd) and the AMD SEV firmware (ovmf-sev.fd), the latter referenced by a new "bios-sev" field in metadata.json. Add ImageInfo::bios_sev and an Image::firmware(is_amd_sev_snp) helper that returns bios-sev for SEV-SNP guests (falling back to bios) and bios for TDX. Use it both when launching QEMU (-bios) and when computing the SEV-SNP OVMF launch measurement, so the measured firmware always matches the launched one. TDX behaviour is unchanged; images without bios-sev fall back to bios.
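
The firmware fallback described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the struct is a trimmed-down, hypothetical stand-in, the helper is placed on `ImageInfo` for brevity (the PR puts it on `Image`), and the field types are assumptions.

```rust
/// Hypothetical, simplified stand-in for the image metadata (metadata.json).
#[derive(Debug, Clone, Default)]
pub struct ImageInfo {
    /// Value of the "bios" field, e.g. "ovmf.fd" (TDX firmware).
    pub bios: Option<String>,
    /// Value of the "bios-sev" field, e.g. "ovmf-sev.fd" (SEV firmware).
    pub bios_sev: Option<String>,
}

impl ImageInfo {
    /// Pick the firmware file for the guest type.
    /// SEV-SNP guests prefer bios-sev and fall back to bios;
    /// TDX guests always use bios.
    pub fn firmware(&self, is_amd_sev_snp: bool) -> Option<&str> {
        if is_amd_sev_snp {
            self.bios_sev.as_deref().or(self.bios.as_deref())
        } else {
            self.bios.as_deref()
        }
    }
}
```

Calling the same helper for both the `-bios` argument and the OVMF launch measurement is what keeps the measured firmware and the launched firmware identical.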

`platform = "auto"` (the default) previously always resolved to TDX, requiring operators to opt into SEV-SNP explicitly. Implement real detection: AMD SEV-SNP hosts advertise the `sev_snp` CPU flag and Intel TDX hosts advertise `tdx_host_platform`; these flags are vendor-exclusive so the flag alone is unambiguous. Unknown hosts still fall back to TDX, and an explicit `platform = "tdx" | "amd-sev-snp"` always overrides detection. Combined with the bios-sev firmware selection, an AMD SEV-SNP host with a default config now auto-launches SEV-SNP guests with the SEV firmware. Verified on real hardware: AMD EPYC SNP host reports `sev_snp`, Intel TDX host reports `tdx_host_platform`. Unit tests cover both plus fallback.
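
A rough sketch of the flag-based detection, assuming the flags are read from the `flags` lines of `/proc/cpuinfo` (the commit message only says "CPU flag", so the source of the text is an assumption). The function name `detect_from_cpuinfo` is hypothetical and takes the text as a parameter so it can be tested without real hardware.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TeePlatform {
    Tdx,
    AmdSevSnp,
}

/// Hypothetical helper: decide the platform from cpuinfo-style text.
/// Flags must match exactly, so "sev" or "sev_es" alone do not count.
pub fn detect_from_cpuinfo(cpuinfo: &str) -> TeePlatform {
    let has_flag = |wanted: &str| {
        cpuinfo
            .lines()
            .filter(|line| line.starts_with("flags"))
            .filter_map(|line| line.split_once(':'))
            .any(|(_, flags)| flags.split_whitespace().any(|f| f == wanted))
    };

    if has_flag("sev_snp") {
        TeePlatform::AmdSevSnp
    } else {
        // Covers both `tdx_host_platform` hosts and unknown hosts,
        // which fall back to TDX as described above.
        TeePlatform::Tdx
    }
}
```

On a real host the input would be something like `std::fs::read_to_string("/proc/cpuinfo")`; an explicit `platform` setting would bypass this function entirely.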

snp_measurement_os_image_hash hashed the entire MeasurementInput document, which includes per-deployment fields (vcpus, vcpu_type, guest_features, app_id, compose_hash). That made the same OS image hash differently for different vCPU counts, breaking per-image on-chain allow-listing. Hash only the image-determined measurement inputs (rootfs_hash, base_cmdline, ovmf_hash, kernel_hash, initrd_hash, sev_hashes_table_gpa, sev_es_reset_eip, ovmf_sections) via a canonical SevOsImageMeasurement projection. The actual SNP launch measurement (compute_expected_measurement) still uses the full input and is unchanged. Test now asserts image fields change the hash while per-deployment fields do not.
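
To illustrate the projection idea: only the image-determined fields are copied into a separate value before hashing, so per-deployment fields cannot influence the result. The sketch below stops at the projection and omits the canonical JCS serialization and SHA-256 step. The field names come from the commit message; all field types and the `sample_input` helper are assumptions for illustration only.

```rust
/// Hypothetical shape of the full launch-measurement input.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MeasurementInput {
    // Image-determined fields.
    pub rootfs_hash: String,
    pub base_cmdline: String,
    pub ovmf_hash: String,
    pub kernel_hash: String,
    pub initrd_hash: String,
    pub sev_hashes_table_gpa: u64,
    pub sev_es_reset_eip: u32,
    pub ovmf_sections: Vec<String>,
    // Per-deployment fields (must not affect os_image_hash).
    pub vcpus: u32,
    pub vcpu_type: String,
    pub guest_features: u64,
    pub app_id: String,
    pub compose_hash: String,
}

/// Image-invariant projection; this is the value that would be hashed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SevOsImageMeasurement {
    pub rootfs_hash: String,
    pub base_cmdline: String,
    pub ovmf_hash: String,
    pub kernel_hash: String,
    pub initrd_hash: String,
    pub sev_hashes_table_gpa: u64,
    pub sev_es_reset_eip: u32,
    pub ovmf_sections: Vec<String>,
}

impl From<&MeasurementInput> for SevOsImageMeasurement {
    fn from(input: &MeasurementInput) -> Self {
        Self {
            rootfs_hash: input.rootfs_hash.clone(),
            base_cmdline: input.base_cmdline.clone(),
            ovmf_hash: input.ovmf_hash.clone(),
            kernel_hash: input.kernel_hash.clone(),
            initrd_hash: input.initrd_hash.clone(),
            sev_hashes_table_gpa: input.sev_hashes_table_gpa,
            sev_es_reset_eip: input.sev_es_reset_eip,
            ovmf_sections: input.ovmf_sections.clone(),
        }
    }
}

/// Placeholder values for demonstration; not real measurements.
pub fn sample_input() -> MeasurementInput {
    MeasurementInput {
        rootfs_hash: "aa".into(),
        base_cmdline: "console=ttyS0".into(),
        ovmf_hash: "bb".into(),
        kernel_hash: "cc".into(),
        initrd_hash: "dd".into(),
        sev_hashes_table_gpa: 0x1000,
        sev_es_reset_eip: 0x2000,
        ovmf_sections: vec!["section-0".into()],
        vcpus: 2,
        vcpu_type: "example".into(),
        guest_features: 1,
        app_id: "app-a".into(),
        compose_hash: "ee".into(),
    }
}
```

Two inputs that differ only in `vcpus` or `app_id` project to equal values, while a change to `kernel_hash` produces a different projection, which mirrors what the updated test asserts about the hash.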

Factor the SEV-SNP os_image_hash projection into a shared dstack_types::SevOsImageMeasurement (canonical JCS + SHA-256). KMS derives it from a verified launch measurement; add a config-free `dstack-vmm sev-os-image-hash <image-dir>` subcommand that computes the same value from the OS image artifacts, so the image build can emit digest.sev.txt that matches what the verifier computes. A cross-check test asserts sev_os_image_hash(image) equals the hash derived from the launch measurement document, guarding against field drift between the build and verify paths.

sha256 is always 32 bytes; use a fixed-size array instead of Vec<u8> for type safety and to avoid the allocation. KMS converts to Vec<u8> at the BootInfo boundary; the VMM tool/test use the array directly.
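
A small sketch of the type-level difference, using hypothetical helper names: a `[u8; 32]` cannot hold a digest of the wrong length, and the conversion to `Vec<u8>` happens only at the boundary that needs it.

```rust
use std::convert::TryFrom;

/// A SHA-256 digest is always exactly 32 bytes.
pub type Sha256Digest = [u8; 32];

/// Hypothetical helper: accept a byte slice only if it has digest length.
pub fn digest_from_slice(bytes: &[u8]) -> Option<Sha256Digest> {
    <[u8; 32]>::try_from(bytes).ok()
}

/// Hypothetical helper: convert to an owned Vec where an API requires one
/// (the commit does this at the BootInfo boundary).
pub fn to_boundary_bytes(digest: Sha256Digest) -> Vec<u8> {
    digest.to_vec()
}
```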

CI runs clippy with -D clippy::expect_used -D clippy::unwrap_used. Replace the two infallible-serialization expect() calls (SevOsImageMeasurement::os_image_hash and MrConfigV3::to_canonical_json) with the repo's or_panic() helper, and add the conventional #[allow(clippy::too_many_arguments)] to the central SNP build_amd_snp_boot_info_with_tcb_status (matching existing usage elsewhere). These were pre-existing rust-checks failures surfaced once expect_used was cleared. Verified: the exact CI clippy command passes clean, fmt --check passes, SNP/types tests pass.

Two tests asserted the old behavior where os_image_hash changed with any MeasurementInput field. Now that os_image_hash is the image-invariant projection, per-deployment fields (app_id, vcpus) must NOT change it:

- app_id_changes_host_data_and_authorization_binding: app_id changes the authorization binding but leaves os_image_hash unchanged.
- measured_input_changes_reject_until_measurement_is_recomputed: assert os_image_hash changes only for image fields (kernel_hash), not vcpus.

(These run under the full test suite; my earlier 'snp'-filtered local run missed them.)

Add host_shared_dir() honoring DSTACK_HOST_SHARED_DIR, and SysConfig::mr_config_document() (top-level mr_config, falling back to the copy embedded in vm_config). These give every reader one accessor so the guest quote path and the config-id verifier cannot disagree about where host-shared files / the mr_config document live.
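
A minimal sketch of such an accessor. The default path `/dstack/.host-shared` is the hardcoded path mentioned elsewhere in this PR. The split into a pure `resolve_host_shared_dir` helper and the choice to treat an empty variable as unset are assumptions made here for testability, not confirmed behavior.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Default location once the host-shared directory is bind-mounted.
const DEFAULT_HOST_SHARED_DIR: &str = "/dstack/.host-shared";

/// Hypothetical pure helper: pick the override if present and non-empty,
/// otherwise the default path.
pub fn resolve_host_shared_dir(override_dir: Option<OsString>) -> PathBuf {
    match override_dir {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(DEFAULT_HOST_SHARED_DIR),
    }
}

/// Single accessor every reader goes through.
pub fn host_shared_dir() -> PathBuf {
    resolve_host_shared_dir(env::var_os("DSTACK_HOST_SHARED_DIR"))
}
```

Routing every reader through one function is the point: the quote path and the verifier cannot drift apart if neither of them spells out the path itself.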

dstack-util setup runs before /dstack/.host-shared is bind-mounted, so the hardcoded path was empty when dstack-attest built the SEV quote, producing 'amd sev-snp mr_config is missing'. setup now exports DSTACK_HOST_SHARED_DIR pointing at its work-dir copy; dstack-attest and the config-id verifier both resolve via host_shared_dir() + SysConfig::mr_config_document().

make_vm_config wrote image.digest (the generic content digest) into vm_config.os_image_hash for every platform. For AMD SEV-SNP the value must be the launch-measurement-derived hash (== sev-os-image-hash subcommand / digest.sev.txt, and what KMS recomputes from the verified measurement). The mismatch left vm_config and the guest app-info reporting a value inconsistent with digest.sev / the KMS-derived one. Compute sev_os_image_hash(image) for SEV, keep image.digest for TDX.

show-mrs special-cased AMD SEV-SNP to emit null MRs with a note claiming they were TDX-RTMR-only. The app-info path (Attestation::local() -> decode_app_info) computes mr_system/mr_aggregated for SEV too, so drop the special case and report the real values.

ensure_snp_key_release_config_safe refused to start the KMS when sev_snp_key_release was enabled without enforce_self_authorization. The self-authorization requirement is not needed for SEV key release, so remove the startup gate, its helper, and the associated test.

Add a real AMD SEV-SNP attestation captured from a live dstack CVM plus its pinned ASK/VCEK, and an integration test that verifies the full chain offline (builtin ARK -> ASK -> VCEK -> report signature) and asserts the report_data marker, launch measurement, and HOST_DATA. Fully deterministic — nothing is fetched from AMD KDS. See sev_snp_fixture.README.md for provenance.

Move the SEV-SNP launch-measurement recomputation and os_image_hash derivation into a new dstack-mr::sev module so the KMS (key release) and the verifier (attestation verification) compute identical values from a single source of truth, instead of the verifier lacking it entirely. Primitive-typed API (measurement/host_data byte arrays) keeps the module free of attestation/RA-TLS types, avoiding a dependency cycle. Includes a real-fixture regression test that recomputes the captured CVM's launch measurement (7f51e17f...) and os_image_hash (32b47673...).

Replace the in-tree launch-measurement recomputation, os_image_hash derivation, OVMF parsing and mr_config binding with re-exports from dstack-mr::sev. The KMS keeps its authorization BootInfo/policy layer on top. Behaviour is unchanged: all 28 KMS tests (incl. the pinned 88a479... measurement vector) pass against the shared implementation.

verify_os_image_hash previously bailed "Unsupported attestation quote" for DstackAmdSevSnp, so SEV-SNP attestations always returned is_valid=false. Add verify_os_image_hash_for_dstack_sev: recompute the launch measurement from the self-contained sev_snp_measurement inputs carried in the attestation config, require it to equal the hardware-signed MEASUREMENT, require HOST_DATA to bind the MrConfigV3 document, then derive and surface the image-invariant os_image_hash. Also fills tcb_status/advisory_ids for SEV. Same dstack-mr::sev code path the KMS uses for key release, so a quote the KMS would release keys for now verifies here too (is_valid=true).

dstack-util quote was TDX-only (read the Intel configfs directly and failed on SEV hosts); make it detect the running TEE via Attestation::quote and emit the platform's raw hardware quote (TDX DCAP quote or SNP report). GetQuoteResponse gains an 'attestation' field carrying the platform-adaptive versioned attestation, populated on every platform. On non-TDX (SEV-SNP) the legacy quote/event_log fields are empty, so this is the verifier-ready payload to send to dstack-verifier's /verify 'attestation' field. Populated in the real, simulator and test backends; exposed in the Rust SDK GetQuoteResponse with a decode_attestation helper.

Extend the offline SEV-SNP fixture test to also run the verifier's full binding path with no network: after the hardware report verifies, recompute the launch measurement from the attestation's embedded sev_snp_measurement, confirm HOST_DATA binds the mr_config, and assert the derived os_image_hash (32b47673...) and HOST_DATA-bound app_id. Adds dstack-mr as a dev-dep.

The binary/PEM SEV-SNP fixtures can't carry inline SPDX headers; annotate them in REUSE.toml as CC0-1.0 alongside the existing nitro fixtures so the REUSE compliance check passes.

Adversarial negative tests for the SEV-SNP verification path:

dstack-mr::sev (synthetic, deterministic):
- forged hardware MEASUREMENT and HOST_DATA are rejected
- every measured launch field (ovmf/kernel/initrd hashes, cmdline, hash-table offset, reset eip, section gpa, vcpus, vcpu_type, guest_features) is caught by the measurement-equality check
- substituting a different MrConfigV3 (app/compose/instance id) breaks the HOST_DATA binding
- an advertised top-level os_image_hash is ignored (derived value wins)
- booting a different image cannot present an allow-listed image's inputs
- missing sev_snp_measurement / mr_config fail closed
- documents that rootfs_hash is os_image_hash-only (bound via the measured cmdline), so tampering it changes the derived os_image_hash rather than failing the measurement check

dstack-attest (real fixture, offline):
- flipping any signed report field (report_data/measurement/host_data) or the signature invalidates VCEK verification; zeroed/truncated reports rejected
- wrong collateral (ASK-as-VCEK, malformed VCEK) rejected
- forged measurement/host_data, tampered launch inputs, substituted mr_config and bogus advertised os_image_hash all handled correctly against real data

Derive Debug on SevImageBinding for test ergonomics.

Move the AMD SEV-SNP os_image_hash computation out of dstack-vmm into the dstack-mr crate, and add a `dstack-mr sev-os-image-hash <image_dir>` command that emits the value (digest.sev.txt). dstack-mr now parses metadata.json, measures the SEV firmware (GCTX over ovmf-sev.fd), hashes kernel/initrd and projects them through dstack_types::SevOsImageMeasurement — the single hashing path already shared with KMS/verifier. dstack-vmm no longer recomputes the SEV os_image_hash at deploy: Image::load reads digest.sev.txt and make_vm_config uses it directly (failing closed if the file is absent), mirroring how TDX uses digest.txt. The vmm `sev-os-image-hash` subcommand is removed. Verified the new CLI reproduces the existing digest.sev.txt byte-for-byte (32b47673...) on the nvidia-0.6.0.a2 image, matching the value the verifier and CVM report.
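
The deploy-time selection can be pictured roughly as below. This is a sketch under assumptions: `Image` is a hypothetical trimmed-down struct, `os_image_hash_for` is a made-up name standing in for the relevant part of make_vm_config, and treating a missing digest.txt as an error on TDX is a guess (the commit only states that the SEV path fails closed).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TeePlatform {
    Tdx,
    AmdSevSnp,
}

/// Hypothetical view of a loaded image: digest files read at load time.
#[derive(Debug, Default)]
pub struct Image {
    /// Contents of digest.txt, if the file was present.
    pub digest: Option<String>,
    /// Contents of digest.sev.txt, if the file was present.
    pub digest_sev: Option<String>,
}

/// Pick the precomputed os_image_hash for the platform, failing closed
/// when the matching digest file was not shipped with the image.
pub fn os_image_hash_for(image: &Image, platform: TeePlatform) -> Result<&str, String> {
    let (value, file) = match platform {
        TeePlatform::Tdx => (image.digest.as_deref(), "digest.txt"),
        TeePlatform::AmdSevSnp => (image.digest_sev.as_deref(), "digest.sev.txt"),
    };
    value.ok_or_else(|| format!("image is missing {file}"))
}
```

Failing closed here means an SEV-SNP deployment never silently reports the generic content digest in place of the launch-measurement-derived hash.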

The sev_snp_measurement launch-input document built at deploy time used vmm's own snp_measure.rs (OVMF footer parse + GCTX). That logic is byte-for-byte the same as dstack_mr::sev::ovmf_measurement_info (added for the os_image_hash CLI), so delegate to it and delete the duplicate module. dstack-mr becomes a normal vmm dependency. Output is unchanged: the measurement-doc test and its os_image_hash projection cross-check still pass.

TeePlatform::resolve() folded an 'Auto' variant into the resolved type, so every match on a resolved platform carried a dead Auto arm (e.g. `Tdx | Auto` in the -machine selection). Remove the Auto variant: the config field becomes `Option<TeePlatform>` (None = auto-detect), and CvmConfig::resolved_platform() returns the pinned platform or TeePlatform::detect(). Matches on the resolved platform are now exhaustive over {Tdx, AmdSevSnp} with no unreachable arm. A back-compat deserializer still accepts the literal `platform = "auto"` (mapped to None) so existing vmm.toml configs keep working.
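
A sketch of the resulting shape, with two simplifications that are assumptions of this example: the back-compat handling is shown as a plain string parser rather than a serde deserializer, and `resolved_platform` takes the detector as a closure instead of calling `TeePlatform::detect()` directly, so it can be exercised without hardware.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TeePlatform {
    Tdx,
    AmdSevSnp,
}

/// Hypothetical parser: "auto" maps to None (auto-detect),
/// explicit values pin the platform.
pub fn parse_platform(value: &str) -> Result<Option<TeePlatform>, String> {
    match value {
        "auto" => Ok(None),
        "tdx" => Ok(Some(TeePlatform::Tdx)),
        "amd-sev-snp" => Ok(Some(TeePlatform::AmdSevSnp)),
        other => Err(format!("unknown platform: {other}")),
    }
}

pub struct CvmConfig {
    /// None means auto-detect.
    pub platform: Option<TeePlatform>,
}

impl CvmConfig {
    /// Return the pinned platform, or run detection when none is pinned.
    pub fn resolved_platform(&self, detect: impl FnOnce() -> TeePlatform) -> TeePlatform {
        self.platform.unwrap_or_else(detect)
    }
}

/// Matches on a resolved platform are exhaustive with no `Auto` arm.
pub fn platform_label(platform: TeePlatform) -> &'static str {
    match platform {
        TeePlatform::Tdx => "tdx",
        TeePlatform::AmdSevSnp => "amd-sev-snp",
    }
}
```

Because "auto" lives only in `Option`, the compiler rejects any match on `TeePlatform` that forgets a real variant, and there is no dead arm to maintain.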

ChrisWorkBot added 30 commits June 12, 2026 04:14

feat: add amd sev-snp attestation support

77aa2c2

fix: harden sev-snp report data binding

87c78d7

fix: address sev-snp draft review findings

0701e63

feat: add sev-snp verifier core

ce4aa6d

fix: normalize sev-snp cert collateral

c0b442b

fix: add fail-closed sev-snp measurement binding

1adfa2b

fix: recompute sev-snp launch measurement

0e9f586

fix: add sev-snp boot info helper

5755441

test: add sev-snp measurement golden vector

4b31c97

fix: add sev-snp auth policy helper

7dda8ff

fix: bind sev-snp app id into measurement

6c2d817

fix: connect sev-snp verified attestation to boot info

be054d8

fix: parse sev-snp measurement inputs from vm config

ace8753

fix: route kms snp attestation through dry-run auth

2b73095

fix: report sev-snp onboarding attestation info

922afd7

fix: use sev-snp boot info for kms self auth

5b36b4c

fix: make auth-simple tcb policy explicit

a8f0e88

fix: block sev-snp temp ca release

058fd29

fix: derive sev-snp tcb policy from report

73d857b

chore: satisfy sev-snp workspace clippy

52d3fac

docs: add sev-snp review readiness note

0077ec9

feat: enable guarded sev-snp key release

3792eb1

fix: bind sev-snp vm launch inputs

027077b

fix: complete sev-snp key release smoke path

cfe476b

fix: satisfy ci lint checks

40396b7

fix: satisfy prek shellcheck

5cb4566

test: add sev-snp e2e smoke script

409c4c5

test: harden sev-snp smoke script

2aa70e8

docs: document sev-snp smoke host matrix

fc22673

docs: clarify sev-snp smoke image requirements

0303256

Bind SNP app config via HOST_DATA

8bbade4

kvinwang force-pushed the rebase-703 branch from dc091e6 to 8bbade4 Compare June 15, 2026 15:05

kvinwang added 26 commits June 15, 2026 19:38

Select SEV-SNP KDS product from report

3f71e7d

Use self-contained SNP measurement input

c5d4910

Detect SEV-SNP C-bit position from CPUID

48fa211

Detect SEV-SNP launch params from QEMU

02865d0

dstack-types: return [u8; 32] from SevOsImageMeasurement::os_image_hash

9830dce


reuse: license SEV-SNP test fixtures (CC0-1.0)

d80daeb


kvinwang force-pushed the rebase-703 branch from 7f6f18e to a9f4b36 Compare June 18, 2026 03:27

kvinwang wants to merge 67 commits into master from rebase-703
