
Refactor 703 #728

Draft
kvinwang wants to merge 67 commits into master from rebase-703

Conversation

@kvinwang

Collaborator

No description provided.

ChrisWorkBot added 30 commits June 12, 2026 04:14
kvinwang added 26 commits June 15, 2026 19:38
Unified dstack images now ship both the TDX firmware (ovmf.fd) and the AMD
SEV firmware (ovmf-sev.fd), the latter referenced by a new "bios-sev" field
in metadata.json.

Add ImageInfo::bios_sev and an Image::firmware(is_amd_sev_snp) helper that
returns bios-sev for SEV-SNP guests (falling back to bios) and bios for TDX.
Use it both when launching QEMU (-bios) and when computing the SEV-SNP OVMF
launch measurement, so the measured firmware always matches the launched
one. TDX behaviour is unchanged; images without bios-sev fall back to bios.
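
The selection rule above can be sketched as follows. This is a minimal illustration, not the actual dstack code: the struct below is a hypothetical subset of `ImageInfo`, and the real helper lives on `Image` and may return a path type rather than a string.

```rust
/// Hypothetical subset of the image metadata; the two fields mirror the
/// "bios" and "bios-sev" keys in metadata.json.
#[derive(Debug, Clone, PartialEq)]
pub struct ImageInfo {
    pub bios: Option<String>,
    pub bios_sev: Option<String>,
}

impl ImageInfo {
    /// Pick the firmware file name for the guest type.
    pub fn firmware(&self, is_amd_sev_snp: bool) -> Option<&str> {
        if is_amd_sev_snp {
            // Prefer the SEV build; images without bios-sev fall back to bios.
            self.bios_sev.as_deref().or(self.bios.as_deref())
        } else {
            // TDX guests always use the generic firmware.
            self.bios.as_deref()
        }
    }
}
```

Calling the same helper from both the QEMU launch path and the measurement path is what keeps the measured firmware and the launched firmware identical.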
`platform = "auto"` (the default) previously always resolved to TDX,
requiring operators to opt into SEV-SNP explicitly. Implement real
detection: AMD SEV-SNP hosts advertise the `sev_snp` CPU flag and Intel
TDX hosts advertise `tdx_host_platform`; these flags are vendor-exclusive
so the flag alone is unambiguous. Unknown hosts still fall back to TDX, and
an explicit `platform = "tdx" | "amd-sev-snp"` always overrides detection.

Combined with the bios-sev firmware selection, an AMD SEV-SNP host with a
default config now auto-launches SEV-SNP guests with the SEV firmware.

Verified on real hardware: AMD EPYC SNP host reports `sev_snp`, Intel TDX
host reports `tdx_host_platform`. Unit tests cover both plus fallback.
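
A rough sketch of the detection logic, assuming the flags are read from the `flags` lines of `/proc/cpuinfo` (the commit only says the hosts "advertise" the flags, so the data source and the function names here are assumptions):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TeePlatform {
    Tdx,
    AmdSevSnp,
}

/// Classify a host from the text of /proc/cpuinfo (assumed source of flags).
pub fn detect_from_cpuinfo(cpuinfo: &str) -> TeePlatform {
    // True if any "flags : ..." line contains `wanted` as a whole token.
    let has_flag = |wanted: &str| {
        cpuinfo
            .lines()
            .filter(|line| line.starts_with("flags"))
            .filter_map(|line| line.split_once(':'))
            .any(|(_, flags)| flags.split_whitespace().any(|flag| flag == wanted))
    };

    if has_flag("sev_snp") {
        TeePlatform::AmdSevSnp
    } else {
        // Covers hosts with `tdx_host_platform` as well as unknown hosts,
        // which fall back to TDX as described above.
        TeePlatform::Tdx
    }
}

/// Hypothetical entry point: read the real file, fall back to TDX on error.
pub fn detect() -> TeePlatform {
    std::fs::read_to_string("/proc/cpuinfo")
        .map(|text| detect_from_cpuinfo(&text))
        .unwrap_or(TeePlatform::Tdx)
}
```

Splitting the parsing from the file read keeps the classification testable on hosts that have neither TEE.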
snp_measurement_os_image_hash hashed the entire MeasurementInput document,
which includes per-deployment fields (vcpus, vcpu_type, guest_features,
app_id, compose_hash). That made the same OS image hash differently for
different vCPU counts, breaking per-image on-chain allow-listing.

Hash only the image-determined measurement inputs (rootfs_hash, base_cmdline,
ovmf_hash, kernel_hash, initrd_hash, sev_hashes_table_gpa, sev_es_reset_eip,
ovmf_sections) via a canonical SevOsImageMeasurement projection. The actual
SNP launch measurement (compute_expected_measurement) still uses the full
input and is unchanged. Test now asserts image fields change the hash while
per-deployment fields do not.
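
The idea of the projection can be illustrated with a reduced field set. This sketch is not the real `SevOsImageMeasurement`: it carries only a few of the listed fields, and it stops at a key-sorted string instead of applying the canonical JSON and SHA-256 steps, which are omitted to keep the example dependency-free.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced version of the full measurement input.
#[derive(Debug, Clone)]
pub struct MeasurementInput {
    // Image-determined inputs.
    pub rootfs_hash: String,
    pub base_cmdline: String,
    pub kernel_hash: String,
    pub initrd_hash: String,
    // Per-deployment inputs, deliberately excluded from the projection.
    pub vcpus: u32,
    pub app_id: String,
}

/// Build the key-sorted pre-hash document from image-determined fields only.
/// The real code would hash a canonical JSON form of this projection.
pub fn image_projection(input: &MeasurementInput) -> String {
    let mut fields: BTreeMap<&str, &str> = BTreeMap::new();
    fields.insert("rootfs_hash", input.rootfs_hash.as_str());
    fields.insert("base_cmdline", input.base_cmdline.as_str());
    fields.insert("kernel_hash", input.kernel_hash.as_str());
    fields.insert("initrd_hash", input.initrd_hash.as_str());
    // BTreeMap iterates in key order, so the output is deterministic.
    fields
        .iter()
        .map(|(key, value)| format!("{key:?}:{value:?}"))
        .collect::<Vec<_>>()
        .join(",")
}
```

Two inputs that differ only in `vcpus` or `app_id` produce the same projection, while a different `kernel_hash` produces a different one, which is the property the updated test asserts.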
Factor the SEV-SNP os_image_hash projection into a shared
dstack_types::SevOsImageMeasurement (canonical JCS + SHA-256). KMS derives
it from a verified launch measurement; add a config-free
`dstack-vmm sev-os-image-hash <image-dir>` subcommand that computes the same
value from the OS image artifacts, so the image build can emit digest.sev.txt
that matches what the verifier computes.

A cross-check test asserts sev_os_image_hash(image) equals the hash derived
from the launch measurement document, guarding against field drift between
the build and verify paths.
sha256 is always 32 bytes; use a fixed-size array instead of Vec<u8> for
type safety and to avoid the allocation. KMS converts to Vec<u8> at the
BootInfo boundary; the VMM tool/test use the array directly.
CI runs clippy with -D clippy::expect_used -D clippy::unwrap_used. Replace the
two infallible-serialization expect() calls (SevOsImageMeasurement::os_image_hash
and MrConfigV3::to_canonical_json) with the repo's or_panic() helper, and add
the conventional #[allow(clippy::too_many_arguments)] to the central SNP
build_amd_snp_boot_info_with_tcb_status (matching existing usage elsewhere).

These were pre-existing rust-checks failures surfaced once expect_used was
cleared. Verified: the exact CI clippy command passes clean, fmt --check
passes, SNP/types tests pass.
Two tests asserted the old behavior where os_image_hash changed with any
MeasurementInput field. Now that os_image_hash is the image-invariant
projection, per-deployment fields (app_id, vcpus) must NOT change it:
- app_id_changes_host_data_and_authorization_binding: app_id changes the
  authorization binding but leaves os_image_hash unchanged.
- measured_input_changes_reject_until_measurement_is_recomputed: assert
  os_image_hash changes only for image fields (kernel_hash), not vcpus.

(These run under the full test suite; my earlier 'snp'-filtered local run
missed them.)
Add host_shared_dir() honoring DSTACK_HOST_SHARED_DIR, and
SysConfig::mr_config_document() (top-level mr_config, falling back to the
copy embedded in vm_config). These give every reader one accessor so the
guest quote path and the config-id verifier cannot disagree about where
host-shared files / the mr_config document live.
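
A minimal sketch of the two accessors, under stated assumptions: the default path is taken from the `/dstack/.host-shared` mount mentioned in this PR, an empty environment value is treated as unset, and the config structs are hypothetical stand-ins that hold the document as a plain string.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Assumed default, matching the bind-mount location named in this PR.
pub const DEFAULT_HOST_SHARED_DIR: &str = "/dstack/.host-shared";

/// Pure helper so the override logic can be tested without touching the
/// process environment.
pub fn host_shared_dir_from(env_value: Option<OsString>) -> PathBuf {
    match env_value {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(DEFAULT_HOST_SHARED_DIR),
    }
}

/// Resolve the host-shared directory, honoring DSTACK_HOST_SHARED_DIR.
pub fn host_shared_dir() -> PathBuf {
    host_shared_dir_from(std::env::var_os("DSTACK_HOST_SHARED_DIR"))
}

/// Hypothetical stand-ins for the real config types.
pub struct VmConfig {
    pub mr_config: Option<String>,
}

pub struct SysConfig {
    pub mr_config: Option<String>,
    pub vm_config: VmConfig,
}

impl SysConfig {
    /// Top-level mr_config wins; otherwise use the copy inside vm_config.
    pub fn mr_config_document(&self) -> Option<&str> {
        self.mr_config
            .as_deref()
            .or(self.vm_config.mr_config.as_deref())
    }
}
```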
dstack-util setup runs before /dstack/.host-shared is bind-mounted, so the
hardcoded path was empty when dstack-attest built the SEV quote, producing
'amd sev-snp mr_config is missing'. setup now exports DSTACK_HOST_SHARED_DIR
pointing at its work-dir copy; dstack-attest and the config-id verifier both
resolve via host_shared_dir() + SysConfig::mr_config_document().
make_vm_config wrote image.digest (the generic content digest) into
vm_config.os_image_hash for every platform. For AMD SEV-SNP the value must
be the launch-measurement-derived hash (== sev-os-image-hash subcommand /
digest.sev.txt, and what KMS recomputes from the verified measurement).
The mismatch left vm_config and the guest app-info reporting a value
inconsistent with digest.sev / the KMS-derived one. Compute
sev_os_image_hash(image) for SEV, keep image.digest for TDX.
show-mrs special-cased AMD SEV-SNP to emit null MRs with a note claiming
they were TDX-RTMR-only. The app-info path (Attestation::local() ->
decode_app_info) computes mr_system/mr_aggregated for SEV too, so drop the
special case and report the real values.
ensure_snp_key_release_config_safe refused to start the KMS when
sev_snp_key_release was enabled without enforce_self_authorization. The
self-authorization requirement is not needed for SEV key release, so remove
the startup gate, its helper, and the associated test.
Add a real AMD SEV-SNP attestation captured from a live dstack CVM plus its
pinned ASK/VCEK, and an integration test that verifies the full chain offline
(builtin ARK -> ASK -> VCEK -> report signature) and asserts the report_data
marker, launch measurement, and HOST_DATA. Fully deterministic — nothing is
fetched from AMD KDS. See sev_snp_fixture.README.md for provenance.
Move the SEV-SNP launch-measurement recomputation and os_image_hash
derivation into a new dstack-mr::sev module so the KMS (key release) and
the verifier (attestation verification) compute identical values from a
single source of truth, instead of the verifier lacking it entirely.

Primitive-typed API (measurement/host_data byte arrays) keeps the module
free of attestation/RA-TLS types, avoiding a dependency cycle. Includes a
real-fixture regression test that recomputes the captured CVM's launch
measurement (7f51e17f...) and os_image_hash (32b47673...).
Replace the in-tree launch-measurement recomputation, os_image_hash
derivation, OVMF parsing and mr_config binding with re-exports from
dstack-mr::sev. The KMS keeps its authorization BootInfo/policy layer on
top. Behaviour is unchanged: all 28 KMS tests (incl. the pinned 88a479...
measurement vector) pass against the shared implementation.
verify_os_image_hash previously bailed with "Unsupported attestation quote"
for DstackAmdSevSnp, so SEV-SNP attestations always returned is_valid=false.

Add verify_os_image_hash_for_dstack_sev: recompute the launch measurement
from the self-contained sev_snp_measurement inputs carried in the
attestation config, require it to equal the hardware-signed MEASUREMENT,
require HOST_DATA to bind the MrConfigV3 document, then derive and surface
the image-invariant os_image_hash. Also fills tcb_status/advisory_ids for
SEV. Same dstack-mr::sev code path the KMS uses for key release, so a quote
the KMS would release keys for now verifies here too (is_valid=true).
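
The order of checks described above can be sketched as a fail-closed function. This is only an outline under assumptions: the types and names are hypothetical, and the recomputed values are passed in ready-made rather than derived from the attestation config. The 48-byte MEASUREMENT and 32-byte HOST_DATA sizes follow the SNP attestation report layout.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SevVerifyError {
    MissingInputs,
    MeasurementMismatch,
    HostDataMismatch,
}

/// Fields taken from the hardware-signed SNP report.
pub struct SnpReport {
    pub measurement: [u8; 48],
    pub host_data: [u8; 32],
}

/// Values recomputed from the launch inputs and the mr_config document
/// carried in the attestation (hypothetical container type).
pub struct Recomputed {
    pub measurement: [u8; 48],
    pub host_data: [u8; 32],
    pub os_image_hash: [u8; 32],
}

/// Return the derived os_image_hash only if every binding check passes.
pub fn verify_binding(
    report: &SnpReport,
    recomputed: Option<&Recomputed>,
) -> Result<[u8; 32], SevVerifyError> {
    // Missing launch inputs or mr_config must fail closed.
    let recomputed = recomputed.ok_or(SevVerifyError::MissingInputs)?;
    if recomputed.measurement != report.measurement {
        return Err(SevVerifyError::MeasurementMismatch);
    }
    if recomputed.host_data != report.host_data {
        return Err(SevVerifyError::HostDataMismatch);
    }
    // Only a derived value is surfaced; an advertised hash is never trusted.
    Ok(recomputed.os_image_hash)
}
```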
dstack-util quote was TDX-only (read the Intel configfs directly and
failed on SEV hosts); make it detect the running TEE via Attestation::quote
and emit the platform's raw hardware quote (TDX DCAP quote or SNP report).

GetQuoteResponse gains an 'attestation' field carrying the platform-
adaptive versioned attestation, populated on every platform. On non-TDX
(SEV-SNP) the legacy quote/event_log fields are empty, so this is the
verifier-ready payload to send to dstack-verifier's /verify 'attestation'
field. Populated in the real, simulator and test backends; exposed in the
Rust SDK GetQuoteResponse with a decode_attestation helper.
Extend the offline SEV-SNP fixture test to also run the verifier's full
binding path with no network: after the hardware report verifies, recompute
the launch measurement from the attestation's embedded sev_snp_measurement,
confirm HOST_DATA binds the mr_config, and assert the derived os_image_hash
(32b47673...) and HOST_DATA-bound app_id. Adds dstack-mr as a dev-dep.
The binary/PEM SEV-SNP fixtures can't carry inline SPDX headers; annotate
them in REUSE.toml as CC0-1.0 alongside the existing nitro fixtures so the
REUSE compliance check passes.
Adversarial negative tests for the SEV-SNP verification path:

dstack-mr::sev (synthetic, deterministic):
- forged hardware MEASUREMENT and HOST_DATA are rejected
- every measured launch field (ovmf/kernel/initrd hashes, cmdline, hash-table
  offset, reset eip, section gpa, vcpus, vcpu_type, guest_features) is caught
  by the measurement-equality check
- substituting a different MrConfigV3 (app/compose/instance id) breaks the
  HOST_DATA binding
- an advertised top-level os_image_hash is ignored (derived value wins)
- booting a different image cannot present an allow-listed image's inputs
- missing sev_snp_measurement / mr_config fail closed
- documents that rootfs_hash feeds only os_image_hash (it is bound via the
  measured cmdline), so tampering with it changes the derived os_image_hash
  rather than failing the measurement check

dstack-attest (real fixture, offline):
- flipping any signed report field (report_data/measurement/host_data) or the
  signature invalidates VCEK verification; zeroed/truncated reports rejected
- wrong collateral (ASK-as-VCEK, malformed VCEK) rejected
- forged measurement/host_data, tampered launch inputs, substituted mr_config
  and bogus advertised os_image_hash all handled correctly against real data

Derive Debug on SevImageBinding for test ergonomics.
Move the AMD SEV-SNP os_image_hash computation out of dstack-vmm into the
dstack-mr crate, and add a `dstack-mr sev-os-image-hash <image_dir>` command
that emits the value (digest.sev.txt). dstack-mr now parses metadata.json,
measures the SEV firmware (GCTX over ovmf-sev.fd), hashes kernel/initrd and
projects them through dstack_types::SevOsImageMeasurement — the single hashing
path already shared with KMS/verifier.

dstack-vmm no longer recomputes the SEV os_image_hash at deploy: Image::load
reads digest.sev.txt and make_vm_config uses it directly (failing closed if the
file is absent), mirroring how TDX uses digest.txt. The vmm `sev-os-image-hash`
subcommand is removed.
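
The deploy-time selection can be sketched like this. The struct and function are hypothetical simplifications: the fields stand for the contents of digest.txt and digest.sev.txt as loaded by `Image::load`, and treating a missing digest.txt as an error on TDX is an assumption of the sketch, not something this PR states.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TeePlatform {
    Tdx,
    AmdSevSnp,
}

/// Hypothetical view of a loaded image: the two digest files, if present.
pub struct Image {
    pub digest: Option<String>,
    pub digest_sev: Option<String>,
}

/// Choose the os_image_hash for vm_config, failing closed when the
/// platform-specific digest file is absent.
pub fn os_image_hash_for(image: &Image, platform: TeePlatform) -> Result<&str, String> {
    match platform {
        TeePlatform::Tdx => image
            .digest
            .as_deref()
            .ok_or_else(|| "image has no digest.txt".to_string()),
        TeePlatform::AmdSevSnp => image
            .digest_sev
            .as_deref()
            .ok_or_else(|| "image has no digest.sev.txt".to_string()),
    }
}
```

Note that the SEV branch never falls back to the generic digest; substituting it would reintroduce the mismatch with the KMS-derived value.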

Verified the new CLI reproduces the existing digest.sev.txt byte-for-byte
(32b47673...) on the nvidia-0.6.0.a2 image, matching the value the verifier and
CVM report.
The sev_snp_measurement launch-input document built at deploy time used vmm's
own snp_measure.rs (OVMF footer parse + GCTX). That logic is byte-for-byte the
same as dstack_mr::sev::ovmf_measurement_info (added for the os_image_hash CLI),
so delegate to it and delete the duplicate module. dstack-mr becomes a normal
vmm dependency. Output is unchanged: the measurement-doc test and its
os_image_hash projection cross-check still pass.
TeePlatform::resolve() folded an 'Auto' variant into the resolved type, so
every match on a resolved platform carried a dead Auto arm (e.g. `Tdx | Auto`
in the -machine selection). Remove the Auto variant: the config field becomes
`Option<TeePlatform>` (None = auto-detect), and CvmConfig::resolved_platform()
returns the pinned platform or TeePlatform::detect(). Matches on the resolved
platform are now exhaustive over {Tdx, AmdSevSnp} with no unreachable arm.

A back-compat deserializer still accepts the literal `platform = "auto"`
(mapped to None) so existing vmm.toml configs keep working.
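
A sketch of the resulting shape, with assumptions called out: the real deserializer is presumably serde-based, so `parse_platform` is a hand-written stand-in, and `resolved_platform` takes the detector as a closure here (instead of calling `TeePlatform::detect()` directly) purely so the example is testable.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TeePlatform {
    Tdx,
    AmdSevSnp,
}

/// Stand-in for the back-compat deserializer: "auto" maps to None.
pub fn parse_platform(value: &str) -> Result<Option<TeePlatform>, String> {
    match value {
        "auto" => Ok(None),
        "tdx" => Ok(Some(TeePlatform::Tdx)),
        "amd-sev-snp" => Ok(Some(TeePlatform::AmdSevSnp)),
        other => Err(format!("unknown platform: {other}")),
    }
}

pub struct CvmConfig {
    /// None means auto-detect.
    pub platform: Option<TeePlatform>,
}

impl CvmConfig {
    /// Return the pinned platform, or run detection when none is pinned.
    pub fn resolved_platform(&self, detect: impl FnOnce() -> TeePlatform) -> TeePlatform {
        self.platform.unwrap_or_else(detect)
    }
}

/// Matches on the resolved type are exhaustive with no dead `Auto` arm.
/// The returned keys are the metadata.json firmware fields from this PR.
pub fn firmware_key(platform: TeePlatform) -> &'static str {
    match platform {
        TeePlatform::Tdx => "bios",
        TeePlatform::AmdSevSnp => "bios-sev",
    }
}
```

Because "auto" no longer exists as a variant, adding a third platform later would make every such match fail to compile until it is handled.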