Skip to content

perf: scale throughput by effective batch size, not requested --batch-size#930

Open
xieofxie wants to merge 3 commits into
mainfrom
hualxie/perf_batch_test
Open

perf: scale throughput by effective batch size, not requested --batch-size#930
xieofxie wants to merge 3 commits into
mainfrom
hualxie/perf_batch_test

Conversation

@xieofxie

Copy link
Copy Markdown
Contributor

Summary

Fixes incorrect samples_per_sec in wmk perf when the requested --batch-size cannot be applied to the model.

The requested --batch-size only lands on inputs whose leading dimension is dynamic (see _resolve_shape). For a model with a statically-fixed batch dim (e.g. [1, 3, 224, 224]), input generation silently keeps the static value, so the session runs batch=1 — but throughput was still computed as config.batch_size / latency, inflating samples_per_sec by the requested batch factor (e.g. reporting 800 sps when the model actually processed 100). Latency stats were always correct; only the derived per-sample throughput was wrong.

Changes

  • New helper effective_batch_size() — reads the actual batch back from the first batched (rank ≥ 1) generated input, matching the module's existing "first dim is batch" convention. Falls back to the requested value when all inputs are scalar.
  • _collect_results() now scales samples_per_sec by the effective batch the session ran, not the requested config.batch_size. (batches_per_sec was already correct — it is per-call.)
  • _generate_inputs() logs a warning when the requested --batch-size could not be applied (static batch dim).
  • BenchmarkResult gains an effective_batch_size field, surfaced in the JSON benchmark_info alongside the requested batch_size, and shown in the console throughput line with a Note: when the two differ.

Tests

New TestEffectiveBatchSize in tests/unit/commands/test_perf_cli.py covers the helper (dynamic / static / scalar / fallback), throughput scaling for both static-batch (regression guard: 100 not 800 sps) and dynamic-batch cases, the warning on mismatch, and the JSON field. Full test_perf_cli.py, test_perf_module.py, and test_perf_composite.py suites pass.

Scope note

This addresses the batch-size correctness aspect of the epic. The broader items (batch-size sweep --batch-sizes 1,4,8,16 and --ep all cross-EP comparison) are not part of this PR.

Closes #155

@xieofxie xieofxie requested a review from a team as a code owner June 22, 2026 08:15
@xieofxie

Copy link
Copy Markdown
Contributor Author

For the issue 155, only the remaining --ep all is tracked via #449, so close it

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is a clean, well-motivated fix. The core bug (samples_per_sec inflated by requested batch on static-batch models) is correctly diagnosed and fixed. The effective_batch_size helper is clear and its fallback behavior is well-documented. Tests cover the important scenarios including the regression guard. A few minor observations below.

Comment thread src/winml/modelkit/commands/perf.py Outdated
Comment thread src/winml/modelkit/commands/perf.py
- Simplify static-batch warning guard to fire consistently with the
  console Note (drop redundant batch_size != 1 check)
- Document single-input assumption in effective_batch_size docstring
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

P1-FEATURE-002: wmk perf Command — Improve Existing

2 participants