Add scoped event-time director models #249

Open
sanjaychari wants to merge 7 commits into codes-org:master from sanjaychari:digital-twin-sbir-develop-component-ml
@sanjaychari

Contributor

This PR makes the following changes.

  • Add scoped event-time model storage and inference so the
    dragonfly-dally event-time surrogate can use separate ML models for
    switch LPs while retaining the existing director request flow.

  • Fix ZeroMQ director request argument handling so command handlers parse
    the argument-count prefix exactly once. This prevents client IDs such as
    1 from being mistaken for a second argument count and dropped from
    training/inference requests.

  • Restore the ZeroMQ director build path by compiling director-client.C
    when USE_ZMQML is enabled and propagating the USE_ZMQML compile
    definition to downstream targets.

  • Clean up the director-client merge conflict around global ZMQ latency
    statistics and keep the cumulative MPI-reduced DIR_STATS output format.
    Expose a latency-recording hook so event-time inference requests from
    dragonfly-dally are included in the shared ZMQ request statistics.

  • Update the event-time workflow to use START_ITER and END_ITER template
    variables and save/load the scoped event-time model directory rather
    than a single model file.
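
The argument-count fix described above can be illustrated with a minimal sketch. It is not the code from this PR: the whitespace-separated wire format `<argc> <arg1> ... <argN>`, the function names `parse_args_once` and `handle_infer`, and the `scope` argument are all hypothetical, chosen only to show why the count prefix should be consumed in exactly one place.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// ASSUMPTION: requests look like "<argc> <arg1> ... <argN>", whitespace-separated.
// The real director message format may differ.
//
// Consume the argument-count prefix exactly once and return the arguments.
std::vector<std::string> parse_args_once(const std::string& payload) {
    std::istringstream in(payload);
    std::size_t argc = 0;
    if (!(in >> argc)) {
        throw std::invalid_argument("missing argument count");
    }
    std::vector<std::string> args;
    std::string token;
    while (args.size() < argc && (in >> token)) {
        args.push_back(token);
    }
    if (args.size() != argc) {
        throw std::invalid_argument("argument count mismatch");
    }
    return args;
}

// Hypothetical command handler. It receives arguments that are already
// parsed and never looks for a count prefix itself, so a client ID of "1"
// cannot be misread as "one argument follows".
std::string handle_infer(const std::vector<std::string>& args) {
    if (args.size() < 2) {
        throw std::invalid_argument("expected <client_id> <scope>");
    }
    return "client=" + args[0] + " scope=" + args[1];
}
```

With this split, a request such as `2 1 switch` yields the arguments `1` and `switch`. If a handler instead re-parsed the remaining `1 switch` as a fresh count-prefixed payload, it would read `1` as the count and keep only `switch`, which is the kind of dropped client ID the second bullet describes.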

This commit formats files with clang-format-20, the version used by the CI,
instead of the unversioned clang-format.
@sanjaychari sanjaychari force-pushed the digital-twin-sbir-develop-component-ml branch from 73d52b7 to e937968 Compare June 23, 2026 15:54