Integrate "get model versions" and "download specific model version" into cpp Core with max_versions#816
Integrate "get model versions" and "download specific model version" into cpp Core with max_versions#816selenayang888 wants to merge 23 commits into
Conversation
…o baijumeswani/catalog
…/integrate-get-model-versions-into-cpp
…odel-versions-into-cpp
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
| /// is owned by the list and remains valid until the list is released. NULL | ||
| /// when no continuation token was set (e.g. the list came from GetModels(), | ||
| /// or GetModelVersions has walked the underlying source to exhaustion). | ||
| const char* FL_API_T(ModelList_GetContinuationToken, _In_ const flModelList* models); |
There was a problem hiding this comment.
Can this be removed along with next_continuation_token in flModelList?
There was a problem hiding this comment.
Yes, I removed this API since the next_continuation_token was already removed.
| /// Get all versions of a model alias, optionally narrowed to a specific model name. | ||
| /// @param model_alias Alias of the model (e.g. "phi-4-mini"). Must be non-NULL and non-empty. | ||
| /// @param model_name Optional model name (ModelInfo.Name, e.g. "Phi-4-generic-gpu"). NULL returns | ||
| /// every model name. | ||
| /// @param max_versions Select latest X versions per model name. Pass 0 (or any | ||
| /// negative value) for no per-model-name cap. | ||
| /// Returned list contains existing flModel handles owned by the catalog; releasing | ||
| /// the list does not invalidate the underlying model handles. | ||
| FL_API_STATUS(GetModelVersions, _In_ const flCatalog* catalog, _In_ const char* model_alias, | ||
| _In_opt_ const char* model_name, int32_t max_versions, _Outptr_ flModelList** out_models); |
There was a problem hiding this comment.
Should we note that (IIUC) these models will be returned by catalog operations in the current instance (e.g. list models) but are not saved in the model cache on disk so won't be present on restart? I think that behavior is fine but it could be a little surprising.
There was a problem hiding this comment.
I think this is how design doc specified for caching strategy. get_model_versions results are not cached in BaseModelCatalog. Each call to get_model_versions does a fresh catalog query. However, to support download an older version", the resolved ModelInfo for that version is temporarily added to the catalog's modelIdToInfo index.
The link below is the part in the design doc for Caching Strategy for All-Versions Data:
There was a problem hiding this comment.
should we explain that behavior to the user in the comments here as it's a little opaque?
There was a problem hiding this comment.
Added the explanation of the behavior in the comments now.
Moreover, based on the design doc requirements for caching strategy, I updated the GetModelVersions in base_model_catalog.cc:475 no longer calls IntegrateVariants. It now stores the fetched models in transient per-catalog query storage and returns those handles without adding them to the main alias/id indices.
| struct ModelVersionsPage { | ||
| std::vector<Model*> models; | ||
| }; |
There was a problem hiding this comment.
do we need this struct anymore?
There was a problem hiding this comment.
Removed this struct ModelVersionsPage.
| while (true) { | ||
| const std::string body = BuildRequestBody(filters, skip, continuation_token); | ||
| int requested_page_size = kPageSize; | ||
| if (max_count > 0) { |
There was a problem hiding this comment.
Does anything set max_count? If not can we remove the code related to it?
There was a problem hiding this comment.
As the function FetchFilterSetWithState is reverted, no max_count_ anymore.
| // Page 1: run through region fallback starting from the sticky region (last known-good) or the active region. | ||
| // Exhaustion means every candidate had a retryable region-health failure, so fail just this filter set. | ||
| const std::string start = | ||
| region_fallback_.StickyRegion().value_or(region_); | ||
| // Page 1 fresh start: run region fallback starting from the sticky/active region. | ||
| const std::string start = region_fallback_.StickyRegion().value_or(region_); |
There was a problem hiding this comment.
nit: removed comments that seemed like they were more descriptive of the processing here and lines 296, 302-303.
There was a problem hiding this comment.
I reverted this function FetchFilterSetWithState back to FetchFilterSet, since the continuation_token is already removed from getModelVersions.
| // Scan local models so any version already on disk is reported as cached. | ||
| auto local_models = ScanLocalModels(cache_dir_, logger_); |
There was a problem hiding this comment.
Do we need to scan local models here and in FetchModelsByIds? That seems unrelated to fetching the latest versions/ids. We want to do it when we do the general fetch, but I don't think it's required in these targeted fetches.
| struct FetchedModelVersions { | ||
| std::vector<Model> models; | ||
| }; | ||
|
|
||
| virtual FetchedModelVersions FetchModelVersions( | ||
| const std::string& /*model_alias*/, | ||
| const std::string& /*model_name*/ = "") const { | ||
| return {}; | ||
| } |
There was a problem hiding this comment.
Do we need the FetchedModelVersions struct?
There was a problem hiding this comment.
Since it is only a thin wrapper around a vector, so removing and replacing it with std::vector<Model> in the base virtual method.
| // Capture fetch order before moving so we can return models in the order the | ||
| // underlying source produced them — important for stable pagination. | ||
| std::vector<std::string> fetched_ids; | ||
| fetched_ids.reserve(fetched.models.size()); | ||
| for (const auto& m : fetched.models) { | ||
| fetched_ids.push_back(m.Info().model_id); | ||
| } |
| // Deterministic API output order: alias alpha, then name alpha, then version asc. | ||
| bool CompareModelPointersForVersionList(const Model* lhs, const Model* rhs) { |
There was a problem hiding this comment.
Should this have the same ordering as CompareModelsForSort? Currently the device is ignored so things like 'cuda' and 'generic-gpu' will come before 'npu' which would differ from the model list output.
When we add to the model variants in IntegrateVariantsLocked do we end up with a weird ordering given AddVariant appends to the original list, which was sorted using CompareModelsForSort?
This branch off yours has an alternative approach where the variant sorting is pushed down to Model given we're now adding new variants in multiple places: skottmckay/get-model-versions-with-variant-sort
It also uses that ordering when returning values from GetModelVersions for consistency.
There was a problem hiding this comment.
The local scan removal from FetchModelVersions and the transient version_query_models_ approach are both solid and fully address the first and main concern in the comments.
Then, the three major changes:
- Added container re-sorting support in
model.handmodel.cc. - Applied that re-sort in integration flow in
base_model_catalog.cc - Fixed
max_versionsselection robustness inbase_model_catalog.cc
There was a problem hiding this comment.
Thanks for sharing the branch — I've merged it in (after the latest origin/main merge). All are completed as your suggested in teams chat:
- Make Model own ordering invariant
- Move comparator/device-priority logic from BaseModelCatalog to Model
- Add default-selection step in Model
- Remove duplicated ordering work in GetModelVersions and integration paths.
| // Optional Windows App SDK bootstrap. When the caller enables Bootstrap in | ||
| // additional_options we initialize the WinAppSDK framework package for this process. This | ||
| // must run before the Manager constructor so that WinML EP discovery (inside | ||
| // Manager::Manager) can resolve Microsoft.Windows.AI.MachineLearning.dll. We use a | ||
| // temporary stderr logger here because the Manager-owned logger doesn't exist yet; | ||
| // bootstrap output is low-volume (one line on success, one warning on failure). Mirrors | ||
| // the C# FoundryLocalCore IS_WINML path. Only meaningful in WinML builds; outside that | ||
| // configuration TryInitializeWindowsAppSdk is a no-op stub. | ||
| #if defined(FOUNDRY_LOCAL_USE_WINML) && FOUNDRY_LOCAL_USE_WINML | ||
| { | ||
| if (IsAdditionalOptionEnabled(config, "Bootstrap")) { | ||
| StderrLogger bootstrap_logger; | ||
| TryInitializeWindowsAppSdk(bootstrap_logger); | ||
| } | ||
| } | ||
| #endif |
There was a problem hiding this comment.
Assuming this is unintentional
There was a problem hiding this comment.
It was unintentional and reverted the changes.
…odel-versions-into-cpp
|
|
||
| /// Transient storage for the most recent GetModelVersions query. Replaced on each call. | ||
| /// These models are intentionally not integrated into the main lookup indices. | ||
| mutable std::vector<std::unique_ptr<Model>> version_query_models_; |
There was a problem hiding this comment.
Replacing on each call creates potential issues on the client side if they call this for multiple models as it would invalidate all info from the previous fetch and they have no way to keep that info alive.
Could we store in an unordered_map<string_view, unique_ptr> that uses the model id as the key?
| } | ||
|
|
||
| result.push_back(model.get()); | ||
| } |
There was a problem hiding this comment.
My concern with this is that we're using a different structure to the Model's returned by the default list, which have a parent (a.k.a. 'container') Model instance for the alias and the variants are accesses from that. Here we're returning individual variants and no parent Model instance is created.
That means usage could differ in the user code, and ideally we can avoid that.
Given we're fetching a single alias, could we create the parent Model and instead use AddVariant to add each individual result and return the parent? That would mean the API returns a single Model though.
There was a problem hiding this comment.
Actually we could do both. Internally create the same structure (parent container Model with leaf variants), and return a flModelList of the variants at the C API level like we do in Model_GetVariantsImpl.
That creates a consistent setup internally whilst returning the expected list of individual variant Model instances.
…odel-versions-into-cpp
Porting the changes "get model versions" and "download specific model version" from C# into C++ Core now: