feat(scheduling): migrate CR and failover calls to options-based pipeline selection#950
feat(scheduling): migrate CR and failover calls to options-based pipeline selection#950mblos wants to merge 5 commits into
Conversation
📝 WalkthroughWalkthroughAdds a ChangesScheduling Options, Filter Steps, Capacity Probe, and HANA Failover Pipeline
Sequence Diagram(s)sequenceDiagram
participant capacityProbe as Capacity Probe
participant scheduler as Scheduler
participant filters as Filter Steps
capacityProbe->>scheduler: ExternalSchedulerRequest<br/>(SkipPlacementContextFilters=true<br/>ProjectID=empty<br/>AssumeEmptyHosts/IgnoredTypes)
scheduler->>filters: Run each step
filters->>filters: Check SkipPlacementContextFilters
filters-->>scheduler: Return all hosts<br/>(skip context filtering)
scheduler-->>capacityProbe: Capacity result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
as of now, tests are redundant. no strong opinion on either joint testing across filters like done in |
|
changed semantic of failover calls.. now selecting gp/hana pipelines. Please check if we want to do it like this or always go with gp pipeline |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/scheduling/reservations/failover/integration_test.go (2)
1371-1399:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSame duplicate pipeline issue as in
newIntegrationTestEnv.Lines 1371-1383 and 1384-1399 both define pipelines named
PipelineNewFailoverReservation. Remove the first duplicate definition here as well.🔧 Remove the duplicate pipeline definition
pipelines := []v1alpha1.Pipeline{ { ObjectMeta: metav1.ObjectMeta{ Name: "test-filter-pipeline", }, Spec: v1alpha1.PipelineSpec{ Type: v1alpha1.PipelineTypeFilterWeigher, Filters: []v1alpha1.FilterSpec{ {Name: "filter_has_enough_capacity"}, {Name: "filter_has_requested_traits"}, {Name: "filter_correct_az"}, }, Weighers: []v1alpha1.WeigherSpec{ {Name: "kvm_failover_evacuation"}, }, }, }, - { - ObjectMeta: metav1.ObjectMeta{ - Name: PipelineNewFailoverReservation, - }, - Spec: v1alpha1.PipelineSpec{ - Type: v1alpha1.PipelineTypeFilterWeigher, - Filters: []v1alpha1.FilterSpec{ - {Name: "filter_has_requested_traits"}, - {Name: "filter_correct_az"}, - }, - Weighers: []v1alpha1.WeigherSpec{}, - }, - }, { ObjectMeta: metav1.ObjectMeta{ Name: PipelineNewFailoverReservation, },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/scheduling/reservations/failover/integration_test.go` around lines 1371 - 1399, There are two pipeline definitions with the same name PipelineNewFailoverReservation in the test data. Remove the first pipeline definition (the one with only filter_has_requested_traits and filter_correct_az filters and no weighers) and keep the second one which contains the complete pipeline specification with filter_has_enough_capacity, filter_has_requested_traits, filter_correct_az, and the kvm_failover_evacuation weigher.
1170-1198:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDuplicate pipeline definitions — first one is dead code.
After the rename at line 1172, both pipelines at lines 1170-1182 and 1183-1198 use the same name
PipelineNewFailoverReservation. The loop at lines 1213-1225 will overwrite the first definition with the second, making the first pipeline block unreachable dead code. The two have different filter/weigher configurations.🔧 Remove the duplicate pipeline definition
pipelines := []v1alpha1.Pipeline{ { ObjectMeta: metav1.ObjectMeta{ Name: "test-filter-pipeline", }, Spec: v1alpha1.PipelineSpec{ Type: v1alpha1.PipelineTypeFilterWeigher, Filters: []v1alpha1.FilterSpec{ {Name: "filter_has_enough_capacity"}, {Name: "filter_correct_az"}, }, Weighers: []v1alpha1.WeigherSpec{ {Name: "kvm_failover_evacuation"}, }, }, }, - { - ObjectMeta: metav1.ObjectMeta{ - Name: PipelineNewFailoverReservation, - }, - Spec: v1alpha1.PipelineSpec{ - Type: v1alpha1.PipelineTypeFilterWeigher, - Filters: []v1alpha1.FilterSpec{ - {Name: "filter_has_requested_traits"}, - {Name: "filter_correct_az"}, - }, - Weighers: []v1alpha1.WeigherSpec{}, - }, - }, { ObjectMeta: metav1.ObjectMeta{ Name: PipelineNewFailoverReservation, },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/scheduling/reservations/failover/integration_test.go` around lines 1170 - 1198, There are two pipeline definitions with the identical name PipelineNewFailoverReservation in the test fixtures, causing the first definition to be dead code that gets overwritten by the second when processed. Remove the first pipeline definition block that contains only the filters filter_has_requested_traits and filter_correct_az with an empty weighers list, keeping the second pipeline definition that includes the additional filter_has_enough_capacity filter and the kvm_failover_evacuation weigher, as it is the intended configuration that will actually be used.
🧹 Nitpick comments (1)
internal/scheduling/reservations/capacity/config.go (1)
17-19: ⚡ Quick winUpdate stale pipeline example in
TotalPipelinecomment.The example still mentions
kvm-report-capacity, but defaults now usekvm-general-purpose-load-balancing. Updating this avoids operator confusion.Suggested patch
- // TotalPipeline is the scheduler pipeline used for the empty-state probe. - // This pipeline should ignore current VM allocations (e.g. kvm-report-capacity). + // TotalPipeline is the scheduler pipeline used for the empty-state probe. + // This pipeline should ignore current VM allocations (e.g. kvm-general-purpose-load-balancing with AssumeEmptyHosts). TotalPipeline string `json:"capacityTotalPipeline"`🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/scheduling/reservations/capacity/config.go` around lines 17 - 19, The comment for the TotalPipeline field references an outdated pipeline example `kvm-report-capacity` which no longer reflects the current defaults. Update the comment text to replace the outdated example with the current default pipeline name `kvm-general-purpose-load-balancing` to keep the documentation accurate and avoid operator confusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/scheduling/reservations/failover/reservation_scheduling.go`:
- Around line 25-32: The inferFailoverPipeline function performs a
case-sensitive string comparison on the extra specs value, but the upstream
GetFlavorType function uses strings.ToLower for case-insensitive comparison,
creating an inconsistency. Modify the comparison in inferFailoverPipeline to
convert the extraSpecs value to lowercase using strings.ToLower before comparing
it to "required", ensuring that values like "Required" or "REQUIRED" are handled
correctly and consistently with the upstream implementation.
---
Outside diff comments:
In `@internal/scheduling/reservations/failover/integration_test.go`:
- Around line 1371-1399: There are two pipeline definitions with the same name
PipelineNewFailoverReservation in the test data. Remove the first pipeline
definition (the one with only filter_has_requested_traits and filter_correct_az
filters and no weighers) and keep the second one which contains the complete
pipeline specification with filter_has_enough_capacity,
filter_has_requested_traits, filter_correct_az, and the kvm_failover_evacuation
weigher.
- Around line 1170-1198: There are two pipeline definitions with the identical
name PipelineNewFailoverReservation in the test fixtures, causing the first
definition to be dead code that gets overwritten by the second when processed.
Remove the first pipeline definition block that contains only the filters
filter_has_requested_traits and filter_correct_az with an empty weighers list,
keeping the second pipeline definition that includes the additional
filter_has_enough_capacity filter and the kvm_failover_evacuation weigher, as it
is the intended configuration that will actually be used.
---
Nitpick comments:
In `@internal/scheduling/reservations/capacity/config.go`:
- Around line 17-19: The comment for the TotalPipeline field references an
outdated pipeline example `kvm-report-capacity` which no longer reflects the
current defaults. Update the comment text to replace the outdated example with
the current default pipeline name `kvm-general-purpose-load-balancing` to keep
the documentation accurate and avoid operator confusion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a58fd721-2f3d-43a1-b4d4-38c0ed4cee3b
📒 Files selected for processing (25)
api/scheduling/options.gohelm/bundles/cortex-nova/values.yamlinternal/scheduling/nova/plugins/filters/filter_aggregate_metadata.gointernal/scheduling/nova/plugins/filters/filter_aggregate_metadata_test.gointernal/scheduling/nova/plugins/filters/filter_allowed_projects.gointernal/scheduling/nova/plugins/filters/filter_allowed_projects_test.gointernal/scheduling/nova/plugins/filters/filter_external_customer.gointernal/scheduling/nova/plugins/filters/filter_external_customer_test.gointernal/scheduling/nova/plugins/filters/filter_has_enough_capacity.gointernal/scheduling/nova/plugins/filters/filter_has_enough_capacity_test.gointernal/scheduling/nova/plugins/filters/filter_instance_group_affinity.gointernal/scheduling/nova/plugins/filters/filter_instance_group_affinity_test.gointernal/scheduling/nova/plugins/filters/filter_instance_group_anti_affinity.gointernal/scheduling/nova/plugins/filters/filter_instance_group_anti_affinity_test.gointernal/scheduling/nova/plugins/filters/filter_live_migratable.gointernal/scheduling/nova/plugins/filters/filter_live_migratable_test.gointernal/scheduling/nova/plugins/filters/filter_quota_enforcement.gointernal/scheduling/nova/plugins/filters/filter_quota_enforcement_test.gointernal/scheduling/nova/plugins/filters/filter_requested_destination.gointernal/scheduling/nova/plugins/filters/filter_requested_destination_test.gointernal/scheduling/nova/plugins/filters/filter_skip_placement_context_test.gointernal/scheduling/reservations/capacity/config.gointernal/scheduling/reservations/capacity/controller.gointernal/scheduling/reservations/failover/integration_test.gointernal/scheduling/reservations/failover/reservation_scheduling.go
| // inferFailoverPipeline returns the standard pipeline for a failover scheduling call based on | ||
| // the VM's flavor extra specs — the same HANA vs general-purpose split used by Nova placement. | ||
| func inferFailoverPipeline(extraSpecs map[string]string) string { | ||
| if extraSpecs["trait:CUSTOM_HANA_EXCLUSIVE_HOST"] == "required" { | ||
| return "kvm-hana-bin-packing" | ||
| } | ||
| return "kvm-general-purpose-load-balancing" | ||
| } |
There was a problem hiding this comment.
Case-insensitive comparison inconsistency with upstream.
The upstream GetFlavorType in api/external/nova/messages.go:100-115 uses strings.ToLower(val) before comparing. This function uses direct string equality, which could cause pipeline selection mismatch if extra specs contain "Required" or "REQUIRED".
🔧 Suggested fix for case-insensitive comparison
func inferFailoverPipeline(extraSpecs map[string]string) string {
- if extraSpecs["trait:CUSTOM_HANA_EXCLUSIVE_HOST"] == "required" {
+ if strings.EqualFold(extraSpecs["trait:CUSTOM_HANA_EXCLUSIVE_HOST"], "required") {
return "kvm-hana-bin-packing"
}
return "kvm-general-purpose-load-balancing"
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // inferFailoverPipeline returns the standard pipeline for a failover scheduling call based on | |
| // the VM's flavor extra specs — the same HANA vs general-purpose split used by Nova placement. | |
| func inferFailoverPipeline(extraSpecs map[string]string) string { | |
| if extraSpecs["trait:CUSTOM_HANA_EXCLUSIVE_HOST"] == "required" { | |
| return "kvm-hana-bin-packing" | |
| } | |
| return "kvm-general-purpose-load-balancing" | |
| } | |
| // inferFailoverPipeline returns the standard pipeline for a failover scheduling call based on | |
| // the VM's flavor extra specs — the same HANA vs general-purpose split used by Nova placement. | |
| func inferFailoverPipeline(extraSpecs map[string]string) string { | |
| if strings.EqualFold(extraSpecs["trait:CUSTOM_HANA_EXCLUSIVE_HOST"], "required") { | |
| return "kvm-hana-bin-packing" | |
| } | |
| return "kvm-general-purpose-load-balancing" | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/scheduling/reservations/failover/reservation_scheduling.go` around
lines 25 - 32, The inferFailoverPipeline function performs a case-sensitive
string comparison on the extra specs value, but the upstream GetFlavorType
function uses strings.ToLower for case-insensitive comparison, creating an
inconsistency. Modify the comparison in inferFailoverPipeline to convert the
extraSpecs value to lowercase using strings.ToLower before comparing it to
"required", ensuring that values like "Required" or "REQUIRED" are handled
correctly and consistently with the upstream implementation.
Test Coverage ReportTest Coverage 📊: 69.8% |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/scheduling/reservations/commitments/integration_test.go`:
- Around line 1109-1128: The test in the captureScheduler handler ignores decode
errors on line 1111 and has no assertion that verifies the scheduler was
actually called. This allows the test to pass with a zero-value capturedReq even
if the scheduler was never invoked. Add a boolean flag (such as schedulerCalled)
that gets set to true when the captureScheduler handler executes, then assert
this flag is true before checking
capturedReq.Options.SkipPlacementContextFilters. This ensures the test can only
pass if the scheduler was actually called and the request was properly captured.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c023c681-2b76-4618-bcff-96c4b36da0b5
📒 Files selected for processing (5)
api/scheduling/options.gointernal/scheduling/reservations/capacity/controller_test.gointernal/scheduling/reservations/commitments/integration_test.gointernal/scheduling/reservations/commitments/reservation_controller.gointernal/scheduling/reservations/failover/reservation_scheduling.go
🚧 Files skipped from review as they are similar to previous changes (2)
- api/scheduling/options.go
- internal/scheduling/reservations/failover/reservation_scheduling.go
| var capturedReq schedulerdelegationapi.ExternalSchedulerRequest | ||
| captureScheduler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
| json.NewDecoder(r.Body).Decode(&capturedReq) //nolint:errcheck | ||
| resp := &schedulerdelegationapi.ExternalSchedulerResponse{Hosts: []string{"host-1"}} | ||
| json.NewEncoder(w).Encode(resp) //nolint:errcheck | ||
| }) | ||
|
|
||
| env := newIntgEnv(t, []client.Object{newTestFlavorKnowledge(), intgHypervisor("host-1")}, captureScheduler) | ||
| defer env.close() | ||
|
|
||
| cr := intgCR("test-cr", "commit-uuid-1", v1alpha1.CommitmentStatusConfirmed) | ||
| if err := env.k8sClient.Create(context.Background(), cr); err != nil { | ||
| t.Fatalf("create CR: %v", err) | ||
| } | ||
| env.reconcileCR(t, cr.Name) | ||
| env.reconcileChildReservations(t, cr.Name) | ||
|
|
||
| if capturedReq.Options.SkipPlacementContextFilters { | ||
| t.Error("CR slot scheduling must not set SkipPlacementContextFilters — tenant-context filters must run") | ||
| } |
There was a problem hiding this comment.
Test can pass without proving a scheduler request was captured.
Because Line 1111 ignores decode errors and there’s no “scheduler was called” assertion, this test can pass with the zero-value request (SkipPlacementContextFilters == false) even when capture fails.
Suggested fix
func TestCRScheduling_DoesNotSetSkipPlacementContextFilters(t *testing.T) {
@@
var capturedReq schedulerdelegationapi.ExternalSchedulerRequest
+ schedulerCalled := false
captureScheduler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
- json.NewDecoder(r.Body).Decode(&capturedReq) //nolint:errcheck
+ if err := json.NewDecoder(r.Body).Decode(&capturedReq); err != nil {
+ http.Error(w, err.Error(), http.StatusBadRequest)
+ return
+ }
+ schedulerCalled = true
resp := &schedulerdelegationapi.ExternalSchedulerResponse{Hosts: []string{"host-1"}}
json.NewEncoder(w).Encode(resp) //nolint:errcheck
})
@@
env.reconcileCR(t, cr.Name)
env.reconcileChildReservations(t, cr.Name)
+ if !schedulerCalled {
+ t.Fatal("expected CR scheduling to call the scheduler at least once")
+ }
if capturedReq.Options.SkipPlacementContextFilters {
t.Error("CR slot scheduling must not set SkipPlacementContextFilters — tenant-context filters must run")
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| var capturedReq schedulerdelegationapi.ExternalSchedulerRequest | |
| captureScheduler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | |
| json.NewDecoder(r.Body).Decode(&capturedReq) //nolint:errcheck | |
| resp := &schedulerdelegationapi.ExternalSchedulerResponse{Hosts: []string{"host-1"}} | |
| json.NewEncoder(w).Encode(resp) //nolint:errcheck | |
| }) | |
| env := newIntgEnv(t, []client.Object{newTestFlavorKnowledge(), intgHypervisor("host-1")}, captureScheduler) | |
| defer env.close() | |
| cr := intgCR("test-cr", "commit-uuid-1", v1alpha1.CommitmentStatusConfirmed) | |
| if err := env.k8sClient.Create(context.Background(), cr); err != nil { | |
| t.Fatalf("create CR: %v", err) | |
| } | |
| env.reconcileCR(t, cr.Name) | |
| env.reconcileChildReservations(t, cr.Name) | |
| if capturedReq.Options.SkipPlacementContextFilters { | |
| t.Error("CR slot scheduling must not set SkipPlacementContextFilters — tenant-context filters must run") | |
| } | |
| var capturedReq schedulerdelegationapi.ExternalSchedulerRequest | |
| schedulerCalled := false | |
| captureScheduler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | |
| if err := json.NewDecoder(r.Body).Decode(&capturedReq); err != nil { | |
| http.Error(w, err.Error(), http.StatusBadRequest) | |
| return | |
| } | |
| schedulerCalled = true | |
| resp := &schedulerdelegationapi.ExternalSchedulerResponse{Hosts: []string{"host-1"}} | |
| json.NewEncoder(w).Encode(resp) //nolint:errcheck | |
| }) | |
| env := newIntgEnv(t, []client.Object{newTestFlavorKnowledge(), intgHypervisor("host-1")}, captureScheduler) | |
| defer env.close() | |
| cr := intgCR("test-cr", "commit-uuid-1", v1alpha1.CommitmentStatusConfirmed) | |
| if err := env.k8sClient.Create(context.Background(), cr); err != nil { | |
| t.Fatalf("create CR: %v", err) | |
| } | |
| env.reconcileCR(t, cr.Name) | |
| env.reconcileChildReservations(t, cr.Name) | |
| if !schedulerCalled { | |
| t.Fatal("expected CR scheduling to call the scheduler at least once") | |
| } | |
| if capturedReq.Options.SkipPlacementContextFilters { | |
| t.Error("CR slot scheduling must not set SkipPlacementContextFilters — tenant-context filters must run") | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/scheduling/reservations/commitments/integration_test.go` around
lines 1109 - 1128, The test in the captureScheduler handler ignores decode
errors on line 1111 and has no assertion that verifies the scheduler was
actually called. This allows the test to pass with a zero-value capturedReq even
if the scheduler was never invoked. Add a boolean flag (such as schedulerCalled)
that gets set to true when the captureScheduler handler executes, then assert
this flag is true before checking
capturedReq.Options.SkipPlacementContextFilters. This ensures the test can only
pass if the scheduler was actually called and the request was properly captured.
| }, scheduling.Options{SkipHistory: true, SkipInflight: true, SkipCommittedResourceTracking: true}) | ||
| }, scheduling.Options{ | ||
| ReadOnly: true, | ||
| AssumeEmptyHosts: ignoreAllocations, |
There was a problem hiding this comment.
If AssumeEmptyHosts and ignoreAllocations are synonymous, we should probably stick with one and get rid of the other
Eliminating dedicated CR and failover pipelines by encoding their behavioral differences as call-time scheduling options, so the standard GP and HANA pipelines are sufficient for all internal scheduling calls.