SWIP-15: BanyanDB instance-relation deployment topology + category-separated self-observability rules #13905
Merged
…pology and category-separated so11y rules

Add a SERVICE_INSTANCE_RELATION scope to the MAL engine and a serviceInstanceRelation(...) builder so MAL rules can emit intra-cluster (same-service) instance topology. The meter Analyzer bridges these to the ServiceInstanceRelation server/client-side topology metrics, so getServiceInstanceTopology renders the edges.

SWIP-15 uses this for the BanyanDB deployment view (new banyandb-instance-relation.yaml): the pod-to-pod flow graph with per-edge, per-operation metrics -- write distribution (liaison<->data via publish_* / queue_sub_*) and tier migration (lifecycle->data via migration_*), each carrying throughput / p99 latency / error rate / bytes rate.

Category-separate the BanyanDB self-observability rules so a metric reads only the families that genuinely exist for its category (instead of one unified rule that left empty panels): instance rules carry a role prefix on the rule name (node_* shared resource/runtime, liaison_* front-door gRPC/publish, data_* storage/index/subscribe-queue/retention, lifecycle_* migration health); endpoint rules carry a data-type prefix (measure_* / stream_* / stream_tst_* / trace_* / property_*, with operation-keyed queue_* / publish_bytes staying type-agnostic). This adds the previously-unmodeled property data type (so sw_property groups stop rendering all-empty) and the trace storage inverted-index series/term-search/total-series (previously silently dropped). Scope and entity keys are unchanged.

Validation: MAL execution test 1350/0, boot-check 1/0, banyandb e2e 28/28 live.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
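The prefix convention described in this commit can be illustrated with a small classifier. This is only a sketch: the prefixes come from the commit message, but the function names and the sample rule names in the usage note are hypothetical and are not part of the MAL engine or the shipped rule files.

```python
from typing import Optional

# Role prefixes for instance rules, as listed in the commit message.
INSTANCE_ROLE_PREFIXES = ("node_", "liaison_", "data_", "lifecycle_")

# Data-type prefixes for endpoint rules. Order matters here:
# "stream_tst_" has to be tested before the shorter "stream_".
ENDPOINT_TYPE_PREFIXES = ("stream_tst_", "measure_", "stream_", "trace_", "property_")


def _match_prefix(rule_name: str, prefixes) -> Optional[str]:
    """Return the first matching prefix without its trailing underscore."""
    for prefix in prefixes:
        if rule_name.startswith(prefix):
            return prefix.rstrip("_")
    return None


def instance_role(rule_name: str) -> Optional[str]:
    """Role category of an instance rule, or None if it has no role prefix."""
    return _match_prefix(rule_name, INSTANCE_ROLE_PREFIXES)


def endpoint_data_type(rule_name: str) -> Optional[str]:
    """Data type of an endpoint rule, or None for type-agnostic rules
    such as the operation-keyed queue_* / publish_bytes families."""
    return _match_prefix(rule_name, ENDPOINT_TYPE_PREFIXES)
```

With made-up rule names, `instance_role("liaison_grpc_total")` yields `"liaison"`, `endpoint_data_type("stream_tst_write_rate")` yields `"stream_tst"`, and `endpoint_data_type("publish_bytes")` yields `None`.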
…metrics

Update SWIP-15 to match what was actually built, so that it serves as the stable design doc: the SERVICE_INSTANCE_RELATION scope is now implemented (no longer future work / out of config-only scope), and the instance/endpoint catalogs are category-separated (node_*/liaison_*/data_*/lifecycle_* and measure_*/stream_*/stream_tst_*/trace_*/property_*). Resolve the stale "should pin the closure with a compile test" note (it ships and is boot-compile-tested), and reference sections via Markdown anchor links instead of the section sign.

Refresh docs/en/banyandb/dashboards-banyandb.md (the living operator catalog) with the per-role / per-type / property metric names and a new deployment (instance-relation) topology section.

Add a CLAUDE.md note: a SWIP is the stable design doc, synced once at implementation and then frozen (further metrics go to the operator doc); use Markdown anchor links in docs, not the section sign.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
system_memory_percent / disk_usage_percent / disk_used_percent_by_path produced a raw 0-1 fraction (memory used_percent gauge, or disk used/total), which the integer meter-value store collapses toward 0/1, so a ~51% disk rendered as 100%. Scale to a 0-100 percentage (* 100), the convention the other otel-rules already follow.

For disk, use BanyanDB's own kind='used_percent' (gopsutil used/(used+free)) instead of recomputing used/total (which ignores reserved blocks and under-reports); the node's data paths share one filesystem and report the same value, so avg() collapses them without the per-path sum inflation. retention *_disk_usage_percent already emits 0-100 and is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
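The arithmetic behind this fix can be sketched in a few lines. The `stored_value` helper below is a hypothetical stand-in for an integer-valued store and assumes round-to-nearest; the actual rounding behaviour of the meter store is not stated in the commit. The filesystem numbers are invented for illustration.

```python
def stored_value(raw: float) -> int:
    """Hypothetical integer store: keeps only the nearest whole number."""
    return int(round(raw))


def used_percent(used: float, free: float) -> float:
    """0-100 percentage computed as used / (used + free)."""
    return 100.0 * used / (used + free)


def used_over_total_percent(used: float, total: float) -> float:
    """0-100 percentage recomputed as used / total."""
    return 100.0 * used / total


fraction = 0.51                            # a ~51% full disk as a 0-1 fraction
as_fraction = stored_value(fraction)       # the fraction collapses to 1
as_percent = stored_value(fraction * 100)  # scaling first keeps 51

# Invented filesystem: 51 units used, 49 free, 5 reserved (total = 105).
own = used_percent(51, 49)                     # 51.0
recomputed = used_over_total_percent(51, 105)  # lower, since total includes reserved blocks
```

Scaling before storage keeps the two significant digits that an integer store would otherwise discard, and the used/(used+free) form avoids counting reserved blocks in the denominator.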
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GitHub renders release notes as GFM, where a single newline inside a list item becomes a <br> -- so a hard-wrapped changelog bullet shows jagged mid-sentence breaks on the release page (the docs website reflows and hides this). Add a "Publish the GitHub release" section to How-to-release.md explaining the gh release step and the one-line-per-bullet rule, note it in CLAUDE.md, and un-wrap this branch's SWIP-15 changelog entry to a single line. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rified on v10.4.0

Confirmed against the v10.4.0 release page's rendered HTML: a prose continuation line becomes a <br> (the DrainBalancer entry renders '...rebalancing (DrainBalancer).<br>Designed to replace DataCarrier...'), while nested sub-bullets render as a clean nested list. Refine the guidance accordingly -- only a bullet's prose must stay on one line; sub-bullets are fine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
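The one-line-per-bullet rule could be applied mechanically before pasting a changelog into a release. The helper below is a rough sketch and not part of the repository or of How-to-release.md; `unwrap_bullets` is a hypothetical name. It joins hard-wrapped prose continuation lines into their bullet and leaves sub-bullets, headings, and blank lines untouched.

```python
import re

# A list item: optional indentation, a bullet marker or "1.", then whitespace.
BULLET = re.compile(r"^\s*([-*+]|\d+\.)\s+")


def unwrap_bullets(text: str) -> str:
    """Join prose continuation lines into the bullet they belong to."""
    out = []
    in_bullet = False
    for line in text.splitlines():
        stripped = line.strip()
        if BULLET.match(line):
            # A new bullet or sub-bullet starts its own output line.
            out.append(line.rstrip())
            in_bullet = True
        elif stripped and in_bullet and not stripped.startswith("#"):
            # Wrapped prose: fold it into the previous bullet.
            out[-1] += " " + stripped
        else:
            # Blank lines, headings, and text outside a list pass through.
            out.append(line)
            in_bullet = False
    return "\n".join(out)
```

For example, a bullet wrapped after "(DrainBalancer)." with the next line starting "Designed to replace" comes back as a single line, while an indented `* sub item` beneath it stays on its own line. The sketch does not handle fenced code blocks inside list items.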
Lead the instance name with the role: instance() keys become ['container_name','pod_name'] (e.g. data@demo-banyandb-data-hot-0). The instance-relation endpoint keys are flipped in lockstep (local + remote) so the deployment topology still resolves to the same instances. Fixtures, e2e cases / expected topology, SWIP-15, and the operator doc updated to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
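As a rough illustration of the role-first naming, the sketch below joins label values in key order. The `@` separator is inferred from the example name in this commit, the label values are taken from that same example, and `instance_name` is a hypothetical helper, not the MAL instance() implementation.

```python
from typing import Mapping, Sequence


def instance_name(
    labels: Mapping[str, str],
    keys: Sequence[str] = ("container_name", "pod_name"),
    sep: str = "@",  # assumed separator, inferred from the example name
) -> str:
    """Join the label values in key order, so the first key leads the name."""
    return sep.join(labels[key] for key in keys)


labels = {"container_name": "data", "pod_name": "demo-banyandb-data-hot-0"}
name = instance_name(labels)
```

Here `name` is `data@demo-banyandb-data-hot-0`. Because the relation rules resolve both ends of an edge by instance name, the local and remote keys have to use the same order, which is why the commit flips them in lockstep.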
wankai123
approved these changes
Jun 12, 2026
SWIP-15: BanyanDB self-observability — pod-to-pod deployment topology + category-separated rules
docs/en/swip/SWIP-15.md).
skywalking-horizon-ui PR; no UI change in this repo.

What this includes

1. SERVICE_INSTANCE_RELATION MAL scope + deployment topology
- SERVICE_INSTANCE_RELATION scope + serviceInstanceRelation(...) builder; the meter Analyzer bridges to the ServiceInstanceRelation server/client-side topology metrics, so getServiceInstanceTopology renders the edges.
- banyandb-instance-relation.yaml: the pod-to-pod flow graph (the Horizon UI "deployment" component) with per-edge, per-operation publish_* / queue_sub_* / migration_* metrics (throughput / p99 / error / bytes).

2. Category-separated BanyanDB self-observability rules
- banyandb-instance.yaml → role-separated (node_* shared, liaison_*, data_*, lifecycle_*).
- banyandb-endpoint.yaml → data-type-separated (measure_* / stream_* / stream_tst_* / trace_* / property_*; operation-keyed queue_* stay type-agnostic). Adds the previously-unmodeled property type (so sw_property groups stop rendering all-empty) and the trace storage inverted-index series.

3. Fix: percent metrics rendered as 100%
- system_memory_percent / disk_usage_percent / disk_used_percent_by_path emitted a 0–1 fraction (collapsed to 0/1 in the integer meter store → rendered 100%). Now use BanyanDB's used_percent × 100. Verified live on a kind cluster: a ~51% disk now reads 51%.

4. Docs
- docs/en/banyandb/dashboards-banyandb.md refreshed; the SWIP-15 changelog consolidated into one concise entry.
- guides/How-to-release.md: new "Publish the GitHub release" section documenting the changelog-wrapping rule (GitHub renders a bullet's prose continuation lines as <br>, so keep prose on one line; sub-bullets are fine; verified on v10.4.0).
- Related conventions added to CLAUDE.md.

Validation
MAL test 1350/0 · boot-check 1/0 · checkstyle / license-eye / FQCN clean · banyandb e2e 28/28 (live cluster) · disk-percent fix verified live (51%).

CHANGES log.
🤖 Generated with Claude Code