Skip to content

fix(kernel): server-sourced async state + cursor-state/metadata parity for use_kernel#838

Open
vikrantpuppala wants to merge 1 commit into
mainfrom
feat/use-kernel-cuj-batch3-fixes
Open

fix(kernel): server-sourced async state + cursor-state/metadata parity for use_kernel#838
vikrantpuppala wants to merge 1 commit into
mainfrom
feat/use-kernel-cuj-batch3-fixes

Conversation

@vikrantpuppala
Copy link
Copy Markdown
Contributor

@vikrantpuppala vikrantpuppala commented Jun 6, 2026

What

Brings the use_kernel=True path closer to Thrift parity on the cursor-state-tracking (T7) and metadata clusters from the API+CUJ gap audit, surfaces DML rowcount, and makes async command state server-sourced (no connector-side state to drift). Batch 3 of the use_kernel parity work (follows #824 / #830).

Depends on kernel PR databricks/databricks-sql-kernel#139 (case-insensitive table_types, HTTP error-envelope parsing, num_modified_rows getter, Session::attach_async_statement). KERNEL_REV points at that PR's branch HEAD; re-pin to the squash SHA before this merges.

Async command state — server is the source of truth

This addresses the core concern that connector-side async bookkeeping drifts from the server.

  • get_query_state / get_execution_result re-attach by id. They call Session.attach_async_statement(command_id) and read state / await results from the server, instead of consulting a connector-side handle map. SEA keys GetStatementStatus/Result purely on the id, so this works whether or not the connector still holds the submitting handle — and get_execution_result is inherently re-callable (each call re-attaches), matching Thrift's re-fetchable operation handle. No more unknown command_id on a second fetch.
  • Deleted _closed_commands (and its bounded-FIFO machinery _record_closed / _CLOSED_COMMANDS_MAX). This was the invented state — the connector guessing CLOSED-vs-SUCCEEDED locally, with the bounded FIFO making the answer depend on how many other commands had closed since. The server reports CLOSED natively (verified live: GET after close → 200 state=CLOSED until the result TTL, then 404 NOT_FOUND → mapped to a terminal state via _is_not_found).
  • _async_handles / _async_statements demoted to keep-alive-only. They're retained for one reason: the submitting ExecutedAsyncStatement's Drop fires close_statement, which would kill the live async query — so the connector holds the handle until close_command / close_session. They are no longer consulted for state or results.

Cursor-state tracking (T7)

  • Metadata no longer leaves active_command_id stale. Set in the _make_result_set chokepoint so sync execute, async fetch, AND metadata all leave the cursor pointing at the producing command (matches Thrift's _handle_execute_response).
  • Failed sync execute publishes the statement id (from the canceller's inflight slot) before re-raising. Best-effort; never masks the original error.

Metadata

  • Wildcard/blank catalog → all-catalogs ('%'/'*'/''/NoneNone) across schemas/tables/columns (_catalog_or_none), making them symmetric (fixes columns).
  • Empty-string pattern → match-all (_none_if_blank); %/* stay as LIKE wildcards.
  • Dropped the table_types connector drain — the kernel filters it (case-insensitively, Bump version to 2.6.0 #139). Removed dead _drain_kernel_handle/_StaticArrowHandle + unused imports.
  • Fixed Cursor.columns() docstringcatalog_name=None is accepted on all backends.

DML rowcount (enhancement)

  • Surface num_modified_rows as cursor.rowcount; None leaves -1 (matches Thrift). getattr-guarded.

Testing

  • Unit (tests/unit/test_kernel_client.py, 116 pass): get_query_state/get_execution_result rewritten to mock attach_async_statement (server-state, NotFound→terminal, re-callable); close paths assert close+drop (no _closed_commands); metadata normalization; num_modified_rowsrowcount. test_client.py 128 pass.
  • E2E (live dogfood): async execute_async→poll→fetch, get_async_execution_result re-callable, result resumable from a fresh cursor (proves state comes from the server, not per-cursor tracking); empty-string filter; case-insensitive table_types (self-contained table+view); active_command_id after metadata; DML rowcount. All green.
  • black (pinned 22.3.0) clean.

Brings the use_kernel=True path closer to Thrift parity on the
cursor-state-tracking and metadata clusters from the API+CUJ gap
audit, surfaces DML rowcount, and makes async command state
server-sourced.

Async command state — server is the source of truth (no local state):
- get_query_state / get_execution_result now re-attach to the
  statement by id (Session.attach_async_statement, kernel side) and
  read state / await results from the server, instead of consulting a
  connector-side handle map. SEA keys GetStatementStatus/Result purely
  on the id, so this works whether or not the connector still holds the
  submitting handle — and get_execution_result is inherently re-callable
  (each call re-attaches), matching the Thrift backend's re-fetchable
  operation handle.
- Deleted _closed_commands (and its bounded-FIFO machinery
  _record_closed / _CLOSED_COMMANDS_MAX): the connector no longer
  invents a CLOSED-vs-SUCCEEDED answer. The server reports CLOSED
  natively (verified: GET after close returns 200 state=CLOSED until
  TTL, then 404 NOT_FOUND, which maps to a terminal state). This is the
  local-state inconsistency removed.
- _async_handles / _async_statements are retained but DEMOTED to a
  keep-alive-only registry: the submitting ExecutedAsyncStatement's Drop
  fires close_statement, which would kill the live async query, so we
  hold the handle until close_command / close_session. They are no
  longer consulted for state or results.

Cursor-state tracking (T7):
- Set cursor.active_command_id in the _make_result_set chokepoint so
  sync execute, async fetch, AND metadata all leave the cursor pointing
  at the command that produced the current result set (matches Thrift's
  unconditional set in _handle_execute_response).
- On a failed sync execute, publish the server-issued statement id
  (read from the canceller's inflight slot) onto active_command_id
  before re-raising. Best-effort; never masks the original failure.

Metadata:
- Normalize wildcard/blank catalog ('%' / '*' / '' / None) to None
  (all-catalogs) across get_schemas/get_tables/get_columns
  (_catalog_or_none), making the three symmetric (fixes columns).
- Normalize empty/whitespace-only pattern args to None (match-all) via
  _none_if_blank. % / * stay as real LIKE wildcards on patterns.
- Drop the connector-side table_types drain + client-side refilter in
  get_tables; the kernel filters table_types itself (case-insensitively
  as of the batch-3 kernel change). Removed the dead
  _drain_kernel_handle / _StaticArrowHandle helpers + unused imports.
- Fix the Cursor.columns() docstring: catalog_name=None is accepted on
  all backends.

DML rowcount:
- Surface the kernel's num_modified_rows as cursor.rowcount for DML
  instead of the hardcoded -1. None (SELECT / warehouses that don't
  report it) leaves rowcount at -1, matching Thrift; getattr-guarded.

Bumps KERNEL_REV to the batch-3 kernel commit (case-insensitive
table_types, HTTP error-envelope parsing, num_modified_rows getter,
Session::attach_async_statement).

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
@vikrantpuppala vikrantpuppala force-pushed the feat/use-kernel-cuj-batch3-fixes branch from 6a21a91 to 78748fa Compare June 8, 2026 11:58
@vikrantpuppala vikrantpuppala changed the title fix(kernel): cursor-state tracking + metadata parity for use_kernel fix(kernel): server-sourced async state + cursor-state/metadata parity for use_kernel Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant