fix(kernel): server-sourced async state + cursor-state/metadata parity for use_kernel by vikrantpuppala · Pull Request #838 · databricks/databricks-sql-python

vikrantpuppala · 2026-06-06T13:27:33Z

What

Brings the use_kernel=True path closer to Thrift parity on the cursor-state-tracking (T7) and metadata clusters from the API+CUJ gap audit, surfaces DML rowcount, and makes async command state server-sourced (no connector-side state to drift). Batch 3 of the use_kernel parity work (follows #824 / #830).

Depends on kernel PR databricks/databricks-sql-kernel#139 (case-insensitive table_types, HTTP error-envelope parsing, num_modified_rows getter, Session::attach_async_statement). KERNEL_REV points at that PR's branch HEAD; re-pin to the squash SHA before this merges.

Async command state — server is the source of truth

This addresses the core concern that connector-side async bookkeeping drifts from the server.

get_query_state / get_execution_result re-attach by id. They call Session.attach_async_statement(command_id) and read state / await results from the server, instead of consulting a connector-side handle map. SEA keys GetStatementStatus/Result purely on the id, so this works whether or not the connector still holds the submitting handle — and get_execution_result is inherently re-callable (each call re-attaches), matching Thrift's re-fetchable operation handle. No more unknown command_id on a second fetch.
Deleted _closed_commands (and its bounded-FIFO machinery _record_closed / _CLOSED_COMMANDS_MAX). This was the invented state — the connector guessing CLOSED-vs-SUCCEEDED locally, with the bounded FIFO making the answer depend on how many other commands had closed since. The server reports CLOSED natively (verified live: GET after close → 200 state=CLOSED until the result TTL, then 404 NOT_FOUND → mapped to a terminal state via _is_not_found).
_async_handles / _async_statements demoted to keep-alive-only. They're retained for one reason: the submitting ExecutedAsyncStatement's Drop fires close_statement, which would kill the live async query — so the connector holds the handle until close_command / close_session. They are no longer consulted for state or results.

Cursor-state tracking (T7)

Metadata no longer leaves active_command_id stale. Set in the _make_result_set chokepoint so sync execute, async fetch, AND metadata all leave the cursor pointing at the producing command (matches Thrift's _handle_execute_response).
Failed sync execute publishes the statement id (from the canceller's inflight slot) before re-raising. Best-effort; never masks the original error.

Metadata

Wildcard/blank catalog → all-catalogs ('%'/'*'/''/None → None) across schemas/tables/columns (_catalog_or_none), making them symmetric (fixes columns).
Empty-string pattern → match-all (_none_if_blank); %/* stay as LIKE wildcards.
Dropped the table_types connector drain — the kernel filters it (case-insensitively, Bump version to 2.6.0 #139). Removed dead _drain_kernel_handle/_StaticArrowHandle + unused imports.
Fixed Cursor.columns() docstring — catalog_name=None is accepted on all backends.

DML rowcount (enhancement)

Surface num_modified_rows as cursor.rowcount; None leaves -1 (matches Thrift). getattr-guarded.

Testing

Unit (tests/unit/test_kernel_client.py, 116 pass): get_query_state/get_execution_result rewritten to mock attach_async_statement (server-state, NotFound→terminal, re-callable); close paths assert close+drop (no _closed_commands); metadata normalization; num_modified_rows→rowcount. test_client.py 128 pass.
E2E (live dogfood): async execute_async→poll→fetch, get_async_execution_result re-callable, result resumable from a fresh cursor (proves state comes from the server, not per-cursor tracking); empty-string filter; case-insensitive table_types (self-contained table+view); active_command_id after metadata; DML rowcount. All green.
black (pinned 22.3.0) clean.

Brings the use_kernel=True path closer to Thrift parity on the cursor-state-tracking and metadata clusters from the API+CUJ gap audit, surfaces DML rowcount, and makes async command state server-sourced. Async command state — server is the source of truth (no local state): - get_query_state / get_execution_result now re-attach to the statement by id (Session.attach_async_statement, kernel side) and read state / await results from the server, instead of consulting a connector-side handle map. SEA keys GetStatementStatus/Result purely on the id, so this works whether or not the connector still holds the submitting handle — and get_execution_result is inherently re-callable (each call re-attaches), matching the Thrift backend's re-fetchable operation handle. - Deleted _closed_commands (and its bounded-FIFO machinery _record_closed / _CLOSED_COMMANDS_MAX): the connector no longer invents a CLOSED-vs-SUCCEEDED answer. The server reports CLOSED natively (verified: GET after close returns 200 state=CLOSED until TTL, then 404 NOT_FOUND, which maps to a terminal state). This is the local-state inconsistency removed. - _async_handles / _async_statements are retained but DEMOTED to a keep-alive-only registry: the submitting ExecutedAsyncStatement's Drop fires close_statement, which would kill the live async query, so we hold the handle until close_command / close_session. They are no longer consulted for state or results. Cursor-state tracking (T7): - Set cursor.active_command_id in the _make_result_set chokepoint so sync execute, async fetch, AND metadata all leave the cursor pointing at the command that produced the current result set (matches Thrift's unconditional set in _handle_execute_response). - On a failed sync execute, publish the server-issued statement id (read from the canceller's inflight slot) onto active_command_id before re-raising. Best-effort; never masks the original failure. Metadata: - Normalize wildcard/blank catalog ('%' / '*' / '' / None) to None (all-catalogs) across get_schemas/get_tables/get_columns (_catalog_or_none), making the three symmetric (fixes columns). - Normalize empty/whitespace-only pattern args to None (match-all) via _none_if_blank. % / * stay as real LIKE wildcards on patterns. - Drop the connector-side table_types drain + client-side refilter in get_tables; the kernel filters table_types itself (case-insensitively as of the batch-3 kernel change). Removed the dead _drain_kernel_handle / _StaticArrowHandle helpers + unused imports. - Fix the Cursor.columns() docstring: catalog_name=None is accepted on all backends. DML rowcount: - Surface the kernel's num_modified_rows as cursor.rowcount for DML instead of the hardcoded -1. None (SELECT / warehouses that don't report it) leaves rowcount at -1, matching Thrift; getattr-guarded. Bumps KERNEL_REV to the batch-3 kernel commit (case-insensitive table_types, HTTP error-envelope parsing, num_modified_rows getter, Session::attach_async_statement). Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

vikrantpuppala temporarily deployed to azure-prod June 6, 2026 13:27 — with GitHub Actions Inactive

vikrantpuppala force-pushed the feat/use-kernel-cuj-batch3-fixes branch from 6a21a91 to 78748fa Compare June 8, 2026 11:58

vikrantpuppala requested a deployment to azure-prod June 8, 2026 11:58 — with GitHub Actions In progress

vikrantpuppala changed the title ~~fix(kernel): cursor-state tracking + metadata parity for use_kernel~~ fix(kernel): server-sourced async state + cursor-state/metadata parity for use_kernel Jun 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(kernel): server-sourced async state + cursor-state/metadata parity for use_kernel#838

fix(kernel): server-sourced async state + cursor-state/metadata parity for use_kernel#838
vikrantpuppala wants to merge 1 commit into
mainfrom
feat/use-kernel-cuj-batch3-fixes

vikrantpuppala commented Jun 6, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vikrantpuppala commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Async command state — server is the source of truth

Cursor-state tracking (T7)

Metadata

DML rowcount (enhancement)

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vikrantpuppala commented Jun 6, 2026 •

edited

Loading