Skip to content

fix(deps): add AAD bearer fallback and HTML sign-in detection in download_ado_file (closes #1671)#1675

Open
danielmeppiel wants to merge 2 commits into
mainfrom
autopilot/1671-ado-bearer-fallback
Open

fix(deps): add AAD bearer fallback and HTML sign-in detection in download_ado_file (closes #1671)#1675
danielmeppiel wants to merge 2 commits into
mainfrom
autopilot/1671-ado-bearer-fallback

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

TL;DR

download_ado_file() now falls back to an AAD bearer token (via az login) when no ADO PAT is configured, and fail-closes on the Azure DevOps sign-in HTML page that ADO silently returns as HTTP 200 instead of a 401.

Closes #1671


Problem (WHY)

"apm gets into function download_ado_file without fetching a bearer token via az. Since there are no credentials attached to the request, ADO gives back the HTML file which is then saved in apm_modules." — #1671

Azure DevOps REST API returns an interactive sign-in HTML page with HTTP 200 (not 401) when authentication is absent or insufficient. Before this fix:

  1. No PAT configured, az-cli logged in: The download request carried no Authorization header; ADO returned HTML. The HTML was written to disk as if it were the requested .agent.md file. No error was surfaced to the user.
  2. Bearer path existed only in clone_engine.py: The az login credential was already wired for git-clone operations but never for the REST file-download path.

Approach (WHAT)

Two independent changes in DownloadDelegate.download_ado_file():

  1. AAD bearer fallback — when self._host.ado_token (the ADO PAT) is absent, call self._host.auth_resolver.resolve() to obtain an AuthContext. If auth_ctx.auth_scheme == "bearer", inject Authorization: Bearer <token>. The PAT branch is first and byte-identical to the pre-fix behaviour; bearer is strictly the fallback.

  2. HTML sign-in detection — after every HTTP 200, inspect Content-Type. If it contains text/html, raise a RuntimeError with an actionable message built via build_error_context(). The check fires on both the primary request and the main/master fallback request.

Auth-boundary compliance: download_strategies.py routes entirely through AuthResolver.resolve() (which internally handles bearer acquisition in core/auth.py). The module never imports the bearer provider directly, keeping lint-auth-signals.sh Rule A clean.


Implementation (HOW)

flowchart TD
    A[download_ado_file called] --> B{ado_token set?}
    B -- yes --> C[Basic auth header PAT unchanged]
    B -- no --> D[auth_resolver.resolve host org]
    D --> E{auth_scheme == bearer?}
    E -- yes --> F[Bearer auth header]
    E -- no --> G[No auth header]
    C & F & G --> H[HTTP GET api_url]
    H --> I{200?}
    I -- no --> J[raise_for_status / 404 fallback / 401-403 error]
    I -- yes --> K{Content-Type text/html?}
    K -- yes --> L[RuntimeError: sign-in page + build_error_context]
    K -- no --> M[return response.content]
Loading

Changed files

File Change
src/apm_cli/deps/download_strategies.py Added bearer fallback block (8 lines) + _check_html_signin() inner function (13 lines); removed one redundant import base64 from inside the function body
tests/unit/deps/test_download_strategies_phase3.py 7 new tests: 2 HTML-detection, 2 bearer-fallback, 2 mutation-break, 1 fallback-ref HTML path

Trade-offs

  • Bearer is resolved once per call (no caching at this layer). AuthResolver.resolve() already caches per (host, port, org), so repeated calls within one install plan are O(1) after the first.
  • HTML detection is Content-Type only (no body scan). This matches ADO's documented behaviour and avoids parsing HTML, which would be fragile.

Validation evidence

Acceptance condition Check Result
PAT present -> behaviour byte-identical test_pat_path_unchanged_when_pat_present + test_bearer_auth_resolver_not_called_when_pat_present PASSED
PAT absent + bearer available -> Authorization: Bearer header test_bearer_fallback_used_when_no_pat PASSED
text/html 200 -> RuntimeError, no bytes returned test_html_200_response_raises_actionable_error + test_html_200_response_writes_no_content PASSED
HTML detection fires on main/master fallback too test_html_detection_on_fallback_ref_response PASSED
Auth-boundary lint bash scripts/lint-auth-signals.sh -> exit 0 PASSED
Full ruff check + format uv run --extra dev ruff check src/ tests/ && ruff format --check PASSED
pylint R0801 uv run --extra dev python -m pylint --disable=all --enable=R0801 ... PASSED (10.00/10)
Full unit-deps suite uv run --extra dev pytest tests/unit/deps/ -q 1125 passed

How to test

# 1. Verify all new regression tests pass
uv run --extra dev pytest tests/unit/deps/test_download_strategies_phase3.py::TestDownloadAdoFile -v

# 2. Verify lint is clean
bash scripts/lint-auth-signals.sh
uv run --extra dev ruff check src/ tests/

# 3. Manual: unset ADO_APM_PAT, run az login, then apm update with an ADO dep
#    -> expect the file to be downloaded successfully (not HTML)
#    -> unset ADO_APM_PAT and log out of az -> expect an actionable error

…load_ado_file (closes #1671)

- When ADO_APM_PAT is absent, resolve an AAD bearer token via
  AuthResolver.resolve() (which internally uses az-cli); inject it as
  Authorization: Bearer <token>.  PAT path is first and unchanged.
- Detect ADO's silent HTTP 200 + text/html sign-in-page response;
  raise an actionable RuntimeError (via build_error_context) instead of
  writing the HTML to disk.  Detection fires on both the primary and the
  main/master fallback requests.
- Auth-boundary lint (scripts/lint-auth-signals.sh Rule A) is satisfied:
  download_strategies.py routes through AuthResolver, never imports the
  bearer provider directly.
- Regression tests added: HTML-200 raises, bearer header set when no PAT,
  PAT path unchanged, resolver bypassed when PAT present, fallback-ref
  HTML detection, mutation-break assertions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 5, 2026 10:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Azure DevOps file downloads so download_ado_file() (a) falls back to an AAD bearer token resolved via AuthResolver when no ADO PAT is configured, and (b) fail-closes when ADO returns an interactive sign-in HTML page with HTTP 200 (preventing corrupt HTML being written as a .agent.md payload).

Changes:

  • Add bearer-token fallback (via auth_resolver.resolve()) when ado_token is absent.
  • Add Content-Type: text/html detection to treat ADO sign-in HTML pages as an auth failure even when status is 200.
  • Add unit regressions covering bearer fallback and HTML sign-in detection (including fallback ref path).
Show a summary per file
File Description
src/apm_cli/deps/download_strategies.py Adds bearer fallback header injection and HTML sign-in detection for ADO downloads.
tests/unit/deps/test_download_strategies_phase3.py Adds regression tests for #1671 covering bearer fallback + HTML 200 detection behavior.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

Comment on lines +540 to +551
def _check_html_signin(response) -> None:
"""Fail-closed when ADO returns an interactive sign-in HTML page.

Azure DevOps responds with HTTP 200 + text/html when auth is
missing or insufficient instead of a 401. Writing that HTML to
disk produces a corrupt file (the #1671 bug). Detect it by
Content-Type and raise an actionable error before any bytes are
returned to the caller.
"""
content_type = response.headers.get("Content-Type", "")
if "text/html" in content_type:
error_msg = (
Comment on lines +933 to +938
# Must raise, never return html bytes as if they were file content
with pytest.raises(RuntimeError):
result = d.download_ado_file(self._dep(), "apm.yml")
# If we reach here, the bug is present: HTML was returned as content
assert result != html_body, "Bug: HTML sign-in page returned as file content"

…Type, fix test assertions

- Add early return in _check_html_signin when response.status_code != 200 so
  404/403 responses with text/html bodies fall through to raise_for_status and
  the existing 404-fallback / 401-403 error paths (addresses Copilot review and
  panel blocking finding: misleading 'sign-in page' error on non-200 responses).
- Lowercase Content-Type before substring match per RFC 7230 case-insensitivity
  (addresses Copilot review and panel blocking finding: bypass possible with
  'Text/HTML' or 'TEXT/HTML').
- Restructure test_html_200_response_writes_no_content to place assertions outside
  the pytest.raises context manager so they execute in the non-raising (buggy)
  path (addresses Copilot review and test-coverage-expert: dead assertions).
- Add test_404_with_html_content_type_does_not_trigger_sign_in_detection to lock
  in the status-code guard: 404+text/html must reach the main/master fallback.
- Add test_bearer_fallback_skipped_when_scheme_not_bearer to confirm non-bearer
  scheme tokens are not injected as Bearer headers.
- Add CHANGELOG entry for #1671 fix.

All folds: addresses panel CEO follow-ups FU-1 (status-code guard), FU-2
(case-normalization), FU-3 (dead assertions), FU-4 (CHANGELOG), FU-5 (404+html
regression trap); also addresses Copilot inline comments id:3362036952 and
id:3362036988.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

[shepherd-driver] Advisory panel comment posting deferred - will retry.

@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

APM Review Panel: ship_now

AAD bearer fallback is architecturally sound; all blocking correctness issues have been folded into this run -- PR is ready for maintainer review.

cc @danielmeppiel @sergio-sisternes-epam -- a fresh advisory pass is ready for your review.

The panel reviewed PR #1675 across all lenses. Two correctness defects were identified in the initial submission and folded by the shepherd-driver before the final advisory pass:

  1. _check_html_signin was firing on all HTTP status codes, not just 200. A 404 response with text/html Content-Type (common for ADO error pages) would raise a misleading "sign-in page" error instead of triggering the existing 404-fallback path. Fixed by adding if response.status_code != 200: return as an early guard.

  2. Content-Type was compared case-sensitively; a server returning Text/HTML would bypass the sign-in detection, reintroducing the corrupt-file bug this PR fixes. Fixed by lowercasing the header value before comparison per RFC 7230.

With these folds applied, the auth boundary is clean: bearer acquisition routes through AuthResolver.resolve() (not a raw azure-identity import), PAT remains the first-attempted credential, HTML detection fails closed via build_error_context() (no corrupt file written), and no token or secret is logged. scripts/lint-auth-signals.sh exits 0.

Aligned with: Secure by default (PAT-first ordering preserved; HTML sign-in detection prevents silent corruption; bearer routed through AuthResolver); Pragmatic as npm (ADO users with az-login now resolve dependencies without any env-var ceremony).

Growth signal. This PR is the first credible zero-friction ADO onboarding path. Enterprise users can now apm install against ADO repos without setting ADO_APM_PAT if they use az login. The CHANGELOG entry has been added and frames the fix as a zero-config capability, not just a bug fix.

Panel summary

Persona B R N Takeaway
Python Architect 2 1 1 Bearer fallback routed correctly through AuthResolver; HTML detection had status-code ordering bug (FOLDED).
CLI Logging Expert 0 1 2 Error message is well-structured; misleading sign-in error on 404 addressed (FOLDED).
DevX UX Expert 0 1 2 Error messages are actionable; HTML check on wrong status codes addressed (FOLDED).
Supply Chain Security Expert 0 2 1 Bearer fallback routes through AuthResolver correctly; case-normalization folded.
OSS Growth Hacker 0 1 1 CHANGELOG entry added; zero-config ADO support is a strong enterprise story.
Auth Expert 0 0 2 Reservation satisfied: auth boundary clean, PAT first, no token leak.
Doc Writer 0 1 0 CHANGELOG entry added (FOLDED); existing docs already cover auth precedence.
Test Coverage Expert 0 2 1 Dead assertions fixed; 404+html regression trap added (FOLDED).

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Reservations carried from strategic-alignment

  • auth-surface: Confirm the AAD bearer path routes through AuthResolver/get_bearer_provider and NOT a raw azure-identity import; scripts/lint-auth-signals.sh must stay green; PAT path remains FIRST/unchanged; text/html sign-in 200 fails closed via build_error_context and writes no corrupt file; no secret/token is logged.
    -- addressed: scripts/lint-auth-signals.sh exits 0 on this commit; auth-expert confirmed the full contract; no raw azure-identity import present.

Folded in this run (commit 101bf4e)

  1. [panel / python-architect + cli-logging-expert + devx-ux-expert] Guard _check_html_signin on response.status_code == 200 -- prevents misleading "sign-in page" error on 404+html responses.
  2. [panel / python-architect + supply-chain-security-expert] Case-normalize Content-Type via .lower() per RFC 7230 -- prevents bypass with Text/HTML.
  3. [panel / test-coverage-expert + Copilot id:3362036988] Fixed dead assertions in test_html_200_response_writes_no_content -- moved outside pytest.raises block so they execute in the non-raising path.
  4. [panel / doc-writer + oss-growth-hacker] Added CHANGELOG.md entry for [BUG] Azure DevOps authentication via 'az login' does not always work #1671 fix under [Unreleased] > Fixed.
  5. [panel / test-coverage-expert] Added test_404_with_html_content_type_does_not_trigger_sign_in_detection -- regression trap for the status-code guard.
  6. [panel / test-coverage-expert] Added test_bearer_fallback_skipped_when_scheme_not_bearer -- confirms non-bearer scheme tokens are not injected as Bearer headers.

Copilot signals reviewed

  • id:3362036952 (path: src/apm_cli/deps/download_strategies.py) -- LEGIT: _check_html_signin fires on any status code and lacks case-normalization. Resolved in 101bf4e.
  • id:3362036988 (path: tests/unit/deps/test_download_strategies_phase3.py) -- LEGIT: Dead assertions inside pytest.raises context. Resolved in 101bf4e.

Deferred

None. All follow-ups were within the stated scope and have been folded.

Lint evidence

uv run --extra dev ruff check src/ tests/  -> All checks passed (exit 0)
uv run --extra dev ruff format --check src/ tests/  -> 1224 files already formatted (exit 0)
uv run --extra dev python -m pylint --disable=all --enable=R0801 --min-similarity-lines=10 --fail-on=R0801 src/apm_cli/  -> 10.00/10 (exit 0)
bash scripts/lint-auth-signals.sh  -> [+] auth-signal lint clean (exit 0)

CI evidence

All CI checks green on commit 101bf4e (run: https://github.com/microsoft/apm/actions/runs/27010185367)

Job Result
Lint pass
Build & Test Shard 1 (Linux) pass
Build & Test Shard 2 (Linux) pass
PR Binary Smoke (Linux) pass
Coverage Combine (Linux) pass
APM Self-Check pass
Spec conformance gate pass
CodeQL pass
NOTICE Drift Check pass
gate pass

Mergeability

#PR sha CEO stance iterations folds deferrals copilot rounds ci_status mergeable merge_state_status notes
#1675 101bf4e ship_now 1 6 0 1 green MERGEABLE BLOCKED awaiting maintainer review

Recommendation

The PR correctly solves #1671. The two blocking correctness issues (status-code guard and Content-Type normalization) have been folded and all CI checks are green. The PR is landing-ready; the maintainer may merge.


Full per-persona findings

Python Architect

  • [blocking] _check_html_signin fires on non-200 responses, breaking 404 fallback logic at src/apm_cli/deps/download_strategies.py:550
    The check runs BEFORE raise_for_status(). If a 404 response carries Content-Type: text/html, _check_html_signin raises RuntimeError('sign-in page') instead of letting raise_for_status() fire and reach the 404-fallback branch.
    Suggested: Guard with if response.status_code == 200:. FOLDED in 101bf4e.
  • [blocking] Content-Type comparison is not case-normalized at src/apm_cli/deps/download_strategies.py:549
    HTTP headers are case-insensitive per RFC 7230. A server returning Text/HTML bypasses the guard and corrupt HTML bytes flow to disk.
    Suggested: content_type = response.headers.get("Content-Type", "").lower(). FOLDED in 101bf4e.
  • [recommended] Dead assertions in test_html_200_response_writes_no_content at tests/unit/deps/test_download_strategies_phase3.py:935
    Assertions inside pytest.raises never run when exception is raised; test passes vacuously. FOLDED in 101bf4e.
  • [nit] Missing test for 404+html edge case. FOLDED in 101bf4e (test_404_with_html_content_type_does_not_trigger_sign_in_detection added).

CLI Logging Expert

  • [recommended] _check_html_signin should only fire on 2xx responses. FOLDED in 101bf4e.
  • [nit] Content-Type case-insensitivity. FOLDED in 101bf4e.
  • [nit] Error message wordy -- two sentences say the same thing. Not folded (nit; message is already actionable via build_error_context).

DevX UX Expert

  • [recommended] _check_html_signin fires on wrong status codes. FOLDED in 101bf4e.
  • [nit] Content-Type not case-normalized. FOLDED in 101bf4e.
  • [nit] No debug log when no auth credentials available. Not folded (separate diagnostic enhancement; would need logger dependency in download_strategies.py which was not introduced in this PR scope).

Supply Chain Security Expert

  • [recommended] Silent unauthenticated request possible when resolver returns no token. Not folded -- the HTML detection catches the resulting sign-in redirect downstream; a pre-flight fail-close would be a separate behavioral change. Sufficient defense-in-depth for v1.
  • [recommended] Content-Type not case-normalized. FOLDED in 101bf4e.
  • [nit] Status-code issue. FOLDED in 101bf4e.

OSS Growth Hacker

  • [recommended] Error message should include docs URL. Not folded -- build_error_context output is managed separately; adding a URL requires knowing the stable docs page, which is out of scope for this bugfix.
  • [nit] CHANGELOG entry missing. FOLDED in 101bf4e.

Auth Expert

  • [nit] Debug log when no auth available. Not folded -- same rationale as devx-ux nit (logger not available in this module's scope without introducing a new dependency).
  • [nit] import base64 moved from local to module-level. No action needed.

Doc Writer

  • Missing CHANGELOG entry. FOLDED in 101bf4e.
  • Existing docs already cover auth precedence -- no prose update needed.

Test Coverage Expert

  • [recommended] Dead assertions in test_html_200_response_writes_no_content. FOLDED in 101bf4e.
  • [recommended] Missing 404+html regression trap. FOLDED in 101bf4e.
  • [nit] Missing test for non-bearer scheme path. FOLDED in 101bf4e (test_bearer_fallback_skipped_when_scheme_not_bearer added).

This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Azure DevOps authentication via 'az login' does not always work

2 participants