Skip to content

[FEATURE] LLM integration framework — multi-provider abstraction, cache, cost tracking, explanation generator #63

Description

@Wolfvin

Summary

Add LLM integration framework for CodeLens features that benefit from LLM reasoning (taint validation, secret FP check, smell justification, dead-code reason, bug explanation). Multi-provider (OpenAI/Anthropic/Google/DeepSeek/Z.ai GLM), disk cache, token cost tracking, opt-in.

Worker consensus (4 reports)

Worker Source Contribution
RepoAudit update!/CodeLens_Upgrade_Issues_from_RepoAudit.md CL-037 LLMTool ABC with invoke(input, output_cls), _get_prompt, _parse_response, built-in caching, retry, token-cost tracking. LLMToolInput/LLMToolOutput ABCs with __hash__/__eq__.
RepoAudit same file CL-042 LLM response cache at ~/.codelens/llm_cache/<tool_name>/<input_hash>.json (key = SHA-256 of (tool_name, model_name, input_hash)). Hit ratio in output. pricing.json per model. codelens llm-cache stats / clear. --no-cache, --max-cost-usd N flags.
RepoAudit same file CL-043 Multi-provider LLM abstraction — 6 providers: OpenAI, Anthropic, Bedrock, Google, DeepSeek, Z.ai GLM. Dispatch by model_name prefix. Lazy import. 60s timeout, 3-retry backoff.
RepoAudit same file CL-044 Bug report with LLM-generated CoT explanation — explanation: str, fix_suggestion: str, confidence: float. After PathValidator confirms is_reachable: True, invoke ExplanationGenerator. `<10s per finding.
CodeGraph update!/CodeLens_CodeGraph_Upgrade_Analysis.md #19 Reasoning offload — codelens_explore sends assembled context to remote OpenAI-compatible reasoning model, returns tight self-contained answer. Strictly degradable (any failure → null → caller falls back to local source). NEVER throws. BYO endpoint (CODELENS_OFFLOAD_URL, CODELENS_OFFLOAD_MODEL, CODELENS_OFFLOAD_API_KEY or keyEnv).
Semgrep update!/CodeLens_Upgrade_Issues_from_Semgrep.md CL-014 MCP prompt write_custom_codelens_rule(description) returns ready-to-use rule YAML. get_codelens_rule_schema, get_codelens_rule_yaml(rule_id).

Proposed scope (P2, 4-6 weeks total)

Phase 1 — LLMTool ABC + provider abstraction (P2, 1-2 weeks)

  • New scripts/llm/base_tool.py with LLMTool ABC
  • New scripts/llm/provider.py with 6 providers (OpenAI, Anthropic, Bedrock, Google, DeepSeek, Z.ai GLM)
  • Dispatch by model_name prefix (gpt-*, claude-*, gemini-*, deepseek-*, glm-*)
  • Lazy import per provider
  • 60s timeout, 3-retry exponential backoff
  • API keys from env vars per provider
  • Config via CODELENS_LLM_PROVIDER, CODELENS_LLM_MODEL, CODELENS_LLM_API_KEY env vars + codelens.yaml

Phase 2 — Disk cache + cost tracking (P2, 1 week)

  • Cache at ~/.codelens/llm_cache/<tool_name>/<input_hash>.json
  • Key = SHA-256 of (tool_name, model_name, input_hash) (invalidates on model change)
  • Cache value: {output, input_token_cost, output_token_cost, timestamp, model_name}
  • pricing.json per model (GPT-4o, Claude 3.7, Gemini 1.5 Pro, DeepSeek V3, Z.ai GLM-4)
  • codelens llm-cache stats / codelens llm-cache clear commands
  • --no-cache and --max-cost-usd N flags
  • Auto-evict entries >30 days
  • Thread-safe for concurrent agents
  • Hit ratio in output: {cache: {hits, misses, hit_ratio}}

Phase 3 — Explanation generator (P2, 1-2 weeks, depends on taint analysis depth issue Phase 7)

  • ExplanationGenerator LLMTool subclass
  • After taint PathValidator confirms is_reachable: True, invoke generator with bug_type/buggy_value/relevant_functions/path
  • Output: explanation: str (CoT), fix_suggestion: str (code snippet), confidence: float (0.0-1.0)
  • Embed in JSON, SARIF (result.message.text), Markdown (blockquote)
  • codelens explain <finding-id> command (re-generate)
  • MCP tool codelens_explain_finding
  • Target <10s per finding

Phase 4 — Reasoning offload (P3, 2-3 weeks, optional, depends on Phase 1)

  • codelens_explore does local retrieval, then sends context to remote reasoning model
  • Returns tight self-contained answer that becomes tool call result
  • Strictly degradable: any failure → null → caller falls back to local source verbatim
  • NEVER throws, NEVER isError
  • BYO endpoint via CODELENS_OFFLOAD_URL, CODELENS_OFFLOAD_MODEL, CODELENS_OFFLOAD_API_KEY
  • Token storage at ~/.codelens/credentials.json (revocable, org-scoped)

Phase 5 — MCP prompts for rule authoring (P3, 1 week, optional)

  • 3 MCP prompts: write_custom_codelens_rule(description), get_codelens_rule_schema, get_codelens_rule_yaml(rule_id)
  • Let Claude Code invoke /write_custom_codelens_rule description="detect SQL injection in Flask" and get validated rule

Acceptance criteria

  • Phase 1: all 6 providers work with valid API key
  • Phase 2: cache hit ratio >80% on repeated queries
  • Phase 2: --max-cost-usd 1.0 aborts before exceeding budget
  • Phase 3: explanation embedded in SARIF renders in GitHub code scanning UI
  • Phase 4: reasoning offload falls back gracefully on network failure

License note

RepoAudit is Purdue Non-Commercial — design influenced, reimplement from scratch. Use Z.ai GLM provider as default for CodeLens (matches existing z-ai-web-dev-sdk integration pattern).

Related

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions