FEAT: Adding json schema to all scorer datasets#2017

Open
rlundeen2 wants to merge 4 commits into microsoft:main from rlundeen2:rlundeen2/scorer-yaml-response-schema
@rlundeen2 (Contributor)

This PR adds a JSON response schema to all scorer datasets. Models that support structured output can then enforce the schema, and scorers can reuse shared schemas more consistently.

rlundeen2 and others added 4 commits June 15, 2026 14:54
Give every response-format-bearing scorer seed prompt a response_json_schema so
targets that support structured outputs enforce the shape. Scale/Likert scorers
share a new bundled scale_with_rationale schema; unique shapes use inline schemas.
Scorers now forward the schema to the target like the refusal scorer does.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
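
To make the shared-schema idea concrete, here is a minimal sketch of what a scale-with-rationale schema and a shape check could look like. The constant name, the `score_value` key for scale scorers, and the `matches_schema` helper are all assumptions for illustration; the bundled schema in this PR may differ, and the check covers only required keys, extra keys, and string types rather than full JSON Schema validation.

```python
import json

# Hypothetical shape for a shared scale-with-rationale schema. The key names
# are assumptions based on the commit text, not the actual bundled file.
SCALE_WITH_RATIONALE_SCHEMA = {
    "type": "object",
    "properties": {
        "score_value": {"type": "string"},
        "rationale": {"type": "string"},
    },
    "required": ["score_value", "rationale"],
    "additionalProperties": False,
}


def matches_schema(raw_response: str, schema: dict) -> bool:
    """Hypothetical helper: a partial check, not a full JSON Schema validator."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    allowed = set(schema["properties"])
    required = set(schema["required"])
    # Every required key must be present and no unknown keys are allowed.
    if not required <= set(payload) or not set(payload) <= allowed:
        return False
    # Both properties in this sketch are declared as strings.
    return all(isinstance(payload[key], str) for key in payload)
```

A response that still uses a `reasoning` key would fail this check, which is the kind of drift the normalization to `rationale` is meant to remove.
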
…olean

Collapse licensed_therapist and crisis_management onto the shared
scale_with_rationale schema by normalizing their reasoning key to rationale,
removing the duplicated inline schemas and the rationale_output_key override.

Change the true/false score_value from a string enum to a JSON boolean in both
the bundled true_false_with_rationale schema and the inline true_false system
prompt, update the prompt text and refusal few-shot examples to emit booleans,
and add a test covering boolean parsing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
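
The string-enum to boolean change affects how a scorer reads `score_value`. The sketch below shows one way such parsing could work; `parse_true_false` is a hypothetical helper, and accepting legacy `"true"`/`"false"` strings is an assumption added for illustration, not behavior confirmed by this PR.

```python
import json


def parse_true_false(raw_response: str) -> bool:
    """Hypothetical parser for a true/false scorer response."""
    payload = json.loads(raw_response)
    value = payload["score_value"]
    # New shape: score_value is a JSON boolean.
    if isinstance(value, bool):
        return value
    # Assumption: tolerate the older string form for backward compatibility.
    if isinstance(value, str) and value.strip().lower() in {"true", "false"}:
        return value.strip().lower() == "true"
    raise ValueError(f"score_value must be a boolean, got {value!r}")
```

Checking `isinstance(value, bool)` first matters because a bare truthiness test would treat the string `"false"` as true.
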
The JsonSchemaNormalizer already appends the schema body and the
'only return the JSON object' / 'Possible JSON response' boilerplate
for non-native targets, and native targets enforce the schema via
structured output. So the literal schema block and that boilerplate
were redundant in every response-format-bearing scorer prompt.

Keep the task-specific field descriptions (e.g. MHFA scale meanings)
in prose, since those carry semantics the shared schema does not.
insecure_code's literal JSON example is likewise replaced with a
prose key list.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
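
The split between native and non-native targets described above can be sketched as follows. This is not the `JsonSchemaNormalizer` implementation; `prepare_request`, its parameters, and the appended wording are hypothetical and only illustrate why a literal schema block in the prompt becomes redundant.

```python
import json
from typing import Optional


def prepare_request(
    task_prompt: str, schema: dict, supports_structured_output: bool
) -> tuple[str, Optional[dict]]:
    """Hypothetical helper returning (prompt_text, schema_to_enforce)."""
    if supports_structured_output:
        # Native targets enforce the schema themselves, so the prompt
        # only needs the task-specific prose.
        return task_prompt, schema
    # Non-native targets get the schema body appended as text instead.
    schema_text = json.dumps(schema, indent=2)
    augmented = (
        f"{task_prompt}\n\n"
        "Only return the JSON object. It must conform to this schema:\n"
        f"{schema_text}"
    )
    return augmented, None
```

Either way the schema reaches the model exactly once, so the prompt file itself only has to carry the field semantics.
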
The bundled schema's score_value switched from a string enum to
type: boolean, but three tests still asserted the old enum. Update
them to assert type == "boolean"; the deep-copy tests now mutate the
nested properties dict, which remains a valid deep-copy proof.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
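
As an illustration of the deep-copy check mentioned above, the sketch below mutates the nested `properties` dict of a returned schema and confirms the stored copy is unchanged. `_BUNDLED_TRUE_FALSE_SCHEMA` and `get_schema` are hypothetical stand-ins, not the names used in the repository's tests.

```python
import copy

# Hypothetical stand-in for the bundled true_false_with_rationale schema.
_BUNDLED_TRUE_FALSE_SCHEMA = {
    "type": "object",
    "properties": {
        "score_value": {"type": "boolean"},
        "rationale": {"type": "string"},
    },
    "required": ["score_value", "rationale"],
}


def get_schema() -> dict:
    """Return an independent copy so callers cannot mutate the bundled schema."""
    return copy.deepcopy(_BUNDLED_TRUE_FALSE_SCHEMA)


def test_schema_is_boolean_and_deep_copied() -> None:
    schema = get_schema()
    assert schema["properties"]["score_value"]["type"] == "boolean"
    # Mutating a nested dict would leak into the original under a shallow copy.
    schema["properties"]["score_value"]["type"] = "string"
    assert get_schema()["properties"]["score_value"]["type"] == "boolean"
```

Mutating a nested value rather than a top-level key is what distinguishes a deep copy from a shallow one in this kind of test.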