Sanitize autoentity-generated names to prevent invalid REST paths and GraphQL singular/plural tables for SQL objects with spaces#3609
Conversation
Agent-Logs-Url: https://github.com/Azure/data-api-builder/sessions/2051de6a-3432-4bd6-901c-3a3d6a8cc114 Co-authored-by: RubenCerna2079 <32799214+RubenCerna2079@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR improves MSSQL autoentity generation by sanitizing auto-generated entity names so SQL objects containing whitespace don’t produce invalid REST path segments or invalid GraphQL names during default singular/plural generation and config validation.
Changes:
- Add a whitespace-removal (camel-join) sanitizer for generated entity names and apply it during MSSQL autoentity generation.
- Extend MSSQL test schema with a table that contains spaces in its name and seed data for it.
- Update configuration tests to validate REST + GraphQL access for an autoentity generated from an object with spaces.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/Service.Tests/DatabaseSchema-MsSql.sql | Adds an MSSQL table with whitespace in its name to exercise autoentity naming. |
| src/Service.Tests/Configuration/ConfigurationTests.cs | Extends autoentity end-to-end test to include an object with spaces and validates REST/GraphQL access. |
| src/Core/Services/MetadataProviders/SqlMetadataProvider.cs | Introduces SanitizeGeneratedEntityName helper for generated entity names. |
| src/Core/Services/MetadataProviders/MsSqlMetadataProvider.cs | Applies sanitization to autoentity-generated entityName before entity creation. |
| /// <summary> | ||
| /// Sanitizes the generated entity name by removing whitespace and capitalizing the next character after whitespace. | ||
| /// </summary> | ||
| /// <param name="name">The entity name to be sanitized.</param> | ||
| /// <returns>The sanitized entity name.</returns> | ||
| protected static string SanitizeGeneratedEntityName(string name) | ||
| { | ||
| StringBuilder sanitizedName = new(name.Length); | ||
| bool capitalizeNext = false; | ||
|
|
||
| foreach (char character in name) | ||
| { | ||
| if (char.IsWhiteSpace(character)) |
There was a problem hiding this comment.
should we sanitize other valid characters like []? for e.g.- [Customer-Orders] still produces dbo_Customer-Orders, which can be invalid for REST/GraphQL. if this is about only removing whitespaces, may be the function can be renamed with that.
There was a problem hiding this comment.
Use a more specific name (e.g., RemoveWhitespaceAndCamelCase) could improve clarity and avoid confusion for future contributors.
There was a problem hiding this comment.
Renamed to RemoveWhitespaceAndCamelCase in 29f0e79 → addressed in the latest commit. The updated XML doc also describes the camelCase-join behavior explicitly.
There was a problem hiding this comment.
Done — SanitizeGeneratedEntityName has been renamed to RemoveWhitespaceAndCamelCase with an updated doc comment that describes the camelCase-join behavior.
| } | ||
|
|
||
| // Sanitize the entity name by ensuring all whitespace characters are removed. | ||
| entityName = SanitizeGeneratedEntityName(entityName); |
There was a problem hiding this comment.
I am looking at this section lately. consider we have 2 schema in database- Order Item and OrderItem. can we include the schema name along with the entityName/objectName in logs and response? since, santitized object will always show up as OederItem
|
|
||
| // Sanitize the entity name by ensuring all whitespace characters are removed. | ||
| entityName = SanitizeGeneratedEntityName(entityName); | ||
|
|
There was a problem hiding this comment.
Sanitization via whitespace removal can introduce collisions. For example:
- "Order Item" → "OrderItem"
- "OrderItem" → "OrderItem"
Both would map to the same entity name after sanitization.
Do we currently detect or handle such collisions? It might be useful to either log or explicitly guard against these cases to avoid silent overrides or ambiguous schema behavior.
There was a problem hiding this comment.
Collisions are now surfaced explicitly. The original (pre-sanitization) name is stored before RemoveWhitespaceAndCamelCase is called. When entities.TryAdd fails (covering the "Order Item" / "OrderItem" → "OrderItem" collision case), the exception message includes both the sanitized name and the original name along with the schema, e.g.:
Entity 'OrderItem' (normalized from 'Order Item' in schema 'dbo') conflicts with autoentity pattern '...'. Use --patterns.exclude to skip it.
This makes it clear to the user that the collision may be a result of whitespace removal rather than an exact name match.
| /// <summary> | ||
| /// Sanitizes the generated entity name by removing whitespace and capitalizing the next character after whitespace. | ||
| /// </summary> | ||
| /// <param name="name">The entity name to be sanitized.</param> | ||
| /// <returns>The sanitized entity name.</returns> | ||
| protected static string SanitizeGeneratedEntityName(string name) | ||
| { | ||
| StringBuilder sanitizedName = new(name.Length); | ||
| bool capitalizeNext = false; | ||
|
|
||
| foreach (char character in name) | ||
| { | ||
| if (char.IsWhiteSpace(character)) |
There was a problem hiding this comment.
Use a more specific name (e.g., RemoveWhitespaceAndCamelCase) could improve clarity and avoid confusion for future contributors.
| capitalizeNext = false; | ||
| } | ||
|
|
||
| return sanitizedName.ToString(); |
There was a problem hiding this comment.
The sanitizer does not currently handle edge cases like empty or whitespace-only strings. For example:
" " → ""
This would result in an invalid entity name.
It may be worth guarding against such cases by either throwing or providing a fallback to ensure the final entity name is always valid.
There was a problem hiding this comment.
Edge case is now guarded. After calling RemoveWhitespaceAndCamelCase, if the result is empty (e.g. input was all whitespace), an error is logged with the original name and schema for context, and the entry is skipped via continue — consistent with how null/empty raw values are handled just above.
| [DataRow("foo.%", 0, DisplayName = "Test Autoentities with foo schema")] | ||
| [DataRow("bar.%", 1, DisplayName = "Test Autoentities with bar schema")] | ||
| [DataRow("dbo.Order Items", 2, DisplayName = "Test Autoentities with object with spaces")] | ||
| public async Task TestAutoentitiesGeneratedWithUnusualElements(string includePattern, int patternType) |
There was a problem hiding this comment.
The tests validate that REST and GraphQL access work end-to-end, which is great.
One improvement would be to explicitly assert the generated/sanitized entity name (e.g., "dbo_OrderItems") in the test. That would make the behavior more explicit and protect against future regressions in naming logic.
There was a problem hiding this comment.
The new TestAutoentitiesGeneratedWithSpacesInObjectName test declares const string EXPECTED_ENTITY_NAME = "dbo_OrderItems" and uses it directly as the REST path and GraphQL field name. The Assert.AreEqual failure message explicitly states that the entity name must be sanitized from "dbo_Order Items" to "dbo_OrderItems", making the expected generated name unambiguous.
| { | ||
| // Act | ||
| string path = isPatternFoo ? "foo_magazines" : "bar_magazines"; | ||
| string path; |
There was a problem hiding this comment.
The current test combines multiple scenarios (schemas + whitespace handling) using a patternType switch. This makes the test a bit harder to follow and extend.
It may be cleaner to split whitespace-specific validation into a separate test method for better readability and maintainability.
There was a problem hiding this comment.
Extracted into a dedicated TestAutoentitiesGeneratedWithSpacesInObjectName test method. TestAutoentitiesGeneratedWithUnusualElements now only covers the two non-default schema cases (foo/bar), while the whitespace normalization scenario has its own test with an explicit EXPECTED_ENTITY_NAME constant and targeted assertions.
… collision message, split whitespace test
Why make this change?
Tables with whitespace in names (for example,
Order Items) were being auto-generated into entity names likedbo_Order Items, which then produced invalid REST paths as well as invalid GraphQL names for singular/plural tables which failed config validation. This change ensures autoentity name generation does not pass whitespace through to REST path and GraphQL singular/plural tables defaults.{schema}/{object}verbatim; whitespace in object names breaksdab validate.What is this change?
RemoveWhitespaceAndCamelCaseinSqlMetadataProviderand applied it to generatedentity_namebefore entity creation. The method name is intentionally specific to clarify it only removes whitespace (not other invalid characters).dbo_Order Items→dbo_OrderItems.TryAddfails because two database objects normalize to the same entity name (e.g."Order Item"and"OrderItem"both yield"OrderItem"), the exception message includes the pre-normalization name and schema to make the collision diagnosable.TestAutoentitiesGeneratedWithSpacesInObjectNametest method that explicitly asserts the sanitized entity namedbo_OrderItems(viaconst string EXPECTED_ENTITY_NAME) and validates end-to-end REST and GraphQL access with that name.TestAutoentitiesGeneratedWithUnusualElementsfor better readability and maintainability.How was this tested?
Sample Request(s)
dab auto-config add mydef --patterns.include "dbo.Order Items" --patterns.name "{schema}_{object}"dab validatedbo.[Order Items]dbo_OrderItems(no whitespace)