fix(concurrency): guard 20+ race conditions across getOrCreate, balance updates, and view permissions by hongwei1 · Pull Request #2845 · OpenBankProject/OBP-API

hongwei1 · 2026-06-23T07:28:53Z

Summary

This branch fixes 20+ concurrency race conditions identified through systematic hazard analysis and concurrency test suite execution. All 20 scenarios in CONCURRENCY_HAZARDS.md are now in the Gracefully handled tier.

Core fixes

Atomic balance updates — replace Lift Mapper read-modify-write with Doobie SELECT FOR UPDATE row-level lock (DoobieBankAccountQueries)
MFA double-spend prevention — lockTransactionRequest now uses runUpdate (Strategy.default) instead of runQuery (Strategy.void) so the FOR UPDATE lock is held through the transaction on the fallback path
getOrCreate races — wrap all saveMe() in tryo + re-fetch on Failure across 9 providers: OAuth Consumer, SystemView, ViewPermission, AccountHolder, Entitlement, mapping providers (Customer / Transaction / Account), and BadLoginStatus
Scheduler stale-save — consent scheduler uses conditional UPDATE WHERE mStatus = guardStatus; removed direct mStatus(...).saveMe() that could resurrect revoked consents
View permission races — guard addPermission, addPermissions, revokePermission, and migrateViewPermissions against duplicate inserts; fix removeAllViewsAndVierPermissions to explicitly delete AccountAccess before bulkDelete_!! (which bypasses beforeDelete hooks)
Future counter atomicity — incrementFutureCounter / decrementFutureCounter use UPDATE ... SET n = n ± 1 instead of read-increment-write

Post-review correctness fixes (5 additional)

lockTransactionRequest: runQuery → runUpdate (critical: lock was released before payment logic ran on non-request-scoped paths)
OAuth getOrCreateConsumer re-fetch: use (azp, sub) match; fall back to key when either is absent to avoid matching unrelated consumers via empty-string
removeAllViewsAndVierPermissions: explicit AccountAccess.bulkDelete_!! before ViewDefinition.bulkDelete_!!
LoginAttempts.incrementBadLoginAttempts: wrap first-time create.save in tryo
migrate_00000020.sql: dedup prerequisite for new UniqueIndex additions on mapperaccountholder and mappedentitlement

Codebase-wide audit (4 additional mapping provider fixes)

Full scan of the codebase revealed the same unguarded getOrCreate pattern in three ID-mapping providers and getOrCreateBadLoginStatus. All four fixed with the same tryo + re-fetch pattern.

Test plan

All 20 scenarios in obp-api/src/test/scala/code/concurrency/CONCURRENCY_HAZARDS.md pass
ConcurrentViewPermissionRaceTest — 4 scenarios (N, O, R, migrate)
CI shards 1–4 green
Run migrate_00000020.sql against any DB with pre-existing data before deploying

Adds a tagged ScalaTest suite that simulates 16 confirmed database concurrency hazards in OBP-API: lost-update, check-then-act, check-then-insert, unique-constraint-unhandled, and counter-sequence races across money movement, security, consent scheduling, view permissions, and OAuth user creation paths. Each scenario asserts the theoretically-correct outcome so that a hazard surfaces as a FAILED test (red bar = evidence the hazard is real). Two safeguard scenarios (connection pool back-pressure, per-request context isolation) are verified to PASS. All scenarios are tagged ConcurrencyRace and excluded from the CI main flow. See CONCURRENCY_HAZARDS.md for the full taxonomy, source locations, and three-tier protection analysis. Run only these tests: mvn -pl obp-commons,obp-api scalatest:test \ -DtagsToInclude=code.concurrency.ConcurrencyRace \ -DfailIfNoTests=false

…ards

…into feature/concurrency-hazard-tests

…json4s in concurrency tests

…locking and fallback transactor

…lect global Http4s proxy usage

…ondition

…vel locking Replace non-atomic read-modify-write counter updates in LoginAttempt.incrementBadLoginAttempts and MappedChallengeProvider.validateChallenge with atomic Doobie SQL operations using SELECT FOR UPDATE + UPDATE to serialize concurrent access at the database level. Previously, concurrent requests all read the same starting counter value and each wrote back an identical incremented value (lost-update), allowing attackers to: - Send concurrent bad-login attempts without consuming the lockout threshold (Scenario H) - Submit concurrent wrong challenge answers without exhausting the allowed-attempt budget (Scenario K) The SELECT FOR UPDATE acquires an exclusive row lock before the UPDATE, forcing subsequent transactions to wait and then re-read the current committed counter value rather than operating on their snapshot value. Verified against H2 (MVCC/MVStore) in concurrency tests.

…consents Replace unconditional .save calls in ConsentScheduler with conditional Doobie UPDATEs that guard on the expected current status. If the status was changed by a concurrent HTTP request between the scheduler's findAll and its update, the WHERE clause does not match and the stale write is silently skipped (0 rows updated), preserving the terminal status committed by the HTTP path. Affected scheduler tasks: - expiredBerlinGroupConsents: guards on mstatus IN ('valid','VALID') - unfinishedBerlinGroupConsents: guards on mstatus = 'received' - expiredObpConsents: guards on mstatus = 'ACCEPTED' Fixes ConcurrentConsentRaceTest scenarios J and U.

… methods Add UniqueIndex constraints where missing and wrap saveMe() calls in scala.util.Try or tryo, re-fetching the committed row on constraint violation so concurrent callers always get a usable result rather than an unhandled JdbcSQLIntegrityConstraintViolationException. Affected methods: - MappedEntitlement.addEntitlement (C): UniqueIndex(bankId,userId,roleName) + tryo retry - MapperAccountHolders.getOrCreateAccountHolder (D): UniqueIndex(user,bankPermalink,accPermalink) + tryo retry - MapperCounterparties.getOrCreateMetadata (F): tryo retry on existing UniqueIndex(counterpartyId) - OAuth.getOrCreateConsumer (W): re-fetch on Failure from existing tryo block - MappedUserCustomerLink.getOCreateUserCustomerLink (L): Try retry on existing UniqueIndex(userId,customerId) - LiftUsers.getOrCreateUserByProviderId (I): Try retry on existing UniqueIndex(provider,providerId)

…ccess rows N: getOrCreateCustomPublicView — wrap createDefaultCustomPublicView in scala.util.Try; on constraint violation re-fetch the committed row. Prevents uncaught JdbcSQLIntegrityConstraintViolationException when two concurrent callers both see Empty and both try to insert the _public view. O: resetViewPermissions — wrap ViewPermission.create.save in scala.util.Try so a concurrent reset that already inserted the same (bank,account,view, permission) row is handled gracefully instead of throwing. R: ViewDefinition.beforeDelete cascade — add a beforeDelete hook that deletes all AccountAccess rows for the view before the view row is removed. Closes the TOCTOU window in removeCustomView's empty-check-then-delete sequence: any AccountAccess granted in that window is cleaned up atomically with the view deletion rather than orphaned.

…ter atomic Replace the non-atomic getOrDefault + put pair with ConcurrentHashMap.compute, which holds the internal segment lock for the duration of the read-modify-write. This prevents concurrent callers from reading the same stale tuple and overwriting each other's increments.

…enarios fixed

…ons against duplicate inserts - MapperViews.getOrCreateSystemView: wrap createDefaultSystemView in scala.util.Try; re-fetch the committed row on constraint violation (mirrors the N fix for custom views) - MapperViews.migrateViewPermissions: wrap the four bare ViewPermission.create.save call sites (system-view and custom-view branches for both boolean and list-valued permissions) in scala.util.Try; duplicate inserts from concurrent migrations are silently absorbed - ConcurrentViewPermissionRaceTest: add 'migrate' scenario — two threads concurrently call migrateViewPermissions on a fresh custom view; asserts no exception and exactly one ViewPermission row per enabled permission - CONCURRENCY_HAZARDS.md: update to 20 scenarios; document M/P/migrateViewPermissions as fixed; add migrate row to the View Permissions results table

- DoobieTransactionRequestQueries: use runUpdate instead of runQuery for lockTransactionRequest so SELECT FOR UPDATE holds the row lock on the fallback (non-request-context) path; runQuery's Strategy.void transactor released the lock before the caller's payment logic could run - OAuth.getOrCreateConsumer: re-fetch after UniqueIndex failure now uses (azp, sub) only when both are present; when either is absent falls back to re-fetch by key to avoid matching an unrelated consumer stored with empty azp+sub fields - MapperViews.removeAllViewsAndVierPermissions: explicitly delete AccountAccess rows before ViewDefinition.bulkDelete_!! because bulkDelete_!! bypasses beforeDelete hooks, leaving orphaned rows - LoginAttempts.incrementBadLoginAttempts: wrap the first-time MappedBadLoginAttempt create in tryo so concurrent first-time bad logins don't propagate an uncaught JDBC UniqueIndex exception - Add migrate_00000020.sql: dedup mapperaccountholder and mappedentitlement rows before the new UniqueIndex additions can be applied by Schemifier; without prior dedup Schemifier's CREATE UNIQUE INDEX aborts on any production DB that accumulated duplicate rows before this fix

…n status against duplicate inserts Three ID-mapping providers (Customer, Transaction, Account) and getOrCreateBadLoginStatus in LoginAttempts had the same check-then-insert race: two concurrent callers both saw Empty, both called saveMe(), and the losing thread threw an uncaught UniqueIndex violation. Apply the standard tryo + re-fetch pattern to all four methods: - wrap saveMe in tryo to absorb the UniqueIndex violation - on Failure, re-fetch by the natural key to return the committed row - getOrCreateBadLoginStatus: replace eager .or(Full(saveMe())) with an explicit match so saveMe is only called when find returns non-Full

… providers - MappedCustomerIdMappingProvider: `case other => other.map(_.customerId)` fixes Box[MappedCustomerIdMapping] returned where Box[CustomerId] required - MappedTransactionIdMappingProvider: same fix → `.map(_.transactionId)` - MappedAccountIdMappingProvider: same fix → `.map(_.accountId)` - LoginAttempts: add `Failure` to the net.liftweb.common import so the inner tryo match arm compiles (was: not found: value Failure)

Replace opaque (c, o) tuple destructuring with (totalCallCount, openFuturesCount) in incrementFutureCounter and decrementFutureCounter. Add inline comments explaining the atomicity guarantee from ConcurrentHashMap.compute, the role of each counter, and the null-guard initialisation behaviour in the decrement path.

The per-permission boolean columns on ViewDefinition (canSeeTransactionAmount_, canSeeTransactionCurrency_, ...) were superseded by the normalized ViewPermission table. Production already reads via allowed_actions and writes via ViewPermission.resetViewPermissions, leaving the columns as a legacy data source read only by the one-time migrateViewPermissions bridge. - Reimplement the boolean permission accessors to derive from allowed_actions, making ViewPermission the single source of truth. - Comment out the boolean permission MappedBoolean columns; Schemifier stops managing them while the physical columns remain for a later SQL drop. The MappedText list columns (canGrantAccessToViews_ / canRevokeAccessToViews_) stay. - Remove the migrateViewPermissions bridge, the obsolete MigrationOf* view permission scripts, and their boot wiring in Migration.scala. - Drop the now-moot migrate concurrency scenario and its CONCURRENCY_HAZARDS.md entry, since the migrated method no longer exists. Closes #26

Replace the manual DBA script migrate_00000020.sql with an automatic pre-Schemifier dedup step called unconditionally in Boot.scala. The concurrency-hazard fix added UniqueIndex to MapperAccountHolders and MappedEntitlements. Schemifier issues CREATE UNIQUE INDEX without a prior duplicate check, so an existing DB with duplicates would fail to start. The SQL file was a manual workaround. The replacement adds Migration.database.deduplicateBeforeUniqueIndexSchemify() directly before schemifyAll() in Boot.scala. It is not gated by migration_scripts.* props (Schemifier itself is not gated), runs as a no-op on fresh/test DBs (table-absent check), and probes for duplicates before issuing the DELETE so it is cheap on healthy databases. On an existing database with duplicates it removes all but the lowest-id row per natural key before the UniqueIndex DDL runs. Also adds DbFunction.tableExistsByName(tableName: String) as a string-based counterpart to the existing tableExists(BaseMetaMapper) helper, to avoid importing the mapper objects inside Migration.

…ition The per-permission MappedBoolean columns were retired in issue #26 and left as a /* */ block comment to allow Schemifier to stop managing them while keeping the physical DB columns intact. Now that the approach has been verified in production, remove the dead comment block entirely. The two MappedText list columns (canGrantAccessToViews_ / canRevokeAccessToViews_) are intentionally kept — they serve a different purpose and are still active.

The request layer migrated from Lift Web to http4s, but the comments still described connection sharing in terms of "Lift HTTP request context". Update the terminology to http4s request scope and rename the private helper liftCurrentConnection -> currentRequestConnection. References to Lift Mapper's DB API (DB.currentConnection / DB.runQuery) are kept — Mapper remains the ORM.

Remove the obsolete ScalaTest simulation plan (2026-06-11-10-22-obp-api-api-tidy-robin.md). It was a pre-work planning artifact describing a superseded state (7 hazards A-G, account-holder still a non-unique Index, fixes listed as future work). It is referenced nowhere and is fully superseded by CONCURRENCY_HAZARDS.md, which documents the delivered suite (19 scenarios, all fixed). Salvage the one piece of lasting value the plan held that the reference lacked: the H2 in-memory reproduction caveats (DB-isolation-independent hazards, table-lock masking, the asymmetric "reproduced on H2 => Postgres has it; not reproduced != safe" conclusion, and the dispatch HttpClient pool retry note) — now a "Testing Notes" section in CONCURRENCY_HAZARDS.md.

…azard-tests # Conflicts: # obp-api/src/main/scala/code/api/util/migration/Migration.scala

…ion framework deduplicateBeforeUniqueIndexSchemify() is the only schema-touching call in Boot that runs outside Migration.database.executeScripts, and the call site had no explanation. With the migration README/scaladoc now establishing that schema changes go through the runOnce framework, a future reader could move this into executeScripts and silently break it. Document at both the Boot.scala call site and the method: it MUST run before schemifyAll() (both executeScripts passes run after Schemifier, despite the historical "before Schemifier" comment) and ungated by migration_scripts.* props (off in tests, while Schemifier creates the index in every environment). No logic change.

The scripts/migrate/*.sql files were manual DBA runbooks for upgrading existing databases by hand. Nothing executes them: there is no Flyway/Liquibase/RUNSCRIPT loader, no code or build reference, and src/main/scripts is not packaged into the artifact. Schema evolution is handled by Lift Schemifier plus the runOnce MigrationOf* framework in Migration.scala, which supersedes these one-off scripts. Removing them keeps the migration story in one place (Migration.scala + the migration README) and avoids implying these need to be run or extended. History is preserved in git if a legacy manual upgrade path is ever needed.

… boot sequence Both executeScripts passes run AFTER schemifyAll(); the comment and log claiming the existing-DB pass runs "before Lift Mapper Schemifier" have been stale since 2021 (commit ea45370 moved schemifyAll ahead of both passes). The startedBeforeSchemifier flag selects the existing-DB pass, in which post-Schemifier-dependent migrations skip themselves — it does not change call order relative to Schemifier. Reword the Boot comment and the existing-DB log line, and document the real semantics of the flag on Migration.database.executeScripts. The fresh-DB log line is already accurate and is left unchanged. No behaviour change.

…e docs The resetPasswordUrl description had a raw <email> placeholder and two dynamic-entity-backup descriptions had a raw <BackupEntityName> placeholder. When the resource docs are serialized and parsed as XML (ResourceDocsTest "getResourceDocsObp Api -v6.0.0"), these read as unterminated XML start tags and throw SAXParseException, failing the ResourceDocs CI shard. Replace <email> with the concrete example user@example.com (already the idiomatic placeholder elsewhere in this file) and <BackupEntityName> with {BackupEntityName}. Documentation text only.

… DB vendors The natural-key dedup ran DELETE ... WHERE id NOT IN (SELECT MIN(id) FROM sameTable GROUP BY ...). That survivor-set form names the target table inside the DELETE subquery, which MySQL/MariaDB reject with ERROR 1093, and it scales with table size rather than with the number of duplicates. MySQL is a first-class OBP target (driver shipped, per-vendor SQL branches throughout this package), so the form could abort boot on a MySQL deployment while passing on H2/Postgres. Switch to a delete-set form: a derived table over ROW_NUMBER() OVER (PARTITION BY <natural key> ORDER BY id ASC) that targets only the surplus rows. The derived table is materialised, which sidesteps 1093, and window functions are supported by every driver OBP ships (PostgreSQL, H2 2.x, MySQL 8+/MariaDB 10.2+, SQL Server, Oracle). The survivor is unchanged: the lowest id per natural-key group. Also replace execute() with executeUpdate() so the destructive cleanup logs how many rows it removed, and document the lossy survivor policy: discarded duplicates' mentitlementid UUIDs and provenance columns are dropped, while the surviving row still encodes the same grant/ownership. Verified the new SQL on H2 2.2.220 (the shipped version): it keeps the lowest id per group, including for NULL partition keys.

sonarqubecloud · 2026-06-24T10:30:24Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

hongwei1 added 18 commits June 11, 2026 15:25

docs: add concurrency test suite summary

9209722

docs: merge concurrency hazard docs into single CONCURRENCY_HAZARDS.md

cd8adee

docs: add ScalaTest simulation plan for concurrent race condition haz…

5fdc6b1

…ards

Merge branch 'develop' of https://github.com/OpenBankProject/OBP-API …

b501028

…into feature/concurrency-hazard-tests

refactor: migrate JSON serialization imports from net.liftweb to org.…

8bc031a

…json4s in concurrency tests

feat: implement atomic bank account balance updates using Doobie row …

5de7ff7

…locking and fallback transactor

docs: Update outdated V6/V7 comments in RequestScopeConnection to ref…

26941ab

…lect global Http4s proxy usage

fix(transaction): prevent Doobie escaping and MFA double-spend race c…

ed1242b

…ondition

docs(concurrency): update CONCURRENCY_HAZARDS.md to reflect all 19 sc…

ff19daf

…enarios fixed

hongwei1 changed the title ~~Feature/concurrency hazard tests~~ fix(concurrency): guard 20+ race conditions across getOrCreate, balance updates, and view permissions Jun 23, 2026

hongwei1 closed this Jun 23, 2026

hongwei1 reopened this Jun 23, 2026

hongwei1 added 8 commits June 23, 2026 17:37

Merge remote-tracking branch 'OBP/develop' into feature/concurrency-h…

e9bb033

…azard-tests # Conflicts: # obp-api/src/main/scala/code/api/util/migration/Migration.scala

hongwei1 added 4 commits June 24, 2026 10:21

simonredfern merged commit aad6300 into OpenBankProject:develop Jun 24, 2026
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(concurrency): guard 20+ race conditions across getOrCreate, balance updates, and view permissions#2845

fix(concurrency): guard 20+ race conditions across getOrCreate, balance updates, and view permissions#2845
simonredfern merged 31 commits into
OpenBankProject:developfrom
hongwei1:feature/concurrency-hazard-tests

hongwei1 commented Jun 23, 2026 •

edited

Loading

Uh oh!

Uh oh!

sonarqubecloud Bot commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

hongwei1 commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Core fixes

Post-review correctness fixes (5 additional)

Codebase-wide audit (4 additional mapping provider fixes)

Test plan

Uh oh!

Uh oh!

sonarqubecloud Bot commented Jun 24, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hongwei1 commented Jun 23, 2026 •

edited

Loading