본문으로 건너뛰기

2026-04-29 — Exhaustive brand_id RW (SELECT/UPDATE/DELETE) Audit (T14-A baseline)

Round-14 deliverable. The T9 INSERT audit (docs/runbooks/multi-brand/2026-04-28-brand-id-insert-audit.md) closed the write half of the brand-leak class. This document closes the symmetric read/update/delete half — the failure mode that has burned 8 prior cross-review rounds because every prior audit only looked at INSERT statements.

Codex's T14 review surfaced 8 explicit RW leaks in admin SELECT/UPDATE/DELETE statements that ALL prior cross-reviews missed because the classifier only audited INSERTs. This document expands the audit to the bounded universe of reads/mutations/deletes against brand-scoped tables and, like T9-A, makes the migration table list the canonical source of truth so future regressions are caught by CI.

Future regressions can be detected by re-running the greps in § Reproducing the audit and re-classifying via tools/audit/brand_id_rw_classifier.py. The Make target make audit-brand-id-rw wraps both steps and is gated by CI.

Cross-reference

  • INSERT half: docs/runbooks/multi-brand/2026-04-28-brand-id-insert-audit.md (T9-A baseline).
  • This doc: T14-A baseline for SELECT/UPDATE/DELETE.
  • Both halves run on every PR via .github/workflows/brand-id-audit.yml.
  • Phase 16 hard-flip is now gated on both classifiers reporting FIX=0 (see tools/multi_brand_backfill/phase16_dryrun.py).

Scope

  • Tree audited: servers_v2/ (active services). The legacy servers/ monolith is included in the grep but every match is classified OOS-LEGACY.
  • Tables audited: 122 brand-scoped tables — 111 in the rgb schema plus 11 in the wallet schema, sourced verbatim from RGB_SCHEMA_TABLES and WALLET_SCHEMA_TABLES in servers_v2/shared/rgb_db/migrations/versions/0024a_add_brand_id_nullable.py.
  • Patterns audited:
    • Raw text("SELECT … FROM <table> …"), text("UPDATE <table> SET …"), text("DELETE FROM <table> …") and their f-string variants.
    • Multi-table SELECTs that JOIN brand-scoped tables — every brand-scoped join target is classified separately so a missing brand_id predicate on one of them surfaces as a FIX even if a sibling join has the predicate.

The classifier is conservative by design: ties go to FIX so a human must explicitly mark a query OOS-INTENTIONAL (via the # brand-global: <reason> comment marker) before the gate accepts it.

Headline numbers

Column A is the T14-A baseline (the bounded universe at audit time). Column B is post-T14-B after the 10 explicit Codex sites were closed (see § Batch status). Column T15 is the post-T15-E sweep of the deferred 316-site backlog. Column T16-C is the current state after the dynamic-SQL classifier upgrade (no new FIX-DYNAMIC sites surfaced).

BucketSELECT (A → B → T15 → T16-C/current)UPDATE (A → B → T15 → T16-C/current)DELETE (A → B → T15 → T16-C/current)
OK (brand_id pinned)62 → 92 → 217 → 21913 → 24 → 97 → 971 → 2 → 24 → 24
FIX (production)235 → 209 → 0 → 088 → 81 → 0 → 027 → 26 → 0 → 0
FIX-DYNAMICn/a → n/a → n/a → 0n/a → n/a → n/a → 0n/a → n/a → n/a → 0
FIX-TEST3 → 3 → 0 → 00 → 0 → 0 → 00 → 0 → 0 → 0
OOS-LEGACY3 → 3 → 3 → 30 → 0 → 0 → 04 → 4 → 4 → 4
OOS-NON-SQL37 → 41 → 42 → 467 → 7 → 7 → 71 → 1 → 1 → 1
OOS-INTENTIONAL0 → 0 → 91 → 910 → 0 → 18 → 170 → 0 → 5 → 5
UNKNOWN0 → 0 → 0 → 00 → 0 → 0 → 00 → 0 → 0 → 0
TOTAL matches340 → 348 → 353 → 359108 → 112 → 122 → 12133 → 33 → 34 → 34

T14-A baseline production FIX sites: 350 (235 SELECT + 88 UPDATE + 27 DELETE). Post-T14-B production FIX sites: 316 (209 SELECT + 81 UPDATE + 26 DELETE). Post-T15-E production FIX sites: 0 (the entire 316-site backlog swept). Post-T16-C production FIX-DYNAMIC sites: 0 (the new dynamic-SQL class did not surface any real bugs the static classifier missed — the upgrade was prompted by the CR9 finding that pre-T16-A logs.py::_query_log_table was invisible to the classifier; T16-A already closed that helper). Test-fixture sites that need a code change: 0 (T15-E swept the last 3 FIX-TEST entries from the T14-A baseline).

Batch Status

Batch B closed every site Codex's T14 review explicitly flagged plus their immediate sibling endpoints in the same admin_service files. Per-route detail is in the commit message of 9287ce50.

T15 superseded the original T14-D / T14-E / T14-F deferred-backlog plan and swept the remaining 316 production FIX sites. The current classifier output is the release gate: FIX=0, FIX-DYNAMIC=0, and FIX-TEST=0 for SELECT, UPDATE, and DELETE. Any new non-zero entry is therefore a regression, not inherited backlog.

Per-batch fix manifest

The remainder of T14 closes the FIX entries below. Batch B is the fix-everything-in-one-shot batch; the audit baseline (Batch A, this document) is the deliverable that proves the universe is bounded. Batch C re-runs the classifier and updates this document's headline counts to confirm the universe was emptied.

The 350 entries are fanned out across 39 files. Two structural observations from the per-file breakdown:

  1. Codex's 8 explicit findings are clustered in the admin_service back-office routes (agent_finance.py, stats.py, player.py, player_finance.py, promotion.py). Those carry the highest exploitation risk because they are operator-driven mutations on financial state.
  2. The wallet/promotion/player service queries account for ~30% of the volume but most are inside helpers that are protected by an upstream assert_player_brand_matches_operator-style check. They are still defence-in-depth FIXes — the SQL must refuse to scope-cross even when an upstream guard is bypassed — but they do not represent currently-exploitable leaks.

Codex T14-priority sites (the 8 leaks that triggered this audit)

These are the highest-priority sites because they were explicitly exercised by Codex's review and demonstrate live cross-brand exploitability via guid guessing.

#WhereFix
1servers_v2/admin_service/app/api/routes/agent_finance.py:124,132 (agree_agent_withdrawal)Add AND brand_id = :brand_id to BOTH the SELECT and the UPDATE; brand from resolve_brand_id(request)
2servers_v2/admin_service/app/api/routes/agent_finance.py:186,209 (decline_agent_withdrawal)Same pattern
3servers_v2/admin_service/app/api/routes/agent_finance.py:259,267 (pay_agent_withdrawal)Same pattern
4servers_v2/admin_service/app/api/routes/stats.py:426–432 (player_provider_stat)Table doesn't exist (runtime-500 bomb) — rename to player_provider_stat_day AND add brand-scope predicate
5servers_v2/admin_service/app/api/routes/player.py:993–1006 (/players/registrations)Prepend brand_id = :brand_id to the dynamic WHERE 1=1 filter; brand from resolve_brand_id(request)
6servers_v2/admin_service/app/api/routes/player.py:1170–1192 (player_level_domain set primary)Add brand-scope to BOTH the SELECT and the UPDATE
7servers_v2/admin_service/app/api/routes/stats.py:190–512 (provider_stat_day, player_stat_day, agent_stat_day, promotion_stat_day, usdt_stat_day, player_provider_stat_day, rebate_stat_day, player_stat_acc)13 sibling stats endpoints share the same shape — add brand_id = :brand_id, fail-loud on missing brand context. admin_stat_acc / admin_stat_day are excluded (truly brand-global, see OOS-INTENTIONAL)
8servers_v2/admin_service/app/api/routes/player_finance.py:1281–1303 (/players/rolling/list)Prepend brand-scope to the dynamic WHERE 1=1 filter
9servers_v2/admin_service/app/api/routes/promotion.py:985–1005 (/promotions/list)Prepend brand-scope (Codex fixed coupons/list in 9d981cc9 but missed promotions/list in the same file)
10servers_v2/admin_service/app/middleware/brand.py:10–11 (docstring)Update the stale "fall back to body.brand_ids as before" line; the post-T11/T12 reality is fail-loud brand_required_envelope({})

Item 7 spans 14 sibling endpoints; the remaining items are 1 endpoint each. Items 1–9 sum to about 25 raw SQL statements out of the 350 total — the rest of the FIX list extends the same fix pattern across the back-office surface.

Full FIX inventory (350 sites — generated by the classifier)

The complete machine-generated list is the FAIL block of make audit-brand-id-rw (see § Reproducing the audit). Per-file totals:

FileSUDTotal
servers_v2/admin_service/app/api/routes/agent.py82313
servers_v2/admin_service/app/api/routes/agent_finance.py5308
servers_v2/admin_service/app/api/routes/banking.py47415
servers_v2/admin_service/app/api/routes/brand.py1001
servers_v2/admin_service/app/api/routes/finance_config.py3339
servers_v2/admin_service/app/api/routes/meta.py6006
servers_v2/admin_service/app/api/routes/player.py1618337
servers_v2/admin_service/app/api/routes/player_finance.py241025
servers_v2/admin_service/app/api/routes/promotion.py104216
servers_v2/admin_service/app/api/routes/stats.py160016
servers_v2/admin_service/app/api/routes/system.py65516
servers_v2/admin_service/app/services/brand_helpers.py1001
servers_v2/admin_service/app/tasks/check_expired.py0202
servers_v2/admin_service/app/tasks/expire.py1405
servers_v2/admin_service/app/tasks/level.py2204
servers_v2/admin_service/app/tasks/stat.py170017
servers_v2/admin_service/app/tasks/tag.py160521
servers_v2/admin_service/app/tasks/vera_sync.py7007
servers_v2/agent_service/app/api/routes/legacy_routes.py4004
servers_v2/agent_service/app/api/routes/messages.py1001
servers_v2/agent_service/app/api/routes/withdraw.py2002
servers_v2/agent_service/bootstrap_local_data.py1001
servers_v2/game_service/app/api/routes/integration.py6006
servers_v2/player_service/app/api/routes/auth.py2103
servers_v2/player_service/app/api/routes/player.py110011
servers_v2/player_service/app/api/routes/recovery.py1102
servers_v2/player_service/app/services/domain_cache.py4004
servers_v2/promotion_service/app/api/routes/coupon.py47011
servers_v2/promotion_service/app/api/routes/lossback.py3115
servers_v2/promotion_service/app/api/routes/rebate.py3115
servers_v2/promotion_service/app/api/routes/records.py83011
servers_v2/promotion_service/app/services/event_consumer.py2002
servers_v2/promotion_service/app/tasks/coupon_saga_recovery.py0606
servers_v2/recon_service/app/services/recon_store.py5005
servers_v2/rolling_service/app/api/routes/rolling.py2002
servers_v2/wallet_service/app/api/routes/plisio.py59014
servers_v2/wallet_service/app/api/routes/queries.py183021
servers_v2/wallet_service/app/api/routes/wallet.py5308
servers_v2/wallet_service/app/services/deposit_policy.py4004
servers_v2/wallet_service/app/services/wallet_bucket_commands.py1203
TOTAL2358827350

Brand-source recommendation per service

ServiceBrand source for RW predicates
admin_service/api/routes/*resolve_brand_id(request); fail-loud brand_required_envelope({"route": ...}) when None. The middleware (app/middleware/brand.py) populates request.state.brand_id from the X-Brand-Id header or the Redis domain map.
admin_service/tasks/*Per-brand iteration loop (T9-C pattern): enumerate enabled brands and run the task body once per brand, binding each iteration's brand_id into the SQL params.
admin_service/services/*Helpers receive brand_id: int as an explicit parameter; never read it from globals or fall back to a default.
agent_service/api/routes/*agent_token_brand_id(agent) (helper added in 9d981cc9) — the agent JWT's brand_id claim is the authoritative source for queries scoped to a single agent's view.
agent_service/bootstrap_local_data.pyHard-coded 1 (default brand) with a # dev-only bootstrap comment, mirroring the T9-D pattern for INSERTs.
wallet_service_request_brand_id(request) already established. For player-id-keyed queries, augment with a JOIN onto player.brand_id so the SQL refuses to return cross-brand rows even when the input player_id is from another brand.
game_serviceThe provider-callback brand is derived from the topology snapshot already loaded in the route; pass it as a named parameter into the SQL.
player_servicerequest.state.brand_id (gateway-injected); for routes called pre-auth (login, recovery), the brand is derived from the request domain via the same Redis lookup the gateway uses.
promotion_service_request_brand_id(request) for routes; per-brand iteration for the saga recovery task.
rolling_serviceThe rolling row's own brand_id (already loaded in the read for the cancel/complete codepath).
recon_serviceThe recon match path establishes brand context on shooter_sms UPDATE; SELECTs by ID can read across brands because no PII is exposed in the read shape. Mark with # brand-global: recon ID-based read, no cross-brand PII exposure after Batch B verifies.

OOS classifications (rationale)

OOS-LEGACY (3 SELECT + 0 UPDATE + 4 DELETE)

Same rationale as the INSERT audit: matches under the legacy servers/ tree are out of scope for the multi-brand isolation initiative which targets the v2 service split. See docs/reference/architecture/2026-04-rebuild-decision.md.

OOS-NON-SQL (37 + 7 + 1)

Matches inside test assertions, regex literals, comments, or docstrings that mention SELECT/UPDATE/DELETE for diagnostic purposes but do not execute SQL. The classifier flags any line that contains one of assert, in sql, in str(, re.compile|search|match, pattern =, .count(, .split(, r"SELECT|UPDATE|DELETE, leading #, or a numbered docstring bullet.

OOS-INTENTIONAL

Current count: 113 production sites (91 SELECT + 17 UPDATE + 5 DELETE).

Tables in BRAND_GLOBAL_BY_DESIGN (in brand_id_rw_classifier.py):

  • admin_stat_acc — platform-level cumulative aggregate, single row per period spanning all brands (per docs/architecture/data-ownership.md § Brand-Global Aggregates).
  • admin_stat_day — same, for daily granularity.

Add to this set with care: each entry is a permanent gap in CI coverage. Document the rationale in the classifier docstring AND in this section of the audit doc.

Current OOS-INTENTIONAL breakdown

CR9-B reviewed every site flagged OOS-INTENTIONAL and confirmed each falls into one of the categories below. The # brand-global: <reason> comment marker is present at every site (the classifier rejects the opt-out without it).

CategoryRationale
Resolver helpers (look up brand from a parent row)Helpers like _load_player_brand_id(player_id) MUST query the player row WITHOUT a brand filter — the whole point is to resolve the brand id from the parent so callers can then pin it. Adding a brand predicate would create a chicken-and-egg dependency. Each site has the marker AND a fail-closed resolve_strict_brand_id immediately downstream.
Scheduled tasks (cross-brand by design)The admin_service scheduler tasks (player_stat_acc, agent_stat_day, expire_coupons, expire_promotions, process_attendance, vera_sync_all) iterate every brand in turn and write per-brand result rows. The driver SELECT is intentionally cross-brand; the per-row UPSERTs DO carry brand_id.
Brand-global tables (admin_stat_*)The admin_stat_acc / admin_stat_day reads + writes — the platform-level aggregates that intentionally span all brands. Marker is implicit via the BRAND_GLOBAL_BY_DESIGN allow-list, not the per-line comment.
Cross-brand admin views (super_admin only)Routes like super_admin.list_brands, system.list_global_vars, the audit-trail viewers, and the legacy agent table reads (which is brand-global per ADR-009 — membership lives in agent_brand). All gated by require_super_admin.
Outbox / event-fanout queriesThe wallet/promotion/rolling outbox-poller scans outbox tables across all brands and dispatches per-row to brand-aware handlers. Per-row payloads carry the brand_id; the driver SELECT is intentionally cross-brand.
Topology / policy bundle readsUpdated 2026-05-06: per Phase 6B, wallet_topology and wallet_policy are now brand-scoped — uniqueness is (brand_id, code, version) and (brand_id, topology_code, version, policy_key) respectively (see docs/services/wallet-service.md "Multi-Brand Constraints"). Every active-bundle lookup binds brand_id via _brand_filter_clause. The historical "server-global" classification below dates to the pre-Phase-6B audit and is preserved for archaeology only. The store-layer _resolve_brand_id retains a default-brand fallback for defence-in-depth — production routes (topology.py, bucket_wallet.py, wallet.py, queries.py) all envelope-reject missing X-Brand-Id upfront so the fallback path is unreachable; the wallet_topology_default_brand_fallback_total counter exists to verify the fallback never fires in soak.
Idempotency / dedup checks (brand_id in JSON payload)A handful of helpers read idempotency rows by idempotency_key only (the brand_id is part of the JSON payload, not a filter column). Migration 0027 made the unique constraint (brand_id, idempotency_key) — these reads still match a single row.
Schema migrations / catalog scansOperator tooling that reads the migration history / brand catalog intentionally needs the cross-brand view.

CR9-B verified these sites are sound. The bucket grows over time as new genuinely-cross-brand queries land — the marker requirement keeps it auditable.

Classifier upgrades

T16-C dynamic-SQL detection

CR9-A surfaced one classifier blind spot: _query_log_table in servers_v2/admin_service/app/api/routes/logs.py built its SQL dynamically (WHERE 1=1 builder + f-string template f"SELECT ... FROM {table} WHERE {where} ..."). No static line carried a literal brand_id token, so the static classifier never flagged it even though the helper read every operator's log table across every brand. T16-A closed the helper itself; T16-C extended the classifier so the same construction pattern can never go undetected again.

The new FIX-DYNAMIC class flags two patterns when the surrounding ~60-line scope has no recognised brand idiom (conditions.append ("brand_id = :brand_id"), sql += " AND brand_id = :brand_id", alias-qualified variants):

  1. SQL strings containing WHERE 1=1 literal.
  2. f-string SQL templates f"SELECT ... WHERE {where} ..." where both the table AND the WHERE clause are templated.

FIX-DYNAMIC shares the same exit-code semantics as FIX (non-zero when count > 0). Conservative bias — when uncertain, flag.

A retroactive run on the pre-T16-A tree confirms the upgrade catches the 2 SQL templates inside _query_log_table (the count + fetch queries). The current tree (post-T16-A) reports FIX-DYNAMIC = 0 across the repo — the upgrade did not surface any new bugs the static classifier missed.

Reproducing the audit

mkdir -p /tmp/brand_audit_rw

grep -rniE --include='*.py' \
'SELECT[[:space:]]+' \
servers_v2 servers \
> /tmp/brand_audit_rw/all_selects.txt

grep -rniE --include='*.py' \
'UPDATE[[:space:]]+(rgb\.|wallet\.|`[^`]+`\.)?[a-z_]+[[:space:]]+SET' \
servers_v2 servers \
> /tmp/brand_audit_rw/all_updates.txt

grep -rniE --include='*.py' \
'DELETE[[:space:]]+FROM[[:space:]]+' \
servers_v2 servers \
> /tmp/brand_audit_rw/all_deletes.txt

python3 tools/audit/brand_id_rw_classifier.py

Or use the wrapper:

make audit-brand-id-rw

CI runs the same Make target on every PR (see .github/workflows/brand-id-audit.yml); the classifier exits non-zero when RW FIX > 0 so a regression fails the build.

After every fix batch, re-run the greps and confirm the bucket counts move toward the target steady state:

target steady state:
SELECT: FIX=0 FIX-TEST=0 OK ≥ 297 (was FIX=235, FIX-TEST=3, OK=62)
UPDATE: FIX=0 FIX-TEST=0 OK ≥ 101 (was FIX=88, FIX-TEST=0, OK=13)
DELETE: FIX=0 FIX-TEST=0 OK ≥ 28 (was FIX=27, FIX-TEST=0, OK=1)

post-Batch B intermediate state:
SELECT: FIX=209 FIX-TEST=3 OK=92
UPDATE: FIX=81 FIX-TEST=0 OK=24
DELETE: FIX=26 FIX-TEST=0 OK=2

If a new FIX entry appears that is not in the manifest above, treat it as a regression — a brand-scoped read/mutation/delete was added without a brand_id predicate between this baseline and the present commit.

Why the previous rounds missed sites

Every cross-review from T6 through T13 audited INSERT INTO and insert(<Model>) exclusively. The cross-brand exploit shape — guess a guid for a row owned by another brand and SELECT/UPDATE/DELETE it — never showed up in the INSERT audit because INSERTs always run with the writer's brand context attached. The reads/mutations/deletes are where guid guessing turns into actual cross-brand data exfiltration or mutation.

T14-A makes the migration table list the canonical source of truth for the RW path too: the classifier reads RGB_SCHEMA_TABLES + WALLET_SCHEMA_TABLES directly from 0024a_add_brand_id_nullable.py so adding a new brand-scoped table to the migration automatically extends the audit's coverage.

Last Verified Commit

934d47f4 (post-T16-C; T15-A..E swept the entire 316-site deferred RW backlog → make audit-brand-id-rw reports FIX=0 + FIX-TEST=0; T16-A closed the CR9 admin_service Critical+Important findings; T16-B fail-loud-converted every conditional brand predicate in wallet_service; T16-C extended the classifier with the new FIX-DYNAMIC class to detect helper-style queries). Phase 16 dry-run gate (tools/multi_brand_backfill/phase16_dryrun.py) now reports GO on the RW dimension.