Multi-Brand Isolation

Status

Approved

Date

2026-04-27

Owners

  • Platform Backend
  • Player Domain
  • Wallet Domain
  • Agent Domain

Affected Services

  • gateway
  • player_service
  • wallet_service
  • rolling_service
  • promotion_service
  • game_service
  • agent_service
  • admin_service
  • recon_service
  • docs/adr/ADR-009-multi-brand-domain-routed-isolation.md
  • docs/adr/ADR-005-wallet-topology-bucket-ledger-model.md
  • docs/adr/ADR-001-document-driven-backend-change-workflow.md
  • docs/architecture/data-ownership.md
  • docs/architecture/domain-ownership.md
  • docs/architecture/system-overview.md
  • docs/architecture/http-entrypoints.md
  • docs/services/gateway.md
  • docs/services/player-service.md
  • docs/services/wallet-service.md
  • docs/services/rolling-service.md
  • docs/services/promotion-service.md
  • docs/services/game-service.md
  • docs/services/agent-service.md
  • docs/services/admin-service.md
  • docs/services/recon-service.md
  • docs/runbooks/multi-brand/multi-brand-isolation-rollout.md

Goal

Make servers_v2 capable of hosting multiple distinct player-facing brands on the same runtime, with brand-scoped player identity, money state, rolling and settlement, and configuration, while keeping a single PostgreSQL database, a single Redis, and a single Docker stack.

A "brand" is a self-contained operating product: its own player base, its own wallet topology and policy, its own promotion rules, its own configuration, its own domain footprint. Brands are identified at the request edge from the request domain. After login, brand identity is also carried in the JWT.

Scope

In scope:

  • A new brand aggregate and a new brand_config aggregate.
  • A new agent_brand join aggregate (brand membership for an agent).
  • Adding non-nullable brand_id to every brand-scoped table in servers_v2, with backfill into a single default brand.
  • Brand-scoped uniqueness constraints on player, wallet_topology, wallet_policy, agent_setting, agent_domain, and any other table whose current uniqueness must change.
  • Brand resolution in gateway from the request domain via the existing domain:agent:{host} and domain:level:{host} Redis maps, extended to carry brand_id.
  • Forwarding brand context downstream as X-Brand-Id.
  • Adding a brand_id claim to JWT and rejecting JWT/domain mismatches.
  • Brand-scoped query filtering in every domain service.
  • Wallet command rejection on cross-brand targets.
  • Brand-scoped wallet topology and wallet policy resolution.
  • Per-brand promotion configuration, coupon definitions, and rebate/cashback/lossback rates.
  • Per-brand rolling completion ratios.
  • Per-brand payment channel selection and per-brand operational config.
  • Game provider account namespacing ({brand_code}_{account}) and reverse parsing on callback.
  • Admin write surfaces for brand, brand_config, and agent_brand.
  • Agent allow-list enforcement at the agent edge.
  • Brand-aware structured logs and a brand label on player-, wallet-, rolling-, promotion-, and game-scoped Prometheus metrics.
  • Brand-aware outbox and cross-service event payloads.
  • Removal of the legacy_admin_v2 and legacy_auth staff routes and the supporting staff identity logic from admin_service.

Out of scope:

  • Schema-per-brand or database-per-brand isolation.
  • Per-brand game provider credentials.
  • A new back-office staff identity model. Staff is removed in this change and is not replaced.
  • Per-brand infrastructure (separate Redis, separate Postgres, separate Docker stack).
  • Cross-brand player identity migration (e.g. promoting a player from one brand to another).
  • Cross-brand wallet transfer.
  • Per-brand SLA, billing, or quota management.
  • Per-brand i18n and theming for any frontend that lives outside this repo (the brand_config slots are defined here; rendering is owned by the consuming frontend).

Background

servers_v2 was designed under a single-brand assumption. The only existing tenant-shaped artifact is the hard-coded project integer in admin_service/app/tasks/tag.py, which switches between two product labels and is not a real isolation mechanism. The platform now needs to host more than one brand at the same time on the same runtime.

gateway and player_service already carry a domain-routing mechanism: gateway._extract_domain reads Host or Origin and injects a domain field into the request body for legacy routes; player_service resolves agent_id and level_id from domain:agent:{host} and domain:level:{host} Redis maps. That mechanism is the natural extension point for brand routing and is preserved.

A no-staff posture is intentional. Back-office staff identity, role/menu management, and the legacy_admin_v2 and legacy_auth staff compatibility paths are being retired together with this change.

Compatibility

Required to remain stable:

  • The status/msg/data envelope on every external response is unchanged.
  • Existing player route paths under /api/v1/* and the legacy root aliases in gateway are unchanged.
  • Existing internal route paths under /internal/* are unchanged; only the required X-Brand-Id header is added.
  • The Plisio public callback paths (/internal/wallet/plisio/callback and the legacy /player/deposit/plisio/callback) remain public; brand is resolved from the deposit record, not from the caller.
  • The provider callback prefixes in game_service (/HO/*, /mg/*, /wc/*, /bti/*, /splus/*, /bt1/*, /digitain/*, /integration/*) are unchanged; brand is resolved from the account namespace embedded in the callback payload.
  • The agent frontend route surface in agent_service is unchanged.
  • The wallet topology contracts from ADR-005 are unchanged in shape; the only changes are brand scoping of uniqueness and resolution.
  • Existing JWT consumers that only read player_id continue to work; they must additionally read brand_id after this change.

Required to break by design:

  • The following admin_service route files are deleted: legacy_admin_v2.py, legacy_auth.py, legacy_agents_v2.py, legacy_agent_withdrawals_v2.py, legacy_meta_v2.py, legacy_recon.py, legacy_web_content.py. The supporting staff identity helpers are removed with them. The legacy admin paths under /api/admin/user/*, /api/admin/pushbullet/*, /api/admin/shooter/*, /api/admin/web/rules/*, /api/admin/web/faq/*, and /api/admin/web/config/* (and any other route served exclusively by the deleted files) are removed and will respond 404.
  • Any code path that assumes player.account is globally unique is changed; uniqueness is now (brand_id, account).
  • Any code path that assumes the active wallet topology is global is changed; topology and policy resolution is per brand.
  • Outbound game provider account names change shape from account to {brand_code}_{account} (or the documented per-provider equivalent for providers that constrain account format). Provider-side reconciliation must be reviewed before cutover.

Ownership

Aggregate | Owner | Notes
--- | --- | ---
brand | admin_service | brand catalog
brand_config | admin_service | per-brand configuration values
agent_brand | written by admin_service, read by agent_service | agent-to-brand allow list
agent | agent_service | brand-global; no brand_id column
agent_setting, agent_domain | player_service (already today) | now scoped per (agent_id, brand_id)
player and player-owned tables | player_service | brand-scoped
wallet aggregates (wallet_account, wallet_bucket, wallet_coupon_grant, wallet_bet_authorization, wallet_ledger, transfers, idempotency rows, outbox) | wallet_service | brand-scoped; ADR-005 unchanged
wallet_topology, wallet_policy | wallet_service | brand-scoped uniqueness
rolling aggregates | rolling_service | brand-scoped
coupon, promotion, settlement, saga aggregates | promotion_service | brand-scoped
game provider state | game_service | brand-scoped rows; credentials brand-global
recon aggregates (shooter_*) | recon_service | brand-scoped
game provider credentials | game_service | brand-global

Requirements

Brand entity

  • A brand row has:
    • brand_id BIGINT autoincrement surrogate primary key (matches existing surrogate convention used by agent.guid and player.guid)
    • brand_code VARCHAR(16) constrained by CHECK (brand_code ~ '^[a-z][a-z0-9]{1,15}$') (lowercase alphanumeric, leading letter, 2-16 chars; safe for every supported game provider's account format and for use in Prometheus labels)
    • name VARCHAR(64)
    • default_currency CHAR(3) (ISO 4217)
    • status enum (enabled, disabled)
    • created_at, updated_at TIMESTAMP
  • brand_code is unique and immutable after creation.
  • brand is read by every domain service. It is written only through admin_service.
  • The default brand seed has brand_code = 'default', name = 'Default Brand', status = enabled, default_currency matching the current single-brand environment's documented currency.
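
The brand_code format rule can be checked in application code before a write reaches the database. The following is a minimal sketch, not the admin_service implementation; the function name is hypothetical. It uses re.fullmatch rather than a trailing $, because in Python $ also matches just before a trailing newline.

```python
import re

# Pattern taken from the CHECK constraint above: one lowercase letter followed
# by 1 to 15 lowercase letters or digits (2 to 16 characters in total).
BRAND_CODE_RE = re.compile(r"[a-z][a-z0-9]{1,15}")


def is_valid_brand_code(code: str) -> bool:
    """Return True when `code` satisfies the brand_code format rule.

    Hypothetical helper name. fullmatch requires the whole string to match,
    so a value such as "ab\n" is rejected.
    """
    return BRAND_CODE_RE.fullmatch(code) is not None
```

For example, is_valid_brand_code("default") accepts the seed brand code, while "a", "1abc", "Brand", and "ab_c" are rejected.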

Brand configuration

  • A brand_config(brand_id, key, value) row stores per-brand override values. Values are JSON.
  • Resolution order for any configurable value is: per-brand brand_config[brand_id][key], then a documented global default. The global default may live in code or in a documented global config table.
  • The configurable keys include:
    • rolling completion ratios per provider type
    • cashback rate, rebate rate, and lossback rate
    • payment channel selection (Plisio enabled, manual bank channels)
    • withdrawal min/max amounts and fees
    • risk thresholds for deposit and withdrawal
    • i18n override token, theme token, customer-service link, email template overrides
  • Values that must remain global (game provider credentials, infrastructure URLs, shared rate limits) are explicitly listed in the plan and must not be moved into brand_config.
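
The resolution order above (per-brand override first, then a documented global default) can be sketched as follows. This is illustrative only: the in-memory dict stands in for the brand_config table, and GLOBAL_DEFAULTS, its key, and the function name are assumptions rather than names from the codebase.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical global defaults; the real keys and values are defined by the plan.
GLOBAL_DEFAULTS: Dict[str, Any] = {"cashback_rate": 0.0}


def resolve_config(rows: Dict[Tuple[int, str], str], brand_id: int, key: str) -> Any:
    """Resolve a configurable value for one brand.

    `rows` maps (brand_id, key) to a JSON-encoded value, mirroring
    brand_config(brand_id, key, value).
    """
    raw = rows.get((brand_id, key))
    if raw is not None:
        # A per-brand override exists and wins.
        return json.loads(raw)
    if key in GLOBAL_DEFAULTS:
        # No override: fall back to the documented global default.
        return GLOBAL_DEFAULTS[key]
    raise KeyError(f"no brand override and no global default for {key!r}")
```

Raising on a key with neither an override nor a default is a design choice of this sketch; the spec only defines the two-step lookup order.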

Brand resolution at the edge

  • gateway resolves brand_id for every external player request from the request domain via the existing _extract_domain helper plus the Redis maps domain:agent:{host} and domain:level:{host}. Both maps now resolve to a value carrying brand_id.
  • _extract_domain precedence is Origin first (used when the player is on the canonical web origin), then the Host header (a fallback for callers that omit Origin). Both values are looked up in the same Redis maps. Origin is trusted only because TLS termination at the load balancer enforces hostname authenticity; CORS does not validate brand binding. If a deployment ever exposes the gateway without a TLS-terminating load balancer, the precedence must be reversed; this is documented in gateway's deployment notes.
  • gateway MUST strip any inbound X-Brand-Id header from external requests before injecting the resolved value. agent_service MUST do the same on its agent-frontend edge. Internal callers may legitimately send X-Brand-Id to internal handlers (the header is trusted only on internal routes and only when paired with a valid X-Internal-Service-Token).
  • A request whose domain does not resolve to a brand is rejected at the edge with a stable error envelope.
  • The resolved brand_id is attached to request.state.brand_id and forwarded to all downstream services as the X-Brand-Id header.
  • agent_service resolves brand from the agent frontend domain the same way and validates the resolved brand against the authenticated agent's agent_brand allow list. Brand resolution applies uniformly to the agent's primary /api/v1/agent/* routes and to its legacy aliases (e.g. /api/v1/user/login); the legacy alias surface is not exempt.
  • admin_service is brand-global at the entry surface; per-brand operational routes accept an explicit brand_id parameter and forward it as X-Brand-Id on internal calls.
  • game_service resolves brand_id from the namespaced account in the provider callback payload. A callback whose namespaced account cannot be reverse-parsed is rejected and logged with provider context but no brand assumption.
  • The Plisio public callback resolves brand_id from the matching deposit record (which carries brand_id from creation time).
  • recon_service does not have a public HTTP callback surface. Inbound reconciliation signals (SMS, pushbullet) reach recon_service through internal collection paths, and brand is resolved from the matched deposit record, not from message text. There is no edge brand resolution to perform on those signals.
  • Internal handlers reject brand-scoped operations that arrive without a brand context.
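
The edge rules above (Origin before Host, strip any inbound X-Brand-Id, reject unresolved domains, forward the resolved value) can be sketched as below. This is not the gateway code: extract_domain only imitates the documented precedence of _extract_domain, the plain dict stands in for the domain:agent:{host} and domain:level:{host} Redis maps, and edge_headers is a hypothetical name.

```python
from typing import Dict, Mapping, Optional
from urllib.parse import urlsplit


def extract_domain(headers: Mapping[str, str]) -> Optional[str]:
    """Return the request host: Origin first, then Host; lowercased, no port."""
    lowered = {name.lower(): value for name, value in headers.items()}
    origin = lowered.get("origin")
    if origin:
        host = urlsplit(origin).hostname  # None for values such as "null"
        if host:
            return host
    host_header = lowered.get("host")
    if host_header:
        # Prefix "//" so urlsplit treats the value as a network location.
        return urlsplit("//" + host_header).hostname
    return None


def edge_headers(headers: Mapping[str, str], domain_to_brand: Mapping[str, int]) -> Dict[str, str]:
    """Build downstream headers for an external request.

    `domain_to_brand` is an in-memory stand-in for the Redis maps.
    """
    domain = extract_domain(headers)
    brand_id = domain_to_brand.get(domain) if domain else None
    if brand_id is None:
        # The real edge returns the stable error envelope here.
        raise LookupError("unknown_domain")
    # Drop any client-supplied X-Brand-Id before injecting the resolved one.
    forwarded = {k: v for k, v in headers.items() if k.lower() != "x-brand-id"}
    forwarded["X-Brand-Id"] = str(brand_id)
    return forwarded
```

A request that carries a spoofed x-brand-id header therefore leaves the edge with only the value resolved from its domain.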

JWT and authorization

  • JWT issued at login carries brand_id as a claim.
  • gateway checks jwt.brand_id == request.state.brand_id on every authenticated request. Behavior depends on MULTI_BRAND_ENFORCEMENT (see Staged enforcement below): in observe, mismatches are logged and counted; in enforce, mismatches are rejected with the documented error envelope.
  • A JWT that lacks brand_id is treated the same way (logged in observe, rejected in enforce); JWTs issued before brand-aware login goes live drain naturally over JWT_EXPIRE_MIN.
  • Internal services trust the X-Brand-Id header attached by the edge or by another internal caller; they do not re-derive brand from JWT (which is verified at the edge and not always propagated downstream).
  • Logout, refresh, and impersonation flows preserve brand_id.

Player identity

  • player uniqueness becomes (brand_id, account). The same account string may exist in two brands as two distinct player_id rows.
  • A MAX_PLAYER_ACCOUNT_LEN = 32 constant is introduced (codified in rgb_contracts) and enforced at registration time. The DB column remains VARCHAR(255) for legacy compatibility; the runtime cap of 32 ensures the outbound game-provider account {brand_code}_{account} (worst case 16 + 1 + 32 = 49 chars) fits every currently integrated provider's account-field cap. A backfill audit confirms no existing player.account exceeds 32 chars before the cap is enforced; any non-conforming legacy rows are flagged for manual remediation.
  • Registration resolves brand_id from the request domain and rejects any client-supplied brand override.
  • Recovery flows resolve brand_id from the request domain or from the recovery token; cross-brand recovery is forbidden.
  • Player-owned tables (player_deposit, player_withdraw, player_bank, message rows, common projections, player-owned outbox) carry brand_id.
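
The account-length arithmetic above can be made explicit. The sketch below assumes only the two limits stated in this document (brand_code up to 16 characters, MAX_PLAYER_ACCOUNT_LEN = 32); the function names are hypothetical and the audit is shown over an in-memory list rather than the player table.

```python
from typing import Iterable, List

MAX_PLAYER_ACCOUNT_LEN = 32  # runtime cap from this spec
MAX_BRAND_CODE_LEN = 16      # brand_code is VARCHAR(16)


def worst_case_outbound_len() -> int:
    """Longest possible {brand_code}_{account}: 16 + 1 + 32."""
    return MAX_BRAND_CODE_LEN + 1 + MAX_PLAYER_ACCOUNT_LEN


def check_registration_account(account: str) -> None:
    """Reject accounts longer than the runtime cap at registration time."""
    if len(account) > MAX_PLAYER_ACCOUNT_LEN:
        raise ValueError("account exceeds MAX_PLAYER_ACCOUNT_LEN")


def audit_legacy_accounts(accounts: Iterable[str]) -> List[str]:
    """Return legacy accounts that need manual remediation before the cap."""
    return [a for a in accounts if len(a) > MAX_PLAYER_ACCOUNT_LEN]
```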

Agent

  • agent is brand-global. The agent table carries no brand_id.
  • agent_brand is the only place where brand membership for an agent is recorded. Rows: (agent_id, brand_id, status, created_at).
  • agent_setting becomes keyed by (agent_id, brand_id). A single agent may carry different registration modes per brand.
  • agent_domain keeps its existing surrogate guid BIGINT primary key (preserving any FK references to agent_domain.guid); a composite UNIQUE(agent_id, brand_id, domain) constraint is added on top of the existing UNIQUE(domain) global constraint. Together they guarantee a domain value cannot exist twice under any (agent_id, brand_id) combination, and a domain belongs to exactly one brand and exactly one agent.
  • agent_service rejects every authenticated request whose resolved brand is not in the agent's agent_brand allow list.
  • agent_brand reads in agent_service are cached per process with a 60-second TTL AND invalidated by the AGENT_BRAND_CHANGED Redis pub/sub channel published by admin_service after every agent_brand write (insert, update, delete). Each agent_service process subscribes on startup; processes that miss a message refresh on TTL expiry.
  • agent_brand revocation does NOT invalidate the active JWT (JWT brand validation is gateway-side). The next request after revocation is rejected by the allow-list check, and the existing session keeps receiving 4xx responses until the agent re-authenticates against a brand they still hold. JWT revocation is a documented follow-up (requires a JWT denylist or short-lived access tokens with refresh).
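
The caching contract for agent_brand reads (60-second TTL plus explicit invalidation) can be sketched as a small per-process cache. This is an illustration under assumptions: the class name, the loader callback, and the injectable clock are hypothetical, and the Redis pub/sub subscriber that would call invalidate() on AGENT_BRAND_CHANGED is omitted.

```python
import time
from typing import Callable, Dict, FrozenSet, Optional, Tuple


class AgentBrandCache:
    """Per-process allow-list cache with TTL expiry and explicit invalidation."""

    def __init__(
        self,
        loader: Callable[[int], FrozenSet[int]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader    # reads agent_brand rows for one agent
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[int, Tuple[float, FrozenSet[int]]] = {}

    def allowed_brands(self, agent_id: int) -> FrozenSet[int]:
        now = self._clock()
        entry = self._entries.get(agent_id)
        if entry is not None and now < entry[0]:
            return entry[1]      # still fresh
        brands = self._loader(agent_id)
        self._entries[agent_id] = (now + self._ttl, brands)
        return brands

    def is_allowed(self, agent_id: int, brand_id: int) -> bool:
        return brand_id in self.allowed_brands(agent_id)

    def invalidate(self, agent_id: Optional[int] = None) -> None:
        """Called when an AGENT_BRAND_CHANGED message arrives."""
        if agent_id is None:
            self._entries.clear()
        else:
            self._entries.pop(agent_id, None)
```

A process that misses the invalidation message keeps serving the stale entry until the TTL expires, which matches the fallback behavior described above.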

Wallet

  • Every wallet-owned aggregate carries brand_id: wallet_account, wallet_bucket, wallet_bucket_type, wallet_coupon_grant, wallet_bet_authorization, wallet_ledger, wallet_transfer, wallet_idempotency, wallet outbox, wallet_inbox, wallet_dead_letter.
  • wallet_topology and wallet_policy documents are brand-scoped: uniqueness becomes (brand_id, code, version) for wallet_topology and (brand_id, topology_code, version, policy_key) for wallet_policy. The existing partial unique active indexes (uq_wallet_topology_single_active and uq_wallet_policy_active_key, both WHERE status = 'ACTIVE') are rebuilt to include brand_id so every brand may have its own active topology and active policy simultaneously without colliding on the global "exactly one ACTIVE" constraint.
  • wallet_bucket_type uniqueness becomes (brand_id, topology_code, topology_version, code) so two brands may legitimately share the same topology_code (e.g. both inherit RUBY_SPLIT_V1) without bucket-type collisions.
  • Active topology and policy resolution are per brand. Topology activation safety checks (per ADR-005: blocked when unresolved balances, coupon grants, rollings, unsettled bets, or transfers would become unreachable) are scoped to the activating brand only; cross-brand state never blocks activation.
  • wallet_idempotency uniqueness becomes (brand_id, idempotency_key) so the same key may legitimately appear in two brands; two brands' players sharing the same account string can independently retry without a phantom-duplicate rejection.
  • wallet_bet_authorization provider-bet uniqueness becomes (brand_id, provider_type, provider_id, bet_id) because the same provider may legitimately reuse a bet_id across two brands when outbound calls are namespaced.
  • wallet_service checks every command's target row brand against the request brand before any money mutation. Behavior depends on MULTI_BRAND_ENFORCEMENT (the same flag as gateway): in observe, mismatches log and increment wallet_cross_brand_rejected_total{command,mode="observe"} and the command is allowed to proceed using the request brand as the authoritative scope (so a missing or wrong X-Brand-Id cannot cause a money write to land in the wrong brand); in enforce, mismatches hard-reject. Money mutations never use a brand inferred from the target row alone.
  • ADR-005's "single money writer" rule is unchanged.
  • Wallet outbox events carry brand_id as a field on the shared DomainEvent envelope (rgb_contracts/events/base.py), not on per-payload schemas. Producers and consumers read/write it from the envelope; per-payload schemas are not modified.
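
The cross-brand command check can be sketched as a pure function that compares the request brand with the target row brand. This is a minimal illustration, not wallet_service code: the names are hypothetical, logging and the wallet_cross_brand_rejected_total increment are left to the caller, and the request brand is assumed to be present.

```python
from dataclasses import dataclass


class CrossBrandRejected(Exception):
    """Raised in enforce mode when the target row belongs to another brand."""


@dataclass(frozen=True)
class BrandCheckResult:
    effective_brand_id: int  # always the request brand, never the row brand
    mismatch: bool           # caller logs and counts when this is True


def check_command_brand(request_brand_id: int, target_row_brand_id: int, mode: str) -> BrandCheckResult:
    """Apply the MULTI_BRAND_ENFORCEMENT rule to one wallet command."""
    mismatch = request_brand_id != target_row_brand_id
    if mismatch and mode == "enforce":
        raise CrossBrandRejected(
            f"request brand {request_brand_id} != target brand {target_row_brand_id}"
        )
    # In observe the command proceeds, scoped to the request brand.
    return BrandCheckResult(effective_brand_id=request_brand_id, mismatch=mismatch)
```

Returning the request brand as the effective scope, even on mismatch, reflects the rule that money mutations never use a brand inferred from the target row alone.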

Rolling

  • Rolling records, rolling inbox/outbox, and completion/cancel retry state carry brand_id.
  • The rolling event consumer reads brand_id from inbound payloads and persists it.
  • Per-brand rolling completion ratios resolve from brand_config.

Promotion

  • Coupon usage rows, promotion configs, settlement projections, and promotion_coupon_saga rows carry brand_id.
  • Coupon definitions, event configs, and rebate/cashback/lossback rates are per-brand.
  • Settlement schedulers iterate brand by brand in brand_id ascending order, sequentially (one brand at a time). Cross-brand aggregation in settlement is forbidden. Parallel-per-brand execution may be introduced later as an explicit follow-up; the day-0 contract is sequential to keep timing deterministic across environments.

Game integration

  • Game provider credentials stay global; one set per provider serves every brand.
  • Outbound calls to providers use {brand_code}_{account} as the default account namespace. The separator character _ is chosen because it is accepted by every currently integrated provider (HO, MG, WC, BTI, BT1, SPLUS, Digitain, Integration). Providers that constrain account format below this default fall back to a documented per-provider equivalent recorded in docs/services/game-service.md; no per-provider equivalent is introduced silently.
  • The reverse-parse algorithm is brand_code-prefix lookup against the brand table (longest matching prefix wins). Splitting on the first _ would be ambiguous because legacy player account strings may already contain _. The lookup is cached per process for performance.
  • For every supported provider, an outbound-account length audit is performed before Phase 9: if len(brand_code) + 1 + MAX_PLAYER_ACCOUNT_LEN (worst case 16 + 1 + 32 = 49) exceeds the provider's account-field cap, the per-provider equivalent format is documented and used (for example, a shorter separator or a hash projection). The audit and per-provider fallbacks are committed to the game-service profile before code lands.
  • Inbound provider callbacks reverse-parse the namespaced account to recover brand_id and player account. A callback whose namespaced account does not begin with a known brand_code prefix is rejected and increments game_callback_brand_unresolved_total{provider}.
  • Provider callback transaction state and provider-specific records (e.g. bti_*, ho_*, mg_*, wc_*, digitain_*) carry brand_id.
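
The default namespacing and the prefix-lookup reverse parse can be sketched as follows. The function names are hypothetical, the brand codes are passed in as a plain iterable instead of the cached brand table, and per-provider fallback formats are not covered.

```python
from typing import Iterable, Optional, Tuple

SEPARATOR = "_"


def namespaced_account(brand_code: str, account: str) -> str:
    """Build the default outbound account name {brand_code}_{account}."""
    return f"{brand_code}{SEPARATOR}{account}"


def reverse_parse(outbound: str, brand_codes: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Recover (brand_code, account) by known-prefix lookup.

    Longer codes are tried first, so the longest matching prefix wins.
    Returns None when no known brand_code prefix matches; the caller rejects
    the callback and increments game_callback_brand_unresolved_total.
    """
    for code in sorted(brand_codes, key=len, reverse=True):
        prefix = code + SEPARATOR
        if outbound.startswith(prefix) and len(outbound) > len(prefix):
            return code, outbound[len(prefix):]
    return None
```

An account that already contains the separator survives the round trip: reverse_parse(namespaced_account("ruby", "john_doe"), ["ruby", "default"]) yields ("ruby", "john_doe"), while a raw legacy account such as "john_doe" resolves to None because "john" is not a known brand_code.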

Reconciliation

  • The shooter_* table family is split into operator-infrastructure rows (no brand_id) and brand-scoped event rows:
    • Operator-global (no brand_id): shooter_device (physical phone-receiver hardware bindings), shooter_pushbullet (Pushbullet API token bindings), shooter_phone (phone whitelist), shooter_template_recharge (regex parsing rules). These are operator-owned infrastructure and are not duplicated per brand.
    • Brand-scoped (carries brand_id): shooter_sms (inbound SMS rows; brand_id is populated when the SMS is matched against a player_deposit row, not parsed from message text), and any recon review-state rows that reference a specific player or deposit.
  • Recon match decisions resolve brand_id from the matched deposit record (player_deposit.brand_id).
  • Approvals continue to flow through wallet_service; cross-brand rejection in wallet_service blocks any approval that would touch a different brand's deposit.

Admin surfaces

  • admin_service exposes brand catalog CRUD: create, enable, disable, edit metadata. brand_code is immutable after creation.
  • admin_service exposes brand_config CRUD with audited writes (timestamp, payload, prior value).
  • admin_service exposes agent_brand CRUD with audited writes.
  • Per-brand topology and policy writes flow through wallet_service CRUD with brand pinned in the request.
  • The legacy_admin_v2 and legacy_auth staff routes and the supporting staff identity logic are removed. Any route that survived the staff removal must keep its status/msg/data envelope shape.

Staged enforcement

  • Every brand-scoped enforcement point (gateway JWT/domain mismatch, wallet_service cross-brand command rejection, agent_service brand allow-list rejection, internal-handler X-Brand-Id requirement) is gated by a single environment-variable flag, MULTI_BRAND_ENFORCEMENT, with three modes:
    • off: resolve and forward brand context, but never reject.
    • observe (default): check, log, and increment brand_resolution_failed_total{reason,service} on mismatch; do not reject.
    • enforce: hard-reject on mismatch with the documented error envelope.
  • The flag is read by gateway, wallet_service, agent_service, and every internal handler that adds brand-scoped enforcement.
  • The flag is set as a process env var. Mode change requires a service restart. The current mode is included in each service's /health response payload.
  • The "request brand wins" behavior in observe applies symmetrically to read paths and write paths: a read query whose request brand differs from the JWT brand still uses the request brand for its WHERE brand_id = ? filter. This prevents a stale or wrong-brand JWT from disclosing another brand's data during the soak window; enforce mode then rejects the same scenario hard.
  • A second brand cannot be enabled in any environment until that environment is in enforce mode.
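
The flag semantics can be sketched as a small enum plus helpers. This is an illustration under assumptions: the enum, the helper names, and raising on an unrecognized value are choices of this sketch; the default mode, the gauge values (0=off, 1=observe, 2=enforce), and the second-brand rule come from this document.

```python
import os
from enum import Enum
from typing import Mapping


class EnforcementMode(Enum):
    OFF = "off"
    OBSERVE = "observe"
    ENFORCE = "enforce"


# Values reported by the multi_brand_enforcement_mode{service} gauge.
GAUGE_VALUE = {
    EnforcementMode.OFF: 0,
    EnforcementMode.OBSERVE: 1,
    EnforcementMode.ENFORCE: 2,
}


def read_mode(env: Mapping[str, str] = os.environ) -> EnforcementMode:
    """Read MULTI_BRAND_ENFORCEMENT once at process start; default is observe."""
    raw = env.get("MULTI_BRAND_ENFORCEMENT", "observe").strip().lower()
    return EnforcementMode(raw)  # ValueError on an unrecognized value


def should_reject(mode: EnforcementMode, mismatch: bool) -> bool:
    """Only enforce mode hard-rejects a mismatch."""
    return mismatch and mode is EnforcementMode.ENFORCE


def may_enable_second_brand(mode: EnforcementMode) -> bool:
    """A second brand requires the environment to be in enforce mode."""
    return mode is EnforcementMode.ENFORCE
```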

Authentication and integrity

Per ADR-009 "Authentication and integrity posture":

  • admin_service runs behind VPN ingress with an IP allow-list. Every brand, brand_config, and agent_brand write requires an X-Operator-Id header injected by the SSO LB; admin rejects writes without this header. Audit rows include operator_id, request IP, request_id, and timestamp.
  • Internal-service authentication uses per-caller-service tokens (INTERNAL_SERVICE_TOKEN_GATEWAY, INTERNAL_SERVICE_TOKEN_AGENT, etc.). A consumer service knows the set of caller tokens it accepts. A compromise of one caller's token cannot impersonate another.
  • Brand-scoped wallet write commands carry X-Brand-Signature: HMAC_SHA256(BRAND_SIGNING_KEY, caller_service|brand_id|request_id|timestamp). wallet_service rejects missing or invalid signatures in enforce mode (logged in observe).
  • JWT signing uses RS256. Private key is held by player_service and agent_service; other services verify with the public key. JWT header carries kid; verifier rejects unknown kids and rejects alg=none and HS-family algorithms unconditionally. Key rotation procedure is documented in the runbook.
  • brand_code charset ^[a-z][a-z0-9]{1,15}$ PLUS prefix-disjointness: admin_service rejects any new brand_code that is a prefix of an existing brand_code, has an existing brand_code as a prefix, or collides with the prefix of any existing player.account value that has been namespaced and sent to a game provider.
  • The reverse-parse cache in game_service invalidates on the BRAND_CATALOG_CHANGED Redis pub/sub channel published by admin_service after every brand create/disable; processes that miss the message refresh on a TTL of 60 seconds. The same contract applies to AGENT_BRAND_CHANGED, consumed by agent_service. Redis pub/sub is best-effort: if Redis is down or a subscriber is briefly partitioned, invalidation messages are lost and correctness reverts to the 60-second TTL fallback. This fallback is documented in the runbook's diagnosis playbook for stale-cache scenarios.
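
The X-Brand-Signature computation can be sketched with the standard hmac module. The message layout (fields joined by |) follows the rule above; the hex encoding of the digest, the string form of the timestamp, and the function names are assumptions of this sketch, and replay protection is not shown.

```python
import hashlib
import hmac


def brand_signature(key: bytes, caller_service: str, brand_id: int, request_id: str, timestamp: str) -> str:
    """Compute HMAC-SHA256 over caller_service|brand_id|request_id|timestamp."""
    message = "|".join([caller_service, str(brand_id), request_id, timestamp])
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_brand_signature(
    key: bytes,
    presented: str,
    caller_service: str,
    brand_id: int,
    request_id: str,
    timestamp: str,
) -> bool:
    """Constant-time comparison of the presented and expected signatures."""
    expected = brand_signature(key, caller_service, brand_id, request_id, timestamp)
    return hmac.compare_digest(expected, presented)
```

Because brand_id is part of the signed message, a signature issued for one brand does not verify if the header is replayed with a different X-Brand-Id.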

Recovery flow brand precedence

  • Recovery tokens issued after Phase 5 carry brand_id as a signed claim. Recovery flow precedence: token-embedded brand_id wins; if the token lacks brand_id (issued before Phase 5), fall back to the request domain's brand. If the resolved brand does not match the underlying player's brand_id, the recovery request is rejected hard regardless of MULTI_BRAND_ENFORCEMENT mode (recovery is too sensitive for observe behavior).
  • Recovery responses must not leak the existence of an account in another brand. Same email or phone in two brands recovers each independently with brand-scoped tokens.
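
The recovery precedence rule can be sketched as a single function. The names are hypothetical; token verification and domain resolution are assumed to have happened already, so the inputs are plain brand ids.

```python
from typing import Optional


class RecoveryBrandMismatch(Exception):
    """Raised regardless of MULTI_BRAND_ENFORCEMENT mode."""


def resolve_recovery_brand(
    token_brand_id: Optional[int],
    domain_brand_id: int,
    player_brand_id: int,
) -> int:
    """Pick the brand for a recovery request.

    The token-embedded brand wins; a token without brand_id (issued before
    Phase 5) falls back to the request domain's brand. The result must match
    the player's own brand or the request is rejected.
    """
    resolved = token_brand_id if token_brand_id is not None else domain_brand_id
    if resolved != player_brand_id:
        raise RecoveryBrandMismatch("recovery_brand_mismatch")
    return resolved
```

The error carries no detail about other brands, in line with the rule that recovery responses must not reveal that an account exists elsewhere.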

Idempotency cross-brand collision detection

  • The wallet_idempotency constraint is (brand_id, idempotency_key). An inbound idempotency key that matches an existing key in a different brand increments wallet_idempotency_brand_split_total. Phase 16 release gate requires this counter to be zero across the soak window.

Configuration consolidation

  • The hard-coded project integer in admin_service/app/tasks/tag.py is removed; behavior previously gated by project resolves from brand_config.
  • Any global config constant or global_var row that is now duplicated by per-brand brand_config entries (rolling ratios, cashback / rebate / lossback rates, payment channel selection, withdrawal min/max, risk thresholds) is either deleted or explicitly demoted to a "documented global default" used only when brand_config has no brand-specific override. The deletion list is enumerated in plan Phase 12.
  • Staff-related globals that become dead code with the removal of the seven admin_service legacy route files (legacy admin auth secret env vars, legacy session config, supporting helper modules) are removed at the same time. Any unused entries in servers_v2/.env.compose.example are removed.

Migration

  • A default brand is created at migration time. All existing rows backfill into this brand.
  • Every existing agent row gets an agent_brand row of the form (agent_id, default_brand_id, status='enabled') seeded at migration time so existing agents retain access through Phase 11 allow-list enforcement and Phase 16 hard-flip.
  • Backfill is reversible in non-production environments by dropping the added columns and constraints.
  • Domain-to-brand bindings for the default brand are seeded at migration time so the existing single-brand environment continues to resolve.
  • Production cutover does not enable a second brand until the runbook's release gate is satisfied.

Observability

Required logs:

  • structured log fields include brand_id and brand_code
  • edge logs include the resolved domain, the resolved brand, and whether brand resolution succeeded or failed
  • wallet command logs include the request brand and the target row brand; cross-brand rejections log both
  • game callback logs include the parsed brand_code, player_account, and the raw outbound account string

Required metrics:

  • player-, wallet-, rolling-, promotion-, and game-scoped counters and histograms carry a brand label whose value is brand_code
  • a brand_resolution_failed_total{reason,service} counter exists at the edge AND on every internal handler that performs brand-scoped checks, with reason constrained to a documented enum (jwt_domain_mismatch, missing_header, unknown_domain, jwt_missing_brand, player_brand_mismatch, agent_brand_not_allowed, recovery_brand_mismatch)
  • a wallet_cross_brand_rejected_total{command,mode} counter exists in wallet_service
  • a game_callback_brand_unresolved_total{provider} counter exists in game_service
  • a wallet_idempotency_brand_split_total counter exists in wallet_service (inbound key matches existing key in different brand)
  • a multi_brand_enforcement_mode{service} gauge exists in every service that reads the flag (value: 0=off, 1=observe, 2=enforce)
  • a brand_resolution_latency_seconds{service} histogram covers the Redis lookup cost on the edge brand-resolution hot path
  • a request_total{brand_code,service} counter (per-brand legitimate traffic) so a brand getting zero traffic is detectable
  • an event_legacy_schema_total{stream,consumer} counter increments when a consumer accepts a schema_version=1 event during the soak
  • a security_downgrade_total{service} counter increments when a service starts in MULTI_BRAND_ENFORCEMENT != 'enforce' while more than one enabled brand exists
  • per-service *_cross_brand_rejected_total{command,mode} counters exist in agent_service, game_service, promotion_service, recon_service, and rolling_service -- mirroring the wallet variant so each owner surface can alarm independently
  • a wallet_brand_signature_failed_total{caller_service,reason,mode} counter exists in wallet_service covering the missing_headers:*, invalid_timestamp, signature_mismatch, and signature_replay reasons; signature_replay is also surfaced as wallet_brand_signature_replay_blocked_total for dashboarding convenience
  • a wallet_brand_signature_missing_total{caller_service} counter exists in wallet_service so operators can see which callers still need to migrate before flipping WALLET_BRAND_SIGNATURE_REQUIRE=on
  • a wallet_brand_signature_misconfigured_total{mode} counter exists in wallet_service (P1-4) for the runtime fail-closed branch when BRAND_SIGNING_KEY is empty
  • a wallet_brand_signature_replay_redis_outage_total{caller_service,mode} counter exists in wallet_service (P2-δ) when the replay-cache Redis SETNX raises
  • a wallet_topology_default_brand_unprimed_total counter exists in wallet_service (P2-δ) when a topology / policy lookup arrives without a brand id and the default-brand cache is unprimed
  • a wallet_topology_default_brand_fallback_total{caller_origin} counter exists in wallet_service for primed-cache default-brand fallbacks (per-call-site breakdown for brand-onboarding)
  • a game_callback_verification_bypassed_total{provider} counter exists in game_service (T1-D-C2) for any callback that reached the VERIFY_CALLBACKS=False bypass branch -- MUST be zero in production
  • a game_callback_brand_unresolved_total{provider,reason?} counter exists in game_service; reason="raw_account_in_enforce" and reason="raw_guid_in_enforce" (P1-3) are hard-blocking on the Phase 16 flip
  • an internal_caller_token_legacy_total{consumer,caller} counter exists in every internal-token-accepting service for the per-caller rollout (Stage A→B→C signal)
  • an internal_caller_token_legacy_rejected_total{consumer,caller} counter exists in every internal-token-accepting service (T4-D-I2) to record requests rejected once PER_CALLER_TOKEN_REQUIRED=on
  • a gateway_session_legacy_key_used_total{reason} counter exists in gateway (T1-D-C1) for the Phase 4E session-key cutover signal, with reason ∈ {missing_brand, fallback_after_miss}
  • a gateway_jwt_session_missing_total counter exists in gateway (T1-D-C1) for "JWT-valid-but-session-gone" requests
  • a gateway_jwt_unknown_kid_total counter exists in gateway (T4-D-I4) for JWT verifier rejections by unknown kid; spikes during RS256 kid rotation indicate stale verifier caches
  • an agent_brand_not_allowed_total{reason,service} counter exists in agent_service for allow-list rejections (the spec-canonical brand_resolution_failed_total{reason="agent_brand_not_allowed"} is also emitted alongside it for cross-service dashboards)
  • an admin_operator_id_mismatch_total{reason} counter exists in admin_service (P1-3) for the JWT-vs-header cross-check rejections
  • an admin_audit_write_failed_total{route} counter exists in admin_service (P1-4) for audit-row INSERT failures observed AFTER the money write committed
  • an admin_ws_legacy_query_token_total counter exists in admin_service (T4-D-I7) for WebSocket auth via the legacy ?token= query string fallback
  • an admin_ws_rate_limited_total{reason} counter exists in admin_service (T4-D-I7) for per-IP rate-limit rejections, with reason ∈ {ip, redis_outage}
  • an agent_balance_legacy_write_total{change_type} counter exists in admin_service (T4-D-I3) for the legacy direct-write surface; goal is zero post-migration
  • a pii_aes_unconfigured_total{service,op} counter exists in player_service and agent_service (T1-D-C3) when the AES PII helpers hit the non-production fallback; MUST be zero in production
  • a plisio_callback_replay_blocked_total{status} counter exists in wallet_service (T1-D-C5/I6) for Plisio webhook replays caught by the SETNX guard
  • a plisio_callback_replay_redis_outage_total{env} counter exists in wallet_service (T1-D-C5/I6) when the Plisio replay-cache SETNX raises
  • a recovery_sms_delivery_failed_total{service} counter exists in player_service and agent_service (Codex P1-#6) when the SMS branch of /recovery/request could not deliver
  • a recovery_email_unprovisioned_total{service} counter exists in player_service and agent_service (Codex P1-#6) when the email recovery branch is selected but no transactional provider is wired
  • a recovery_rate_limit_redis_outage_total{bucket} counter exists in player_service and agent_service (T4-D-I5) for fail-closed per-contact rate-limit rejections caused by Redis exceptions in production
  • all metric labels are bounded enums; no attacker-controlled value (host, account, raw IP, brand_code beyond the 16-char regex bound) appears as a label
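
One way to keep labels bounded is to pass every value through an allow-list before it reaches the metrics client. The sketch below is illustrative only. The helper name `bounded_label`, the `other` overflow bucket, and the exact `brand_code` pattern are assumptions; this spec states only that `brand_code` has a 16-char regex bound. The two `reason` enums are copied from the counters listed above.

```python
import re

# Bounded enums taken from the counter definitions in this section.
ALLOWED = {
    ("gateway_session_legacy_key_used_total", "reason"): {
        "missing_brand",
        "fallback_after_miss",
    },
    ("admin_ws_rate_limited_total", "reason"): {"ip", "redis_outage"},
}

# Assumed pattern: the spec gives only the 16-char bound, not the character class.
BRAND_CODE_RE = re.compile(r"[a-z0-9_]{1,16}")

# Hypothetical overflow bucket so unexpected values never become new series.
OTHER = "other"


def bounded_label(metric: str, label: str, value: str) -> str:
    """Return `value` only if it belongs to the bounded set for this label."""
    if label == "brand_code":
        return value if BRAND_CODE_RE.fullmatch(value) else OTHER
    allowed = ALLOWED.get((metric, label))
    if allowed is None:
        # Labels without a declared enum are never passed through verbatim.
        return OTHER
    return value if value in allowed else OTHER
```

Collapsing unknown values into a single bucket keeps series cardinality fixed even when a caller passes a raw IP or host by mistake.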

Required alerts (with explicit thresholds):

  • brand_resolution_failed_total{reason="jwt_domain_mismatch"} rate above 0.1/s sustained over 5 minutes (page on-call). Below threshold is expected baseline noise from scanners and stale clients.
  • brand_resolution_failed_total{reason="missing_header"} non-zero rate sustained over 5 minutes during enforce (page; in observe it is informational only).
  • wallet_cross_brand_rejected_total{mode="enforce"} non-zero rate sustained over 1 minute (page; potential cross-brand bug or caller-token compromise).
  • wallet_cross_brand_rejected_total{mode="observe"} non-zero rate during the Phase-16 soak window (warning only; gates the flip but does not page outside the soak window).
  • game_callback_brand_unresolved_total non-zero rate sustained over 1 minute (page; real money is stuck on the provider side and callbacks are entering the dead-letter queue).
  • wallet_idempotency_brand_split_total non-zero rate (warning; indicates client misrouting or replay).
  • security_downgrade_total non-zero in production (page immediately; a service downgraded enforcement mode after enforcement was live).
  • per-brand wallet outbox publication freshness alert (existing wallet outbox alert, scoped per brand label).
  • per-brand request_total rate dropping to zero for a previously active brand sustained over 15 minutes (warning; brand outage or domain misconfiguration).
  • silence procedure: the deploy lead may silence alerts for a documented rollout window; each silence must be acknowledged in #incidents.
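
To make "rate above 0.1/s sustained over 5 minutes" concrete, the following sketch evaluates a series of counter samples. It is a simplified model, not the alerting engine's actual evaluation: the function name, the sample format, and the rule that every interval in the window must exceed the threshold are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (unix timestamp in seconds, cumulative counter value)
Sample = Tuple[float, float]


def sustained_rate_above(
    samples: Sequence[Sample], threshold_per_s: float, window_s: float
) -> bool:
    """True if every per-interval rate across the trailing window exceeds the threshold."""
    if len(samples) < 2:
        return False
    end_ts = samples[-1][0]
    start_ts = end_ts - window_s
    if samples[0][0] > start_ts:
        # Not enough history to cover the whole window.
        return False
    recent = [s for s in samples if s[0] >= start_ts]
    rates: List[float] = []
    for (t0, v0), (t1, v1) in zip(recent, recent[1:]):
        if t1 <= t0 or v1 < v0:
            # Out-of-order samples or a counter reset: do not fire on bad data.
            return False
        rates.append((v1 - v0) / (t1 - t0))
    return bool(rates) and all(r > threshold_per_s for r in rates)
```

With one sample per minute, a counter that grows by 12 per sample gives 0.2/s in every interval and satisfies the jwt_domain_mismatch paging condition; a counter that grows by 1 per sample stays below it.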

Rollout Notes

  • Migrations run before any default brand backfill. Backfill runs before any service starts requiring X-Brand-Id.
  • gateway enforcement of JWT/domain brand mismatch is enabled only after every downstream service has been deployed with brand-aware filtering.
  • A second brand is enabled only after the runbook release gate passes.
  • The hard-coded project integer is removed only after brand_config has the equivalent values for the default brand.
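
The ordering constraints above can be read as a small prerequisite graph. The sketch below encodes them so a deploy script could refuse an out-of-order step. The step names are hypothetical labels for the four bullets, not identifiers from the runbook.

```python
from typing import Dict, Set

# Direct prerequisites per step, derived from the Rollout Notes bullets.
# Step names are illustrative, not taken from the runbook.
PREREQUISITES: Dict[str, Set[str]] = {
    "backfill_default_brand": {"run_migrations"},
    "require_x_brand_id": {"backfill_default_brand"},
    "enable_gateway_mismatch_enforcement": {"deploy_brand_aware_downstreams"},
    "enable_second_brand": {"runbook_release_gate_passed"},
    "remove_hardcoded_project_integer": {"default_brand_config_populated"},
}


def missing_prerequisites(step: str, done: Set[str]) -> Set[str]:
    """Return every prerequisite of `step`, direct or transitive, not yet done."""
    missing: Set[str] = set()
    stack = list(PREREQUISITES.get(step, ()))
    while stack:
        dep = stack.pop()
        if dep in done or dep in missing:
            # A completed step is assumed to have had its own prerequisites met.
            continue
        missing.add(dep)
        stack.extend(PREREQUISITES.get(dep, ()))
    return missing
```

A step is safe to run when `missing_prerequisites(step, done)` returns an empty set.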

Acceptance Criteria

  • A migration on a fresh database creates brand, brand_config, agent_brand, and adds brand_id to every brand-scoped table with brand-scoped uniqueness.
  • Backfill on an existing database moves every existing row into a default brand without data loss; reversibility is verified in a non-production environment.
  • Two brands can be configured (e.g. default and brand2) with distinct domains. A registration on brand2's domain creates a player row in brand2. The same account string may also register on the default domain and create an independent player row in default.
  • A login on one brand's domain whose JWT was issued for the other brand is rejected at gateway.
  • A wallet command targeting a player whose brand differs from the request brand is rejected before any money mutation.
  • Wallet topology and policy can be configured independently per brand; active topology resolution returns the per-brand active document.
  • A bet authorized through game_service for a player in brand2 calls the provider with the brand-namespaced account; the provider callback is reverse-parsed back to brand2 and the matching player.
  • Per-brand rolling completion ratios, cashback/rebate/lossback rates, and payment channel selection resolve from brand_config.
  • All seven listed admin_service legacy route files and the supporting staff identity logic are deleted; previously served paths return 404; the test suite reflects the removal.
  • Structured logs and Prometheus metrics carry brand labels as specified.
  • ADR-005's "single money writer" rule is unchanged after this change.
  • Player registration rejects any account longer than MAX_PLAYER_ACCOUNT_LEN = 32 chars with the documented validation error.
  • MULTI_BRAND_ENFORCEMENT=enforce is exercised in at least one shared environment after the soak window; gateway, wallet_service, agent_service, and the brand-aware internal handlers all report enforce mode in /health.
  • An agent_brand row of the form (agent_id, default_brand_id, status='enabled') exists for every pre-migration agent row after 0023_seed_default_brand.py runs. The auto-seed trigger (0023b_install_agent_brand_autoseed_trigger.py) covers any new agent rows created during the Phase 2-12 window; the trigger is removed by 0028_remove_agent_brand_autoseed_trigger.py once admin_service writes agent_brand explicitly at agent creation in Phase 12.
  • Settlement schedulers iterate brands in brand_id ascending order, sequentially; observable in worker logs.
  • External-facing gateway and agent_service strip any inbound X-Brand-Id header before injecting the resolved value; verified by a synthetic header-spoofing test.
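
The strip-then-inject rule in the last criterion can be sketched as a pure function over header pairs. This is a minimal illustration: the function name and the list-of-tuples header representation are assumptions, and a real gateway would apply the same logic inside its own middleware.

```python
from typing import Iterable, List, Tuple

Header = Tuple[str, str]
BRAND_HEADER = "X-Brand-Id"


def inject_resolved_brand(
    headers: Iterable[Header], resolved_brand_id: int
) -> List[Header]:
    """Drop every inbound X-Brand-Id (any casing), then append the resolved value."""
    kept = [(k, v) for k, v in headers if k.lower() != BRAND_HEADER.lower()]
    kept.append((BRAND_HEADER, str(resolved_brand_id)))
    return kept
```

Comparing names case-insensitively and removing all occurrences matters because HTTP header names are case-insensitive and a client may send the header more than once.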

Required Tests

  • migration tests for brand, brand_config, agent_brand, and brand_id column additions
  • backfill tests proving default brand assignment for every existing row family
  • gateway brand resolution tests for valid domain, unknown domain, and domain bound to another agent
  • gateway JWT brand validation tests covering: cross-brand JWT replay (agent A's token on a domain bound to brand B), cross-agent JWT replay within the same brand (agent A's token on a domain bound to agent B in the same brand), JWT missing brand_id claim, body/query attempts to override brand
  • gateway enforcement-mode tests: same scenarios under observe (logged + counted, not rejected) and under enforce (rejected with documented envelope)
  • player_service registration tests proving cross-brand account independence
  • agent_service allow-list rejection tests
  • wallet_service cross-brand command rejection tests for every command family (deposit approve, withdrawal create, bet authorize, settle, rollback, transfer, points credit, coupon grant, reversal, adjustment)
  • wallet_service per-brand topology and policy resolution tests
  • rolling_service brand-scoped consumption and progress tests
  • promotion_service per-brand coupon, rebate, cashback, and lossback resolution tests
  • game_service outbound account namespacing tests and inbound callback reverse-parse tests for every supported provider
  • recon_service brand-scoped match and approval tests
  • admin_service brand catalog, brand_config, and agent_brand CRUD tests
  • admin_service staff removal tests confirming all seven deleted route files no longer mount any router and that all previously served paths return 404 (legacy admin auth, legacy agents v2, legacy agent withdrawals v2, legacy meta v2, legacy recon compatibility, legacy web content)
  • observability tests confirming log fields and metric labels
  • local Docker end-to-end flows for two simultaneous brands covering: registration, login, deposit, bet, settlement, rolling, withdrawal, promotion settlement, coupon issue and use, cross-brand isolation
  • MAX_PLAYER_ACCOUNT_LEN validation tests at registration (boundary, oversize, exact-32-char accounts)
  • backfill audit test confirming no existing player.account exceeds 32 chars before the cap is enforced
  • migration test asserting an agent_brand row exists for every pre-migration agent after 0023 runs, AND that the 0023b autoseed trigger creates a row for any agent row inserted between 0023b (install) and 0028 (remove)
  • migration test asserting agent_setting, player_wallet_limit, daebak_email, mg_account, wc_account, digitain_token, i18n PK rewrites land cleanly with rows seeded pre-migration under both default and a second brand
  • gateway synthetic test: external request with inbound X-Brand-Id: 99 is stripped; downstream sees only the resolved brand
  • Redis cache key prefix tests confirming sms_captcha:{brand_code}:{phone} and login-throttle keys are brand-prefixed; same phone in two brands does not collide
  • per-provider outbound-account length audit results recorded as a test fixture in game-service tests
  • JWT signing tests: alg=none rejected, alg=HS256 rejected when RS256 is the configured algorithm, JWT with unknown kid rejected, JWT signed with old kid accepted during the documented rotation overlap, JWT signed with old kid rejected after overlap expires
  • admin_service operator-identity tests: a write without X-Operator-Id is rejected; the audit row for every write contains the operator identity
  • brand_code prefix-disjointness validator: brand-create with prefix collision against existing brand_code rejected; brand-create whose prefix matches an existing namespaced player.account rejected
  • internal-service-token-spoof test: caller using INTERNAL_SERVICE_TOKEN_GATEWAY cannot impersonate INTERNAL_SERVICE_TOKEN_AGENT against any consumer
  • brand-signature verification test: brand-scoped wallet write without X-Brand-Signature is rejected in enforce, logged in observe; signature with wrong BRAND_SIGNING_KEY is rejected in enforce
  • recovery flow: same email registered in two brands; recovery on brand A's domain returns a token bound to brand A only; recovery response on a non-existent account at brand A produces the same response shape as a known account (no existence oracle)
  • observe-mode read-path test: GET endpoint with brand-A JWT against brand-B domain returns either auth-error or empty data, never cross-brand data
  • event schema versioning: pre-Phase-6 event in stream is rejected by consumer in enforce mode and accepted as default brand in observe; event_legacy_schema_total increments accordingly
  • security_downgrade_total alert fires on a synthetic boot in observe while more than one brand is enabled
  • agent_brand cache invalidation: admin update to agent_brand publishes AGENT_BRAND_CHANGED; agent_service processes invalidate within 1 second; processes that miss the message refresh on TTL expiry within 60 seconds
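
As a starting point for the brand-signature verification test, the sketch below shows one plausible verifier that returns the failure reasons used by wallet_brand_signature_failed_total. The signing scheme is an assumption: this spec names X-Brand-Signature and BRAND_SIGNING_KEY but does not define the canonical string, the digest, or the freshness window, so HMAC-SHA256 over `brand_id.timestamp.body` and a 300-second skew are placeholders. An in-memory set stands in for the Redis SETNX replay cache.

```python
import hashlib
import hmac
from typing import Optional, Set

MAX_SKEW_S = 300  # assumed freshness window; not specified in this spec


def sign(key: bytes, brand_id: int, timestamp: int, body: bytes) -> str:
    # Assumed canonical string: "<brand_id>.<timestamp>." followed by the raw body.
    message = f"{brand_id}.{timestamp}.".encode() + body
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(
    key: bytes,
    brand_id: int,
    timestamp: Optional[int],
    body: bytes,
    signature: Optional[str],
    now: int,
    seen: Set[str],
) -> Optional[str]:
    """Return None on success, otherwise a failure reason string."""
    if not key:
        # Fail closed on an empty key; this maps to the separate misconfigured counter.
        return "misconfigured"
    if signature is None or timestamp is None:
        return "missing_headers"
    if abs(now - timestamp) > MAX_SKEW_S:
        return "invalid_timestamp"
    expected = sign(key, brand_id, timestamp, body)
    # Compare as bytes in constant time; bytes avoid errors on non-ASCII input.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return "signature_mismatch"
    if signature in seen:
        return "signature_replay"
    seen.add(signature)  # stands in for Redis SETNX with a TTL
    return None
```

Because the brand id is part of the signed message, a signature minted for one brand fails with signature_mismatch when presented for another. Whether a non-None reason rejects the request or is only logged depends on the enforcement mode, which this sketch leaves to the caller.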