Multi-Brand Isolation
Status
Approved
Date
2026-04-27
Owners
- Platform Backend
- Player Domain
- Wallet Domain
- Agent Domain
Affected Services
- gateway
- player_service
- wallet_service
- rolling_service
- promotion_service
- game_service
- agent_service
- admin_service
- recon_service
Related ADRs
- docs/adr/ADR-009-multi-brand-domain-routed-isolation.md
- docs/adr/ADR-005-wallet-topology-bucket-ledger-model.md
- docs/adr/ADR-001-document-driven-backend-change-workflow.md
Related Stable Docs
- docs/architecture/data-ownership.md
- docs/architecture/domain-ownership.md
- docs/architecture/system-overview.md
- docs/architecture/http-entrypoints.md
- docs/services/gateway.md
- docs/services/player-service.md
- docs/services/wallet-service.md
- docs/services/rolling-service.md
- docs/services/promotion-service.md
- docs/services/game-service.md
- docs/services/agent-service.md
- docs/services/admin-service.md
- docs/services/recon-service.md
Related Runbooks
docs/runbooks/multi-brand/multi-brand-isolation-rollout.md
Goal
Make servers_v2 capable of hosting multiple distinct player-facing brands on
the same runtime, with brand-scoped player identity, money state, rolling and
settlement, and configuration, while keeping a single PostgreSQL database, a
single Redis, and a single Docker stack.
A "brand" is a self-contained operating product: its own player base, its own wallet topology and policy, its own promotion rules, its own configuration, its own domain footprint. Brands are identified at the request edge from the request domain. After login, brand identity is also carried in the JWT.
Scope
In scope:
- A new brand aggregate and a new brand_config aggregate.
- A new agent_brand join aggregate (brand membership for an agent).
- Adding non-nullable brand_id to every brand-scoped table in servers_v2, with backfill into a single default brand.
- Brand-scoped uniqueness constraints on player, wallet_topology, wallet_policy, agent_setting, agent_domain, and any other table whose current uniqueness must change.
- Brand resolution in gateway from the request domain via the existing domain:agent:{host} and domain:level:{host} Redis maps, extended to carry brand_id.
- Forwarding brand context downstream as X-Brand-Id.
- Adding a brand_id claim to JWT and rejecting JWT/domain mismatches.
- Brand-scoped query filtering in every domain service.
- Wallet command rejection on cross-brand targets.
- Brand-scoped wallet topology and wallet policy resolution.
- Per-brand promotion configuration, coupon definitions, and rebate/cashback/lossback rates.
- Per-brand rolling completion ratios.
- Per-brand payment channel selection and per-brand operational config.
- Game provider account namespacing ({brand_code}_{account}) and reverse parsing on callback.
- Admin write surfaces for brand, brand_config, and agent_brand.
- Agent allow-list enforcement at the agent edge.
- Brand-aware structured logs and a brand label on player-, wallet-, rolling-, promotion-, and game-scoped Prometheus metrics.
- Brand-aware outbox and cross-service event payloads.
- Removal of the legacy_admin_v2 and legacy_auth staff routes and the supporting staff identity logic from admin_service.
Out of scope:
- Schema-per-brand or database-per-brand isolation.
- Per-brand game provider credentials.
- A new back-office staff identity model. Staff is removed in this change and is not replaced.
- Per-brand infrastructure (separate Redis, separate Postgres, separate Docker stack).
- Cross-brand player identity migration (e.g. promoting a player from one brand to another).
- Cross-brand wallet transfer.
- Per-brand SLA, billing, or quota management.
- Per-brand i18n and theming for any frontend that lives outside this repo (the brand_config slots are defined here; rendering is owned by the consuming frontend).
Background
servers_v2 was designed under a single-brand assumption. The only existing
tenant-shaped artifact is the hard-coded project integer in
admin_service/app/tasks/tag.py, which switches between two product labels
and is not a real isolation mechanism. The platform now needs to host more
than one brand at the same time on the same runtime.
gateway and player_service already carry a domain-routing mechanism:
gateway._extract_domain reads Host or Origin and injects a domain field
into the request body for legacy routes; player_service resolves
agent_id and level_id from domain:agent:{host} and domain:level:{host}
Redis maps. That mechanism is the natural extension point for brand routing
and is preserved.
A no-staff posture is intentional. Back-office staff identity, role/menu
management, and the legacy_admin_v2 and legacy_auth staff compatibility
paths are being retired together with this change.
Compatibility
Required to remain stable:
- The status/msg/data envelope on every external response is unchanged.
- Existing player route paths under /api/v1/* and the legacy root aliases in gateway are unchanged.
- Existing internal route paths under /internal/* are unchanged; only the required X-Brand-Id header is added.
- The Plisio public callback paths (/internal/wallet/plisio/callback and the legacy /player/deposit/plisio/callback) remain public; brand is resolved from the deposit record, not from the caller.
- The provider callback prefixes in game_service (/HO/*, /mg/*, /wc/*, /bti/*, /splus/*, /bt1/*, /digitain/*, /integration/*) are unchanged; brand is resolved from the account namespace embedded in the callback payload.
- The agent frontend route surface in agent_service is unchanged.
- The wallet topology contracts from ADR-005 are unchanged in shape; the only changes are brand scoping of uniqueness and resolution.
- Existing JWT consumers that only read player_id continue to work; they must additionally read brand_id after this change.
Required to break by design:
- The following admin_service route files are deleted: legacy_admin_v2.py, legacy_auth.py, legacy_agents_v2.py, legacy_agent_withdrawals_v2.py, legacy_meta_v2.py, legacy_recon.py, legacy_web_content.py. The supporting staff identity helpers are removed with them. The legacy admin paths under /api/admin/user/*, /api/admin/pushbullet/*, /api/admin/shooter/*, /api/admin/web/rules/*, /api/admin/web/faq/*, and /api/admin/web/config/* (and any other route served exclusively by the deleted files) are removed and will respond 404.
- Any code path that assumes player.account is globally unique is changed; uniqueness is now (brand_id, account).
- Any code path that assumes the active wallet topology is global is changed; topology and policy resolution is per brand.
- Outbound game provider account names change shape from account to {brand_code}_{account} (or the documented per-provider equivalent for providers that constrain account format). Provider-side reconciliation must be reviewed before cutover.
Ownership
| Aggregate | Owner | Notes |
|---|---|---|
| brand | admin_service | brand catalog |
| brand_config | admin_service | per-brand configuration values |
| agent_brand | written by admin_service, read by agent_service | agent-to-brand allow list |
| agent | agent_service | brand-global; no brand_id column |
| agent_setting, agent_domain | player_service (already today) | now scoped per (agent_id, brand_id) |
| player and player-owned tables | player_service | brand-scoped |
| wallet aggregates (wallet_account, wallet_bucket, wallet_coupon_grant, wallet_bet_authorization, wallet_ledger, transfers, idempotency rows, outbox) | wallet_service | brand-scoped; ADR-005 unchanged |
| wallet_topology, wallet_policy | wallet_service | brand-scoped uniqueness |
| rolling aggregates | rolling_service | brand-scoped |
| coupon, promotion, settlement, saga aggregates | promotion_service | brand-scoped |
| game provider state | game_service | brand-scoped rows; credentials brand-global |
| recon aggregates (shooter_*) | recon_service | brand-scoped |
| game provider credentials | game_service | brand-global |
Requirements
Brand entity
- A brand row has:
  - brand_id BIGINT autoincrement surrogate primary key (matches existing surrogate convention used by agent.guid and player.guid)
  - brand_code VARCHAR(16) constrained by CHECK (brand_code ~ '^[a-z][a-z0-9]{1,15}$') (lowercase alphanumeric, leading letter, 2-16 chars; safe for every supported game provider's account format and for use in Prometheus labels)
  - name VARCHAR(64)
  - default_currency CHAR(3) (ISO 4217)
  - status enum (enabled, disabled)
  - created_at, updated_at TIMESTAMP
- brand_code is unique and immutable after creation.
- brand is read by every domain service. It is written only through admin_service.
- The default brand seed has brand_code = 'default', name = 'Default Brand', status = enabled, default_currency matching the current single-brand environment's documented currency.
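The brand_code rules can be checked before a row ever reaches the database. The following is a minimal sketch, assuming a hypothetical pair of helpers (is_valid_brand_code and is_prefix_disjoint are illustrative names, not existing functions). It mirrors the CHECK regex quoted above and the brand-to-brand half of the prefix-disjointness rule stated under "Authentication and integrity"; the check against already-namespaced player.account values is not covered.

```python
import re
from typing import Iterable

# Same pattern as the CHECK constraint; fullmatch() is used so a trailing
# newline cannot slip past `$` the way it can with re.match().
BRAND_CODE_RE = re.compile(r"[a-z][a-z0-9]{1,15}")


def is_valid_brand_code(code: str) -> bool:
    """True when the code is 2-16 chars, lowercase alphanumeric, leading letter."""
    return BRAND_CODE_RE.fullmatch(code) is not None


def is_prefix_disjoint(candidate: str, existing: Iterable[str]) -> bool:
    """True when the candidate is neither a prefix of, nor prefixed by, any existing code.

    An exact duplicate also fails, which matches the uniqueness requirement.
    """
    return not any(
        candidate.startswith(code) or code.startswith(candidate)
        for code in existing
    )
```

With existing codes default and ruby, a candidate such as rubyx would be refused because ruby is a prefix of it, while emerald would pass both checks.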
Brand configuration
- A brand_config(brand_id, key, value) row stores per-brand override values. Values are JSON.
- Resolution order for any configurable value is: per-brand brand_config[brand_id][key], then a documented global default. The global default may live in code or in a documented global config table.
- The configurable keys include:
  - rolling completion ratios per provider type
  - cashback rate, rebate rate, and lossback rate
  - payment channel selection (Plisio enabled, manual bank channels)
  - withdrawal min/max amounts and fees
  - risk thresholds for deposit and withdrawal
  - i18n override token, theme token, customer-service link, email template overrides
- Values that must remain global (game provider credentials, infrastructure URLs, shared rate limits) are explicitly listed in the plan and must not be moved into brand_config.
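The two-step resolution order (per-brand override first, documented global default second) is small enough to sketch. This example assumes the brand_config rows have already been loaded into a dict keyed by (brand_id, key) with JSON text as the value; GLOBAL_DEFAULTS, its keys, and resolve_config are hypothetical names chosen for illustration.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical global defaults. The spec only says they may live in code
# or in a documented global config table.
GLOBAL_DEFAULTS: Dict[str, Any] = {"cashback_rate": 0.0, "withdrawal_min": 10}


def resolve_config(
    brand_rows: Dict[Tuple[int, str], str], brand_id: int, key: str
) -> Any:
    """Return the per-brand override if one exists, else the global default."""
    raw = brand_rows.get((brand_id, key))
    if raw is not None:
        # brand_config values are stored as JSON.
        return json.loads(raw)
    if key in GLOBAL_DEFAULTS:
        return GLOBAL_DEFAULTS[key]
    # No override and no documented default: surface the gap loudly.
    raise KeyError(f"no brand override or global default for {key!r}")
```

Raising on an unknown key is a design choice of this sketch; it keeps an undocumented key from silently resolving to nothing.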
Brand resolution at the edge
- gateway resolves brand_id for every external player request from the request domain via the existing _extract_domain helper plus the Redis maps domain:agent:{host} and domain:level:{host}. Both maps now resolve to a value carrying brand_id.
- _extract_domain precedence is Origin first (used when the player is on the canonical web origin), then Host header (used as fallback for callers that omit Origin). Both come from the same Redis map; Origin is trusted only because TLS termination enforces hostname authenticity at the load balancer (CORS does not validate brand binding). If a deployment ever exposes the gateway without TLS-terminating LB, the precedence must be reversed; this is documented in gateway's deployment notes.
- gateway MUST strip any inbound X-Brand-Id header from external requests before injecting the resolved value. agent_service MUST do the same on its agent-frontend edge. Internal callers may legitimately send X-Brand-Id to internal handlers (the header is trusted only on internal routes and only when paired with a valid X-Internal-Service-Token).
- A request whose domain does not resolve to a brand is rejected at the edge with a stable error envelope.
- The resolved brand_id is attached to request.state.brand_id and forwarded to all downstream services as the X-Brand-Id header.
- agent_service resolves brand from the agent frontend domain the same way and validates the resolved brand against the authenticated agent's agent_brand allow list. Brand resolution applies uniformly to the agent's primary /api/v1/agent/* routes and to its legacy aliases (e.g. /api/v1/user/login); the legacy alias surface is not exempt.
- admin_service is brand-global at the entry surface; per-brand operational routes accept an explicit brand_id parameter and forward it as X-Brand-Id on internal calls.
- game_service resolves brand_id from the namespaced account in the provider callback payload. A callback whose namespaced account cannot be reverse-parsed is rejected and logged with provider context but no brand assumption.
- The Plisio public callback resolves brand_id from the matching deposit record (which carries brand_id from creation time).
- recon_service does not have a public HTTP callback surface. Inbound reconciliation signals (SMS, pushbullet) reach recon_service through internal collection paths, and brand is resolved from the matched deposit record, not from message text. There is no edge brand resolution to perform on those signals.
- Internal handlers reject brand-scoped operations that arrive without a brand context.
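The edge rules above (Origin before Host, strip any inbound X-Brand-Id, reject unknown domains) can be sketched as follows. This is an illustration under stated assumptions, not the gateway's actual _extract_domain: header names are assumed to be lower-cased already, the Redis maps are replaced by a plain dict from host to brand_id, and extract_domain, resolve_edge_brand, and UnknownDomain are hypothetical names.

```python
from typing import Dict, Mapping, Optional, Tuple
from urllib.parse import urlsplit


class UnknownDomain(Exception):
    """Raised when the request domain does not map to any brand."""


def extract_domain(headers: Mapping[str, str]) -> Optional[str]:
    """Origin first, Host as fallback; returns a lower-cased host without port."""
    origin = headers.get("origin")
    if origin:
        host = urlsplit(origin).hostname  # None for opaque origins such as "null"
        if host:
            return host
    host_header = headers.get("host")
    if host_header:
        # Prefixing "//" makes urlsplit treat the value as a network location.
        return urlsplit("//" + host_header).hostname
    return None


def resolve_edge_brand(
    headers: Mapping[str, str], domain_to_brand: Mapping[str, int]
) -> Tuple[int, Dict[str, str]]:
    """Return (brand_id, headers to forward downstream)."""
    domain = extract_domain(headers)
    brand_id = domain_to_brand.get(domain) if domain else None
    if brand_id is None:
        raise UnknownDomain(domain)
    # Drop any client-supplied X-Brand-Id before injecting the resolved value.
    forwarded = {k: v for k, v in headers.items() if k.lower() != "x-brand-id"}
    forwarded["x-brand-id"] = str(brand_id)
    return brand_id, forwarded
```

A client that sends its own X-Brand-Id therefore has no influence on the forwarded value; only the domain lookup does.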
JWT and authorization
- JWT issued at login carries brand_id as a claim.
- gateway checks jwt.brand_id == request.state.brand_id on every authenticated request. Behavior depends on MULTI_BRAND_ENFORCEMENT (see Staged enforcement below): in observe, mismatches are logged and counted; in enforce, mismatches are rejected with the documented error envelope.
- A JWT that lacks brand_id is treated the same way (logged in observe, rejected in enforce); JWTs issued before brand-aware login goes live drain naturally over JWT_EXPIRE_MIN.
- Internal services trust the X-Brand-Id header attached by the edge or by another internal caller; they do not re-derive brand from JWT (which is verified at the edge and not always propagated downstream).
- Logout, refresh, and impersonation flows preserve brand_id.
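The gateway-side comparison and its mode-dependent outcome can be sketched like this. The function name check_jwt_brand and the tuple return shape are assumptions for illustration; the reason strings come from the documented brand_resolution_failed_total enum, and the claims are assumed to be already signature-verified.

```python
from typing import Mapping, Optional, Tuple


def check_jwt_brand(
    claims: Mapping[str, object], request_brand_id: int, mode: str
) -> Tuple[bool, Optional[str]]:
    """Return (allowed, reason). reason is None when the brands agree.

    In observe the caller logs and counts the reason but lets the request
    through; in enforce the caller rejects with the error envelope.
    """
    if mode not in ("off", "observe", "enforce"):
        raise ValueError(f"unknown enforcement mode: {mode!r}")
    if mode == "off":
        return True, None

    jwt_brand = claims.get("brand_id")
    if jwt_brand is None:
        reason = "jwt_missing_brand"
    elif jwt_brand != request_brand_id:
        reason = "jwt_domain_mismatch"
    else:
        return True, None
    return mode != "enforce", reason
```

A pre-migration token without the claim thus passes in observe with reason jwt_missing_brand and is refused once the environment moves to enforce.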
Player identity
- player uniqueness becomes (brand_id, account). The same account string may exist in two brands as two distinct player_id rows.
- A MAX_PLAYER_ACCOUNT_LEN = 32 constant is introduced (codified in rgb_contracts) and enforced at registration time. The DB column remains VARCHAR(255) for legacy compatibility; the runtime cap of 32 ensures the outbound game-provider account {brand_code}_{account} (worst case 16 + 1 + 32 = 49 chars) fits every currently integrated provider's account-field cap. A backfill audit confirms no existing player.account exceeds 32 chars before the cap is enforced; any non-conforming legacy rows are flagged for manual remediation.
- Registration resolves brand_id from the request domain and rejects any client-supplied brand override.
- Recovery flows resolve brand_id from the request domain or from the recovery token; cross-brand recovery is forbidden.
- Player-owned tables (player_deposit, player_withdraw, player_bank, message rows, common projections, player-owned outbox) carry brand_id.
Agent
- agent is brand-global. The agent table carries no brand_id.
- agent_brand is the only place where brand membership for an agent is recorded. Rows: (agent_id, brand_id, status, created_at).
- agent_setting becomes keyed by (agent_id, brand_id). A single agent may carry different registration modes per brand.
- agent_domain keeps its existing surrogate guid BIGINT primary key (preserving any FK references to agent_domain.guid); a composite UNIQUE(agent_id, brand_id, domain) constraint is added on top of the existing UNIQUE(domain) global constraint. Together they guarantee a domain value cannot exist twice under any (agent_id, brand_id) combination, and a domain belongs to exactly one brand and exactly one agent.
- agent_service rejects every authenticated request whose resolved brand is not in the agent's agent_brand allow list.
- agent_brand reads in agent_service are cached per process with a 60-second TTL AND invalidated by the AGENT_BRAND_CHANGED Redis pub/sub channel published by admin_service after every agent_brand write (insert, update, delete). Each agent_service process subscribes on startup; processes that miss a message refresh on TTL expiry.
- agent_brand revocation does NOT invalidate the active JWT (JWT brand validation is gateway-side). The next request after revocation is rejected by allow-list check; the existing session can continue to receive 4xx responses until the agent re-authenticates against a brand they still hold. JWT revocation is a documented follow-up (requires a JWT denylist or short-lived access tokens with refresh).
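The allow-list cache described above combines two freshness mechanisms: a 60-second TTL and explicit invalidation when an AGENT_BRAND_CHANGED message arrives. A minimal in-process sketch, assuming a hypothetical AgentBrandCache class and an injected loader that reads the enabled brand ids for one agent (the pub/sub subscriber itself is out of scope and would simply call invalidate):

```python
import time
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Tuple


class AgentBrandCache:
    """Per-process agent_brand allow-list cache with TTL and invalidation."""

    def __init__(
        self,
        loader: Callable[[int], Iterable[int]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        # agent_id -> (expiry time, allowed brand ids)
        self._entries: Dict[int, Tuple[float, FrozenSet[int]]] = {}

    def is_allowed(self, agent_id: int, brand_id: int) -> bool:
        now = self._clock()
        entry = self._entries.get(agent_id)
        if entry is None or entry[0] <= now:
            # Miss or expired: reload from the source of truth.
            entry = (now + self._ttl, frozenset(self._loader(agent_id)))
            self._entries[agent_id] = entry
        return brand_id in entry[1]

    def invalidate(self, agent_id: Optional[int] = None) -> None:
        """Called by the pub/sub subscriber; None clears every agent."""
        if agent_id is None:
            self._entries.clear()
        else:
            self._entries.pop(agent_id, None)
```

If a pub/sub message is lost, the worst case in this sketch is that a revoked brand stays allowed until the TTL expires, which matches the best-effort contract stated in the text.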
Wallet
- Every wallet-owned aggregate carries brand_id: wallet_account, wallet_bucket, wallet_bucket_type, wallet_coupon_grant, wallet_bet_authorization, wallet_ledger, wallet_transfer, wallet_idempotency, wallet outbox, wallet_inbox, wallet_dead_letter.
- wallet_topology and wallet_policy documents are brand-scoped: uniqueness becomes (brand_id, code, version) for wallet_topology and (brand_id, topology_code, version, policy_key) for wallet_policy. The existing partial unique active indexes (uq_wallet_topology_single_active and uq_wallet_policy_active_key, both WHERE status = 'ACTIVE') are rebuilt to include brand_id so every brand may have its own active topology and active policy simultaneously without colliding on the global "exactly one ACTIVE" constraint.
- wallet_bucket_type uniqueness becomes (brand_id, topology_code, topology_version, code) so two brands may legitimately share the same topology_code (e.g. both inherit RUBY_SPLIT_V1) without bucket-type collisions.
- Active topology and policy resolution are per brand. Topology activation safety checks (per ADR-005: blocked when unresolved balances, coupon grants, rollings, unsettled bets, or transfers would become unreachable) are scoped to the activating brand only; cross-brand state never blocks activation.
- wallet_idempotency uniqueness becomes (brand_id, idempotency_key) so the same key may legitimately appear in two brands; two brands' players sharing the same account string can independently retry without a phantom-duplicate rejection.
- wallet_bet_authorization provider-bet uniqueness becomes (brand_id, provider_type, provider_id, bet_id) because the same provider may legitimately reuse a bet_id across two brands when outbound calls are namespaced.
- wallet_service checks every command's target row brand against the request brand before any money mutation. Behavior depends on MULTI_BRAND_ENFORCEMENT (the same flag as gateway): in observe, mismatches log and increment wallet_cross_brand_rejected_total{command,mode="observe"} and the command is allowed to proceed using the request brand as the authoritative scope (so a missing or wrong X-Brand-Id cannot cause a money write to land in the wrong brand); in enforce, mismatches hard-reject. Money mutations never use a brand inferred from the target row alone.
- ADR-005's "single money writer" rule is unchanged.
- Wallet outbox events carry brand_id as a field on the shared DomainEvent envelope (rgb_contracts/events/base.py), not on per-payload schemas. Producers and consumers read/write it from the envelope; per-payload schemas are not modified.
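The "request brand wins" rule for wallet commands can be sketched as a single guard that runs before any money mutation. This is an illustration only: resolve_command_brand and CrossBrandRejected are hypothetical names, and a collections.Counter keyed by (command, mode) stands in for the wallet_cross_brand_rejected_total{command,mode} Prometheus counter.

```python
from collections import Counter
from typing import Tuple


class CrossBrandRejected(Exception):
    """Raised in enforce mode when the target row belongs to another brand."""


def resolve_command_brand(
    command: str,
    request_brand_id: int,
    target_brand_id: int,
    mode: str,
    rejected_total: "Counter[Tuple[str, str]]",
) -> int:
    """Return the brand scope the mutation must use, or raise in enforce."""
    if mode not in ("off", "observe", "enforce"):
        raise ValueError(f"unknown enforcement mode: {mode!r}")
    if mode != "off" and request_brand_id != target_brand_id:
        rejected_total[(command, mode)] += 1
        if mode == "enforce":
            raise CrossBrandRejected(
                f"{command}: request brand {request_brand_id} "
                f"!= target brand {target_brand_id}"
            )
    # The scope is always the request brand, never one inferred from the row.
    return request_brand_id
```

Because the returned scope is the request brand in every mode, a mismatching command in observe ends up filtered by the caller's own brand rather than writing into the target row's brand.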
Rolling
- Rolling records, rolling inbox/outbox, and completion/cancel retry state carry brand_id.
- The rolling event consumer reads brand_id from inbound payloads and persists it.
- Per-brand rolling completion ratios resolve from brand_config.
Promotion
- Coupon usage rows, promotion configs, settlement projections, and promotion_coupon_saga rows carry brand_id.
- Coupon definitions, event configs, and rebate/cashback/lossback rates are per-brand.
- Settlement schedulers iterate brand by brand in brand_id ascending order, sequentially (one brand at a time). Cross-brand aggregation in settlement is forbidden. Parallel-per-brand execution may be introduced later as an explicit follow-up; the day-0 contract is sequential to keep timing deterministic across environments.
Game integration
- Game provider credentials stay global; one set per provider serves every brand.
- Outbound calls to providers use {brand_code}_{account} as the default account namespace. The separator character _ is chosen because it is accepted by every currently integrated provider (HO, MG, WC, BTI, BT1, SPLUS, Digitain, Integration). Providers that constrain account format below this default fall back to a documented per-provider equivalent recorded in docs/services/game-service.md; no per-provider equivalent is introduced silently.
- The reverse-parse algorithm is brand_code-prefix lookup against the brand table (longest matching prefix wins). Splitting on the first _ would be ambiguous because legacy player account strings may already contain _. The lookup is cached per process for performance.
- For every supported provider, an outbound-account length audit is performed before Phase 9: if len(brand_code) + 1 + MAX_PLAYER_ACCOUNT_LEN (worst case 16 + 1 + 32 = 49) exceeds the provider's account-field cap, the per-provider equivalent format is documented and used (for example, a shorter separator or a hash projection). The audit and per-provider fallbacks are committed to the game-service profile before code lands.
- Inbound provider callbacks reverse-parse the namespaced account to recover brand_id and player account. A callback whose namespaced account does not begin with a known brand_code prefix is rejected and increments game_callback_brand_unresolved_total{provider}.
- Provider callback transaction state and provider-specific records (e.g. bti_*, ho_*, mg_*, wc_*, digitain_*) carry brand_id.
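The default namespacing and its reverse parse can be sketched as a pair of functions. This is a minimal illustration under stated assumptions: namespace_account and reverse_parse are hypothetical names, the known brand codes are passed in directly rather than read from a cached brand table, and per-provider fallback formats are not modelled.

```python
from typing import Iterable, Optional, Tuple

MAX_PLAYER_ACCOUNT_LEN = 32  # value stated in the Player identity section


def namespace_account(brand_code: str, account: str) -> str:
    """Build the default outbound provider account: {brand_code}_{account}."""
    if not account or len(account) > MAX_PLAYER_ACCOUNT_LEN:
        raise ValueError("account must be 1..32 characters")
    return f"{brand_code}_{account}"


def reverse_parse(
    namespaced: str, brand_codes: Iterable[str]
) -> Optional[Tuple[str, str]]:
    """Recover (brand_code, account) by known-prefix lookup, longest code first.

    Returns None when no known brand_code prefix matches; the caller is then
    expected to reject the callback and count it as unresolved.
    """
    for code in sorted(brand_codes, key=len, reverse=True):
        prefix = code + "_"
        if namespaced.startswith(prefix) and len(namespaced) > len(prefix):
            return code, namespaced[len(prefix):]
    return None
```

An account such as john_doe under brand default round-trips as default_john_doe and parses back to (default, john_doe), while a raw john_doe with no known prefix returns None instead of being mistaken for a brand called john.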
Reconciliation
- The shooter_* table family is split into operator-infrastructure rows (no brand_id) and brand-scoped event rows:
  - Operator-global (no brand_id): shooter_device (physical phone-receiver hardware bindings), shooter_pushbullet (Pushbullet API token bindings), shooter_phone (phone whitelist), shooter_template_recharge (regex parsing rules). These are operator-owned infrastructure and are not duplicated per brand.
  - Brand-scoped (carries brand_id): shooter_sms (inbound SMS rows; brand_id is populated when the SMS is matched against a player_deposit row, not parsed from message text), and any recon review-state rows that reference a specific player or deposit.
- Recon match decisions resolve brand_id from the matched deposit record (player_deposit.brand_id).
- Approvals continue to flow through wallet_service; cross-brand rejection in wallet_service blocks any approval that would touch a different brand's deposit.
Admin surfaces
- admin_service exposes brand catalog CRUD: create, enable, disable, edit metadata. brand_code is immutable after creation.
- admin_service exposes brand_config CRUD with audited writes (timestamp, payload, prior value).
- admin_service exposes agent_brand CRUD with audited writes.
- Per-brand topology and policy writes flow through wallet_service CRUD with brand pinned in the request.
- The legacy_admin_v2 and legacy_auth staff routes and the supporting staff identity logic are removed. Any route that survived the staff removal must keep its status/msg/data envelope shape.
Staged enforcement
- Every brand-scoped enforcement point (gateway JWT/domain mismatch, wallet_service cross-brand command rejection, agent_service brand allow-list rejection, internal-handler X-Brand-Id requirement) is gated by a single environment-variable flag, MULTI_BRAND_ENFORCEMENT, with three modes:
  - off: resolve and forward brand context, but never reject.
  - observe (default): check, log, and increment brand_resolution_failed_total{reason,service} on mismatch; do not reject.
  - enforce: hard-reject on mismatch with the documented error envelope.
- The flag is read by gateway, wallet_service, agent_service, and every internal handler that adds brand-scoped enforcement.
- The flag is set as a process env var. Mode change requires a service restart. The current mode is included in each service's /health response payload.
- The "request brand wins" behavior in observe applies symmetrically to read paths and write paths: a read query whose request brand differs from the JWT brand still uses the request brand for its WHERE brand_id = ? filter. This prevents a stale or wrong-brand JWT from disclosing another brand's data during the soak window; enforce mode then rejects the same scenario hard.
- A second brand cannot be enabled in any environment until that environment is in enforce mode.
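Reading the flag once at process start can be sketched as below. The enum values reuse the 0/1/2 encoding that the Observability section gives for the multi_brand_enforcement_mode gauge; EnforcementMode, load_mode, and should_reject are hypothetical names, and failing fast on an unrecognised value is an assumption of this sketch rather than a stated requirement.

```python
import os
from enum import IntEnum
from typing import Mapping


class EnforcementMode(IntEnum):
    """Values double as the multi_brand_enforcement_mode gauge reading."""

    OFF = 0
    OBSERVE = 1
    ENFORCE = 2


def load_mode(env: Mapping[str, str] = os.environ) -> EnforcementMode:
    """Parse MULTI_BRAND_ENFORCEMENT; observe is the documented default."""
    raw = env.get("MULTI_BRAND_ENFORCEMENT", "observe").strip().lower()
    try:
        return EnforcementMode[raw.upper()]
    except KeyError:
        # Assumption: refuse to start rather than guess a mode.
        raise ValueError(f"invalid MULTI_BRAND_ENFORCEMENT value: {raw!r}") from None


def should_reject(mode: EnforcementMode, mismatch: bool) -> bool:
    """Only enforce turns a detected mismatch into a rejection."""
    return mismatch and mode is EnforcementMode.ENFORCE
```

Since the mode is resolved at startup, a service would call load_mode() once and hold the result, which is consistent with the requirement that a mode change needs a restart.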
Authentication and integrity
Per ADR-009 "Authentication and integrity posture":
- admin_service runs behind VPN ingress with an IP allow-list. Every brand, brand_config, and agent_brand write requires an X-Operator-Id header injected by the SSO LB; admin rejects writes without this header. Audit rows include operator_id, request IP, request_id, and timestamp.
- Internal-service authentication uses per-caller-service tokens (INTERNAL_SERVICE_TOKEN_GATEWAY, INTERNAL_SERVICE_TOKEN_AGENT, etc.). A consumer service knows the set of caller tokens it accepts. A compromise of one caller's token cannot impersonate another.
- Brand-scoped wallet write commands carry X-Brand-Signature: HMAC_SHA256(BRAND_SIGNING_KEY, caller_service|brand_id|request_id|timestamp). wallet_service rejects missing or invalid signatures in enforce mode (logged in observe).
- JWT signing uses RS256. Private key is held by player_service and agent_service; other services verify with the public key. JWT header carries kid; verifier rejects unknown kids and rejects alg=none and HS-family algorithms unconditionally. Key rotation procedure is documented in the runbook.
- brand_code charset ^[a-z][a-z0-9]{1,15}$ PLUS prefix-disjointness: admin_service rejects any new brand_code that is a prefix of an existing brand_code, has an existing brand_code as a prefix, or collides with the prefix of any existing player.account value that has been namespaced and sent to a game provider.
- The reverse-parse cache in game_service invalidates on the BRAND_CATALOG_CHANGED Redis pub/sub channel published by admin_service after every brand create/disable; processes that miss the message refresh on a TTL of 60 seconds. Same contract for AGENT_BRAND_CHANGED consumed by agent_service. Redis pub/sub is best-effort: if Redis is down or a subscriber is briefly partitioned, invalidation messages are lost and correctness reverts to the 60-second TTL fallback. Documented in the runbook's diagnosis playbook for stale-cache scenarios.
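The X-Brand-Signature construction can be sketched with the standard library. The field order and the | separator follow the formula above; everything else here is an assumption made for illustration: the hex encoding of the digest, the 300-second timestamp skew window, Unix-seconds timestamps, and the function names sign_brand_command and verify_brand_command. Replay protection is not shown.

```python
import hashlib
import hmac


def sign_brand_command(
    key: bytes, caller_service: str, brand_id: int, request_id: str, timestamp: int
) -> str:
    """HMAC-SHA256 over caller_service|brand_id|request_id|timestamp (hex)."""
    message = f"{caller_service}|{brand_id}|{request_id}|{timestamp}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_brand_command(
    key: bytes,
    signature: str,
    caller_service: str,
    brand_id: int,
    request_id: str,
    timestamp: int,
    now: int,
    max_skew_seconds: int = 300,  # assumed window, not taken from the spec
) -> bool:
    """Return True only for a fresh, correctly signed command."""
    if not key:
        return False  # fail closed when the signing key is not configured
    if abs(now - timestamp) > max_skew_seconds:
        return False
    expected = sign_brand_command(key, caller_service, brand_id, request_id, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

Because brand_id is part of the signed message, a caller cannot reuse a signature minted for one brand on a command that targets another.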
Recovery flow brand precedence
- Recovery tokens issued after Phase 5 carry brand_id as a signed claim. Recovery flow precedence: token-embedded brand_id wins; if the token lacks brand_id (issued before Phase 5), fall back to the request domain's brand. If the resolved brand does not match the underlying player's brand_id, the recovery request is rejected hard regardless of MULTI_BRAND_ENFORCEMENT mode (recovery is too sensitive for observe behavior).
- Recovery responses must not leak the existence of an account in another brand. Same email or phone in two brands recovers each independently with brand-scoped tokens.
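The recovery precedence rule reduces to a few lines. The sketch below assumes the token claims have already been signature-verified and arrive as a mapping; resolve_recovery_brand and RecoveryBrandMismatch are hypothetical names. Note that the function takes no enforcement-mode argument, reflecting that the rejection applies in every mode.

```python
from typing import Mapping


class RecoveryBrandMismatch(Exception):
    """Raised when the resolved brand is not the player's brand."""


def resolve_recovery_brand(
    token_claims: Mapping[str, object],
    domain_brand_id: int,
    player_brand_id: int,
) -> int:
    """Token-embedded brand wins; the request domain is only a fallback."""
    token_brand = token_claims.get("brand_id")
    resolved = token_brand if token_brand is not None else domain_brand_id
    if resolved != player_brand_id:
        # Hard reject regardless of MULTI_BRAND_ENFORCEMENT.
        raise RecoveryBrandMismatch("recovery brand does not match player brand")
    return player_brand_id
```

The exception message deliberately carries no brand or account detail, in line with the requirement that responses must not reveal an account's existence in another brand.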
Idempotency cross-brand collision detection
- The wallet_idempotency constraint is (brand_id, idempotency_key). An inbound idempotency key that matches an existing key in a different brand increments wallet_idempotency_brand_split_total. Phase 16 release gate requires this counter to be zero across the soak window.
Configuration consolidation
- The hard-coded project integer in admin_service/app/tasks/tag.py is removed; behavior previously gated by project resolves from brand_config.
- Any global config constant or global_var row that is now duplicated by per-brand brand_config entries (rolling ratios, cashback / rebate / lossback rates, payment channel selection, withdrawal min/max, risk thresholds) is either deleted or explicitly demoted to a "documented global default" used only when brand_config has no brand-specific override. The deletion list is enumerated in plan Phase 12.
- Staff-related globals that become dead code with the removal of the seven admin_service legacy route files (legacy admin auth secret env vars, legacy session config, supporting helper modules) are removed at the same time. Any unused entries in servers_v2/.env.compose.example are removed.
Migration
- A default brand is created at migration time. All existing rows backfill into this brand.
- Every existing agent row gets an agent_brand row of the form (agent_id, default_brand_id, status='enabled') seeded at migration time so existing agents retain access through Phase 11 allow-list enforcement and Phase 16 hard-flip.
- Backfill is reversible in non-production environments by dropping the added columns and constraints.
- Domain-to-brand bindings for the default brand are seeded at migration time so the existing single-brand environment continues to resolve.
- Production cutover does not enable a second brand until the runbook's release gate is satisfied.
Observability
Required logs:
- structured log fields include brand_id and brand_code
- edge logs include the resolved domain, the resolved brand, and whether brand resolution succeeded or failed
- wallet command logs include the request brand and the target row brand; cross-brand rejections log both
- game callback logs include the parsed brand_code, player_account, and the raw outbound account string
Required metrics:
- player-, wallet-, rolling-, promotion-, and game-scoped counters and
histograms carry a
brandlabel whose value isbrand_code - a
brand_resolution_failed_total{reason,service}counter exists at the edge AND on every internal handler that performs brand-scoped checks, withreasonconstrained to a documented enum (jwt_domain_mismatch,missing_header,unknown_domain,jwt_missing_brand,player_brand_mismatch,agent_brand_not_allowed,recovery_brand_mismatch) - a
wallet_cross_brand_rejected_total{command,mode}counter exists inwallet_service - a
game_callback_brand_unresolved_total{provider}counter exists ingame_service - a
wallet_idempotency_brand_split_totalcounter exists inwallet_service(inbound key matches existing key in different brand) - a
multi_brand_enforcement_mode{service}gauge exists in every service that reads the flag (value: 0=off, 1=observe, 2=enforce) - a
brand_resolution_latency_seconds{service}histogram covers the Redis lookup cost on the edge brand-resolution hot path - a
request_total{brand_code,service}counter (per-brand legitimate traffic) so a brand getting zero traffic is detectable - an
event_legacy_schema_total{stream,consumer}counter increments when a consumer accepts aschema_version=1event during the soak - a
security_downgrade_total{service}counter increments when a service starts inMULTI_BRAND_ENFORCEMENT != 'enforce'while more than one enabled brand exists - per-service
*_cross_brand_rejected_total{command,mode}counters exist inagent_service,game_service,promotion_service,recon_service, androlling_service-- mirroring the wallet variant so each owner surface can alarm independently - a
wallet_brand_signature_failed_total{caller_service,reason,mode}counter exists inwallet_servicecovering themissing_headers:*,invalid_timestamp,signature_mismatch, andsignature_replayreasons;signature_replayis also surfaced aswallet_brand_signature_replay_blocked_totalfor dashboarding convenience - a
wallet_brand_signature_missing_total{caller_service}counter exists inwallet_serviceso operators can see which callers still need to migrate before flippingWALLET_BRAND_SIGNATURE_REQUIRE=on - a
`wallet_brand_signature_misconfigured_total{mode}` counter exists in `wallet_service` (P1-4) for the runtime fail-closed branch when `BRAND_SIGNING_KEY` is empty
- a `wallet_brand_signature_replay_redis_outage_total{caller_service,mode}` counter exists in `wallet_service` (P2-δ) when the replay-cache Redis SETNX raises
- a `wallet_topology_default_brand_unprimed_total` counter exists in `wallet_service` (P2-δ) when a topology / policy lookup arrives without a brand id and the default-brand cache is unprimed
- a `wallet_topology_default_brand_fallback_total{caller_origin}` counter exists in `wallet_service` for primed-cache default-brand fallbacks (per-call-site breakdown for brand-onboarding)
- a `game_callback_verification_bypassed_total{provider}` counter exists in `game_service` (T1-D-C2) for any callback that reached the `VERIFY_CALLBACKS=False` bypass branch -- MUST be zero in production
- a `game_callback_brand_unresolved_total{provider,reason?}` counter exists in `game_service`; `reason="raw_account_in_enforce"` and `reason="raw_guid_in_enforce"` (P1-3) are hard-blocking on the Phase 16 flip
- an `internal_caller_token_legacy_total{consumer,caller}` counter exists in every internal-token-accepting service for the per-caller rollout (Stage A→B→C signal)
- an `internal_caller_token_legacy_rejected_total{consumer,caller}` counter exists in every internal-token-accepting service (T4-D-I2) to record requests rejected once `PER_CALLER_TOKEN_REQUIRED=on`
- a `gateway_session_legacy_key_used_total{reason}` counter exists in `gateway` (T1-D-C1) for the Phase 4E session-key cutover signal, with `reason ∈ {missing_brand, fallback_after_miss}`
- a `gateway_jwt_session_missing_total` counter exists in `gateway` (T1-D-C1) for "JWT-valid-but-session-gone" requests
- a `gateway_jwt_unknown_kid_total` counter exists in `gateway` (T4-D-I4) for JWT verifier rejections by unknown kid; spikes during RS256 kid rotation indicate stale verifier caches
- an `agent_brand_not_allowed_total{reason,service}` counter exists in `agent_service` for allow-list rejections (the spec-canonical `brand_resolution_failed_total{reason="agent_brand_not_allowed"}` is also emitted alongside it for cross-service dashboards)
- an `admin_operator_id_mismatch_total{reason}` counter exists in `admin_service` (P1-3) for the JWT-vs-header cross-check rejections
- an `admin_audit_write_failed_total{route}` counter exists in `admin_service` (P1-4) for audit-row INSERT failures observed AFTER the money write committed
- an `admin_ws_legacy_query_token_total` counter exists in `admin_service` (T4-D-I7) for WebSocket auth via the legacy `?token=` query string fallback
- an `admin_ws_rate_limited_total{reason}` counter exists in `admin_service` (T4-D-I7) for per-IP rate-limit rejections, with `reason ∈ {ip, redis_outage}`
- an `agent_balance_legacy_write_total{change_type}` counter exists in `admin_service` (T4-D-I3) for the legacy direct-write surface; goal is zero post-migration
- a `pii_aes_unconfigured_total{service,op}` counter exists in `player_service` and `agent_service` (T1-D-C3) when the AES PII helpers hit the non-production fallback; MUST be zero in production
- a `plisio_callback_replay_blocked_total{status}` counter exists in `wallet_service` (T1-D-C5/I6) for Plisio webhook replays caught by the SETNX guard
- a `plisio_callback_replay_redis_outage_total{env}` counter exists in `wallet_service` (T1-D-C5/I6) when the Plisio replay-cache SETNX raises
- a `recovery_sms_delivery_failed_total{service}` counter exists in `player_service` and `agent_service` (Codex P1-#6) when the SMS branch of `/recovery/request` could not deliver
- a `recovery_email_unprovisioned_total{service}` counter exists in `player_service` and `agent_service` (Codex P1-#6) when the email recovery branch is selected but no transactional provider is wired
- a `recovery_rate_limit_redis_outage_total{bucket}` counter exists in `player_service` and `agent_service` (T4-D-I5) for fail-closed per-contact rate-limit rejections caused by Redis exceptions in production
- all metric labels are bounded enums; no attacker-controlled value (host, account, raw IP, `brand_code` beyond the 16-char regex bound) appears as a label
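
The bounded-label rule can be applied mechanically before a value reaches a metric. The following is a minimal sketch, not the services' actual code: `bounded_label` and the `other` fallback bucket are hypothetical names introduced for illustration, and only the two `reason` sets are taken from the counters listed above.

```python
from typing import FrozenSet

# Allowed label values, copied from the metric definitions in this spec.
SESSION_LEGACY_REASONS: FrozenSet[str] = frozenset(
    {"missing_brand", "fallback_after_miss"}
)
WS_RATE_LIMIT_REASONS: FrozenSet[str] = frozenset({"ip", "redis_outage"})

# Hypothetical catch-all bucket; the spec does not name one.
FALLBACK_LABEL = "other"


def bounded_label(value: str, allowed: FrozenSet[str]) -> str:
    """Return `value` if it is in the allow-list, else the fallback bucket.

    This keeps attacker-controlled strings (host, account, raw IP) out of
    metric label sets, so label cardinality stays fixed.
    """
    return value if value in allowed else FALLBACK_LABEL
```

With this shape, a call such as `bounded_label(request_host, WS_RATE_LIMIT_REASONS)` can only ever yield one of three label values, regardless of what the client sent.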
Required alerts (with explicit thresholds):
- `brand_resolution_failed_total{reason="jwt_domain_mismatch"}` rate above 0.1/s sustained over 5 minutes (page on-call). Below threshold is expected baseline noise from scanners and stale clients.
- `brand_resolution_failed_total{reason="missing_header"}` non-zero rate sustained over 5 minutes during `enforce` (page; in `observe` it is informational only).
- `wallet_cross_brand_rejected_total{mode="enforce"}` non-zero rate sustained over 1 minute (page; potential cross-brand bug or caller-token compromise).
- `wallet_cross_brand_rejected_total{mode="observe"}` non-zero rate during the Phase-16 soak window (warning only; gates the flip but does not page outside the soak window).
- `game_callback_brand_unresolved_total` non-zero rate sustained over 1 minute (page; real money is stuck on the provider side and callbacks are entering the dead-letter queue).
- `wallet_idempotency_brand_split_total` non-zero rate (warning; indicates client misrouting or replay).
- `security_downgrade_total` non-zero in production (page immediately; a service downgraded enforcement mode after enforcement was live).
- per-brand wallet outbox publication freshness alert (existing wallet outbox alert, scoped per `brand` label).
- per-brand `request_total` rate dropping to zero for a previously active brand sustained over 15 minutes (warning; brand outage or domain misconfiguration).
- silence procedure: alerts may be silenced by deploy lead for a documented rollout window; silence must be acknowledged in `#incidents`.
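
To make the "sustained over N minutes" wording concrete, the sketch below evaluates one threshold against a series of rate samples. It is an illustration under stated assumptions, not the alerting system's rule syntax: `sustained_above`, the `(timestamp, rate)` sample shape, and the 15-second scrape interval are all hypothetical.

```python
from typing import List, Sequence, Tuple

# (unix_timestamp_seconds, rate_per_second); sample shape is an assumption.
Sample = Tuple[float, float]


def sustained_above(
    samples: Sequence[Sample],
    threshold: float,
    window_s: float,
    now: float,
    scrape_interval_s: float = 15.0,  # hypothetical scrape interval
) -> bool:
    """True if the rate stayed above `threshold` for the whole window.

    Every sample inside [now - window_s, now] must exceed the threshold,
    and the oldest in-window sample must sit within one scrape interval
    of the window start, so a short burst does not count as sustained.
    """
    start = now - window_s
    in_window: List[Sample] = [(t, r) for t, r in samples if start <= t <= now]
    if not in_window:
        return False
    oldest = min(t for t, _ in in_window)
    covers_window = (oldest - start) <= scrape_interval_s
    return covers_window and all(r > threshold for _, r in in_window)
```

Under these assumptions, the `jwt_domain_mismatch` alert corresponds to `threshold=0.1, window_s=300`, and the "non-zero rate sustained over 1 minute" alerts correspond to `threshold=0.0, window_s=60`.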
Rollout Notes
- Migrations run before any `default` brand backfill. Backfill runs before any service starts requiring `X-Brand-Id`.
- `gateway` enforcement of JWT/domain brand mismatch is enabled only after every downstream service has been deployed with brand-aware filtering.
- A second brand is enabled only after the runbook release gate passes.
- The hard-coded `project` integer is removed only after `brand_config` has the equivalent values for the `default` brand.
Acceptance Criteria
- A migration on a fresh database creates `brand`, `brand_config`, `agent_brand`, and adds `brand_id` to every brand-scoped table with brand-scoped uniqueness.
- Backfill on an existing database moves every existing row into a `default` brand without data loss; reversibility is verified in a non-production environment.
- Two brands can be configured (e.g. `default` and `brand2`) with distinct domains. A registration on `brand2`'s domain creates a `player` row in `brand2`. The same account string may also register on the `default` domain and create an independent `player` row in `default`.
- A login on one brand's domain whose JWT was issued for the other brand is rejected at `gateway`.
gateway. - A wallet command targeting a player whose brand differs from the request brand is rejected before any money mutation.
- Wallet topology and policy can be configured independently per brand; active topology resolution returns the per-brand active document.
- A bet authorized through `game_service` for a player in `brand2` calls the provider with the brand-namespaced account; the provider callback is reverse-parsed back to `brand2` and the matching player.
- Per-brand rolling completion ratios, cashback/rebate/lossback rates, and payment channel selection resolve from `brand_config`.
- All seven listed `admin_service` legacy route files and the supporting staff identity logic are deleted; previously served paths return `404`; the test suite reflects the removal.
- Structured logs and Prometheus metrics carry brand labels as specified.
- ADR-005's "single money writer" rule is unchanged after this change.
- Player registration rejects any `account` longer than `MAX_PLAYER_ACCOUNT_LEN = 32` chars with the documented validation error.
- `MULTI_BRAND_ENFORCEMENT=enforce` is exercised in at least one shared environment after the soak window; `gateway`, `wallet_service`, `agent_service`, and the brand-aware internal handlers all report `enforce` mode in `/health`.
- An `agent_brand` row of the form `(agent_id, default_brand_id, status='enabled')` exists for every pre-migration `agent` row after `0023_seed_default_brand.py` runs. The auto-seed trigger (`0023b_install_agent_brand_autoseed_trigger.py`) covers any new `agent` rows created during the Phase 2-12 window; the trigger is removed by `0028_remove_agent_brand_autoseed_trigger.py` once `admin_service` writes `agent_brand` explicitly at agent creation in Phase 12.
- Settlement schedulers iterate brands in `brand_id` ascending order, sequentially; observable in worker logs.
- External-facing `gateway` and `agent_service` strip any inbound `X-Brand-Id` header before injecting the resolved value; verified by a synthetic header-spoofing test.
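
The strip-then-inject criterion for `X-Brand-Id` can be sketched as a pure function over a header mapping. This is a minimal illustration, assuming headers arrive as a plain `dict`; `inject_brand_header` is a hypothetical helper name, and real `gateway` code would operate on its framework's header type instead.

```python
from typing import Dict

BRAND_HEADER = "X-Brand-Id"


def inject_brand_header(
    inbound: Dict[str, str], resolved_brand_id: int
) -> Dict[str, str]:
    """Drop every client-supplied X-Brand-Id, then set the resolved value.

    Header names are compared case-insensitively, so `x-brand-id` and
    `X-BRAND-ID` sent by a client are removed as well.
    """
    cleaned = {
        name: value
        for name, value in inbound.items()
        if name.lower() != BRAND_HEADER.lower()
    }
    cleaned[BRAND_HEADER] = str(resolved_brand_id)
    return cleaned
```

The synthetic spoofing test then reduces to sending `X-Brand-Id: 99` from outside and asserting that downstream sees only the brand resolved from the request domain.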
Required Tests
- migration tests for `brand`, `brand_config`, `agent_brand`, and `brand_id` column additions
- backfill tests proving `default` brand assignment for every existing row family
- `gateway` brand resolution tests for valid domain, unknown domain, and domain bound to another agent
- `gateway` JWT brand validation tests covering: cross-brand JWT replay (agent A's token on a domain bound to brand B), cross-agent JWT replay within the same brand (agent A's token on a domain bound to agent B in same brand), JWT missing `brand_id` claim, body/query attempts to override brand
- `gateway` enforcement-mode tests: same scenarios under `observe` (logged + counted, not rejected) and under `enforce` (rejected with documented envelope)
- `player_service` registration tests proving cross-brand account independence
- `agent_service` allow-list rejection tests
- `wallet_service` cross-brand command rejection tests for every command family (deposit approve, withdrawal create, bet authorize, settle, rollback, transfer, points credit, coupon grant, reversal, adjustment)
- `wallet_service` per-brand topology and policy resolution tests
- `rolling_service` brand-scoped consumption and progress tests
- `promotion_service` per-brand coupon, rebate, cashback, and lossback resolution tests
- `game_service` outbound account namespacing tests and inbound callback reverse-parse tests for every supported provider
- `recon_service` brand-scoped match and approval tests
- `admin_service` brand catalog, `brand_config`, and `agent_brand` CRUD tests
- `admin_service` staff removal tests confirming all seven deleted route files no longer mount any router and that all previously served paths return `404` (legacy admin auth, legacy agents v2, legacy agent withdrawals v2, legacy meta v2, legacy recon compatibility, legacy web content)
- observability tests confirming log fields and metric labels
- local Docker end-to-end flows for two simultaneous brands covering: registration, login, deposit, bet, settlement, rolling, withdrawal, promotion settlement, coupon issue and use, cross-brand isolation
- `MAX_PLAYER_ACCOUNT_LEN` validation tests at registration (boundary, oversize, exact-32-char accounts)
- backfill audit test confirming no existing `player.account` exceeds 32 chars before the cap is enforced
- migration test asserting an `agent_brand` row exists for every pre-migration `agent` after `0023` runs, AND that the `0023b` autoseed trigger creates a row for any `agent` row inserted between `0023b` (install) and `0028` (remove)
- migration test asserting `agent_setting`, `player_wallet_limit`, `daebak_email`, `mg_account`, `wc_account`, `digitain_token`, `i18n` PK rewrites land cleanly with rows seeded pre-migration under both `default` and a second brand
- `gateway` synthetic test: external request with inbound `X-Brand-Id: 99` is stripped; downstream sees only the resolved brand
- Redis cache key prefix tests confirming `sms_captcha:{brand_code}:{phone}` and login-throttle keys are brand-prefixed; same `phone` in two brands does not collide
- per-provider outbound-account length audit results recorded as a test fixture in game-service tests
- JWT signing tests: `alg=none` rejected, `alg=HS256` rejected when `RS256` is the configured algorithm, JWT with unknown `kid` rejected, JWT signed with old `kid` accepted during the documented rotation overlap, JWT signed with old `kid` rejected after overlap expires
- `admin_service` write-without-`X-Operator-Id` rejected; audit row for every write contains operator identity
- `brand_code` prefix-disjointness validator: brand-create with prefix collision against existing brand_code rejected; brand-create whose prefix matches an existing namespaced player.account rejected
- internal-service-token-spoof test: caller using `INTERNAL_SERVICE_TOKEN_GATEWAY` cannot impersonate `INTERNAL_SERVICE_TOKEN_AGENT` against any consumer
- brand-signature verification test: brand-scoped wallet write without `X-Brand-Signature` is rejected in `enforce`, logged in `observe`; signature with wrong `BRAND_SIGNING_KEY` is rejected in `enforce`
- recovery flow: same email registered in two brands; recovery on brand A's domain returns a token bound to brand A only; recovery response on a non-existent account at brand A produces the same response shape as a known account (no existence oracle)
- observe-mode read-path test: GET endpoint with brand-A JWT against brand-B domain returns either auth-error or empty data, never cross-brand data
- event schema versioning: pre-Phase-6 event in stream is rejected by consumer in `enforce` mode and accepted as `default` brand in `observe`; `event_legacy_schema_total` increments accordingly
- `security_downgrade_total` alert fires on synthetic boot in `observe` while >1 brand is enabled
- agent_brand cache invalidation: admin update to `agent_brand` publishes `AGENT_BRAND_CHANGED`; `agent_service` processes invalidate within 1 second; processes that miss the message refresh on TTL expiry within 60 seconds
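
As an illustration of what the `brand_code` prefix-disjointness test might exercise, the sketch below treats two codes as colliding when either is a prefix of the other. That mutual-prefix definition is an assumption; this section does not spell out the validator's exact rule, and `prefix_collides` is a hypothetical name. The second check in the test list (prefix matching an existing namespaced `player.account`) is not covered here because the namespacing format is not shown in this section.

```python
from typing import Iterable


def prefix_collides(candidate: str, existing_codes: Iterable[str]) -> bool:
    """True if `candidate` is a prefix of an existing code, or vice versa.

    An exact duplicate also counts as a collision, since each string is
    a prefix of itself.
    """
    return any(
        code.startswith(candidate) or candidate.startswith(code)
        for code in existing_codes
    )
```

A brand-create handler following this sketch would reject the request when `prefix_collides(new_code, current_codes)` is true, which is the behaviour the test list asks to be verified.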