ADR-009: Multi-Brand Domain-Routed Isolation

Status

Accepted

Date

2026-04-27

Owners

  • Platform Backend
  • Player Domain
  • Wallet Domain
  • Agent Domain

Affected Services

  • gateway
  • player_service
  • wallet_service
  • rolling_service
  • promotion_service
  • game_service
  • agent_service
  • admin_service
  • recon_service

Related Documents

  • docs/specs/multi-brand/2026-04-27-multi-brand-isolation-spec.md
  • docs/plans/multi-brand/2026-04-27-multi-brand-isolation-plan.md
  • docs/runbooks/multi-brand/multi-brand-isolation-rollout.md
  • docs/architecture/data-ownership.md
  • docs/architecture/domain-ownership.md
  • docs/architecture/system-overview.md
  • docs/architecture/http-entrypoints.md
  • docs/architecture/event-catalog.md
  • docs/architecture/service-catalog.md
  • docs/architecture/deployment-topology.md
  • docs/architecture/migration-readiness.md
  • docs/services/gateway.md
  • docs/services/player-service.md
  • docs/services/wallet-service.md
  • docs/services/rolling-service.md
  • docs/services/promotion-service.md
  • docs/services/game-service.md
  • docs/services/agent-service.md
  • docs/services/admin-service.md
  • docs/services/recon-service.md

Context

servers_v2 was designed under a single-brand assumption. There is no brand, tenant, site, or operator field anywhere in the data model, no brand context in request flow, no brand dimension in JWT claims, and no per-brand configuration surface. The only existing tenant-shaped artifact is a hard-coded project integer in admin_service/app/tasks/tag.py that switches between two product labels; it is not a real isolation mechanism.

The platform now needs to host multiple distinct player-facing brands at the same time on the same servers_v2 runtime. A brand is a self-contained operating product: its own player base, its own wallet topology and policy, its own promotion rules, its own configuration, its own domain footprint.

Brands must be isolated for player identity, money state, rolling and settlement, and configuration. Brands may share infrastructure (one PostgreSQL, one Redis, one Docker stack) and may share global integrations such as game provider credentials. Operators (agents) are not exclusive to a brand -- a single agent must be able to serve more than one brand based on an admin-managed allow list.

The existing domain-based identification mechanism in gateway and player_service (domain:agent:{host} and domain:level:{host} Redis maps) already routes requests by Host header and resolves them to an agent or a player level. That mechanism is the natural extension point for brand routing and is preserved.

A no-staff posture is intentional. Back-office staff identity, role/menu management, and the legacy_admin_v2 and legacy_auth staff compatibility paths are being retired together with this change; multi-brand staff permissions are explicitly out of scope and will not be reintroduced.

Decision

servers_v2 will adopt a domain-routed, single-database, brand-scoped isolation model.

Brand entity

A brand aggregate is introduced as the top-level isolation boundary. Every brand has a stable short brand_code (used in outbound game-provider account namespacing and in observability labels), a default currency, and an enabled status. A brand_config aggregate stores per-brand configuration values that may override or replace the existing global configuration surface (rolling ratios, cashback and rebate rates, payment channel selection, withdrawal limits, risk thresholds, i18n overrides, theming, customer-service links, email templates).

Domain-based brand identification

gateway resolves the brand for every external player request from the Host or Origin header using the same Redis-backed domain map already used to resolve agent and level. The Redis values for domain:agent:{host} and domain:level:{host} are extended so that every entry binds a domain to exactly one brand_id. A domain belongs to one brand. The resolved brand_id is attached to request.state and forwarded to all downstream services via an X-Brand-Id request header.

After login, JWT claims must carry brand_id. gateway rejects any request whose JWT brand does not match the brand resolved from the request domain.
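
Illustrative sketch of the edge check described above. It assumes an in-memory dict in place of the Redis domain map and a claims dict that has already been signature-verified; DOMAIN_BRAND_MAP, resolve_brand, check_request, and BrandMismatch are hypothetical names, not identifiers from gateway.

```python
from typing import Mapping, Optional

# Hypothetical stand-in for the Redis-backed domain map. In the real system
# the brand binding is carried by the domain:agent:{host} and
# domain:level:{host} entries.
DOMAIN_BRAND_MAP: Mapping[str, int] = {
    "play.brand-a.example": 1,
    "play.brand-b.example": 2,
}


class BrandMismatch(Exception):
    """The JWT brand differs from the brand resolved from the domain."""


def resolve_brand(host: str) -> Optional[int]:
    # Normalise the Host header value: drop an optional port, lowercase.
    hostname = host.split(":", 1)[0].strip().lower()
    return DOMAIN_BRAND_MAP.get(hostname)


def check_request(host: str, jwt_claims: Optional[Mapping[str, object]]) -> dict:
    brand_id = resolve_brand(host)
    if brand_id is None:
        raise LookupError(f"no brand bound to host {host!r}")
    # Anonymous requests (no JWT yet) only need a resolvable brand.
    if jwt_claims is not None and jwt_claims.get("brand_id") != brand_id:
        raise BrandMismatch("JWT brand does not match request domain")
    # Header forwarded to downstream services.
    return {"X-Brand-Id": str(brand_id)}
```

A pre-login request would call check_request(host, None); an authenticated request passes the verified claims so that a token issued under one brand cannot be replayed on another brand's domain.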

Provider callbacks entering game_service resolve brand_id by reverse parsing the outbound game-provider account namespace described below. Internal service-to-service callers must propagate X-Brand-Id.

Single database, brand-scoped rows

servers_v2 keeps a single PostgreSQL database. Brand isolation is enforced at the row level by adding a non-nullable brand_id column to every brand-scoped table and changing existing uniqueness constraints to be brand-scoped. Cross-brand reads are not allowed in domain code paths; admin-side queries that intentionally aggregate across brands must opt in explicitly and must be limited to non-money projections.

Schema-per-brand and database-per-brand are explicitly rejected for this iteration. They are out of scope here and may be reconsidered later if regulatory or scale requirements change.

Player identity is per-brand

The same player account string may register independently in two different brands. The uniqueness key on player becomes (brand_id, account). A player created under brand A and a player created under brand B with the same account string are two separate player_id values with two separate wallet states.
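
Illustrative sketch of the brand-scoped uniqueness key. It uses an in-memory SQLite database only to stay self-contained (production is PostgreSQL); the table shape and the try_register helper are simplified assumptions, not the real player schema.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE player (
        player_id INTEGER PRIMARY KEY,
        brand_id  INTEGER NOT NULL,
        account   TEXT    NOT NULL,
        UNIQUE (brand_id, account)  -- uniqueness is per brand, not global
    )
    """
)


def try_register(brand_id: int, account: str) -> Optional[int]:
    """Insert a player; return the new player_id, or None on a duplicate."""
    try:
        cur = conn.execute(
            "INSERT INTO player (brand_id, account) VALUES (?, ?)",
            (brand_id, account),
        )
    except sqlite3.IntegrityError:
        return None
    return cur.lastrowid


alice_in_brand_a = try_register(1, "alice")
alice_in_brand_b = try_register(2, "alice")  # same string, different brand
duplicate_in_a = try_register(1, "alice")    # rejected by the composite key
```

The two successful inserts yield distinct player_id values, while the third call returns None because (1, "alice") already exists.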

Agents are global, brand allow-listed

Agents are not partitioned by brand. The agent aggregate stays global (no brand_id column). A new agent_brand join aggregate stores the admin-managed allow list of (agent_id, brand_id) pairs. Per-agent settings that vary by brand -- registration mode flags in agent_setting, the agent-owned agent_domain rows -- gain a brand_id column and brand-scoped uniqueness so the same agent can host different settings and domains across the brands it serves.

Player accounts owned by an agent are still per-brand, because players are per-brand.

Wallet topology and policy are per-brand

Wallet topology and wallet policy documents are owned per brand. The active topology resolution and policy resolution are scoped to the requesting brand. The uniqueness key on wallet_topology and wallet_policy documents becomes brand-scoped. Wallet command paths must resolve the brand from request context and must reject a command that would touch a wallet record belonging to a different brand than the request brand.

All wallet-owned aggregates carry brand_id: wallet_account, wallet_bucket, wallet_bucket_type, wallet_coupon_grant, wallet_bet_authorization, wallet_ledger, wallet_transfer, wallet_idempotency, wallet_outbox, wallet_inbox, wallet_dead_letter.

Game provider credentials are global; account namespacing is per-brand

Game provider API keys, merchant codes, and callback secrets are not duplicated per brand. game_service continues to hold one set of credentials per provider. To make the same provider serve multiple brands without account collisions, outbound calls use a brand-prefixed account namespace. The outbound account is composed from brand_code and the player account string; provider callbacks reverse-parse the brand and the player account from the namespaced identifier and route the resulting bet, settlement, or rollback into the correct brand.
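
Illustrative sketch of composing and reverse-parsing the namespaced account. This ADR does not fix the exact encoding; the sketch assumes plain concatenation of brand_code and the account string with no separator, which is only unambiguous when brand codes are prefix-disjoint. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def compose_provider_account(brand_code: str, account: str) -> str:
    # Assumed encoding: brand_code immediately followed by the account.
    return f"{brand_code}{account}"


def parse_provider_account(
    namespaced: str, brand_codes: Iterable[str]
) -> Tuple[str, str]:
    """Recover (brand_code, account) from a provider callback identifier."""
    matches = [code for code in brand_codes if namespaced.startswith(code)]
    if len(matches) != 1:
        # Zero matches: unknown brand. More than one: brand codes are not
        # prefix-disjoint, so the identifier cannot be attributed safely.
        raise ValueError(f"cannot attribute {namespaced!r} to exactly one brand")
    code = matches[0]
    account = namespaced[len(code):]
    if not account:
        raise ValueError("namespaced identifier carries no player account")
    return code, account
```

With brand codes "alp" and "bet", compose_provider_account("alp", "alice") gives "alpalice", and parsing that value against both codes returns ("alp", "alice").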

Configuration is per-brand

Per-brand configuration replaces the implicit global configuration surface for any value that may legitimately differ between brands. Resolution order is: per-brand brand_config value, then a documented global default. Values that must remain global (game provider credentials, infrastructure URLs, shared rate limits) are explicitly listed in the spec.
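
Illustrative sketch of the two-step resolution order. The dictionaries stand in for the brand_config aggregate and the documented global defaults; the keys and values are invented for the example.

```python
from typing import Any, Dict

# Hypothetical documented global defaults.
GLOBAL_DEFAULTS: Dict[str, Any] = {
    "rolling_ratio": 1.0,
    "withdrawal_daily_limit": 10_000,
}

# Hypothetical per-brand overrides, keyed by brand_id.
BRAND_CONFIG: Dict[int, Dict[str, Any]] = {
    1: {"rolling_ratio": 3.0},
    2: {},
}


def resolve_config(brand_id: int, key: str) -> Any:
    overrides = BRAND_CONFIG.get(brand_id, {})
    if key in overrides:          # 1. per-brand brand_config value
        return overrides[key]
    if key in GLOBAL_DEFAULTS:    # 2. documented global default
        return GLOBAL_DEFAULTS[key]
    raise KeyError(f"{key!r} has neither a brand value nor a global default")
```

Raising on an unknown key, rather than returning None, is a design choice of this sketch: it keeps an undocumented setting from silently behaving like a default.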

Staff is removed

This change removes every back-office staff-coupled compatibility path from admin_service. The deleted files are:

  • admin_service/app/api/routes/legacy_admin_v2.py
  • admin_service/app/api/routes/legacy_auth.py
  • admin_service/app/api/routes/legacy_agents_v2.py (mounted under legacy_admin_v2)
  • admin_service/app/api/routes/legacy_agent_withdrawals_v2.py (mounted under legacy_admin_v2)
  • admin_service/app/api/routes/legacy_meta_v2.py (mounted under legacy_admin_v2)
  • admin_service/app/api/routes/legacy_recon.py (depends on staff identity helpers in legacy_auth)
  • admin_service/app/api/routes/legacy_web_content.py (depends on staff identity helpers in legacy_auth)

The supporting staff identity helpers (_authenticate_legacy_admin, _refresh_legacy_admin_token, _json_with_token, and any module that becomes unimported after these deletions) are removed.

Multi-brand staff permission models are not introduced. Back-office surfaces that survive operate without per-staff identity for the duration of this change. Any consumer of the deleted routes (notably parts of bo/admin that depended on legacy admin auth, the legacy recon compatibility surface, or the legacy web CMS surface) will lose those endpoints; replacements are out of scope here.

Authentication and integrity posture

Multi-brand isolation is only as strong as the perimeter and the service-to-service trust model. The following decisions are part of this ADR:

admin_service authentication. With the staff layer removed, admin_service is no longer self-authenticating. Day-0 production posture is network-level isolation only: admin_service is reachable only via VPN ingress from a documented IP allow-list, and its load-balancer rejects all traffic from outside that allow-list. Every brand, brand_config, and agent_brand write must record an operator identity drawn from the LB-injected SSO header (X-Operator-Id); requests without that header are rejected at admin_service's edge.

Addendum (P1-3, post day-0). The signed-operator-JWT cross-check (delivered in P1-3) is now the canonical security boundary for operator identity in admin_service; network-level isolation remains as defence in depth. require_operator_id (admin_service/app/api/deps.py) reads the admin_id claim from the verified admin JWT and hard-rejects any request whose X-Operator-Id LB header disagrees (HTTP 403 + admin_operator_id_mismatch_total{reason="header_vs_jwt"}) or whose JWT lacks the admin_id claim entirely (HTTP 403 + {reason="jwt_missing_claim"}). Operator identity is now a JWT-bound contract, not a header-asserted one; the LB allow-list still exists to keep the surface off the public internet but is no longer the sole authenticator. The audit row written for every money-mutating admin route captures the JWT-bound admin_id, not the LB header, so a header spoof cannot launder writes through admin even if the LB allow-list itself were misconfigured. See the rollout runbook's "Delivered hardening" → "Admin operator-id JWT cross-check" section and the diagnosis playbook entry for admin_operator_id_mismatch_total.
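
Illustrative sketch of the header-versus-JWT cross-check. This is not the code in admin_service/app/api/deps.py; it is a framework-free approximation that assumes the JWT has already been verified, uses a Counter in place of the Prometheus metric, and introduces the hypothetical names Forbidden and require_operator_id_sketch.

```python
from collections import Counter
from typing import Mapping

# Stand-in for admin_operator_id_mismatch_total{reason=...}.
mismatch_total: Counter = Counter()


class Forbidden(Exception):
    """Maps to an HTTP 403 response in a real handler."""

    status_code = 403


def require_operator_id_sketch(
    verified_claims: Mapping[str, object], header_operator_id: str
) -> str:
    admin_id = verified_claims.get("admin_id")
    if admin_id is None:
        mismatch_total["jwt_missing_claim"] += 1
        raise Forbidden("jwt_missing_claim")
    if str(admin_id) != header_operator_id:
        mismatch_total["header_vs_jwt"] += 1
        raise Forbidden("header_vs_jwt")
    # The JWT-bound identity is what the audit row records.
    return str(admin_id)
```

The value returned is always taken from the JWT claim, so a caller that controls only the X-Operator-Id header cannot choose the identity written to the audit trail.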

Internal service tokens. Internal-service trust moves from a single shared INTERNAL_SERVICE_TOKEN to per-caller-service tokens (INTERNAL_SERVICE_TOKEN_{CALLER} env var per consuming service: INTERNAL_SERVICE_TOKEN_GATEWAY, INTERNAL_SERVICE_TOKEN_AGENT, INTERNAL_SERVICE_TOKEN_ADMIN, INTERNAL_SERVICE_TOKEN_RECON, INTERNAL_SERVICE_TOKEN_GAME, INTERNAL_SERVICE_TOKEN_PROMOTION, INTERNAL_SERVICE_TOKEN_ROLLING). Each consumer service knows the set of caller tokens it accepts. A compromise of one caller's token cannot impersonate another caller.
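
Illustrative sketch of per-caller token acceptance on the consumer side. The env var naming follows the scheme above; load_accepted_tokens and identify_caller are hypothetical helpers, and how the token travels (header name, encoding) is left out.

```python
import hmac
from typing import Dict, Iterable, Mapping, Optional


def load_accepted_tokens(
    env: Mapping[str, str], callers: Iterable[str]
) -> Dict[str, str]:
    """Collect INTERNAL_SERVICE_TOKEN_{CALLER} for the callers this service accepts."""
    accepted = {}
    for caller in callers:
        value = env.get(f"INTERNAL_SERVICE_TOKEN_{caller.upper()}")
        if value:
            accepted[caller] = value
    return accepted


def identify_caller(presented: str, accepted: Mapping[str, str]) -> Optional[str]:
    """Return the caller whose token matches, or None if no token matches."""
    found = None
    for caller, token in accepted.items():
        # compare_digest avoids leaking match position through timing;
        # the loop deliberately does not exit early.
        if hmac.compare_digest(presented.encode(), token.encode()):
            found = caller
    return found
```

Because each consumer loads only the tokens of callers it accepts, a leaked INTERNAL_SERVICE_TOKEN_RECON value identifies its holder as recon and nothing else.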

Signed X-Brand-Id for brand-scoped internal mutations. For brand-scoped wallet write commands, the X-Brand-Id header is paired with X-Brand-Signature: HMAC_SHA256(brand_signing_key, caller_service|brand_id|request_id|timestamp). The brand signing key is a separate env var (BRAND_SIGNING_KEY) shared only between the caller services that are authorized to issue brand-scoped commands and the consumer service (wallet_service). In enforce mode, wallet_service rejects brand-scoped mutations whose signature is missing or invalid; in observe mode it logs them without rejecting. Brand-scoped read paths and event-driven brand assertions use the same scheme.
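
Illustrative sketch of computing and verifying X-Brand-Signature over the caller_service|brand_id|request_id|timestamp message. The hex encoding of the digest, the Unix-seconds timestamp, and the 300-second skew window are assumptions of this sketch, as are the function names.

```python
import hashlib
import hmac


def sign_brand_header(
    key: bytes, caller_service: str, brand_id: int, request_id: str, timestamp: int
) -> str:
    message = f"{caller_service}|{brand_id}|{request_id}|{timestamp}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_brand_signature(
    key: bytes,
    signature: str,
    caller_service: str,
    brand_id: int,
    request_id: str,
    timestamp: int,
    now: int,
    max_skew_seconds: int = 300,  # assumed replay window
) -> bool:
    # Reject stale or future-dated requests before doing any crypto.
    if abs(now - timestamp) > max_skew_seconds:
        return False
    expected = sign_brand_header(key, caller_service, brand_id, request_id, timestamp)
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(expected, signature)
```

Because brand_id is part of the signed message, a caller cannot reuse a signature issued for one brand to mutate wallet state under another.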

JWT signing. JWT signing uses RS256 with the private key held exclusively by player_service, agent_service, and (post-T6-B1) admin_service. Other services verify with the public key. JWT headers carry a kid (key ID); the verifier rejects any JWT whose kid is not in the documented active-key allow list. The alg header parameter is restricted to RS256; alg=none and HS-family algorithms are rejected unconditionally. Key rotation is a documented runbook procedure with overlap (old kid stays accepted for the duration of JWT_EXPIRE_MIN plus a buffer).
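
Illustrative sketch of the header pre-check a verifier can run before signature verification. It uses only the standard library and checks alg and kid; it does not verify the RS256 signature, which would still be done by a JWT library against the public key for that kid. precheck_jwt_header is a hypothetical name.

```python
import base64
import json
from typing import Collection


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def precheck_jwt_header(token: str, active_kids: Collection[str]) -> str:
    """Return the kid if the header is acceptable, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed JWT: expected three segments")
    header = json.loads(_b64url_decode(parts[0]))
    if not isinstance(header, dict):
        raise ValueError("malformed JWT header")
    # Only RS256 is accepted; this rejects alg=none and the HS family.
    if header.get("alg") != "RS256":
        raise ValueError(f"algorithm {header.get('alg')!r} is not allowed")
    kid = header.get("kid")
    if not isinstance(kid, str) or kid not in active_kids:
        raise ValueError("kid is not in the active-key allow list")
    return kid
```

During a rotation overlap, active_kids would contain both the old and the new key ID; the old one is dropped once JWT_EXPIRE_MIN plus the buffer has elapsed.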

Post-T6 amendment. The original draft listed only player_service and agent_service as RS256 issuers. T6-B1 migrated admin_service off the legacy HS256 SECRET_KEY flow; admin operator JWTs are now RS256-signed too. T7-B4 added the assert_jwt_public_keys_configured boot guard so production deployments fail to start when JWT_PUBLIC_KEYS is unset (preventing the in-process test-keypair fallback from silently accepting forged admin tokens).

brand_code prefix-disjointness. admin_service's brand-create endpoint rejects any brand_code that is a prefix of an existing brand_code, that has an existing brand_code as a prefix, or that collides with the prefix of any existing player.account value already namespaced and sent to a game provider. This is a write-time validator backed by a database index scan; it prevents the reverse-parse ambiguity in game_service callbacks.
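
Illustrative sketch of the three write-time checks. It works on in-memory collections rather than a database index scan, and it interprets the third rule as "no already-namespaced provider account starts with the candidate code"; that interpretation and the function name are assumptions.

```python
from typing import Iterable, Optional


def prefix_disjointness_violation(
    candidate: str,
    existing_codes: Iterable[str],
    namespaced_accounts: Iterable[str],
) -> Optional[str]:
    """Return a human-readable reason if candidate must be rejected, else None."""
    for code in existing_codes:
        if code.startswith(candidate):
            # Also covers an exact duplicate of an existing code.
            return f"{candidate!r} is a prefix of existing brand_code {code!r}"
        if candidate.startswith(code):
            return f"existing brand_code {code!r} is a prefix of {candidate!r}"
    for account in namespaced_accounts:
        if account.startswith(candidate):
            return f"{candidate!r} collides with provider account {account!r}"
    return None
```

With existing codes "alp" and "bet", the candidates "al", "alpha", and "bet" are all rejected, while "gam" is accepted as long as no provider-side account already begins with "gam".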

Non-Goals

The following are explicitly out of scope for this decision:

  • Schema-per-brand or database-per-brand isolation.
  • Per-brand game provider credentials (credentials remain global; only the outbound account is namespaced).
  • A back-office staff identity model. The staff layer is removed by this change and is not replaced.
  • Per-brand infrastructure (separate Redis, separate Postgres, separate Docker stack).
  • Cross-brand player identity migration (e.g. promoting a player from one brand to another).
  • Cross-brand wallet transfer.
  • Per-brand SLA, billing, or quota management.
  • Per-brand frontend rendering. The brand_config slots are defined here; rendering is owned by the consuming frontend.

Event schema versioning across the brand_id rollout

The DomainEvent envelope schema_version is bumped from 1 to 2 when brand_id is added (Phase 1). Producers emit only schema_version = 2 events after the Phase 6 deploy. Consumers handle stream events as follows:

  • schema_version >= 2: read brand_id from the envelope; apply MULTI_BRAND_ENFORCEMENT semantics (observe: log + count + use envelope brand; enforce: reject envelope/target mismatch).
  • schema_version == 1 and MULTI_BRAND_ENFORCEMENT == 'observe': treat the event as belonging to the default brand. Increment event_legacy_schema_total{stream,consumer}.
  • schema_version == 1 and MULTI_BRAND_ENFORCEMENT == 'enforce': reject the event and surface to supervision. By the time enforce is flipped, no schema_version == 1 events should remain in the stream (drained during the soak window).
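
Illustrative sketch of the consumer-side rules in the list above. The envelope is modelled as a dict; DEFAULT_BRAND_ID, EventRejected, event_brand_mismatch_total, and resolve_event_brand are hypothetical names, and treating off like observe for legacy events is an assumption, since the list only specifies observe and enforce.

```python
from collections import Counter
from typing import Mapping, Optional

DEFAULT_BRAND_ID = 1  # hypothetical id of the backfilled default brand

# Stand-ins for Prometheus counters.
event_legacy_schema_total: Counter = Counter()
event_brand_mismatch_total: Counter = Counter()  # hypothetical counter name


class EventRejected(Exception):
    """The event must be surfaced to supervision instead of being applied."""


def resolve_event_brand(
    envelope: Mapping[str, object],
    mode: str,
    stream: str,
    consumer: str,
    target_brand_id: Optional[int] = None,
) -> int:
    version = int(envelope.get("schema_version", 1))
    if version >= 2:
        brand_id = int(envelope["brand_id"])
        if target_brand_id is not None and brand_id != target_brand_id:
            if mode == "enforce":
                raise EventRejected("envelope brand does not match target brand")
            # observe: count the mismatch, then continue with the envelope brand.
            event_brand_mismatch_total[(stream, consumer)] += 1
        return brand_id
    # Legacy schema_version == 1 event: no brand_id in the envelope.
    if mode == "enforce":
        raise EventRejected("schema_version 1 event received in enforce mode")
    event_legacy_schema_total[(stream, consumer)] += 1
    return DEFAULT_BRAND_ID
```

A non-zero event_legacy_schema_total for any (stream, consumer) pair during the soak window indicates that schema_version == 1 events have not yet drained.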

At Phase 6 producer-deploy time, each producer historically emitted one BRAND_AWARE_PUBLISHING_BEGINS sentinel event so consumers could log the exact stream offset where the schema bump took effect. (Historical, removed in T6-E4 / T7-B2: producers no longer emit the sentinel because the event_legacy_schema_total{stream,consumer} counter described above covers the same alarm surface and the sentinel itself was a no-op in every stream consumer. The contract above remains the source of truth for schema-version handling; the sentinel is no longer part of the rollout.)

Staged enforcement via a runtime flag

Brand-scoped enforcement (JWT/domain mismatch rejection at gateway, cross-brand command rejection in wallet_service, agent allow-list rejection in agent_service, and X-Brand-Id requirement in every internal handler) is gated by a single runtime flag, MULTI_BRAND_ENFORCEMENT, with three modes:

  • off: brand resolution and forwarding still happen, but no rejection. Used during very early rollout windows when downstream services have not yet learned X-Brand-Id propagation.
  • observe (default): every brand-scoped check runs and increments a brand_resolution_failed_total{reason,service} counter on mismatch but does not reject the request. Used during staged rollout while old JWTs drain and downstream callers are migrated to forward X-Brand-Id.
  • enforce: brand-scoped checks reject hard with the documented error envelope. The terminal state for production.

The flag is read by gateway, wallet_service, agent_service, and every service whose handlers add brand-scoped enforcement. It is set as a process environment variable; flipping the mode requires a service restart (no hot-reload). The flag value is exposed in the /health response so operators can see each service's current enforcement mode at a glance.
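
Illustrative sketch of reading the flag and applying the three modes to a single brand-scoped check. enforcement_mode and brand_check are hypothetical helpers, the Counter stands in for brand_resolution_failed_total{reason,service}, and counting failures in enforce mode as well as in observe mode is an assumption.

```python
import os
from collections import Counter
from typing import Mapping

VALID_MODES = ("off", "observe", "enforce")

# Stand-in for brand_resolution_failed_total{reason,service}.
brand_resolution_failed_total: Counter = Counter()


def enforcement_mode(env: Mapping[str, str] = os.environ) -> str:
    """Read MULTI_BRAND_ENFORCEMENT; observe is the default."""
    mode = env.get("MULTI_BRAND_ENFORCEMENT", "observe").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"invalid MULTI_BRAND_ENFORCEMENT value: {mode!r}")
    return mode


def brand_check(passed: bool, mode: str, reason: str, service: str) -> bool:
    """Return True if the request may proceed under the given mode."""
    if passed or mode == "off":
        return True
    brand_resolution_failed_total[(reason, service)] += 1
    # observe: counted but allowed; enforce: counted and rejected.
    return mode != "enforce"
```

A service would call enforcement_mode() once at startup and keep the result, which matches the no-hot-reload rule: changing the variable has no effect until the process restarts.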

The flag is flipped to enforce only after the rollout runbook's soak window confirms zero legitimate mismatches in observe mode and every in-flight JWT has either rotated or expired beyond the configured JWT_EXPIRE_MIN.

Observability carries brand

Structured logs include brand_id. Prometheus metrics that are player-scoped, wallet-scoped, rolling-scoped, promotion-scoped, or game-scoped carry a brand label. Outbox events and cross-service messages carry brand_id in the payload.

Consequences

Positive:

  • Multiple brands can run on the same servers_v2 runtime without sharing player identity, money state, or business configuration.
  • The same agent can serve multiple brands without duplicating the agent aggregate.
  • The same player account string can register independently in two brands without collision.
  • Wallet topology and policy can diverge between brands, allowing brand-specific betting and payout behavior.
  • Per-brand configuration removes hard-coded global assumptions about rolling ratios, cashback rates, payment channels, and i18n.
  • Existing domain-routing behavior in gateway and player_service is preserved; brand becomes one additional projection on the same lookup.
  • Staff complexity exits the system entirely.

Negative:

  • Every brand-scoped table requires a schema migration, a backfill, and a uniqueness-constraint change.
  • Every brand-scoped query must be audited to filter by brand_id; row-level isolation is enforced by code, not by the database.
  • JWT format changes require coordinated deployment across gateway and every downstream service.
  • Game provider account namespacing changes the outbound account string; provider-side bet history and reconciliation must be reviewed before cutover.
  • Removing staff also removes any in-flight back-office identity capability until a future replacement is decided.

Constraints

  • One database, one Redis. Schema-per-brand and database-per-brand are out of scope.
  • Every brand-scoped query must include brand_id in its filter. Helpers, repositories, and ORM access must make brand filtering hard to forget.
  • A request whose resolved brand_id cannot be determined from domain or callback context is rejected at the edge.
  • A JWT whose brand_id does not match the request domain's brand is rejected at gateway.
  • A wallet command that targets a player, bucket, coupon grant, or ledger row whose brand_id does not match the request brand is rejected before any money mutation.
  • Game provider credentials and integration secrets are not duplicated per brand.
  • Outbound game-provider account names must be derived deterministically from brand_code and the player account string and must be reversible during callback handling.
  • A domain belongs to exactly one brand and to exactly one agent. Cross-brand and cross-agent domain reuse is forbidden. agent_domain keeps its existing surrogate guid BIGINT primary key (so existing FK references to agent_domain.guid survive); a composite UNIQUE(agent_id, brand_id, domain) is added, and the existing UNIQUE(domain) global constraint is retained. The retained global UNIQUE(domain) is what guarantees that a domain value appears at most once across all (agent_id, brand_id) combinations, and therefore under exactly one brand and one agent.
  • agent is global. agent_brand is the only place where brand membership for an agent is recorded. agent_setting PK becomes (agent_id, brand_id).
  • wallet_service remains the only money writer; brand isolation does not loosen ADR-005.
  • Back-office staff identity, role/menu management, and the legacy_admin_v2 and legacy_auth staff routes are removed by this change. They are not reintroduced under a per-brand permission model.

Follow-Up

  • Implement the multi-brand isolation spec and plan referenced above.
  • Backfill all existing rows into a single default brand before enabling any second brand in any environment.
  • Update stable docs to reflect brand scoping: architecture/data-ownership.md, architecture/domain-ownership.md, architecture/system-overview.md, architecture/http-entrypoints.md, architecture/event-catalog.md, architecture/service-catalog.md, architecture/deployment-topology.md, architecture/migration-readiness.md, and every service profile under docs/services/.
  • Promote the multi-brand rollout runbook to Ready only after local Docker validation, staging validation, and provider-side reconciliation review are complete.
  • Decide post-launch whether a back-office identity model returns; if it does, it must be designed against the brand boundary defined here, not against the removed staff model.

Implementation

The decision recorded in this ADR has been delivered across the following commits on main (Phases 1-15 of the multi-brand isolation plan; Phase 16 is the operator hard-flip and is tracked separately by the runbook):

  • Phase 1 -- shared contracts and models: 53639f23, 1eca859c, bc258a05.
  • Phase 2 -- migrations and default brand backfill: 304deec6, d01f85a9, c5d39d52, d7d0e134, 69c5d5a0.
  • Phase 3 -- domain-to-brand map and edge resolution (soft-fail): 8011f2d6, 030383d5, 586b6588.
  • Phase 4 -- player_service brand scoping: 288a5f18, 5730f74e, a410cbce.
  • Phase 5 -- brand-aware JWT issuance (RS256 with kid): 910ab341, 5c77fc81.
  • Phase 6 -- wallet_service brand scoping (per-brand topology, policy, and X-Brand-Signature verification): 63db9df4, c218e91d.
  • Phase 7 -- rolling_service brand scoping: e16d0974.
  • Phase 8 -- promotion_service brand scoping: c56438b5.
  • Phase 9 -- game_service brand scoping (outbound namespacing + inbound reverse-parse): 8da5848a.
  • Phase 10 -- recon_service brand scoping: 2a6e9728.
  • Phase 11 -- agent_service global agent + brand allow-list: 06e853f2.
  • Phase 12 -- admin_service brand catalog, brand_config, agent_brand CRUD, and staff route removal: dd8a0007.
  • Phase 13 -- observability wiring (counters, gauges, structured-log brand enrichment): e2d25f0f.
  • Phase 14 -- local Docker two-brand E2E harness: b73cc22c.
  • Phase 15 -- stable-doc refresh, spec promotion (Approved), runbook promotion (Ready): delivered by the same commit that adds this section.

Phase 16 (the runtime flip from MULTI_BRAND_ENFORCEMENT=observe to enforce) is an operator action driven by docs/runbooks/multi-brand/phase-16-hard-flip-checklist.md and the operator script under servers_v2/tools/multi_brand_backfill/flip_enforcement.py.