ADR-009: Multi-Brand Domain-Routed Isolation
Status
Accepted
Date
2026-04-27
Owners
- Platform Backend
- Player Domain
- Wallet Domain
- Agent Domain
Affected Services
- gateway
- player_service
- wallet_service
- rolling_service
- promotion_service
- game_service
- agent_service
- admin_service
- recon_service
Related Docs
- docs/specs/multi-brand/2026-04-27-multi-brand-isolation-spec.md
- docs/plans/multi-brand/2026-04-27-multi-brand-isolation-plan.md
- docs/runbooks/multi-brand/multi-brand-isolation-rollout.md
- docs/architecture/data-ownership.md
- docs/architecture/domain-ownership.md
- docs/architecture/system-overview.md
- docs/architecture/http-entrypoints.md
- docs/architecture/event-catalog.md
- docs/architecture/service-catalog.md
- docs/architecture/deployment-topology.md
- docs/architecture/migration-readiness.md
- docs/services/gateway.md
- docs/services/player-service.md
- docs/services/wallet-service.md
- docs/services/rolling-service.md
- docs/services/promotion-service.md
- docs/services/game-service.md
- docs/services/agent-service.md
- docs/services/admin-service.md
- docs/services/recon-service.md
Context
servers_v2 was designed under a single-brand assumption. There is no brand,
tenant, site, or operator field anywhere in the data model, no brand context in
request flow, no brand dimension in JWT claims, and no per-brand configuration
surface. The only existing tenant-shaped artifact is a hard-coded project
integer in admin_service/app/tasks/tag.py that switches between two product
labels; it is not a real isolation mechanism.
The platform now needs to host multiple distinct player-facing brands at the
same time on the same servers_v2 runtime. A brand is a self-contained
operating product: its own player base, its own wallet topology and policy, its
own promotion rules, its own configuration, its own domain footprint.
Brands must be isolated for player identity, money state, rolling and settlement, and configuration. Brands may share infrastructure (one PostgreSQL, one Redis, one Docker stack) and may share global integrations such as game provider credentials. Operators (agents) are not exclusive to a brand -- a single agent must be able to serve more than one brand based on an admin-managed allow list.
The existing domain-based identification mechanism in gateway and
player_service (domain:agent:{host} and domain:level:{host} Redis maps)
already routes requests by Host header and resolves them to an agent or a
player level. That mechanism is the natural extension point for brand routing
and is preserved.
A no-staff posture is intentional. Back-office staff identity, role/menu
management, and the legacy_admin_v2 and legacy_auth staff compatibility
paths are being retired together with this change; multi-brand staff
permissions are explicitly out of scope and will not be reintroduced.
Decision
servers_v2 will adopt a domain-routed, single-database, brand-scoped
isolation model.
Brand entity
A brand aggregate is introduced as the top-level isolation boundary. Every
brand has a stable short brand_code (used in outbound game-provider account
namespacing and in observability labels), a default currency, and an enabled
status. A brand_config aggregate stores per-brand configuration values that
may override or replace the existing global configuration surface (rolling
ratios, cashback and rebate rates, payment channel selection, withdrawal
limits, risk thresholds, i18n overrides, theming, customer-service links,
email templates).
Domain-based brand identification
gateway resolves the brand for every external player request from the Host
or Origin header using the same Redis-backed domain map already used to
resolve agent and level. The Redis values for domain:agent:{host} and
domain:level:{host} are extended so that every entry binds a domain to
exactly one brand_id. A domain belongs to one brand. The resolved
brand_id is attached to request.state and forwarded to all downstream
services via an X-Brand-Id request header.
After login, JWT claims must carry brand_id. gateway rejects any request
whose JWT brand does not match the brand resolved from the request domain.
Provider callbacks entering game_service resolve brand_id by reverse
parsing the outbound game-provider account namespace described below.
Internal service-to-service callers must propagate X-Brand-Id.
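The edge resolution and JWT cross-check described above can be sketched as follows. This is a minimal illustration, not the gateway's actual code: the domain map is an in-memory dict standing in for Redis, and the value shape (a mapping with agent_id and brand_id), the function names, and the exception type are all assumptions.

```python
from typing import Mapping, Optional


class BrandResolutionError(Exception):
    """Raised when a request cannot be tied to exactly one brand."""


def normalize_host(host: str) -> str:
    # Drop an optional port and lowercase the name.
    # Simplification: IPv6 literals are not handled here.
    return host.split(":", 1)[0].strip().lower()


def resolve_brand_id(host: str, domain_map: Mapping[str, Mapping[str, int]]) -> int:
    # Hypothetical value shape: {"agent_id": ..., "brand_id": ...} per domain key.
    entry = domain_map.get(f"domain:agent:{normalize_host(host)}")
    if entry is None or "brand_id" not in entry:
        raise BrandResolutionError(f"no brand bound to host {host!r}")
    return entry["brand_id"]


def check_jwt_brand(resolved_brand_id: int,
                    jwt_claims: Optional[Mapping[str, object]]) -> None:
    # Requests without a JWT (before login) only need the domain-resolved brand.
    if jwt_claims is None:
        return
    if jwt_claims.get("brand_id") != resolved_brand_id:
        raise BrandResolutionError("JWT brand does not match request domain brand")
```

In a real handler the resolved value would then be attached to request.state and forwarded as X-Brand-Id; that wiring is omitted here.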
Single database, brand-scoped rows
servers_v2 keeps a single PostgreSQL database. Brand isolation is enforced
at the row level by adding a non-nullable brand_id column to every
brand-scoped table and changing existing uniqueness constraints to be
brand-scoped. Cross-brand reads are not allowed in domain code paths;
admin-side queries that intentionally aggregate across brands must opt in
explicitly and must be limited to non-money projections.
Schema-per-brand and database-per-brand are explicitly rejected for this iteration. They are out of scope here and may be reconsidered later if regulatory or scale requirements change.
Player identity is per-brand
The same player account string may register independently in two different
brands. The uniqueness key on player becomes (brand_id, account). A
player created under brand A and a player created under brand B with the same
account string are two separate player_id values with two separate wallet
states.
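The brand-scoped uniqueness key can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database purely so the example runs anywhere; production uses PostgreSQL, and the table definition here is a reduced, hypothetical stand-in for the real player table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE player (
        player_id INTEGER PRIMARY KEY,
        brand_id  INTEGER NOT NULL,
        account   TEXT    NOT NULL,
        UNIQUE (brand_id, account)  -- uniqueness is scoped to the brand
    )
    """
)


def register(brand_id: int, account: str) -> int:
    """Insert a player row and return its generated player_id."""
    cur = conn.execute(
        "INSERT INTO player (brand_id, account) VALUES (?, ?)",
        (brand_id, account),
    )
    return cur.lastrowid


# The same account string registers independently under two brands.
alice_in_brand_a = register(1, "alice")
alice_in_brand_b = register(2, "alice")

# A second "alice" inside the same brand violates the composite key.
try:
    register(1, "alice")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```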
Agents are global, brand allow-listed
Agents are not partitioned by brand. The agent aggregate stays global (no
brand_id column). A new agent_brand join aggregate stores the
admin-managed allow list of (agent_id, brand_id) pairs. Per-agent settings
that vary by brand -- registration mode flags in agent_setting, the
agent-owned agent_domain rows -- gain a brand_id column and brand-scoped
uniqueness so the same agent can host different settings and domains across
the brands it serves.
Player accounts owned by an agent are still per-brand, because players are per-brand.
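As a rough sketch of the allow-list model, the check below treats agent_brand as a set of (agent_id, brand_id) pairs and agent_setting as a mapping keyed by the same pair. The in-memory structures and function names are assumptions for illustration only.

```python
from typing import Dict, Optional, Set, Tuple

AgentBrand = Tuple[int, int]  # (agent_id, brand_id)


def agent_may_serve(agent_id: int, brand_id: int,
                    agent_brand: Set[AgentBrand]) -> bool:
    # The agent itself is global; membership lives only in agent_brand.
    return (agent_id, brand_id) in agent_brand


def setting_for(agent_id: int, brand_id: int,
                agent_brand: Set[AgentBrand],
                agent_setting: Dict[AgentBrand, dict]) -> Optional[dict]:
    # Settings are looked up per (agent_id, brand_id), and only for allowed pairs.
    if not agent_may_serve(agent_id, brand_id, agent_brand):
        return None
    return agent_setting.get((agent_id, brand_id))
```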
Wallet topology and policy are per-brand
Wallet topology and wallet policy documents are owned per brand. The active
topology resolution and policy resolution are scoped to the requesting
brand. The uniqueness key on wallet_topology and wallet_policy documents
becomes brand-scoped. Wallet command paths must resolve the brand from
request context and must reject a command that would touch a wallet record
belonging to a different brand than the request brand.
All wallet-owned aggregates carry brand_id: wallet_account,
wallet_bucket, wallet_bucket_type, wallet_coupon_grant,
wallet_bet_authorization, wallet_ledger, wallet_transfer,
wallet_idempotency, wallet_outbox, wallet_inbox,
wallet_dead_letter.
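The cross-brand guard on wallet commands might look like the sketch below. The WalletAccount fields beyond brand_id, the exception name, and the helper name are hypothetical; the point is only that the comparison happens before any money mutation.

```python
from dataclasses import dataclass


class CrossBrandCommandError(Exception):
    """A wallet command targeted a record owned by another brand."""


@dataclass(frozen=True)
class WalletAccount:
    wallet_account_id: int
    player_id: int
    brand_id: int


def assert_same_brand(request_brand_id: int, account: WalletAccount) -> None:
    # Run this check before any ledger write or balance change.
    if account.brand_id != request_brand_id:
        raise CrossBrandCommandError(
            f"wallet_account {account.wallet_account_id} belongs to brand "
            f"{account.brand_id}, request brand is {request_brand_id}"
        )
```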
Game provider credentials are global; account namespacing is per-brand
Game provider API keys, merchant codes, and callback secrets are not
duplicated per brand. game_service continues to hold one set of credentials
per provider. To make the same provider serve multiple brands without
account collisions, outbound calls use a brand-prefixed account namespace.
The outbound account is composed from brand_code and the player account
string; provider callbacks reverse-parse the brand and the player account
from the namespaced identifier and route the resulting bet, settlement, or
rollback into the correct brand.
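One way to picture the namespacing and its reverse parse is shown below. The ADR does not fix the exact composition format, so this sketch assumes plain concatenation of brand_code and account with no delimiter, and parses by matching against the known brand codes. Both function names are hypothetical.

```python
from typing import Iterable, Tuple


def compose_provider_account(brand_code: str, account: str) -> str:
    # Assumption: the outbound account is brand_code followed by the account.
    return f"{brand_code}{account}"


def parse_provider_account(provider_account: str,
                           brand_codes: Iterable[str]) -> Tuple[str, str]:
    """Return (brand_code, account), or raise if the brand is missing or ambiguous."""
    matches = [code for code in brand_codes if provider_account.startswith(code)]
    if len(matches) != 1:
        raise ValueError(
            f"cannot resolve brand for {provider_account!r}: {len(matches)} candidates"
        )
    code = matches[0]
    return code, provider_account[len(code):]
```

Under this assumed format, the parse is unambiguous only when no brand_code is a prefix of another, which is why a prefix-disjointness rule on brand_code matters.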
Configuration is per-brand
Per-brand configuration replaces the implicit global configuration surface
for any value that may legitimately differ between brands. Resolution order
is: per-brand brand_config value, then a documented global default. Values
that must remain global (game provider credentials, infrastructure URLs,
shared rate limits) are explicitly listed in the spec.
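The two-step resolution order can be sketched as a small lookup helper. The dictionary shapes and the function name are assumptions; a sentinel is used so that a brand can deliberately set a falsy value such as 0 or an empty string.

```python
from typing import Any, Mapping

_MISSING = object()


def resolve_config(key: str, brand_id: int,
                   brand_config: Mapping[int, Mapping[str, Any]],
                   global_defaults: Mapping[str, Any]) -> Any:
    # 1. The per-brand brand_config value wins when present.
    value = brand_config.get(brand_id, {}).get(key, _MISSING)
    if value is not _MISSING:
        return value
    # 2. Otherwise fall back to the documented global default.
    if key in global_defaults:
        return global_defaults[key]
    raise KeyError(f"no brand value or global default for {key!r}")
```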
Staff is removed
This change removes every back-office staff-coupled compatibility path
from admin_service. The deleted files are:
- admin_service/app/api/routes/legacy_admin_v2.py
- admin_service/app/api/routes/legacy_auth.py
- admin_service/app/api/routes/legacy_agents_v2.py (mounted under legacy_admin_v2)
- admin_service/app/api/routes/legacy_agent_withdrawals_v2.py (mounted under legacy_admin_v2)
- admin_service/app/api/routes/legacy_meta_v2.py (mounted under legacy_admin_v2)
- admin_service/app/api/routes/legacy_recon.py (depends on staff identity helpers in legacy_auth)
- admin_service/app/api/routes/legacy_web_content.py (depends on staff identity helpers in legacy_auth)
The supporting staff identity helpers (_authenticate_legacy_admin,
_refresh_legacy_admin_token, _json_with_token, and any module that
becomes unimported after these deletions) are removed.
Multi-brand staff permission models are not introduced. Back-office
surfaces that survive operate without per-staff identity for the duration
of this change. Any consumer of the deleted routes (notably parts of
bo/admin that depended on legacy admin auth, the legacy recon
compatibility surface, or the legacy web CMS surface) will lose those
endpoints; replacements are out of scope here.
Authentication and integrity posture
Multi-brand isolation is only as strong as the perimeter and the service-to-service trust model. The following decisions are part of this ADR:
admin_service authentication. With the staff layer removed,
admin_service is no longer self-authenticating. Day-0 production
posture is network-level isolation only: admin_service is
reachable only via VPN ingress from a documented IP allow-list, and
its load-balancer rejects all traffic from outside that allow-list.
Every brand, brand_config, and agent_brand write must record an
operator identity drawn from the LB-injected SSO header
(X-Operator-Id); requests without that header are rejected at
admin_service's edge.
Addendum (P1-3, post day-0). The signed-operator-JWT cross-check
(delivered in P1-3) is now the canonical security boundary for
operator identity in admin_service; network-level isolation remains
as defence in depth. require_operator_id
(admin_service/app/api/deps.py) reads the admin_id claim from
the verified admin JWT and hard-rejects any request whose
X-Operator-Id LB header disagrees (HTTP 403 +
admin_operator_id_mismatch_total{reason="header_vs_jwt"}) or whose
JWT lacks the admin_id claim entirely (HTTP 403 +
{reason="jwt_missing_claim"}). Operator identity is now a JWT-bound
contract, not a header-asserted one; the LB allow-list still exists
to keep the surface off the public internet but is no longer the
sole authenticator. The audit row written for every money-mutating
admin route captures the JWT-bound admin_id, not the LB header,
so a header spoof cannot launder writes through admin even if the
LB allow-list itself were misconfigured. See the rollout runbook's
"Delivered hardening" → "Admin operator-id JWT cross-check" section
and the diagnosis playbook entry for
admin_operator_id_mismatch_total.
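The cross-check in the addendum can be approximated as below. This is not the body of require_operator_id; it is a simplified sketch that takes already-verified JWT claims and the raw header value. The exception class is hypothetical, and treating a missing header as a disagreement is an assumption.

```python
from typing import Mapping, Optional


class OperatorIdentityError(Exception):
    """Maps to HTTP 403; `reason` mirrors the metric label."""

    status_code = 403

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


def resolve_operator_id(verified_claims: Mapping[str, object],
                        header_operator_id: Optional[str]) -> str:
    admin_id = verified_claims.get("admin_id")
    if admin_id is None:
        raise OperatorIdentityError("jwt_missing_claim")
    # Assumption: an absent X-Operator-Id header counts as a mismatch.
    if header_operator_id is None or str(admin_id) != header_operator_id:
        raise OperatorIdentityError("header_vs_jwt")
    # The JWT-bound value, not the header, is what the audit row should record.
    return str(admin_id)
```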
Internal service tokens. Internal-service trust moves from a
single shared INTERNAL_SERVICE_TOKEN to per-caller-service
tokens (INTERNAL_SERVICE_TOKEN_{CALLER} env var per consuming
service: INTERNAL_SERVICE_TOKEN_GATEWAY,
INTERNAL_SERVICE_TOKEN_AGENT, INTERNAL_SERVICE_TOKEN_ADMIN,
INTERNAL_SERVICE_TOKEN_RECON, INTERNAL_SERVICE_TOKEN_GAME,
INTERNAL_SERVICE_TOKEN_PROMOTION, INTERNAL_SERVICE_TOKEN_ROLLING).
Each consumer service knows the set of caller tokens it accepts. A
compromise of one caller's token cannot impersonate another caller.
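A consumer-side check for per-caller tokens could look like this sketch. The mapping from caller name to expected token and the function name are assumptions; hmac.compare_digest is used so the comparison does not leak timing information.

```python
import hmac
from typing import Mapping, Optional


def identify_caller(presented_token: str,
                    accepted_tokens: Mapping[str, str]) -> Optional[str]:
    """Return the caller name whose token matches, or None if none does."""
    caller = None
    presented = presented_token.encode("utf-8")
    # Compare against every accepted token rather than returning early.
    for name, expected in accepted_tokens.items():
        if hmac.compare_digest(presented, expected.encode("utf-8")):
            caller = name
    return caller
```

Because each caller has its own token, the consumer learns which service is calling, and a leaked gateway token does not authenticate as agent_service.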
Signed X-Brand-Id for brand-scoped internal mutations. For
brand-scoped wallet write commands, the X-Brand-Id header is paired
with X-Brand-Signature: HMAC_SHA256(brand_signing_key, caller_service|brand_id|request_id|timestamp). The brand signing key
is a separate env var (BRAND_SIGNING_KEY) shared only between the
caller services that are authorized to issue brand-scoped commands and
the consumer service (wallet_service). In enforce mode, wallet_service
rejects brand-scoped mutations whose signature is missing or invalid;
in observe mode it only logs them. Brand-scoped read paths and
event-driven brand assertions use the same scheme.
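The signature formula can be sketched with the standard library. The message layout follows the formula given above; the hex encoding of the digest, the timestamp-skew check, and the function names are assumptions not stated in this ADR.

```python
import hashlib
import hmac


def sign_brand_header(key: bytes, caller_service: str, brand_id: int,
                      request_id: str, timestamp: int) -> str:
    # Message layout: caller_service|brand_id|request_id|timestamp
    message = f"{caller_service}|{brand_id}|{request_id}|{timestamp}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()  # hex is an assumption


def verify_brand_header(key: bytes, signature: str, caller_service: str,
                        brand_id: int, request_id: str, timestamp: int,
                        now: int, max_skew_seconds: int = 300) -> bool:
    # Assumption: stale timestamps are refused to limit replay.
    if abs(now - timestamp) > max_skew_seconds:
        return False
    expected = sign_brand_header(key, caller_service, brand_id, request_id, timestamp)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```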
JWT signing. JWT signing uses RS256 with the private key held
exclusively by player_service, agent_service, and (post-T6-B1)
admin_service. Other services verify with the public key. JWT
headers carry a kid (key ID); the verifier rejects any JWT whose
kid is not in the documented active-key allow list. The alg
header parameter is restricted to RS256; alg=none and HS-family algorithms
are rejected unconditionally. Key rotation is a documented runbook
procedure with overlap (old kid stays accepted for the duration of
JWT_EXPIRE_MIN plus a buffer).
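The header checks on alg and kid can be sketched without a JWT library, as below. This covers only the pre-verification gate; the RS256 signature itself must still be verified with a proper JWT library and the public key selected by kid. The function and exception names are hypothetical.

```python
import base64
import json
from typing import Collection


class JwtHeaderRejected(Exception):
    """The token header failed the alg/kid gate."""


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def check_jwt_header(token: str, active_kids: Collection[str]) -> str:
    """Return the kid if the header passes, otherwise raise JwtHeaderRejected."""
    try:
        header = json.loads(_b64url_decode(token.split(".", 1)[0]))
    except ValueError as exc:  # bad base64, bad UTF-8, or bad JSON
        raise JwtHeaderRejected("malformed header") from exc
    if not isinstance(header, dict):
        raise JwtHeaderRejected("malformed header")
    # Only RS256 passes; "none" and HS-family values fail this comparison.
    if header.get("alg") != "RS256":
        raise JwtHeaderRejected("alg not allowed")
    kid = header.get("kid")
    if not isinstance(kid, str) or kid not in active_kids:
        raise JwtHeaderRejected("kid not in active-key allow list")
    return kid
```

During a rotation overlap, active_kids would simply contain both the old and the new key ID.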
Post-T6 amendment. The original draft listed only player_service and
agent_service as RS256 issuers. T6-B1 migrated admin_service off the
legacy HS256 SECRET_KEY flow; admin operator JWTs are now RS256-signed
too. T7-B4 added the assert_jwt_public_keys_configured boot guard so
production deployments fail to start when JWT_PUBLIC_KEYS is unset
(preventing the in-process test-keypair fallback from silently
accepting forged admin tokens).
brand_code prefix-disjointness. admin_service's brand-create
endpoint rejects any brand_code that is a prefix of an existing
brand_code, that has an existing brand_code as a prefix, or that
collides with the prefix of any existing player.account value
already namespaced and sent to a game provider. This is a write-time
validator backed by a database index scan; it prevents the
reverse-parse ambiguity in game_service callbacks.
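The three rejection rules can be sketched as a pure function over in-memory inputs. The real validator is described as backed by a database index scan; here the existing codes and namespaced accounts are passed in as iterables, the function name is hypothetical, and the third rule is interpreted as "the candidate is a prefix of an already-issued namespaced account", which is an assumption.

```python
from typing import Iterable, List


def brand_code_conflicts(candidate: str,
                         existing_codes: Iterable[str],
                         namespaced_accounts: Iterable[str]) -> List[str]:
    """Return human-readable conflict reasons; an empty list means acceptable."""
    if not candidate:
        return ["brand_code must not be empty"]
    reasons: List[str] = []
    for code in existing_codes:
        if code.startswith(candidate):
            reasons.append(f"{candidate!r} is a prefix of existing brand_code {code!r}")
        elif candidate.startswith(code):
            reasons.append(f"existing brand_code {code!r} is a prefix of {candidate!r}")
    for account in namespaced_accounts:
        if account.startswith(candidate):
            reasons.append(f"{candidate!r} collides with namespaced account {account!r}")
    return reasons
```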
Non-Goals
The following are explicitly out of scope for this decision:
- Schema-per-brand or database-per-brand isolation.
- Per-brand game provider credentials (credentials remain global; only the outbound account is namespaced).
- A back-office staff identity model. The staff layer is removed by this change and is not replaced.
- Per-brand infrastructure (separate Redis, separate Postgres, separate Docker stack).
- Cross-brand player identity migration (e.g. promoting a player from one brand to another).
- Cross-brand wallet transfer.
- Per-brand SLA, billing, or quota management.
- Per-brand frontend rendering. The brand_config slots are defined here; rendering is owned by the consuming frontend.
Event schema versioning across the brand_id rollout
The DomainEvent envelope schema_version is bumped from 1 to 2
when brand_id is added (Phase 1). Producers emit only schema_version = 2 events after Phase 6 deploy. Consumers handle stream events as
follows:
- schema_version >= 2: read brand_id from the envelope; apply MULTI_BRAND_ENFORCEMENT semantics (observe: log + count + use envelope brand; enforce: reject envelope/target mismatch).
- schema_version == 1 and MULTI_BRAND_ENFORCEMENT == 'observe': treat the event as belonging to the default brand. Increment event_legacy_schema_total{stream,consumer}.
- schema_version == 1 and MULTI_BRAND_ENFORCEMENT == 'enforce': reject the event and surface to supervision. By the time enforce is flipped, no schema_version == 1 events should remain in the stream (drained during the soak window).
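The consumer rules above can be sketched as a single decision function. The counters are plain in-process stand-ins for Prometheus metrics, the mismatch counter name and the function signature are hypothetical, and two behaviours are assumptions: a missing schema_version is treated as 1, and any mode other than enforce handles legacy events the way observe does.

```python
from collections import Counter
from typing import Mapping, Optional

# Stand-ins for Prometheus counters; keys are (stream, consumer).
event_legacy_schema_total: Counter = Counter()
event_brand_mismatch_total: Counter = Counter()  # hypothetical metric name


class EventRejected(Exception):
    """The event must be rejected and surfaced to supervision."""


def brand_for_event(envelope: Mapping[str, object], mode: str, stream: str,
                    consumer: str, default_brand_id: int,
                    target_brand_id: Optional[int] = None) -> object:
    version = envelope.get("schema_version", 1)  # assumption: missing means legacy
    if version >= 2:
        brand_id = envelope["brand_id"]
        if target_brand_id is not None and target_brand_id != brand_id:
            if mode == "enforce":
                raise EventRejected("envelope/target brand mismatch")
            if mode == "observe":
                # observe: count the mismatch, then use the envelope brand
                event_brand_mismatch_total[(stream, consumer)] += 1
        return brand_id
    if mode == "enforce":
        raise EventRejected("schema_version 1 event received in enforce mode")
    event_legacy_schema_total[(stream, consumer)] += 1
    return default_brand_id
```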
Historically, at Phase 6 producer-deploy time each producer emitted
one BRAND_AWARE_PUBLISHING_BEGINS sentinel event so consumers could
log the exact stream offset where the schema bump took effect. The
sentinel was removed in T6-E4 / T7-B2: the
event_legacy_schema_total{stream,consumer} counter described above
covers the same alarm surface, and the sentinel itself was a no-op in
every stream consumer. The contract above remains the source of truth
for schema-version handling; the sentinel is no longer part of the
rollout.
Staged enforcement via a runtime flag
Brand-scoped enforcement (JWT/domain mismatch rejection at gateway,
cross-brand command rejection in wallet_service, agent allow-list
rejection in agent_service, and X-Brand-Id requirement in every
internal handler) is gated by a single runtime flag,
MULTI_BRAND_ENFORCEMENT, with three modes:
- off: brand resolution and forwarding still happen, but no rejection. Used during very early rollout windows when downstream services have not yet learned X-Brand-Id propagation.
- observe (default): every brand-scoped check runs and increments a brand_resolution_failed_total{reason,service} counter on mismatch but does not reject the request. Used during staged rollout while old JWTs drain and downstream callers are migrated to forward X-Brand-Id.
- enforce: brand-scoped checks reject hard with the documented error envelope. The terminal state for production.
The flag is read by gateway, wallet_service, agent_service, and
every service whose handlers add brand-scoped enforcement. It is set as
a process environment variable; flipping the mode requires a service
restart (no hot-reload). The flag value is exposed in the /health
response so operators can see each service's current enforcement mode
at a glance.
The flag is flipped to enforce only after the rollout runbook's soak
window confirms zero legitimate mismatches in observe mode and every
in-flight JWT has either rotated or expired beyond the configured
JWT_EXPIRE_MIN.
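The three modes can be sketched as an enum read once from the process environment, plus a helper that decides what a failed brand check does. The counter is an in-process stand-in for the Prometheus metric, and the class and function names are hypothetical.

```python
import os
from collections import Counter
from enum import Enum
from typing import Mapping


class EnforcementMode(str, Enum):
    OFF = "off"
    OBSERVE = "observe"
    ENFORCE = "enforce"


def read_mode(environ: Mapping[str, str] = os.environ) -> EnforcementMode:
    # observe is the default; an unknown value raises ValueError at startup.
    return EnforcementMode(environ.get("MULTI_BRAND_ENFORCEMENT", "observe"))


# Stand-in for brand_resolution_failed_total{reason,service}.
brand_resolution_failed_total: Counter = Counter()


def should_reject(mode: EnforcementMode, reason: str, service: str) -> bool:
    """Called when a brand-scoped check has failed; True means reject the request."""
    if mode is EnforcementMode.OBSERVE:
        brand_resolution_failed_total[(reason, service)] += 1
        return False
    return mode is EnforcementMode.ENFORCE
```

Reading the value once at import or startup matches the no-hot-reload rule: changing the mode means restarting the process.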
Observability carries brand
Structured logs include brand_id. Prometheus metrics that are
player-scoped, wallet-scoped, rolling-scoped, promotion-scoped, or
game-scoped carry a brand label. Outbox events and cross-service messages
carry brand_id in the payload.
Consequences
Positive:
- Multiple brands can run on the same servers_v2 runtime without sharing player identity, money state, or business configuration.
- The same agent can serve multiple brands without duplicating the agent aggregate.
- The same player account string can register independently in two brands without collision.
- Wallet topology and policy can diverge between brands, allowing brand-specific betting and payout behavior.
- Per-brand configuration removes hard-coded global assumptions about rolling ratios, cashback rates, payment channels, and i18n.
- Existing domain-routing behavior in gateway and player_service is preserved; brand becomes one additional projection on the same lookup.
- Staff complexity exits the system entirely.
Negative:
- Every brand-scoped table requires a schema migration, a backfill, and a uniqueness-constraint change.
- Every brand-scoped query must be audited to filter by brand_id; row-level isolation is enforced by code, not by the database.
- JWT format changes require coordinated deployment across gateway and every downstream service.
- Game provider account namespacing changes the outbound account string; provider-side bet history and reconciliation must be reviewed before cutover.
- Removing staff also removes any in-flight back-office identity capability until a future replacement is decided.
Constraints
- One database, one Redis. Schema-per-brand and database-per-brand are out of scope.
- Every brand-scoped query must include brand_id in its filter. Helpers, repositories, and ORM access must make brand filtering hard to forget.
- A request whose resolved brand_id cannot be determined from domain or callback context is rejected at the edge.
- A JWT whose brand_id does not match the request domain's brand is rejected at gateway.
- A wallet command that targets a player, bucket, coupon grant, or ledger row whose brand_id does not match the request brand is rejected before any money mutation.
- Game provider credentials and integration secrets are not duplicated per brand.
- Outbound game-provider account names must be derived deterministically from brand_code and the player account string and must be reversible during callback handling.
- A domain belongs to exactly one brand and to exactly one agent. Cross-brand and cross-agent domain reuse is forbidden. agent_domain keeps its existing surrogate guid BIGINT primary key (so existing FK references to agent_domain.guid survive); a composite UNIQUE(agent_id, brand_id, domain) is added, and the existing UNIQUE(domain) global constraint is retained. The combination guarantees that the same domain value cannot appear twice under any (agent_id, brand_id) combination.
- agent is global. agent_brand is the only place where brand membership for an agent is recorded.
- agent_setting PK becomes (agent_id, brand_id).
- wallet_service remains the only money writer; brand isolation does not loosen ADR-005.
- Back-office staff identity, role/menu management, and the legacy_admin_v2 and legacy_auth staff routes are removed by this change. They are not reintroduced under a per-brand permission model.
Follow-Up
- Implement the multi-brand isolation spec and plan referenced above.
- Backfill all existing rows into a single default brand before enabling any second brand in any environment.
- Update stable docs to reflect brand scoping: architecture/data-ownership.md, architecture/domain-ownership.md, architecture/system-overview.md, architecture/http-entrypoints.md, architecture/event-catalog.md, architecture/service-catalog.md, architecture/deployment-topology.md, architecture/migration-readiness.md, and every service profile under docs/services/.
- Promote the multi-brand rollout runbook to Ready only after local Docker validation, staging validation, and provider-side reconciliation review are complete.
- Decide post-launch whether a back-office identity model returns; if it does, it must be designed against the brand boundary defined here, not against the removed staff model.
Implementation
The decision recorded in this ADR has been delivered across the following
commits on main (Phases 1-15 of the multi-brand isolation plan; Phase 16
is the operator hard-flip and is tracked separately by the runbook):
- Phase 1 -- shared contracts and models: 53639f23, 1eca859c, bc258a05.
- Phase 2 -- migrations and default brand backfill: 304deec6, d01f85a9, c5d39d52, d7d0e134, 69c5d5a0.
- Phase 3 -- domain-to-brand map and edge resolution (soft-fail): 8011f2d6, 030383d5, 586b6588.
- Phase 4 -- player_service brand scoping: 288a5f18, 5730f74e, a410cbce.
- Phase 5 -- brand-aware JWT issuance (RS256 with kid): 910ab341, 5c77fc81.
- Phase 6 -- wallet_service brand scoping (per-brand topology, policy, and X-Brand-Signature verification): 63db9df4, c218e91d.
- Phase 7 -- rolling_service brand scoping: e16d0974.
- Phase 8 -- promotion_service brand scoping: c56438b5.
- Phase 9 -- game_service brand scoping (outbound namespacing + inbound reverse-parse): 8da5848a.
- Phase 10 -- recon_service brand scoping: 2a6e9728.
- Phase 11 -- agent_service global agent + brand allow-list: 06e853f2.
- Phase 12 -- admin_service brand catalog, brand_config, agent_brand CRUD, and staff route removal: dd8a0007.
- Phase 13 -- observability wiring (counters, gauges, structured-log brand enrichment): e2d25f0f.
- Phase 14 -- local Docker two-brand E2E harness: b73cc22c.
- Phase 15 -- stable-doc refresh, spec promotion (Approved), runbook promotion (Ready): delivered by the same commit that adds this section.
Phase 16 (the runtime flip from MULTI_BRAND_ENFORCEMENT=observe to
enforce) is operator action driven by
docs/runbooks/multi-brand/phase-16-hard-flip-checklist.md and the
operator script under
servers_v2/tools/multi_brand_backfill/flip_enforcement.py.