Gateway

Status: Active

Date: 2026-04-28

Owners: Platform Backend

Last Verified Commit: 7a579730

Runtime: API only

Purpose

gateway is the player-facing HTTP edge for servers_v2. It keeps old player route families externally stable while routing requests to the appropriate owner service.

Primary Entry Points

  • player route families under /api/v1/*
  • legacy root-level player aliases such as /user/*, /game/*, /finance/*
  • provider pass-through routes handled by provider_routes.py

Dependencies

  • Redis
  • JWT secret
  • downstream URLs for:
    • player_service
    • wallet_service
    • game_service
    • rolling_service
    • promotion_service
  • admin_service and agent_service are not gateway downstreams; their frontends connect directly to those services.

Background Work

None.

Owned Data

None.

gateway is an edge and compatibility layer, not a domain owner.

Events

Emits:

  • none

Consumes:

  • none

Health

  • exposes /health
  • exposes /metrics in Prometheus text format through the shared observability middleware
  • binds or generates X-Request-ID, echoes it on responses, and forwards it to downstream services
  • emits structured JSON logs with service=gateway and request_id
  • no database dependency
  • request correctness relies on downstream health and middleware behavior
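The request-ID handling listed above can be sketched roughly as follows. This is a minimal illustration, not the shared observability middleware itself: the helper names `resolve_request_id` and `log_line` are hypothetical, and the use of a UUID4 hex string for generated IDs is an assumption, since the source does not state the ID format.

```python
import json
import uuid


def resolve_request_id(headers: dict[str, str]) -> str:
    """Reuse the inbound X-Request-ID if present, otherwise generate one."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    inbound = lowered.get("x-request-id", "").strip()
    # Assumption: generated IDs are UUID4 hex strings.
    return inbound or uuid.uuid4().hex


def log_line(request_id: str, message: str) -> str:
    """Render one structured JSON log line carrying the documented fields."""
    return json.dumps(
        {"service": "gateway", "request_id": request_id, "message": message}
    )


rid = resolve_request_id({"X-Request-Id": "abc-123"})
downstream_headers = {"X-Request-ID": rid}  # forwarded to downstream services
response_headers = {"X-Request-ID": rid}    # echoed back to the client
```

Binding the ID once and reusing it for the response, the downstream call, and the log line is what lets one request be traced across services.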

Key Env Vars

  • REDIS_URL
  • SECRET_KEY
  • MULTI_BRAND_ENFORCEMENT — required; one of off / observe / enforce. Production target is enforce post-Phase-16. Drives multi_brand_enforcement_mode{service="gateway"} gauge and the JWT-vs-domain reject decision in BrandEnforcementMiddleware.
  • BRAND_SIGNING_KEY: required in production. HMAC-SHA256 secret used to sign X-Brand-Signature on outbound brand-scoped wallet writes (gateway is one of the 6 wallet write callers).
  • INTERNAL_SERVICE_TOKEN_GATEWAY: required in production; per-caller token presented to every downstream service. Mandatory once PER_CALLER_TOKEN_REQUIRED=on is set on the consumer.
  • JWT_PUBLIC_KEYS: required in production; JSON map of kid -> public_key for RS256 verification. Hot-reloadable via the jwt_public_keys_changed Redis pub/sub channel (T4-D-I4); spikes in gateway_jwt_unknown_kid_total indicate stale verifier caches.
  • JWT_KID — currently-active kid; used for diagnostic /health reporting. Verifier accepts every kid in JWT_PUBLIC_KEYS.
  • PER_CALLER_TOKEN_REQUIRED=on is the Phase 16 target on the consumer side; gateway is a caller, so this env var is consulted only by the downstream services it calls.
  • INTERNAL_SERVICE_TOKEN — legacy single-shared-token; deprecated. Phase 16 release gate requires the bare variant to be absent.
  • PLAYER_SERVICE_URL
  • WALLET_SERVICE_URL
  • GAME_SERVICE_URL
  • ROLLING_SERVICE_URL
  • PROMOTION_SERVICE_URL
  • RATE_LIMIT_PER_MINUTE
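As an illustration of how two of these variables might be validated at startup, here is a minimal sketch. The function names are hypothetical, and the gateway's real settings loader is not shown in this document; the only facts taken from the list above are that MULTI_BRAND_ENFORCEMENT is required with three allowed values and that JWT_PUBLIC_KEYS is a JSON map of kid -> public key.

```python
import json

VALID_MODES = {"off", "observe", "enforce"}


def load_enforcement_mode(env: dict[str, str]) -> str:
    """Read MULTI_BRAND_ENFORCEMENT; it is required and must be a known mode."""
    mode = env.get("MULTI_BRAND_ENFORCEMENT")
    if mode not in VALID_MODES:
        raise ValueError(
            f"MULTI_BRAND_ENFORCEMENT must be one of {sorted(VALID_MODES)}, got {mode!r}"
        )
    return mode


def load_public_keys(env: dict[str, str]) -> dict[str, str]:
    """Parse JWT_PUBLIC_KEYS as a JSON object mapping kid -> public key."""
    raw = env.get("JWT_PUBLIC_KEYS", "")
    keys = json.loads(raw) if raw else {}
    # Assumption: each public key is carried as a string value.
    if not isinstance(keys, dict) or not all(isinstance(v, str) for v in keys.values()):
        raise ValueError("JWT_PUBLIC_KEYS must be a JSON object of kid -> key string")
    return keys


# Placeholder values; during a rotation overlap the map carries both kids.
env = {
    "MULTI_BRAND_ENFORCEMENT": "observe",
    "JWT_PUBLIC_KEYS": '{"kid-old": "PEM-OLD", "kid-new": "PEM-NEW"}',
}
mode = load_enforcement_mode(env)
keys = load_public_keys(env)
```

Failing fast on an unknown mode avoids a pod silently running with enforcement disabled because of a typo.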

Multi-Brand Constraints

Per ADR-009:

  • player-facing routes resolve brand_id from the request domain via the existing _extract_domain helper plus the Redis maps domain:agent:{host} and domain:level:{host}; both maps now resolve to a value carrying brand_id
  • a request whose domain does not resolve to a brand is rejected at the edge
  • the resolved brand is attached to request.state.brand_id and forwarded to all downstream services as the X-Brand-Id header
  • after login, JWT carries a brand_id claim; gateway checks jwt.brand_id == request.state.brand_id and behaves per MULTI_BRAND_ENFORCEMENT: observe logs + counts mismatches, enforce hard-rejects with the documented error envelope. The current mode is included in /health response payload
  • gateway does not accept a brand override from request body, query, or inbound X-Brand-Id header from external clients (any inbound X-Brand-Id is stripped before the resolved value is injected)
  • X-Brand-Id must be > 0 (T6-E1): the BrandContext builder rejects values <= 0 (and any non-numeric value) by treating them as missing. Previously a 0 or negative value silently passed through and downstream services treated it as "brand 0" -- the legacy default. Callers that need to operate without a brand must omit the header entirely; they must NOT send X-Brand-Id: 0
  • _extract_domain precedence: Origin first, then Host as fallback; trusting these headers assumes a TLS-terminating load balancer upstream
  • JWT signing uses RS256 with public key bundled in gateway for verification; verifier rejects unknown kid, alg=none, and any HS-family algorithm. Private signing key is held by player_service, agent_service, and (post-T6-B1) admin_service
  • internal calls to downstream services use per-caller tokens (INTERNAL_SERVICE_TOKEN_GATEWAY); brand-scoped wallet write commands carry X-Brand-Signature HMAC over (gateway, brand_id, request_id, timestamp) using BRAND_SIGNING_KEY
  • legacy player routes inject domain into the body the same way as today; the brand projection is layered on top, not replaced
  • POST /api/v1/game/launch performs a best-effort provider allow-list precheck from Redis key brand:provider_allowlist:{brand_id} when present. game_service remains authoritative and rechecks brand_provider_config from PostgreSQL before minting any provider launch URL
  • POST /api/v1/game/launch is JWT-protected and is the only supported player launch entrypoint. The gateway overwrites any client-supplied player_id with the verified JWT player id, forwards the resolved X-Brand-Id, and adds internal-service headers to the backing game_service /integration/launch call; direct account-only or caller-supplied-player-id launch requests are not part of the public contract.
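Two of the rules above, the X-Brand-Id > 0 check (T6-E1) and the X-Brand-Signature HMAC, can be sketched as follows. This is an illustration under stated assumptions, not the BrandContext builder or the gateway's signing code: the function names are hypothetical, and the "|" field separator and hex encoding of the digest are assumptions, since the source only names the signed fields and the HMAC-SHA256 algorithm.

```python
import hashlib
import hmac
from typing import Optional


def parse_brand_id(raw: Optional[str]) -> Optional[int]:
    """Return a positive brand id, or None when the header is missing or invalid."""
    if raw is None:
        return None
    text = raw.strip()
    # Plain ASCII digits only: signs, decimals, and empty strings count as missing.
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    # 0 is rejected too, so "brand 0" can never reach downstream services.
    return value if value > 0 else None


def sign_brand_write(
    key: bytes, caller: str, brand_id: int, request_id: str, timestamp: int
) -> str:
    """HMAC-SHA256 over (caller, brand_id, request_id, timestamp)."""
    # Assumption: fields are joined with "|" and the digest is hex-encoded.
    message = "|".join([caller, str(brand_id), request_id, str(timestamp)]).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


signature = sign_brand_write(b"demo-key", "gateway", 7, "req-1", 1700000000)
# A receiver would recompute the signature and compare in constant time.
is_valid = hmac.compare_digest(
    signature, sign_brand_write(b"demo-key", "gateway", 7, "req-1", 1700000000)
)
```

Treating invalid values as missing, rather than coercing them, means a malformed header takes the same rejection path as an absent one.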

Security

Player session cross-check (T1-D-C1, post Phase 4E)

The gateway validates every authenticated request against the active player session row written by player_service at login. The lookup contract:

  1. The brand-prefixed key user-session:brand:{brand_id}:{account} (matching player_service.app.services.cache_keys.session_key) is the only session key that gateway reads. The brand_id source is, in order:
    • request.state.brand_id (set by BrandResolutionMiddleware)
    • the verified JWT brand_id claim (Phase 5B and later tokens)
  2. If neither source carries a brand_id, gateway returns the v1 AUTH_REQUIRED envelope. It does not fall back to the pre-Phase-4E user-session-{account} key.
  3. If the brand-prefixed lookup misses, the gateway returns the v1 AUTH_REQUIRED envelope (HTTP 200 + {"status":3,...}) AND increments gateway_jwt_session_missing_total so dashboards can split "JWT-valid-but-session-gone" from "JWT-bad/expired" (the latter returns from the verifier earlier and never reaches the session lookup).

The legacy fallback and its gateway_session_legacy_key_used_total cutover metric were removed by the Phase 4E dual-read sunset cleanup. Player sessions minted before brand-prefixed session keys are no longer accepted.
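The lookup contract can be sketched as below. This is a simplified model, not the gateway's code: a plain dict stands in for Redis, a `Counter` stands in for the Prometheus metric, and `check_session` and `pick_brand_id` are hypothetical names. Returning `False` represents the caller responding with the v1 AUTH_REQUIRED envelope.

```python
from collections import Counter
from typing import Mapping, Optional


def session_key(brand_id: int, account: str) -> str:
    """Brand-prefixed session key, the only key the gateway reads."""
    return f"user-session:brand:{brand_id}:{account}"


def pick_brand_id(state_brand_id: Optional[int], jwt_claims: Mapping) -> Optional[int]:
    """Prefer the middleware-resolved brand, then the verified JWT claim."""
    if state_brand_id is not None:
        return state_brand_id
    return jwt_claims.get("brand_id")


def check_session(
    store: Mapping[str, str],
    account: str,
    state_brand_id: Optional[int],
    jwt_claims: Mapping,
    counters: Counter,
) -> bool:
    """Return True when an active brand-prefixed session exists."""
    brand_id = pick_brand_id(state_brand_id, jwt_claims)
    if brand_id is None:
        return False  # no brand source: AUTH_REQUIRED, no legacy-key fallback
    if session_key(brand_id, account) not in store:
        counters["gateway_jwt_session_missing_total"] += 1
        return False  # JWT valid but session gone: AUTH_REQUIRED
    return True


# The legacy-style key is present on purpose to show it is never consulted.
store = {"user-session:brand:2:alice": "session", "user-session-alice": "legacy"}
counters: Counter = Counter()
ok = check_session(store, "alice", 2, {}, counters)
wrong_brand = check_session(store, "alice", None, {"brand_id": 3}, counters)
no_brand = check_session(store, "alice", None, {}, counters)
```

In this model only the brand-prefixed miss increments the counter, which mirrors the dashboard split described in step 3.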

JWT verifier hot-reload (T4-D-I4)

Each gateway pod caches a JwtVerifier on app.state.jwt_verifier so the per-request hot path does not re-parse JWT_PUBLIC_KEYS on every call. That cache is invalidated by Redis pub/sub on the jwt_public_keys_changed channel:

  1. Operator distributes the new public key to every pod (env-var refresh + SIGHUP / process-manager reload). The new JWT_PUBLIC_KEYS map should contain BOTH the old and new kid for the duration of the overlap window.
  2. Operator runs tools/jwt_kid/rotate_jwt_kid.sh (or redis-cli PUBLISH jwt_public_keys_changed <ts>) once. Every gateway pod is subscribed at startup; on receiving the message it sets app.state.jwt_verifier = None and the next request rebuilds with the freshly-loaded settings.
  3. After the overlap window, drop the old kid from the env and re-run the rotate script.

The old "rolling restart per kid rotation" workflow is no longer required.
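The cache-and-invalidate behavior in the steps above can be modeled with a small lazy cache. This is a sketch only: `VerifierCache` is a hypothetical stand-in for the `app.state.jwt_verifier` slot, the "verifier" here is just a frozen set of accepted kids, and the Redis subscription that would call `on_keys_changed` is omitted.

```python
from typing import Callable, Optional


class VerifierCache:
    """Lazily built verifier that is dropped when a key-change message arrives."""

    def __init__(self, build: Callable[[], object]) -> None:
        self._build = build
        self._verifier: Optional[object] = None

    def get(self) -> object:
        # Hot path: rebuild only when the cache is empty.
        if self._verifier is None:
            self._verifier = self._build()
        return self._verifier

    def on_keys_changed(self, _message: str) -> None:
        # Pub/sub handler: clear the cache; the next request rebuilds it.
        self._verifier = None


# Stand-in for the settings that hold the parsed JWT_PUBLIC_KEYS map.
settings_keys = {"kid-old": "PEM-OLD"}
cache = VerifierCache(lambda: frozenset(settings_keys))

before = cache.get()                   # built from the old key set
settings_keys["kid-new"] = "PEM-NEW"   # step 1: new key distributed to the pod
stale = cache.get()                    # still the cached verifier
cache.on_keys_changed("1700000000")    # step 2: rotate script publishes once
after = cache.get()                    # rebuilt with both kids
```

The window between the key refresh and the publish is where a pod can still reject a token signed with the new kid.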

gateway_jwt_unknown_kid_total{service="gateway"} is emitted whenever the verifier rejects a token because its kid header is not in the active key set. A spike during rotation indicates pods that haven't seen the pub/sub message yet (transient; it should drain within the cache-invalidation latency); a sustained non-zero rate indicates either a stale signer or attempted forgery.

Middleware-order structural assertion (T7-B5 / T8-D2)

Starlette executes middleware in LIFO order relative to add_middleware calls — middleware added LATER runs EARLIER on the request. The brand resolver MUST run before JWT auth so JWT can read the brand-prefixed Redis session key (otherwise JWT silently falls back to the legacy unprefixed key, defeating cross-brand session isolation).

tests/test_middleware_order.py::test_brand_resolution_runs_before_jwt_auth asserts the contract structurally by introspecting app.user_middleware after import: BrandResolutionMiddleware must appear at an earlier index than JWTAuthMiddleware (= registered later = runs first). A future PR that reorders add_middleware calls in app/main.py and breaks the contract fails this test immediately rather than at runtime via mysterious cross-brand session leaks.
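The ordering rule can be illustrated with a toy model. This is not Starlette and not the project's test: `ToyApp` is a hypothetical class that only mimics insert-at-front registration, so that the index relationship the structural assertion relies on is visible without any dependencies.

```python
class ToyApp:
    """Toy model of LIFO registration: each add_middleware inserts at index 0."""

    def __init__(self) -> None:
        self.user_middleware: list[str] = []

    def add_middleware(self, name: str) -> None:
        # The most recently added middleware ends up first in the list.
        self.user_middleware.insert(0, name)

    def request_order(self) -> list[str]:
        # Index 0 is the outermost layer, so it sees the request first.
        return list(self.user_middleware)


app = ToyApp()
app.add_middleware("JWTAuthMiddleware")          # registered first, runs later
app.add_middleware("BrandResolutionMiddleware")  # registered later, runs first

order = app.request_order()
brand_runs_first = order.index("BrandResolutionMiddleware") < order.index(
    "JWTAuthMiddleware"
)
```

Swapping the two `add_middleware` calls flips `brand_runs_first` to `False`, which is the regression the structural test is meant to catch.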

Tests

  • cd servers_v2/gateway && uv run pytest
  • key suites:
    • tests/test_player_route_contracts.py
    • tests/test_legacy_routes.py
    • tests/test_middleware.py
    • tests/test_internal_auth_headers.py
    • tests/test_brand_session_lookup.py -- T1-D-C1 brand-aware session cross-check + dual-read fallback + cross-brand isolation