
Admin Service

Status

Active

Date

2026-04-28

Owners

  • Platform Backend

Last Verified Commit

934d47f4 (T16-A admin_service Critical+Important leaks closed + T16-C classifier dynamic-SQL upgrade + T16-D doc sync).

T16-A logs.py brand-scope contract

servers_v2/admin_service/app/api/routes/logs.py::_query_log_table is the shared paginator for every operator log read. Pre-T16-A the helper built WHERE 1=1 and appended user-supplied filters with no brand predicate, so any operator could read every brand's player_balance_log / agent_balance_log / player_login_log / agent_login_log history by guessing pagination params (CR9-D-Crit-1).

Post-T16-A:

  • _query_log_table(session, table, body, *, brand_id, ...) — the brand_id argument is required.
  • The first predicate the helper appends is unconditionally conditions.append("brand_id = :brand_id"), so the brand_id pin is always present BEFORE any user-supplied filter.
  • The 4 real-table endpoints (player_balance_logs, agent_balance_logs, player_login_logs, agent_login_logs) call resolve_brand_id and brand_required_envelope, and refuse the request when X-Brand-Id is missing.
  • The 9 endpoints whose target tables are not yet provisioned (rebate_log, cashback_log, etc.) get the same brand-required plumbing for future safety.
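The contract can be illustrated with a pure query builder. This is a minimal sketch, not the real _query_log_table: the name build_log_query, the ALLOWED_TABLES and ALLOWED_FILTERS allow-lists, equality-only filters, ORDER BY id, and the page/page_size parameters are all assumptions. It shows the two properties the contract relies on: brand_id is keyword-only and required, and conditions.append("brand_id = :brand_id") runs before any caller filter. The :name binds assume SQLAlchemy text()-style parameters.

```python
from typing import Any, Mapping

# Hypothetical allow-lists for this sketch; the real helper's tables and filters may differ.
ALLOWED_TABLES = frozenset(
    {"player_balance_log", "agent_balance_log", "player_login_log", "agent_login_log"}
)
ALLOWED_FILTERS = frozenset({"player_id", "agent_id", "change_type"})


def build_log_query(
    table: str,
    filters: Mapping[str, Any],
    *,
    brand_id: int,
    page: int = 1,
    page_size: int = 50,
) -> tuple[str, dict[str, Any]]:
    """Return (sql, params) for one brand-pinned, paginated log read."""
    if table not in ALLOWED_TABLES:
        # The table name is interpolated into the SQL text, so it must be allow-listed.
        raise ValueError(f"unknown log table: {table}")
    if brand_id <= 0:
        raise ValueError("a positive brand_id is required")

    conditions: list[str] = []
    # The brand predicate is always the first condition, before any caller filter.
    conditions.append("brand_id = :brand_id")
    params: dict[str, Any] = {"brand_id": brand_id}

    for column, value in filters.items():
        if column not in ALLOWED_FILTERS:
            # This also rejects brand_id, so a caller cannot overwrite the pinned bind value.
            raise ValueError(f"filter not allowed: {column}")
        if value is None:
            continue  # unset filters are skipped rather than matched against NULL
        conditions.append(f"{column} = :{column}")
        params[column] = value

    params["limit"] = page_size
    params["offset"] = (page - 1) * page_size
    sql = (
        f"SELECT * FROM {table} WHERE {' AND '.join(conditions)} "
        "ORDER BY id DESC LIMIT :limit OFFSET :offset"
    )
    return sql, params
```

Rejecting brand_id as a caller-supplied filter matters here: if a user key could reuse the :brand_id bind name, it would silently replace the pinned value in params even though the predicate text still looks correct.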

The static RW classifier could not have caught this helper because the SQL was built dynamically (f"SELECT ... FROM {table} WHERE {where} ..."). T16-C added a FIX-DYNAMIC class to the classifier that scans for the WHERE 1=1 builder + f-string template patterns and flags them when the surrounding scope has no recognised conditions.append("brand_id ...") idiom — so a regression in this contract (or any sibling helper that crops up later) breaks CI the same way a static missing-predicate would.

Runtime

API + standalone worker

Purpose

admin_service is the only intended back-office HTTP edge in servers_v2. It owns top-info websocket delivery, admin-side operational APIs, selected aggregation adapters such as recon top-info, and the admin-facing control surface used by bo/admin.

It also owns the admin-facing wallet topology and wallet policy configuration edge, while delegating all money mutation execution to wallet_service.

Primary Entry Points

  • /api/v1/*
  • /api/v1/wallet/*
  • /ws
  • /internal/meta/*

Removed by ADR-009 (now respond 404):

  • /v2/* compatibility slice for legacy admin_server cutover (served by deleted legacy_admin_v2.py)
  • /api/admin/user/* (served by deleted legacy_auth.py)
  • /api/admin/pushbullet/* and /api/admin/shooter/* (served by deleted legacy_recon.py)
  • /api/admin/web/* (served by deleted legacy_web_content.py)

Dependencies

  • PostgreSQL
  • Redis
  • JWT secret
  • AES and password-salt config
  • wallet_service
  • rolling_service
  • promotion_service
  • recon_service
  • Cloudflare R2-compatible object storage
  • internal service token for trusted internal sync paths

Background Work

admin_worker runs scheduler-backed operational jobs, including:

  • statistics jobs
  • level checks
  • expiration jobs
  • attendance processing
  • tag tasks
  • optional Vera sync

Owned Data

  • admin edge compatibility semantics
  • top-info websocket aggregation
  • admin-facing operational query and control surfaces

Important boundary:

  • admin_service orchestrates and approves
  • admin_service does not become a second money writer
  • wallet config writes (/api/v1/wallet/topologies/*, /api/v1/wallet/policies/*) are control-plane operations only; money movement stays in wallet_service

Legacy migration note:

  • /v2/* compatibility is being restored incrementally for no-diff admin cutover
  • the current source of truth for that work is docs/specs/migrations/2026-04-22-admin-v2-route-compatibility.md

Events

Emits:

  • no primary Redis stream ownership is documented

Consumes:

  • no primary Redis stream consumption is documented

Internal signaling:

  • internal websocket sync hooks are used to refresh top-info after meaningful state changes

Health

  • API health endpoints are exposed without auth
  • admin_worker health depends on heartbeat plus scheduler and job-health readiness

Key Env Vars

  • DATABASE_URL
  • REDIS_URL
  • SECRET_KEY: legacy session-cookie material only post-T6-B1. Admin JWTs no longer use HS256, so this value is consulted by the legacy session-cookie path (and a small set of pre-RS256 fixtures) but never by decode_admin_token. Keep it set in dev/test for fixture compatibility; production deployments may leave it unset once all session-cookie consumers are retired.
  • JWT_PRIVATE_KEY: required in production (T6-B1); RSA private key (PEM) used by create_admin_token to mint RS256 admin JWTs. Provision per env via Vault path secret/auth/admin/jwt-private-key/<env>.
  • JWT_PUBLIC_KEYS: required in production (T6-B1); JSON {kid: pem} map of accepted RSA public keys used by decode_admin_token to verify inbound admin JWTs. Multiple kids allow key rotation without downtime. T7-B4 added the assert_jwt_public_keys_configured boot guard: an empty value in a production-grade runtime refuses to boot, because otherwise the in-process test-keypair singleton fallback would silently accept attacker-forged tokens.
  • JWT_KID: required in production (T6-B1); the kid value create_admin_token stamps into the JWT header so verifiers know which entry of JWT_PUBLIC_KEYS to use.
  • PASSWORD_SALT
  • AES_KEY: required in production for admin-owned PII helpers; 32-byte AES-256 key. New writes use versioned AES-256-GCM with a random nonce (aesgcm:v1: prefix), while reads keep legacy AES-256-CBC compatibility for historical rows.
  • AES_IV: required in production until the legacy CBC backfill is complete; 16-byte IV used only to decrypt historical AES-256-CBC rows. New writes do not reuse a static IV.
  • MULTI_BRAND_ENFORCEMENT: required; one of off / observe / enforce. The production target is enforce post-Phase-16. Drives the multi_brand_enforcement_mode{service="admin_service"} gauge. Admin itself does not gate operator-supplied brand_id (admin operators are authenticated via JWT and trusted to scope their own queries); the flag governs admin's outbound enforcement posture for downstream calls.
  • BRAND_SIGNING_KEY: required in production; HMAC-SHA256 secret used to sign X-Brand-Signature on outbound brand-scoped wallet writes from admin-side flows (admin is one of the 6 wallet write callers).
  • INTERNAL_SERVICE_TOKEN_ADMIN: required in production; per-caller token presented to wallet/rolling/promotion/recon when admin calls them.
  • PER_CALLER_TOKEN_REQUIRED: the Phase 16 target is on for consumers. Admin is only a caller; the setting on the consumer side (wallet, rolling, etc.) is what advances the gate.
  • WALLET_SERVICE_URL
  • ROLLING_SERVICE_URL
  • PROMOTION_SERVICE_URL
  • RECON_SERVICE_URL
  • INTERNAL_SERVICE_TOKEN: legacy single shared token; deprecated. The Phase 16 release gate requires the bare variant to be absent.
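The boot-time rules in this list can be summarised as one validation pass. This is a sketch under stated assumptions, not admin_service's settings code: check_admin_env is a hypothetical name, AES_IV is omitted because its requirement depends on backfill progress, and the JWT_KID membership check is an assumed extra (admin verifies tokens it mints, so the minting kid should be accepted). It mirrors the T7-B4 guard by refusing an empty JWT_PUBLIC_KEYS map in production.

```python
import json
from typing import Mapping

ENFORCEMENT_MODES = frozenset({"off", "observe", "enforce"})
# Variables listed above as "required in production"; AES_IV is left out because
# its requirement depends on the state of the CBC backfill.
PRODUCTION_REQUIRED = (
    "JWT_PRIVATE_KEY",
    "JWT_PUBLIC_KEYS",
    "JWT_KID",
    "AES_KEY",
    "BRAND_SIGNING_KEY",
    "INTERNAL_SERVICE_TOKEN_ADMIN",
)


def check_admin_env(env: Mapping[str, str], *, production: bool) -> list[str]:
    """Collect configuration problems; an empty list means the process may boot."""
    problems: list[str] = []
    mode = env.get("MULTI_BRAND_ENFORCEMENT", "")
    if mode not in ENFORCEMENT_MODES:
        problems.append(f"MULTI_BRAND_ENFORCEMENT must be off/observe/enforce, got {mode!r}")
    if not production:
        return problems

    for name in PRODUCTION_REQUIRED:
        if not env.get(name):
            problems.append(f"{name} is required in production")

    raw_keys = env.get("JWT_PUBLIC_KEYS")
    if raw_keys:
        try:
            keys = json.loads(raw_keys)
        except json.JSONDecodeError:
            problems.append("JWT_PUBLIC_KEYS is not valid JSON")
        else:
            # Mirrors the T7-B4 guard: an empty key map must never fall back to a test keypair.
            if not isinstance(keys, dict) or not keys:
                problems.append("JWT_PUBLIC_KEYS must be a non-empty {kid: pem} object")
            elif env.get("JWT_KID") and env["JWT_KID"] not in keys:
                # Assumed extra check: the kid admin mints with should also be accepted.
                problems.append("JWT_KID has no entry in JWT_PUBLIC_KEYS")

    if "INTERNAL_SERVICE_TOKEN" in env:
        problems.append("bare INTERNAL_SERVICE_TOKEN must be absent (Phase 16 gate)")
    return problems
```

Collecting every problem before refusing to boot (for example, raising SystemExit with the joined list) lets an operator fix a misconfigured environment in one pass instead of one restart per missing variable.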

Known Migration Gap

admin_service is strong on core admin flows, but it is not yet the full replacement for all legacy middle_server modules.

Still unresolved for a full legacy cutover:

  • role and menu management
  • legacy config editing
  • i18n management
  • BI tooling
  • coin passthrough behavior

Multi-Brand Constraints

Per ADR-009:

  • admin_service runs behind VPN ingress with an IP allow-list; every write resolves an operator id from the admin JWT (admin_id claim, see app/services/auth.py::create_admin_token). The legacy X-Operator-Id header is still accepted for backward compatibility with the SSO LB but is cross-checked against the JWT claim — see Security below. Operator id is captured on every audit row. Network-isolation + JWT-bound operator id replaces the deleted staff identity layer
  • admin_service owns the brand catalog, the brand_config per-brand configuration aggregate, and brand_provider_config provider allow-list policy; brand-create rejects brand_code values that prefix-collide with existing brand codes or with legacy player.account strings already namespaced and sent to providers. The first single-provider PATCH /api/v1/brands/{brand_id}/providers/{provider_id} on a brand with no provider rows materializes the current globally-visible provider set before applying the edit, so that brand stops inheriting future provider additions until the policy is cleared or replaced
  • brand_config now has an operator-facing schema registry at GET /api/v1/brands/config/schema and a bulk import route at POST /api/v1/brands/{brand_id}/config/import. The import accepts {config: {...}}, {entries: [...]}, or the static server-config export shape under brand.config; known keys are normalized and validated before write, while dry-run returns the normalized payload without mutating data. The first seven operational customizations are implemented as brand-scoped policy keys: frontend_config, registration_policy, feature_policy, payment_policy, sms_policy, provider allow-list (brand_provider_config), and the existing reward/rolling scalar config (rebate_rate, cashback_rate, lossback_rate, rolling_completion_ratio, valid_odds_threshold). Runtime consumers are intentionally split by ownership: player_service exposes public brand settings and enforces registration/login/SMS policy; wallet_service enforces payment channel, limits, and maintenance policy; game_service enforces game list/launch switches in addition to provider allow-list; promotion_service enforces promotion feature switches and brand-scoped event/coupon reads/writes
  • Policy precedence is deliberate: feature_policy.*_enabled is the maintenance/kill-switch layer, while registration_policy and payment_policy are product/channel-level gates. The effective runtime decision is an AND across the relevant layers. For payment limits and Plisio, the payment_policy object wins over legacy scalar compatibility keys (deposit_min_amount, payment_channel_plisio_enabled, etc.); new server-config exports should emit only the policy object. payment_policy.deposit_maintenance_enabled and its maintenance time fields are tri-state: omitted/null falls back to legacy global_var, false or empty string is an explicit brand override
  • admin_service owns the write surface for agent_brand (which brands an agent may serve)
  • Public content rows (player_notice, banner, promotion, promotion_event) are brand-scoped in runtime reads. When enabling a new brand or migrating old global content, operators must confirm each existing row has the intended brand_id; one announcement or promotion no longer fans out to every brand automatically
  • Release note for frontend/ops: site_setting now also returns public brand config objects, Plisio support endpoints require X-Brand-Id, and gate_create/coin_create reject amounts outside the effective brand deposit range. Existing JS clients that ignore unknown fields continue to work, but strict clients and direct internal health checks should be updated before rollout. See docs/runbooks/multi-brand/2026-05-08-brand-customization-release-notes.md
  • the admin entry surface is brand-global only for catalog/global-control routes. Routes that read or write brand-scoped operational tables (player, player_deposit, player_withdraw, player_message, coupon, coupon_log, coupon_recycle_log, etc.) must resolve one operating brand from request.state.brand_id and apply it as a SQL predicate before touching tenant data. Callers should provide X-Brand-Id through the gateway/operator UI; if a brand-scoped route cannot resolve a positive brand id it returns the standard status=54 / brand context required envelope instead of defaulting to all brands. Downstream services still apply MULTI_BRAND_ENFORCEMENT semantics on the brand-id admin forwards, so a stale or wrong brand_id from admin is caught at the wallet/rolling/promotion edge via *_cross_brand_rejected_total. The audit row written for every money-mutating admin route captures brand_id so cross-brand admin actions are observable post-hoc
  • per-brand configuration writes are audited (timestamp, payload, prior value); per-brand topology and policy writes flow through wallet_service topology/policy CRUD with the brand pinned in the request
  • between Phase 2 and Phase 12, a temporary Postgres BEFORE INSERT trigger on agent auto-creates the matching agent_brand(agent_id, default_brand_id, 'enabled') row; from Phase 12 onward admin_service (and agent_service's agent-create path, if any) writes agent_brand explicitly, and the trigger is dropped by 0028_remove_agent_brand_autoseed_trigger.py
  • after every brand create / disable, admin_service publishes BRAND_CATALOG_CHANGED on Redis pub/sub so game_service reverse-parse caches invalidate within seconds; after every agent_brand write admin_service publishes AGENT_BRAND_CHANGED so agent_service allow-list caches invalidate; after every brand_provider_config write it publishes BRAND_PROVIDER_CHANGED and refreshes the Redis brand:provider_allowlist:{brand_id} cache used by gateway prechecks. That Redis snapshot stores configured provider ids; admin read APIs return effective ids after applying global provider.is_show
  • seven admin_service legacy route files are deleted by ADR-009: legacy_admin_v2.py, legacy_auth.py, legacy_agents_v2.py, legacy_agent_withdrawals_v2.py, legacy_meta_v2.py, legacy_recon.py, legacy_web_content.py; the supporting staff identity helpers are removed with them; previously served paths under /api/admin/user/*, /api/admin/pushbullet/*, /api/admin/shooter/*, /api/admin/web/rules/*, /api/admin/web/faq/*, and /api/admin/web/config/* now return 404; back-office routes that survive operate without per-staff identity for the duration of this change
  • BrandResolutionMiddleware (T6-B / T7-B6 / T8-D1). A pure-ASGI middleware (app/middleware/brand.py) resolves request.state.brand_id once per request so admin routes can read the brand consistently without re-parsing the header. Resolution order is (1) X-Brand-Id header — set by the gateway after T6-B2 or by the operator UI; (2) Redis domain map (domain:agent:{host} / domain:level:{host}) when admin_service is hit directly. The middleware itself is best-effort: a missing header AND Redis miss leaves brand_id = None. Brand-scoped route handlers are not best-effort; they call brand_required_envelope() and refuse the operation when brand_id is missing. Two T8-D1 hardenings: (a) the middleware skips /health, /healthz, /docs, /openapi.json, and /redoc so a tight liveness probe schedule does not amplify a Redis brownout into route latency; /auth/login is intentionally NOT skipped because login may legitimately need a brand mapping; (b) the previously-silent except Exception Redis path now emits brand_resolution_failed_total{service="admin_service",reason="redis_error"} so a Redis incident is alarmable instead of degrading silently.
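The split between the best-effort middleware and the strict route check can be sketched without an ASGI framework. This is a simplified illustration, not app/middleware/brand.py: domain_lookup is a hypothetical callable that hides the domain:agent:{host} / domain:level:{host} Redis reads (their stored value format is not described here), header names are assumed to be lower-cased already, a malformed X-Brand-Id is assumed to resolve to None, and the envelope's "message" field name is assumed; only status=54 comes from the text.

```python
from collections import Counter
from typing import Callable, Mapping, Optional

# Paths the T8-D1 middleware skips; /auth/login is deliberately not in this set.
SKIP_PATHS = frozenset({"/health", "/healthz", "/docs", "/openapi.json", "/redoc"})
# Stand-in for brand_resolution_failed_total{service="admin_service",reason=...}.
brand_resolution_failed: Counter = Counter()


def resolve_brand_id(
    path: str,
    headers: Mapping[str, str],
    host: str,
    domain_lookup: Callable[[str], Optional[int]],
) -> Optional[int]:
    """Best-effort resolution: X-Brand-Id header first, then the Redis domain map."""
    if path in SKIP_PATHS:
        return None
    raw = headers.get("x-brand-id")  # header names assumed already lower-cased
    if raw is not None:
        # Assumption: a malformed header resolves to None rather than falling through.
        return int(raw) if raw.strip().isdigit() else None
    try:
        return domain_lookup(host)
    except Exception:
        # Previously silent; now counted so a Redis incident is alarmable.
        brand_resolution_failed["redis_error"] += 1
        return None


def brand_required_envelope(brand_id: Optional[int]) -> Optional[dict]:
    """Return the refusal envelope for brand-scoped routes, or None to proceed."""
    if brand_id is not None and brand_id > 0:
        return None
    # status=54 comes from the text above; the "message" field name is assumed.
    return {"status": 54, "message": "brand context required"}
```

Keeping the middleware permissive and the route check strict means a Redis miss degrades only brand-scoped routes, which refuse, while catalog/global-control routes keep working.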

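The brand_config precedence rules (the AND across feature_policy kill-switches and product/channel gates, payment_policy winning over legacy scalar keys, and the tri-state maintenance fields) can be sketched as three small helpers. This is an illustration under assumptions, not the runtime consumers' code: the helper names are hypothetical, treating an unconfigured layer (None) as non-blocking is assumed, and field names such as deposit_maintenance_start are placeholders for the unnamed maintenance time fields.

```python
from typing import Any, Mapping, Optional


def gate_enabled(*layers: Optional[bool]) -> bool:
    """AND across layers; None means the layer is not configured and does not block."""
    return all(layer is None or bool(layer) for layer in layers)


def policy_value(
    payment_policy: Mapping[str, Any],
    policy_key: str,
    legacy_config: Mapping[str, Any],
    legacy_key: str,
) -> Any:
    """payment_policy wins over the legacy scalar compatibility key when it sets a value."""
    value = payment_policy.get(policy_key)
    return legacy_config.get(legacy_key) if value is None else value


def maintenance_value(payment_policy: Mapping[str, Any], key: str, global_value: Any) -> Any:
    """Tri-state: omitted or None falls back to global_var; False or "" is a brand override."""
    value = payment_policy.get(key)
    return global_value if value is None else value
```

The tri-state check compares against None explicitly because a plain truthiness test would treat an explicit false or empty-string override as "unset" and wrongly fall back to the global value.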
Security

Operator identity (P1-3)

require_operator_id (app/api/deps.py) is the canonical resolver for the operator id captured on every audit row. It enforces:

  1. JWT cross-check. Reads the admin_id claim from the verified admin JWT. If X-Operator-Id is also present and disagrees, the request is hard-rejected with 403 and admin_operator_id_mismatch_total{reason="header_vs_jwt"} is incremented. Operator identity is a security boundary, not a tenancy concern — there is no observe mode.
  2. JWT fallback. If X-Operator-Id is absent, the JWT admin_id is used. The header is deprecated; new callers should not send it.
  3. Legacy token rejection. If the JWT lacks an admin_id claim and a header is supplied, the request is rejected 403 with admin_operator_id_mismatch_total{reason="jwt_missing_claim"}. Such tokens cannot attribute writes to a real operator.
  4. Last-resort 400. If neither header nor JWT carries an operator id, the request is rejected 400 — no audit row could be attributed.
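The four rules can be expressed as a framework-free decision function. This is a sketch, not require_operator_id itself (which is a dependency in app/api/deps.py): resolve_operator_id and OperatorIdError are hypothetical names, a Counter stands in for the Prometheus metric, admin_id is assumed to be numeric, and an empty X-Operator-Id header is assumed to count as absent.

```python
from collections import Counter
from typing import Any, Mapping, Optional

# Stand-in for admin_operator_id_mismatch_total{reason=...}.
operator_id_mismatch: Counter = Counter()


class OperatorIdError(Exception):
    """Carries the HTTP status the route should return."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code


def resolve_operator_id(claims: Mapping[str, Any], header_value: Optional[str]) -> int:
    """Apply the four rules to verified JWT claims and the optional X-Operator-Id header."""
    jwt_id = claims.get("admin_id")
    header = (header_value or "").strip()  # assumption: an empty header counts as absent

    if header:
        if jwt_id is None:
            # Rule 3: a header cannot stand in for a missing admin_id claim.
            operator_id_mismatch["jwt_missing_claim"] += 1
            raise OperatorIdError(403, "admin JWT has no admin_id claim")
        if str(jwt_id) != header:
            # Rule 1: disagreement is always a hard reject; there is no observe mode.
            operator_id_mismatch["header_vs_jwt"] += 1
            raise OperatorIdError(403, "X-Operator-Id does not match admin JWT")
        return int(jwt_id)

    if jwt_id is not None:
        return int(jwt_id)  # Rule 2: the JWT claim alone is enough
    raise OperatorIdError(400, "no operator id in header or JWT")  # Rule 4
```

Checking the header only against an already-verified claim is what makes the header harmless: it can confirm the JWT but never override it.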

Money-write audit guarantees (P1-4)

Every admin endpoint that mutates a player or agent balance (directly or via wallet_service) writes an admin_audit row after the wallet/state mutation returns successfully. Routes covered:

  • player.py: POST /players/{id}/adjust → ADJUST_BALANCE
  • player_finance.py: POST /players/deposits/{id}/agree → APPROVE_DEPOSIT; POST /players/deposits/{id}/decline → DECLINE_DEPOSIT; POST /players/deposits/{id}/recycle → RECYCLE_DEPOSIT; POST /players/deposits/{id}/coin-agree and POST /player/deposit/coin/agree → APPROVE_COIN_DEPOSIT; POST /players/deposits/shooter-agree → APPROVE_DEPOSIT (per-item); POST /players/withdrawals/{id}/review-agree → REVIEW_AGREE_WITHDRAW; POST /players/withdrawals/{id}/review-decline → REVIEW_DECLINE_WITHDRAW; POST /players/withdrawals/{id}/pay-agree → PAY_AGREE_WITHDRAW; POST /players/withdrawals/{id}/pay-decline → PAY_DECLINE_WITHDRAW
  • agent_finance.py: POST /agents/withdrawals/{id}/agree → AGREE_AGENT_WITHDRAW; POST /agents/withdrawals/{id}/decline → DECLINE_AGENT_WITHDRAW; POST /agents/withdrawals/{id}/pay → PAY_AGENT_WITHDRAW
  • agent.py (T4-D-I3): POST /agents/{id}/reset-password → RESET_AGENT_PASSWORD; POST /agents/{id}/lock → LOCK_AGENT; POST /agents/{id}/unlock → UNLOCK_AGENT; POST /agents/{id}/update-commission → UPDATE_AGENT_COMMISSION; POST /agents/{id}/adjust-balance → ADJUST_AGENT_BALANCE
  • player.py (T4-D-I3): POST /players/{id}/edit-password → EDIT_PLAYER_PASSWORD; POST /players/{id}/edit-phone → EDIT_PLAYER_PHONE

Audit rows capture operator_id (from the resolver above), target_type (player/agent), target_id, brand_id (looked up from the affected row), before/after deltas (status, amount, transaction_id only — never full rows), reason (from the request body when present), and the request X-Request-Id. The shared writer is app/services/admin_audit.py::write_money_admin_audit.

If the wallet_service call fails, the audit row is not written — the action did not happen. If the audit insert fails after the money write committed (a partition or DB outage), the route still returns the wallet response (the money write cannot be rolled back) but logs at ERROR level and increments admin_audit_write_failed_total{route=<route>}. On-call must reconcile the audit gap manually.
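The ordering guarantee can be sketched as a wrapper around the wallet call and the audit write. This is a minimal illustration, not app/services/admin_audit.py::write_money_admin_audit: run_money_action is a hypothetical name, the wallet call and audit writer are passed in as plain callables, and a Counter stands in for admin_audit_write_failed_total.

```python
import logging
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("admin_audit_sketch")
# Stand-in for admin_audit_write_failed_total{route=...}.
audit_write_failed: Counter = Counter()


def run_money_action(route: str, wallet_call: Callable[[], T], write_audit: Callable[[T], None]) -> T:
    """Perform the wallet mutation, then audit it; never audit a mutation that failed."""
    # If the wallet call raises, the exception propagates and no audit row is written.
    result = wallet_call()
    try:
        write_audit(result)
    except Exception:
        # The money write already committed, so return its result and surface the gap.
        logger.error("admin audit write failed on %s after a committed money write", route, exc_info=True)
        audit_write_failed[route] += 1
    return result
```

Swallowing the audit exception is the deliberate choice here: re-raising would tell the operator the action failed when the money has in fact moved.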

Sensitive payload handling. Password routes capture {"changed": true} only — never the plaintext password, hash, or salt. The phone route captures sha256-first-8 hashes for old + new values so reviewers can detect "did the number actually change" without storing PII in the audit table.
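The redaction rules can be sketched with hashlib. This assumes "sha256-first-8" means the first 8 hex characters of the SHA-256 digest of the phone string; the helper names and the phone_sha256_8 key are hypothetical, not the real audit schema.

```python
import hashlib


def phone_fingerprint(phone: str) -> str:
    """First 8 hex characters of SHA-256 over the phone string (assumed reading)."""
    return hashlib.sha256(phone.encode("utf-8")).hexdigest()[:8]


def phone_change_delta(old_phone: str, new_phone: str) -> dict:
    # Only fingerprints enter the audit row; the raw numbers never do.
    return {
        "before": {"phone_sha256_8": phone_fingerprint(old_phone)},
        "after": {"phone_sha256_8": phone_fingerprint(new_phone)},
    }


def password_change_delta() -> dict:
    # No plaintext, hash, or salt: only the fact that a change happened.
    return {"changed": True}
```

Inputs should be normalized to one format before hashing, since "+82 10-..." and "010-..." would otherwise produce different fingerprints for the same number. Phone numbers are also low-entropy, so an unsalted truncated hash supports change detection rather than strong irreversibility.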

Legacy direct write. adjust_agent_balance still mutates agent.balance directly (bypassing wallet_service) and increments agent_balance_legacy_write_total{change_type} on every call. The counter must hit zero before the route can be migrated to wallet_service.

Admin WebSocket auth (T4-D-I7)

The admin WebSocket at /api/v1/ws (the back-office real-time notifications channel, opened with the standard HTTP GET upgrade handshake) accepts the admin JWT via:

  1. Preferred — Sec-WebSocket-Protocol subprotocol. Browser usage: new WebSocket(url, ["bearer", "<jwt>"]). The server echoes Sec-WebSocket-Protocol: bearer on accept. Tokens delivered this way do not appear in nginx / ALB access logs, browser history, proxy logs, or Referer headers.
  2. Legacy — ?token=<jwt> query string. Allowed only when admin_ws_allow_query_token=True AND the runtime env is non-production. Production-runtime requests carrying a query-string token are closed with code 4003. Each fallback emits a loud WARN log + admin_ws_legacy_query_token_total so cutover progress is visible. Once the counter is silent, set admin_ws_allow_query_token=False to harden production permanently.
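Token extraction for the two delivery paths can be sketched as one function. This is an illustration, not the admin WebSocket handler: extract_ws_token and WsAuth are hypothetical names, header names are assumed lower-cased, rejecting a production query-string token before looking at the subprotocol is a literal reading of the rule above, and 1008 for "no usable token" is an assumed close code. Browsers send new WebSocket(url, ["bearer", jwt]) as the header value "bearer, <jwt>", and JWT characters are valid subprotocol tokens.

```python
import logging
from collections import Counter
from dataclasses import dataclass
from typing import Mapping, Optional

logger = logging.getLogger("admin_ws_sketch")
# Stand-in for admin_ws_legacy_query_token_total.
legacy_query_token_total: Counter = Counter()


@dataclass
class WsAuth:
    token: Optional[str] = None
    subprotocol: Optional[str] = None  # echoed in Sec-WebSocket-Protocol on accept
    close_code: Optional[int] = None   # set when the socket must be closed instead


def extract_ws_token(
    headers: Mapping[str, str],
    query: Mapping[str, str],
    *,
    allow_query_token: bool,
    production: bool,
) -> WsAuth:
    query_token = query.get("token")
    if query_token and production:
        # Production never accepts a query-string token, even with the flag on.
        return WsAuth(close_code=4003)

    # The subprotocol header is a comma-separated list: "bearer, <jwt>".
    offered = [p.strip() for p in headers.get("sec-websocket-protocol", "").split(",") if p.strip()]
    if len(offered) >= 2 and offered[0] == "bearer":
        return WsAuth(token=offered[1], subprotocol="bearer")

    if query_token and allow_query_token:
        legacy_query_token_total["total"] += 1
        logger.warning("admin ws: legacy ?token= auth used; migrate to the bearer subprotocol")
        return WsAuth(token=query_token)

    # Assumption: the close code for "no usable token" is not specified in the text.
    return WsAuth(close_code=1008)
```

Echoing only "bearer" (never the token) on accept matters: a browser fails the handshake if the server selects a subprotocol the client did not offer, and echoing the JWT would put it back into response headers.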

Additional gates:

  • Per-IP rate limit. admin_ws_rate_limit_per_min (default 20) caps accept() calls per IP per 60s window using a Redis counter. Excess connections are closed with policy-violation (1008) and admin_ws_rate_limited_total{reason="ip"} is emitted. When Redis is unavailable, prod-runtime fails closed (reason="redis_outage"); dev fails open.
  • Lifetime bound to JWT exp. A timer scheduled at accept time closes the socket with policy-violation (1008) when the token's exp claim passes — a long-lived connection cannot outlive its auth.
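Both gates can be sketched in a few lines. This is an assumption-laden illustration: allow_ws_connect, seconds_until_exp, the CounterStore protocol, and the admin_ws_rl:{ip}:{window} key layout are hypothetical; the Redis side uses only INCR and EXPIRE (both exposed by redis-py as incr and expire) in a fixed 60-second window.

```python
import time
from typing import Optional, Protocol, Tuple


class CounterStore(Protocol):
    """The two Redis commands the sketch needs (redis-py exposes incr and expire)."""

    def incr(self, key: str) -> int: ...
    def expire(self, key: str, seconds: int) -> object: ...


def allow_ws_connect(
    store: CounterStore,
    ip: str,
    *,
    production: bool,
    limit: int = 20,
    now: Optional[float] = None,
) -> Tuple[bool, Optional[str]]:
    """Fixed 60 s window per IP; returns (allowed, rate_limited reason)."""
    now = time.time() if now is None else now
    key = f"admin_ws_rl:{ip}:{int(now // 60)}"  # hypothetical key layout
    try:
        count = store.incr(key)
        if count == 1:
            store.expire(key, 60)  # first hit in the window sets its TTL
    except Exception:
        # Redis outage: production fails closed, dev fails open.
        return (False, "redis_outage") if production else (True, None)
    if count > limit:
        return False, "ip"  # caller closes with 1008 and counts reason="ip"
    return True, None


def seconds_until_exp(exp: float, now: Optional[float] = None) -> float:
    """Delay for the close timer, e.g. loop.call_later(delay, close_with_1008)."""
    now = time.time() if now is None else now
    return max(0.0, exp - now)
```

Because the window index is part of the key, a key that somehow missed its EXPIRE only wastes memory; it never blocks a later window. A MULTI/EXEC pipeline would make INCR and EXPIRE atomic if that matters.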

Tests

  • cd servers_v2/admin_service && uv run pytest
  • key suites:
    • tests/test_admin_integration.py
    • tests/test_client_urls.py