Admin Service
Status
Active
Date
2026-04-28
Owners
- Platform Backend
Last Verified Commit
934d47f4 (T16-A admin_service Critical+Important leaks closed +
T16-C classifier dynamic-SQL upgrade + T16-D doc sync).
T16-A logs.py brand-scope contract
servers_v2/admin_service/app/api/routes/logs.py::_query_log_table
is the shared paginator for every operator log read. Pre-T16-A the
helper built WHERE 1=1 and appended user-supplied filters with no
brand predicate, so any operator could read every brand's
player_balance_log / agent_balance_log / player_login_log /
agent_login_log history by guessing pagination params (CR9-D-Crit-1).
Post-T16-A:
- _query_log_table(session, table, body, *, brand_id, ...): the brand_id argument is required.
- The first predicate the helper appends is unconditionally conditions.append("brand_id = :brand_id"), so the brand_id pin is always present BEFORE any user-supplied filter.
- The 4 real-table endpoints (player_balance_logs, agent_balance_logs, player_login_logs, agent_login_logs) call resolve_brand_id and brand_required_envelope, and refuse the request when X-Brand-Id is missing.
- The 9 endpoints whose target tables are not yet provisioned (rebate_log, cashback_log, etc.) get the same brand-required plumbing for future safety.
The static RW classifier could not have caught this helper because
the SQL was built dynamically (f"SELECT ... FROM {table} WHERE {where} ..."). T16-C added a FIX-DYNAMIC class to the classifier
that scans for the WHERE 1=1 builder + f-string template patterns
and flags them when the surrounding scope has no recognised
conditions.append("brand_id ...") idiom — so a regression in this
contract (or any sibling helper that crops up later) breaks CI the
same way a static missing-predicate would.
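The contract above can be sketched as a small query builder. This is a minimal illustration, not the actual _query_log_table: the function name build_log_query, the ALLOWED_TABLES set, the allowed_columns parameter, and the ORDER BY id clause are all assumptions made for the example.

```python
from typing import Any

# Hypothetical allow-list; the four table names come from the text above.
ALLOWED_TABLES = {
    "player_balance_log",
    "agent_balance_log",
    "player_login_log",
    "agent_login_log",
}


def build_log_query(
    table: str,
    filters: dict[str, Any],
    *,
    brand_id: int,
    allowed_columns: frozenset[str] = frozenset({"player_id", "agent_id"}),
) -> tuple[str, dict[str, Any]]:
    """Return (sql, params) with the brand predicate pinned first."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown log table: {table}")
    if brand_id <= 0:
        raise ValueError("brand_id must be a positive integer")

    # The brand pin is appended unconditionally, before any caller filter.
    conditions = ["brand_id = :brand_id"]
    params: dict[str, Any] = {"brand_id": brand_id}

    for column, value in filters.items():
        # Column names are interpolated into SQL, so only allow-listed
        # names pass. brand_id is not allow-listed, so a caller-supplied
        # filter cannot override the pin.
        if column not in allowed_columns:
            raise ValueError(f"unsupported filter: {column}")
        conditions.append(f"{column} = :{column}")
        params[column] = value

    where = " AND ".join(conditions)
    return f"SELECT * FROM {table} WHERE {where} ORDER BY id DESC", params
```

Because brand_id is keyword-only and has no default, a call site that forgets it fails at call time instead of silently reading every brand.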
Runtime
API + standalone worker
Purpose
admin_service is the only intended back-office HTTP edge in servers_v2.
It owns top-info websocket delivery, admin-side operational APIs, selected
aggregation adapters such as recon top-info, and the admin-facing control
surface used by bo/admin.
It also owns the admin-facing wallet topology and wallet policy configuration
edge, while delegating all money mutation execution to wallet_service.
Primary Entry Points
- /api/v1/*
- /api/v1/wallet/*
- /ws
- /internal/meta/*
Removed by ADR-009 (now respond 404):
- /v2/* compatibility slice for legacy admin_server cutover (served by deleted legacy_admin_v2.py)
- /api/admin/user/* (served by deleted legacy_auth.py)
- /api/admin/pushbullet/* and /api/admin/shooter/* (served by deleted legacy_recon.py)
- /api/admin/web/* (served by deleted legacy_web_content.py)
Dependencies
- PostgreSQL
- Redis
- JWT secret
- AES and password-salt config
- wallet_service
- rolling_service
- promotion_service
- recon_service
- Cloudflare R2-compatible object storage
- internal service token for trusted internal sync paths
Background Work
admin_worker runs scheduler-backed operational jobs, including:
- statistics jobs
- level checks
- expiration jobs
- attendance processing
- tag tasks
- optional Vera sync
Owned Data
- admin edge compatibility semantics
- top-info websocket aggregation
- admin-facing operational query and control surfaces
Important boundary:
- admin_service orchestrates and approves
- admin_service does not become a second money writer
- wallet config writes (/api/v1/wallet/topologies/*, /api/v1/wallet/policies/*) are control-plane operations only; money movement stays in wallet_service
Legacy migration note:
- /v2/* compatibility is being restored incrementally for no-diff admin cutover
- the current source of truth for that work is docs/specs/migrations/2026-04-22-admin-v2-route-compatibility.md
Events
Emits:
- no primary Redis stream ownership is documented
Consumes:
- no primary Redis stream consumption is documented
Internal signaling:
- internal websocket sync hooks are used to refresh top-info after meaningful state changes
Health
- API health endpoints are exposed without auth
- admin_worker health depends on heartbeat plus scheduler and job-health readiness
Key Env Vars
- DATABASE_URL
- REDIS_URL
- SECRET_KEY: legacy session-cookie material only post-T6-B1; admin JWTs no longer use HS256, so this value is consulted by the legacy session cookie path (and a small set of pre-RS256 fixtures) but never by decode_admin_token. Keep set in dev/test for fixture compatibility; production deployments may leave it unset once all session-cookie consumers are retired.
- JWT_PRIVATE_KEY: required in production (T6-B1); RSA private key (PEM) used by create_admin_token to mint RS256 admin JWTs. Provision per env via Vault path secret/auth/admin/jwt-private-key/<env>.
- JWT_PUBLIC_KEYS: required in production (T6-B1); JSON {kid: pem} map of accepted RSA public keys used by decode_admin_token to verify inbound admin JWTs. Multiple kids allow rotation without downtime. T7-B4 added the assert_jwt_public_keys_configured boot guard: an empty value in a production-grade runtime refuses to boot, because otherwise the in-process test-keypair singleton fallback would silently accept attacker-forged tokens.
- JWT_KID: required in production (T6-B1); the kid value create_admin_token stamps into the JWT header so verifiers know which entry of JWT_PUBLIC_KEYS to use.
- PASSWORD_SALT
- AES_KEY: required in production for admin-owned PII helpers; 32-byte AES-256 key. New writes use versioned AES-256-GCM with a random nonce (aesgcm:v1: prefix), and reads keep legacy AES-256-CBC compatibility for historical rows.
- AES_IV: required in production until the legacy CBC backfill is complete; 16-byte IV used only to decrypt historical AES-256-CBC rows. New writes do not reuse a static IV.
- MULTI_BRAND_ENFORCEMENT: required; one of off / observe / enforce. Production target is enforce post-Phase-16. Drives the multi_brand_enforcement_mode{service="admin_service"} gauge. Admin itself does not gate operator-supplied brand_id (admin operators are authenticated via JWT and trusted to scope their own queries); the flag governs admin's outbound enforcement posture for downstream calls.
- BRAND_SIGNING_KEY: required in production; HMAC-SHA256 secret used to sign X-Brand-Signature on outbound brand-scoped wallet writes from admin-side flows (one of the 6 wallet write callers).
- INTERNAL_SERVICE_TOKEN_ADMIN: required in production; per-caller token presented to wallet/rolling/promotion/recon when admin calls them.
- PER_CALLER_TOKEN_REQUIRED: the value on is the Phase 16 target on the consumer side. Admin is a caller; the consumer-side env vars on wallet/rolling/etc are what advance the gate.
- WALLET_SERVICE_URL
- ROLLING_SERVICE_URL
- PROMOTION_SERVICE_URL
- RECON_SERVICE_URL
- INTERNAL_SERVICE_TOKEN: legacy single-shared-token; deprecated. Phase 16 release gate requires the bare variant to be absent.
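To illustrate how a {kid: pem} map supports rotation and why an empty map must not boot in production, here is a minimal stdlib-only sketch. The function names load_public_keys, kid_from_token, and select_public_key are hypothetical and are not the service's decode_admin_token or assert_jwt_public_keys_configured. The sketch only selects a key; actual RS256 signature verification would still be done by a JWT library.

```python
import base64
import json


def load_public_keys(raw: str, *, production: bool) -> dict[str, str]:
    """Parse a JSON {kid: pem} map; refuse an empty map in production."""
    keys = json.loads(raw) if raw.strip() else {}
    if not isinstance(keys, dict):
        raise ValueError("JWT_PUBLIC_KEYS must be a JSON object")
    if production and not keys:
        # Boot-guard idea: fail fast instead of falling back to a test key.
        raise RuntimeError("JWT_PUBLIC_KEYS is empty in a production runtime")
    return keys


def kid_from_token(token: str) -> str:
    """Read the (still unverified) kid from the JWT header segment."""
    header_b64 = token.split(".", 1)[0]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    header = json.loads(base64.urlsafe_b64decode(padded))
    kid = header.get("kid")
    if not isinstance(kid, str):
        raise ValueError("token header has no kid")
    return kid


def select_public_key(token: str, keys: dict[str, str]) -> str:
    """Pick the PEM to verify with; unknown kids are rejected."""
    kid = kid_from_token(token)
    if kid not in keys:
        raise KeyError(f"unknown kid: {kid}")
    return keys[kid]
```

During a rotation both the old and the new kid stay in the map until tokens minted with the old key have expired; removing an entry immediately invalidates tokens that carry that kid.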
Known Migration Gap
admin_service is strong on core admin flows, but it is not yet the full
replacement for all legacy middle_server modules.
Still unresolved for a full legacy cutover:
- role and menu management
- legacy config editing
- i18n management
- BI tooling
- coin passthrough behavior
Multi-Brand Constraints
Per ADR-009:
- admin_service runs behind VPN ingress with an IP allow-list; every write resolves an operator id from the admin JWT (admin_id claim, see app/services/auth.py::create_admin_token). The legacy X-Operator-Id header is still accepted for backward compatibility with the SSO LB but is cross-checked against the JWT claim (see Security below). Operator id is captured on every audit row. Network-isolation + JWT-bound operator id replaces the deleted staff identity layer
- admin_service owns the brand catalog, the brand_config per-brand configuration aggregate, and brand_provider_config provider allow-list policy; brand-create rejects brand_code values that prefix-collide with existing brand codes or with legacy player.account strings already namespaced and sent to providers. The first single-provider PATCH /api/v1/brands/{brand_id}/providers/{provider_id} on a brand with no provider rows materializes the current globally-visible provider set before applying the edit, so that brand stops inheriting future provider additions until the policy is cleared or replaced
- brand_config now has an operator-facing schema registry at GET /api/v1/brands/config/schema and a bulk import route at POST /api/v1/brands/{brand_id}/config/import. The import accepts {config: {...}}, {entries: [...]}, or the static server-config export shape under brand.config; known keys are normalized and validated before write, while dry-run returns the normalized payload without mutating data. The first seven operational customizations are implemented as brand-scoped policy keys: frontend_config, registration_policy, feature_policy, payment_policy, sms_policy, provider allow-list (brand_provider_config), and the existing reward/rolling scalar config (rebate_rate, cashback_rate, lossback_rate, rolling_completion_ratio, valid_odds_threshold). Runtime consumers are intentionally split by ownership: player_service exposes public brand settings and enforces registration/login/SMS policy; wallet_service enforces payment channel, limits, and maintenance policy; game_service enforces game list/launch switches in addition to provider allow-list; promotion_service enforces promotion feature switches and brand-scoped event/coupon reads/writes
- Policy precedence is deliberate: feature_policy.*_enabled is the maintenance/kill-switch layer, while registration_policy and payment_policy are product/channel-level gates. The effective runtime decision is an AND across the relevant layers. For payment limits and Plisio, the payment_policy object wins over legacy scalar compatibility keys (deposit_min_amount, payment_channel_plisio_enabled, etc.); new server-config exports should emit only the policy object. payment_policy.deposit_maintenance_enabled and its maintenance time fields are tri-state: omitted/null falls back to legacy global_var; false or empty string is an explicit brand override
- admin_service owns the write surface for agent_brand (which brands an agent may serve)
- Public content rows (player_notice, banner, promotion, promotion_event) are brand-scoped in runtime reads. When enabling a new brand or migrating old global content, operators must confirm each existing row has the intended brand_id; one announcement or promotion no longer fans out to every brand automatically
- Release note for frontend/ops: site_setting now also returns public brand config objects, Plisio support endpoints require X-Brand-Id, and gate_create / coin_create reject amounts outside the effective brand deposit range. Existing JS clients that ignore unknown fields continue to work, but strict clients and direct internal health checks should be updated before rollout. See docs/runbooks/multi-brand/2026-05-08-brand-customization-release-notes.md
- the admin entry surface is brand-global only for catalog/global-control routes. Routes that read or write brand-scoped operational tables (player, player_deposit, player_withdraw, player_message, coupon, coupon_log, coupon_recycle_log, etc.) must resolve one operating brand from request.state.brand_id and apply it as a SQL predicate before touching tenant data. Callers should provide X-Brand-Id through the gateway/operator UI; if a brand-scoped route cannot resolve a positive brand id it returns the standard status=54 / brand context required envelope instead of defaulting to all brands. Downstream services still apply MULTI_BRAND_ENFORCEMENT semantics on the brand-id admin forwards, so a stale or wrong brand_id from admin is caught at the wallet/rolling/promotion edge via *_cross_brand_rejected_total. The audit row written for every money-mutating admin route captures brand_id so cross-brand admin actions are observable post-hoc
- per-brand configuration writes are audited (timestamp, payload, prior value); per-brand topology and policy writes flow through wallet_service topology/policy CRUD with the brand pinned in the request
- between Phase 2 and Phase 12, a temporary Postgres BEFORE INSERT trigger on agent auto-creates the matching agent_brand(agent_id, default_brand_id, 'enabled') row; from Phase 12 onward admin_service (and agent_service's agent-create path, if any) writes agent_brand explicitly, and the trigger is dropped by 0028_remove_agent_brand_autoseed_trigger.py
- after every brand create / disable, admin_service publishes BRAND_CATALOG_CHANGED on Redis pub/sub so game_service reverse-parse caches invalidate within seconds; after every agent_brand write admin_service publishes AGENT_BRAND_CHANGED so agent_service allow-list caches invalidate; after every brand_provider_config write it publishes BRAND_PROVIDER_CHANGED and refreshes the Redis brand:provider_allowlist:{brand_id} cache used by gateway prechecks. That Redis snapshot stores configured provider ids; admin read APIs return effective ids after applying global provider.is_show
- seven admin_service legacy route files are deleted by ADR-009: legacy_admin_v2.py, legacy_auth.py, legacy_agents_v2.py, legacy_agent_withdrawals_v2.py, legacy_meta_v2.py, legacy_recon.py, legacy_web_content.py; the supporting staff identity helpers are removed with them; previously served paths under /api/admin/user/*, /api/admin/pushbullet/*, /api/admin/shooter/*, /api/admin/web/rules/*, /api/admin/web/faq/*, and /api/admin/web/config/* now return 404; back-office routes that survive operate without per-staff identity for the duration of this change
- BrandResolutionMiddleware (T6-B / T7-B6 / T8-D1). A pure-ASGI middleware (app/middleware/brand.py) resolves request.state.brand_id once per request so admin routes can read the brand consistently without re-parsing the header. Resolution order is (1) X-Brand-Id header, set by the gateway after T6-B2 or by the operator UI; (2) Redis domain map (domain:agent:{host} / domain:level:{host}) when admin_service is hit directly. The middleware itself is best-effort: a missing header AND Redis miss leaves brand_id = None. Brand-scoped route handlers are not best-effort; they call brand_required_envelope() and refuse the operation when brand_id is missing. Two T8-D1 hardenings: (a) the middleware skips /health, /healthz, /docs, /openapi.json, and /redoc so a tight liveness probe schedule does not amplify a Redis brownout into route latency; /auth/login is intentionally NOT skipped because login may legitimately need a brand mapping; (b) the previously-silent except Exception Redis path now emits brand_resolution_failed_total{service="admin_service",reason="redis_error"} so a Redis incident is alarmable instead of degrading silently.
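The resolution order described for the brand middleware can be sketched as a framework-free ASGI wrapper. This is an illustration under stated assumptions, not the code in app/middleware/brand.py: the class name BrandResolutionSketch, the lookup callable standing in for the Redis domain map, the on_error hook standing in for the metric, the restriction to http scopes, and the handling of an unparsable header are all hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Paths the text says the middleware skips.
SKIP_PATHS = frozenset({"/health", "/healthz", "/docs", "/openapi.json", "/redoc"})

# Hypothetical async lookup standing in for the Redis domain map.
DomainLookup = Callable[[str], Awaitable[Optional[int]]]


class BrandResolutionSketch:
    """Pure-ASGI middleware: resolve brand_id once, store it in scope state."""

    def __init__(
        self,
        app: Callable[..., Awaitable[None]],
        lookup: DomainLookup,
        on_error: Callable[[str], None] = lambda reason: None,
    ) -> None:
        self.app = app
        self.lookup = lookup
        self.on_error = on_error  # e.g. increments a failure counter

    async def __call__(self, scope: dict, receive: Any, send: Any) -> None:
        if scope["type"] == "http" and scope["path"] not in SKIP_PATHS:
            state = scope.setdefault("state", {})
            state["brand_id"] = await self._resolve(scope)
        await self.app(scope, receive, send)

    async def _resolve(self, scope: dict) -> Optional[int]:
        # ASGI delivers headers as (name, value) byte pairs.
        headers = {k.lower(): v for k, v in scope.get("headers", [])}

        # (1) The explicit header wins.
        raw = headers.get(b"x-brand-id")
        if raw is not None:
            try:
                brand_id = int(raw.decode("latin-1"))
            except ValueError:
                brand_id = 0
            if brand_id > 0:
                return brand_id
            # Assumption: an unparsable or non-positive header counts as absent.

        # (2) Fall back to the host-based map.
        host = headers.get(b"host", b"").decode("latin-1").split(":", 1)[0]
        if not host:
            return None
        try:
            return await self.lookup(host)
        except Exception:
            # Best-effort: report the failure and leave brand_id unresolved.
            self.on_error("redis_error")
            return None
```

The wrapper never rejects a request itself; a handler that needs a brand still has to check for None and refuse, which matches the split between best-effort middleware and strict handlers described above.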
Security
Operator identity (P1-3)
require_operator_id (app/api/deps.py) is the canonical resolver for the operator id captured on every audit row. It enforces:
- JWT cross-check. Reads the admin_id claim from the verified admin JWT. If X-Operator-Id is also present and disagrees, the request is hard-rejected with 403 and admin_operator_id_mismatch_total{reason="header_vs_jwt"} is incremented. Operator identity is a security boundary, not a tenancy concern, so there is no observe mode.
- JWT fallback. If X-Operator-Id is absent, the JWT admin_id is used. The header is deprecated; new callers should not send it.
- Legacy token rejection. If the JWT lacks an admin_id claim and a header is supplied, the request is rejected 403 with admin_operator_id_mismatch_total{reason="jwt_missing_claim"}. Such tokens cannot attribute writes to a real operator.
- Last-resort 400. If neither header nor JWT carries an operator id, the request is rejected 400 because no audit row could be attributed.
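The four rules above reduce to a small decision function. The sketch below is not require_operator_id itself: the names resolve_operator_id and OperatorIdError are hypothetical, the comparison is done on string forms because the claim's type is not stated here, and the operator_id_missing label for the 400 case is an assumption.

```python
from typing import Any, Mapping, Optional


class OperatorIdError(Exception):
    """Carries the HTTP status and a metric-style reason label."""

    def __init__(self, status_code: int, reason: str) -> None:
        super().__init__(reason)
        self.status_code = status_code
        self.reason = reason


def resolve_operator_id(
    claims: Mapping[str, Any], header_value: Optional[str]
) -> str:
    """Return the operator id, or raise with the status the rules call for."""
    jwt_id = claims.get("admin_id")

    if header_value is not None:
        if jwt_id is None:
            # Legacy token: a header alone cannot attribute the write.
            raise OperatorIdError(403, "jwt_missing_claim")
        if str(jwt_id) != header_value.strip():
            # Header and verified claim disagree: hard reject.
            raise OperatorIdError(403, "header_vs_jwt")
        return str(jwt_id)

    if jwt_id is None:
        # Neither source carries an id; the label here is hypothetical.
        raise OperatorIdError(400, "operator_id_missing")

    # JWT fallback: the header is deprecated and may be absent.
    return str(jwt_id)
```

Note that the header can only ever confirm the JWT claim; it is never used as the source of the id.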
Money-write audit guarantees (P1-4)
Every admin endpoint that mutates a player or agent balance (directly or via wallet_service) writes an admin_audit row after the wallet/state mutation returns successfully. Routes covered:
- player.py: POST /players/{id}/adjust → ADJUST_BALANCE
- player_finance.py: POST /players/deposits/{id}/agree → APPROVE_DEPOSIT; POST /players/deposits/{id}/decline → DECLINE_DEPOSIT; POST /players/deposits/{id}/recycle → RECYCLE_DEPOSIT; POST /players/deposits/{id}/coin-agree and POST /player/deposit/coin/agree → APPROVE_COIN_DEPOSIT; POST /players/deposits/shooter-agree → APPROVE_DEPOSIT (per-item); POST /players/withdrawals/{id}/review-agree → REVIEW_AGREE_WITHDRAW; POST /players/withdrawals/{id}/review-decline → REVIEW_DECLINE_WITHDRAW; POST /players/withdrawals/{id}/pay-agree → PAY_AGREE_WITHDRAW; POST /players/withdrawals/{id}/pay-decline → PAY_DECLINE_WITHDRAW
- agent_finance.py: POST /agents/withdrawals/{id}/agree → AGREE_AGENT_WITHDRAW; POST /agents/withdrawals/{id}/decline → DECLINE_AGENT_WITHDRAW; POST /agents/withdrawals/{id}/pay → PAY_AGENT_WITHDRAW
- agent.py (T4-D-I3): POST /agents/{id}/reset-password → RESET_AGENT_PASSWORD; POST /agents/{id}/lock → LOCK_AGENT; POST /agents/{id}/unlock → UNLOCK_AGENT; POST /agents/{id}/update-commission → UPDATE_AGENT_COMMISSION; POST /agents/{id}/adjust-balance → ADJUST_AGENT_BALANCE
- player.py (T4-D-I3): POST /players/{id}/edit-password → EDIT_PLAYER_PASSWORD; POST /players/{id}/edit-phone → EDIT_PLAYER_PHONE
Audit rows capture operator_id (from the resolver above), target_type (player/agent), target_id, brand_id (looked up from the affected row), before/after deltas (status, amount, transaction_id only — never full rows), reason (from the request body when present), and the request X-Request-Id. The shared writer is app/services/admin_audit.py::write_money_admin_audit.
If the wallet_service call fails, the audit row is not written — the action did not happen. If the audit insert fails after the money write committed (a partition or DB outage), the route still returns the wallet response (the money write cannot be rolled back) but logs at ERROR level and increments admin_audit_write_failed_total{route=<route>}. On-call must reconcile the audit gap manually.
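The ordering guarantee in the paragraph above (money write first, audit second, audit failure reported but not propagated) can be sketched as follows. The helper run_money_action and its callable parameters are hypothetical; the real routes use write_money_admin_audit and a metrics counter rather than the injected callbacks shown here.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("admin_audit_sketch")


def run_money_action(
    route: str,
    wallet_call: Callable[[], Any],
    write_audit: Callable[[Any], None],
    on_audit_failure: Callable[[str], None],
) -> Any:
    """Run the money mutation, then attempt the audit write."""
    # If the wallet call raises, nothing below runs: the action did not
    # happen, so no audit row is written and the error propagates.
    result = wallet_call()

    try:
        write_audit(result)
    except Exception:
        # The money write already committed and cannot be rolled back,
        # so log at ERROR level, count the gap, and still return.
        log.exception("audit write failed for route %s", route)
        on_audit_failure(route)

    return result
```

The design accepts a possible audit gap in exchange for never reporting a committed money write as failed; the failure counter is what makes that gap visible for manual reconciliation.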
Sensitive payload handling. Password routes capture {"changed": true} only — never the plaintext password, hash, or salt. The phone route captures sha256-first-8 hashes for old + new values so reviewers can detect "did the number actually change" without storing PII in the audit table.
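As an illustration of the phone-route payload, the sketch below assumes that "sha256-first-8" means the first 8 hex characters of the SHA-256 digest of the UTF-8 phone string; that reading, the function names, and the payload keys are assumptions, not confirmed details of the audit writer.

```python
import hashlib


def phone_fingerprint(phone: str) -> str:
    """First 8 hex characters of SHA-256 over the UTF-8 phone string."""
    return hashlib.sha256(phone.encode("utf-8")).hexdigest()[:8]


def phone_audit_delta(old_phone: str, new_phone: str) -> dict:
    """Build a before/after payload that holds fingerprints, not numbers."""
    return {
        "before": {"phone_hash": phone_fingerprint(old_phone)},
        "after": {"phone_hash": phone_fingerprint(new_phone)},
    }


def password_audit_delta() -> dict:
    """Password routes record only that a change happened."""
    return {"changed": True}
```

A reviewer can compare the two fingerprints to see whether the number actually changed without the audit table holding either value.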
Legacy direct write. adjust_agent_balance still mutates agent.balance directly (bypassing wallet_service) and increments agent_balance_legacy_write_total{change_type} on every call. The counter must hit zero before the route can be migrated to wallet_service.
Admin WebSocket auth (T4-D-I7)
GET /api/v1/ws (the WebSocket upgrade for the back-office real-time notifications channel) accepts the admin JWT via:
- Preferred: Sec-WebSocket-Protocol subprotocol. Browser usage: new WebSocket(url, ["bearer", "<jwt>"]). The server echoes Sec-WebSocket-Protocol: bearer on accept. Tokens delivered this way do not appear in nginx / ALB access logs, browser history, proxy logs, or Referer headers.
- Legacy: ?token=<jwt> query string. Allowed only when admin_ws_allow_query_token=True AND the runtime env is non-production. Production-runtime requests carrying a query-string token are closed with code 4003. Each fallback emits a loud WARN log + admin_ws_legacy_query_token_total so cutover progress is visible. Once the counter is silent, set admin_ws_allow_query_token=False to harden production permanently.
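The two token paths can be sketched as one extraction function. This is a simplified illustration: extract_ws_token, WsAuthRejected, and the on_legacy hook are hypothetical names, and using close code 4003 when the flag is off outside production is an assumption (the text only states that code for production).

```python
from typing import Callable, Optional, Sequence
from urllib.parse import parse_qs


class WsAuthRejected(Exception):
    """Signals that the socket should be closed with the given code."""

    def __init__(self, code: int) -> None:
        super().__init__(f"websocket auth rejected ({code})")
        self.code = code


def extract_ws_token(
    subprotocols: Sequence[str],
    query_string: str,
    *,
    allow_query_token: bool,
    production: bool,
    on_legacy: Callable[[], None] = lambda: None,
) -> Optional[str]:
    """Return the JWT string, None if no token was sent, or raise."""
    # Preferred path: the client offered ["bearer", "<jwt>"].
    if len(subprotocols) >= 2 and subprotocols[0] == "bearer" and subprotocols[1]:
        return subprotocols[1]

    # Legacy path: ?token=<jwt> in the query string.
    tokens = parse_qs(query_string).get("token", [])
    if tokens:
        if production or not allow_query_token:
            # Assumption: same close code when the flag is off in non-prod.
            raise WsAuthRejected(4003)
        on_legacy()  # e.g. WARN log plus legacy-usage counter
        return tokens[0]

    return None
```

On the preferred path the server would accept with the bearer subprotocol only, so the JWT is never echoed back in the handshake response.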
Additional gates:
- Per-IP rate limit. admin_ws_rate_limit_per_min (default 20) caps accept() calls per IP per 60s window using a Redis counter. Excess connections are closed with policy-violation (1008) and admin_ws_rate_limited_total{reason="ip"} is emitted. When Redis is unavailable, prod-runtime fails closed (reason="redis_outage"); dev fails open.
- Lifetime bound to JWT exp. A timer scheduled at accept time closes the socket with policy-violation (1008) when the token's exp claim passes, so a long-lived connection cannot outlive its auth.
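The per-IP gate can be sketched as a fixed-window counter with an explicit outage policy. The sketch uses an in-memory stand-in for the Redis counter so it runs without a server; InMemoryCounter, CounterUnavailable, check_ws_accept, and the ws:rate:{ip} key format are hypothetical names chosen for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CounterUnavailable(Exception):
    """Raised by a counter backend that cannot be reached."""


class InMemoryCounter:
    """Stand-in for an increment-with-expiry counter such as Redis."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._buckets: Dict[str, Tuple[int, float]] = {}
        self._clock = clock

    def incr(self, key: str, ttl_seconds: float) -> int:
        now = self._clock()
        count, expires = self._buckets.get(key, (0, now + ttl_seconds))
        if now >= expires:
            # The window elapsed: start a fresh one.
            count, expires = 0, now + ttl_seconds
        count += 1
        self._buckets[key] = (count, expires)
        return count


def check_ws_accept(
    ip: str, counter, *, production: bool, limit: int = 20
) -> Optional[str]:
    """Return None to accept, or the reason label to close with 1008."""
    try:
        count = counter.incr(f"ws:rate:{ip}", 60)
    except CounterUnavailable:
        # Production fails closed; dev fails open.
        return "redis_outage" if production else None
    return "ip" if count > limit else None
```

A fixed window allows a short burst across a window boundary; that trade-off is usually acceptable for a connection-accept limit, where the goal is to cap abuse rather than shape traffic precisely.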
Tests
- cd servers_v2/admin_service && uv run pytest
- key suites:
  - tests/test_admin_integration.py
  - tests/test_client_urls.py