
Legacy middle_server Retirement Runbook

Status

Planning stage: not yet executable. Every row in §Scope below still has Migration owner: TBD and Status: not started. Operators MUST NOT treat this as a step-by-step cut-over guide until each row has a named owner and a Rollback section is filled in (see §Rollback below, which is currently only a template). This document captures the scope of the retirement and the per-prefix design discussion. The execution runbook for each prefix will land as a sibling document under docs/runbooks/legacy-middle-server-retirement/ once owners pick up individual prefixes.

Last Verified Commit

(see git log -- docs/runbooks/legacy-middle-server-retirement.md)

Goal

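Track the seven remaining middle_server route prefixes that block full retirement of the legacy servers/ tree. The Scope table records each prefix's target service, caller, owner, effort, and status. The per-prefix entries below cover the migration target, dependencies, and gating conditions. §Cut-over Sequence gives the recommended order.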

This runbook is the single source of truth for retiring the legacy back-office sidecar surface. docs/architecture/migration-readiness.md points here for that scope; the in-code 410 stubs in servers_v2/admin_service/app/api/routes/legacy_sunset.py link here via their Link: rel="sunset" header.

Scope

| Prefix | Target service | Frontend caller | Migration owner | Effort | Status |
|---|---|---|---|---|---|
| /api/admin/role/* | admin_service (new module) | bo/admin UI | TBD | 1-2 weeks | not started |
| /api/admin/menu/* | admin_service (new module) | bo/admin UI | TBD | 1 week | not started |
| /api/admin/config/* | admin_service (extend existing system.py / brand_config.py) | bo/admin UI | TBD | 1 week | not started |
| /api/admin/i18n/* | admin_service (new module) | bo/admin UI | TBD | 1-2 weeks | not started |
| /api/admin/bi/* | new bi_service or admin_service extension | bo/admin UI | TBD | 2+ weeks | not started |
| /coin/* | wallet_service or new coin_service | TBD (admin + player) | TBD | 2 weeks | not started |
| POST /{path:path} catch-all | enumerate first, then explicit routes | TBD | TBD | 2-3 weeks | not started |

Spot-check the rows above against servers_v2/admin_service/app/api/routes/ whenever this runbook is reopened: any prefix that has gained a partial implementation should have its Status changed to partial, with a note on which subpaths are migrated. As of 3f36e7a8, none of the seven prefixes has an admin_service implementation. The only related code paths are /api/v1/player/deposit/coin/* (player_finance.py, NOT under the root /coin/* prefix) and brand-scoped config under /api/v1/brands/{brand_id}/config/* (NOT under /api/admin/config/*). Both are explicitly out of scope for the retirement entries below.

Per-prefix detail

/api/admin/role/*

Legacy: CRUD for the staff RBAC role table (role name, permission bit mask, allowed menu entries). Backs the "Roles" tab in bo/admin.

Data lives in the legacy staff_role and staff_role_permission tables. The staff table itself was deleted by ADR-009 in favour of the operator-id-from-JWT pattern, but the role / permission catalog was not deleted in the same wave because bo/admin still renders permission checkboxes from it.

v2 target: new app/api/routes/role.py in admin_service, mounted under /api/v1/roles. Authoritative permission resolution moves into the JWT mint path in app/services/auth.py (the role claim is already there — what is missing is a write surface to manage the catalog). Tests: route parity + a contract test that locks the permission-bit shape against bo/admin.
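A minimal sketch of the permission-bit contract test, assuming each permission maps to a fixed bit position in a role's bitmask. The permission names and positions in EXPECTED_PERMISSION_BITS, and all helper names, are hypothetical placeholders, not values read from staff_role_permission or bo/admin.

```python
# Hypothetical snapshot of the catalog bo/admin relies on:
# permission name -> bit position inside a role's permission bitmask.
EXPECTED_PERMISSION_BITS = {
    "player.read": 0,
    "player.write": 1,
    "finance.approve": 2,
    "config.write": 3,
}


def mask_from_permissions(names, catalog):
    """Build a role bitmask by setting the bit for each permission name."""
    mask = 0
    for name in names:
        mask |= 1 << catalog[name]
    return mask


def has_permission(mask, name, catalog):
    """True if the bit assigned to `name` is set in `mask`."""
    return bool(mask & (1 << catalog[name]))


def assert_catalog_matches(actual_catalog):
    """Contract check: the v2 catalog must keep every bit position bo/admin expects."""
    for name, bit in EXPECTED_PERMISSION_BITS.items():
        if actual_catalog.get(name) != bit:
            raise AssertionError(
                f"permission {name!r}: expected bit {bit}, got {actual_catalog.get(name)}"
            )
    positions = list(actual_catalog.values())
    if len(positions) != len(set(positions)):
        raise AssertionError("two permissions share the same bit position")
```

In a real test the catalog passed to assert_catalog_matches would presumably come from the new /api/v1/roles surface, so renumbering a bit fails CI instead of silently ticking the wrong checkbox in bo/admin.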

Gate: bo/admin must be ready to call the new path before legacy delete; cut-over is a coordinated front+back deploy.

/api/admin/menu/*

Legacy: stores the bo/admin left-nav menu tree (menu_id, parent_id, icon, path, required_permission_bit). This data is written often: operators rearrange menus while onboarding a new brand.

Coupled to /api/admin/role/* (each menu entry references a permission bit). Migrate together; the same admin_service module that owns role CRUD should own menu CRUD.

v2 target: app/api/routes/menu.py next to role.py. Storage stays in the same PostgreSQL instance under a renamed brand-scoped table (brand_menu_entry) so each brand can carry its own nav.
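Because each menu entry carries a required permission bit, rendering the nav for a role is a filter over the tree, which is why role and menu migrate together. A sketch, assuming required_permission_bit is a bit position (not a full mask), top-level entries have parent_id None, and the tree is acyclic. MenuEntry and visible_menu are illustrative names, not existing code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MenuEntry:
    menu_id: int
    parent_id: Optional[int]  # None for top-level entries (assumption)
    icon: str
    path: str
    required_permission_bit: int  # assumed to be a bit position, not a mask


def visible_menu(entries, role_mask):
    """Return entries the role may see; children of hidden parents are hidden too."""
    by_id = {e.menu_id: e for e in entries}
    memo = {}

    def visible(menu_id):
        if menu_id in memo:
            return memo[menu_id]
        entry = by_id[menu_id]
        allowed = bool(role_mask & (1 << entry.required_permission_bit))
        if allowed and entry.parent_id is not None:
            # A child is only reachable if its parent is present and visible.
            allowed = entry.parent_id in by_id and visible(entry.parent_id)
        memo[menu_id] = allowed
        return allowed

    return [e for e in entries if visible(e.menu_id)]
```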

Gate: same as /api/admin/role/* — frontend redeploy required.

/api/admin/config/*

Legacy: free-form key/value config the bo/admin "System Settings" tab writes to. Overlaps two real things:

  • per-brand operational policy (already migrated to brand_config under /api/v1/brands/{brand_id}/config)
  • platform-global knobs (maintenance window, feature kill switches), which admin_service exposes under /api/v1/global-vars and /api/v1/settings (super_admin only).

Migration is therefore mostly a frontend re-point. Enumerate every key bo/admin reads under /api/admin/config/*, classify each as brand-scoped or platform-global, and route it to the existing admin_service surface. Net new admin_service code should be minimal.

Effort estimate is "1 week" only because the enumeration step is boring and error-prone, not because the implementation is heavy.

Gate: brand vs platform classification must be reviewed by multi-brand leads before redirecting keys.
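The enumerate-and-classify step can be kept honest with an explicit lookup table that fails on any key nobody has classified yet. A sketch, assuming hypothetical key names and assuming platform-global keys land on /api/v1/global-vars (some may belong under /api/v1/settings instead). KEY_SCOPE, target_surface, and unclassified are illustrative names.

```python
from enum import Enum


class Scope(Enum):
    BRAND = "brand"
    PLATFORM = "platform"


# Hypothetical classification. The real table comes from enumerating what bo/admin
# reads under /api/admin/config/* and must pass the multi-brand leads' review.
KEY_SCOPE = {
    "deposit.min_amount": Scope.BRAND,
    "withdraw.daily_limit": Scope.BRAND,
    "maintenance.window": Scope.PLATFORM,
    "feature.kill_switch.chat": Scope.PLATFORM,
}


def target_surface(key, brand_id):
    """Map a legacy config key to the admin_service surface that should own it."""
    scope = KEY_SCOPE.get(key)
    if scope is None:
        # Unclassified keys fail loudly instead of defaulting to either surface.
        raise KeyError(f"config key {key!r} has not been classified")
    if scope is Scope.BRAND:
        return f"/api/v1/brands/{brand_id}/config"
    # Assumption: platform-global knobs go to global-vars; some may need /api/v1/settings.
    return "/api/v1/global-vars"


def unclassified(keys_read_by_frontend):
    """Keys observed in bo/admin traffic that still need a brand/platform decision."""
    return sorted(k for k in set(keys_read_by_frontend) if k not in KEY_SCOPE)
```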

/api/admin/i18n/*

Legacy: server-side translation strings for the bo/admin UI and the player-facing notice templates. Stored in i18n_string (key, locale, text) with admin-side CRUD.

This is the only retirement item that has a real platform-wide publish channel: changes are picked up by the player web bundle on the next reload. Cut-over must therefore keep the read path live for existing player traffic — design the new endpoint to serve the same JSON shape the player bundle currently consumes from middle_server.

v2 target: new app/api/routes/i18n.py with read + write under /api/v1/i18n. Player-side reads should move to a fast Redis-cached endpoint on player_service to remove the cross-service dependency.

Gate: player bundle compatibility test; verify the JSON shape parses in the existing web/blue translation loader.
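One way to run the shape check is to diff a captured legacy payload against the new endpoint's payload structurally rather than byte-for-byte. A sketch, assuming both payloads are already parsed with json.loads and that the web/blue loader ignores unknown extra keys (an assumption to confirm against the loader). shape_mismatches is an illustrative name.

```python
def shape_mismatches(legacy, candidate, path="$"):
    """List structural differences that could break a loader expecting `legacy`'s shape.

    Extra keys in `candidate` are tolerated (assumption: the loader ignores them).
    """
    problems = []
    if type(legacy) is not type(candidate):
        problems.append(
            f"{path}: expected {type(legacy).__name__}, got {type(candidate).__name__}"
        )
        return problems
    if isinstance(legacy, dict):
        for key, value in legacy.items():
            if key not in candidate:
                problems.append(f"{path}.{key}: missing")
            else:
                problems.extend(shape_mismatches(value, candidate[key], f"{path}.{key}"))
    elif isinstance(legacy, list) and legacy and candidate:
        # Compare element shape using the first element of each list.
        problems.extend(shape_mismatches(legacy[0], candidate[0], f"{path}[0]"))
    return problems
```

An empty result means every key and type the legacy payload exposed is still present in the new one. New translation keys or extra metadata on the new endpoint do not count as failures.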

/api/admin/bi/*

Legacy: aggregated BI queries (player counts, GGR by date range, funnel breakdown). Most queries hit admin_stat_day, agent_stat_day, and the player ledger.

Largest item by effort: needs an aggregator design (materialised views? on-demand SQL? a thin BI service that calls admin_service read-only?) before implementation. Defer until the read-replica-or-warehouse direction is decided.

v2 target: either a new bi_service or an admin_service extension. Do not pick until the aggregator design lands: the current stats.py in admin_service already handles real-time counts, and it should not grow into BI before the boundary is decided.

Gate: design doc; performance test against production data volumes.

/coin/*

Legacy: crypto deposit / withdrawal flows specific to the legacy coin balance bucket. Most coin functionality already exists in v2 (Plisio callback in wallet_service, coin-deposit approval in admin_service.player_finance under /api/v1/player/deposit/coin/*), but the root /coin/* surface is a separate caller contract used by older player frontends.

v2 target: either fold the remaining endpoints into wallet_service under /internal/wallet/coin/* (preferred, because it keeps the single-money-writer invariant) or carve out a coin_service. The former is the default unless a concrete reason emerges.

Gate: enumerate every player-side caller of /coin/*; some of those calls cross into web_server legacy territory and must be migrated along with the player gateway cut-over.

POST /{path:path} catch-all

Legacy: middle_server had a generic catch-all that forwarded any unmatched POST to an internal service based on a path-prefix table. It is by far the riskiest of the seven — implicit routing is the hardest thing to retire safely.

Plan:

  1. Run a 30-day request log capture against the legacy stack to enumerate every concrete path that hits the catch-all. Existing nginx access logs are the cheapest source.
  2. Classify each into "already migrated to an explicit v2 route" vs "still needs a target".
  3. For the still-needs-a-target set, design explicit routes; do not re-introduce path-based forwarding under any circumstances.
  4. After all explicit paths land, delete the legacy catch-all.
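Steps 1 and 2 can be sketched against nginx access logs in the default "combined" format. This assumes that format is in use and that a list of middle_server's explicit route prefixes is available: any POST path outside those prefixes is counted as a catch-all hit. The prefix list, the migrated-path set, and the function names are hypothetical.

```python
import re
from collections import Counter

# nginx "combined" log format; only the quoted request line is used here.
_COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<target>\S+)[^"]*"'
)


def catchall_post_paths(lines, explicit_prefixes):
    """Count POST paths that no explicit legacy route claims (i.e. hit the catch-all)."""
    prefixes = tuple(explicit_prefixes)
    counts = Counter()
    for line in lines:
        m = _COMBINED.match(line)
        if not m or m.group("method") != "POST":
            continue  # unparseable line or not a POST
        path = m.group("target").split("?", 1)[0]  # drop the query string
        if not path.startswith(prefixes):
            counts[path] += 1
    return counts


def classify(counts, migrated_paths):
    """Split enumerated paths into (already migrated, still needs a target)."""
    migrated = {p: n for p, n in counts.items() if p in migrated_paths}
    pending = {p: n for p, n in counts.items() if p not in migrated_paths}
    return migrated, pending
```

Keeping the hit counts makes the pending set easy to prioritise, and a path with a single hit in 30 days is a good candidate for a direct conversation with its caller rather than a new route.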

A legacy-catchall 410 stub is exposed at /api/admin/legacy-catchall/{path:path} so frontends that still expect a write-able catch-all surface fail loudly during the enumeration phase.

Gate: cannot delete the catch-all until every explicit path is migrated AND bo/admin + player web bundle confirm zero remaining calls in a staging-mirror traffic replay.

Cut-over Sequence

The migration must respect frontend deploy cadence. Recommended order:

  1. /api/admin/config/* — lowest risk, used only at admin login and in the System Settings tab. Most subpaths can be redirected to the existing brand_config / global-vars surfaces with no new backend code.
  2. /api/admin/i18n/* — read-mostly. The write surface has few operators; the read surface needs a compatibility shape test against the player bundle.
  3. /api/admin/role/* and /api/admin/menu/* — coupled (menus reference role permission bits). Migrate together. Front+back coordinated deploy.
  4. /coin/* — money-adjacent. Requires wallet_service coordination and an enumeration of player-side callers.
  5. /api/admin/bi/* — heaviest. Defer until the aggregator direction is decided. Do not block earlier prefixes on this one.
  6. POST /{path:path} catch-all — must be last. Cannot be retired until every explicit prefix above is migrated and a traffic replay confirms no other callers.

Stub routes

servers_v2/admin_service/app/api/routes/legacy_sunset.py returns 410 Gone with Sunset (RFC 8594) and Deprecation headers for every prefix above plus a /api/admin/legacy-catchall/{path} stub for the catch-all enumeration phase.
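For reference, the header set can be built with the standard library alone. This is a framework-free sketch, not the contents of legacy_sunset.py: RETIRED_PREFIXES only mirrors the idea of _RETIRED_PREFIXES, and the dates and runbook URL are placeholders. Sunset uses the HTTP-date format from RFC 8594, and Deprecation uses the RFC 9745 structured-field date form; whether the real stub emits exactly this form is not stated in this runbook.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical mirror of _RETIRED_PREFIXES; the real tuple lives in legacy_sunset.py.
RETIRED_PREFIXES = (
    "/api/admin/role/", "/api/admin/menu/", "/api/admin/config/",
    "/api/admin/i18n/", "/api/admin/bi/", "/coin/", "/api/admin/legacy-catchall/",
)

# Placeholder values, for illustration only.
DEPRECATED_AT = datetime(2024, 1, 1, tzinfo=timezone.utc)
SUNSET_AT = datetime(2024, 7, 1, tzinfo=timezone.utc)
RUNBOOK_URL = "https://docs.example.invalid/runbooks/legacy-middle-server-retirement"


def sunset_headers():
    return {
        # RFC 8594: Sunset carries an HTTP-date in GMT.
        "Sunset": format_datetime(SUNSET_AT, usegmt=True),
        # RFC 9745: structured-field date, i.e. Unix seconds prefixed with "@".
        "Deprecation": f"@{int(DEPRECATED_AT.timestamp())}",
        # RFC 8594 also defines the "sunset" link relation for human-readable docs.
        "Link": f'<{RUNBOOK_URL}>; rel="sunset"',
    }


def retired_response(path):
    """Return (410, headers) if `path` falls under a retired prefix, else None."""
    if path.startswith(RETIRED_PREFIXES):
        return 410, sunset_headers()
    return None
```

Note that /api/v1/player/deposit/coin/* does not match the root /coin/ prefix, which is consistent with the out-of-scope note under §Scope.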

Why a stub instead of letting FastAPI 404:

  • 404 is silent — a forgotten caller looks like a deployment glitch, not an intentional retirement.
  • 410 Gone (RFC 9110) plus Sunset (RFC 8594) is the standard way to say "this URL is intentionally and permanently dead". Clients, proxies, and API tooling can read the deprecation metadata straight from the response headers instead of guessing.
  • The stub also requires the same admin JWT every other route requires (Depends(get_current_admin)). It must NOT become an unauthenticated info-disclosure channel.

Guard test: servers_v2/admin_service/tests/test_legacy_sunset.py locks the 410 + Sunset contract and asserts unauthenticated callers get 401, not 410.
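The 401-before-410 ordering the guard test locks follows from how FastAPI resolves a request: the route matches first, then its dependencies run, then the handler body. A framework-free sketch of that ordering, with a placeholder token check standing in for Depends(get_current_admin). All names and the fixed token are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical subset of retired prefixes, for illustration only.
RETIRED_PREFIXES = ("/api/admin/role/", "/coin/")


def verify_admin_token(token: Optional[str]) -> bool:
    """Placeholder for get_current_admin's JWT validation; accepts one fixed value."""
    return token == "valid-admin-jwt"


def handle(path: str, token: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Roughly mimic FastAPI's order: route match, then dependencies, then handler."""
    if not path.startswith(RETIRED_PREFIXES):
        return 404, {}  # no stub route matched at all
    if not verify_admin_token(token):
        # The auth dependency rejects the call before the handler runs,
        # so an anonymous caller never receives the Sunset/Link metadata.
        return 401, {}
    return 410, {"Sunset": "Mon, 01 Jul 2024 00:00:00 GMT"}  # placeholder date
```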

When a real implementation lands for one of the prefixes:

  1. Remove the prefix entry from _RETIRED_PREFIXES in legacy_sunset.py in the same commit that adds the new route — otherwise callers see a 410 → 200 oscillation.
  2. Update the corresponding row in this runbook from not started to done and link the new route.
  3. Update the parametrize list in test_legacy_sunset.py.
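The three steps above touch three artifacts that can drift apart. A sketch of a drift check, assuming the retired prefixes, the guard test's parametrize list, and the Status column can each be loaded as plain Python data (the loading is left out, and sync_problems is an illustrative name):

```python
def sync_problems(retired_prefixes, test_params, runbook_status):
    """Report drift between legacy_sunset.py, test_legacy_sunset.py, and this runbook.

    runbook_status maps prefix -> Status value from the Scope table.
    """
    problems = []
    retired = set(retired_prefixes)
    tested = set(test_params)
    for prefix in sorted(retired - tested):
        problems.append(f"{prefix}: stubbed but not parametrized in the guard test")
    for prefix in sorted(tested - retired):
        problems.append(f"{prefix}: parametrized but no longer stubbed")
    for prefix in sorted(retired):
        if runbook_status.get(prefix) == "done":
            problems.append(f"{prefix}: marked done but still returns 410")
    return problems
```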

Ownership Follow-ups

Every row in the scope table currently lists TBD as Migration owner. Before any of these migrations starts, leads must assign named owners — the platform-backend group is the default fallback but the work crosses into payments (coin), data (BI), and frontend (role/menu/i18n) territory and each should sign off explicitly.

Rollback

Each per-prefix migration is independent and rolls back to "no admin_service implementation, legacy_sunset stub returns 410". The generic recipe (to be specialised per prefix as it's adopted):

  1. Revert the per-prefix admin_service router commit together with the matching bo/admin client-side change, so bo/admin goes back to calling the legacy middle_server host.
  2. Roll back the matching legacy_sunset.py 410-stub entry so callers don't hit Sunset: headers after the revert.
  3. Re-run the test_legacy_sunset.py parametrize list to confirm the prefix is excluded from the sunset surface.

Each individual per-prefix runbook (filed under docs/runbooks/legacy-middle-server-retirement/<prefix>.md once owners adopt it) must include:

  • A concrete git revert SHA pointer once the migration commit lands.
  • The exact bo/admin config flag (or build) to flip back to the legacy host.
  • DB-level rollback (none expected — these are routing migrations, not schema migrations — but each runbook should explicitly state "no DB changes" or document them).
  • Verification step: which smoke test confirms the revert worked.

Until the per-prefix runbooks land, treat this section as the template for what they must contain, not as an executable recipe.