Phase 4E Dual-Read Fallback Sunset
Status
done — delivered 2026-05-06
Date
2026-05-05 (proposed) / 2026-05-06 (delivered)
Delivered
We are pre-deployment. There is no production session backlog to drain, so the soak-window criteria below are not load-bearing — the cleanup landed directly:
- legacy_session_key() deleted from servers_v2/player_service/app/services/cache_keys.py.
- read_session_id() body collapsed to a single await redis.get(session_key(brand_id, account)).
- legacy_session_key import dropped from servers_v2/player_service/app/api/routes/auth.py.
- Gateway middleware (servers_v2/gateway/app/middleware/auth.py) removed _session_cache_key, _record_legacy_session_key_used, _is_legacy_session_key_disabled, and the GATEWAY_SESSION_LEGACY_KEY_DISABLED operator flag. The middleware now reads only the brand-prefixed key; missing brand context → AUTH_REQUIRED.
- Tests updated: servers_v2/player_service/tests/test_brand_cache_keys.py and servers_v2/gateway/tests/test_brand_session_lookup.py no longer assert against the legacy key path.
The gateway_jwt_session_missing_total counter is retained — it
distinguishes JWT-valid-but-session-gone from JWT-bad and remains
useful post-cleanup.
The historical sunset criteria (Redis scan, 30-day soak, etc.) are preserved below for the production deployment runbook but are not gating in the dev phase.
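The post-cleanup gateway behaviour described above can be sketched roughly as follows. This is not the middleware's actual code: lookup_session, FakeRedis, the plain-dict counter, and the choice to return AUTH_REQUIRED when the session is missing are all assumptions made for illustration. Only the brand-prefixed key format, the AUTH_REQUIRED outcome for missing brand context, and the gateway_jwt_session_missing_total counter name come from this document.

```python
import asyncio
from typing import Optional

AUTH_REQUIRED = "AUTH_REQUIRED"

# Hypothetical stand-in for the retained gateway_jwt_session_missing_total counter.
counters = {"gateway_jwt_session_missing_total": 0}


class FakeRedis:
    """Hypothetical in-memory stand-in for an async Redis client."""

    def __init__(self, data: dict[str, str]) -> None:
        self._data = data

    async def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


async def lookup_session(
    redis: FakeRedis, brand_id: Optional[str], account: str
) -> tuple[Optional[str], Optional[str]]:
    """Return (session_id, error); exactly one of the two is None."""
    if brand_id is None:
        # No brand context: there is no legacy key left to fall back to.
        return None, AUTH_REQUIRED
    session_id = await redis.get(f"user-session:brand:{brand_id}:{account}")
    if session_id is None:
        # JWT was valid but the session is gone; counted separately from bad JWTs.
        counters["gateway_jwt_session_missing_total"] += 1
        return None, AUTH_REQUIRED  # assumed error code for this case
    return session_id, None
```

In this sketch a session stored only under the legacy user-session-{account} key is treated as missing, which is the intended effect of dropping the dual read.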
Owners
- Platform Backend
- Player Domain
Background
Phase 4E of the multi-brand isolation work introduced brand-prefixed Redis
keys for player session lookup. Writers always use the new
user-session:brand:{brand_id}:{account} form, while
read_session_id
falls back to the legacy user-session-{account} key for
pre-Phase-4E sessions still pinned in Redis.
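To make the two key shapes concrete, here is a minimal sketch of the pre-cleanup dual read. The function bodies, the string type of brand_id, and the FakeRedis class are assumptions for illustration; only the key formats and the names session_key, legacy_session_key, and read_session_id are taken from this plan.

```python
import asyncio
from typing import Optional


def session_key(brand_id: str, account: str) -> str:
    # Brand-prefixed form used by all Phase 4E writers.
    return f"user-session:brand:{brand_id}:{account}"


def legacy_session_key(account: str) -> str:
    # Pre-Phase-4E form; read as a fallback only, never written.
    return f"user-session-{account}"


class FakeRedis:
    """Hypothetical in-memory stand-in for an async Redis client."""

    def __init__(self, data: dict[str, str]) -> None:
        self._data = data

    async def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


async def read_session_id(
    redis: FakeRedis, brand_id: str, account: str
) -> Optional[str]:
    # Assumed shape of the dual read: brand-scoped key first, legacy key second.
    hit = await redis.get(session_key(brand_id, account))
    if hit is not None:
        return hit
    return await redis.get(legacy_session_key(account))
```

The second redis.get call is the extra round-trip that the sunset removes.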
The fallback was always meant to be temporary. The current code carries two TODO markers without a concrete sunset date:
- legacy_session_key(): TODO(phase-4E-rollout): remove this and the dual-read fallback
- read_session_id(): TODO(phase-4E-rollout): remove the legacy fallback once the rollout window closes
Leaving the fallback in place indefinitely costs an extra Redis round-trip on every session check that misses the brand-scoped key, and it keeps alive a code path that has no production justification once all legacy sessions have expired.
Goal
Remove legacy_session_key() and the dual-read branch in
read_session_id() once we are confident no production session relies on
the legacy key shape.
Sunset Criteria
All of the following must be true before the cleanup PR lands:
- The Phase 4E enforcement rollout has been at enforce mode in production for at least 30 days.
- The longest-lived player session TTL (the JWT_EXPIRE_MIN envelope plus any "remember me" extension) has fully elapsed since enforcement.
- A production Redis scan confirms zero keys matching user-session-* (legacy form) for at least 7 consecutive days. The scan should be added to the multi-brand rollout dashboard as a counter.
- No read_session_id legacy-branch hits have been recorded in observability for 7 consecutive days. This requires adding a counter (see "Pre-Sunset Instrumentation" below).
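The scan and the 7-day checks could be sketched along these lines. The helper names count_legacy_session_keys and trailing_zero_days are hypothetical, and the code assumes a redis-py style client that exposes scan_iter(match=..., count=...); how the daily samples are collected and stored is left open here.

```python
from typing import Iterable, Protocol, Sequence


class SupportsScanIter(Protocol):
    """Minimal client surface assumed here (redis-py clients provide scan_iter)."""

    def scan_iter(self, match: str, count: int) -> Iterable[object]: ...


# Matches only the legacy form; brand-prefixed keys continue with ':' after
# "user-session", so they are not counted.
LEGACY_PATTERN = "user-session-*"


def count_legacy_session_keys(client: SupportsScanIter, batch: int = 1000) -> int:
    # SCAN walks the keyspace incrementally instead of blocking the server like KEYS.
    return sum(1 for _ in client.scan_iter(match=LEGACY_PATTERN, count=batch))


def trailing_zero_days(daily_counts: Sequence[int]) -> int:
    """Number of most recent consecutive daily samples that are zero."""
    streak = 0
    for value in reversed(daily_counts):
        if value != 0:
            break
        streak += 1
    return streak
```

Under these assumptions, each 7-day criterion is met once trailing_zero_days(samples) >= 7 for the corresponding daily series (legacy key counts, or legacy-branch hit counts).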
Pre-Sunset Instrumentation
Before scheduling the cleanup, add a counter to read_session_id that
distinguishes the two read paths:
    # Inside read_session_id(): try the brand-scoped key first.
    new_hit = await redis.get(new_key)
    if new_hit is not None:
        metrics.session_read_total.labels(path="brand_scoped").inc()
        return new_hit
    # Fall back to the legacy key for pre-Phase-4E sessions.
    legacy_hit = await redis.get(legacy_key)
    if legacy_hit is not None:
        metrics.session_read_total.labels(path="legacy_fallback").inc()
        return legacy_hit
This is a separate, smaller PR landed before the sunset window opens.
Scope of the Cleanup PR
When the criteria are met:
- Delete legacy_session_key() from servers_v2/player_service/app/services/cache_keys.py.
- Replace the read_session_id body with a single await redis.get(session_key(brand_id, account)).
- Remove the import of legacy_session_key from servers_v2/player_service/app/api/routes/auth.py:24.
- Remove the path="legacy_fallback" counter (it is now unreachable).
- Remove this plan and the related TODO references.
Acceptance
- rg "legacy_session_key|legacy_fallback" in the repository returns zero matches outside of historical worklogs.
- Player session integration tests still pass.
Out of Scope
- Other Phase 4E brand-prefixed key families (sms_captcha_key, login_wrong_key, register_sms_key) had no legacy pre-brand form because their callers always re-issued from scratch on Phase 4E rollout. They have no fallback to remove.
References
- servers_v2/player_service/app/services/cache_keys.py
- docs/runbooks/multi-brand/multi-brand-isolation-rollout.md
- docs/adr/ADR-009-multi-brand-domain-routed-isolation.md