
Phase 4E Dual-Read Fallback Sunset

Status

done — delivered 2026-05-06

Date

2026-05-05 (proposed) / 2026-05-06 (delivered)

Delivered

We are pre-deployment: there is no production session backlog to drain, so the soak-window criteria below do not apply yet, and the cleanup landed directly:

  1. legacy_session_key() deleted from servers_v2/player_service/app/services/cache_keys.py.
  2. read_session_id() body collapsed to a single await redis.get(session_key(brand_id, account)).
  3. legacy_session_key import dropped from servers_v2/player_service/app/api/routes/auth.py.
  4. Gateway middleware (servers_v2/gateway/app/middleware/auth.py): removed _session_cache_key, _record_legacy_session_key_used, _is_legacy_session_key_disabled, and the GATEWAY_SESSION_LEGACY_KEY_DISABLED operator flag. The middleware now reads only the brand-prefixed key, and a request with no brand context is rejected with AUTH_REQUIRED.
  5. Tests updated: servers_v2/player_service/tests/test_brand_cache_keys.py and servers_v2/gateway/tests/test_brand_session_lookup.py no longer assert against the legacy key path.
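For reference, the post-cleanup read path can be sketched as follows. This is a minimal illustration, not the shipped code: session_key() is re-created from the documented key shape, and gateway_session_lookup, AuthRequired and FakeRedis are hypothetical names introduced here. It assumes an async Redis client whose get() returns None for a missing key.

```python
import asyncio


def session_key(brand_id: str, account: str) -> str:
    # Re-created from the documented key shape for illustration only;
    # the real session_key() is not reproduced here.
    return f"user-session:brand:{brand_id}:{account}"


async def read_session_id(redis, brand_id: str, account: str):
    # Post-cleanup: exactly one GET against the brand-scoped key, no fallback.
    return await redis.get(session_key(brand_id, account))


class AuthRequired(Exception):
    """Hypothetical stand-in for the gateway's AUTH_REQUIRED rejection."""


async def gateway_session_lookup(redis, brand_id, account: str):
    # Hypothetical gateway-side lookup: without brand context there is no
    # key to read, so the request is rejected instead of falling back.
    if not brand_id:
        raise AuthRequired("AUTH_REQUIRED")
    return await redis.get(session_key(brand_id, account))


class FakeRedis:
    """In-memory stand-in exposing only the async get() used above."""

    def __init__(self, data):
        self.data = dict(data)

    async def get(self, key):
        return self.data.get(key)


if __name__ == "__main__":
    redis = FakeRedis({
        "user-session:brand:b1:alice": "sid-new",
        "user-session-bob": "sid-legacy",  # pre-Phase-4E shape, never read now
    })
    print(asyncio.run(read_session_id(redis, "b1", "alice")))  # sid-new
    print(asyncio.run(read_session_id(redis, "b1", "bob")))    # None
```

A session stored only under the legacy user-session-{account} key now reads as missing, which is the intended post-cleanup behaviour.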

The gateway_jwt_session_missing_total counter is retained — it distinguishes JWT-valid-but-session-gone from JWT-bad and remains useful post-cleanup.
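The distinction this counter preserves can be sketched as a small classifier. The AuthOutcome names and the in-memory Counter are hypothetical stand-ins; only the metric name gateway_jwt_session_missing_total comes from this plan, and the real middleware's control flow may differ.

```python
from collections import Counter
from enum import Enum


class AuthOutcome(Enum):
    OK = "ok"
    JWT_INVALID = "jwt_invalid"          # hypothetical label for a bad token
    SESSION_MISSING = "session_missing"  # token verifies, Redis session gone


# Stand-in for the real metric; only the counter name comes from the plan.
counters = Counter()


def classify(jwt_valid: bool, session_id) -> AuthOutcome:
    if not jwt_valid:
        # A bad token is its own failure mode and does not touch the counter.
        return AuthOutcome.JWT_INVALID
    if session_id is None:
        # The JWT verifies, but the server-side session no longer exists.
        counters["gateway_jwt_session_missing_total"] += 1
        return AuthOutcome.SESSION_MISSING
    return AuthOutcome.OK
```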

The historical sunset criteria (Redis scan, 30-day soak, etc.) are preserved below for the production deployment runbook but do not gate the dev phase.

Owners

  • Platform Backend
  • Player Domain

Background

Phase 4E of the multi-brand isolation work introduced brand-prefixed Redis keys for player session lookup. Writers always use the new user-session:brand:{brand_id}:{account} form, while read_session_id falls back to the legacy user-session-{account} key for pre-Phase-4E sessions still pinned in Redis.
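The two key shapes, and the order in which the pre-cleanup read_session_id consulted them, look like this. The helpers are re-created from the documented formats for illustration (dual_read_order is a hypothetical name); the real functions may take different argument types.

```python
def session_key(brand_id: str, account: str) -> str:
    # Brand-scoped form written by every post-Phase-4E writer.
    return f"user-session:brand:{brand_id}:{account}"


def legacy_session_key(account: str) -> str:
    # Pre-Phase-4E form: no brand, hyphen separator.
    return f"user-session-{account}"


def dual_read_order(brand_id: str, account: str) -> list:
    # Keys consulted by the pre-cleanup read_session_id, in lookup order.
    return [session_key(brand_id, account), legacy_session_key(account)]
```

Because the legacy form uses a hyphen where the brand-scoped form uses a colon, a glob such as user-session-* matches only legacy keys.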

The fallback was always meant to be temporary. The current code carries two TODO markers without a concrete sunset date:

  • legacy_session_key(): TODO(phase-4E-rollout): remove this and the dual-read fallback
  • read_session_id(): TODO(phase-4E-rollout): remove the legacy fallback once the rollout window closes

Leaving the fallback in place indefinitely costs an extra Redis round-trip on every session check where the brand-scoped key misses, and keeps alive a code path that has no production justification once all legacy sessions have expired.
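The cost is easy to see by counting GETs against an in-memory stand-in: on a brand-scoped hit both shapes issue one GET, but on a miss the dual read issues two. CountingRedis, dual_read, single_read and round_trips are hypothetical names for this sketch.

```python
import asyncio


class CountingRedis:
    """In-memory stand-in that counts GET round-trips."""

    def __init__(self, data):
        self.data = dict(data)
        self.gets = 0

    async def get(self, key):
        self.gets += 1
        return self.data.get(key)


async def dual_read(redis, brand_id, account):
    # Pre-cleanup shape: brand-scoped key first, then the legacy key on a miss.
    hit = await redis.get(f"user-session:brand:{brand_id}:{account}")
    if hit is not None:
        return hit
    return await redis.get(f"user-session-{account}")


async def single_read(redis, brand_id, account):
    # Post-cleanup shape: one GET regardless of outcome.
    return await redis.get(f"user-session:brand:{brand_id}:{account}")


def round_trips(reader, account: str) -> int:
    # Run one lookup for brand "b1" and report how many GETs it issued.
    redis = CountingRedis({"user-session:brand:b1:alice": "sid"})
    asyncio.run(reader(redis, "b1", account))
    return redis.gets


if __name__ == "__main__":
    print(round_trips(dual_read, "alice"))    # 1
    print(round_trips(dual_read, "mallory"))  # 2
```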

Goal

Remove legacy_session_key() and the dual-read branch in read_session_id() once we are confident no production session relies on the legacy key shape.

Sunset Criteria

All of the following must be true before the cleanup PR lands:

  1. The Phase 4E enforcement rollout has been at enforce mode in production for at least 30 days.
  2. The longest-lived player session TTL (the JWT_EXPIRE_MIN envelope plus any "remember me" extension) has fully elapsed since enforcement.
  3. A production Redis scan confirms zero keys matching user-session-* (legacy form) for at least 7 consecutive days. The scan should be added to the multi-brand rollout dashboard as a counter.
  4. No read_session_id legacy-branch hits have been recorded in observability for 7 consecutive days. This requires adding a counter (see "Pre-Sunset Instrumentation" below).
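Criteria 3 and 4 are mechanical enough to script. A minimal sketch, assuming redis-py's asyncio client (redis.asyncio.Redis.scan_iter accepts match and count) and a dashboard that records one count per day; the same 7-day check applies to criterion 4's daily legacy_fallback hit count. The function names and the FakeScanRedis stand-in are hypothetical.

```python
import asyncio
from datetime import date, timedelta
from fnmatch import fnmatchcase

# The hyphen matters: "user-session-*" cannot match "user-session:brand:...".
LEGACY_PATTERN = "user-session-*"


async def count_legacy_session_keys(redis, batch: int = 1000) -> int:
    # SCAN walks the keyspace incrementally instead of blocking like KEYS.
    # It may yield a key more than once, so only zero vs non-zero is reliable.
    found = 0
    async for _ in redis.scan_iter(match=LEGACY_PATTERN, count=batch):
        found += 1
    return found


def zero_for_consecutive_days(daily: dict, today: date, days: int = 7) -> bool:
    # Each of the last `days` days (today included) needs a recorded 0;
    # a missing sample counts as "not proven clean".
    return all(daily.get(today - timedelta(days=d)) == 0 for d in range(days))


class FakeScanRedis:
    """In-memory stand-in for an async client's scan_iter(match=, count=)."""

    def __init__(self, keys):
        self.keys = list(keys)

    async def scan_iter(self, match=None, count=None):
        for key in self.keys:
            if match is None or fnmatchcase(key, match):
                yield key


if __name__ == "__main__":
    fake = FakeScanRedis(["user-session:brand:b1:alice", "user-session-bob"])
    print(asyncio.run(count_legacy_session_keys(fake)))  # 1
```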

Pre-Sunset Instrumentation

Before scheduling the cleanup, add a counter to read_session_id that distinguishes the two read paths:

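```python
# Illustrative signature; `metrics` is assumed to expose a labelled session_read_total counter.
async def read_session_id(redis, brand_id, account):
    new_key = session_key(brand_id, account)    # user-session:brand:{brand_id}:{account}
    legacy_key = legacy_session_key(account)    # user-session-{account}
    new_hit = await redis.get(new_key)
    if new_hit is not None:
        metrics.session_read_total.labels(path="brand_scoped").inc()
        return new_hit
    legacy_hit = await redis.get(legacy_key)
    if legacy_hit is not None:
        metrics.session_read_total.labels(path="legacy_fallback").inc()
        return legacy_hit
    return None  # neither key exists
```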

This counter ships as its own, smaller PR, landed before the sunset window opens.
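Before relying on the counter for criterion 4, its labelling can be checked in a unit test. The sketch below injects a stand-in that mimics the labels(path=...).inc() call shape (prometheus_client's labelled Counter uses the same shape); LabelledCounter, FakeRedis and the extra metric parameter are hypothetical, introduced here only for testability.

```python
import asyncio
from collections import Counter


class LabelledCounter:
    """Stand-in mimicking the labels(path=...).inc() call shape."""

    def __init__(self):
        self.values = Counter()

    def labels(self, path):
        return _Child(self, path)


class _Child:
    def __init__(self, parent, path):
        self.parent, self.path = parent, path

    def inc(self, amount=1):
        self.parent.values[self.path] += amount


async def read_session_id(redis, metric, brand_id, account):
    # Same branching as the instrumented snippet; metric is injected for tests.
    new_hit = await redis.get(f"user-session:brand:{brand_id}:{account}")
    if new_hit is not None:
        metric.labels(path="brand_scoped").inc()
        return new_hit
    legacy_hit = await redis.get(f"user-session-{account}")
    if legacy_hit is not None:
        metric.labels(path="legacy_fallback").inc()
    return legacy_hit


class FakeRedis:
    """In-memory stand-in exposing only the async get() used above."""

    def __init__(self, data):
        self.data = dict(data)

    async def get(self, key):
        return self.data.get(key)


if __name__ == "__main__":
    metric = LabelledCounter()
    redis = FakeRedis({"user-session-bob": "s2"})
    asyncio.run(read_session_id(redis, metric, "b1", "bob"))
    print(metric.values["legacy_fallback"])  # 1
```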

Scope of the Cleanup PR

When the criteria are met:

  1. Delete legacy_session_key() from servers_v2/player_service/app/services/cache_keys.py.
  2. Replace read_session_id body with a single await redis.get(session_key(brand_id, account)).
  3. Remove the import of legacy_session_key from servers_v2/player_service/app/api/routes/auth.py:24.
  4. Remove the path="legacy_fallback" counter (it is now unreachable).
  5. Remove this plan and the related TODO references.

Acceptance

  • rg "legacy_session_key|legacy_fallback" in the repository returns zero matches outside of historical worklogs.
  • Player session integration tests still pass.
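Where rg is not available (for example in a minimal CI image), the first acceptance check can be approximated in Python. This is a sketch: leftover_references is a hypothetical helper, the excluded directory names are placeholders for wherever historical worklogs actually live, and unlike rg it does not honour .gitignore.

```python
import re
from pathlib import Path

PATTERN = re.compile(r"legacy_session_key|legacy_fallback")


def leftover_references(root: Path, excluded_dirs=(".git", "worklogs")) -> list:
    """Return (relative path, line number) pairs that still match PATTERN."""
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories, and anything under an excluded directory
        # ("worklogs" is a placeholder for the historical worklog location).
        if not path.is_file() or any(part in excluded_dirs for part in rel.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((rel.as_posix(), lineno))
    return hits
```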

Out of Scope

  • Other Phase 4E brand-prefixed key families (sms_captcha_key, login_wrong_key, register_sms_key) never read a legacy pre-brand key, because their callers simply re-issued values from scratch at Phase 4E rollout. They have no fallback to remove.

References

  • servers_v2/player_service/app/services/cache_keys.py
  • docs/runbooks/multi-brand/multi-brand-isolation-rollout.md
  • docs/adr/ADR-009-multi-brand-domain-routed-isolation.md