Ruby Wallet Topology Rollout Runbook
Status
Approved
Last Approved: 2026-05-15 (HEAD=3f36e7a8)
Date
2026-04-23
Owners
- Platform Backend
- Wallet Domain
Affected Services
- wallet_service
- wallet_worker
- rolling_service
- rolling_worker
- promotion_service
- promotion_worker
- game_service
- gateway
- admin_service
Related Docs
- docs/adr/ADR-005-wallet-topology-bucket-ledger-model.md
- docs/specs/wallet/2026-04-23-ruby-wallet-split-structure.md
- docs/plans/wallet/2026-04-23-ruby-wallet-split-implementation.md
- docs/runbooks/local-docker/local-docker-full-stack.md
Goal
Validate and roll out the topology-driven wallet model safely after the implementation is complete.
This runbook is not approved for production use until the related spec, implementation plan, tests, and migrations are complete.
Preconditions
- ADR-005 remains accepted.
- Wallet spec status is Approved.
- Implementation plan done definition is satisfied.
- Database migrations are reviewed and reversible for non-production environments.
- Fresh local or staging data exists, or a signed data migration plan exists.
- No service writes legacy flat wallet fields as canonical money state.
- wallet_service owns all wallet bucket, coupon grant, policy, authorization, transfer, and ledger writes.
- Worker health checks reflect forward progress according to ADR-003.
- Promotion settlement failures surface according to ADR-004.
- Alerting exists for wallet ledger failures, idempotency mismatches, missing bet authorizations, negative-balance prevention, and rolling creation failures after wallet credit.
Local Docker Validation
- Start the full local stack using the local Docker runbook.
- Apply wallet topology migrations.
- Load exactly one topology and policy seed for the target product mode: RUBY_SPLIT_V1 for sports/casino separated normal and bonus wallets, or RUBY_UNIFIED_V1 for one common normal and bonus wallet across all provider types.
- Confirm wallet_service, rolling_service, promotion_service, game_service, gateway, and admin_service health is green.
- Confirm wallet workers report forward-progress readiness.
- Run unit and contract tests for:
- wallet topology validation
- wallet policy engine
- wallet API commands
- rolling attribution
- promotion points credit
- coupon grant saga
- gateway split-balance routes
- admin topology and policy validation
- Run local end-to-end flows:
- sports normal deposit, sports bet, settlement, rolling, withdrawable
- casino normal deposit, live bet, settlement, rolling, withdrawable
- slots bet using casino policy
- sports bonus and casino bonus active at the same time
- sports-only coupon accepted for sports and rejected for live/slots
- casino-only coupon accepted for live/slots and rejected for sports
- provider-only coupon accepted only for configured provider IDs
- points credit, points transfer to every configured normal-wallet target, generated rolling
- split topology only: sports normal to casino normal transfer with rolling inheritance
- split topology only: casino normal to sports normal transfer with rolling inheritance
- unified topology only: legacy normal-wallet transfer aliases return a no-op response because there is no cross-normal transfer to perform
- withdrawable-funded bet under every configured withdrawable policy
- bet rollback restores exact authorization funding breakdown
- Confirm every money movement has immutable wallet_ledger rows.
- Confirm every settled or rolled-back bet has a stored wallet_bet_authorization.
- Confirm policy and topology snapshots exist on transactional records.
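The "bet rollback restores exact authorization funding breakdown" flow can be illustrated with a small sketch. It assumes a hypothetical in-memory balance map and a BetAuthorization record holding per-bucket amounts in integer minor units; the class, function, and bucket names are illustrative and are not taken from wallet_service.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BetAuthorization:
    """Hypothetical record of which buckets funded a bet (minor units)."""
    bet_id: str
    funding: Dict[str, int]  # bucket code -> amount debited at authorize time


def authorize(balances: Dict[str, int], bet_id: str, stake: int,
              priority: List[str]) -> BetAuthorization:
    """Debit buckets in priority order and store the exact breakdown."""
    remaining = stake
    funding: Dict[str, int] = {}
    for bucket in priority:
        take = min(balances.get(bucket, 0), remaining)
        if take > 0:
            funding[bucket] = take
            remaining -= take
    if remaining > 0:
        # Nothing has been debited yet, so a failed authorize leaves balances intact.
        raise ValueError("insufficient funds")
    for bucket, amount in funding.items():
        balances[bucket] -= amount
    return BetAuthorization(bet_id, funding)


def rollback(balances: Dict[str, int], auth: BetAuthorization) -> None:
    """Credit back exactly what was debited, ignoring current balances."""
    for bucket, amount in auth.funding.items():
        balances[bucket] = balances.get(bucket, 0) + amount
```

Because rollback reads only the stored breakdown, a deposit or transfer that lands between authorize and rollback cannot change which bucket receives the refund. That is the property the release gate protects when it forbids recomputing funding from current balances.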
Staging Validation
- Deploy services and migrations to staging.
- Activate the selected topology only after seed validation succeeds.
- Run the same Docker validation flows against staging.
- Compare wallet bucket totals with ledger-derived totals.
- Confirm admin policy CRUD creates immutable policy versions.
- Confirm topology activation is blocked when required policy or bucket definitions are missing.
- Confirm game provider callbacks cannot override server-side funding policy.
- Confirm operational dashboards show wallet command success/failure rates, ledger latency, coupon rejection reasons, transfer rejection reasons, and rolling creation failures.
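The bucket-versus-ledger comparison above can be sketched as follows. This is a simplified illustration, not the logic in ledger_reconciliation.py: it assumes ledger rows can be reduced to signed integer deltas per (brand_id, player_id, topology_code, bucket_type_code) key and sums them, whereas the real check compares against a ledger tip. The function names and the bucket codes in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (brand_id, player_id, topology_code, bucket_type_code)
Key = Tuple[str, str, str, str]


def ledger_totals(ledger_rows: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Sum signed ledger deltas (minor units) per bucket key."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, delta in ledger_rows:
        totals[key] += delta
    return dict(totals)


def find_drift(bucket_balances: Dict[Key, int],
               ledger_rows: Iterable[Tuple[Key, int]]) -> Dict[Key, Tuple[int, int]]:
    """Return {key: (stored_balance, ledger_derived_balance)} for every mismatch.

    Keys present on only one side are compared against zero, so a balance
    with no ledger history (or the reverse) is reported as drift.
    """
    derived = ledger_totals(ledger_rows)
    drift: Dict[Key, Tuple[int, int]] = {}
    for key in set(bucket_balances) | set(derived):
        stored = bucket_balances.get(key, 0)
        expected = derived.get(key, 0)
        if stored != expected:
            drift[key] = (stored, expected)
    return drift
```

An empty result corresponds to a clean comparison. For example, with ledger deltas of +100 and -40 on a SPORTS_NORMAL key and a stored balance of 60, find_drift returns an empty dict; a stored balance of 65 would be reported as (65, 60).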
Release Gate
Do not enable the new wallet model in any shared or production-like environment if any of the following are true:
- wallet topology or policy tests are missing or failing
- local Docker end-to-end flows are failing
- staging ledger totals do not match bucket totals
- any money movement bypasses wallet_service
- any settlement or rollback path recomputes funding from current balances
- coupon grants are collapsed into a generic coupon balance
- points can be bet or withdrawn directly
- topology activation can leave active balances or records unreachable
- alerting for critical wallet failures is missing
Rollback Trigger
Pause rollout or roll back immediately if:
- ledger writes fail or are delayed beyond the release threshold
- bucket totals diverge from ledger-derived totals
- accepted bets cannot be settled because authorization records are missing
- rollback restores a different source than the original authorization
- topology or policy activation changes current-balance rows unexpectedly
- provider callbacks start failing because wallet policy cannot be resolved
- workers remain healthy while wallet outbox or rolling creation is stalled
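The last trigger, workers that report healthy while work is stalled, is what a forward-progress readiness check is meant to catch. The sketch below is one possible interpretation and does not reproduce ADR-003: the WorkerProgress fields, the is_ready function, and the stall threshold are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerProgress:
    """Hypothetical progress snapshot reported by a worker."""
    pending: int             # items waiting to be processed (e.g. outbox rows)
    last_progress_at: float  # epoch seconds when an item last completed


def is_ready(progress: WorkerProgress, max_stall_seconds: float,
             now: Optional[float] = None) -> bool:
    """Ready means: nothing is pending, or something completed recently.

    A worker whose process is alive but whose backlog has not moved for
    longer than max_stall_seconds is reported as not ready.
    """
    current = time.time() if now is None else now
    if progress.pending == 0:
        return True
    return (current - progress.last_progress_at) <= max_stall_seconds
```

A liveness-only check would pass in the stalled case, which is why the trigger is phrased in terms of backlog movement rather than process health.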
Rollback Steps
For non-production environments:
- Stop provider-facing traffic to the new wallet command paths.
- Stop workers that consume wallet events.
- Export wallet topology, policy, bucket, authorization, transfer, coupon grant, and ledger tables for investigation.
- Deactivate the candidate topology if no accepted bets or unresolved wallet records depend on it.
- Restore from the pre-rollout database snapshot if destructive schema or seed changes were applied.
- Restart services and workers.
- Re-run health checks and local/staging smoke checks.
Production rollback procedure
Production rollback is incident-commander gated and tiered: pick the lowest tier that resolves the incident. Tiers 0–2 are non-destructive and preferred; Tier 3 (snapshot restore) is money-lossy and is the last resort. Money is frozen before any state change in every tier.
Step 0 — Freeze money (all tiers). Apply the narrowest containment
lever from incident-response.md §3 (platform
PUT /api/v1/maintenance, or DEPOSIT_MAINTENANCE_STATUS via
PUT /api/v1/global-vars/{key}, or per-brand
feature_policy/payment_policy). Pause provider ingress at the gateway
so no new /v2/bets/authorize lands. Record the freeze timestamp.
Step 1 — Drain, don't drop, in-flight money events. Do not kill
wallet_worker immediately. Wait until wallet_outbox has no rows with
published_at IS NULL (query it) so no settled-bet / deposit event is
lost, then stop the wallet-event consumers (rolling_service,
promotion_service workers). The outbox + inbox idempotency design makes
a later resume replay-safe, but it cannot recover an unpublished row
that was dropped.
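The drain wait in Step 1 could be scripted along these lines. This is a sketch under assumptions: it uses Python's built-in sqlite3 module so that it is self-contained, while the production database driver, connection handling, and any additional filter columns on wallet_outbox are not specified in this runbook. Only the table name and the published_at IS NULL condition come from the step above; the function names, timeout, and poll interval are illustrative.

```python
import sqlite3
import time
from typing import Callable


def unpublished_count(conn: sqlite3.Connection) -> int:
    """Count wallet_outbox rows that have not been published yet."""
    row = conn.execute(
        "SELECT COUNT(*) FROM wallet_outbox WHERE published_at IS NULL"
    ).fetchone()
    return row[0]


def wait_for_drain(conn: sqlite3.Connection, timeout_s: float = 300.0,
                   poll_s: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the outbox is empty; return False if the timeout expires.

    A False result means wallet_worker should keep running and the
    incident commander should be told the outbox is not draining.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if unpublished_count(conn) == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

The function deliberately has no side effects on the table. Stopping the consumers remains a separate, manual decision taken only after it returns True.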
Tier 1 — Code rollback (bad commit < 24h, no schema/seed change). This is the common case. No data rollback needed — the ledger is authoritative and idempotency keys make reprocessing safe.
- git revert <bad_sha> && git push origin main (repo builds from source; see incident-response.md Step 4).
- Redeploy affected services + workers from the reverted commit.
- Go to Step 4 (Verify).
Tier 2 — Topology/policy activation rollback (no destructive schema).
Only the active wallet_topology / wallet_policy pointer changed.
- Export wallet_topology, wallet_policy, wallet_bucket, wallet_bet_authorization, wallet_transfer, wallet_coupon_grant, wallet_ledger for forensics.
- Re-activate the previous topology only if no accepted bet or unresolved authorization references the candidate topology (check wallet_bet_authorization for status='AUTHORIZED' rows on the candidate topology; if any exist, resolve/settle/rollback them first, because activating away from a topology with live authorizations strands those stakes).
- Confirm reconciliation is clean after re-activation: force a cycle (or wait one hourly tick) and verify wallet_ledger_reconciliation_drift_buckets == 0. The wallet_service/app/tasks/ledger_reconciliation.py check joins buckets and coupon grants to their ledger tip on the full (brand_id, player_id, topology_code, bucket_type_code) key, so a topology re-point that strands or mismatches any balance shows up here.
- Go to Step 4 (Verify).
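The live-authorization check in Tier 2 could look like the sketch below. The table name and the status='AUTHORIZED' filter come from the step above; the topology_code column on wallet_bet_authorization, the function names, and the use of sqlite3 (chosen only so the example runs standalone) are assumptions. Other database drivers use a different placeholder style than "?".

```python
import sqlite3


def live_authorization_count(conn: sqlite3.Connection, topology_code: str) -> int:
    """Count AUTHORIZED rows that still reference the given topology."""
    row = conn.execute(
        "SELECT COUNT(*) FROM wallet_bet_authorization "
        "WHERE status = 'AUTHORIZED' AND topology_code = ?",
        (topology_code,),
    ).fetchone()
    return row[0]


def can_reactivate_previous(conn: sqlite3.Connection, candidate_topology: str) -> bool:
    """Allow re-activation only when no live stake depends on the candidate."""
    return live_authorization_count(conn, candidate_topology) == 0
```

A False result means the outstanding authorizations must be settled or rolled back before the topology pointer is moved.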
Tier 3 — Snapshot restore (LAST RESORT — money-lossy). Only if destructive schema/seed changes were applied and Tiers 1–2 cannot recover. Requires incident-commander + finance sign-off because every money mutation after the snapshot timestamp is lost and must be reconciled manually against provider records.
- Confirm a verified pre-rollout snapshot exists and its timestamp.
- Capture a current full dump first (post-incident forensics + manual replay source).
- Restore the snapshot.
- Manually reconcile provider settlements and deposits that occurred between the snapshot and the freeze against provider statements before unfreezing.
- Go to Step 4 (Verify).
Step 4 — Verify before all-clear (all tiers).
- Restart services + workers; GET /health green for every service and wallet_worker/player_worker heartbeats fresh.
- Run the Smoke Checks below.
- Force a reconciliation cycle (or wait one hourly tick) and confirm wallet_ledger_reconciliation_drift_buckets == 0 (this single gauge covers both the bucket and coupon-grant segments).
- Execute one real deposit → bet authorize → settle → withdraw round-trip on a canary account; confirm balances and ledger tip agree.
- Only then reverse Step 0 (lift maintenance/flags, restore provider ingress) and declare all-clear.
Never auto-correct a reconciliation drift as part of rollback — investigate root cause first. Auto-correcting an unexplained drift converts a detectable accounting bug into silent money loss.
Smoke Checks
- GET /health succeeds for every affected service.
- wallet topology read endpoint returns the active topology and version.
- wallet policy read endpoint returns the active policy versions.
- player wallet snapshot includes topology code and version.
- a small sports test bet authorizes, settles, and produces ledger rows.
- a small casino test bet authorizes, settles, and produces ledger rows.
- duplicate request IDs are idempotent with matching payloads.
- duplicate request IDs reject payload mismatches.
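The two idempotency checks describe one behaviour seen from both sides: a repeated request ID replays the stored result when the payload matches and is rejected when it does not. A minimal in-memory sketch follows, assuming payloads are JSON-serialisable dicts and that a SHA-256 hash of the canonical JSON is an acceptable fingerprint. IdempotencyStore, PayloadMismatch, and the handler signature are hypothetical and do not describe the wallet_service implementation.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


class PayloadMismatch(Exception):
    """Raised when a request ID is reused with a different payload."""


class IdempotencyStore:
    def __init__(self) -> None:
        # request_id -> (payload fingerprint, stored result)
        self._seen: Dict[str, Tuple[str, Any]] = {}

    @staticmethod
    def _fingerprint(payload: Dict[str, Any]) -> str:
        # Sorted keys make the hash independent of key order.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def execute(self, request_id: str, payload: Dict[str, Any],
                handler: Callable[[Dict[str, Any]], Any]) -> Any:
        """Run handler once per request ID; replay or reject duplicates."""
        fingerprint = self._fingerprint(payload)
        if request_id in self._seen:
            stored_fingerprint, stored_result = self._seen[request_id]
            if stored_fingerprint != fingerprint:
                raise PayloadMismatch(request_id)
            return stored_result  # the handler is not called again
        result = handler(payload)
        self._seen[request_id] = (fingerprint, result)
        return result
```

A smoke check built on this idea sends the same request twice and asserts that the second response equals the first and that only one set of ledger rows exists, then sends the same ID with a changed amount and asserts that the request is rejected.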