Plan: Bring Agent Balance Under the Single Money Writer

Status

proposed

Date

2026-05-05

Owners

  • Platform Backend
  • Wallet Domain
  • Agent Domain

Problem

servers_v2/CLAUDE.md declares wallet_service as the Single Money Writer (唯一资金写入者): every change to player money must flow through the wallet topology bucket model. Agent money does not honor that constraint today.

Concrete examples of agent-side money writes that bypass wallet_service:

  • servers_v2/agent_service/app/api/routes/withdraw.py — direct UPDATE agent SET balance = ... after a SELECT ... FOR UPDATE on the agent table. Idempotency is now Redis-side (see the recent withdraw idempotency change), but the write itself is still local.
  • Other agent-side balance mutations live next to commission/recon flows in agent_service and admin_service and are similarly outside the wallet domain.

Consequences:

  1. Audit/recon gaps. recon_service consumes wallet events; agent balance changes never appear in wallet:events, so cross-checking an agent ledger requires a second, parallel pipeline.
  2. Two truths for "balance." Player wallet topology is the policy surface for limits, holds, and fail-closed semantics. Agent balance is a single column with no holds, no buckets, and no per-domain policies. Any future feature (agent rolling, agent freeze) would have to re-implement what the topology already provides.
  3. Multi-writer risk. If agent commission accrual ever needs to land in the same balance the withdraw handler is locking, the FOR UPDATE inside agent_service must coordinate with whatever other service writes it. Today there is no such other writer; as soon as one appears, we risk a deadlock or a torn write.

Goal

Treat the agent balance as a wallet topology bucket owned by wallet_service. All reads and writes go through wallet HTTP endpoints + outbox events; agent_service retains only orchestration (JWT, request shaping, business rules), not money state.

Non-Goals

  • Changing how agent commissions are computed. The calculation logic stays in agent_service / admin_service; only the persistence and emission of money deltas moves.
  • Replacing the agent table column outright in one shot. The plan uses dual-write + cutover, not big-bang.

Phased Plan

Phase A — Topology & Contract

  1. Add an agent topology bucket family to wallet topology. Schema change in servers_v2/shared/rgb_db/migrations/versions/. The bucket key is (brand_id, agent_id).
  2. Define wallet-domain commands:
    • POST /internal/wallet/agent/credit (request_id, agent_id, amount, reason, metadata)
    • POST /internal/wallet/agent/debit (same shape)
    • GET /internal/wallet/agent/{agent_id}/balance
  3. Add the AgentBalanceChanged event to wallet:events per the contracts in servers_v2/shared/contracts/.
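The command contract in step 2 could be sketched as below. This is an illustration only, under stated assumptions: the class and method names (AgentMoneyCommand, InMemoryAgentLedger, apply) are hypothetical, brand_id is assumed to be resolved by the caller because the bucket key is (brand_id, agent_id), and an in-memory dict stands in for the real topology tables. The point it shows is that a repeated request_id returns the earlier result instead of moving money twice.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class AgentMoneyCommand:
    # Field list follows the request shape named in the plan.
    request_id: str
    agent_id: str
    amount: Decimal
    reason: str
    metadata: dict = field(default_factory=dict)


class InMemoryAgentLedger:
    """Hypothetical stand-in for the wallet-side agent bucket family."""

    def __init__(self):
        self._balances = {}  # (brand_id, agent_id) -> Decimal
        self._seen = {}      # request_id -> balance returned the first time

    def apply(self, brand_id, cmd, direction):
        if direction not in ("credit", "debit"):
            raise ValueError(f"unknown direction: {direction}")
        if cmd.amount <= 0:
            raise ValueError("amount must be positive")
        # Idempotency: a replayed request_id is answered from the record.
        if cmd.request_id in self._seen:
            return self._seen[cmd.request_id]
        key = (brand_id, cmd.agent_id)
        current = self._balances.get(key, Decimal("0"))
        delta = cmd.amount if direction == "credit" else -cmd.amount
        new_balance = current + delta
        if new_balance < 0:
            raise ValueError("insufficient agent balance")
        self._balances[key] = new_balance
        self._seen[cmd.request_id] = new_balance
        return new_balance

    def balance(self, brand_id, agent_id):
        return self._balances.get((brand_id, agent_id), Decimal("0"))
```

Keying the balance map on (brand_id, agent_id) means a lookup for the same agent_id under a different brand sees an empty bucket, which is the isolation property the plan asks for.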

Phase B — Dual-Write Window

  1. agent_service continues to update agent.balance directly and issues the matching wallet command in the same DB transaction (via outbox row, not synchronous HTTP).
  2. recon_service consumes the new event and verifies the agent.balance column matches the wallet bucket sum at fixed intervals. Mismatches alert; they do not fail closed yet.
  3. Land the dual-write behind the RGB_AGENT_BALANCE_DUAL_WRITE setting, with enforcement modes off / observe / enforce, mirroring the multi-brand rollout pattern from ADR-009.
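A minimal sketch of the dual-write in step 1, using an in-memory SQLite database so it runs on its own. Everything here is an assumption made for illustration: the wallet_outbox table, its columns, the function names, integer minor-unit amounts, and the reading of off as "write no outbox row". The plan does not define how observe and enforce differ, so the sketch treats them identically on the write side.

```python
import json
import sqlite3

MODES = ("off", "observe", "enforce")


def open_db():
    # Hypothetical schema: a trimmed agent table plus an outbox table.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE agent (
            brand_id TEXT NOT NULL,
            agent_id TEXT NOT NULL,
            balance  INTEGER NOT NULL,
            PRIMARY KEY (brand_id, agent_id)
        );
        CREATE TABLE wallet_outbox (
            request_id TEXT PRIMARY KEY,
            payload    TEXT NOT NULL
        );
        """
    )
    return conn


def debit_with_dual_write(conn, mode, brand_id, agent_id, amount_minor, request_id):
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    # "with conn" commits on success and rolls back on any exception,
    # so the column update and the outbox row land together or not at all.
    with conn:
        cur = conn.execute(
            "UPDATE agent SET balance = balance - ? "
            "WHERE brand_id = ? AND agent_id = ? AND balance >= ?",
            (amount_minor, brand_id, agent_id, amount_minor),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown agent or insufficient balance")
        if mode != "off":
            payload = json.dumps(
                {
                    "command": "debit",
                    "brand_id": brand_id,
                    "agent_id": agent_id,
                    "amount_minor": amount_minor,
                }
            )
            # The primary key on request_id rejects a replayed command,
            # which also rolls back the balance update above.
            conn.execute(
                "INSERT INTO wallet_outbox (request_id, payload) VALUES (?, ?)",
                (request_id, payload),
            )
```

Because the outbox insert shares the transaction with the balance update, a crash between the two cannot leave the column changed with no wallet command queued, which is the reason the plan prefers an outbox row over synchronous HTTP.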

Phase C — Authoritative Cutover

  1. After ≥30 days of enforce with zero recon mismatches: flip the read path. agent_service reads balance from wallet, not from the agent row.
  2. The agent.balance column becomes a denormalized cache, written only by a wallet→agent backfill consumer for legacy callers that still SELECT it. Track it separately, the same way the player legacy balance columns are tracked in the data-ownership table in servers_v2/CLAUDE.md.
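The cutover precondition in step 1 can be expressed as a small check. This is a sketch under assumptions: the function name and the shape of the input (a mapping from calendar day to that day's recon mismatch count) are hypothetical, and treating a day with no recorded data as "not clean" is a design choice made here, not something the plan specifies.

```python
from datetime import date, timedelta


def cutover_eligible(daily_mismatches, today, required_days=30):
    """Return True only if each of the last `required_days` full days
    has a recorded mismatch count of exactly zero."""
    for offset in range(1, required_days + 1):
        day = today - timedelta(days=offset)
        # A missing day is treated as a failure rather than as zero.
        if daily_mismatches.get(day) != 0:
            return False
    return True
```

Failing on missing days keeps a recon outage from silently counting toward the 30-day window.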

Phase D — Column Removal

  1. Drop the legacy agent.balance column once no service reads it.
  2. Remove the dual-write code from agent_service.

Risks

  • Wallet bucket count explosion. Adding agent buckets roughly doubles the number of bucket rows. Confirm topology read paths stay under their performance budget before Phase A merges.
  • Brand-scoping invariants. Agent buckets must be brand-scoped from day one — agents in brand X must not see brand Y agent balance. Aligns with the multi-brand event scoping plan (docs/plans/multi-brand/2026-05-05-event-brand-scoping.md).
  • Commission timing. Agent commission landings happen on a scheduler today. Switching to wallet-domain writes means the scheduler now produces outbox rows; idempotency keys must be stable across reruns (use (commission_period, agent_id) as the key).
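One way to keep the commission idempotency key stable across scheduler reruns is to derive it deterministically from (commission_period, agent_id). The sketch below is an assumption-laden illustration: the namespace string, the function name, and the period format are all hypothetical. It uses a name-based UUID (uuid5), so the same inputs always produce the same key.

```python
import json
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
COMMISSION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "wallet/agent/commission")


def commission_request_id(commission_period, agent_id):
    # JSON-encode the pair so ("2026-04", "7:1") and ("2026-04:7", "1")
    # cannot collapse into the same name string.
    name = json.dumps([commission_period, agent_id])
    return str(uuid.uuid5(COMMISSION_NAMESPACE, name))
```

A rerun of the scheduler for the same period and agent yields the same request_id, so a uniqueness constraint on the outbox or on the wallet command log can reject the duplicate. If agent_id turns out not to be unique across brands, brand_id would need to join the key; the plan as written names only the two fields.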

Acceptance

  • All agent balance mutations originate from wallet_service writes.
  • recon_service exposes an agent_balance_recon_mismatch counter that stays at zero in production for 30+ consecutive days.
  • rg "UPDATE agent SET balance" in the repo returns matches only in wallet_service (the topology compatibility writer) or in legacy archives.

References

  • servers_v2/CLAUDE.md — Single Money Writer principle
  • docs/adr/ADR-005-wallet-topology-bucket-ledger-model.md
  • docs/adr/ADR-009-multi-brand-domain-routed-isolation.md
  • servers_v2/agent_service/app/api/routes/withdraw.py