ADR-010: agent.balance / agent.coupon_balance Single Money Writer Migration

Status

Proposed

Date

2026-05-15

Last Verified Commit

3f36e7a8

Owners

  • Platform Backend
  • Agent Domain
  • Wallet Domain

Affected Services

  • admin_service
  • agent_service
  • wallet_service
  • servers_v2/CLAUDE.md ("data ownership not isolated")
  • docs/architecture/data-ownership.md
  • docs/architecture/migration-readiness.md

Context

The platform's Single Money Writer (SMW) invariant — "only wallet_service mutates player wallet balance" — holds today for the player.* columns and the topology wallet_bucket ledger. It does not hold for agent.balance and agent.coupon_balance:

| Call site | File | Operation |
|---|---|---|
| adjust_agent_balance | servers_v2/admin_service/app/api/routes/agent.py:815 | UPDATE agent SET balance (with migration 0041 idempotency) |
| decline_agent_withdrawal refund | servers_v2/admin_service/app/api/routes/agent_finance.py:251 | UPDATE agent SET balance = balance + :amt (FOR UPDATE + state guard) |
| withdraw_add | servers_v2/agent_service/app/api/routes/withdraw.py:358 | UPDATE agent SET balance (with migration 0040 idempotency) |
| grant_coupon | servers_v2/agent_service/app/api/routes/coupon.py:262 | UPDATE agent SET coupon_balance (with new migration 0045 idempotency) |
| recycle_coupon | servers_v2/agent_service/app/api/routes/coupon.py:407 | UPDATE agent SET coupon_balance (with migration 0045 idempotency) |
| legacy passthrough | servers_v2/agent_service/app/api/routes/legacy_routes.py:1114 | UPDATE agent SET balance (compat shim; to be removed) |

Each call site already carries proper safety scaffolding (FOR UPDATE locks, state guards, idempotency tables via migrations 0040/0041/0045). What is missing is the ownership boundary: agent balance is a real money domain, yet its writers are scattered across two services (admin_service and agent_service) and two columns (balance and coupon_balance). The 2026-05-15 audit flagged this as the largest remaining Single Money Writer violation.

record_agent_balance_legacy_write (rgb_contracts.infra.brand_observability) increments agent_balance_legacy_write_total{change_type} on every call. After the migration, this counter's target is zero.

Decision

We will migrate agent.balance and agent.coupon_balance mutations to wallet_service over a multi-phase rollout, modelled on the player topology cut-over (ADR-005, ADR-007):

  1. Phase 1 — Define agent-wallet domain in topology. Add agent as a first-class wallet subject in wallet_topology / wallet_policy. Introduce agent_wallet_bucket (or reuse wallet_bucket keyed by (subject_type, subject_id)) so that agent balance lives in the same ledger model as player balance. Schema migration only; no behavior change.

  2. Phase 2 — Expose write commands on wallet_service:

    • POST /internal/wallet/agent/adjust (replaces adjust_agent_balance)
    • POST /internal/wallet/agent/coupon-grant (replaces grant_coupon)
    • POST /internal/wallet/agent/coupon-recycle (replaces recycle_coupon)
    • POST /internal/wallet/agent/withdraw-deduct and /refund (for agent withdrawals)

     Each command carries the same idempotency contract as the player surface: a (brand_id, idempotency_key) claim with ON CONFLICT DO NOTHING.

  3. Phase 3 — Dual-write window. Caller services (admin_service, agent_service) call the new wallet endpoints and continue to run the legacy UPDATE in their own local transaction; the wallet call and the legacy UPDATE are not atomic with each other. agent_balance_legacy_write_total keeps incrementing, while a new counter, agent_balance_wallet_write_total, tracks the new path. A nightly job reconciles the two paths, and any drift raises a loud alert.

  4. Phase 4 — Cut reads. Read-side callers (agent UI, admin reports) move to GET /internal/wallet/agent/balance/{agent_id}. The legacy agent.balance column becomes a denormalized cache.

  5. Phase 5 — Cut writes. Caller services stop emitting the legacy UPDATE. agent_balance_legacy_write_total should drop to zero. Operators verify zero for 7 consecutive days before Phase 6.

  6. Phase 6 — Remove legacy column writes. Delete the legacy UPDATE call sites. Keep the agent.balance column as a denormalized read cache, populated by a wallet-side projection on every write. Remove record_agent_balance_legacy_write and the counter.

  7. Phase 7 — (Optional) Remove legacy column. Once no consumer reads agent.balance directly (audited by grepping for SELECTs on the column plus per-route tracing over one release), drop the column. This is a one-way database migration.
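
The nightly reconciliation in Phase 3 reduces to comparing two balance maps. The following is a minimal sketch, assuming both sides have already been loaded into dictionaries keyed by (brand_id, agent_id). This ADR does not specify how the wallet-side balance is derived from wallet_bucket or agent_wallet_bucket, and the reconcile and Drift names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional, Tuple

# Hypothetical composite key mirroring the ADR's (brand_id, agent_id) scoping.
AgentKey = Tuple[str, int]


@dataclass(frozen=True)
class Drift:
    brand_id: str
    agent_id: int
    legacy: Optional[Decimal]  # agent.balance; None if the agent row is missing
    wallet: Optional[Decimal]  # wallet-side balance; None if no bucket exists


def reconcile(legacy: Dict[AgentKey, Decimal],
              wallet: Dict[AgentKey, Decimal]) -> List[Drift]:
    """Return one Drift per (brand_id, agent_id) whose balances disagree."""
    drifts: List[Drift] = []
    # Union of keys, so an agent present on only one side is reported too.
    for key in sorted(legacy.keys() | wallet.keys()):
        a, b = legacy.get(key), wallet.get(key)
        if a != b:  # Decimal compares by value, so 100.00 == 100
            drifts.append(Drift(key[0], key[1], a, b))
    return drifts
```

A non-empty result is the condition the alert would fire on. The sketch covers agent.balance only; coupon_balance would need the same comparison run against its own pair of maps.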

Consequences

Positive

  • One single money writer for ALL money columns (player + agent + coupon ledgers).
  • Wallet-side audit trail and outbox events for every agent mutation. Downstream consumers (notification, BI) get the same model as player events.
  • Eliminates the dual-source-of-truth between agent.balance and wallet_bucket.
  • Idempotency semantics unified — no more agent_balance_log.request_id vs wallet_idempotency.idempotency_key split.

Negative

  • Adds an inter-service hop on every agent-balance write (admin/agent → wallet), with an expected latency increase of roughly 3 to 8 ms per call. This is acceptable because these writes are operator-driven and low-QPS.
  • Requires a coordinated rollout of six mandatory phases (plus the optional Phase 7), including a dual-write window. Phase 3 temporarily doubles write volume on agent balances.
  • New tests and contract surface; admin_service / agent_service must depend on wallet_service for every agent write (today only some flows do).

Neutral

  • The legacy agent_balance_log table remains the audit trail, as described in the record_agent_balance_legacy_write documentation. Wallet may emit AgentBalanceAdjusted events in parallel.

Implementation Notes

  • Idempotency keys during the Phase 3 dual-write window: use the deterministic key f"agent-adjust:{agent_id}:{request_id}" so that both writers see the same key. The wallet claim happens first; if it succeeds and the legacy UPDATE fails, the retry reuses the same key, receives the cached wallet response, and does NOT debit again.
  • Brand scoping: all new endpoints are scoped by the (brand_id, agent_id) composite key, mirroring the player-side rules. ADR-009 (multi-brand isolation) already covers the brand resolution path.
  • Counter target: agent_balance_legacy_write_total == 0 for 7 consecutive days is the Phase 5→6 gate.
  • Rollback: each phase is reversible until Phase 7 column drop. If dual-write reconciliation surfaces drift, revert to legacy-only writes and re-investigate.
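
The deterministic-key retry behaviour can be illustrated with a minimal sketch of the wallet-side handler. Assumptions: SQLite stands in for Postgres, so FOR UPDATE row locking is not shown. The agent_bucket table, the response column, the integer minor-unit balance, and the wallet_agent_adjust name are all hypothetical. The caller's legacy UPDATE is outside the scope of this sketch.

```python
import json
import sqlite3

# Illustrative schema only: names echo the ADR, columns are assumptions.
SCHEMA = """
CREATE TABLE wallet_idempotency (
    brand_id        TEXT NOT NULL,
    idempotency_key TEXT NOT NULL,
    response        TEXT,
    PRIMARY KEY (brand_id, idempotency_key)
);
CREATE TABLE agent_bucket (          -- hypothetical agent ledger row
    brand_id TEXT    NOT NULL,
    agent_id INTEGER NOT NULL,
    balance  INTEGER NOT NULL,       -- minor units (e.g. cents)
    PRIMARY KEY (brand_id, agent_id)
);
"""


def agent_adjust_key(agent_id: int, request_id: str) -> str:
    """Deterministic key from the ADR, so both writers derive the same value."""
    return f"agent-adjust:{agent_id}:{request_id}"


def wallet_agent_adjust(conn: sqlite3.Connection, brand_id: str,
                        agent_id: int, request_id: str, delta: int) -> dict:
    """Apply delta at most once per (brand_id, idempotency_key)."""
    key = agent_adjust_key(agent_id, request_id)
    with conn:  # claim, balance change and cached response commit together
        claimed = conn.execute(
            "INSERT INTO wallet_idempotency (brand_id, idempotency_key) "
            "VALUES (?, ?) ON CONFLICT (brand_id, idempotency_key) DO NOTHING",
            (brand_id, key),
        ).rowcount
        if claimed == 0:
            # Retry of an already-applied command: replay, do not re-debit.
            (cached,) = conn.execute(
                "SELECT response FROM wallet_idempotency "
                "WHERE brand_id = ? AND idempotency_key = ?",
                (brand_id, key),
            ).fetchone()
            return json.loads(cached)
        updated = conn.execute(
            "UPDATE agent_bucket SET balance = balance + ? "
            "WHERE brand_id = ? AND agent_id = ?",
            (delta, brand_id, agent_id),
        ).rowcount
        if updated != 1:
            # Raising inside `with conn` rolls back the claim as well.
            raise LookupError(f"no bucket for agent {agent_id} in {brand_id}")
        (balance,) = conn.execute(
            "SELECT balance FROM agent_bucket WHERE brand_id = ? AND agent_id = ?",
            (brand_id, agent_id),
        ).fetchone()
        response = {"agent_id": agent_id, "balance": balance}
        conn.execute(
            "UPDATE wallet_idempotency SET response = ? "
            "WHERE brand_id = ? AND idempotency_key = ?",
            (json.dumps(response), brand_id, key),
        )
        return response
```

On a retry, the claim insert affects zero rows, so the handler replays the stored response instead of applying delta a second time. Because the claim lives in the same transaction as the balance change, a wallet-side failure leaves no claim behind, and the next attempt applies the change normally.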

Alternatives Considered

  1. Leave as-is. Rejected: violates Single Money Writer; the 2026-05-15 audit listed this as the highest-impact remaining architectural debt. Each new agent-balance writer becomes a real risk vector.

  2. Move all agent writers to admin_service only (consolidating into one of the existing two services). Rejected: doesn't solve the SMW problem; admin_service is not the wallet domain, and putting agent_service's self-service withdrawal flow under admin_service breaks the operator/agent privilege separation.

  3. Synchronous topology bucket keyed by agent_id, without a separate command surface. Rejected: the player-side write commands already encode coupon vs cash, policy lookups, etc., so re-wiring agent into the same command surface couples agent semantics into the player command catalogue. It is cleaner to add agent-specific commands that share the wallet_idempotency and wallet_outbox infrastructure but live behind dedicated /internal/wallet/agent/* routes.

Tracking

  • Counter target: agent_balance_legacy_write_total == 0
  • Migration tickets: TBD (to be filed in project tracker; reference this ADR).
  • Phase gating: each phase needs a runbook before execution. First runbook to author: docs/runbooks/agent-balance-smw-phase-1-topology.md (covers schema migration only).
  • Sunset target: 2026-Q4 for Phase 5, end of 2027 for Phase 7.
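
The seven-day zero gate that precedes Phase 6 can be sketched as a check over daily samples of the cumulative counter. Assumptions: one sample per day, already summed across change_type labels and service instances. The function names are hypothetical. A drop in the sampled value is treated as a counter reset. If the counter is scraped by Prometheus, PromQL's increase() over a 7d range would be the more natural production check.

```python
from __future__ import annotations

from datetime import date, timedelta


def daily_increases(samples: dict[date, float]) -> dict[date, float]:
    """Increase of a cumulative counter between consecutive daily samples.

    A sample lower than its predecessor is treated as a counter reset
    (for example a process restart), so the new value is the increase.
    """
    days = sorted(samples)
    increases: dict[date, float] = {}
    for prev, cur in zip(days, days[1:]):
        before, after = samples[prev], samples[cur]
        increases[cur] = after - before if after >= before else after
    return increases


def legacy_write_gate_open(samples: dict[date, float], today: date,
                           window_days: int = 7) -> bool:
    """True only if each of the last window_days days saw zero legacy writes.

    A missing sample closes the gate (fail closed) instead of counting as zero.
    """
    increases = daily_increases(samples)
    window = [today - timedelta(days=i) for i in range(window_days)]
    return all(increases.get(day) == 0 for day in window)
```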