ADR-010: agent.balance / agent.coupon_balance Single Money Writer Migration
Status
Proposed
Date
2026-05-15
Last Verified Commit
3f36e7a8
Owners
- Platform Backend
- Agent Domain
- Wallet Domain
Affected Services
- `admin_service`
- `agent_service`
- `wallet_service`
Related Docs
- `servers_v2/CLAUDE.md` ("数据 ownership 未隔离", i.e. "data ownership not isolated")
- `docs/architecture/data-ownership.md`
- `docs/architecture/migration-readiness.md`
Context
The platform's Single Money Writer (SMW) invariant ("only `wallet_service` mutates player wallet balance") holds today for the `player.*` columns and the topology `wallet_bucket` ledger. It does not hold for `agent.balance` and `agent.coupon_balance`, which are written directly at these call sites:
| Call site | File | Operation |
|---|---|---|
| `adjust_agent_balance` | `servers_v2/admin_service/app/api/routes/agent.py:815` | `UPDATE agent SET balance` (migration 0041 idempotency) |
| `decline_agent_withdrawal` refund | `servers_v2/admin_service/app/api/routes/agent_finance.py:251` | `UPDATE agent SET balance = balance + :amt` (`FOR UPDATE` + state guard) |
| `withdraw_add` | `servers_v2/agent_service/app/api/routes/withdraw.py:358` | `UPDATE agent SET balance` (migration 0040 idempotency) |
| `grant_coupon` | `servers_v2/agent_service/app/api/routes/coupon.py:262` | `UPDATE agent SET coupon_balance` (new migration 0045 idempotency) |
| `recycle_coupon` | `servers_v2/agent_service/app/api/routes/coupon.py:407` | `UPDATE agent SET coupon_balance` (migration 0045 idempotency) |
| legacy passthrough | `servers_v2/agent_service/app/api/routes/legacy_routes.py:1114` | `UPDATE agent SET balance` (compat shim, to be removed) |
Each call site already carries the right safety scaffolding: `FOR UPDATE` locks, state guards, and idempotency tables from migrations 0040/0041/0045. What is missing is the ownership boundary. Agent balance is a real money domain, yet its writers are scattered across two services (`admin_service` and `agent_service`) and two write surfaces. The 2026-05-15 audit flagged this as the largest remaining Single Money Writer violation.
`record_agent_balance_legacy_write` (in `rgb_contracts.infra.brand_observability`) increments `agent_balance_legacy_write_total{change_type}` on every call. The post-migration target for this counter is zero.
Decision
We will migrate `agent.balance` and `agent.coupon_balance` mutations to `wallet_service` in a multi-phase rollout, modelled on the player topology cut-over (ADR-005, ADR-007):
- Phase 1: Define the agent-wallet domain in topology. Add `agent` as a first-class wallet subject in `wallet_topology`/`wallet_policy`. Introduce `agent_wallet_bucket`, or reuse `wallet_bucket` keyed by `(subject_type, subject_id)`, so that agent balance lives in the same ledger model as player balance. This phase is a schema migration only, with no behavior change.
- Phase 2: Expose write commands on `wallet_service`. Each carries the same idempotency contract as the player surface (a `(brand_id, idempotency_key)` claim with `ON CONFLICT DO NOTHING`):
  - `POST /internal/wallet/agent/adjust` (replaces `adjust_agent_balance`)
  - `POST /internal/wallet/agent/coupon-grant` (replaces `grant_coupon`)
  - `POST /internal/wallet/agent/coupon-recycle` (replaces `recycle_coupon`)
  - `POST /internal/wallet/agent/withdraw-deduct` and `/refund` (for agent withdrawals)
- Phase 3: Dual-write window. Caller services (`admin_service`, `agent_service`) call the new wallet endpoints and keep issuing the legacy UPDATE in the same transaction. `agent_balance_legacy_write_total` keeps incrementing, and a new counter, `agent_balance_wallet_write_total`, tracks the new path. A nightly job reconciles the two paths, and any drift raises a loud alert.
- Phase 4: Cut reads. Read-side callers (agent UI, admin reports) move to `GET /internal/wallet/agent/balance/{agent_id}`. The legacy `agent.balance` column becomes a denormalized cache.
- Phase 5: Cut writes. Caller services stop issuing the legacy UPDATE, so `agent_balance_legacy_write_total` should drop to zero. Operators verify zero for 7 consecutive days before starting Phase 6.
- Phase 6: Remove legacy column writes. Delete the legacy UPDATE call sites. Keep the `agent.balance` column as a denormalized read cache, populated by a wallet-side projection on every write. Remove `record_agent_balance_legacy_write` and its counter.
- Phase 7 (optional): Remove the legacy column. Once no consumer reads `agent.balance` directly, drop the column. Confirm this first with a grep for SELECTs plus per-route tracing over one release. This is a one-way database migration.
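The Phase 3 nightly reconciliation can be sketched as a pure comparison. The sketch assumes both sides have already been snapshotted into maps keyed by `(brand_id, agent_id)` with `Decimal` balances. `reconcile` and `Drift` are hypothetical names. Loading the snapshots, alerting, and handling writes that land between the two snapshots are out of scope.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal

# Hypothetical key shape, mirroring the (brand_id, agent_id) scoping rule.
AgentKey = tuple[int, int]


@dataclass(frozen=True)
class Drift:
    brand_id: int
    agent_id: int
    legacy_balance: Decimal | None  # None: no row on the legacy side
    wallet_balance: Decimal | None  # None: no row on the wallet side


def reconcile(
    legacy: dict[AgentKey, Decimal],
    wallet: dict[AgentKey, Decimal],
) -> list[Drift]:
    """Compare legacy agent.balance snapshots with wallet-side balances.

    A key present on only one side, or with unequal balances, is reported
    as drift. Decimal comparison is numeric, so 100.0 equals 100.00.
    """
    drifts: list[Drift] = []
    for key in sorted(legacy.keys() | wallet.keys()):
        legacy_balance = legacy.get(key)
        wallet_balance = wallet.get(key)
        if legacy_balance != wallet_balance:
            brand_id, agent_id = key
            drifts.append(Drift(brand_id, agent_id, legacy_balance, wallet_balance))
    return drifts
```

A key that exists on only one side also counts as drift. During dual-write, a missing wallet row is exactly the failure mode the window is meant to surface.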
Consequences
Positive
- A single money writer for all money columns (player, agent, and coupon ledgers).
- Wallet-side audit trail and outbox events for every agent mutation. Downstream consumers (notification, BI) get the same model as player events.
- Eliminates the dual source of truth between `agent.balance` and `wallet_bucket`.
- Unifies idempotency semantics: no more split between `agent_balance_log.request_id` and `wallet_idempotency.idempotency_key`.
Negative
- Adds an inter-service hop (admin/agent → wallet) to every agent-balance write, with an estimated latency increase of ~3–8 ms per call. This is acceptable because these writes are operator-driven and low-QPS.
- Requires a coordinated six-phase rollout (plus the optional Phase 7) with a dual-write window. Phase 3 temporarily doubles write volume on agent balance.
- New tests and contract surface: `admin_service` and `agent_service` must depend on `wallet_service` for every agent write (today only some flows do).
Neutral
- The legacy `agent_balance_log` table stays as the audit trail, per the `record_agent_balance_legacy_write` documentation. Wallet may emit `AgentBalanceAdjusted` events in parallel.
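This is a minimal sketch of what an `AgentBalanceAdjusted` outbox row could look like. It assumes amounts in minor units and a `wallet_outbox` row with `event_type`, `dedup_key`, and `payload` columns. The payload fields and `to_outbox_row` are hypothetical, not the actual wallet event contract.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentBalanceAdjusted:
    # Hypothetical payload; field names are illustrative only.
    brand_id: int
    agent_id: int
    column: str           # "balance" or "coupon_balance"
    delta: int            # minor units
    balance_after: int    # minor units
    idempotency_key: str


def to_outbox_row(event: AgentBalanceAdjusted) -> dict:
    """Shape an event as a row for a hypothetical wallet_outbox insert.

    The idempotency key doubles as the dedup key, so consumers can drop
    redeliveries of the same mutation.
    """
    return {
        "event_type": type(event).__name__,
        "dedup_key": event.idempotency_key,
        "payload": json.dumps(asdict(event), sort_keys=True),
    }
```

Reusing the idempotency key as the dedup key lets downstream consumers (notification, BI) treat agent events the same way as player events, whatever the final field names turn out to be.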
Implementation Notes
- Idempotency keys during the Phase 3 dual-write: use the deterministic key `f"agent-adjust:{agent_id}:{request_id}"` so that both writers see the same key. The wallet claim happens first. If the claim succeeds and the legacy UPDATE fails, the retry reuses the same key and hits the cached wallet response, so it does NOT re-debit.
- Brand scoping: all new endpoints respect the `(brand_id, agent_id)` composite, mirroring the player-side rules. ADR-009 multi-brand isolation already covers the brand resolution path.
- Counter target: `agent_balance_legacy_write_total == 0` for 7 consecutive days is the Phase 5→6 gate.
- Rollback: each phase is reversible until the Phase 7 column drop. If dual-write reconciliation surfaces drift, revert to legacy-only writes and re-investigate.
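The claim-first flow from the first note can be sketched on the wallet side, with SQLite standing in for the real database (SQLite 3.24+ is needed for `ON CONFLICT DO NOTHING`). Only the `wallet_idempotency` table name and the key format come from this ADR. The columns, the `agent_wallet` table, integer minor-unit amounts, and `adjust_agent` are assumptions. A real implementation would also take a row lock (`FOR UPDATE`), which SQLite does not support.

```python
import json
import sqlite3

# Hypothetical schema: only the wallet_idempotency table name is from the ADR.
SCHEMA = """
CREATE TABLE wallet_idempotency (
    brand_id        INTEGER NOT NULL,
    idempotency_key TEXT    NOT NULL,
    response        TEXT,                 -- cached JSON response
    PRIMARY KEY (brand_id, idempotency_key)
);
CREATE TABLE agent_wallet (
    brand_id INTEGER NOT NULL,
    agent_id INTEGER NOT NULL,
    balance  INTEGER NOT NULL,            -- minor units, e.g. cents
    PRIMARY KEY (brand_id, agent_id)
);
"""


def agent_adjust_key(agent_id: int, request_id: str) -> str:
    # Deterministic key format from the ADR: a retry reuses the same claim.
    return f"agent-adjust:{agent_id}:{request_id}"


def adjust_agent(conn: sqlite3.Connection, brand_id: int, agent_id: int,
                 request_id: str, amount: int) -> dict:
    key = agent_adjust_key(agent_id, request_id)
    with conn:  # commit on success; roll back (claim included) on exception
        claimed = conn.execute(
            "INSERT INTO wallet_idempotency (brand_id, idempotency_key) "
            "VALUES (?, ?) ON CONFLICT DO NOTHING",
            (brand_id, key),
        ).rowcount == 1
        if not claimed:
            # Replay of an already-applied request: return the cached response.
            (cached,) = conn.execute(
                "SELECT response FROM wallet_idempotency "
                "WHERE brand_id = ? AND idempotency_key = ?",
                (brand_id, key),
            ).fetchone()
            return json.loads(cached)
        updated = conn.execute(
            "UPDATE agent_wallet SET balance = balance + ? "
            "WHERE brand_id = ? AND agent_id = ?",
            (amount, brand_id, agent_id),
        ).rowcount
        if updated != 1:
            raise LookupError(f"no wallet for brand {brand_id}, agent {agent_id}")
        (balance,) = conn.execute(
            "SELECT balance FROM agent_wallet WHERE brand_id = ? AND agent_id = ?",
            (brand_id, agent_id),
        ).fetchone()
        response = {"agent_id": agent_id, "balance": balance}
        conn.execute(
            "UPDATE wallet_idempotency SET response = ? "
            "WHERE brand_id = ? AND idempotency_key = ?",
            (json.dumps(response), brand_id, key),
        )
        return response
```

The claim, the balance update, and the stored response commit together. A failed update therefore releases the claim as well, and a retry with the same `request_id` replays the cached response instead of applying the delta twice.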
Alternatives Considered
- Leave as-is. Rejected: violates Single Money Writer, and the 2026-05-15 audit listed this as the highest-impact remaining architectural debt. Each new agent-balance writer becomes a real risk vector.
- Move all agent writers to `admin_service` only (consolidating into one of the existing two services). Rejected: this does not solve the SMW problem, because `admin_service` is not the wallet domain. Moving `agent_service`'s self-service withdrawal flow under `admin_service` would also break the operator/agent privilege separation.
- Synchronous topology bucket keyed by `agent_id`, without a separate command surface. Rejected: the player-side write commands already encode coupon vs. cash, policy lookups, and so on. Re-wiring agent into the same command surface would couple agent semantics into the player command catalogue. It is cleaner to add agent-specific commands that share the `wallet_idempotency` and `wallet_outbox` infrastructure but live behind dedicated `/internal/wallet/agent/*` routes.
Tracking
- Counter target: `agent_balance_legacy_write_total == 0`.
- Migration tickets: TBD (to be filed in the project tracker, referencing this ADR).
- Phase gating: each phase needs a runbook before execution. The first runbook to author is `docs/runbooks/agent-balance-smw-phase-1-topology.md` (covers the schema migration only).
- Sunset target: 2026-Q4 for Phase 5, end of 2027 for Phase 7.
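The 7-day zero gate can be checked mechanically. This sketch assumes the per-day increase of `agent_balance_legacy_write_total` (summed over `change_type`) has already been pulled from the metrics backend into a `date → count` map. `phase_gate_open` is a hypothetical helper.

```python
from __future__ import annotations

from datetime import date, timedelta


def phase_gate_open(daily_increase: dict[date, int], today: date,
                    required_days: int = 7) -> bool:
    """Return True only if the legacy-write counter increased by zero on each
    of the `required_days` full days before `today`.

    A day with no data counts as "not verified" and keeps the gate shut.
    """
    for offset in range(1, required_days + 1):
        day = today - timedelta(days=offset)
        if daily_increase.get(day) != 0:
            return False
    return True
```

Treating a missing day as a failure is deliberate: a scrape outage should not be mistaken for a clean day.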