Worker Health And Settlement Hardening Spec

Status

Approved

Date

2026-04-22

Goal

Harden servers_v2 background execution so worker health and settlement outcomes reflect real business progress instead of process liveness or silent partial failure.

Scope

In scope:

wallet, rolling, and agent outbox publication semantics
worker health supervision and freshness rules
promotion settlement result semantics for rebate, cashback, and lossback
compose and health contract alignment
logging and operational visibility for these paths

Out of scope:

broad service-boundary redesign
new domain extraction work
full retry-platform redesign beyond what is needed to avoid silent success

Background

Two production-grade risks must be prevented:

outbox workers can appear healthy while row-level publication keeps failing
settlement jobs can appear successful while all eligible credits fail

Both failures create false-green operational signals.

Requirements

1. Outbox Publication Semantics

For wallet, rolling, and agent outbox pollers:

row-level publish exceptions must be surfaced to worker supervision
an iteration with failures must not be recorded as healthy
if all rows in the polled batch fail to publish, the iteration must fail loudly: it raises to worker supervision instead of logging and continuing
if some rows publish and some fail, the iteration must still be treated as degraded and surfaced to supervision
already-published rows may still be committed, but supervision must see the degraded iteration
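
The iteration rules above can be sketched as follows. This is a minimal illustration, not the servers_v2 implementation: `publish_batch`, `OutboxIterationError`, and the `publish` and `mark_published` callables are hypothetical names, and treating an empty batch as a clean iteration is an assumption the spec does not state.

```python
from typing import Callable, List, Sequence, Tuple


class OutboxIterationError(Exception):
    """Hypothetical error raised so supervision sees a failed or degraded iteration."""

    def __init__(self, published: int, errors: List[Tuple[str, str]]):
        self.published = published
        self.failed = len(errors)
        self.errors = errors
        # Nothing published at all means a full-batch failure, otherwise degraded.
        self.total_failure = published == 0
        kind = "failed" if self.total_failure else "degraded"
        super().__init__(
            f"outbox iteration {kind}: published={published} failed={self.failed}"
        )


def publish_batch(
    rows: Sequence[Tuple[str, str]],
    publish: Callable[[str], None],
    mark_published: Callable[[str], None],
) -> int:
    """Publish every (row_id, payload) pair and return the published count.

    Row-level exceptions are collected rather than swallowed. Rows that did
    publish are still marked, but any failure makes the iteration raise.
    """
    published = 0
    errors: List[Tuple[str, str]] = []
    for row_id, payload in rows:
        try:
            publish(payload)
        except Exception as exc:  # collected and re-surfaced below
            errors.append((row_id, repr(exc)))
            continue
        mark_published(row_id)
        published += 1
    if errors:
        raise OutboxIterationError(published, errors)
    return published
```

A supervisor that catches `OutboxIterationError` can record the iteration as a failure in both cases and use `total_failure` to choose the log severity.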

2. Worker Health Semantics

Worker health must use both:

consecutive failure budget
freshness of successful progress

Required behavior:

success freshness is tracked per critical loop
a loop that keeps failing beyond the configured budget becomes unhealthy
a loop that keeps running but has no successful progress beyond its freshness window becomes unhealthy
container health must depend on readiness checks built from these two signals, not on process liveness alone
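
One way to combine the failure budget and the freshness window is sketched below. `LoopHealth`, `worker_ready`, and the parameter names are assumptions for illustration. The sketch also assumes that a loop with no success yet is measured from its start time, which the spec leaves open.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class LoopHealth:
    """Tracks one critical loop: consecutive failures plus success freshness."""

    max_consecutive_failures: int
    freshness_window_s: float
    clock: Callable[[], float] = time.monotonic  # injectable for tests
    consecutive_failures: int = 0
    last_success_at: Optional[float] = None
    started_at: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def record_success(self) -> None:
        self.consecutive_failures = 0
        self.last_success_at = self.clock()

    def record_failure(self) -> None:
        self.consecutive_failures += 1

    def is_healthy(self) -> bool:
        # Rule 1: failing beyond the configured budget is unhealthy.
        if self.consecutive_failures > self.max_consecutive_failures:
            return False
        # Rule 2: running without successful progress for longer than the
        # freshness window is unhealthy, even if nothing has raised.
        reference = (
            self.last_success_at
            if self.last_success_at is not None
            else self.started_at
        )
        return (self.clock() - reference) <= self.freshness_window_s


def worker_ready(loops: Dict[str, LoopHealth]) -> bool:
    """A worker is ready only if every critical loop is healthy."""
    return all(loop.is_healthy() for loop in loops.values())
```

A container health probe could call `worker_ready` and exit non-zero when it returns `False`, so that a stalled but still-running process is reported as unhealthy.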

3. Settlement Semantics

For rebate, cashback, and lossback settlement:

“no eligible records” is a valid clean outcome
“eligible records found, all credits failed” is a hard failure
“some credits succeeded and some failed” is a degraded outcome and must be visible in logs and monitoring
the settlement summary must make attempted, credited, and failed counts explicit
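
The three outcomes can be derived from the explicit counts. The sketch below assumes hypothetical names (`SettlementSummary`, `SettlementOutcome`, `SettlementFailedError`, `finalize_settlement`) and uses the standard `logging` module. It illustrates the semantics and is not the actual job code.

```python
import logging
from dataclasses import dataclass
from enum import Enum

log = logging.getLogger("settlement")  # logger name is an assumption


class SettlementOutcome(Enum):
    CLEAN = "clean"
    DEGRADED = "degraded"
    FAILED = "failed"


class SettlementFailedError(Exception):
    """Raised when eligible records were found and every credit failed."""


@dataclass(frozen=True)
class SettlementSummary:
    job: str        # e.g. "rebate", "cashback", "lossback"
    attempted: int  # eligible records for which a credit was attempted
    credited: int
    failed: int

    @property
    def outcome(self) -> SettlementOutcome:
        if self.failed == 0:
            # Covers "no eligible records" (attempted == 0) and full success.
            return SettlementOutcome.CLEAN
        if self.credited == 0:
            return SettlementOutcome.FAILED
        return SettlementOutcome.DEGRADED


def finalize_settlement(summary: SettlementSummary) -> SettlementSummary:
    """Log the outcome and raise on a full-batch failure."""
    if summary.outcome is SettlementOutcome.FAILED:
        log.error("settlement failed: %s", summary)
        raise SettlementFailedError(
            f"{summary.job}: all {summary.attempted} eligible credits failed"
        )
    if summary.outcome is SettlementOutcome.DEGRADED:
        log.warning("settlement degraded: %s", summary)
    return summary
```

Deriving the outcome from the counts, rather than storing it separately, keeps the summary and the reported status from disagreeing.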

4. Observability

Required signals:

per-worker critical loop freshness
per-loop consecutive failures
outbox publish success/failure counts
settlement attempted/credited/failed counts
explicit logs for full-batch settlement failure
explicit logs for degraded partial settlement runs
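
As a rough illustration of how these signals might be collected, the sketch below keeps counters and gauges in memory and emits them as one structured JSON line. `WorkerSignals` and all metric names are hypothetical. A real deployment would more likely feed an existing metrics client.

```python
import json
from collections import Counter
from typing import Dict


class WorkerSignals:
    """In-memory holder for the counters and gauges listed above."""

    def __init__(self, worker: str) -> None:
        self.worker = worker
        self.counters: Counter = Counter()
        self.gauges: Dict[str, float] = {}

    def incr(self, name: str, amount: int = 1) -> None:
        # Monotonic counts, e.g. outbox publish successes and failures.
        self.counters[name] += amount

    def set_gauge(self, name: str, value: float) -> None:
        # Point-in-time values, e.g. seconds since the last successful loop.
        self.gauges[name] = value

    def snapshot(self) -> str:
        """Return one JSON line suitable for a structured log."""
        return json.dumps(
            {
                "worker": self.worker,
                "counters": dict(self.counters),
                "gauges": self.gauges,
            },
            sort_keys=True,
        )
```

For example, a wallet outbox loop could call `incr("outbox.publish.failed")` per failed row and `set_gauge("loop.wallet_outbox.consecutive_failures", n)` after each iteration, then log `snapshot()`.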

5. Test Requirements

Required automated coverage:

outbox poller failure contract tests
worker health freshness tests
scheduler/worker health integration tests
settlement failure semantic tests
compose contract tests for worker health dependencies where applicable

Acceptance Criteria

wallet, rolling, and agent outbox paths do not record failed publication iterations as healthy
worker readiness can fail when forward progress stalls even if the process is still running
rebate, cashback, and lossback jobs raise when all eligible credits fail
settlement summaries distinguish clean, degraded, and failed outcomes
required tests exist and pass
