Recon Service Cutover Runbook

Status

Ready

Roll out recon_service and recon_worker into servers_v2 without breaking existing back-office shooter and Pushbullet workflows.

compatibility contract tests are green
worker freshness health tests are green
compose contract tests include recon service and worker
admin top-info aggregation has been updated and tested
/shooter/device/* has been explicitly classified
the implemented v2 surface has been frozen in docs/reference/recon/recon-service-v2-surface.md

Required configuration classes:

database access for shooter-related tables
Redis access
Pushbullet credentials
Telegram bot token if Telegram ingestion is enabled
OpenRouter configuration if AI regex generation is enabled
per-caller internal tokens configured for the recon/admin path:
- RGB_INTERNAL_SERVICE_TOKEN_ADMIN on consumers that receive calls from admin_service
- RGB_INTERNAL_SERVICE_TOKEN_RECON on consumers that receive calls from recon_worker
- RGB_PER_CALLER_TOKEN_REQUIRED=on once the Stage C hard flip is active
DingTalk or equivalent alert channel
RECON_SERVICE_URL configured in admin_service
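
A preflight check along the following lines could catch missing configuration before the services start. This is only a sketch. The helper name missing_config is hypothetical, and the grouping of variables for admin_service is an assumption inferred from the internal auth steps in this runbook. Database, Redis, Pushbullet, Telegram, OpenRouter, and alert-channel settings are left out because their variable names are not given here.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_config(required: Iterable[str],
                   env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or blank in env.

    Hypothetical helper; when env is None the process environment is used.
    """
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name, "").strip()]


# Assumed grouping for admin_service: it calls recon_service (needs the URL
# and the ADMIN token) and receives calls from recon_worker (needs the RECON
# token). Adjust per service; this grouping is not stated in the runbook.
ADMIN_SERVICE_REQUIRED = (
    "RECON_SERVICE_URL",
    "RGB_INTERNAL_SERVICE_TOKEN_ADMIN",
    "RGB_INTERNAL_SERVICE_TOKEN_RECON",
)
```

Calling missing_config(ADMIN_SERVICE_REQUIRED) inside the admin_service container returns an empty list when all three values are present, and otherwise the names that still need to be set.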

Start local dependencies and services.
- cd servers_v2
- docker compose -f docker-compose.yml -f docker-compose.dev.yml --env-file .env.compose.local up -d --build
Confirm admin_service health is green.
Confirm recon_service health is green.
Confirm recon_worker health is green.
Exercise these back-office routes through admin_service:
- template list/add/delete
- phone whitelist list/add/delete
- SMS list/get/reset/check
- Pushbullet device list/add/delete
- shooter device list/trust/delete
Confirm that successful legacy responses still carry the refreshed Authorization header.
Confirm admin top-info shows sms_need_check_cnt.
Run verification test suites:
- cd servers_v2/admin_service && uv run pytest
- cd servers_v2/recon_service && uv run pytest
- cd servers_v2/rolling_service && uv run pytest
- cd servers_v2/wallet_service && uv run pytest
- cd servers_v2 && uv run pytest tests/test_compose_contract.py
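
The verification suites above can also be driven from one script so that a single pass reports every failing suite. The following is a sketch under stated assumptions: it is run from the directory that contains servers_v2, uv is on PATH, and the names SUITES and run_suites are hypothetical. The directories and commands are copied from the list above.

```python
import subprocess
from typing import Callable, List, Tuple

# (working directory, command) pairs copied from the verification steps.
SUITES: List[Tuple[str, List[str]]] = [
    ("servers_v2/admin_service", ["uv", "run", "pytest"]),
    ("servers_v2/recon_service", ["uv", "run", "pytest"]),
    ("servers_v2/rolling_service", ["uv", "run", "pytest"]),
    ("servers_v2/wallet_service", ["uv", "run", "pytest"]),
    ("servers_v2", ["uv", "run", "pytest", "tests/test_compose_contract.py"]),
]


def _run(cwd: str, argv: List[str]) -> int:
    """Run one command in cwd and return its exit code."""
    return subprocess.run(argv, cwd=cwd).returncode


def run_suites(run: Callable[[str, List[str]], int] = _run) -> List[str]:
    """Run every suite in order and return the directories that failed.

    The runner is injectable so the loop can be exercised without uv.
    """
    failed = []
    for cwd, argv in SUITES:
        if run(cwd, argv) != 0:
            failed.append(cwd)
    return failed
```

The loop deliberately continues after a failure instead of stopping at the first one, so the result lists all suites that need attention before cutover.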

Before production cutover:

Current implementation note:

admin_service compatibility routes already proxy to recon_service
there is no runtime feature flag today that can enable or disable the proxy path independently

Deploy and verify the recon path as one release unit:

Deploy the target admin_service, recon_service, and recon_worker builds together.
Verify internal auth:
- admin_service calls recon_service with X-Caller-Service: admin and RGB_INTERNAL_SERVICE_TOKEN_ADMIN
- recon_worker calls admin_service /internal/meta/ws/sync with X-Caller-Service: recon and RGB_INTERNAL_SERVICE_TOKEN_RECON
- in Stage C, the bare RGB_INTERNAL_SERVICE_TOKEN is absent or empty and RGB_PER_CALLER_TOKEN_REQUIRED=on is set on every internal-token consumer
Verify both health endpoints and worker freshness metrics.
Monitor:
- compatibility route errors
- worker freshness
- pending SMS backlog
- parse failure count
- match failure count
- top-info update lag
Keep the previous known-good deployment set ready for full service rollback until parity is confirmed.
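
The Stage C condition in the internal auth step above can be checked mechanically on each internal-token consumer. The sketch below is illustrative only. The function names are hypothetical, and treating exactly "on" as the enabled value is an assumption based on how the flag is written in this runbook; the services may accept other spellings.

```python
from typing import List, Mapping

# X-Caller-Service value -> token variable, as listed in this runbook.
CALLER_TOKEN_VAR = {
    "admin": "RGB_INTERNAL_SERVICE_TOKEN_ADMIN",
    "recon": "RGB_INTERNAL_SERVICE_TOKEN_RECON",
}


def token_var_for(caller: str) -> str:
    """Return the token variable expected for an X-Caller-Service value."""
    return CALLER_TOKEN_VAR[caller]


def stage_c_violations(env: Mapping[str, str]) -> List[str]:
    """Return human-readable Stage C violations found in env (hypothetical check)."""
    problems = []
    # The bare shared token must be absent or empty in Stage C.
    if env.get("RGB_INTERNAL_SERVICE_TOKEN", "").strip():
        problems.append("bare RGB_INTERNAL_SERVICE_TOKEN is still set")
    # Assumption: only the literal value "on" counts as enabled.
    if env.get("RGB_PER_CALLER_TOKEN_REQUIRED", "") != "on":
        problems.append("RGB_PER_CALLER_TOKEN_REQUIRED is not on")
    return problems
```

An empty result from stage_c_violations(os.environ) on every consumer matches the Stage C condition described above. It does not replace the live auth checks between admin_service, recon_service, and recon_worker.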

Rollback triggers:

Rollback steps:

Roll back admin_service, recon_service, and recon_worker to the previous known-good release set.
Do not assume a runtime repoint or cutover flag exists unless it has been implemented and documented separately.
If immediate full rollback is not possible, isolate the faulty recon release from traffic by reverting the deployment or service image, not by editing undocumented runtime routing.
Preserve logs, queue state, and DB snapshots needed for analysis.
Open a follow-up incident doc before retrying cutover.