v1.56.91: add stale-running-job reaper + activate queue maintenance. Queue.reapStale re-queues (or fails, once attempts are spent) jobs stuck 'running' past STALE_RUNNING_SECS (900s) whose worker hung or died โ previously such rows (e.g. orphaned s3_upload jobs) sat 'running' forever and never retried. WorkerPool.maybeMaintain runs reapStale + the formerly-dead pruneCompleted once a minute, with a CAS so exactly one worker wins the slot; maintenance_at starts at 0 so the first worker reaps on boot, auto-recovering jobs orphaned by the last restart. PG-backed reapStale test (requeue vs fail vs leave-live-job). musl ReleaseSafe build exit 0 + jobs tests green vs throwaway PG (646 pass / 1 pre-existing team-tier live-PG fail)
$ koh steal kepr.uk/kepr@cb201981b21f
·
parent: b305de9f47d1
discussion
log in to leave a comment.