Act I · Infrastructure
Throttled at the Limit
agent pos seem to be exhausing CPU limits. what limit is set what should be increased to?
The agent pods were pegged. Two of them sat at 3998m / 4000m — CFS slicing them at the hard limit, build phases dragging because the kernel refused them more cycles. The per-pod ceiling had been cpu: 2 since the GKE Autopilot migration on 2026-03-13. Untouched for fifty-two days while the codebase doubled around it.
The first instinct is to blame the kernel. Then to suspect a regression. The git log told a quieter story — nothing reverted. The blog cornerstone batch had landed (PRs #2196–#2204) and next build grew teeth. Thirty-eight cornerstone posts. Forty-eight cover images. The same 2 CPU slot was now feeding a much hungrier wolf.
Bumped per-pod 2→4 in PR #2245. Pegged again. Bumped 4→8 in PR #2251. Settled. Then noticed Autopilot quietly upsized memory from 4Gi to 8Gi to honor the 1:1 CPU ratio rule — meaning the namespace quota only fit 16 pods, not the 32 we thought. Bumped memory quota 128→256 GiB in PR #2253 to actually unlock the advertised fan-out.
resources: requests: { cpu: "2", memory: 4Gi } limits: { cpu: "2", memory: 4Gi } resources: requests: { cpu: "8", memory: 4Gi } # Autopilot upsizes mem to 8Gi limits: { cpu: "8", memory: 4Gi } # via 1:1 CPU ratio rule