[CLAUDE] Docs: S39 wrap — gotcha #54+#55 + STATUS/HANDOFF S39 + session log S36-S39

Session S36-S39 end wrap (docs-only, CI skip per gotcha #41):
- gotcha #54 (529 Overload spawn fail → main agent solo fallback, S29×2+S37×1)
- gotcha #55 (truncation mid-EXPLORATION, extends #53: heavy spec bloat before write)
- gotcha 53→55 + quick-ref item 29+30
- STATUS S39 header (Opus 4.8 1M + multi-agent 4→7 + budget +50%)
- HANDOFF S39 (7-agent table + ⚠️ CLI restart required + S40 recommend)
- Session log 2026-05-29 S36-S39 (Phase 10 COMPLETE 11/11 + infra upgrade)
- .gitignore +tmp/ (sub-agent JSON dumps)

Memory user-level +2 (separate, user-scope):
- feedback_7agent_split_upgrade (4→7 BVAAU adapted decision)
- feedback_skeleton_first_aggressive_finish (schema FULL + logic SKELETON pattern)

Drift defer cron 2026-06-01: docs/CLAUDE.md count + schema-diagram §15+ + RAG re-ingest S37-S39.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pqhuy1987
2026-05-29 10:55:04 +07:00
parent fd0554a585
commit 829969ac6e
5 changed files with 166 additions and 2 deletions

@@ -997,6 +997,38 @@ for h in resp.points: # ← .points, not directly iterable
---
### 54. Transient Anthropic API 529 Overloaded when spawning a sub-agent → fails with 0 tokens (Session 37 Plan G-O3 + Session 29 cumulative)
**Symptom:** Spawning Implementer/Reviewer/CICD via the Agent tool returns a completed status, but result = `API Error: 529 Overloaded` + `subagent_tokens=0` + `tool_uses=0`. The agent ran nothing and 0 tokens were billed. This is entirely different from truncation (#53, where the agent runs but its output is cut).
**Empirical pattern:** S37 FE Proposal spawn failed with 529 (0 tokens, 218s duration = pure wait) + S29 Plan CA CICD verify failed with 529 × 2 (transient Anthropic API overload window). Recurs when Anthropic API load is high (peak hours / model release window).
**Distinguishing 529 from #53 truncation:**
- 529 Overload: `tokens=0`, the agent did NOT start → retryable OR main agent solo fallback
- #53 truncation: the agent ran in full (~100-150K tokens) but its output was cut mid-MEMORY/mid-exploration → do NOT retry (tokens already spent); the main agent verifies manually with grep
**Mitigation verified S37:**
- **A. Main agent solo fallback:** NO retry loop (529 is transient, but a re-spawn can fail again). Having the main agent write the code directly is more reliable (S37: BE 700 LOC + FE 4 files × 2 apps solo after 2 spawn failures). Proven faster than wait-retry.
- **B. Never let a single agent block a critical-path task:** if the task is on the critical path (must ship within the session) → the main agent keeps a solo fallback plan ready and does NOT block waiting on the agent.
- **C. Off-peak spawn:** for heavy parallel spawns (3-4 agents), avoid peak hours when the task is not critical.
**Cumulative occurrence:** S29 × 2 (Plan CA CICD) + S37 × 1 (FE Proposal) = 3× across project. ~5-10% spawn fail rate observed at peak.
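
The 529-versus-truncation triage above can be sketched as a small decision helper. This is a minimal illustration, not the project's actual tooling: `SpawnResult`, `classify`, and `next_step` are hypothetical names, the fields simply mirror the values quoted in the symptom, and mapping a non-critical 529 to a single deferred re-spawn is one reading of mitigations A and C.

```python
from dataclasses import dataclass
from enum import Enum


class SpawnOutcome(Enum):
    OVERLOADED = "overloaded"  # 529: agent never started, 0 tokens billed
    RAN = "ran"                # anything else: treat as a run whose output needs checking


class NextStep(Enum):
    SOLO_FALLBACK = "solo_fallback"        # main agent does the work itself
    RESPAWN_OFF_PEAK = "respawn_off_peak"  # one deferred re-spawn, never a retry loop
    VERIFY_OUTPUT = "verify_output"        # grep-verify manually; do not re-spawn


@dataclass
class SpawnResult:
    # Hypothetical record; field names mirror the symptom description.
    result_text: str
    subagent_tokens: int
    tool_uses: int


def classify(result: SpawnResult) -> SpawnOutcome:
    """Flag a spawn as overloaded only when it never started AND reports a 529."""
    never_started = result.subagent_tokens == 0 and result.tool_uses == 0
    if never_started and "529" in result.result_text:
        return SpawnOutcome.OVERLOADED
    return SpawnOutcome.RAN


def next_step(result: SpawnResult, on_critical_path: bool) -> NextStep:
    """Map the outcome to an action following mitigations A-C."""
    if classify(result) is SpawnOutcome.OVERLOADED:
        if on_critical_path:
            return NextStep.SOLO_FALLBACK
        return NextStep.RESPAWN_OFF_PEAK
    # Tokens were already spent (possible #53 truncation), so retrying only wastes more.
    return NextStep.VERIFY_OUTPUT
```

For example, `next_step(SpawnResult("API Error: 529 Overloaded", 0, 0), on_critical_path=True)` yields `NextStep.SOLO_FALLBACK`, while a result that consumed tokens always routes to manual verification.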
### 55. Sub-agent truncation mid-EXPLORATION phase (extend #53 — Session 37 Implementer BE)
**Symptom:** Unlike #53 (truncation mid-MEMORY at end of task), the S37 Implementer BE Proposal truncated RIGHT AT THE START, in the exploration phase: it returned `"Now I need to look at Common Models... ICurrentUser, IDateTime..."` after 30 tool uses, having NOT yet written any file. 150K tokens wasted (reading references + diagnosing a compile error mid-research).
**Root cause:** Heavy spec brief (~10K tokens) + the agent reads many reference files (PE WorkflowService + Features + CodeSequence + Common Models) → context bloat before it starts writing → truncation mid-research.
**Mitigation:**
- Keep the WRITE agent brief ≤ 8K (gotcha #53 rule A reinforced: a heavy ~10K spec is too risky)
- Pre-supply reference snippets in the brief (the main agent reads the files + pastes the shape instead of letting the agent read them in full) → the agent does NOT need a token-expensive exploration phase
- OR have the main agent go solo on complex-spec tasks that require reading > 4 reference files (S37 lesson: BE Proposal mirrors PE = many references → main agent solo is reliable)
**References:** S37 Implementer BE spawn `a3afd177` (truncate mid-exploration) + memory `feedback_implementer_truncation_mitigation` (heavy scaffold ≥30 file pattern). Cumulative truncation S35 × 3 (mid-MEMORY) + S37 × 1 (mid-exploration) = 4× extend #53.
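
The two thresholds in the mitigation list (brief ≤ 8K tokens, at most 4 reference files) lend themselves to a pre-spawn check. The sketch below is illustrative only: `plan_write_task` and `estimate_tokens` are hypothetical helpers, and the 4-characters-per-token ratio is a rough assumption, not a real tokenizer.

```python
MAX_BRIEF_TOKENS = 8_000   # "brief ≤ 8K" from the mitigation list
MAX_REFERENCE_FILES = 4    # more than 4 reference files → main agent solo
CHARS_PER_TOKEN = 4        # ASSUMPTION: crude heuristic, not an actual tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count by character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def plan_write_task(brief: str, reference_files: list[str]) -> str:
    """Return 'solo', 'trim_brief', or 'spawn' for a WRITE task."""
    if len(reference_files) > MAX_REFERENCE_FILES:
        # Too much exploration needed: the sub-agent risks truncating mid-research.
        return "solo"
    if estimate_tokens(brief) > MAX_BRIEF_TOKENS:
        # Brief is over budget: cut it down before spawning.
        return "trim_brief"
    return "spawn"
```

Checking the reference count first reflects the S37 lesson that many references alone justify going solo, regardless of brief size.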
---
## Checklist for debugging a new bug
1. Does the build pass? → fail → check `using` + package version compat
@@ -1028,3 +1060,5 @@ for h in resp.points: # ← .points, not directly iterable
25. If the UI audit list shows a repeated `Đã gửi duyệt → Đã gửi duyệt` ("Submitted for approval") that causes confusion → drop the dual-phase badge when the state machine self-loops; replace it with a Decision badge + next-target hint parsed from the comment (#49)
27. If RAG `search_memory` returns 0 results even though Qdrant is green + BM25 has data → a `qdrant-client` upgrade removed the `search()` method and the error is swallowed silently. Test: `python -c "from qdrant_client import QdrantClient; c=QdrantClient(url='http://127.0.0.1:6333'); c.search"`. Fix: use `query_points(query=...).points` (#52)
28. If a sub-agent (Reviewer/CICD) returns a PASS verdict that is cut mid-sentence at the "Update MEMORY.md" step → MEMORY > 25KB triggers truncation risk. Mitigation: tight brief ≤ 8K + main agent verifies manually with grep + curate MEMORY pre-spawn if > 25KB (#53)
29. If spawning a sub-agent returns `API Error: 529 Overloaded` + `tokens=0` → transient Anthropic API overload; the agent did NOT run. Do NOT loop on retries → main agent solo fallback is reliable (#54). Distinguish from #53 truncation (the agent used its full tokens but its output was cut)
30. If a WRITE sub-agent truncates RIGHT AT THE START of the exploration phase (no file written yet, reading > 4 references) → heavy ~10K spec + context bloat. Mitigation: brief ≤ 8K + pre-supply reference snippets in the brief OR main agent solo if > 4 reference files must be read (#55)
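
For checklist item 27, a thin wrapper can make a missing search method fail loudly instead of silently returning nothing. This is a hedged sketch: `search_points` is a hypothetical helper, it relies on duck typing rather than importing `qdrant_client`, and it assumes the keyword arguments `collection_name`, `query`, `query_vector`, and `limit` match the installed client version.

```python
from typing import Any, List, Sequence


def search_points(client: Any, collection: str, vector: Sequence[float], limit: int = 5) -> List[Any]:
    """Prefer query_points(); fall back to legacy search(); never fail silently."""
    if hasattr(client, "query_points"):
        response = client.query_points(
            collection_name=collection, query=list(vector), limit=limit
        )
        # The hits live on .points; the response object itself is not the list.
        return list(response.points)
    if hasattr(client, "search"):
        # Older clients: search() returns the list of hits directly.
        return list(
            client.search(collection_name=collection, query_vector=list(vector), limit=limit)
        )
    # Surface the API drift instead of letting a caller swallow an AttributeError.
    raise RuntimeError("client exposes neither query_points() nor search()")
```

Raising a `RuntimeError` here is a design choice for the sketch: it keeps a broad `except` around the RAG call from masking a removed method as "0 results".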