[CLAUDE] Docs+Memory: S35 wrap — FE forms + G-H2 BE CRUD + FE Admin deploy prod end-to-end

Cumulative S35: 3 commits + 3 CI runs (#242/#243/#244), ALL PASS:
- `c3cd343` FE inline forms 5 satellite × 2 app cookie-cutter (+1758 LOC)
- `909655c` G-H2 BE CRUD HrmConfig 16 endpoint (+576 LOC NEW)
- `021674a` G-H2 FE Admin HrmConfigsPage declarative (+1388 LOC)

## Updates this commit (docs CI skip per gotcha #41)
- docs/STATUS.md S35 wrap header (cumulative 3 chunk + Multi-agent ROI ~250K)
- docs/HANDOFF.md S35 brief + S36 backlog 6 option
- docs/gotchas.md +#53 sub-agent truncation/stall pattern S35 × 3 occurrence + Quick reference 28
- docs/changelog/sessions/2026-05-28-s35-fe-inline-forms-g-h2.md NEW session log
- 4 sub-agent MEMORY auto-updated entry (Implementer + CICD + Reviewer + Investigator S35 spawns)

## Patterns reinforced cumulative S35
- Pattern 12-ter (within-module N-satellite) 6× cumulative
- Pattern 12-bis (cross-module catalog mega) 3× cumulative
- Pattern 16-bis (4-place mirror cross-app) 6×; staticMap 4th place mandatory (gotcha #50)
- Smart Friend 9× cumulative clean (S22+S25+S29×2+S33×2+S35×3)
- NEW: Declarative KIND_CONFIG Record pattern (single-page multi-kind CRUD reuse)

## Smart Friend Implementer 3 catch S35 (anti-pattern prevention)
1. Chunk 2 MaxLength validator vs EF config mismatch → aligned EF source-of-truth
2. Chunk 2 HRM entities NO HasQueryFilter → explicit .Where(!IsDeleted) 8 site
3. Chunk 3 main-agent spec gap Layout staticMap miss → Implementer enforced Pattern 16-bis 4-place

## Final state S35
- 35 mig unchanged · 71 tables · ~185 endpoints (+16 HRM Configs)
- 43 FE pages (+1 HrmConfigsPage) · 130 test PASS unchanged
- 53 gotcha (+1 #53) · 27 memory user-level · 6 skills · 4 sub-agents

## Multi-agent ROI S35 ~250K
- Implementer 3 spawn ~80K (3 cookie-cutter chunk + Smart Friend × 3 catch)
- Investigator 1 spawn ~8K (G-H2 BE CRUD pre-flight + NamGroup MISS verdict)
- Reviewer 3 spawn ~60K (Smart Friend 9× clean, 2 truncated + 1 tight brief PASS)
- CICD 4 spawn ~70K (warm-up + 3 deploy verify, 1 stalled with main-agent fallback)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 10:23:55 +07:00
parent 021674a66a
commit 8afdc1e826
8 changed files with 382 additions and 3 deletions
--- a/docs/gotchas.md
+++ b/docs/gotchas.md
@@ -956,6 +956,40 @@ for h in resp.points:    # ← .points is not directly iterable

 **Prevention:** Pin the `qdrant-client` version in deps (`==1.x.y`), or add a startup health check: `assert hasattr(_qdrant, 'query_points'), "upgrade qdrant-client"`. **Do NOT use `except Exception: continue` to mask API errors**; at minimum, log a warning.

+### 53. Sub-agent truncation / stall pattern during the heavy end-of-task MEMORY update phase: Reviewer + CICD cut mid-sentence at the Update MEMORY.md step (Session 35 × 3 occurrences)
+
+**Symptom:** A sub-agent (Reviewer + CICD spawns, ~100K token budget, ~30+ tool uses) completes its adversarial checks / smoke verification and returns a PASS verdict in the visible snippet, BUT the output is truncated at the final "Update MEMORY.md BEFORE stop" step. The main agent does not receive the full structured verdict.
+
+**Empirical pattern S35:**
+- Reviewer FE forms (1200 LOC + 60 mutation): Cat 1 "wire BE PERFECT" + 33 tool uses → truncated mid-MEMORY append
+- Reviewer BE CRUD (576 LOC + 16 endpoint): "MEMORY size warning 24.6KB exceeds 24.4KB. Let me append concise entry but trim verbose..." → truncated mid-trim
+- CICD Run #244 verify (FE Admin deploy): "VPS mtime cross-check confirms ship at 10:05" → stalled 600s watchdog timeout
+
+**Root cause hypothesis:**
+1. The sub-agent context window approaches its limit when cumulative tool output (initial Read of MEMORY ~25KB + Read of references + Bash output + grep results) is added to the 100K spawn budget
+2. A borderline MEMORY.md size (~25-31KB) triggers a large Edit/Write operation late-stage → token overflow during streaming
+3. The 600s stream watchdog timeout does not recover (CICD case): the process hung internally
+
+**Mitigation S35 verified:**
+- **A. Tight brief scope**: Reviewer FE Admin spawn (~5K token brief, 4 cat tight, "concise findings only") → clean PASS, 5K return, not truncated. Pattern: brief budget < 8K + scope ≤ 4 cat + explicit "DO NOT curate MEMORY heavy — short append only this time"
+- **B. Main agent manual verify post-truncation**: Cat 2-6 grep-based verify (SHA256 diff exit code + grep count match) takes the main agent ~5 minutes, faster than a re-spawn
+- **C. Curate MEMORY pre-spawn if > 25KB**: agent MEMORY above the threshold triggers truncation risk. The main agent does a proxy curate (archive q3.md) before a heavy spawn.
+- **D. Avoid forcing a heavy MEMORY update in the agent spec**: change the phase "Update MEMORY.md BEFORE stop with detailed findings" to "short append 1 entry FIFO most-recent-first, do NOT curate old entries"
+
+**Cumulative occurrences S35:** 3/8 sub-agent spawn (Reviewer × 2 + CICD × 1 = 37.5% truncation rate at borderline ~25-31KB MEMORY). Heavy task + large MEMORY = correlation point.
+
+**References:**
+- S33 Implementer truncation pattern 2/3 (memory `feedback_implementer_truncation_mitigation` user-level — heavy scaffold ≥30 file)
+- S35 Reviewer FE forms (Session 35 push #1 verify): output cut after Cat 1 PERFECT statement
+- S35 Reviewer BE CRUD (Session 35 push #2 verify): cut mid-MEMORY trim
+- S35 CICD Run #244 (Session 35 push #3 verify): stalled 600s watchdog after VPS mtime cross-check
+
+**Prevention:**
+1. Tight brief scope ≤ 8K tokens for Reviewer/CICD if the task is verifiable by the main agent via grep/diff
+2. MEMORY pre-spawn audit: if > 25KB → proxy curate and archive before the spawn
+3. State explicitly in the agent spec "short append MEMORY only, NO curate"; remove the "BEFORE stop with detailed" directive when MEMORY is borderline
+4. Main agent backup: manually verify Cat 2-6 with grep if the Reviewer is truncated mid-verdict
+
 **References:**
 - AI_INFRA: `claude-rag/lib/retrieval.py` `vector_search()` function — fixed 2026-05-26 S31
 - Eval run: `eval/runs/2026-05-26-baseline-v1.1-final.json`
@@ -993,3 +1027,4 @@ for h in resp.points:    # ← .points is not directly iterable
 26. If a new Seed method does NOT run in prod even though dotnet build PASS + deploy SUCCESS → check whether it is nested inside the `if (!demoSeedDisabled)` gate (Plan T S23 t10 flag enabled in prod) → INFRASTRUCTURE seed must be PROMOTED OUT of the DemoSeed gate (#51). Decision tree: does production need it to work end-to-end? YES → ungate
 25. If the UI audit list shows a repeated `Đã gửi duyệt → Đã gửi duyệt` ("Submitted for approval") that causes confusion → drop the dual-phase badge when the state machine self-loops, and replace it with a Decision badge + next-target hint parsed from the comment (#49)
 27. If RAG `search_memory` returns 0 results even though Qdrant is green + BM25 has data → a `qdrant-client` upgrade removed the `search()` method and the error was silently swallowed. Test: `python -c "from qdrant_client import QdrantClient; c=QdrantClient(url='http://127.0.0.1:6333'); c.search"`. Fix: use `query_points(query=...).points` (#52)
+28. If a sub-agent (Reviewer/CICD) returns a PASS verdict that is cut mid-sentence at the "Update MEMORY.md" step → MEMORY > 25KB triggers truncation risk. Mitigation: tight brief ≤ 8K + manual grep verify by the main agent + curate MEMORY pre-spawn if > 25KB (#53)
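
The pre-spawn MEMORY audit from gotcha #53 could be sketched roughly as below. This is a minimal illustration and not part of the commit: the names `needs_curation` and `pre_spawn_audit`, the `MEMORY.md` path, and the reading of "25KB" as 25 × 1024 bytes are all assumptions made for the example.

```python
from pathlib import Path

# Assumption: the "> 25KB" threshold from gotcha #53 is read as 25 * 1024 bytes.
MEMORY_LIMIT_BYTES = 25 * 1024


def needs_curation(size_bytes: int, limit_bytes: int = MEMORY_LIMIT_BYTES) -> bool:
    """Return True when a MEMORY file of this size is strictly above the limit."""
    return size_bytes > limit_bytes


def pre_spawn_audit(memory_path: Path) -> str:
    """Return a short recommendation to read before spawning a heavy sub-agent."""
    if not memory_path.is_file():
        return "ok: no MEMORY file found"
    size = memory_path.stat().st_size
    if needs_curation(size):
        return f"curate first: {size} bytes exceeds {MEMORY_LIMIT_BYTES}"
    return f"ok: {size} bytes within limit"


if __name__ == "__main__":
    # Hypothetical path; point this at the real per-agent MEMORY.md location.
    print(pre_spawn_audit(Path("MEMORY.md")))
```

Running a check like this before each Reviewer or CICD spawn would flag a borderline MEMORY file early, so the brief can switch to "short append only" or the file can be curated first.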