[CLAUDE] Docs: adopt Harness-15-v2 (Tier-1 large hot-feed + L2/L3 cap removed + %-print + role-boundary, S82)

- token_governor v2: L1 12K -> hot-feed per-role (lead 60K / sub 20K / wf 16K); L2/L3 6K/4K -> NO-CAP
- engine G.4 + session-start 2.1.6 + session-end L.b(c): %-print at both ends of the session
- role-boundary: the numbers are the project owner's (anh's) right to set; the AI executes and reports %; corrects the S81 lead-authority framing
- mark H-15 v2-delta as anh-confirmed: RC-pqhuy1987-21-06-2026-01-58-01
- 2-process: IMPLEMENT em-main D9 + REVIEW wf_04667b25-5fa 3/3 PASS (3 MINOR fixed)
- check-email STAGE 1+2 (notify verify ok) + email AI_INFRA (a749bb6bd1be)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 02:02:05 +07:00
parent 700e4c951b
commit aa3f0fefb4
9 changed files with 193 additions and 13 deletions
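
The %-print report described in the commit message can be illustrated with a minimal Python sketch. It assumes the three Tier-1 budgets from the `token_governor` block in this commit and the 3.3 bytes-per-token ratio that the notes use for their MEMORY.md estimates. The function names `estimate_tokens` and `pct_print`, the bucket keys, and the sample bucket sizes are hypothetical and do not come from the repository.

```python
# Tier-1 hot-feed budgets, copied from the token_governor block in this commit.
TIER1_BUDGET_TOKENS = {"lead": 60_000, "memory_sub": 20_000, "workflow_sub": 16_000}

# Assumed bytes-per-token ratio; the notes quote ~3.0-3.5 for Vietnamese text.
BYTES_PER_TOKEN = 3.3


def estimate_tokens(num_bytes: int, bytes_per_token: float = BYTES_PER_TOKEN) -> int:
    """Estimate a token count from a file size by ratio (not an exact measure)."""
    return round(num_bytes / bytes_per_token)


def pct_print(role: str, bucket_tokens: dict[str, int]) -> list[str]:
    """Return report lines: % of the role's budget per bucket, plus headroom."""
    budget = TIER1_BUDGET_TOKENS[role]
    used = sum(bucket_tokens.values())
    lines = [f"Tier-1 [{role}] {used}/{budget} tok ({used / budget:.0%})"]
    for name, tokens in bucket_tokens.items():
        lines.append(f"  {name}: {tokens / budget:.0%}")
    # Headroom is reported as a flag for the project owner, not a saving target.
    headroom = max(budget - used, 0)
    lines.append(f"  headroom: {headroom / budget:.0%}")
    return lines


if __name__ == "__main__":
    # Hypothetical bucket sizes for the lead role, for illustration only.
    sample = {"wip": 24_000, "gotchas": 12_000, "backlog": 9_000, "pending_decisions": 6_000}
    print("\n".join(pct_print("lead", sample)))
```

With the 3.3 ratio, `estimate_tokens(30720)` gives 9309, in line with the "~9.3K tok" figure quoted for the MEMORY.md byte cap. The sketch only reports; changing the budget numbers stays with the project owner, as the role-boundary note requires.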
--- a/.claude/agent-memory/memory-budget.json
+++ b/.claude/agent-memory/memory-budget.json
@@ -47,15 +47,25 @@
    "components_note": "persona(measured 1.3K-4.0K, DIRECTLY via Get-ChildItem byte-count of .claude/agents/*.md, NOT via measure-agent-memory.ps1 which only sizes agent-memory tiers) + tool-schema(est ~5K) + framing(est ~2K) + lead-pasted base-slice(est ~5-9K; em-main injects CLAUDE.md/README-slice + task ctx) + prompt(est ~1K). README decision-tree corpus = 32704B but em-main pastes a SLICE, not whole. The ~21K sum is SE's OWN reasoning over SE's OWN agents (persona byte-measured + each harness-injected sub-component estimated independently); it lands NEAR AI_INFRA's ~21K because the toolset family is identical (Read/Write/Edit/Bash/Grep/Glob/Skill/RAG), NOT because borrowed. SE's governing cap = 30K (SE's own round-up), independent of AI_INFRA's figure."
  },
  "token_governor": {
-    "_note": "Harness-15 A2/B(a)/B(e) (S81, 2026-06-20): SECOND governor (token) ORTHOGONAL to the BYTE governor (tiers/archive_gate above). Keep BOTH (B(e)) -- byte measures file-size-on-disk, token measures context-loaded; independent (an agent can exceed byte-cap yet load FEWER tokens; VN text ~3.0-3.5 byte/tok so byte/4 = upper bound => real headroom LARGER). Numbers = LEAD-AUTHORITY hard-cap derived from SE byte-caps x workload (multi-module ERP => heavier budget); NO AI may re-optimize them down (B(a)). Budget = MINIMUM-to-USE floor, NOT a ceiling to dribble against: FILL L1 with real work-state up to budget; under-fill ONLY when high-value content exhausted; NEVER garbage-stuff (token-saving = forgetting work).",
-    "l1_always_tokens": 12000,
-    "l1_always_note": "own agent-memory (MEMORY.md ~8K @ 25600B/3.3) + archive _INDEX map (~2K) + work-state block (~2K). Always-loaded.",
-    "l2_ondemand_tokens": 6000,
-    "l2_ondemand_note": "archive verbatim/gist sections + skill sections. Pulled per-need, NOT always-loaded => no context-rot when unused.",
-    "l3_rag_tokens": 4000,
-    "l3_rag_note": "RAG search_memory/search_code result per query. On-demand.",
-    "headline_floor_plus_l1_tokens": 42000,
-    "headline_note": "always-present per spawn = SAN(30K) + L1(12K). L2/L3 expand only on-demand (no always-cost)."
+    "_note": "Harness-15-v2 (S82, 2026-06-21): UPDATED by delta broadcast 2026-06-20-Governance-harness-15-v2-hot-feed-update (supersedes_scope = tier-1-sizing + L2/L3-caps ONLY; rest of H15 unchanged). TWO CHANGES vs S81: (1) Tier-1 = HOT-FEED LARGE per-role (was flat 12K -- too thin, caused lead to forget work across sessions); (2) L2/L3 caps REMOVED (on-demand, no artificial tier-limit, bounded only by model context window). Still the SECOND governor (token) ORTHOGONAL to the BYTE governor (tiers/archive_gate above) -- keep BOTH (B(e)); byte measures file-size-on-disk, token measures context-loaded; VN text ~3.0-3.5 byte/tok so byte/4 = upper bound => real headroom LARGER. Budget = MINIMUM-to-USE floor (FILL Tier-1 with real work-state up to the number; under-fill ONLY when high-value content exhausted; NEVER garbage-stuff -- token-saving = forgetting work).",
+    "role_boundary_note": "v2 §6 ROLE BOUNDARY (🔴): the budget numbers (Tier-1 per-role cap + per-bucket allocation) are ANH's (project-owner / chu-du-an) RIGHT to set -- NOT the AI-lead's. em-main's job is exactly two parts: (1) EXECUTE the config faithfully (load Tier-1 to the number, no-truncate, pull each bucket to target) + (2) REPORT %-composition at session-start (§2.1.6) and session-end (§L.b(c)) so anh decides. em-main self-measures + proposes numbers; em-main does NOT auto-tune them down. This corrects the S81 'LEAD-AUTHORITY' framing which conflated AI-lead with project-owner.",
+    "tier1_hotfeed_tokens": {
+      "_note": "Tier-1 always-loaded HOT-FEED, PER-ROLE (v2: large/generous, do NOT keep thin). FILL with the 4 work buckets: (1) WIP work-state, (2) recurring-bugs/anti-patterns/gotcha (value_protect, kept regardless of age), (3) backlog, (4) pending-decisions. Numbers below = SE self-measured-ESTIMATE per SE scale (Opus 4.8 1M context window + multi-module ERP workload); %-print at the two session ends shows the REAL composition. AI_INFRA reference (lead 220K / mem-sub 60K / wf-sub 50K) is THEIR measure on THEIR model+federation-scale -- NOT hard-applied; SE numbers are SMALLER (single project; sub MEMORY.md byte-capped at 30720B by design).",
+      "lead_tokens": 60000,
+      "lead_note": "em-main hot-feed: STATUS current-state + 4-bucket work-state block + ACTIVE-MARKS + recent-3-session HANDOFF slice + roster-slice + task-relevant gotchas. Opus 4.8 1M window => ample headroom above this. anh-adjustable.",
+      "memory_sub_tokens": 20000,
+      "memory_sub_note": "memory-bearing sub: own MEMORY.md (<=30720B ~9.3K tok) + archive _INDEX map (<=20480B ~6.2K tok) + work-state slice (~3-4K). Upper-bounded by the BYTE soft-cap on MEMORY.md.",
+      "workflow_sub_tokens": 16000,
+      "workflow_sub_note": "agent-in-workflow: MEMORY-PACK slice (hmw.js:124 args inject) + task context."
+    },
+    "l2_ondemand": "NO-CAP (v2: removed the 6K cap). On-demand: archive verbatim/gist sections + skill sections; pulled per-need, no artificial tier-limit; bounded only by model context window. On-demand => no permanent context-cost when unused.",
+    "l3_rag": "NO-CAP (v2: removed the 4K cap). On-demand: RAG search_memory/search_code per query; bounded only by model context window.",
+    "pct_print": {
+      "_note": "v2 §6: %-print Tier-1 composition at BOTH ends of the session (start and end) so anh sees what Tier-1 holds, which bucket is thin, headroom left. Estimate-by-ratio is enough (no exact measure). Headroom > 0 WHILE high-value content still unloaded = under-fill (WRONG) -> load more; leave headroom ONLY when high-value content truly exhausted. Headroom = a FLAG, NOT a saving target.",
+      "session_start": "session-start.md §2.1.6 (composition by % per bucket)",
+      "session_end": "session-end.md §L.b(c) (% after load + Headroom remaining)"
+    },
+    "honest_caveat": "v2 §5: large Tier-1 = HIGHER context-rot on the always-loaded part -- an accepted, deliberate trade-off (forgetting-work judged worse than rot), NOT 'rot disappears'. Small-context-window / light-workload projects may optimally pick a SMALLER Tier-1 -- the FLOOR is the ARCHITECTURE (hot-feed large + L2/L3 on-demand no-cap), not the numbers. 'No-cap' L2/L3 = no artificial tier-limit, still within the model context window."
  },
  "measured": {
    "_note": "S81 2026-06-20 FRESH re-measure (scripts/measure-agent-memory.ps1) post-S80 curate. Supersedes stale S71-seeded values. byte-governor snapshot (l1_hot = file-size-on-disk); cross-check token_governor for the orthogonal token-thread.",