solution-erp/docs/governance/error-ledger.md
pqhuy1987 009dd94f22 [CLAUDE] Docs: S48 adap-* verify closure post-restart + Gov-v2 error-ledger + §L.b
- store_memory strip VERIFIED-runtime (registry 0/8 subs) — adap-report updated
- frontend-designer FD2 loop VERIFIED-RAN (first spawn) — adap-report updated
- Gov-v2 delta CLOSED: NEW docs/governance/error-ledger.md (blameless RCA + Active-Guards
  index + AS-1..AS-9 deterministic-detect + 3-ledger triad) + session-end.md Phase 1.5 §L.b 6-step
- STATUS/HANDOFF S48 + session log + frontend-designer MEMORY flush (FD2 rig + Tailwind-v4 fact)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 00:05:39 +07:00


Error-Ledger — SOLUTION_ERP (Gov-v2 §L keystone)

Living artifact. Blameless RCA + Active-Guards index for SE. Closes the open delta from adap-report 2026-06-02-Governance-gov-v2-session-cmd-framework (the only Gov-v2 floor item SE had distributed-but-not-formalized). Maintained at /session-end §L.b (deterministic step, not a daemon — G-015). Blameless = root-cause + guard, NOT blame.

📐 The 3-ledger triad (Gov-v2 §L.b / §G3: merged form, function intact)

SE maps the mandated 3 living ledgers onto existing + new artifacts (§F4 form-freedom):

| Ledger (function) | SE artifact | Role |
| --- | --- | --- |
| (i) error-ledger | this file (docs/governance/error-ledger.md) | RCA blameless · Active-Guards index · 3-axis tag · 2-strike promote |
| (ii) comms-ledger | docs/governance/README.md "Cross-Project Adoption Ledger" + docs/governance/adap-reports/ | 2-way cross-project OUT→ACK / IN→decided, link-not-copy |
| (iii) summary-index | docs/STATUS.md "Recently Done" + docs/changelog/sessions/ | timeline spine, pointer-not-log, reverse-chron |

🔍 §L.a — Deterministic detect (action-signature scan @ session-end)

Detect by action-signature, NOT by having the AI judge for itself whether a violation occurred. Scan the session for these; each hit → an RCA entry below. The list is open: extend it when a new class appears. (G-015: this catches the signatures in this list, NOT "every violation".)

| # | Action-signature (grep/observe) | Rule it violates | On hit |
| --- | --- | --- | --- |
| AS-1 | git add -A / git add . | add-specific-files (concurrency safety, feedback_rag_mcp_recovery_concurrency) | RCA + re-stage specific |
| AS-2 | --no-verify / --no-gpg-sign / commit.gpgsign=false | no hook/sign bypass unless asked | RCA, justify or revert |
| AS-3 | sub-agent invokes store_memory | lead = sole RAG-writer (S47, mechanized) | should be impossible (allowlist-stripped); if chunk-count jumps w/o lead write → investigate |
| AS-4 | EF Mig adds UNIQUE/composite index on a soft-delete (IsDeleted) entity without .HasFilter("[IsDeleted]=0") | gotcha #57 (recreate-on-soft-deleted-slot → 500) | RCA + test-before + filter |
| AS-5 | heavy/long agent spawn in foreground | feedback_background_spawn_visibility (looks-frozen) | note; prefer run_in_background |
| AS-6 | docs-only commit that triggers a CI run | gotcha #41 path-filter (paths-ignore) | verify path-filter intact |
| AS-7 | model downgrade (haiku/sonnet) on codegen/guard/financial/security | critical-algo needs Max tier | RCA, re-run on Max |
| AS-8 | session-end memory .md Write leaving 0 bytes | feedback_session_end_memory_write_verify (S46) | re-write + verify byte>0 |
| AS-9 | A/B/C choice handed to the user (anh) without the decision-brief axes | Gov-v2 §G2 | reframe as full brief |
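A minimal sketch of how the signature scan could be mechanized, assuming the session's shell commands are available as plain text lines. Only AS-1 and AS-2 are covered, since they are plain command-line patterns; the `SIGNATURES` table, the `scan_session` name, and the input format are hypothetical and not part of the SE tooling.

```python
import re

# Hypothetical signature table: only AS-1 and AS-2 are sketched here,
# because they are command-line patterns a regex can match directly.
SIGNATURES = {
    # "git add -A", "git add --all", or "git add ." (but not "git add .gitignore")
    "AS-1": re.compile(r"\bgit\s+add\s+(-A\b|--all\b|\.(\s|$))"),
    # any hook/sign bypass flag
    "AS-2": re.compile(r"--no-verify\b|--no-gpg-sign\b|commit\.gpgsign=false"),
}


def scan_session(lines):
    """Return (signature_id, line_number, line) for every matching line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for sig_id, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((sig_id, number, line.strip()))
    return hits
```

Each returned tuple would then become one RCA entry. The scan is deterministic by construction: it reports only what matches the table, which is the G-015 limit stated above.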

🛡️ Active-Guards index (2-strike promote: episodic → procedural)

net-effect rule: a guard that costs more than it saves (harm > benefit) → retire. verified = ran ≥1× and held. strikes = times the underlying error recurred before the guard.

| Guard | Counters | Tier | Strikes | Verified | Net |
| --- | --- | --- | --- | --- | --- |
| CI paths-ignore docs-only skip | gotcha #41 (AS-6) | procedural | 2 | (every docs commit 0s) | +++ |
| em-main verify-on-disk + proxy-append after agent return | gotcha #53 truncation | procedural | 5× (S35-S42) | | +++ |
| test-before bug-fix + soft-delete-UNIQUE .HasFilter | gotcha #57 (AS-4) | procedural | 2 (Holiday S45 + latent LeaveType/Shift) | Mig 43 | ++ |
| authz regression test per-action policy | gotcha #44 silent-403 | procedural | 1 | (promoted S45 +10 test) | ++ |
| agent frontmatter model: inherit (not [1m]) | gotcha #37 | procedural | | (FD agent loaded S48) | ++ |
| lead = sole RAG-writer (store_memory stripped, mechanized) | store_memory rebootstrap-loss (S41) + AS-3 | procedural | 2 (NamGroup + SE S41) | runtime S48 (0/8 subs) | +++ (failure-safe) |
| session-end verify memory byte>0 | S46 0-byte (AS-8) | episodic→promote | 1 (S46) | wired §L.b S48, verify next run | ++ |
| heavy spawn → run_in_background | looks-frozen | episodic | 2 (S45, S48) | S48 (FD bg) | + |
| RAG glob **/-anchored (not root) | gotcha #10 node_modules leak | procedural | 1 (S41) | (2406 clean) | ++ |
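The guard lifecycle (2-strike promote, verified once held, retire by net-effect) can be sketched as a small state object. This is an illustration only: the `Guard` class and its numeric `cost`/`benefit` arguments are hypothetical, since the ledger itself records net-effect qualitatively (+ to +++).

```python
from dataclasses import dataclass


@dataclass
class Guard:
    name: str
    tier: str = "episodic"   # "episodic" or "procedural"
    strikes: int = 0         # times the underlying error recurred
    verified: bool = False   # ran at least once and held
    retired: bool = False

    def record_strike(self):
        """Count a recurrence; promote to procedural on the 2nd strike."""
        self.strikes += 1
        if self.strikes >= 2:
            self.tier = "procedural"

    def mark_held(self):
        """The guard ran during a session and the error did not recur."""
        self.verified = True

    def review_net_effect(self, cost, benefit):
        """Retire the guard when it costs more than it saves."""
        if cost > benefit:
            self.retired = True
```

In practice a guard may be promoted earlier by decision, as some rows in the table show; the sketch models only the default 2-strike rule.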

📋 RCA entries (blameless — newest on top)

Format: E-NNN | date | rule | what | 5-why root | fix (prod-bug = 2-fix: code + guard) | prevention | tags[TYPE/ACTOR/COMPONENT]

E-004 — gotcha #53 agent truncation mid-MEMORY (recurring S35-S42)

  • rule: agent must flush MEMORY before return; em main must receive complete work.
  • what: heavy WRITE-agent (implementer/test-specialist) output truncates mid-MEMORY-update; return looks complete but isn't.
  • 5-why: brief too heavy → spawn output cap hit → truncation at the tail → MEMORY update is last step → silent partial.
  • fix: (code/process) em main grep-verify-on-disk after return + proxy-append the agent's MEMORY next session (Strategy B, feedback_implementer_truncation_mitigation). (guard) brief ≤8K + Tiered Memory L1 ~30KB cap.
  • prevention/guard: Active-Guard "verify-on-disk + proxy-append" (promoted, 5 strikes). 529 → em main solo fallback, no retry-loop.
  • tags: [process-truncation / sub-agent / agent-memory]
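One way the verify-on-disk step could look, as a sketch under assumptions: the agent's report is reduced to a mapping from file path to a marker string it claims to have written, and the check reads the files instead of trusting the report. The function name `verify_on_disk` and the mapping format are hypothetical.

```python
from pathlib import Path


def verify_on_disk(expected):
    """Check claimed outputs against the filesystem.

    expected maps a file path to a marker string that the agent's
    report claims was written. Returns a list of (path, reason)
    failures; an empty list means every claim was found on disk.
    """
    failures = []
    for path, marker in expected.items():
        file = Path(path)
        if not file.is_file():
            failures.append((path, "missing"))
        elif file.stat().st_size == 0:
            failures.append((path, "empty"))
        elif marker not in file.read_text(encoding="utf-8"):
            failures.append((path, "marker not found"))
    return failures
```

Any failure would be the trigger for the proxy-append step: the missing MEMORY update is written by em main rather than by re-running the truncated agent.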

E-003 — gotcha #44 silent 403 (S18, regression-tested S45)

  • rule: authorization must fail loud, not silently break UX.
  • what: class-level [Authorize(Policy="Workflows.Read")] → non-admin 403 → TanStack Query catch silent → Drafter saw empty Workspace dropdown, no error.
  • 5-why: broad class-level policy → GET blocked for non-admin → FE swallowed 403 → no surfaced error → looked like "no data".
  • fix: (code) class-level [Authorize] only; GET for any-authenticated; POST/DELETE keep admin policy. (guard) test-specialist authz regression test +10 (S45) reflection-scan per-action policy.
  • prevention/guard: Active-Guard "authz regression test per-action policy" (promoted S45).
  • tags: [authz-regression / backend+frontend / ApprovalWorkflowsV2Controller]

E-002 — gotcha #57 Holiday UNIQUE unfiltered → 500 (S45, fixed Mig 43)

  • rule (AS-4): soft-delete entity + UNIQUE index MUST .HasFilter("[IsDeleted]=0").
  • what: Holidays DB UNIQUE (Year,Date) unfiltered vs handler !IsDeleted → admin delete + re-add same-date holiday = reachable 500.
  • 5-why: UNIQUE created unfiltered → soft-deleted row keeps the slot → handler allows logical re-create → INSERT hits dead UNIQUE → 500.
  • fix: (code) Mig 43 .HasFilter("[IsDeleted]=0") (matches 13× existing pattern). (guard) Gap1 test-before reproduced the 500 first.
  • prevention/guard: Active-Guard AS-4 + test-before. ⚠️ OPEN latent: LeaveType.Code + ShiftPattern.Code same class, still unfiltered → backlog test-before (2nd strike of this guard).
  • tags: [soft-delete-invariant / em-main+test-specialist / Holidays,LeaveType,ShiftPattern]
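The failure and the fix can be reproduced outside EF Core. The sketch below uses SQLite's partial index (`CREATE UNIQUE INDEX ... WHERE`) as a stand-in for the SQL Server filtered index that `.HasFilter("[IsDeleted]=0")` produces; the table layout, index name, and sample date are assumptions for illustration, not the SE schema.

```python
import sqlite3


def build_db(filtered):
    """Create an in-memory Holidays table with a UNIQUE (Year, Date) index.

    filtered=True adds WHERE IsDeleted = 0, the SQLite analogue of
    .HasFilter("[IsDeleted]=0").
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Holidays ("
        "Id INTEGER PRIMARY KEY, Year INTEGER, Date TEXT, "
        "IsDeleted INTEGER NOT NULL DEFAULT 0)"
    )
    where = " WHERE IsDeleted = 0" if filtered else ""
    conn.execute(
        "CREATE UNIQUE INDEX IX_Holidays_Year_Date "
        "ON Holidays (Year, Date)" + where
    )
    return conn


def delete_then_readd(conn):
    """Soft-delete a holiday, then re-add the same date. True if it succeeds."""
    conn.execute("INSERT INTO Holidays (Year, Date) VALUES (2026, '2026-01-01')")
    conn.execute("UPDATE Holidays SET IsDeleted = 1 WHERE Date = '2026-01-01'")
    try:
        conn.execute("INSERT INTO Holidays (Year, Date) VALUES (2026, '2026-01-01')")
        return True
    except sqlite3.IntegrityError:
        # The soft-deleted row still occupies the UNIQUE slot.
        return False
```

With `filtered=False` the re-add fails on the dead slot, which is the database error that surfaced as the 500; with `filtered=True` the soft-deleted row no longer participates in the index and the re-add goes through. Writing this as a failing test first is the test-before half of the guard.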

E-001 — S46 user-memory 0-byte (close-out truncation)

  • rule (AS-8): memory .md writes must persist (byte>0); index must not be empty.
  • what: S45 close-out left MEMORY.md index + 1 entry at 0 bytes → S46 bootstrap ran with NO memory auto-inject (silent degrade).
  • 5-why: session-end Write created stub → body Write truncated (gotcha #53) → 0-byte file → not git-tracked (outside repo) → undetected until next bootstrap audit.
  • fix: (process) rebuilt index + repopulated entry (S46). (guard) feedback_session_end_memory_write_verify + now session-end §L.b step (e)/(c) byte-check.
  • prevention/guard: Active-Guard "session-end verify byte>0" (episodic→promoted S48, wired §L.b). /session-start audit also re-checks 0-byte (caught it S46, re-ran clean S48).
  • tags: [memory-integrity / em-main / user-memory]
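A sketch of the byte>0 guard in two parts: a write that confirms the file size afterwards (the session-end side) and a sweep for 0-byte .md files (the session-start audit side). The function names and directory layout are hypothetical; the actual §L.b step is not shown in this file.

```python
from pathlib import Path


def write_and_verify(path, content):
    """Write a memory file, then confirm it is non-empty on disk."""
    if not content:
        raise ValueError("refusing to write empty memory content")
    file = Path(path)
    file.write_text(content, encoding="utf-8")
    size = file.stat().st_size
    if size == 0:
        raise OSError(f"{path} is 0 bytes after write")
    return size


def find_zero_byte(directory):
    """List .md files under directory that are 0 bytes."""
    return sorted(
        str(p) for p in Path(directory).rglob("*.md") if p.stat().st_size == 0
    )
```

Because the memory files live outside the repo, git cannot catch an empty file; an explicit size check at both ends is what turns a silent degrade into a visible one.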

Maintenance: append RCA on each AS-hit; promote a guard to procedural on its 2nd strike; mark verified once it holds through a session; retire by net-effect. Pointer entries only — full narrative lives in session-logs (summary-index).